Prefix Sums on a Mesh

13.2.4 Prefix Sums on a Mesh

We conclude this section by showing how the prefix sums of a sequence can be computed on a mesh-connected array of processors. We study a parallel algorithm for this model for two reasons:

1. As shown in the conclusion of section when the time taken by a signal to travel along a wire is proportional to the length of that wire, the mesh is preferable to the tree for solving a number of problems. These problems are characterized by the fact that their solution time is proportional to the distance (i) from root to leaf in the tree and (ii) from top row to bottom row in the mesh. The problem of computing the prefix sums of an input sequence is one such problem.

2. As indicated in section 4.8, a mesh with $n$ processors can sort a sequence of $n$ inputs faster than a tree with $n$ leaves, regardless of any assumptions we make about the signal propagation time along the wires. This is particularly relevant since sorting is an important component of our solution to the problems described in the next section.

For ease of presentation, we assume in what follows that $n$ is a perfect square and let $m = n^{1/2}$. The prefix sums of $X = \{x_0, x_1, \ldots, x_{n-1}\}$ can be computed on an $m \times m$ mesh-connected computer as follows. Let the $n$ processors $P_0, P_1, \ldots, P_{n-1}$ be arranged in row-major order. Initially, $P_i$ contains $x_i$. When the algorithm terminates, $P_i$ contains the prefix sum $s_i = x_0 + x_1 + \cdots + x_i$.

The algorithm consists of three steps. In the first step, with all rows operating in parallel, the prefix sums for the elements in each row are computed sequentially: Each processor adds to its contents the contents of its left neighbor. In the second step, the prefix sums of the contents in the rightmost column are computed. Finally, again with all rows operating in parallel, the contents of the rightmost processor in row $k - 1$ are added to those of all the processors in row $k$ (except the rightmost). The algorithm is given in what follows as procedure MESH PREFIX SUMS. In it we denote the contents of the processor in row $k$ and column $j$ by $u_{kj}$, where $0 \le k \le m - 1$ and $0 \le j \le m - 1$.

procedure MESH PREFIX SUMS (X, S)

Step 1: for k = 0 to m - 1 do in parallel
          for j = 1 to m - 1 do
            $u_{kj} \leftarrow u_{kj} + u_{k,j-1}$
          end for
        end for.

Step 2: for k = 1 to m - 1 do
          $u_{k,m-1} \leftarrow u_{k,m-1} + u_{k-1,m-1}$
        end for.

Step 3: for k = 1 to m - 1 do in parallel
          for j = m - 2 downto 0 do
            $u_{kj} \leftarrow u_{kj} + u_{k-1,m-1}$
          end for
        end for.

Note that in step 3, $u_{k-1,m-1}$ is propagated along row $k$ from the processor in column $m - 1$ to that in column 0, each processor adding it to its contents and passing it to its left neighbor.
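The three steps can be checked with a short sequential simulation. The following Python sketch is an illustration only, not part of the procedure as stated: the function name `mesh_prefix_sums` and the nested list `u` are assumptions made here, and loops marked "do in parallel" are simply run one row after another.

```python
from itertools import accumulate
from math import isqrt


def mesh_prefix_sums(x):
    """Simulate the three steps sequentially on an m x m grid (m*m == len(x))."""
    n = len(x)
    m = isqrt(n)
    if m * m != n:
        raise ValueError("n must be a perfect square")
    # u[k][j] plays the role of the processor in row k, column j (row-major input).
    u = [list(x[k * m:(k + 1) * m]) for k in range(m)]

    # Step 1: each row computes its own prefix sums, left to right.
    for k in range(m):                  # rows are independent of one another
        for j in range(1, m):
            u[k][j] += u[k][j - 1]

    # Step 2: prefix sums down the rightmost column.
    for k in range(1, m):
        u[k][m - 1] += u[k - 1][m - 1]

    # Step 3: the final value of row k-1 is added to row k, right to left,
    # skipping the rightmost processor, which step 2 already updated.
    for k in range(1, m):
        for j in range(m - 2, -1, -1):
            u[k][j] += u[k - 1][m - 1]

    return [value for row in u for value in row]


data = [3, 1, 4, 1, 5, 9, 2, 6, 5]      # hypothetical input, n = 9, m = 3
print(mesh_prefix_sums(data) == list(accumulate(data)))
```

Comparing against `itertools.accumulate` gives a quick correctness check for any perfect-square input length.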

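Analysis. Each step requires $O(m)$ time. Therefore, $t(n) = O(n^{1/2})$. Since $p(n) = n$, $c(n) = O(n^{3/2})$, which is not optimal, since the prefix sums can be computed sequentially in $O(n)$ time.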
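To make the gap between this cost and the sequential bound concrete, the sketch below counts time units under a simplifying assumption that is not stated in the text: each round of one transfer plus one addition takes one unit. The function name `mesh_time_and_cost` is hypothetical.

```python
from math import isqrt


def mesh_time_and_cost(n):
    """Return (parallel time units, cost) for an m x m mesh with m*m == n."""
    m = isqrt(n)
    if m * m != n:
        raise ValueError("n must be a perfect square")
    # Steps 1, 2 and 3 each perform m - 1 rounds (assumed one time unit per round).
    time_units = 3 * (m - 1)
    cost = n * time_units               # c(n) = p(n) * t(n), with p(n) = n
    return time_units, cost


for n in (16, 256, 4096):
    t, c = mesh_time_and_cost(n)
    print(f"n={n:5d}  t={t:4d}  cost={c:7d}  sequential additions={n - 1}")
```

Under this counting the cost grows as $n^{3/2}$, while a single processor needs only $n - 1$ additions.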

Example 13.2

Let n = 16. The behavior of procedure MESH PREFIX SUMS is illustrated in Fig. 13.4. In the figure,
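As a concrete instance of the $n = 16$ case, the following Python sketch records the grid contents initially and after each step. The input values 1 through 16 and the name `mesh_prefix_trace` are assumptions chosen for illustration; they are not the values used in the original figure.

```python
def mesh_prefix_trace(x, m):
    """Return the grid contents initially and after each of the three steps."""
    u = [list(x[k * m:(k + 1) * m]) for k in range(m)]
    trace = [("initially", [row[:] for row in u])]

    for k in range(m):                      # step 1: row prefix sums
        for j in range(1, m):
            u[k][j] += u[k][j - 1]
    trace.append(("after step 1", [row[:] for row in u]))

    for k in range(1, m):                   # step 2: rightmost column
        u[k][m - 1] += u[k - 1][m - 1]
    trace.append(("after step 2", [row[:] for row in u]))

    for k in range(1, m):                   # step 3: add previous row's total
        for j in range(m - 2, -1, -1):
            u[k][j] += u[k - 1][m - 1]
    trace.append(("after step 3", [row[:] for row in u]))
    return trace


trace = mesh_prefix_trace(list(range(1, 17)), 4)   # hypothetical input 1..16
for label, grid in trace:
    print(label)
    for row in grid:
        print("  ", row)
```

With this input the rightmost column reads 10, 36, 78, 136 after step 2, and after step 3 the last row holds 91, 105, 120, 136.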

Figure 13.4 Computing prefix sums using procedure MESH PREFIX SUMS: (a) initially; (b) after step 1; (c) after step 2; (d) after step 3.

Now assume that an $N^{1/2} \times N^{1/2}$ mesh of processors is available, where $N < n$. To compute the prefix sums of $X$, each processor initially receives $n/N$ elements from $X$ and computes their prefix sums. Procedure MESH PREFIX SUMS can now be modified, in the same way as the tree algorithm in the previous section, so that when it terminates, each processor contains $n/N$ prefix sums of $X$. The modified procedure has a running time of $O(n/N) + O(N^{1/2})$ and a cost of $O(n) + O(N^{3/2})$. This cost is optimal when $N = O(n^{2/3})$.
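The modification is not spelled out here, so the following Python sketch shows one plausible way to realize it, under stated assumptions: each of the $N$ processors computes local prefix sums of its $n/N$ elements, the three mesh steps run on the block totals, and each processor then adds the total of all preceding blocks to its local results. The name `blocked_mesh_prefix_sums` and the parameter `num_procs` (standing for $N$) are hypothetical.

```python
from itertools import accumulate
from math import isqrt


def blocked_mesh_prefix_sums(x, num_procs):
    """Prefix sums of x on a simulated sqrt(N) x sqrt(N) mesh, N = num_procs."""
    n = len(x)
    m = isqrt(num_procs)
    if m * m != num_procs or n == 0 or n % num_procs != 0:
        raise ValueError("need N a perfect square and n a positive multiple of N")
    size = n // num_procs

    # Phase 1: each processor computes the prefix sums of its own block, O(n/N).
    local = [list(accumulate(x[i * size:(i + 1) * size])) for i in range(num_procs)]

    # Phase 2: the three mesh steps applied to the block totals, O(N^(1/2)).
    u = [[local[k * m + j][-1] for j in range(m)] for k in range(m)]
    for k in range(m):
        for j in range(1, m):
            u[k][j] += u[k][j - 1]
    for k in range(1, m):
        u[k][m - 1] += u[k - 1][m - 1]
    for k in range(1, m):
        for j in range(m - 2, -1, -1):
            u[k][j] += u[k - 1][m - 1]

    # Phase 3: add the total of all preceding blocks to each local result, O(n/N).
    result = []
    for i in range(num_procs):
        k, j = divmod(i, m)
        offset = u[k][j] - local[i][-1]     # sum of the blocks before block i
        result.extend(value + offset for value in local[i])
    return result


x = list(range(1, 28))                      # hypothetical input, n = 27
print(blocked_mesh_prefix_sums(x, 9) == list(accumulate(x)))
```

The usage line picks $n = 27$ and $N = 9 = n^{2/3}$, so each simulated processor holds three elements, which matches the regime in which the stated cost is optimal.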