SORTING O N A LINEAR ARRAY

4.3 SORTING O N A LINEAR ARRAY

In this section we describe a parallel sorting algorithm for an SIMD computer where the processors are connected to form a linear array as depicted in

1.6. The algorithm uses n processors P,,

... , to sort the sequences S = ... , At

any time during the execution of the algorithm, processor holds one element of the input sequence; we denote this element by

for all 1 i n. Initially = It is required that, upon termination,

be the ith element of the sorted sequence. The

algorithm consists of two steps that are performed repeatedly. In the first step, all odd-

+ ,. If > ,, then and ,

numbered processors

obtain

from

exchange the elements they held at the beginning of this step. In the second step, all even-numbered processors perform the same operations as did the odd-numbered

ones in the first step. After repetitions of these two steps in this order, no further exchanges of elements can take place. Hence the algorithm terminates with

90 Sorting Chap. 4

for all i n - The algorithm is given in what follows as procedure EVEN TRANSPOSITION.

procedure ODD-EVEN TRANSPOSITION (S)

- 1 do in parallel

if

then end if

end for

(2) for i =

- 11/21 do in parallel

if then

end if end for end for.

Example 4.2

The contents of the linear array for this input during the execution of procedure ODD-EVEN TRANSPOSITION are illustrated in Fig. 4.2. Note that although a sorted sequence is produced after four iterations of steps 1 and 2, two more (redundant) iterations are performed, that is, a total of

as required by the procedure's statement.

Each of steps 1 and 2 consists of one comparison and two routing operations and hence requires constant time. These two steps are executed times. The running time of procedure ODD-EVEN TRANSPOSITION is therefore

Analysis.

x = O(n 2 ), which is not optimal.

Since = n, the procedure's cost is given by

From this analysis, procedure ODD-EVEN TRANSPOSITION does not appear to be too attractive. Indeed,

(i) with respect to procedure QUICKSORT, it achieves a speedup of n ) only,

(ii) it uses a number of processors equal to the size of the input, which is

unreasonable, and it is not cost optimal.

The only redeeming feature of procedure ODD-EVEN TRANSPOSITION seems to be its extreme simplicity. We are therefore tempted to salvage its basic idea in order to obtain a new algorithm with optimal cost. There are two obvious ways for

doing this: either (1) reduce the running time or (2) reduce the number of processors used. The first approach is hopeless: The running time of procedure ODD-EVEN TRANSPOSITION is the smallest possible achievable on a linear array with n processors. To see this, assume that the largest element in S is initially in

and must therefore move n - 1 steps across the linear array before settling in its final position in P,. This requires

time.

4.3 Sorting on

a Linear Array

AFTER STEP

Figure 4.2 Sorting sequence of eleven elements using procedure ODD-EVEN TRANSPOSITION.

n, are available, then they can simulate the algorithm in n x

Now consider the second approach. If N processors, where N

time. The cost remains n x which as we know is not optimal.

A more subtle simulation, however, allows us to achieve cost optimality. Assume that each of the N processors in the linear array holds

a subsequence of S of length (It may be necessary to add some dummy elements to S if n is not a multiple of N . ) In the new algorithm, the comparison-exchange

92 Sorting

Chap. 4

operations of procedure ODD-EVEN TRANSPOSITION are now replaced with merge-split operations on subsequences. Let

denote the subsequence held by processor

Initially, the are random subsequences of S. In step 1, each sorts using procedure QUICKSORT. In step 2.1 each odd-numbered processor

merges the two subsequences

It retains the first half of

and

into a sorted sequence

the second half. Step 2.2 is identical to

and assigns to its neighbor

2.1 except that it is performed by all even-numbered processors. Steps 2.1 and 2.2 are repeated alternately. After

iterations no further exchange of elements can take place between two processors. The algorithm is given in what follows as procedure MERGE SPLIT. When it terminates, the sequence S = S,, ... , is sorted.

procedure MERGE SPLIT ( S )

Step 1: for i = 1 to N do in parallel

QUICKSORT

end for.

Step 2: for j = 1 to

do

(2.1) for i = 1, 3,. ..

- 1 do in parallel

(i) SEQUENTIAL MERGE ...,

end for

(2.2) for i =

do in parallel

(i) SEQUENTIAL MERGE (ii)

end for. Example 4.3

Let S = 2, 5, 10, 1, 7 , 3, 12, 6, 1 1 , 4, 9 ) and N = 4. The contents of the various processors during the execution of procedure MERGE SPLIT for this input is illustrated in Fig. 4.3.

steps. Transferring to merging by SEQUENTIAL MERGE, and returning

to all require time. The total running time of procedure MERGE SPLIT is therefore

log

and its cost is

= O(n log n) +

which is optimal when N

log n.

4.4 Sorting on the CRCW Model

INITIALLY

AFTER STEP

Figure 4.3 Sorting sequence of twelve elements using procedure MERGE SPILIT.