SORTING O N A LINEAR ARRAY
4.3 SORTING O N A LINEAR ARRAY
In this section we describe a parallel sorting algorithm for an SIMD computer where the processors are connected to form a linear array as depicted in
1.6. The algorithm uses n processors P,,
... , to sort the sequences S = ... , At
any time during the execution of the algorithm, processor holds one element of the input sequence; we denote this element by
for all 1 i n. Initially = It is required that, upon termination,
be the ith element of the sorted sequence. The
algorithm consists of two steps that are performed repeatedly. In the first step, all odd-
+ ,. If > ,, then and ,
numbered processors
obtain
from
exchange the elements they held at the beginning of this step. In the second step, all even-numbered processors perform the same operations as did the odd-numbered
ones in the first step. After repetitions of these two steps in this order, no further exchanges of elements can take place. Hence the algorithm terminates with
90 Sorting Chap. 4
for all i n - The algorithm is given in what follows as procedure EVEN TRANSPOSITION.
procedure ODD-EVEN TRANSPOSITION (S)
- 1 do in parallel
if
then end if
end for
(2) for i =
- 11/21 do in parallel
if then
end if end for end for.
Example 4.2
The contents of the linear array for this input during the execution of procedure ODD-EVEN TRANSPOSITION are illustrated in Fig. 4.2. Note that although a sorted sequence is produced after four iterations of steps 1 and 2, two more (redundant) iterations are performed, that is, a total of
as required by the procedure's statement.
Each of steps 1 and 2 consists of one comparison and two routing operations and hence requires constant time. These two steps are executed times. The running time of procedure ODD-EVEN TRANSPOSITION is therefore
Analysis.
x = O(n 2 ), which is not optimal.
Since = n, the procedure's cost is given by
From this analysis, procedure ODD-EVEN TRANSPOSITION does not appear to be too attractive. Indeed,
(i) with respect to procedure QUICKSORT, it achieves a speedup of n ) only,
(ii) it uses a number of processors equal to the size of the input, which is
unreasonable, and it is not cost optimal.
The only redeeming feature of procedure ODD-EVEN TRANSPOSITION seems to be its extreme simplicity. We are therefore tempted to salvage its basic idea in order to obtain a new algorithm with optimal cost. There are two obvious ways for
doing this: either (1) reduce the running time or (2) reduce the number of processors used. The first approach is hopeless: The running time of procedure ODD-EVEN TRANSPOSITION is the smallest possible achievable on a linear array with n processors. To see this, assume that the largest element in S is initially in
and must therefore move n - 1 steps across the linear array before settling in its final position in P,. This requires
time.
4.3 Sorting on
a Linear Array
AFTER STEP
Figure 4.2 Sorting sequence of eleven elements using procedure ODD-EVEN TRANSPOSITION.
n, are available, then they can simulate the algorithm in n x
Now consider the second approach. If N processors, where N
time. The cost remains n x which as we know is not optimal.
A more subtle simulation, however, allows us to achieve cost optimality. Assume that each of the N processors in the linear array holds
a subsequence of S of length (It may be necessary to add some dummy elements to S if n is not a multiple of N . ) In the new algorithm, the comparison-exchange
92 Sorting
Chap. 4
operations of procedure ODD-EVEN TRANSPOSITION are now replaced with merge-split operations on subsequences. Let
denote the subsequence held by processor
Initially, the are random subsequences of S. In step 1, each sorts using procedure QUICKSORT. In step 2.1 each odd-numbered processor
merges the two subsequences
It retains the first half of
and
into a sorted sequence
the second half. Step 2.2 is identical to
and assigns to its neighbor
2.1 except that it is performed by all even-numbered processors. Steps 2.1 and 2.2 are repeated alternately. After
iterations no further exchange of elements can take place between two processors. The algorithm is given in what follows as procedure MERGE SPLIT. When it terminates, the sequence S = S,, ... , is sorted.
procedure MERGE SPLIT ( S )
Step 1: for i = 1 to N do in parallel
QUICKSORT
end for.
Step 2: for j = 1 to
do
(2.1) for i = 1, 3,. ..
- 1 do in parallel
(i) SEQUENTIAL MERGE ...,
end for
(2.2) for i =
do in parallel
(i) SEQUENTIAL MERGE (ii)
end for. Example 4.3
Let S = 2, 5, 10, 1, 7 , 3, 12, 6, 1 1 , 4, 9 ) and N = 4. The contents of the various processors during the execution of procedure MERGE SPLIT for this input is illustrated in Fig. 4.3.
steps. Transferring to merging by SEQUENTIAL MERGE, and returning
to all require time. The total running time of procedure MERGE SPLIT is therefore
log
and its cost is
= O(n log n) +
which is optimal when N
log n.
4.4 Sorting on the CRCW Model
INITIALLY
AFTER STEP
Figure 4.3 Sorting sequence of twelve elements using procedure MERGE SPILIT.