1. Given an m-permutation $(p_1\ p_2\ \ldots\ p_m)$, the procedure first checks whether it is updatable.

2. If the permutation is updatable, then its rightmost element is checked first to determine whether it can be incremented; if it can, then the procedure increments it and terminates.

6.3 Generating Permutations in Parallel 151

3. Determining whether $p_m$ can be incremented requires scanning no more than m positions of array u, whose entries indicate which of the integers $\{1, 2, \ldots, n\}$ currently appear in $(p_1\ p_2\ \ldots\ p_m)$ and which do not. This scanning also yields the new value of $p_m$ in case the latter can be incremented.

4. If the rightmost element cannot be incremented, then the procedure finds the first element to the left of $p_m$ that is smaller than its right neighbor. This element, call it $p_k$, is incremented by the procedure and all elements to its right are updated.

5. Determining the new value of $p_k$ requires scanning no more than m positions of u.

6. Updating all positions to the right of $p_k$ requires scanning no more than m positions of u.
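The observations above can be sketched sequentially. The Python function below is only an illustration under stated assumptions: the name `next_m_permutation`, the 0-based list `p`, and the Boolean list `u` are hypothetical choices, and the right-to-left loop folds the check of the rightmost element and the search for the element to increment into a single scan rather than treating them as separate cases.

```python
def next_m_permutation(p, n):
    """Return the lexicographic successor of the m-permutation p of {1..n},
    or None if p is the last one. Illustrative sketch, not the book's code."""
    m = len(p)
    p = list(p)
    # u[v] is True when integer v is NOT used by the current permutation
    # (index 0 is unused so that values index the list directly).
    u = [True] * (n + 1)
    for v in p:
        u[v] = False
    # Walk from the rightmost position leftward, releasing each value as we go.
    for k in range(m - 1, -1, -1):
        u[p[k]] = True
        # Smallest available integer larger than the current value, if any.
        nxt = next((v for v in range(p[k] + 1, n + 1) if u[v]), None)
        if nxt is not None:
            p[k] = nxt
            u[nxt] = False
            # Fill the positions to the right with the smallest available integers.
            free = (v for v in range(1, n + 1) if u[v])
            for j in range(k + 1, m):
                p[j] = next(free)
            return p
    return None
```

For instance, with n = 5 the call `next_m_permutation([5, 1, 4, 3], 5)` skips the two rightmost positions, raises the second element from 1 to 2, and refills the positions to its right.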

These observations indicate that the algorithm in section 6.2.1 lends itself quite naturally to parallel implementation. Assume that m processors are available on an EREW SM SIMD computer. We give our first parallel m-permutation generator as procedure PARALLEL PERMUTATIONS. The procedure takes n and m as input and produces all m-permutations of $\{1, 2, \ldots, n\}$. It assumes that processor $P_i$ has access to position i of an output register where each successive permutation is produced. There are three arrays in shared memory:

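1. $p = p_1, p_2, \ldots, p_m$, which stores the current permutation.

2. $u = u_1, u_2, \ldots, u_n$, where $u_i = 0$ if i is in the current permutation $(p_1\ p_2\ \ldots\ p_m)$; otherwise $u_i = 1$. Initially, $u_i = 1$ for $1 \le i \le n$.

3. $x = x_1, x_2, \ldots, x_m$, used to store intermediate results.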

Procedure PARALLEL PERMUTATIONS also invokes the following five procedures for EREW SM SIMD computers:

1. Procedure BROADCAST (a, m, x) studied in chapter 2, which uses an array $x_1, x_2, \ldots, x_m$ to distribute the value of a to m processors $P_1, P_2, \ldots, P_m$.

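2. The prefix-sums procedure applied to $(x_1, x_2, \ldots, x_m)$, also studied in chapter 2, which uses m processors to compute the prefix sums of the array $x_1, x_2, \ldots, x_m$ and replace $x_i$ with $x_1 + x_2 + \cdots + x_i$ for $1 \le i \le m$.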

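3. Procedure MINIMUM $(x_1, x_2, \ldots, x_m)$ given in what follows, which uses m processors to find the smallest element in the array $x_1, x_2, \ldots, x_m$ and return it in $x_1$:

procedure MINIMUM $(x_1, x_2, \ldots, x_m)$
  for j = 0 to (log m - 1) do
    for i = 1 to m in steps of $2^{j+1}$ do in parallel
      (1) $P_i$ obtains $x_{i+2^j}$ through shared memory
      (2) if $x_{i+2^j} < x_i$ then $x_i \leftarrow x_{i+2^j}$ end if
    end for
  end for.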

1 52 Generating Permutations and Combinations Chap. 6

4. Procedure MAXIMUM $(x_1, x_2, \ldots, x_m)$, which uses m processors to find the largest element in the array $x_1, x_2, \ldots, x_m$ and return it in $x_1$. This procedure is identical to procedure MINIMUM, except that step 2 now reads

  if $x_{i+2^j} > x_i$ then $x_i \leftarrow x_{i+2^j}$ end if.

5. Procedure PARALLEL SCAN $(p_r, n)$, which is helpful in searching for the next available integer to increment a given element $p_r$ of an m-permutation $(p_1\ p_2\ \ldots\ p_m)$ of $\{1, 2, \ldots, n\}$. Given $p_r$ and n, array u in shared memory is used to determine which of the m integers $p_r + 1, p_r + 2, \ldots, p_r + m$ satisfy the two conditions of (i) being smaller than or equal to n and (ii) being not present in $(p_1\ p_2\ \ldots\ p_{r-1})$, and are therefore available for incrementing $p_r$. Array x in shared memory is used to keep track of these integers.

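procedure PARALLEL SCAN $(p_r, n)$
  for i = 1 to m do in parallel
    if $p_r + i \le n$ and $u_{p_r+i} = 1$
      then $x_i \leftarrow p_r + i$
      else $x_i \leftarrow \infty$
    end if
  end for.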
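The helper procedures MINIMUM, MAXIMUM, and PARALLEL SCAN described above can be simulated on a single processor to see what each round does. The sketch below is a hedged illustration, not the book's code: the function names, the 0-based list `x`, the dummy entry `u[0]`, the use of `math.inf` as the "not available" marker, and the bounds guard `j < m` (which lets m be something other than a power of 2) are all assumptions made for this example.

```python
import math

INF = math.inf  # assumed marker for "no available integer in this slot"

def tree_reduce(x, better):
    """Simulate the log-step pairwise reduction; the result ends up in x[0]."""
    x = list(x)
    m = len(x)
    stride = 1
    while stride < m:
        # In one round, every i that is a multiple of 2*stride acts "in parallel":
        # it reads the partner at distance stride and keeps the better value.
        for i in range(0, m, 2 * stride):
            j = i + stride
            if j < m and better(x[j], x[i]):
                x[i] = x[j]
        stride *= 2
    return x[0]

def minimum(x):
    return tree_reduce(x, lambda a, b: a < b)

def maximum(x):
    return tree_reduce(x, lambda a, b: a > b)

def parallel_scan(p_r, n, u, m):
    """Slot i holds p_r + i when that integer is <= n and marked available in u.
    u is indexed by value, so u[0] is a dummy entry."""
    return [p_r + i if p_r + i <= n and u[p_r + i] == 1 else INF
            for i in range(1, m + 1)]
```

Because the pairs touched within one round are disjoint, running them one after another gives the same result as running them simultaneously, which is why a plain loop is enough for this simulation.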

From chapter 2 we know that procedure BROADCAST and the prefix-sums procedure run in $O(\log m)$ time. Procedures MINIMUM and MAXIMUM clearly require $O(\log m)$ time as well. Procedure PARALLEL SCAN takes constant time. We are now ready to state procedure PARALLEL PERMUTATIONS:

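procedure PARALLEL PERMUTATIONS (n, m)

Step 1:
  (1.1) for i = 1 to m do in parallel
          (i) $p_i \leftarrow i$
          (ii) produce $p_i$ as output
        end for
  (1.2) {Initialize array u}
        for i = 1 to $\lceil n/m \rceil$ do
          for j = 1 to m do in parallel
            (i) $k \leftarrow (i - 1)m + j$
            (ii) if $k \le n$ then $u_k \leftarrow 1$ end if
          end for
        end for.

Step 2:
  for t = 1 to $({}^{n}P_{m} - 1)$ do
    (2.1) for i = 1 to m do in parallel
            $u_{p_i} \leftarrow 0$
          end for
    (2.2) {Check whether rightmost element of $(p_1\ p_2\ \ldots\ p_m)$ can be incremented; that is, if there is a j, $p_m < j \le n$, such that $j \ne p_i$ for $1 \le i \le m - 1$}
          (i) BROADCAST $(p_m, m, x)$
          (ii) PARALLEL SCAN $(p_m, n)$
    (2.3) {If several j satisfying the condition in (2.2) are found, the smallest is assigned to $p_m$}
          (i) {The smallest of the $x_i$ is found and placed in $x_1$}
              MINIMUM $(x_1, x_2, \ldots, x_m)$
          (ii) if $x_1 \ne \infty$
               then (a) $u_{p_m} \leftarrow 1$
                    (b) $p_m \leftarrow x_1$
                    (c) $k \leftarrow m$
                    (d) Go to step (2.7)
               end if
    (2.4) {Rightmost element cannot be incremented; find rightmost element $p_k$ such that $p_k < p_{k+1}$}
          (i) for i = 1 to m - 1 do in parallel
                if $p_i < p_{i+1}$ then $x_i \leftarrow i$ else $x_i \leftarrow -1$ end if
              end for
          (ii) {The largest of the $x_i$ is found and placed in $x_1$}
               MAXIMUM $(x_1, x_2, \ldots, x_{m-1})$
          (iii) $k \leftarrow x_1$
          (iv) BROADCAST (k, m, x)
          (v) BROADCAST $(p_k, m, x)$
    (2.5) {Increment $p_k$: the smallest available integer larger than $p_k$ is assigned to $p_k$}
          (i) for i = k to m do in parallel
                $u_{p_i} \leftarrow 1$
              end for
          (ii) PARALLEL SCAN $(p_k, n)$
          (iii) MINIMUM $(x_1, x_2, \ldots, x_m)$
          (iv) $p_k \leftarrow x_1$
          (v) $u_{p_k} \leftarrow 0$
    (2.6) {Find the smallest m - k integers that are available and assign their values to $p_{k+1}, p_{k+2}, \ldots, p_m$, respectively. This reduces to finding the first m - k positions of u that are equal to 1}
          (i) for i = 1 to m do in parallel
                $x_i \leftarrow u_i$
              end for
          (ii) replace each $x_i$ with $x_1 + x_2 + \cdots + x_i$ using the prefix-sums procedure of chapter 2
          (iii) for i = 1 to m do in parallel
                  if $x_i \le (m - k)$ and $u_i = 1$
                    then $p_{k+x_i} \leftarrow i$
                  end if
                end for
    (2.7) {Clean up array u and output current m-permutation}
          (i) for i = 1 to k do in parallel
                $u_{p_i} \leftarrow 1$
              end for
          (ii) for i = 1 to m do in parallel
                 produce $p_i$ as output
               end for
  end for.

Analysis.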

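Step 1 takes $O(n/m)$ time. There are ${}^{n}P_{m} - 1$ iterations of step 2, each requiring $O(\log m)$ time, as can be easily verified: every substep is either a constant-time parallel loop or a call to a procedure that runs in $O(\log m)$ time. The overall running time of PARALLEL PERMUTATIONS is therefore $O({}^{n}P_{m} \log m)$. Since m processors are used, the procedure's cost is $O({}^{n}P_{m}\, m \log m)$.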
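To check the logic of steps 1 and 2 end to end, procedure PARALLEL PERMUTATIONS can be simulated on one processor. The following is a minimal sketch under stated assumptions, not the book's implementation: each inner loop stands for one parallel step, Python's built-in `min`, `max`, and `itertools.accumulate` stand in for MINIMUM, MAXIMUM, and the prefix-sums procedure, the BROADCAST calls are omitted because a single process sees every value, and the markers `math.inf` and `-1` are assumed sentinels. The helper `matches_itertools` is a hypothetical name introduced only for the comparison.

```python
import math
from itertools import accumulate, permutations

INF = math.inf  # assumed sentinel for "no integer available"

def parallel_permutations(n, m):
    """Sequentially simulate the procedure; returns all m-permutations in order."""
    p = [None] + list(range(1, m + 1))   # step 1.1, 1-based: p[1..m]
    u = [None] + [1] * n                 # step 1.2: every integer starts available
    out = [tuple(p[1:])]

    def scan(v):
        # PARALLEL SCAN: candidates v+1 .. v+m that are <= n and available
        return [v + i if v + i <= n and u[v + i] == 1 else INF
                for i in range(1, m + 1)]

    for _ in range(math.perm(n, m) - 1):         # step 2
        for i in range(1, m + 1):                # 2.1: mark current values as used
            u[p[i]] = 0
        smallest = min(scan(p[m]))               # 2.2 and 2.3
        if smallest != INF:
            u[p[m]] = 1                          # release the old rightmost value
            p[m] = smallest
            k = m
        else:
            # 2.4: rightmost position whose value is below its right neighbor
            k = max(i if p[i] < p[i + 1] else -1 for i in range(1, m))
            for i in range(k, m + 1):            # 2.5: release positions k..m
                u[p[i]] = 1
            p[k] = min(scan(p[k]))
            u[p[k]] = 0
            x = list(accumulate(u[1:m + 1]))     # 2.6: prefix sums of u[1..m]
            for i in range(1, m + 1):
                if x[i - 1] <= m - k and u[i] == 1:
                    p[k + x[i - 1]] = i
        for i in range(1, k + 1):                # 2.7: clean up array u
            u[p[i]] = 1
        out.append(tuple(p[1:]))
    return out

def matches_itertools(n, m):
    # Compare against the lexicographic order produced by itertools.
    return parallel_permutations(n, m) == list(permutations(range(1, n + 1), m))
```

The simulation performs one pass of step 2 per generated permutation, which mirrors the iteration count used in the analysis; it says nothing about the $O(\log m)$ time per iteration, since that depends on the parallel procedures it replaces with built-ins.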

Example 6.1

We illustrate the working of procedure PARALLEL PERMUTATIONS by showing how a permutation is updated. Let $S = \{1, 2, 3, 4, 5\}$ and let $(p_1\ p_2\ p_3\ p_4) = (5\ 1\ 4\ 3)$ be a 4-permutation to be updated during an iteration of step 2. In step 2.1 array u is set up as shown in Fig. 6.1. In step 2.2, $p_4 = 3$ is broadcast to all four processors to check whether any of the integers $p_4 + 1$, $p_4 + 2$, $p_4 + 3$, and $p_4 + 4$ is available. The processors assign values to array x as shown in Fig. 6.1. This leads to the discovery in step 2.3 that $p_4$ cannot be incremented. In step 2.4 the processors assign values to array x to indicate the positions of those elements in the permutation that are smaller than their right neighbor, as shown in Fig. 6.1. The largest entry in x is determined to be 2; this means that $p_2$ is to be incremented and all the positions to its right are to be updated. Now 2 and $p_2$ are broadcast to the four processors. In step 2.5 array u is updated to indicate that the old values of $p_2$, $p_3$, and $p_4$ are now available, as shown in Fig. 6.1. The processors now check whether any of the integers $p_2 + 1$, $p_2 + 2$, $p_2 + 3$, and $p_2 + 4$ is available and indicate their findings by setting up array x as shown in Fig. 6.1. The smallest entry in x is found to be 2: $p_2$ is assigned the value 2 and $u_2$ is set to 0 as shown in Fig. 6.1. In step 2.6 the smallest two available integers are found by setting array x equal to the first four positions of array u. Now the prefix-sums procedure is applied to array x with the result shown in Fig. 6.1. Since $x_1 \le 4 - 2$ and $u_1 = 1$, $p_{2+x_1}$ is assigned the value 1. Similarly, since $x_3 = 4 - 2$ and $u_3 = 1$, $p_{2+x_3}$ is assigned the value 3. Finally, in step 2.7 positions 2 and 5 of array u are set to 1 and the 4-permutation $(p_1\ p_2\ p_3\ p_4) = (5\ 2\ 1\ 3)$ is produced as output.
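The update of (5 1 4 3) into (5 2 1 3) walked through in Example 6.1 can be replayed with a short trace. This is a sketch under assumptions: n = 5 is taken as the size of S (which is what the failure to increment the rightmost element implies), `trace_update` and the keys of its log are hypothetical names, `math.inf` and `-1` are assumed markers, and built-in `min`, `max`, and `itertools.accumulate` replace the parallel procedures. The permutation passed in is assumed not to be the last one.

```python
import math
from itertools import accumulate

INF = math.inf  # assumed marker meaning "not available"

def trace_update(perm, n):
    """Run one iteration of step 2 and record the arrays after each substep."""
    m = len(perm)
    p = [None] + list(perm)      # 1-based p[1..m]
    u = [None] + [1] * n         # 1-based u[1..n]; all 1 between iterations
    log = {}

    def scan(v):
        return [v + i if v + i <= n and u[v + i] == 1 else INF
                for i in range(1, m + 1)]

    for i in range(1, m + 1):                    # step 2.1
        u[p[i]] = 0
    log["2.1 u"] = u[1:]
    x = scan(p[m])                               # step 2.2
    log["2.2 x"] = x
    if min(x) != INF:                            # step 2.3
        u[p[m]] = 1
        p[m] = min(x)
        k = m
    else:
        x = [i if p[i] < p[i + 1] else -1 for i in range(1, m)]   # step 2.4
        log["2.4 x"] = x
        k = max(x)
        for i in range(k, m + 1):                # step 2.5: release positions k..m
            u[p[i]] = 1
        log["2.5 u released"] = u[1:]
        x = scan(p[k])
        log["2.5 x"] = x
        p[k] = min(x)
        u[p[k]] = 0
        log["2.5 u"] = u[1:]
        x = list(accumulate(u[1:m + 1]))         # step 2.6: prefix sums
        log["2.6 x"] = x
        for i in range(1, m + 1):
            if x[i - 1] <= m - k and u[i] == 1:
                p[k + x[i - 1]] = i
    for i in range(1, k + 1):                    # step 2.7
        u[p[i]] = 1
    log["2.7 u"] = u[1:]
    log["p"] = p[1:]
    return log
```

Calling `trace_update([5, 1, 4, 3], 5)` and printing the log entry by entry gives one array per substep, which can be compared with the narrative of the example.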

Discussion. We conclude this section with two remarks on procedure PARALLEL PERMUTATIONS.

1. The procedure has a cost of $O({}^{n}P_{m}\, m \log m)$, which is not optimal in view of the $O({}^{n}P_{m}\, m)$ operations sufficient to generate all m-permutations of n items by procedure SEQUENTIAL PERMUTATIONS.

2. The procedure is not adaptive as it requires the presence of m processors in order to function properly. As pointed out earlier, it is usually reasonable to assume that the number of processors on a shared memory parallel computer is not only fixed but also smaller than the size of the typical problem.

Figure 6.1 Updating permutation using procedure PARALLEL PERMUTATIONS.

The preceding remarks lead naturally to the following questions: