
Example 1.13

An $n \times n$ mesh-connected SIMD computer (see Fig. 1.7) is used to compute the sum of $n^2$ numbers. Initially, there is one number per processor. Processor $P(n-1, n-1)$ is to produce the output. Since the number initially in $P(0, 0)$ has to be part of the sum, it must somehow find its way to $P(n-1, n-1)$. This requires at least $2(n-1)$ routing steps. Thus the lower bound on computing the sum is $\Omega(n)$ steps.
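The bound in this example can be checked against a small simulation. The sketch below is not taken from the text: it assumes one particular routing scheme (partial sums sweep right along every row, then down the last column), and the name `mesh_sum` is hypothetical. Under those assumptions it reaches the corner processor in exactly $2(n-1)$ steps, matching the lower bound.

```python
def mesh_sum(grid):
    """Sum an n x n grid using only transfers between adjacent processors.

    Returns (total, steps). Each counted step is one parallel routing step
    in which a processor passes its partial sum to a neighbor.
    This routing scheme is an assumption made for illustration.
    """
    n = len(grid)
    acc = [row[:] for row in grid]  # acc[i][j] is the partial sum held by P(i, j)
    steps = 0

    # Phase 1: partial sums sweep rightward; all rows act in the same step.
    for j in range(n - 1):
        for i in range(n):
            acc[i][j + 1] += acc[i][j]
        steps += 1

    # Phase 2: row totals sweep down the last column toward P(n-1, n-1).
    for i in range(n - 1):
        acc[i + 1][n - 1] += acc[i][n - 1]
        steps += 1

    return acc[n - 1][n - 1], steps


n = 4
grid = [[i * n + j for j in range(n)] for i in range(n)]
total, steps = mesh_sum(grid)
print(total, steps)  # 120 6, that is, sum(range(16)) and 2 * (n - 1)
```

The number initially in $P(0, 0)$ travels along the top row and then down the last column, which is one of the shortest paths to $P(n-1, n-1)$.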

These ideas are further elaborated on in the following section.

1.3.1.3 Speedup. In evaluating a parallel algorithm for a given problem, it is quite natural to do it in terms of the best available sequential algorithm for that problem. Thus a good indication of the quality of a parallel algorithm is the speedup it produces. This is defined as

$$\text{Speedup} = \frac{\text{worst-case running time of fastest known sequential algorithm for problem}}{\text{worst-case running time of parallel algorithm}}$$

Clearly, the larger the speedup, the better the parallel algorithm.
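As a rough illustration of the definition, the sketch below computes the ratio for a file search of the kind described in example 1.4. It assumes unit cost per comparison and that each of the $N$ processors scans its own block of about $n/N$ entries; the function names are hypothetical and constant factors are ignored.

```python
import math


def speedup(t_sequential, t_parallel):
    """Ratio of sequential to parallel worst-case running times."""
    return t_sequential / t_parallel


def search_steps_sequential(n):
    # Worst case: all n entries are examined one after another.
    return n


def search_steps_parallel(n, num_procs):
    # Assumption: each processor scans a block of ceil(n / N) entries.
    return math.ceil(n / num_procs)


n, num_procs = 1024, 16
s = speedup(search_steps_sequential(n), search_steps_parallel(n, num_procs))
print(s)  # 16.0
```

With these step counts the speedup equals the number of processors.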

Example 1.14

In example 1.4, a file of $n$ entries is searched by an algorithm running on a CREW SM SIMD computer with $N$ processors in $O(n/N)$ time. Since the running time of the best possible sequential algorithm is $O(n)$, the speedup is equal to $O(N)$.

For most problems, the speedup achieved in this example is the largest that can be obtained with $N$ processors. To see this, assume that the fastest sequential algorithm for a problem requires time $T_1$, that a parallel algorithm for the same problem requires time $T_2$, and that $T_1/T_2 > N$. We now observe that any parallel algorithm can be simulated on a sequential computer. The simulation is carried out as follows: The (only) processor on the sequential computer executes the parallel steps serially by pretending that it is $P_1$, then that it is $P_2$, and so on. The time taken by the simulation is the sum of the times taken to imitate all $N$ processors, which is at most $N$ times $T_2$. But $N \times T_2 < T_1$, implying that the simulation we have just performed solves the problem faster than the sequential algorithm believed to be the fastest for that problem. This can mean one of two things:

(i) The sequential algorithm with running time $T_1$ is not really the fastest possible and we have just found a faster one with running time $N \times T_2$, thus improving the state of the art of sequential computing, or

(ii) there is an error in our analysis!

Suppose we know that a sequential algorithm for a given problem is indeed the fastest possible. Ideally, of course, one hopes to achieve the maximum speedup of $N$ when solving such a problem using $N$ processors operating in parallel. In practice, such a speedup cannot be achieved for every problem since

(i) it is not always possible to decompose a problem into $N$ tasks, each requiring $1/N$ of the time taken by one processor to solve the original problem, and

(ii) in most cases the structure of the parallel computer used to solve a problem imposes restrictions that render the desired running time unattainable.
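The simulation argument used earlier in this section to bound the speedup by the number of processors can be sketched in code. This is a minimal illustration, not the author's formulation: `simulate_sequentially` and the `work(p, t)` callback are hypothetical names, and every processor is assumed to take one unit of time per parallel step.

```python
def simulate_sequentially(num_procs, parallel_steps, work):
    """Run a parallel algorithm on a single processor.

    `work(p, t)` performs what processor p does in parallel step t
    (a hypothetical callback). Returns the sequential steps spent.
    """
    sequential_steps = 0
    for t in range(parallel_steps):   # take the parallel steps one at a time
        for p in range(num_procs):    # pretend to be each processor in turn
            work(p, t)
            sequential_steps += 1
    return sequential_steps


# Toy parallel algorithm: each processor adds up its own block of the data.
data = list(range(32))
num_procs = 4
block = len(data) // num_procs
partial = [0] * num_procs


def work(p, t):
    partial[p] += data[p * block + t]


t_parallel = block  # parallel running time of the toy algorithm
cost = simulate_sequentially(num_procs, t_parallel, work)
print(cost, num_procs * t_parallel)  # 32 32
```

The simulation never costs more than the number of processors times the parallel running time, which is why a speedup larger than the number of processors would contradict the assumption that the sequential algorithm is the fastest. The toy example leaves out the final combination of the partial sums.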

Example 1.15

The problem of adding $n$ numbers discussed in example 1.5 is solved in $O(\log n)$ time on a tree-connected parallel computer using $n - 1$ processors. Here the speedup is $O(n/\log n)$ since the best possible sequential algorithm requires $O(n)$ additions. This speedup is far from the ideal $n - 1$ and is due to the fact that the numbers were input at the leaves and the sum output at the root. Any algorithm for such a model necessarily requires $\Omega(\log n)$ time, that is, the time required for a single datum to propagate from input to output through all levels of the tree.
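The gap between the achieved and the ideal speedup in this example can be seen in a small simulation. The sketch below assumes that the number of inputs is a power of two and models each tree level as one parallel step; `tree_sum` is a hypothetical name, not a routine from the text.

```python
def tree_sum(values):
    """Add n numbers (n a power of two) on a simulated binary tree.

    Returns (total, levels, additions). Each level is one parallel step,
    and each addition corresponds to one processor of the tree.
    """
    level = list(values)
    levels = 0
    additions = 0
    while len(level) > 1:
        # Every processor on this level adds the two values sent by its children.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        additions += len(level)
        levels += 1
    return level[0], levels, additions


n = 16
total, levels, additions = tree_sum(range(n))
print(total, levels, additions)  # 120 4 15
print(additions / levels)        # 3.75
```

For 16 inputs the tree uses 15 processors and takes 4 steps, so the speedup over the 15 sequential additions is 3.75 rather than the ideal 15.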

Number of Processors

The second most important criterion in evaluating a parallel algorithm is the number of processors it requires to solve a problem. It costs money to purchase, maintain, and run computers. When several processors are present, the problem of maintenance, in particular, is compounded, and the price paid to guarantee a high degree of reliability rises sharply. Therefore, the larger the number of processors an algorithm uses to solve a problem, the more expensive the solution becomes to obtain. For a problem of size $n$, the number of processors required by an algorithm, a function of $n$, will be denoted by $p(n)$. Sometimes the number of processors is a constant independent of $n$.

Example 1.16

In example 1.5, the size of the tree depends on $n$, the number of terms to be added, and $p(n) = n - 1$. On the other hand, in example 1.4, $N$, the number of processors on the shared-memory computer, is in no way related to $n$, the size of the file to be searched (except for the fact that $N \le n$). Nevertheless, given a value of $n$, it is possible to express $N$ in terms of $n$ as follows: $N = n^x$ where $0 < x < 1$. Thus $p(n) = n^x$.
