
1.3.1 Running Time

Since speeding up computations appears to be the main reason behind our interest in building parallel computers, the most important measure in evaluating a parallel algorithm is its running time. This is defined as the time taken by the algorithm to solve a problem on a parallel computer, that is, the time elapsed from the moment the algorithm starts to the moment it terminates. If the various processors do not all begin and end their computation simultaneously, then the running time is equal to the time elapsed between the moment the first processor to begin computing starts and the moment the last processor to end computing terminates.

1.3.1.1 Counting Steps. Before actually implementing an algorithm (whether sequential or parallel) on a computer, it is customary to conduct a theoretical analysis of the time it will require to solve the computational problem at hand. This is usually done by counting the number of basic operations, or steps, executed by the algorithm in the worst case. This yields an expression describing the number of such steps as a function of the input size. The definition of what constitutes a step varies, of course, from one theoretical model of computation to another. Intuitively, however, comparing, adding, or swapping two numbers are commonly accepted basic operations in most models. Indeed, each of these operations requires a constant number of time units, or cycles, on a typical (SISD) computer.

The running time of a parallel algorithm is usually obtained by counting two kinds of steps: computational steps and routing steps. A computational step is an arithmetic or logic operation performed on a datum within a processor. In a routing step, on the other hand, a datum travels from one processor to another via the shared memory or through the communication network. For a problem of size n, the parallel worst-case running time of an algorithm, a function of n, will be denoted by t(n).

Strictly speaking, the running time is also a function of the number of processors. Since the latter can always be expressed as a function of n, we shall write t as a function of the size of the input to avoid complicating our notation.
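As a concrete illustration of step counting, the comparisons performed by a simple sequential search can be tallied explicitly. The Python sketch below is ours, for exposition only; the convention that one comparison equals one step follows the text.

```python
def linear_search(a, target):
    """Sequential search that tallies its basic operations.

    Each comparison of a list element with the target counts as one
    step; returns (index or -1, number of steps taken).
    """
    steps = 0
    for i, x in enumerate(a):
        steps += 1              # one comparison = one basic step
        if x == target:
            return i, steps
    return -1, steps

# Worst case: the target is absent, so all n comparisons are performed.
# linear_search(list(range(8)), -1) -> (-1, 8)
```

Counting the worst case over all inputs of size n gives the expression t(n) = n for this algorithm.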

Example 1.8

In example 1.4 we studied a parallel algorithm that searches a file with n entries on an N-processor EREW SM SIMD computer. The algorithm requires log N parallel steps to broadcast the value to be searched for and n/N comparison steps within each processor. Assuming that each step (broadcast or comparison) requires one time unit, we say that the algorithm runs in log N + n/N time, that is, t(n) = log N + n/N.
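The step count in this example can be sketched as a small cost model. The Python below is illustrative only, not an actual EREW SM SIMD program; the function name and the recursive-doubling broadcast assumption are ours.

```python
import math

def parallel_search_steps(n, N):
    """Total parallel steps to search n entries on N processors.

    Assumes the target is broadcast by recursive doubling in
    ceil(log2 N) steps, after which each processor compares its
    share of ceil(n / N) entries, one comparison per step.
    """
    broadcast = math.ceil(math.log2(N)) if N > 1 else 0
    comparisons = math.ceil(n / N)
    return broadcast + comparisons

# n = 1024 entries on N = 16 processors: 4 broadcast + 64 comparison steps
```

With N = 1 the broadcast vanishes and the count degenerates to the n steps of sequential search, as one would expect.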

In general, computational steps and routing steps do not necessarily require the same number of time units. A routing step usually depends on the distance between the processors and typically takes a little longer to execute than a computational step.

1.3.1.2 Lower and Upper Bounds. Given a computational problem for which a new sequential algorithm has just been designed, it is common practice among algorithm designers to ask the following two questions:

(i) Is it the fastest possible algorithm for the problem?
(ii) If not, how does it compare with other existing algorithms for the same problem?

The answer to the first question is usually obtained by comparing the number of steps executed by the algorithm to a known lower bound on the number of steps required to solve the problem in the worst case.

Example 1.9

Say that we want to compute the product of two n × n matrices. Since the resulting matrix has n² entries, at least this many steps are needed by any matrix multiplication algorithm simply to produce the output.

Lower bounds, such as the one in example 1.9, are usually known as obvious or trivial lower bounds, as they are obtained by counting the number of steps needed during input and/or output. A more sophisticated lower bound is derived in the next example.

Example 1.10

The problem of sorting is defined as follows: A set of n numbers in random order is given; arrange the numbers in nondecreasing order. There are n! possible permutations of the input, and log n! (i.e., on the order of n log n) bits are needed to distinguish among them. Therefore, in the worst case, any algorithm for sorting requires on the order of n log n steps at least to recognize a particular output.
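This information-theoretic argument can be checked numerically. The Python sketch below (function name ours) compares log₂ n! with n log₂ n; by Stirling's approximation the two agree up to lower-order terms.

```python
import math

def info_lower_bound(n):
    """Bits needed to distinguish the n! possible input permutations."""
    return math.log2(math.factorial(n))

# log2(n!) is on the order of n log n (Stirling's approximation), so the
# ratio below approaches 1 from below as n grows.
ratios = [info_lower_bound(n) / (n * math.log2(n)) for n in (10, 100, 1000)]
```

Each comparison performed by a sorting algorithm yields at most one bit of information, which is why the bit count translates directly into a step count.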

If the number of steps an algorithm executes in the worst case is equal to (or of the same order as) the lower bound, then the algorithm is the fastest possible and is said to be optimal. Otherwise, a faster algorithm may have to be invented, or it may be possible to improve the lower bound. In any case, if the new algorithm is faster than all known algorithms for the problem, then we say that it has established a new upper bound on the number of steps required to solve that problem in the worst case. Question (ii) is therefore always settled by comparing the running time of the new algorithm with the existing upper bound for the problem (established by the fastest previously known algorithm).

Example 1.11

To date, no algorithm is known for multiplying two n × n matrices in n² steps. The standard textbook algorithm requires on the order of n³ operations. However, the upper bound on this problem is established at the time of this writing by an algorithm requiring on the order of nˣ operations at most, where x < 2.5. By contrast, several sorting algorithms exist that require on the order of n log n operations at most and are hence optimal.
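The standard textbook algorithm and its n³ operation count can be sketched as follows (a minimal Python illustration; the explicit multiplication counter is added for exposition).

```python
def matmul(A, B):
    """Standard textbook matrix multiplication with an operation count.

    For two n x n matrices this performs exactly n**3 scalar
    multiplications (and a matching number of additions), matching
    the order-n^3 bound discussed in the text.
    """
    n = len(A)
    mults = 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(n):
                s += A[i][k] * B[k][j]
                mults += 1
            C[i][j] = s
    return C, mults

C, mults = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# mults == 8 == 2**3; C == [[19, 22], [43, 50]]
```

The gap between this n³ count and the n² trivial lower bound of example 1.9 is exactly what the faster algorithms mentioned above narrow.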

In the preceding discussion, we used the phrase "on the order of" to express lower and upper bounds. We now introduce some notation for that purpose. Let f(n) and g(n) be functions from the positive integers to the positive reals:

(i) The function g(n) is said to be of order at least f(n), denoted Ω(f(n)), if there are positive constants c and n₀ such that g(n) ≥ cf(n) for all n ≥ n₀.
(ii) The function g(n) is said to be of order at most f(n), denoted O(f(n)), if there are positive constants c and n₀ such that g(n) ≤ cf(n) for all n ≥ n₀.

This notation allows us to concentrate on the dominating term in an expression describing a lower or upper bound and to ignore any multiplicative constants.
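The definition of O(f(n)) can be spot-checked numerically once candidate constants c and n₀ are chosen. The Python below is a sketch with hypothetical helper names; a finite check is evidence for the right constants, not a proof.

```python
def is_order_at_most(g, f, c, n0, n_max=10_000):
    """Spot-check the O(f(n)) definition: g(n) <= c*f(n) for n >= n0.

    Only tests n0 <= n <= n_max, so a True result suggests (but does
    not prove) that the constants c and n0 witness g(n) = O(f(n)).
    """
    return all(g(n) <= c * f(n) for n in range(n0, n_max + 1))

# 3n^2 + 5n is O(n^2): the constants c = 4 and n0 = 5 witness the
# definition, since 3n^2 + 5n <= 4n^2 exactly when n >= 5.
g = lambda n: 3 * n * n + 5 * n
f = lambda n: n * n
# is_order_at_most(g, f, c=4, n0=5) -> True
```

The same check with c = 3 fails for every n, illustrating that the multiplicative constant, not just the dominating term, must be chosen large enough.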


Example 1.12

For matrix multiplication, the lower bound is Ω(n²) and the upper bound O(nˣ), x < 2.5. For sorting, the lower bound is Ω(n log n) and the upper bound O(n log n).

Our treatment of lower and upper bounds in this section has so far concentrated on sequential algorithms. Clearly, the same general ideas also apply to parallel algorithms while taking two additional factors into consideration:

(i) the model of parallel computation used and
(ii) the number of processors involved.