
8.2 SOLVING SYSTEMS OF LINEAR EQUATIONS

Given an n x n matrix A and an n x 1 vector b, it is required to solve Ax = b for the unknown n x 1 vector x. When n = 4, for example, we have to solve the following system of linear equations for x_1, x_2, x_3, and x_4:

    a_11 x_1 + a_12 x_2 + a_13 x_3 + a_14 x_4 = b_1
    a_21 x_1 + a_22 x_2 + a_23 x_3 + a_24 x_4 = b_2
    a_31 x_1 + a_32 x_2 + a_33 x_3 + a_34 x_4 = b_3
    a_41 x_1 + a_42 x_2 + a_43 x_3 + a_44 x_4 = b_4.

8.2.1 An SIMD Algorithm

A well-known sequential algorithm for this problem is the Gauss-Jordan method. It consists of eliminating all unknowns but x_i from the ith equation, for i = 1, 2, ..., n. The solution is then obtained directly. A direct parallelization of the Gauss-Jordan method is now presented. It is designed to run on a CREW SM SIMD computer with n^2 + n processors that can be thought of as being arranged in an n x (n + 1) array. The algorithm is given as procedure SIMD GAUSS JORDAN. In it we denote b_i by a_i,n+1; in other words, the vector b is stored as the (n + 1)st column of A.

procedure SIMD GAUSS JORDAN (A, b, x)

Step 1: for j = 1 to n do
            for i = 1 to n do in parallel
                for k = j to n + 1 do in parallel
                    if (i ≠ j) then a_ik ← a_ik − (a_ij / a_jj) x a_jk
                    end if
                end for
            end for
        end for.

Step 2: for i = 1 to n do in parallel
            x_i ← a_i,n+1 / a_ii
        end for.

Note that the procedure allows concurrent-read operations, since more than one processor will need to read a_jj and a_jk simultaneously.
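To make the elimination step concrete, here is a minimal sequential Python sketch that simulates the procedure (Python and NumPy are an illustrative choice, not part of the original text; the function name simd_gauss_jordan and the sample system are assumptions). Each iteration of step 1 is simulated by computing all updates from a snapshot of the data, mimicking the lockstep reads of the SIMD processors.

    import numpy as np

    def simd_gauss_jordan(A, b):
        """Sequential simulation of procedure SIMD GAUSS JORDAN.

        The augmented matrix stores b as column n + 1 of A.  The two inner
        'do in parallel' loops are simulated by applying every update of
        iteration j to a snapshot of the matrix, as the lockstep SIMD
        processors would.
        """
        n = len(b)
        M = np.hstack([np.array(A, dtype=float),
                       np.array(b, dtype=float).reshape(n, 1)])
        for j in range(n):                      # step 1: n sequential iterations
            snapshot = M.copy()                 # all processors read old values
            for i in range(n):                  # done in parallel on the machine
                if i != j:
                    M[i, j:] = (snapshot[i, j:]
                                - (snapshot[i, j] / snapshot[j, j]) * snapshot[j, j:])
        return M[:, n] / np.diag(M)             # step 2: x_i = a_i,n+1 / a_ii

    if __name__ == "__main__":
        A = [[2.0, 1.0], [1.0, 3.0]]            # illustrative 2 x 2 system
        b = [3.0, 4.0]
        print(simd_gauss_jordan(A, b))          # agrees with np.linalg.solve(A, b)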


Analysis. Step 1 consists of n constant time iterations, while step 2 takes constant time. Thus t(n) = O(n). Since p(n) = n^2 + n = O(n^2), the cost is c(n) = p(n) x t(n) = O(n^3). Although this cost matches the number of steps required by a sequential implementation of the Gauss-Jordan algorithm, it is not optimal. To see this, note that the system Ax = b can be solved by first computing the inverse A^-1 of A and then obtaining x from x = A^-1 b.

The inverse of A can be computed as follows. We begin by writing

    A = | A_11  A_12 |
        | A_21  A_22 |

where the A_pq are (n/2) x (n/2) submatrices of A, and B = A_22 − A_21 A_11^-1 A_12. The matrices I and 0 are the (n/2) x (n/2) identity matrix (whose main diagonal elements are 1 and all the rest are zeros) and zero matrix (all of whose elements are zero), respectively. The inverse of A is then given by the matrix product

    A^-1 = | I   −A_11^-1 A_12 |   | A_11^-1    0   |   | I               0 |
           | 0         I       | x |    0     B^-1  | x | −A_21 A_11^-1   I |

where A_11^-1 and B^-1 are computed by applying the same process recursively. This requires two inversions, six multiplications, and two additions of (n/2) x (n/2) matrices. Denoting the time required to invert, multiply, and add two m x m matrices by inv(m), mult(m), and add(m), respectively, we get

    inv(n) = 2 inv(n/2) + 6 mult(n/2) + 2 add(n/2).

Since add(n/2) = n^2/4 and mult(n) = O(n^x), where 2 ≤ x ≤ 2.5 (as pointed out in example 1.1), we get inv(n) = O(n^x). Thus, in sequential computation the time required to compute the inverse of an n x n matrix matches, up to a constant multiplicative factor, the time required to multiply two n x n matrices. Furthermore, multiplying A^-1 by b can be done in O(n^2) steps. The overall running time of this sequential solution of Ax = b is therefore O(n^x), 2 ≤ x ≤ 2.5.
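As a concrete illustration of the recursive scheme, the following Python sketch (NumPy assumed; the function name block_inverse and the sample matrix are illustrative choices, not part of the original text) performs the two inversions, six multiplications, and two additions described above. It assumes the order of A is a power of 2 and that A_11 and B are nonsingular at every level of the recursion.

    import numpy as np

    def block_inverse(A):
        """Invert A recursively by 2 x 2 block partitioning, as in the
        scheme above: two recursive inversions, six multiplications, and
        two additions of half-size matrices."""
        n = A.shape[0]
        if n == 1:
            return np.array([[1.0 / A[0, 0]]])
        m = n // 2
        A11, A12 = A[:m, :m], A[:m, m:]
        A21, A22 = A[m:, :m], A[m:, m:]

        A11_inv = block_inverse(A11)          # first recursive inversion
        P1 = A11_inv @ A12                    # multiplication 1
        P2 = A21 @ A11_inv                    # multiplication 2
        B = A22 - A21 @ P1                    # multiplication 3, addition 1
        B_inv = block_inverse(B)              # second recursive inversion
        P4 = B_inv @ P2                       # multiplication 4
        P5 = P1 @ B_inv                       # multiplication 5
        top_left = A11_inv + P5 @ P2          # multiplication 6, addition 2

        return np.block([[top_left, -P5],
                         [-P4,      B_inv]])

    if __name__ == "__main__":
        A = np.array([[4.0, 1.0, 2.0, 0.5],
                      [1.0, 3.0, 0.0, 1.0],
                      [2.0, 0.0, 5.0, 1.0],
                      [0.5, 1.0, 1.0, 2.0]])
        b = np.array([1.0, 2.0, 3.0, 4.0])
        x = block_inverse(A) @ b              # solve Ax = b via x = A^-1 b
        print(np.allclose(A @ x, b))          # True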

Example 8.1 Let us apply procedure SIMD GAUSS JORDAN to the system

In the first iteration of step 1, j = 1 and the following values are computed in parallel:

In the second iteration of step 1, j = 2 and the following values are computed in parallel:

In step 2, the answer is obtained as x_1 = … and x_2 = ….

8.2.2 An MIMD Algorithm

A different sequential algorithm for solving the set of equations Ax = b is the Gauss-Seidel method. We begin by writing

    A = E + D + F,

where E, D, and F are n x n matrices whose elements e_ij, d_ij, and f_ij, respectively, are given by

    e_ij = a_ij if i > j, and 0 otherwise,
    d_ij = a_ij if i = j, and 0 otherwise,
    f_ij = a_ij if i < j, and 0 otherwise.

Thus (E + D + F)x = b and Dx = b − Ex − Fx. For n = 3, say, we have

    a_11 x_1 = b_1 − a_12 x_2 − a_13 x_3
    a_22 x_2 = b_2 − a_21 x_1 − a_23 x_3
    a_33 x_3 = b_3 − a_31 x_1 − a_32 x_2.
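As a small illustration of this splitting, the following Python sketch (NumPy assumed; the function name split_EDF is an illustrative choice, not part of the original text) builds E, D, and F from a given matrix A:

    import numpy as np

    def split_EDF(A):
        """Split A into E (strictly lower triangular), D (diagonal), and
        F (strictly upper triangular), so that A = E + D + F."""
        A = np.asarray(A, dtype=float)
        E = np.tril(A, k=-1)        # elements below the main diagonal
        D = np.diag(np.diag(A))     # main diagonal only
        F = np.triu(A, k=1)         # elements above the main diagonal
        assert np.allclose(E + D + F, A)
        return E, D, F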

Starting with a vector x^0 (an arbitrary initial estimate of x), the solution vector is obtained through an iterative process in which the kth iteration is given by

    D x^k = b − E x^k − F x^(k-1).

In other words, during the kth iteration the current estimates of the unknowns are substituted into the right-hand sides of the equations to produce new estimates. Again taking n = 4, for k = 1 we get

    a_11 x_1^1 = b_1 − a_12 x_2^0 − a_13 x_3^0 − a_14 x_4^0
    a_22 x_2^1 = b_2 − a_21 x_1^1 − a_23 x_3^0 − a_24 x_4^0
    a_33 x_3^1 = b_3 − a_31 x_1^1 − a_32 x_2^1 − a_34 x_4^0
    a_44 x_4^1 = b_4 − a_41 x_1^1 − a_42 x_2^1 − a_43 x_3^1.
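A minimal sequential sketch of this iteration in Python (NumPy assumed; the function name gauss_seidel, the sweep cap max_iter, and the parameter tol, which plays the role of the tolerance c introduced below, are illustrative assumptions) might look as follows:

    import numpy as np

    def gauss_seidel(A, b, x0, tol=0.02, max_iter=100):
        """Sequential Gauss-Seidel: within each sweep, the newest available
        estimates are substituted into the right-hand sides as soon as they
        are computed."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        n = len(b)
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            x_prev = x.copy()
            for i in range(n):
                # sum over j != i, using already-updated components for j < i
                s = sum(A[i, j] * x[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i, i]
            if np.max(np.abs(x - x_prev)) < tol:
                break
        return x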

The method is said to converge if, for some k,

    abs(x_i^k − x_i^(k-1)) < c   for i = 1, 2, ..., n,

where abs denotes the absolute value function and c is a prespecified error tolerance. The algorithm does not appear to be easily adaptable to an SIMD computer. Given N processors, we may assign each processor the job of computing the new iterates for n/N components of the vector x. At the end of each iteration, all processors must be synchronized before starting the next iteration. The cost of this synchronization may be high because of the following:

(i) The component x_i^k cannot be computed until x_j^k is available for all j < i; this forces the processor computing x_i^k to wait for those computing x_j^k, j < i, and in turn forces all processors to wait for the one computing x_1^k.

(ii) Some components may be updated faster than others, depending on the values involved in their computation (some of which may be zero, say).

Typically, this would lead to an algorithm that is not significantly faster than its sequential counterpart. There are two ways to remedy this situation:

1. The most recently available values of the other components are used to compute x_i^k; there is no need to wait for x_j^k, j < i.

2. No synchronization is imposed on the behavior of the processors.

Both of these changes are incorporated in an algorithm designed to run on a CREW SM MIMD computer with N processors, where N ≤ n. The algorithm creates n processes, each of which is in charge of computing one of the components of x. These processes are executed by the N processors in an asynchronous fashion, as described in chapter 1. The algorithm is given in what follows as procedure MIMD MODIFIED GS. In it, x_i^0, old_i, and new_i denote the initial value, the previous value, and the current value of component x_i, respectively. As mentioned earlier, c is the desired accuracy. Also note that the procedure allows concurrent-read operations since more than one process may need new_i simultaneously.

procedure MIMD MODIFIED GS (A, x, b, c)

Step 1: for i = 1 to n do
            (1.1) old_i ← x_i^0
            (1.2) new_i ← x_i^0
            (1.3) create process i
        end for.

Step 2: Process i
            repeat
                (2.1) old_i ← new_i
                (2.2) new_i ← (b_i − Σ_{j≠i} a_ij x new_j) / a_ii
            until abs(new_i − old_i) < c
            x_i ← new_i.

Note that step 2 describes one of the n identical processes created in step 1.
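The following Python sketch mimics the behavior of the procedure using one thread per component (Python's threading module stands in for the N asynchronous processors; the function name mimd_modified_gs, the safety cap on the number of sweeps, and the sample system are illustrative assumptions, not the book's code). No locking is used here, so it also exhibits the read/update hazard addressed in the discussion that follows.

    import threading
    import numpy as np

    def mimd_modified_gs(A, b, x0, c=0.02, max_sweeps=1000):
        """Asynchronous sketch of the modified Gauss-Seidel scheme: one thread
        per component of x, each repeatedly recomputing its component from the
        most recently available values of the others, stopping once its own
        change falls below c.  The max_sweeps cap is a safety guard for this
        sketch only; the procedure itself has no such bound."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        n = len(b)
        new = np.array(x0, dtype=float)           # shared current estimates (new_i)

        def process(i):
            for _ in range(max_sweeps):
                old_i = new[i]                    # (2.1) old_i <- new_i
                s = sum(A[i, j] * new[j] for j in range(n) if j != i)
                new[i] = (b[i] - s) / A[i, i]     # (2.2) update from latest values
                if abs(new[i] - old_i) < c:       # until abs(new_i - old_i) < c
                    break

        threads = [threading.Thread(target=process, args=(i,)) for i in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return new                                # x_i <- new_i

    if __name__ == "__main__":
        A = [[4.0, 1.0], [2.0, 5.0]]              # diagonally dominant, so it converges
        b = [9.0, 13.0]
        print(mimd_modified_gs(A, b, x0=[0.0, 0.0]))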


Discussion. In an actual implementation of the preceding procedure, care must be taken to prevent a process from reading a variable while another process is updating it, as this would most likely result in the first process reading an incorrect value. There are many ways to deal with this problem. One approach uses special variables called semaphores. For each shared variable new_i there is a corresponding semaphore s_i whose value is set as follows: s_i = 0 if new_i is free, and s_i = 1 if new_i is currently being updated. When a process needs to read new_i, it first tests s_i. If s_i = 0, then the process reads new_i; otherwise it waits for it to be available. When a process needs to update new_i, it first sets s_i to 1 and then proceeds to update new_i.

As pointed out in chapter 1, MIMD algorithms in general are extremely difficult to analyze theoretically due to their asynchronous nature. In the case of procedure MIMD MODIFIED GS the analysis is further complicated by the use of semaphores and, more importantly, by the uncertainty regarding the number of iterations required for convergence. An accurate evaluation of the procedure's behavior is best obtained empirically.
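A minimal sketch of this idea in Python, using one lock per shared variable in place of the semaphore (threading.Lock and the helper names read_new and update_new are illustrative assumptions, not part of the original text):

    import threading

    n = 4                                         # illustrative problem size
    new = [0.0] * n                               # shared current estimates
    locks = [threading.Lock() for _ in range(n)]  # one lock per new_i, playing the role of s_i

    def read_new(j):
        """Wait until new_j is free (s_j = 0), then read it."""
        with locks[j]:
            return new[j]

    def update_new(i, value):
        """Set s_i to 1 (acquire the lock), update new_i, then release."""
        with locks[i]:
            new[i] = value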

Example 8.2 Consider the system of example 8.1 and assume that two processors are available on a CREW SM MIMD computer. Take x_1^0 = …, x_2^0 = …, and c = 0.02. Process 1 sets old_1 = x_1^0 and computes new_1 = …. Simultaneously, process 2 sets old_2 = x_2^0 and computes new_2 = …. The computation then proceeds in the same fashion; once abs(new_i − old_i) < 0.02 for each process i, the procedure terminates.
