8.2 SOLVING SYSTEMS OF LINEAR EQUATIONS
Given an n x n matrix A and an n x 1 vector b, it is required to solve Ax = b for the unknown n x 1 vector x. When n = 4, for example, we have to solve the following system of linear equations for x_1, x_2, x_3, and x_4:

a_11 x_1 + a_12 x_2 + a_13 x_3 + a_14 x_4 = b_1,
a_21 x_1 + a_22 x_2 + a_23 x_3 + a_24 x_4 = b_2,
a_31 x_1 + a_32 x_2 + a_33 x_3 + a_34 x_4 = b_3,
a_41 x_1 + a_42 x_2 + a_43 x_3 + a_44 x_4 = b_4.
8.2.1 An SIMD Algorithm
A well-known sequential algorithm for this problem is the Gauss-Jordan method. It consists in eliminating all unknowns but x_i from the ith equation, for i = 1, 2, ..., n. The solution is then obtained directly. A direct parallelization of the Gauss-Jordan method is now presented. It is designed to run on a CREW SM SIMD computer with n^2 + n processors that can be thought of as being arranged in an n x (n + 1) array. The algorithm is given as procedure SIMD GAUSS JORDAN. In it we denote b_i by a_{i,n+1}.
procedure SIMD GAUSS JORDAN (A, b, x)
Step 1: for j = 1 to n do
          for i = 1 to n do in parallel
            for k = j to n + 1 do in parallel
              if (i ≠ j) then a_ik ← a_ik - (a_ij / a_jj) x a_jk
              end if
            end for
          end for
        end for.
Step 2: for i = 1 to n do in parallel
          x_i ← a_{i,n+1} / a_ii
        end for.
Note that the procedure allows concurrent-read operations since more than one processor will need to read a_jj and a_jk simultaneously.
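The logic of the procedure can be checked with a small sequential simulation. The sketch below (the function name and the Python setting are ours, not part of the text) runs the two "in parallel" loops serially, which preserves the result because the pivot row j is never modified in step j:

```python
# Sequential simulation of procedure SIMD GAUSS JORDAN.
# The augmented n x (n+1) array stores b in column n+1 (a_{i,n+1} = b_i),
# as in the text; the two inner "in parallel" loops are run serially here.
def simd_gauss_jordan(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augment A with b
    for j in range(n):                    # step 1: n sequential iterations
        for i in range(n):                # "for i = 1 to n do in parallel"
            if i != j:
                f = M[i][j] / M[j][j]     # a_ij / a_jj, read before updating
                for k in range(j, n + 1): # "for k = j to n+1 do in parallel"
                    M[i][k] -= f * M[j][k]
    return [M[i][n] / M[i][i] for i in range(n)]  # step 2: x_i = a_{i,n+1}/a_ii
```

For instance, simd_gauss_jordan([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]) solves 2x_1 + x_2 = 3, x_1 + 3x_2 = 5, returning approximately [0.8, 1.4].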
202 Numerical Problems Chap. 8
Analysis. Step 1 consists of n constant-time iterations, while step 2 takes constant time. Thus t(n) = O(n). Since p(n) = n^2 + n = O(n^2), c(n) = p(n) x t(n) = O(n^3). Although this cost matches the number of steps required by a sequential implementation of the Gauss-Jordan algorithm, it is not optimal. To see this, note that the system Ax = b can be solved by first computing the inverse A^{-1} of A and then obtaining x from x = A^{-1}b.
The inverse of A can be computed as follows. We begin by writing

A = | A_11  A_12 |
    | A_21  A_22 |,

where the A_ij are (n/2) x (n/2) submatrices of A, and B = A_22 - A_21 A_11^{-1} A_12. The matrices I and O are the identity matrix (whose main diagonal elements are 1 and all the rest are zeros) and the zero matrix (all of whose elements are zero), respectively. The inverse of A is then given by the matrix product

A^{-1} = | I  -A_11^{-1} A_12 |   | A_11^{-1}  O      |   | I                O |
         | O   I              | x | O          B^{-1} | x | -A_21 A_11^{-1}  I |,

where A_11^{-1} and B^{-1} are computed by applying the same process recursively. This requires two inversions, six multiplications, and two additions of (n/2) x (n/2) matrices. Denoting the time required by these operations by the functions inv, mult, and add, respectively, we get

inv(n) = 2 inv(n/2) + 6 mult(n/2) + 2 add(n/2).

Since add(n/2) = n^2/4 and mult(n) = O(n^x), where 2 ≤ x ≤ 2.5 (as pointed out in example 1.1), we get inv(n) = O(n^x). Thus, in sequential computation the time required to compute the inverse of an n x n matrix matches, up to a constant multiplicative factor, the time required to multiply two n x n matrices. Furthermore, multiplying A^{-1} by b can be done in O(n^2) steps. The overall running time of this sequential solution of Ax = b is therefore O(n^x), 2 ≤ x ≤ 2.5.
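The recursive inversion scheme can be sketched as follows. This is a plain-Python illustration under the assumptions that n is a power of two and that A_11 and B are invertible (in general, pivoting would be needed); the naive O(n^3) mat_mul helper stands in for the fast multiplication routine from which the O(n^x) bound would come. All names are ours:

```python
# Recursive block inversion of an n x n matrix (n a power of two):
# two recursive inversions (of A11 and B), six multiplications, and two
# additions, matching inv(n) = 2 inv(n/2) + 6 mult(n/2) + 2 add(n/2).
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def invert(A):
    n = len(A)
    if n == 1:
        return [[1.0 / A[0][0]]]
    h = n // 2
    A11 = [row[:h] for row in A[:h]];  A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]];  A22 = [row[h:] for row in A[h:]]
    A11i = invert(A11)                  # first recursive inversion
    L = mat_mul(A21, A11i)              # A21 * A11^{-1}
    B = mat_sub(A22, mat_mul(L, A12))   # B = A22 - A21 A11^{-1} A12
    Bi = invert(B)                      # second recursive inversion
    T = mat_mul(A11i, A12)              # A11^{-1} * A12
    TBi = mat_mul(T, Bi)
    # Assemble the four blocks of A^{-1} from the three-factor product.
    top = [a + b for a, b in zip(mat_add(A11i, mat_mul(TBi, L)),
                                 [[-v for v in r] for r in TBi])]
    bot = [a + b for a, b in zip([[-v for v in r] for r in mat_mul(Bi, L)],
                                 Bi)]
    return top + bot
```

Multiplying out the three factors gives the four blocks assembled above: A_11^{-1} + A_11^{-1}A_12 B^{-1} A_21 A_11^{-1}, -A_11^{-1}A_12 B^{-1}, -B^{-1}A_21 A_11^{-1}, and B^{-1}.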
Example 8.1
Let us apply procedure SIMD GAUSS JORDAN to the system

In the first iteration of step 1, j = 1, and the following values are computed in parallel:

In the second iteration of step 1, j = 2, and the following values are computed in parallel:

In step 2, the answer is obtained as x_1 = a_13/a_11 and x_2 = a_23/a_22, where the a_ij are the values held at the end of step 1.
8.2.2 An MIMD Algorithm
A different sequential algorithm for solving the set of equations Ax = b is the Gauss-Seidel method. We begin by writing

A = E + D + F,

where E, D, and F are n x n matrices whose elements e_ij, d_ij, and f_ij, respectively, are given by

e_ij = a_ij if i > j, and 0 otherwise,
d_ij = a_ij if i = j, and 0 otherwise,
f_ij = a_ij if i < j, and 0 otherwise.

Thus (E + D + F)x = b and Dx = b - Ex - Fx. For n = 3, say, we have

| a_11  0     0    | | x_1 |   | b_1 |   | 0     0     0 | | x_1 |   | 0  a_12  a_13 | | x_1 |
| 0     a_22  0    | | x_2 | = | b_2 | - | a_21  0     0 | | x_2 | - | 0  0     a_23 | | x_2 |
| 0     0     a_33 | | x_3 |   | b_3 |   | a_31  a_32  0 | | x_3 |   | 0  0     0    | | x_3 |
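As a quick illustration, the decomposition A = E + D + F can be computed with a small helper (the function name split_edf is ours, not from the text):

```python
# Split A into strictly lower (E), diagonal (D), and strictly upper (F)
# parts, so that A = E + D + F elementwise.
def split_edf(A):
    n = len(A)
    E = [[A[i][j] if i > j else 0 for j in range(n)] for i in range(n)]
    D = [[A[i][j] if i == j else 0 for j in range(n)] for i in range(n)]
    F = [[A[i][j] if i < j else 0 for j in range(n)] for i in range(n)]
    return E, D, F
```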
Starting with a vector x^0 (an arbitrary initial estimate of x), the solution vector is obtained through an iterative process where the kth iteration is given by

Dx^k = b - Ex^k - Fx^{k-1}.

In other words, during the kth iteration the current estimates of the unknowns are substituted in the right-hand sides of the equations to produce new estimates. Again for n = 4 and k = 1, we get

x_1^1 = (b_1 - a_12 x_2^0 - a_13 x_3^0 - a_14 x_4^0) / a_11,
x_2^1 = (b_2 - a_21 x_1^1 - a_23 x_3^0 - a_24 x_4^0) / a_22,
x_3^1 = (b_3 - a_31 x_1^1 - a_32 x_2^1 - a_34 x_4^0) / a_33,
x_4^1 = (b_4 - a_41 x_1^1 - a_42 x_2^1 - a_43 x_3^1) / a_44.
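A sequential sketch of this iteration follows (the function name and the iteration cap are ours; convergence itself requires conditions on A, e.g., diagonal dominance):

```python
# Synchronous Gauss-Seidel: within iteration k, component i immediately
# uses the already-updated values x_1^k, ..., x_{i-1}^k of the same
# iteration; c is the error tolerance from the convergence test.
def gauss_seidel(A, b, x0, c, max_iter=1000):
    n = len(A)
    x = x0[:]
    for _ in range(max_iter):
        prev = x[:]
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]   # x[j] is already new for j < i
        if all(abs(x[i] - prev[i]) < c for i in range(n)):
            break
    return x
```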
The method is said to converge if, for some k,

abs(x_i^k - x_i^{k-1}) < c for all i,

where abs denotes the absolute value function and c is a prespecified error tolerance. The algorithm does not appear to be easily adaptable for an SIMD computer. Given N processors, we may assign each processor the job of computing the new
iterates for n/N components of the vector x. At the end of each iteration, all processors must be synchronized before starting the next iteration. The cost of this synchronization may be high because of the following:
(i) The component x_i^k cannot be computed until x_j^k is available for all j < i; this forces the processor computing x_i^k to wait for those computing x_j^k, j < i, and then forces all processors to wait for the one computing x_n^k.
(ii) It may be possible to update some components faster than others, depending on the values involved in their computation (some of which may be zero, say).

Typically, this would lead to an algorithm that is not significantly faster than its sequential counterpart. There are two ways to remedy this situation:
1. The most recently available values of the x_j are used to compute x_i (i.e., there is no need to wait for x_j^k, j < i).
2. No synchronization is imposed on the behavior of the processors.

Both of these changes are incorporated in an algorithm designed to run on a CREW SM MIMD computer with N processors, where N ≤ n. The algorithm creates n processes, each of which is in charge of computing one of the components of x. These processes are executed by the N processors in an asynchronous fashion, as described in chapter 1. The algorithm is given in what follows as procedure MIMD MODIFIED GS. In it, x_i, old_i, and new_i denote the initial value, the previous value, and the current value of component i of x, respectively. As mentioned earlier, c is the desired accuracy. Also note that the procedure allows concurrent-read operations since more than one process may need new_i simultaneously.
procedure MIMD MODIFIED GS (A, x, b, c)
Step 1: for i = 1 to n do
          (1.1) old_i ← x_i
          (1.2) new_i ← x_i
          (1.3) create process i
        end for.
Step 2: Process i
        repeat
          (2.1) old_i ← new_i
          (2.2) new_i ← (b_i - Σ_{j≠i} a_ij x new_j) / a_ii
        until abs(new_i - old_i) < c
        x_i ← new_i.

Note that step 2 states one of the n identical processes created in step 1.
Discussion. In an actual implementation of the preceding procedure, care must be taken to prevent a process from reading a variable while another process is updating it, as this would most likely result in the first process reading an incorrect value. There are many ways to deal with this problem. One approach uses special variables called semaphores. For each shared variable new_i there is a corresponding semaphore s_i whose value is set as follows: s_i = 0 if new_i is free, and s_i = 1 if new_i is currently being updated. When a process needs to read new_i, it first tests s_i. If s_i = 0, then the process reads new_i; otherwise it waits for it to be available. When a process needs to update new_i, it first sets s_i to 1 and then proceeds to update new_i; when the update is complete, it resets s_i to 0. As pointed out in chapter
1, MIMD algorithms in general are extremely difficult to analyze theoretically due to their asynchronous nature. In the case of procedure
MIMD MODIFIED GS the analysis is further complicated by the use of semaphores and, more importantly, by the uncertainty regarding the number of iterations required for convergence. An accurate evaluation of the procedure's behavior is best obtained empirically.
Example 8.2
Consider the system of example 8.1 and assume that two processors are available on a CREW SM MIMD computer. Take initial values x_1 and x_2, and c = 0.02. Process 1 sets old_1 = x_1 and computes

new_1 = (b_1 - a_12 x_2) / a_11.

Simultaneously, process 2 sets old_2 = x_2 and computes

new_2 = (b_2 - a_21 x_1) / a_22.

The computation then proceeds in the same fashion, each process repeatedly updating its component using the most recent value produced by the other. Once abs(new_i - old_i) < 0.02 for each process i, the procedure terminates.