1.2.4 MIMD Computers

This class of computers is the most general and most powerful in our paradigm of parallel computation that classifies parallel computers according to whether the instruction and/or the data streams are duplicated. Here we have N processors, N streams of instructions, and N streams of data, as shown in Fig. 1.12. The processors here are of the type used in MISD computers in the sense that each possesses its own control unit in addition to its local memory and arithmetic and logic unit. This makes these processors more powerful than the ones used for SIMD computers. Each processor operates under the control of an instruction stream issued by its control unit. Thus the processors are potentially all executing different programs on different data while solving different subproblems of a single problem. This means that the processors typically operate asynchronously. As with SIMD computers, communication between processors is performed through a shared memory or an interconnection network. MIMD computers sharing a common memory are often referred to as multiprocessors (or tightly coupled machines), while those with an interconnection network are known as multicomputers (or loosely coupled machines).

Since the processors on a multiprocessor computer share a common memory, the discussion in section 1.2.3.1 regarding the various modes of concurrent memory access applies here as well. Indeed, two or more processors executing an asynchronous algorithm may, by accident or by design, wish to gain access to the same memory location. We can therefore talk of EREW, CREW, ERCW, and CRCW SM MIMD computers and algorithms, and various methods should be established for resolving memory access conflicts in models that disallow them.
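To make such a conflict concrete, here is a minimal sketch (our illustration, not part of the text) of how exclusive access to a shared location can be enforced on a shared-memory machine. Threads stand in for the asynchronous processors, and a lock serializes the conflicting writes, as a model disallowing concurrent access would require.

    import threading

    shared = {"x": 0}        # the contested memory location
    lock = threading.Lock()  # resolves access conflicts by serializing them

    def processor(increments):
        for _ in range(increments):
            with lock:       # exclusive access: one writer at a time
                shared["x"] += 1

    threads = [threading.Thread(target=processor, args=(1000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared["x"])       # always 4000; without the lock, updates could be lost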

[Figure 1.12 MIMD computer: processors 1 through N, each receiving its own instruction stream (streams 1 through N) and operating on its own data stream (streams 1 through N), communicating through a shared memory or an interconnection network.]

Multicomputers are sometimes referred to as distributed systems. The distinction is usually based on the physical distance separating the processors and is therefore often subjective. A rule of thumb is the following: If all the processors are in close proximity to one another (they are all in the same room, say), then they form a multicomputer; otherwise (they are in different cities, say) they form a distributed system. The nomenclature is relevant only when it comes to evaluating parallel algorithms. Because processors in a distributed system are so far apart, the number of data exchanges among them is significantly more important than the number of computational steps performed by any of them.

The following example examines an application where the great flexibility of MIMD computers is exploited.

Example 1.6

Computer programs that play games of strategy, such as chess, do so by generating and searching so-called game trees. The root of the tree is the current game configuration or position from which the program is to make a move. Children of the root represent all the positions reached through one move by the program. Nodes at the next level represent all positions reached through the opponent's reply. This continues up to some predefined number of levels. Each leaf position is now assigned a value representing its "goodness" from the program's point of view. The program then determines the path leading to the best position it can reach assuming that the opponent plays a perfect game. Finally, the original move on this path (that is, an edge leaving the root) is selected for the program.

As there are typically several moves per position, game trees tend to be very large. In order to cut down on the search time, these trees are generated as they are searched. The idea is to explore the tree using the depth-first search method. From the given root position, paths are created and examined one by one. First, a complete path is built from the root to a leaf. The next path is obtained by backing up from the current leaf to a position not all of whose descendants have yet been explored and building a new path. During the generation of such a path it may happen that a position is reached that, based on information collected so far, definitely leads to leaves that are no better than the ones already examined. In this case the program interrupts its search along that path, and all descendants of that position are ignored. A cutoff is said to have occurred. Search can now resume along a new path.
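The depth-first search with cutoffs just described is, in essence, the classical alpha-beta pruning scheme. The following is a minimal sequential sketch, in which moves(position) and evaluate(position) are hypothetical stand-ins for a real game's move generator and leaf-evaluation function.

    # moves() and evaluate() are hypothetical helpers: the game's move
    # generator and the leaf "goodness" function from the example.
    def alphabeta(position, depth, alpha, beta, maximizing):
        if depth == 0 or not moves(position):
            return evaluate(position)           # value of a leaf position
        if maximizing:                          # the program's move
            for child in moves(position):
                alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta, False))
                if alpha >= beta:
                    break                       # cutoff: remaining children ignored
            return alpha
        else:                                   # the opponent's (perfect) reply
            for child in moves(position):
                beta = min(beta, alphabeta(child, depth - 1, alpha, beta, True))
                if alpha >= beta:
                    break                       # cutoff
            return beta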

So far we have described the search procedure as it would be executed sequentially. One way to implement it on an MIMD computer would be to distribute the subtrees of the root among the processors and let as many subtrees as possible be explored in parallel. During the search the processors may exchange various pieces of information. For example, one processor may obtain from another the best move found so far: This may lead to further cutoffs. Another datum that may be communicated is whether a processor has finished searching its subtree. If there is a subtree that is still under consideration, then an idle processor may be assigned the job of searching part of that subtree.
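A rough sketch of this MIMD strategy follows; it is our illustration, not a prescription from the text. The subtrees of the root are handed to a pool of workers and searched in parallel, reusing the hypothetical moves and alphabeta helpers from the previous sketch. Threads stand in for processors here, and a fuller version would also let workers exchange their best values found so far to trigger cutoffs across subtrees.

    from concurrent.futures import ThreadPoolExecutor
    from math import inf

    def parallel_search(root, depth, n_processors=4):
        children = moves(root)                 # one subtree per root move
        with ThreadPoolExecutor(max_workers=n_processors) as pool:
            scores = list(pool.map(
                lambda child: alphabeta(child, depth - 1, -inf, inf, False),
                children))
        # Select the root move leading to the best reachable position.
        best_index = max(range(len(children)), key=lambda i: scores[i])
        return children[best_index]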

This approach clearly does not lend itself to implementation on an SIMD computer, as the sequence of operations involved in the search is not predictable in advance. At any given point, the instruction being executed varies from one processor to another: While one processor may be generating a new position, a second may be evaluating a leaf, a third may be executing a cutoff, a fourth may be backing up to start a new path, a fifth may be communicating its best move, a sixth may be signaling the end of its search, and so on.

1.2.4.1 Programming MIMD Computers. As mentioned earlier, the MIMD model of parallel computation is the most general and powerful possible. Computers in this class are used to solve in parallel those problems that lack the regular structure required by the SIMD model. This generality does not come for free: Asynchronous algorithms are difficult to design, evaluate, and implement. In order to appreciate the complexity involved in programming MIMD computers, it is important to distinguish between the notion of a process and that of a processor. An asynchronous algorithm is a collection of processes, some or all of which are executed simultaneously on a number of available processors. Initially, all processors are free. The parallel algorithm starts its execution on an arbitrarily chosen processor. Shortly thereafter it creates a number of computational tasks, or processes, to be performed. A process thus corresponds to a section of the algorithm: There may be several processes associated with the same algorithm section, each with a different parameter.

Once a process is created, it must be executed on a processor. If a free processor is available, the process is assigned to the processor, which performs the computations specified by the process. Otherwise (if no free processor is available), the process is queued and waits for a processor to be free.

When a processor completes execution of a process, it becomes free. If a process is waiting to be executed, then it can be assigned to the processor just freed. Otherwise (if no process is waiting), the processor is queued and waits for a process to be created.

The order in which processes are executed by processors can obey any policy that assigns priorities to processes. For example, processes can be executed in a first-in-first-out or in a last-in-first-out order. Also, the availability of a processor is sometimes not sufficient for the processor to be assigned a waiting process. An additional condition may have to be satisfied before the process starts. Similarly, if a processor has already been assigned a process and an unsatisfied condition is encountered during execution, then the processor is freed. When the condition for resumption of that process is later satisfied, a processor (not necessarily the original one) is assigned to it. These are but a few of the scheduling problems that characterize the programming of multiprocessors. Finding efficient solutions to these problems is of paramount importance if MIMD computers are to be considered useful. Note that none of these scheduling problems arise on the less flexible but easier-to-program SIMD computers.
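As a small illustration of the process/processor distinction (again a sketch, not a full scheduler), the following models processors as threads that repeatedly take the next waiting process from a first-in-first-out queue, execute it, and become free again when it completes.

    import queue
    import threading

    process_queue = queue.Queue()          # waiting processes, first-in-first-out

    def processor_loop():
        while True:
            process = process_queue.get()  # wait for a process to be created
            if process is None:            # sentinel: no more work
                break
            process()                      # execute it, then become free again

    def make_process(i):
        return lambda: print("process", i, "done")

    processors = [threading.Thread(target=processor_loop) for _ in range(3)]
    for p in processors:
        p.start()
    for i in range(10):                    # create ten processes
        process_queue.put(make_process(i))
    for _ in processors:                   # one sentinel per processor
        process_queue.put(None)
    for p in processors:
        p.join()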

1.2.4.2 Special-Purpose Architectures. In theory, any parallel algorithm can be executed efficiently on the MIMD model. The latter can therefore be used to build parallel computers with a wide variety of applications. Such computers are said to have a general-purpose architecture. In practice, by contrast, it is quite sensible in many applications to assemble several processors in a configuration specifically designed for the problem at hand. The result is a parallel computer well suited for solving that problem very quickly but that cannot in general be used for any other purpose. Such a computer is said to have a special-purpose architecture. With a particular problem in mind, there are several ways to design a special-purpose parallel computer. For example, a collection of specialized or very simple processors may be used in one of the standard networks such as the mesh. Alternatively, one may interconnect a number of standard processors in a custom geometry. These two approaches may also be combined.

Example 1.7

Black-and-white pictures are stored in computers in the form of two-dimensional arrays. Each array entry represents a picture element, or pixel. A 0 entry represents a white pixel, a 1 entry a black pixel. The larger the array, the more pixels we have, and hence the higher the resolution, that is, the precision with which the picture is represented. Once a picture is stored in that way, it can be processed, for example, to remove any noise that may be present, increase the sharpness, fill in missing details, and determine contours of objects.

Assume that it is desired to execute a very simple noise removal algorithm that gets rid of "salt" and "pepper" in pictures, that is, sparse white dots on a black background and sparse black dots on a white background, respectively. Such an algorithm can be implemented very efficiently on a set of very simple processors in a two-dimensional configuration where each processor is linked to its eight closest neighbors (that is, the mesh with diagonal connections in addition to horizontal and vertical ones). Each processor corresponds to a pixel and stores its value. All the processors can now execute the following step in parallel: If a pixel is 0 and all its neighbors are 1, it changes its value to 1; symmetrically, if a pixel is 1 and all its neighbors are 0, it changes its value to 0.
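A minimal sketch of this step, assuming the 0/1 pixel encoding above: every interior pixel compares itself with its eight neighbors and flips if it disagrees with all of them. On the mesh, each processor would perform the test for its own pixel simultaneously; the loops below merely simulate that single parallel step.

    def remove_salt_and_pepper(image):
        # image is a list of lists of 0s (white) and 1s (black)
        rows, cols = len(image), len(image[0])
        out = [row[:] for row in image]    # all pixels update simultaneously
        for i in range(1, rows - 1):       # border pixels lack eight neighbors
            for j in range(1, cols - 1):
                neighbors = [image[i + di][j + dj]
                             for di in (-1, 0, 1) for dj in (-1, 0, 1)
                             if (di, dj) != (0, 0)]
                if all(n != image[i][j] for n in neighbors):
                    out[i][j] = 1 - image[i][j]   # isolated dot: flip it
        return out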

One final observation is in order in concluding this section. Having studied a variety of approaches to building parallel computers, it is natural to ask: How is one to choose a parallel computer from among the available models? We already saw how one model can use its computational abilities to simulate an algorithm designed for another model. In fact, we shall show in the next section that one processor is capable of executing any parallel algorithm. This indicates that all the models of parallel computers are equivalent in terms of the problems that they can solve. What distinguishes one from another is the ease and speed with which it solves a particular problem. Therefore, the range of applications for which the computer will be used and the urgency with which answers to problems are needed are important factors in deciding what parallel computer to use. However, as with many things in life, the choice of a parallel computer is mostly dictated by economic considerations.