HANDBOOK OF NATURE-INSPIRED AND INNOVATIVE COMPUTING

Integrating Classical Models with Emerging Technologies

Edited by
Albert Y. Zomaya
The University of Sydney, Australia

Library of Congress Control Number: 2005933256

  

ISBN-10: 0-387-40532-1    e-ISBN-10: 0-387-27705-6
ISBN-13: 978-0-387-40532-2    e-ISBN-13: 978-0-387-27705-9

Printed on acid-free paper.

© 2006 Springer Science+Business Media, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1    SPIN 10942543

springeronline.com

  To my family for their help, support, and patience.

  Albert Zomaya

Table of Contents

Contributors ix
Preface xiii
Acknowledgements xv

Section I: Models

Chapter 1: Changing Challenges for Collaborative Algorithmics 1
Arnold L. Rosenberg

Chapter 2: ARM++: A Hybrid Association Rule Mining Algorithm 45
Zahir Tari and Wensheng Wu

Chapter 3: Multiset Rule-Based Programming Paradigm for Soft-Computing in Complex Systems 77
E.V. Krishnamurthy and Vikram Krishnamurthy

Chapter 4: Evolutionary Paradigms 111
Franciszek Seredynski

Chapter 5: Artificial Neural Networks 147
Javid Taheri and Albert Y. Zomaya

Chapter 6: Swarm Intelligence 187
James Kennedy

Chapter 7: Fuzzy Logic 221
Javid Taheri and Albert Y. Zomaya

Chapter 8: Quantum Computing 253
J. Eisert and M.M. Wolf

Section II: Enabling Technologies

Chapter 9: Computer Architecture 287
Joshua J. Yi and David J. Lilja

Chapter 10: A Glance at VLSI Optical Interconnects: From the Abstract Modelings of the 1980s to Today's MEMS Implements 315
Mary M. Eshaghian-Wilner and Lili Hai

Chapter 11: Morphware and Configware 343
Reiner Hartenstein

Chapter 12: Evolving Hardware 387
Timothy G.W. Gordon and Peter J. Bentley

Chapter 13: Implementing Neural Models in Silicon 433
Leslie S. Smith

Chapter 14: Molecular and Nanoscale Computing and Technology 477
Mary M. Eshaghian-Wilner, Amar H. Flood, Alex Khitun, J. Fraser Stoddart and Kang Wang

Chapter 15: Trends in High-Performance Computing 511
Jack Dongarra

Chapter 16: Cluster Computing: High-Performance, High-Availability and High-Throughput Processing on a Network of Computers 521
Chee Shin Yeo, Rajkumar Buyya, Hossein Pourreza, Rasit Eskicioglu, Peter Graham and Frank Sommers

Chapter 17: Web Service Computing: Overview and Directions 553
Boualem Benatallah, Olivier Perrin, Fethi A. Rabhi and Claude Godart

Chapter 18: Predicting Grid Resource Performance Online 575
Rich Wolski, Graziano Obertelli, Matthew Allen, Daniel Nurmi and John Brevik

Section III: Application Domains

Chapter 19: Pervasive Computing: Enabling Technologies and Challenges 613
Mohan Kumar and Sajal K. Das

Chapter 20: Information Display 633
Peter Eades, Seokhee Hong, Keith Nesbitt and Masahiro Takatsuka

Chapter 21: Bioinformatics 657
Srinivas Aluru

Chapter 22: Noise in Foreign Exchange Markets 697
George G. Szpiro

Index 711

CONTRIBUTORS

Editor in Chief
Albert Y. Zomaya, Advanced Networks Research Group, School of Information Technology, The University of Sydney, NSW 2006, Australia

Advisory Board
David Bader, University of New Mexico, Albuquerque, NM 87131, USA
Richard Brent, Oxford University, Oxford OX1 3QD, UK
Jack Dongarra, University of Tennessee, Knoxville, TN 37996 and Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Mary Eshaghian-Wilner, Dept of Electrical Engineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
Gerard Milburn, University of Queensland, St Lucia, QLD 4072, Australia
Franciszek Seredynski, Institute of Computer Science, Polish Academy of Sciences, Ordona 21, 01-237 Warsaw, Poland

Authors/Co-authors of Chapters

Matthew Allen, Computer Science Dept, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Srinivas Aluru, Iowa State University, Ames, IA 50011, USA
Boualem Benatallah, School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia
Peter J. Bentley, University College London, London WC1E 6BT, UK
John Brevik, Computer Science Dept, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Rajkumar Buyya, Grid Computing and Distributed Systems Laboratory and NICTA Victoria Laboratory, Dept of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
Sajal K. Das, Center for Research in Wireless Mobility and Networking (CReWMaN), The University of Texas, Arlington, Arlington, TX 76019, USA
Jack Dongarra, University of Tennessee, Knoxville, TN 37996 and Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Peter Eades, National ICT Australia, Australian Technology Park, Eveleigh NSW, Australia
Jens Eisert, Universität Potsdam, Am Neuen Palais 10, 14469 Potsdam, Germany and Imperial College London, Prince Consort Road, SW7 2BW London, UK
Mary M. Eshaghian-Wilner, Dept of Electrical Engineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
Rasit Eskicioglu, Parallel and Distributed Systems Laboratory, Dept of Computer Sciences, The University of Manitoba, Winnipeg, MB R3T 2N2, Canada
Amar H. Flood, Dept of Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
Claude Godart, INRIA-LORIA, F-54506 Vandoeuvre-lès-Nancy Cedex, France
Timothy G. W. Gordon, University College London, London WC1E 6BT, UK
Peter Graham, Parallel and Distributed Systems Laboratory, Dept of Computer Sciences, The University of Manitoba, Winnipeg, MB R3T 2N2, Canada
Lili Hai, State University of New York College at Old Westbury, Old Westbury, NY 11568-0210, USA
Reiner Hartenstein, TU Kaiserslautern, Kaiserslautern, Germany
Seokhee Hong, National ICT Australia, Australian Technology Park, Eveleigh NSW, Australia
Jim Kennedy, Bureau of Labor Statistics, Washington, DC 20212, USA
Alex Khitun, Dept of Electrical Engineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
E. V. Krishnamurthy, Computer Sciences Laboratory, Australian National University, Canberra ACT 0200, Australia
Vikram Krishnamurthy, Dept of Electrical and Computer Engineering, University of British Columbia, Vancouver, V6T 1Z4, Canada
Mohan Kumar, Center for Research in Wireless Mobility and Networking (CReWMaN), The University of Texas, Arlington, Arlington, TX 76019, USA
David J. Lilja, Dept of Electrical and Computer Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA
Keith Nesbitt, Charles Sturt University, School of Information Technology, Panorama Ave, Bathurst 2795, Australia
Daniel Nurmi, Computer Science Dept, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Graziano Obertelli, Computer Science Dept, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Olivier Perrin, INRIA-LORIA, F-54506 Vandoeuvre-lès-Nancy Cedex, France
Hossein Pourreza, Parallel and Distributed Systems Laboratory, Dept of Computer Sciences, The University of Manitoba, Winnipeg, MB R3T 2N2, Canada
Fethi A. Rabhi, School of Information Systems, Technology and Management, The University of New South Wales, Sydney, NSW 2052, Australia
Arnold L. Rosenberg, Dept of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, USA
Franciszek Seredynski, Institute of Computer Science, Polish Academy of Sciences, Ordona 21, 01-237 Warsaw, Poland
Leslie Smith, Dept of Computing Science and Mathematics, University of Stirling, Stirling FK9 4LA, Scotland
Frank Sommers, Autospaces, LLC, 895 S. Norton Avenue, Los Angeles, CA 90005, USA
J. Fraser Stoddart, Dept of Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
George G. Szpiro, P.O. Box 6278, Jerusalem, Israel
Javid Taheri, Advanced Networks Research Group, School of Information Technology, The University of Sydney, NSW 2006, Australia
Masahiro Takatsuka, The University of Sydney, School of Information Technology, NSW 2006, Australia
Zahir Tari, Royal Melbourne Institute of Technology, School of Computer Science, Melbourne, Victoria 3001, Australia
Kang Wang, Dept of Electrical Engineering, University of California, Los Angeles, Los Angeles, CA 90095, USA
M.M. Wolf, Max-Planck-Institut für Quantenoptik, Hans-Kopfermann-Str. 1, 85748 Garching, Germany
Rich Wolski, Computer Science Dept, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
Chee Shin Yeo, Grid Computing and Distributed Systems Laboratory and NICTA Victoria Laboratory, Dept of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
Joshua J. Yi, Freescale Semiconductor Inc, 7700 West Parmer Lane, Austin, TX 78729, USA
Albert Y. Zomaya, Advanced Networks Research Group, School of Information Technology, The University of Sydney, NSW 2006, Australia

  PREFACE

The proliferation of computing devices in every aspect of our lives increases the demand for better understanding of emerging computing paradigms. For the last fifty years most, if not all, computers in the world have been built based on the von Neumann model, which in turn was inspired by the theoretical model proposed by Alan Turing early in the twentieth century. A Turing machine is the most famous theoretical model of computation (A. Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proc. London Math. Soc. (ser. 2), 42, pp. 230–265, 1936. Corrections appeared in: ibid., 43 (1937), pp. 544–546.) that can be used to study a wide range of algorithms.

The von Neumann model has been used to build computers with great success. It has also been extended to the development of the early supercomputers, and we can also see its influence on the design of some of the high performance computers of today. However, the principles espoused by the von Neumann model are not adequate for solving many of the problems that have great theoretical and practical importance. In general, a von Neumann model is required to execute a precise algorithm that can manipulate accurate data. In many problems such conditions cannot be met. For example, in many cases accurate data are not available, or a "fixed" or "static" algorithm cannot capture the complexity of the problem under study.

Therefore, The Handbook of Nature-Inspired and Innovative Computing: Integrating Classical Models with Emerging Technologies seeks to provide an opportunity for researchers to explore the new computational paradigms and their impact on computing in the new millennium. The handbook is quite timely since the field of computing as a whole is undergoing many changes. A vast literature exists today on such new paradigms and their implications for a wide range of applications; a number of studies have reported on the success of such techniques in solving difficult problems in all key areas of computing.

The book is intended to be a Virtual Get Together of several researchers that one could invite to attend a conference on 'futurism' dealing with the theme of Computing in the 21st Century. Of course, the list of topics that is explored here is by no means exhaustive, but most of the conclusions provided can be extended to other research fields that are not covered here. There was a decision to limit the number of chapters while providing more pages for contributed authors to express their ideas, so that the handbook remains manageable within a single volume.

It is also hoped that the topics covered will get readers to think of the implications of such new ideas for developments in their own fields. Further, the enabling technologies and application areas are to be understood very broadly and include, but are not limited to, the areas included in the handbook.

The handbook endeavors to strike a balance between theoretical and practical coverage of a range of innovative computing paradigms and applications. It is organized into three main sections: (I) Models, (II) Enabling Technologies, and (III) Application Domains; the titles of the different chapters are self-explanatory as to what is covered. The handbook is intended to be a repository of paradigms, technologies, and applications that target the different facets of the process of computing.

The book brings together a combination of chapters that do not normally appear together in the wider literature, such as bioinformatics, molecular computing, optics, quantum computing, and others. However, these new paradigms are changing the face of computing as we know it, and they will be influencing and radically revolutionizing traditional computational paradigms. So, this volume catches the wave at the right time by allowing the contributors to explore with great freedom and elaborate on how their respective fields are contributing to re-shaping the field of computing.

The twenty-two chapters were carefully selected to provide wide scope with minimal overlap between chapters. Each contributor was asked to cover review material as well as current developments, and the authors were chosen because they are leaders in their respective disciplines.

  ACKNOWLEDGEMENTS

First and foremost, I would like to thank and acknowledge the contributors to this volume for their support and patience, and the reviewers for their useful comments and suggestions that helped improve the earlier outline of the handbook and the presentation of the material. Also, I should extend my deepest thanks to Wayne Wheeler and his staff at Springer (USA) for their collaboration, guidance, and, most importantly, patience in finalizing this handbook. Finally, I would like to acknowledge the efforts of the team from Springer's production department for their extensive efforts during the many phases of this project and the timely fashion in which the book was produced.

Albert Y. Zomaya

Chapter 1

CHANGING CHALLENGES FOR COLLABORATIVE ALGORITHMICS

Arnold L. Rosenberg
University of Massachusetts at Amherst

Abstract

Technological advances and economic considerations have led to a wide variety of modalities of collaborative computing: the use of multiple computing agents to solve individual computational problems. Each new modality creates new challenges for the algorithm designer. Older "parallel" algorithmic devices no longer work on the newer computing platforms (at least in their original forms) and/or do not address critical problems engendered by the new platforms' characteristics. In this chapter, the field of collaborative algorithmics is divided into four epochs, representing (one view of) the major evolutionary eras of collaborative computing platforms. The changing challenges encountered in devising algorithms for each epoch are discussed, and some notable sophisticated responses to the challenges are described.

1 INTRODUCTION

Collaborative computing is a regime of computation in which multiple agents are enlisted in the solution of a single computational problem. Until roughly one decade ago, it was fair to refer to collaborative computing as parallel computing. Developments engendered by both economic considerations and technological advances make the older rubric both inaccurate and misleading, as the multiprocessors of the past have been joined by clusters—independent computers interconnected by a local-area network (LAN)—and by various modalities of Internet computing—loose confederations of computing agents of differing levels of commitment to the common computing enterprise. The agents in the newer collaborative computing milieux often do their computing at their own times and in their own locales—definitely not "in parallel."

Every major technological advance in all areas of computing creates significant new scheduling challenges even while enabling new levels of computational efficiency (measured in time and/or space and/or cost). This chapter presents one algorithmicist's view of the paradigm-challenges milestones in the evolution of collaborative computing platforms and of the algorithmic challenges each change in paradigm has engendered. The chapter is organized around a somewhat eccentric view of the evolution of collaborative computing technology through four "epochs," each distinguished by the challenges one faced when devising algorithms for the associated computing platforms.

1. In the epoch of shared-memory multiprocessors:
   ● One had to cope with partitioning one's computational job into disjoint subjobs that could proceed in parallel on an assemblage of identical processors.
   ● One had to try to keep all processors fruitfully busy as much of the time as possible. (The qualifier "fruitfully" indicates that the processors are actually working on the problem to be solved, rather than on, say, bookkeeping that could be avoided with a bit more cleverness.)
   ● Communication between processors was effected through shared variables, so one had to coordinate access to these variables. In particular, one had to avoid the potential races when two (or more) processors simultaneously vied for access to a single memory module, especially when some access was for the purpose of writing to the same shared variable.
   ● Since all processors were identical, one had, in many situations, to craft protocols that gave processors separate identities—the process of so-called symmetry breaking or leader election. (This was typically necessary when one processor had to take a coordinating role in an algorithm.)

2. The epoch of message-passing multiprocessors added to the technology of the preceding epoch a user-accessible interconnection network—of known structure—across which the identical processors of one's parallel computer communicated. On the one hand, one could now build much larger aggregations of processors than one could before. On the other hand:
   ● One now had to worry about coordinating the routing and transmission of messages across the network, in order to select short paths for messages, while avoiding congestion in the network.
   ● One had to organize one's computation to tolerate the often-considerable delays caused by the point-to-point latency of the network and the effects of network bandwidth and congestion.
   ● Since many of the popular interconnection networks were highly symmetric, the problem of symmetry breaking persisted in this epoch. Since communication was now over a network, new algorithmic avenues were needed to achieve symmetry breaking.
   ● Since the structure of the interconnection network underlying one's multiprocessor was known, one could—and was well advised to—allocate substantial attention to network-specific optimizations when designing algorithms that strove for (near) optimality. (Typically, for instance, one would strive to exploit locality: the fact that a processor was closer to some processors than to others.) A corollary of this fact is that one often needed quite disparate algorithmic strategies for different classes of interconnection networks.

3. The epoch of clusters—also known as networks of workstations (NOWs, for short)—introduced two new variables into the mix, even while rendering many sophisticated multiprocessor-based algorithmic tools obsolete. In Section 3, we outline some algorithmic approaches to the following new challenges.
   ● The computing agents in a cluster—be they pc's, or multiprocessors, or the eponymous workstations—are now independent computers that communicate with each other over a local-area network (LAN). This means that communication times are larger and that communication protocols are more ponderous, often requiring tasks such as breaking long messages into packets, encoding, computing checksums, and explicitly setting up communications (say, via a hand-shake). Consequently, tasks must now be coarser grained than with multiprocessors, in order to amortize the costs of communication. Moreover, the respective computations of the various computing agents can no longer be tightly coupled, as they could be in a multiprocessor. Further, in general, network latency can no longer be "hidden" via the sophisticated techniques developed for multiprocessors. Finally, one can usually no longer translate knowledge of network topology into network-specific optimizations.
   ● The computing agents in the cluster, either by design or chance (such as being purchased at different times), are now often heterogeneous, differing in speeds of processors and/or memory systems. This means that a whole range of algorithmic techniques developed for the earlier epochs of collaborative computing no longer work—at least in their original forms [127]. On the positive side, heterogeneity obviates symmetry breaking, as processors are now often distinguishable by their unique combinations of computational resources and speeds.

4. The epoch of Internet computing, in its several guises, has taken the algorithmics of collaborative computing precious near to—but never quite reaching—that of distributed computing. While Internet computing is still evolving in often-unpredictable directions, we detail two of its circa-2003 guises in Section 4. Certain characteristics of present-day Internet computing seem certain to persist.
   ● One now loses several types of predictability that played a significant background role in the algorithmics of prior epochs.
      – Interprocessor communication now takes place over the Internet. In this environment:
         • a message shares the "airwaves" with an unpredictable number and assemblage of other messages; it may be dropped and resent; it may be routed over any of myriad paths. All of these factors make it impossible to predict a message's transit time.
         • a message may be accessible to unknown (and untrusted) sites, increasing the need for security-enhancing measures.
      – The predictability of interactions among collaborating computing agents that anchored algorithm development in all prior epochs no longer obtains, due to the fact that remote agents are typically not dedicated to the collaborative task. Even the modalities of Internet computing in which remote computing agents promise to complete computational tasks that are assigned to them typically do not guarantee when. Moreover, even the guarantee of eventual computation is not present in all modalities of Internet computing: in some modalities remote agents cannot be relied upon ever to complete assigned tasks.
   ● In several modalities of Internet computing, computation is now unreliable in two senses:
      – The computing agent assigned a task may, without announcement, "resign from" the aggregation, abandoning the task. (This is the extreme form of temporal unpredictability just alluded to.)
      – Since remote agents are unknown and anonymous in some modalities, the computing agent assigned a task may maliciously return fallacious results. This latter threat introduces the need for computation-related security measures (e.g., result-checking and agent monitoring) for the first time to collaborative computing. This problem is discussed in a news article at ⟨http://www.wired.com/news/technology/0,1282,41838,00.html⟩.

In succeeding sections, we expand on the preceding discussion, defining the collaborative computing platforms more carefully and discussing the resulting challenges in more detail. Due to a number of excellent widely accessible sources that discuss and analyze the epochs of multiprocessors, both shared-memory and message-passing, our discussion of the first two of our epochs, in Section 2, will be rather brief. Our discussion of the epochs of cluster computing (in Section 3) and Internet computing (in Section 4) will be both broader and deeper. In each case, we describe the subject computing platforms in some detail and describe a variety of sophisticated responses to the algorithmic challenges of that epoch. Our goal is to highlight studies that attempt to develop algorithmic strategies that respond in novel ways to the challenges of an epoch. Even with this goal in mind, the reader should be forewarned that her guide has an eccentric view of the field, which may differ from the views of many other collaborative algorithmicists; some of the still-evolving collaborative computing platforms we describe will soon disappear, or at least morph into possibly unrecognizable forms; some of the "sophisticated responses" we discuss will never find application beyond the specific studies they occur in.

  This said, I hope that this survey, with all of its limitations, will convince the reader of the wonderful research opportunities that await her “just on the other side” of the systems and applications literature devoted to emerging collaborative computing technologies.

2 THE EPOCHS OF MULTIPROCESSORS

The quick tour of the world of multiprocessors in this section is intended to convey a sense of what stimulated much of the algorithmic work on collaborative computing on this computing platform. The following books and surveys provide an excellent detailed treatment of many subjects that we only touch upon and even more topics that are beyond the scope of this chapter: [5, 45, 50, 80, 93, 97, 134].

2.1 Multiprocessor Platforms

As technology allowed circuits to shrink, starting in the 1970s, it became feasible to design and fabricate computers that had many processors. Indeed, a few theorists had anticipated these advances in the 1960s [79]. The first attempts at designing such multiprocessors envisioned them as straightforward extensions of the familiar von Neumann architecture, in which a processor box—now populated with many processors—interacted with a single memory box; processors would coordinate and communicate with each other via shared variables. The resulting shared-memory multiprocessors were easy to think about, both for computer architects and computer theorists [61]. Yet using such multiprocessors effectively turned out to present numerous challenges, exemplified by the following:

● Where/how does one identify the parallelism in one's computational problem? This question persists to this day, feasible answers changing with evolving technology. Since there are approaches to this question that often do not appear in the standard references, we shall discuss the problem briefly in Section 2.2.
● How does one keep all available processors fruitfully occupied—the problem of load balancing? One finds sophisticated multiprocessor-based approaches to this problem in primary sources such as [58, 111, 123, 138].
● How does one coordinate access to shared data by the several processors of a multiprocessor (especially, a shared-memory multiprocessor)? The difficulty of this problem increases with the number of processors. One significant approach to sharing data requires establishing order among a multiprocessor's indistinguishable processors by selecting "leaders" and "subleaders," etc.
● How does one efficiently pick a "leader" among indistinguishable processors—the problem of symmetry breaking? One finds sophisticated solutions to this problem in primary sources such as [8, 46, 107, 108].
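For concreteness, the following is a minimal, purely illustrative sketch of randomized symmetry breaking (the function name and the centralized simulation style are our own assumptions, not drawn from [8, 46, 107, 108]): every surviving candidate repeatedly draws a random value, and only those that drew the round's maximum remain, until a single leader is left.

```python
import random

def elect_leader(num_processors, seed=None):
    """Illustrative randomized symmetry breaking among indistinguishable processors.

    Each round, every surviving candidate draws a random value; only those that
    drew the round's maximum survive. With high probability a single leader
    remains after O(log n) expected rounds.
    """
    rng = random.Random(seed)
    candidates = list(range(num_processors))   # processors known only by position
    while len(candidates) > 1:
        draws = {p: rng.randrange(num_processors ** 2) for p in candidates}
        top = max(draws.values())
        candidates = [p for p in candidates if draws[p] == top]
    return candidates[0]

if __name__ == "__main__":
    print("elected leader:", elect_leader(8, seed=42))
```

Real shared-memory protocols must, of course, implement the "draw and compare" step through concurrent reads and writes of shared variables, which is precisely where the cited sources invest their sophistication.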

A variety of technological factors suggest that shared memory is likely a better idea as an abstraction than as a physical actuality. This fact led to the development of distributed shared memory multiprocessors, in which each processor had its own memory module, and access to remote data was through an interconnection network. Once one had processors communicating over an interconnection network, it was a small step from the distributed shared memory abstraction to explicit message-passing, i.e., to having processors communicate with each other directly rather than through shared variables. In one sense, the introduction of interconnection networks to parallel architectures was liberating: one could now (at least in principle) envision multiprocessors with many thousands of processors. On the other hand, the explicit algorithmic use of networks gave rise to a new set of challenges:

● How can one route large numbers of messages within a network without engendering congestion ("hot spots") that renders communication insufferably slow? This is one of the few algorithmic challenges in parallel computing that has an acknowledged champion. The two-phase randomized routing strategy developed in [150, 154] provably works well in a large range of interconnection networks (including the popular butterfly and hypercube networks) and empirically works well in many others.
● Can one exploit the new phenomenon—locality—that allows certain pairs of processors to intercommunicate faster than others? The fact that locality can be exploited to algorithmic advantage is illustrated in [1, 101]. The phenomenon of locality in parallel algorithmics is discussed in [124, 156].
● How can one cope with the situation in which the structure of one's computational problem—as exposed by the graph of data dependencies—is incompatible with the structure of the interconnection network underlying the multiprocessor that one has access to? This is another topic not treated fully in the references, so we discuss it briefly in Section 2.2.
● How can one organize one's computation so that one accomplishes valuable work while awaiting responses from messages, either from the memory subsystem (memory accesses) or from other processors? A number of innovative and effective responses to variants of this problem appear in the literature; see, e.g., [10, 36, 66].

In addition to the preceding challenges, one now also faced the largely unanticipated, insuperable problem that one's interconnection network may not "scale." Beginning in 1986, a series of papers demonstrated that the physical realizations of large instances of the most popular interconnection networks could not provide performance consistent with idealized analyses of those networks [31, 155, 156, 157]. A word about this problem is in order, since the phenomenon it represents influences so much of the development of parallel architectures. We live in a three-dimensional world: areas and volumes in space grow polynomially fast when distances are measured in units of length. This physical polynomial growth notwithstanding, for many of the algorithmically attractive interconnection networks—hypercubes, butterfly networks, and de Bruijn networks, to name just three—the number of nodes (read: "processors") grows exponentially when distances are measured in number of interprocessor links. This means, in short, that the interprocessor links of these networks must grow in length as the networks grow in number of processors. Analyses that predict performance in number of traversed links do not reflect the effect of link-length on actual performance. Indeed, the analysis in [31] suggests—on the preceding grounds—that only the polynomially growing meshlike networks can supply in practice efficiency commensurate with idealized theoretical analyses.¹

¹ Figure 1.1 depicts the four mentioned networks. See [93, 134] for definitions and discussions of these and related networks. Additional sources such as [4, 21, 90] illustrate the algorithmic use of such networks.

[Figure 1.1. Four interconnection networks. Row 1: the 4 × 4 mesh and the 3-dimensional de Bruijn network; row 2: the 4-dimensional boolean hypercube and the 3-level butterfly network (note the two copies of level 0).]
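To make the growth-rate mismatch behind the scaling argument concrete, here is a back-of-the-envelope check (our own illustration, not a calculation from [31]): a d-dimensional hypercube packs 2^d processors within graph diameter d, whereas a ball of radius d in physical 3-space offers only about (4/3)πd³ unit-volume sites, so for even modest d the processors cannot all be joined by unit-length links.

```python
from math import pi

def hypercube_nodes(d):
    """Processors in a d-dimensional boolean hypercube (graph diameter d)."""
    return 2 ** d

def ball_capacity(radius):
    """Approximate number of unit-volume sites within physical radius `radius`."""
    return (4.0 / 3.0) * pi * radius ** 3

if __name__ == "__main__":
    for d in (10, 15, 20):
        print(f"d={d:2}: {hypercube_nodes(d):>9} hypercube nodes vs "
              f"~{ball_capacity(d):>9,.0f} unit cells within radius d")
    # Already at d = 15 the 32,768 nodes exceed the ~14,137 available unit cells,
    # so links must stretch as the network grows -- the effect that pure
    # link-count analyses fail to capture.
```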

We now discuss briefly a few of the challenges that confronted algorithmicists during the epochs of multiprocessors. We concentrate on topics that are not treated extensively in books and surveys, as well as on topics that retain their relevance beyond these epochs.

2.2 Algorithmic Challenges and Responses

Finding Parallelism. The seminal study [37] was the first to systematically distinguish between the inherently sequential portion of a computation and the parallelizable portion. The analysis in that source led to Brent's Scheduling Principle, which states, in simplest form, that the time for a computation on a p-processor computer need be no greater than t + n/p, where t is the time for the inherently sequential portion of the computation and n is the total number of operations that must be performed. While the study illustrates how to achieve the bound of the Principle for a class of arithmetic computations, it leaves open the challenge of discovering the parallelism in general computations. Two major approaches to this challenge appear in the literature and are discussed here.
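As a concrete illustration of the bound (a hedged sketch; the helper name and the sample numbers are ours, not taken from [37]): with t = 100 inherently sequential steps, n = 1,000,000 total operations, and p = 64 processors, the Principle gives a running time of at most 100 + 1,000,000/64 ≈ 15,725 steps.

```python
from math import ceil

def brent_bound(sequential_time, total_ops, processors):
    """Upper bound from Brent's Scheduling Principle: t + n/p (rounded up).

    sequential_time: time t of the inherently sequential portion.
    total_ops:       total number n of operations in the computation.
    processors:      number p of identical processors.
    """
    return sequential_time + ceil(total_ops / processors)

if __name__ == "__main__":
    # t = 100, n = 1_000_000, p = 64  ->  100 + 15625 = 15725
    print(brent_bound(100, 1_000_000, 64))
```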

Parallelizing computations via clustering/partitioning. Two related major approaches have been developed for scheduling computations on parallel computing platforms, when the computation's intertask dependencies are represented by a computation-dag—a directed acyclic graph, each of whose arcs (x, y) betokens the dependence of task y on task x; sources never appear on the right-hand side of an arc; sinks never appear on the left-hand side.

The first such approach is to cluster a computation-dag's tasks into "blocks" whose tasks are so tightly coupled that one would want to allocate each block to a single processor to obviate any communication when executing these tasks. A number of efficient heuristics have been developed to effect such clustering for general computation-dags [67, 83, 103, 139]. Such heuristics typically base their clustering on some easily computed characteristic of the dag, such as its critical path—the most resource-consuming source-to-sink path, including both computation time and volume of intertask data—or its dominant sequence—a source-to-sink path, possibly augmented with dummy arcs, that accounts for the entire makespan of the computation. Several experimental studies compare these heuristics in a variety of settings [54, 68], and systems have been developed to exploit such clustering in devising schedules [43, 140, 162]. Numerous algorithmic studies have demonstrated analytically the provable effectiveness of this approach for scheduling special classes of computation-dags [65, 117].
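To make the notion of a critical path concrete, the following illustrative sketch (simplified to weight tasks by computation time only, ignoring the intertask data volumes that the real heuristics also charge for; all names are hypothetical) computes the most time-consuming source-to-sink path of a small computation-dag.

```python
from collections import defaultdict

def critical_path(tasks, arcs):
    """Most time-consuming source-to-sink path of a computation-dag.

    tasks: dict mapping task name -> computation time.
    arcs:  iterable of (x, y) pairs, meaning task y depends on task x.
    Returns (total_time, list_of_tasks_on_the_path).
    """
    succs = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for x, y in arcs:
        succs[x].append(y)
        indeg[y] += 1

    # Topological order (Kahn's algorithm).
    order, ready = [], [t for t in tasks if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    # Longest weighted path ending at each task, with predecessor links.
    best = {t: tasks[t] for t in tasks}
    pred = {t: None for t in tasks}
    for t in order:
        for s in succs[t]:
            if best[t] + tasks[s] > best[s]:
                best[s] = best[t] + tasks[s]
                pred[s] = t

    end = max(best, key=best.get)
    total, path = best[end], []
    while end is not None:
        path.append(end)
        end = pred[end]
    return total, path[::-1]

if __name__ == "__main__":
    tasks = {"a": 2, "b": 3, "c": 1, "d": 4}
    arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    print(critical_path(tasks, arcs))   # -> (9, ['a', 'b', 'd'])
```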

Dual to the preceding clustering heuristics is the process of clustering by graph separation. Here one seeks to partition a computation-dag into subdags by "cutting" arcs that interconnect loosely coupled blocks of tasks. When the tasks in each block are mapped to a single processor, the small numbers of arcs interconnecting pairs of blocks lead to relatively small—hence, inexpensive—interprocessor communications. This approach has been studied extensively in the parallel-algorithms literature with regard to myriad applications, ranging from circuit layout to numerical computations to nonserial dynamic programming. A small sampler of the literature on specific applications appears in [28, 55, 64, 99, 106]; heuristics for accomplishing efficient graph partitioning (especially into roughly equal-size subdags) appear in [40, 60, 82]; further sample applications, together with a survey of the literature on algorithms for finding graph separators, appear in [134].

Parallelizing using dataflow techniques. A quite different approach to finding parallelism in computations builds on the flow of data in the computation. This approach originated with the VLSI revolution fomented by Mead and Conway [105], which encouraged computer scientists to apply their tools and insights to the problem of designing computers. Notable among the novel ideas emerging from this influx was the notion of systolic array—a dataflow-driven special-purpose parallel (co)processor [86, 87]. A major impetus for the development of this area was the discovery, in [109, 120], that for certain classes of computations—including, e.g., those specifiable via nested for-loops—such machines could be designed "automatically." This area soon developed a life of its own as a technique for finding parallelism in computations, as well as for designing special-purpose parallel machines. There is now an extensive literature on the use of systolic design principles for a broad range of specific computations [38, 39, 89, 91, 122], as well as for large general classes of computations that are delimited by the structure of their flow of data [49, 75, 109, 112, 120, 121].

Mismatches between network and job structure. Parallel efficiency in multiprocessors often demands using algorithms that accommodate the structure of one's computation to that of the host multiprocessor's network. This was noticed by systems builders [71] as well as algorithms designers [93, 149]. The reader can appreciate the importance of so tuning one's algorithm by perusing the following studies of the operation of sorting: [30, 52, 74, 77, 92, 125, 141, 148]. The overall ground rules in these studies are constant: one is striving to minimize the worst-case number of comparisons when sorting n numbers; only the underlying interconnection network changes. We now briefly describe two broadly applicable approaches to addressing potential mismatches with the host network.

Network emulations. The theory of network emulations focuses on the problem of making one computation-graph—the host—"act like" or "look like" another—the guest. In both of the scenarios that motivate this endeavor, the host H represents an existing interconnection network. In one scenario, the guest G is a directed graph that represents the intertask dependencies of a computation. In the other scenario, the guest G is an undirected graph that represents an ideal interconnection network that would be a congenial host for one's computation. In both scenarios, computational efficiency would clearly be enhanced if H's interconnection structure matched G's—or could be made to appear to.

Almost all approaches to network emulation build on the theory of graph embeddings, which was first proposed as a general computational tool in [126]. An embedding ⟨α, ρ⟩ of the graph G = (V_G, E_G) into the graph H = (V_H, E_H) consists of a one-to-one map α : V_G → V_H, together with a mapping ρ of E_G into paths in H such that, for each edge (u, v) ∈ E_G, the path ρ(u, v) connects nodes α(u) and α(v) in H. The two main measures of the quality of the embedding ⟨α, ρ⟩ are the dilation, which is the length of the longest path of H that is the image, under ρ, of some edge of G; and the congestion, which is the maximum, over all edges e of H, of the number of ρ-paths in which edge e occurs. In other words, it is the maximum number of edges of G that are routed across e by the embedding.

It is easy to use an embedding of a network G into a network H to translate an algorithm designed for G into a computationally equivalent algorithm for H. Basically: the mapping α identifies which node of H is to emulate which node of G; the mapping ρ identifies the routes in H that are used to simulate internode message-passing in G. This sketch suggests why the quantitative side of network-emulations-via-embeddings focuses on dilation and congestion as the main measures of the quality of an embedding. A moment's reflection suggests that, when one uses an embedding ⟨α, ρ⟩ of a graph G into a graph H as the basis for an emulation of G by H, any algorithm that is designed for G is slowed down by a factor O(congestion × dilation) when run on H. One can sometimes easily orchestrate communications to improve this factor to O(congestion + dilation); cf. [13]. Remarkably, one can always improve the slowdown to O(congestion + dilation): a nonconstructive proof of this fact appears in [94], and, even more remarkably, a constructive proof and efficient algorithm appear in [95].

  ⟨a, r⟩ of a graph G into a graph H as the basis for an emulation of G by H , any algorithm that is designed for G is slowed down by a factor O(congestion × dilation) when run on H . One can sometimes easily orches- trate communications to improve this factor to O(congestion + dilation); cf. [13]. Remarkably, one can always improve the slowdown to O(congestion + dilation): a nonconstructive proof of this fact appears in [94], and, even more remarkably, a constructive proof and efficient algorithm appear in [95].