
  The Art of Multiprocessor Programming


  The Art of Multiprocessor Programming

  Maurice Herlihy
  Nir Shavit

  AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

  Acquisitions Editor: Tiffany Gasbarrini
  Publishing Services Manager: George Morrison
  Senior Production Editor: Paul Gottehrer
  Cover Design: Alisa Andreola
  Composition: diacriTech
  Interior printer: Sheridan Books
  Cover printer: Phoenix Color Corp.

  Morgan Kaufmann Publishers is an imprint of Elsevier.

  30 Corporate Drive, Suite 400, Burlington, MA 01803, USA This book is printed on acid-free paper.

  Copyright © 2008 by Elsevier Inc. All rights reserved.

  Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

  No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: permissions@elsevier.com. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”

  Library of Congress Cataloging-in-Publication Data

  Application submitted

  ISBN: 978-0-12-370591-4 For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.books.elsevier.com

  Printed and bound in the United States of America 09 10 11 12 13 5 4 3 2 1

  

For my parents, David and Patricia Herlihy, and for Liuba, David, and Anna.

  For my parents, Noun and Aliza, my beautiful wife Shafi, and my kids, Yonadav and Lior, for their love and their patience, their incredible, unbelievable, and unwavering patience, throughout the writing of this book.


  Contents

  Acknowledgments
  Preface

  I  PRINCIPLES

  1  Introduction
     1.1  Shared Objects and Synchronization
     1.2  A Fable
          1.2.1  Properties of Mutual Exclusion
          1.2.2  The Moral
     1.3  The Producer–Consumer Problem
     1.4  The Readers–Writers Problem
     1.5  The Harsh Realities of Parallelization
     1.6  Parallel Programming
     1.7  Chapter Notes
     1.8  Exercises

  2  Mutual Exclusion
     2.1  Time
     2.3  2-Thread Solutions
          2.3.1  The LockOne Class
          2.3.2  The LockTwo Class
          2.3.3  The Peterson Lock
     2.4  The Filter Lock
     2.5  Fairness
     2.6  Lamport’s Bakery Algorithm
     2.7  Bounded Timestamps
     2.8  Lower Bounds on the Number of Locations
     2.9  Chapter Notes
     2.10 Exercises

  3  Concurrent Objects
     3.1  Concurrency and Correctness
     3.2  Sequential Objects
     3.3  Quiescent Consistency
          3.3.1  Remarks
     3.4  Sequential Consistency
          3.4.1  Remarks
     3.5  Linearizability
          3.5.1  Linearization Points
          3.5.2  Remarks
     3.6  Formal Definitions
          3.6.1  Linearizability
          3.6.2  Compositional Linearizability
          3.6.3  The Nonblocking Property
     3.7  Progress Conditions
          3.7.1  Dependent Progress Conditions
     3.8  The Java Memory Model
          3.8.1  Locks and Synchronized Blocks
     3.9  Remarks
     3.10 Chapter Notes
     3.11 Exercises

  4  Foundations of Shared Memory
     4.1  The Space of Registers
     4.2  Register Constructions
          4.2.1  MRSW Safe Registers
          4.2.2  A Regular Boolean MRSW Register
          4.2.3  A Regular M-Valued MRSW Register
          4.2.4  An Atomic SRSW Register
          4.2.5  An Atomic MRSW Register
          4.2.6  An Atomic MRMW Register
     4.3  Atomic Snapshots
          4.3.1  An Obstruction-Free Snapshot
          4.3.2  A Wait-Free Snapshot
          4.3.3  Correctness Arguments
     4.4  Chapter Notes
     4.5  Exercises

  5  The Relative Power of Primitive Synchronization Operations
     5.1  Consensus Numbers
          5.1.1  States and Valence
     5.2  Atomic Registers
     5.3  Consensus Protocols
     5.4  FIFO Queues
     5.5  Multiple Assignment Objects
     5.6  Read–Modify–Write Operations
     5.7  Common2 RMW Operations
     5.8  The compareAndSet() Operation
     5.9  Chapter Notes

  6  Universality of Consensus
     6.1  Introduction
     6.2  Universality
     6.3  A Lock-Free Universal Construction
     6.4  A Wait-Free Universal Construction
     6.5  Chapter Notes
     6.6  Exercises

  II  PRACTICE

  7  Spin Locks and Contention
     7.1  Welcome to the Real World
     7.2  Test-And-Set Locks
     7.3  TAS-Based Spin Locks Revisited
     7.4  Exponential Backoff
     7.5  Queue Locks
          7.5.1  Array-Based Locks
          7.5.2  The CLH Queue Lock
          7.5.3  The MCS Queue Lock
     7.6  A Queue Lock with Timeouts
     7.7  A Composite Lock
          7.7.1  A Fast-Path Composite Lock
     7.8  Hierarchical Locks
          7.8.1  A Hierarchical Backoff Lock
          7.8.2  A Hierarchical CLH Queue Lock
     7.9  One Lock To Rule Them All
     7.10 Chapter Notes
     7.11 Exercises

  8  Monitors and Blocking Synchronization
     8.2  Monitor Locks and Conditions
          8.2.1  Conditions
          8.2.2  The Lost-Wakeup Problem
     8.3  Readers–Writers Locks
          8.3.1  Simple Readers–Writers Lock
          8.3.2  Fair Readers–Writers Lock
     8.4  Our Own Reentrant Lock
     8.5  Semaphores
     8.6  Chapter Notes
     8.7  Exercises

  9  Linked Lists: The Role of Locking
     9.1  Introduction
     9.2  List-Based Sets
     9.3  Concurrent Reasoning
     9.4  Coarse-Grained Synchronization
     9.5  Fine-Grained Synchronization
     9.6  Optimistic Synchronization
     9.7  Lazy Synchronization
     9.8  Non-Blocking Synchronization
     9.9  Discussion
     9.10 Chapter Notes
     9.11 Exercises

  10  Concurrent Queues and the ABA Problem
      10.1  Introduction
      10.2  Queues
      10.3  A Bounded Partial Queue
      10.4  An Unbounded Total Queue
      10.5  An Unbounded Lock-Free Queue
      10.7  Dual Data Structures
      10.8  Chapter Notes
      10.9  Exercises

  11  Concurrent Stacks and Elimination
      11.1  Introduction
      11.2  An Unbounded Lock-Free Stack
      11.3  Elimination
      11.4  The Elimination Backoff Stack
            11.4.1  A Lock-Free Exchanger
            11.4.2  The Elimination Array
      11.5  Chapter Notes
      11.6  Exercises

  12  Counting, Sorting, and Distributed Coordination
      12.1  Introduction
      12.2  Shared Counting
      12.3  Software Combining
            12.3.1  Overview
            12.3.2  An Extended Example
            12.3.3  Performance and Robustness
      12.4  Quiescently Consistent Pools and Counters
      12.5  Counting Networks
            12.5.1  Networks That Count
            12.5.2  The Bitonic Counting Network
            12.5.3  Performance and Pipelining
      12.6  Diffracting Trees
      12.7  Parallel Sorting
      12.8  Sorting Networks
            12.8.1  Designing a Sorting Network
      12.9  Sample Sorting
      12.11 Chapter Notes
      12.12 Exercises

  13  Concurrent Hashing and Natural Parallelism
      13.1  Introduction
      13.2  Closed-Address Hash Sets
            13.2.1  A Coarse-Grained Hash Set
            13.2.2  A Striped Hash Set
            13.2.3  A Refinable Hash Set
      13.3  A Lock-Free Hash Set
            13.3.1  Recursive Split-Ordering
            13.3.2  The BucketList Class
            13.3.3  The LockFreeHashSet<T> Class
      13.4  An Open-Addressed Hash Set
            13.4.1  Cuckoo Hashing
            13.4.2  Concurrent Cuckoo Hashing
            13.4.3  Striped Concurrent Cuckoo Hashing
            13.4.4  A Refinable Concurrent Cuckoo Hash Set
      13.5  Chapter Notes
      13.6  Exercises

  14  Skiplists and Balanced Search
      14.1  Introduction
      14.2  Sequential Skiplists
      14.3  A Lock-Based Concurrent Skiplist
            14.3.1  A Bird’s-Eye View
            14.3.2  The Algorithm
      14.4  A Lock-Free Concurrent Skiplist
            14.4.1  A Bird’s-Eye View
            14.4.2  The Algorithm in Detail
      14.5  Concurrent Skiplists
      14.6  Chapter Notes

  15  Priority Queues
      15.1  Introduction
            15.1.1  Concurrent Priority Queues
      15.2  An Array-Based Bounded Priority Queue
      15.3  A Tree-Based Bounded Priority Queue
      15.4  An Unbounded Heap-Based Priority Queue
            15.4.1  A Sequential Heap
            15.4.2  A Concurrent Heap
      15.5  A Skiplist-Based Unbounded Priority Queue
      15.6  Chapter Notes
      15.7  Exercises

  16  Futures, Scheduling, and Work Distribution
      16.1  Introduction
      16.2  Analyzing Parallelism
      16.3  Realistic Multiprocessor Scheduling
      16.4  Work Distribution
            16.4.1  Work Stealing
            16.4.2  Yielding and Multiprogramming
      16.5  Work-Stealing Dequeues
            16.5.1  A Bounded Work-Stealing Dequeue
            16.5.2  An Unbounded Work-Stealing DEQueue
            16.5.3  Work Balancing
      16.6  Chapter Notes
      16.7  Exercises

  17  Barriers
      17.1  Introduction
      17.2  Barrier Implementations
      17.3  Sense-Reversing Barrier
      17.4  Combining Tree Barrier
      17.5  Static Tree Barrier
      17.7  Chapter Notes
      17.8  Exercises

  18  Transactional Memory
      18.1  Introduction
            18.1.1  What is Wrong with Locking?
            18.1.2  What is Wrong with compareAndSet()?
            18.1.3  What is Wrong with Compositionality?
            18.1.4  What can We Do about It?
      18.2  Transactions and Atomicity
      18.3  Software Transactional Memory
            18.3.1  Transactions and Transactional Threads
            18.3.2  Zombies and Consistency
            18.3.3  Atomic Objects
            18.3.4  Dependent or Independent Progress?
            18.3.5  Contention Managers
            18.3.6  Implementing Atomic Objects
            18.3.7  An Obstruction-Free Atomic Object
            18.3.8  A Lock-Based Atomic Object
      18.4  Hardware Transactional Memory
            18.4.1  Cache Coherence
            18.4.2  Transactional Cache Coherence
            18.4.3  Enhancements
      18.5  Chapter Notes
      18.6  Exercises

  III  APPENDIX

  A  Software Basics
     A.1  Introduction
     A.2  Java
          A.2.1  Threads
          A.2.2  Monitors
     A.3  C#
          A.3.1  Threads
          A.3.2  Monitors
          A.3.3  Thread-Local Objects
     A.4  Pthreads
          A.4.1  Thread-Local Storage
     A.5  Chapter Notes

  B  Hardware Basics
     B.1  Introduction (and a Puzzle)
     B.2  Processors and Threads
     B.3  Interconnect
     B.4  Memory
     B.5  Caches
          B.5.1  Coherence
          B.5.2  Spinning
     B.6  Cache-Conscious Programming, or the Puzzle Solved
     B.7  Multi-Core and Multi-Threaded Architectures
          B.7.1  Relaxed Memory Consistency
     B.8  Hardware Synchronization Instructions
     B.9  Chapter Notes
     B.10 Exercises

  Bibliography

  Index

  Acknowledgments

  We would like to thank Doug Lea, Michael Scott, Ron Rivest, Tom Cormen, Michael Sipser, Radia Perlman, and George Varghese for their help in finding the right publication venue for our book.

  We thank all the students, colleagues, and friends who read our draft chapters and sent us endless lists of comments and ideas: Yehuda Afek, Shai Ber, Martin Buchholz, Vladimir Budovsky, Christian Cachin, Cliff Click, Yoav Cohen, Dave Dice, Alexandra Fedorova, Pascal Felber, Christof Fetzer, Shafi Goldwasser, Rachid Guerraoui, Tim Harris, Danny Hendler, Maor Hizkiev, Eric Koskinen, Christos Kozyrakis, Edya Ladan, Doug Lea, Oren Lederman, Pierre Leone, Yossi Lev, Wei Lu, Victor Luchangco, Virendra Marathe, John Mellor-Crummey, Mark Moir, Dan Nussbaum, Kiran Pamnany, Ben Pere, Torvald Riegel, Vijay Saraswat, Bill Scherer, Warren Schudy, Michael Scott, Ori Shalev, Marc Shapiro, Yotam Soen, Ralf Suckow, Seth Syberg, Alex Weiss, and Zhenyuan Zhao. We apologize for any names inadvertently omitted.

  We thank Mark Moir, Steve Heller, and our colleagues in the Scalable Synchronization group at Sun Microsystems for their incredible support during the writing of the book.

  

  This book offers complete code for all the examples, as well as slides, updates, and other useful tools on its companion web page at: http://books.elsevier.com/companions/9780123705914

  Preface

  This book is intended to serve both as a textbook for a senior-level undergraduate course, and as a reference for practitioners.

  Readers should know enough discrete mathematics to understand “big-O” notation, and what it means for a problem to be NP-complete. It is helpful to be familiar with elementary systems constructs such as processors, threads, and caches. A basic understanding of Java is needed to follow the examples. (We explain advanced language features before using them.) Two appendixes summarize what the reader needs to know: Appendix A covers programming language constructs, and Appendix B covers multiprocessor hardware architectures.

  The first third covers the principles of concurrent programming, showing how to think like a concurrent programmer. Like many other skills such as driving a car, cooking a meal, or appreciating caviar, thinking concurrently requires cultivation, but it can be learned with moderate effort. Readers who want to start programming right away may skip most of this section, but should still read Chapters 2 and 3, which cover the basic ideas necessary to understand the rest of the book.

  We first look at the classic mutual exclusion problem (Chapter 2). This chapter is essential for understanding why concurrent programming is a challenge. It covers basic concepts such as fairness and deadlock. We then ask what it means for a concurrent program to be correct (Chapter 3). We consider several alternative conditions, and the circumstances in which one might want to use each one. We examine the properties of shared memory essential to concurrent computation (Chapter 4), and we look at the kinds of synchronization primitives needed to implement highly concurrent data structures (Chapters 5 and 6).

  We think it is essential that anyone who wants to become truly skilled in the art of multiprocessor programming spend time solving the problems presented in the first part of this book. Although these problems are idealized, they distill the style of thinking necessary to avoid the common mistakes committed by nearly all novice programmers when they first encounter concurrency.

  The next two-thirds describe the practice of concurrent programming. Each chapter has a secondary theme, illustrating either a particular programming pattern or algorithmic technique. At the level of systems and languages, Chapter 7 covers spin locks and contention. This chapter introduces the importance of the underlying architecture, since spin lock performance cannot be understood without understanding the multiprocessor memory hierarchy. Chapter 8 covers monitor locks and waiting, a common synchronization idiom, especially in Java.

  Chapter 16 covers work-stealing and parallelism, and Chapter 17 describes barriers, all of which are useful for structuring concurrent applications. Other chapters cover concurrent data structures. All these chapters depend on Chapter 9, and the reader should read this chapter before reading the others. Linked lists illustrate different kinds of synchronization patterns, ranging from coarse-grained locking, to fine-grained locking, to lock-free structures (Chapter 9). The FIFO queues illustrate the ABA synchronization hazard that arises when using atomic synchronization primitives (Chapter 10), stacks illustrate an important synchronization pattern called elimination (Chapter 11), hash maps show how an algorithm can exploit natural parallelism (Chapter 13), skip lists illustrate efficient parallel search (Chapter 14), and priority queues illustrate how one can sometimes weaken correctness guarantees to enhance performance (Chapter 15).

  Finally, Chapter 18 describes the emerging transactional approach to concurrency, which we believe will become increasingly important in the near future.

  The importance of concurrency has not always been acknowledged. Here is a quote from a 1989 New York Times article on new operating systems for the IBM PC:

    Real concurrency–in which one program actually continues to function while you call up and use another–is more amazing but of small use to the average person. How many programs do you have that take more than a few seconds to perform any task?

  Read this book, and decide for yourself.

  1 Introduction

  The computer industry is undergoing, if not another revolution, certainly a vigorous shaking-up. The major chip manufacturers have, for the time being at least, given up trying to make processors run faster. Moore’s Law has not been repealed: each year, more and more transistors fit into the same space, but their clock speed cannot be increased without overheating. Instead, manufacturers are turning to “multicore” architectures, in which multiple processors (cores) communicate directly through shared hardware caches. Multiprocessor chips make computing more effective by exploiting parallelism: harnessing multiple processors to work on a single task.

  The spread of multiprocessor architectures will have a pervasive effect on how we develop software. Until recently, advances in technology meant advances in clock speed, so software would effectively “speed up” by itself over time. Now, however, this free ride is over. Advances in technology will mean increased parallelism and not increased clock speed, and exploiting such parallelism is one of the outstanding challenges of modern Computer Science.

  This book focuses on how to program multiprocessors that communicate via a shared memory. Such systems are often called shared-memory multiprocessors or, more recently, multicores. Programming challenges arise at all scales of multiprocessor systems—at a very small scale, processors within a single chip need to coordinate access to a shared memory location, and on a large scale, processors in a supercomputer need to coordinate the routing of data. Multiprocessor programming is challenging because modern computer systems are inherently asynchronous: activities can be halted or delayed without warning by interrupts, preemption, cache misses, failures, and other events. These delays are inherently unpredictable, and can vary enormously in scale: a cache miss might delay a processor for fewer than ten instructions, a page fault for a few million instructions, and operating system preemption for hundreds of millions of instructions.

  We approach multiprocessor programming from two complementary directions: principles and practice. In the principles part of this book, we focus on computability: figuring out what can be computed in an asynchronous concurrent environment. We use an idealized model of computation in which multiple concurrent threads manipulate a set of shared objects. The sequence of the thread operations on the objects is called the concurrent program or concurrent algorithm. This model is essentially the model presented by the Java™, C#, or C++ thread packages.

  Surprisingly, there are easy-to-specify shared objects that cannot be implemented by any concurrent algorithm. It is therefore important to understand what not to try, before proceeding to write multiprocessor programs. Many of the issues that will land multiprocessor programmers in trouble are consequences of fundamental limitations of the computational model, so we view the acquisition of a basic understanding of concurrent shared-memory computability as a necessary step. The chapters dealing with principles take the reader through a quick tour of asynchronous computability, attempting to expose various computability issues, and how they are addressed through the use of hardware and software mechanisms.

  An important step in the understanding of computability is the specification and verification of what a given program actually does. This is perhaps best described as program correctness. The correctness of multiprocessor programs, by their very nature, is more complex than that of their sequential counterparts, and requires a different set of tools, even for the purpose of “informal reasoning” (which, of course, is what most programmers actually do). Sequential correctness is mostly concerned with safety properties. A safety property states that some “bad thing” never happens. For example, a traffic light never displays green in all directions, even if the power fails. Naturally, concurrent correctness is also concerned with safety, but the problem is much, much harder, because safety must be ensured despite the vast number of ways that the steps of concurrent threads can be interleaved. Equally important, concurrent correctness encompasses a variety of liveness properties that have no counterparts in the sequential world. A liveness property states that a particular good thing will happen. For example, a red traffic light will eventually turn green. A final goal of the part of the book dealing with principles is to introduce a variety of methodologies and approaches for reasoning about concurrent programs, which will later serve us when discussing the correctness of real-world objects and programs.

  The second part of the book deals with the practice of multiprocessor programming, and focuses on performance. Analyzing the performance of multiprocessor algorithms is also different in flavor from analyzing the performance of sequential programs. Sequential programming is based on a collection of well-established and well-understood abstractions. When we write a sequential program, we usually do not need to be aware that underneath it all, pages are being swapped from disk to memory, and smaller units of memory are being moved in and out of a hierarchy of processor caches. This complex memory hierarchy is essentially invisible, hiding behind a simple programming abstraction.

  In the multiprocessor context, this abstraction breaks down, at least from a performance perspective. To achieve adequate performance, the programmer must sometimes outwit the underlying memory system, writing programs that would seem bizarre to someone unfamiliar with multiprocessor architectures.

  Someday perhaps, concurrent architectures will provide the same degree of efficient abstraction now provided by sequential architectures, but in the meantime, programmers should beware.

  The practice part of the book presents a progressive collection of shared objects and programming tools. Every object and tool is interesting in its own right, and we use each one to expose the reader to higher-level issues: spin-locks illustrate contention, linked lists illustrate the role of locking in data structure design, and so on. Each of these issues has important consequences for program performance. The hope is that the reader will understand the issue in a way that will later allow him or her to apply the lesson learned to specific multiprocessor systems. We culminate with a discussion of state-of-the-art technologies such as transactional memory.

  We would like to include a few words about style. The book uses the Java programming language. There are, of course, other suitable languages which readers would have found equally appealing. We have a long list of reasons for our specific choice, but perhaps it is more suitable to discuss them over a cup of coffee! In the appendix we explain how the concepts expressed in Java are expressed in other popular languages or libraries. We also provide a primer on multiprocessor hardware. Throughout the book, we avoid presenting specific performance numbers for programs and algorithms, and stick to general trends. There is a good reason for this: multiprocessors vary greatly, and unfortunate though it may be, at this point in time, what works well on one machine may be significantly less impressive on another. Sticking to general trends is our way of guaranteeing that the validity of our assertions will be sustained over time.

  We provide references at the end of each chapter. The reader will find a bibliographical survey of the material covered, with suggestions for further reading. Each chapter also includes a collection of exercises which readers can use to gauge their comprehension or entertain themselves on Sunday mornings.

  1.1 Shared Objects and Synchronization

  On the first day of your new job, your boss asks you to find all primes between 1 and 10^10 (never mind why), using a parallel machine that supports ten concurrent threads. This machine is rented by the minute, so the longer your program takes, the more it costs. You want to make a good impression. What do you do?

  As a first attempt, you might consider giving each thread an equal share of the input domain. Each thread might check 10^9 numbers, as shown in Fig. 1.1. This approach fails, for an elementary, but important reason. Equal ranges of inputs do not necessarily produce equal amounts of work. Primes do not occur uniformly: there are many primes between 1 and 10^9, but hardly any between 9 · 10^9 and 10^10.

  void primePrint() {
    int i = ThreadID.get(); // thread IDs are in {0..9}
    int block = power(10, 9);
    for (int j = (i * block) + 1; j <= (i + 1) * block; j++) {
      if (isPrime(j)) print(j);
    }
  }

  Figure 1.1 Balancing load by dividing up the input domain. Each thread in {0..9} gets an equal subset of the range.

  Counter counter = new Counter(1); // shared by all threads
  void primePrint() {
    long i = 0;
    long limit = power(10, 10);
    while (i < limit) { // loop until all numbers taken
      i = counter.getAndIncrement(); // take next untaken number
      if (isPrime(i)) print(i);
    }
  }

  Figure 1.2 Balancing the work load using a shared counter. Each thread gets a dynamically determined number of numbers to test.
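  To make the setting concrete, the ten threads might be launched roughly as follows. This is our own sketch, not code from the book: it assumes the primePrint() method of Fig. 1.2 is in scope, along with the book's helper methods (power, isPrime, print); thread creation and join() are standard Java.

    // A sketch of running ten concurrent primePrint() workers (Fig. 1.2).
    // Each worker repeatedly draws the next untested number from the shared counter.
    static void runWorkers() throws InterruptedException {
      Thread[] workers = new Thread[10];
      for (int t = 0; t < 10; t++) {
        workers[t] = new Thread(() -> primePrint()); // start one worker per thread
        workers[t].start();
      }
      for (Thread w : workers) {
        w.join(); // wait for every worker to finish
      }
    }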

  To make matters worse, the computation time per prime is not the same in all ranges: it usually takes longer to test whether a large number is prime than a small number. In short, there is no reason to believe that the work will be divided equally among the threads, and it is not clear even which threads will have the most work.

  A more promising way to split the work among the threads is to assign each thread one integer at a time (Fig. 1.2). When a thread is finished testing an integer, it asks for another. To this end, we introduce a shared counter, an object that encapsulates an integer value, and that provides a getAndIncrement() method that increments its value, and returns the counter’s prior value to the caller.

  Fig. 1.3 shows a naïve implementation of Counter in Java. This counter implementation works well when used by a single thread, but it fails when shared by multiple threads. The problem is that the expression

  return value++;

  is actually an abbreviation of the following, more complex code:

  long temp = value;

  value = temp + 1;

  return temp;

  In this code fragment, value is a field of the Counter object, and is shared among all the threads.

  public class Counter {
    private long value; // counter starts at one
    public Counter(int i) { // constructor initializes counter
      value = i;
    }
    public long getAndIncrement() { // increment, returning prior value
      return value++;
    }
  }

  Figure 1.3 An implementation of the shared counter.

  Now imagine that threads A and B both call the counter’s getAndIncrement() method at about the same time. They might simultaneously read 1 from value, set their local temp variables to 1, set value to 2, and both return 1. This behavior is not what we intended: concurrent calls to the counter’s getAndIncrement() return the same value, but we expect them to return distinct values. In fact, it could get even worse. One thread might read 1 from value, but before it sets value to 2, another thread could go through the increment loop several times, reading 1 and setting value to 2, then reading 2 and setting value to 3. When the first thread finally completes its operation and sets value to 2, it will actually be setting the counter back from 3 to 2.

  The heart of the problem is that incrementing the counter’s value requires two distinct operations on the shared variable: reading the value field into a temporary variable and writing it back to the Counter object.

  Something similar happens when you try to pass someone approaching you head-on in a corridor. You may find yourself veering right, then left several times to avoid the other person doing exactly the same thing. Sometimes you manage to avoid bumping into them and sometimes you do not, and in fact, as we see in the later chapters, such collisions are provably unavoidable.¹ On an intuitive level, what is going on is that each of you is performing two distinct steps: looking at (“reading”) the other’s current position, and moving (“writing”) to one side or the other. The problem is, when you read the other’s position, you have no way of knowing whether they have decided to stay or move. In the same way that you and the annoying stranger must decide who passes on the left and who on the right, threads accessing a shared Counter must decide who goes first and who goes second.

  ¹ A preventive approach such as “always sidestep to the right” does not work because the approach-

  As we will see in Chapter 5, modern multiprocessor hardware provides special read-modify-write instructions that allow threads to read, modify, and write a value to memory in one atomic (i.e., indivisible) hardware step. For the Counter object, we can use such hardware to increment the counter atomically.
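  As a concrete illustration (ours, not the book's), Java exposes such atomic hardware support through the java.util.concurrent.atomic package; a counter built on AtomicLong performs the whole increment as one indivisible step. The class name AtomicCounter is invented here for illustration.

    import java.util.concurrent.atomic.AtomicLong;

    // A sketch of a counter backed by hardware read-modify-write support.
    public class AtomicCounter {
      private final AtomicLong value;
      public AtomicCounter(long i) { value = new AtomicLong(i); }
      public long getAndIncrement() {
        return value.getAndIncrement(); // a single atomic read-modify-write step
      }
    }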

  We can also provide such atomic behavior by guaranteeing in software (using only read and write instructions) that only one thread executes the read-and-write sequence at a time. The problem of making sure that only one thread at a time can execute a particular block of code is called the mutual exclusion problem, and is one of the classic coordination problems in multiprocessor programming.

  As a practical matter, you are unlikely ever to find yourself having to design your own mutual exclusion algorithm (instead, you would probably call on a library). Nevertheless, understanding how to implement mutual exclusion from the basics is an essential condition for understanding concurrent computation in general. There is no more effective way to learn how to reason about essential and ubiquitous issues such as mutual exclusion, deadlock, bounded fairness, and blocking versus nonblocking synchronization.
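  For instance (again our sketch, not the book's), the simplest library-provided route to mutual exclusion in Java is the built-in monitor lock: declaring the increment method synchronized guarantees that only one thread at a time executes the read-and-write sequence on a given counter instance. The class name LockedCounter is invented for illustration.

    // A sketch of the counter protected by Java's built-in mutual exclusion.
    public class LockedCounter {
      private long value;
      public LockedCounter(long i) { value = i; }
      public synchronized long getAndIncrement() {
        return value++; // the read and the write both happen while holding the lock
      }
    }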

  1.2 A Fable

  Instead of treating coordination problems (such as mutual exclusion) as programming exercises, we prefer to think of concurrent coordination problems as if they were physics problems. We now present a sequence of fables, illustrating some of the basic problems. Like most authors of fables, we retell stories mostly invented by others (see the Chapter Notes at the end of this chapter).

  Alice and Bob are neighbors, and they share a yard. Alice owns a cat and Bob owns a dog. Both pets like to run around in the yard, but (naturally) they do not get along. After some unfortunate experiences, Alice and Bob agree that they should coordinate to make sure that both pets are never in the yard at the same time. Of course, we rule out trivial solutions that do not allow any animals into an empty yard.

  How should they do it? Alice and Bob need to agree on mutually compatible procedures for deciding what to do. We call such an agreement a coordination protocol (or just a protocol, for short).

  The yard is large, so Alice cannot simply look out of the window to check whether Bob’s dog is present. She could perhaps walk over to Bob’s house and knock on the door, but that takes a long time, and what if it rains? Alice might lean out the window and shout “Hey Bob! Can I let the cat out?” The problem is that Bob might not hear her. He could be watching TV, visiting his girlfriend, or out shopping for dog food. They could try to coordinate by cell phone, but the same difficulties arise if Bob is in the shower, driving through a tunnel, or recharging his phone’s batteries.

  Alice has a clever idea. She sets up one or more empty beer cans on Bob’s windowsill (Fig. 1.4), ties a string around each one, and runs the string back to her house. Bob does the same. When she wants to send a signal to Bob, she yanks the string to knock over one of the cans. When Bob notices a can has been knocked over, he resets the can.

  Figure 1.4 Communicating with cans.

  Up-ending beer cans by remote control may seem like a creative solution, but it is still deeply flawed. The problem is that Alice can place only a limited number of cans on Bob’s windowsill, and sooner or later, she is going to run out of cans to knock over. Granted, Bob resets a can as soon as he notices it has been knocked over, but what if he goes to Cancún for Spring Break? As long as Alice relies on Bob to reset the beer cans, sooner or later, she might run out.

  So Alice and Bob try a different approach. Each one sets up a flag pole, easily visible to the other. When Alice wants to release her cat, she does the following:

  1. She raises her flag.

  2. When Bob’s flag is lowered, she unleashes her cat.

  3. When her cat comes back, she lowers her flag.

  Bob’s behavior is a little more complicated.

  1. He raises his flag.

  2. While Alice’s flag is raised

  a) Bob lowers his flag

  b) Bob waits until Alice’s flag is lowered

  c) Bob raises his flag

  3. As soon as his flag is raised and hers is down, he unleashes his dog.

  4. When his dog comes back, he lowers his flag.
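  Translated into code (our sketch of the protocol just described, not code from the book; the class and field names are invented for illustration), each flag can be modeled as a volatile boolean that the other party busy-waits on:

    // A sketch of the asymmetric flag protocol. Alice simply waits for Bob's
    // flag to drop; Bob defers to Alice whenever her flag is raised.
    public class FlagProtocol {
      private volatile boolean aliceFlag = false;
      private volatile boolean bobFlag = false;

      public void aliceReleasesCat() {
        aliceFlag = true;            // 1. she raises her flag
        while (bobFlag) { }          // 2. wait until Bob's flag is lowered
        // ... the cat uses the yard ...
        aliceFlag = false;           // 3. lower her flag when the cat returns
      }

      public void bobReleasesDog() {
        bobFlag = true;              // 1. he raises his flag
        while (aliceFlag) {          // 2. while Alice's flag is raised:
          bobFlag = false;           //    a) lower his flag
          while (aliceFlag) { }      //    b) wait until Alice's flag is lowered
          bobFlag = true;            //    c) raise his flag again
        }
        // 3. his flag is up and hers is down: the dog uses the yard
        bobFlag = false;             // 4. lower his flag when the dog returns
      }
    }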

  This protocol rewards further study as a solution to Alice and Bob’s problem. On an intuitive level, it works because of the following flag principle. If Alice and Bob each

  1. raises his or her own flag, and then
  2. looks at the other’s flag,

  then at least one will see the other’s flag raised (clearly, the last one to look will see the other’s flag raised) and will not let his or her pet enter the yard. However, this observation does not prove that the pets will never be in the yard together. What if, for example, Alice lets her cat in and out of the yard several times while Bob is looking?

  To prove that the pets will never be in the yard together, assume by way of contradiction that there is a way the pets could end up in the yard together. Consider the last time Alice and Bob each raised their flag and looked at the other’s flag before sending the pet into the yard. When Alice last looked, her flag was already fully raised. She must not have seen Bob’s flag, or she would not have released the cat, so Bob must not have completed raising his flag before Alice started looking. It follows that when Bob looked for the last time, after raising his flag, it must have been after Alice started looking, so he must have seen Alice’s flag raised and would not have released his dog, a contradiction.

  This kind of argument by contradiction shows up over and over again, and it is worthwhile spending some time convincing ourselves why this claim is true. It is important to note that we never assumed that “raising my flag” or “looking at your flag” happens instantaneously, nor did we make any assumptions about how long such activities take. All we care about is when these activities start or end.

  1.2.1 Properties of Mutual Exclusion

  To show that the flag protocol is a correct solution to Alice and Bob’s problem, we must understand what properties are required of a solution, and then show that they are met by the protocol.

  First, we proved that the pets are excluded from being in the yard at the same time, a property we call mutual exclusion. Mutual exclusion is only one of several properties of interest. After all, as we noted earlier, a protocol in which Alice and Bob never release a pet satisfies the mutual exclusion property, but it is unlikely to satisfy their pets. Here is another property of central importance. First, if one pet wants to enter the yard, then it eventually succeeds. Second, if both pets want to enter the yard, then eventually at least one of them succeeds. We consider this deadlock-freedom property to be essential.

  We claim that Alice and Bob’s protocol is deadlock-free. Suppose both pets want to use the yard. Alice and Bob each raise their flags. Bob eventually notices that Alice’s flag is raised, and defers to her by lowering his flag, allowing her cat into the yard.

  Another property of compelling interest is starvation-freedom (sometimes called lockout-freedom): if a pet wants to enter the yard, will it eventually succeed? Here, Alice and Bob’s protocol performs poorly. Whenever Alice and Bob are in conflict, Bob defers to Alice, so it is possible for Alice’s cat to use the yard over and over again, while Bob’s dog becomes increasingly uncomfortable. Later on, we will see how to make protocols prevent starvation.

  The last property of interest concerns waiting. Imagine that Alice raises her flag, and is then suddenly stricken with appendicitis. She (and the cat) are taken to the hospital, and after a successful operation, she spends the next week under observation at the hospital. Although Bob is relieved that Alice is well, his dog cannot use the yard for an entire week until Alice returns. The problem is that the protocol states that Bob (and his dog) must wait for Alice to lower her flag. If Alice is delayed (even for a good reason), then Bob is also delayed (for no apparent good reason).

  The question of waiting is important as an example of fault-tolerance. Normally, we expect Alice and Bob to respond to each other in a reasonable amount of time, but what if they do not do so? The mutual exclusion problem, by its very essence, requires waiting: no mutual exclusion protocol avoids it, no matter how clever. Nevertheless, we see that many other coordination problems can be solved without waiting, sometimes in unexpected ways.

  1.2.2 The Moral

  Having reviewed both the strengths and weaknesses of Bob and Alice’s protocols, we now turn our attention back to Computer Science.

  First, we examine why shouting across the yard and placing cell phone calls did not work. Two kinds of communication occur naturally in concurrent systems:

  Transient communication requires both parties to participate at the same time. Shouting, gestures, or cell phone calls are examples of transient communication.

  Persistent communication allows the sender and receiver to participate at different times. Posting letters, sending email, or leaving notes under rocks are all examples of persistent communication.

  Mutual exclusion requires persistent communication. The problem with shouting across the yard or placing cell phone calls is that it may or may not be okay for Bob to unleash his dog, but if Alice is not able to respond to messages, he will never know.

  The can-and-string protocol might seem somewhat contrived, but it corresponds accurately to a common communication protocol in concurrent systems: interrupts. In modern operating systems, one common way for one thread to get