Distributed Systems

  5/8/2009 Introduction

  Definition of a Distributed System (2)

  Note that the middleware layer extends over multiple machines.

  Goals of a Distributed System

  Transparency in a Distributed System

  Transparency | Description
  Access | Hide differences in data representation and how a resource is accessed
  Location | Hide where a resource is located
  Migration | Hide that a resource may move to another location
  Relocation | Hide that a resource may be moved to another location while in use
  Replication | Hide that a resource is replicated
  Concurrency | Hide that a resource may be shared by several competitive users
  Failure | Hide the failure and recovery of a resource
  Persistence | Hide whether a (software) resource is in memory or on disk

  Different forms of transparency in a distributed system.

  Scalability Problems

  Scaling Techniques (1)

  a) a server or

  b) a client check forms as they are being filled.

  Scaling Techniques (2)

  Hardware Concepts

  1-6 Different basic organizations and memories in distributed computer systems.

  Multiprocessors (1)

  Multiprocessors (2)

  1.8

  a) A crossbar switch

  b) An omega switching network

  Homogeneous Multicomputer Systems

  Uniprocessor Operating Systems

  Software Concepts

  System | Description | Main Goal
  DOS | Tightly-coupled operating system for multiprocessors and homogeneous multicomputers | Hide and manage hardware resources
  NOS | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) | Offer local services to remote clients
  Middleware | Additional layer atop of NOS implementing general-purpose services | Provide distribution transparency

  An overview of DOS (Distributed Operating Systems), NOS (Network Operating Systems), and middleware.

  Multiprocessor Operating Systems (1)

    monitor Counter {
    private:
        int count = 0;
    public:
        int value() { return count; }
        void incr() { count = count + 1; }
        void decr() { count = count - 1; }
    }

  A monitor to protect an integer against concurrent access.
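C has no monitor construct. A common approximation is a structure whose operations all acquire the same mutex. The sketch below assumes POSIX threads. `Counter`, the `counter_*` functions and `bump_many` are illustrative names and are not part of the original figure.

```c
#include <pthread.h>

/* Approximates the Counter monitor: every operation holds the same mutex,
   so at most one thread is active "inside the monitor" at a time. */
typedef struct {
    pthread_mutex_t lock;
    int count;
} Counter;

void counter_init(Counter *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->count = 0;
}

int counter_value(Counter *c) {
    pthread_mutex_lock(&c->lock);
    int v = c->count;
    pthread_mutex_unlock(&c->lock);
    return v;
}

void counter_incr(Counter *c) {
    pthread_mutex_lock(&c->lock);
    c->count = c->count + 1;
    pthread_mutex_unlock(&c->lock);
}

void counter_decr(Counter *c) {
    pthread_mutex_lock(&c->lock);
    c->count = c->count - 1;
    pthread_mutex_unlock(&c->lock);
}

/* Thread body for the usage example: increments the counter 100000 times. */
void *bump_many(void *arg) {
    Counter *c = arg;
    for (int i = 0; i < 100000; i++)
        counter_incr(c);
    return NULL;
}
```

Suppose two threads run `bump_many` concurrently. The count still ends at exactly 200000, because no two increments can interleave inside the lock.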

  Multiprocessor Operating Systems (2) monitor Counter {

  Multicomputer Operating Systems (1)

  Multicomputer Operating Systems (2)

  Multicomputer Operating Systems (3)

  Synchronization point | Send buffer | Reliable comm. guaranteed?
  Block sender until buffer not full | Yes | Not necessary
  Block sender until message sent | No | Not necessary
  Block sender until message received | No | Necessary
  Block sender until message delivered | No | Necessary

  Relation between blocking, buffering, and reliable communications.

  Distributed Shared Memory Systems (1)

  Distributed Shared Memory Systems (2)

  Network Operating System (1)

  Network Operating System (2)

  1-20 Two clients and a server in a network operating system.

  Network Operating System (3)

  Positioning Middleware

  1-22 General structure of a distributed system as middleware.

  Middleware and Openness

  Comparison between Systems

  Item | Distributed OS (Multiproc.) | Distributed OS (Multicomp.) | Network OS | Middleware-based OS
  Degree of transparency | Very High | High | Low | High
  Same OS on all nodes | Yes | Yes | No | No
  Number of copies of OS | 1 | N | N | N
  Basis for communication | Shared memory | Messages | Files | Model specific
  Resource management | Global, central | Global, distributed | Per node | Per node
  Scalability | No | Moderately | Yes | Varies
  Openness | Closed | Closed | Open | Open

  A comparison between multiprocessor operating systems, multicomputer operating systems, network operating systems, and middleware based distributed systems.

  Clients and Servers

  An Example Client and Server (1)

  The header.h file used by the client and server.

  An Example Client and Server (2)

  An Example Client and Server (3)

  1-27 b) A client using the server to copy a file.

  Processing Level

  Multitiered Architectures (1)

  1-29 Alternative client-server organizations (a) – (e).

  Multitiered Architectures (2)

  Modern Architectures

  1-31 An example of horizontal distribution of a Web service.

Communication

Layered Protocols (1)

Layers, interfaces, and protocols in the OSI model.

  2-1

  Layered Protocols (2)

A typical message as it appears on the network.

  2-2

  Data Link Layer

Discussion between a receiver and a sender in the data link layer.

  2-3

  Client-Server TCP

  a) Normal operation of TCP.

  2-4

  Middleware Protocols

An adapted reference model for networked communication.

  

2-5

Conventional Procedure Call

  

a) Parameter passing in a local procedure call: the stack before the call to read

  b) The stack while the called procedure is active

  Client and Server Stubs

Principle of RPC between a client and server program.

Steps of a Remote Procedure Call

  1. Client procedure calls client stub in normal way

  2. Client stub builds message, calls local OS

  3. Client's OS sends message to remote OS

  4. Remote OS gives message to server stub

  5. Server stub unpacks parameters, calls server

  6. Server does work, returns result to the stub

  7. Server stub packs it in message, calls local OS

  8. Server's OS sends message to client's OS

  9. Client's OS gives message to client stub

  10. Stub unpacks result, returns to client
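The ten steps can be traced in a minimal C sketch of a remote `add(i, j)`. The sketch makes two assumptions. First, the message layout is fixed: a procedure number followed by two 32-bit integers, all in network byte order. Second, steps 3-4 and 8-9 are replaced by a direct function call. `PROC_ADD`, `transport`, `server_stub` and `client_add` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

#define PROC_ADD 1u      /* hypothetical procedure number */

/* The real server procedure. */
static int32_t add(int32_t i, int32_t j) { return i + j; }

/* Server stub (steps 5-7): unpack the parameters, call the server,
   and pack the result into a reply message. Returns the reply length. */
static size_t server_stub(const uint8_t *req, size_t len, uint8_t *reply) {
    uint32_t f[3], r;
    if (len != sizeof f)
        return 0;
    memcpy(f, req, sizeof f);
    if (ntohl(f[0]) != PROC_ADD)
        return 0;
    r = htonl((uint32_t) add((int32_t) ntohl(f[1]), (int32_t) ntohl(f[2])));
    memcpy(reply, &r, sizeof r);
    return sizeof r;
}

/* Stand-in for steps 3-4 and 8-9: a real system would hand the message
   to the local OS and send it across the network. */
static size_t transport(const uint8_t *req, size_t len, uint8_t *reply) {
    return server_stub(req, len, reply);
}

/* Client stub (steps 1-2 and 10): called like an ordinary local procedure. */
int32_t client_add(int32_t i, int32_t j) {
    uint32_t f[3] = { htonl(PROC_ADD), htonl((uint32_t) i), htonl((uint32_t) j) };
    uint8_t req[sizeof f], reply[4];
    uint32_t r;
    memcpy(req, f, sizeof f);
    if (transport(req, sizeof req, reply) != sizeof r)
        return 0;                      /* error reporting omitted in this sketch */
    memcpy(&r, reply, sizeof r);
    return (int32_t) ntohl(r);
}
```

The caller only sees `client_add(2, 3)` returning 5. All marshaling happens inside the two stubs.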

  Passing Value Parameters (1)

  

Steps involved in doing remote computation through RPC

2-8

  Passing Value Parameters (2)

  a) Original message on the Pentium

  b) The message after receipt on the SPARC

  

c) The message after being inverted. The little numbers in boxes indicate the address of each byte.

Parameter Specification and Stub Generation
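The byte-order effect from the figure can be reproduced in C, assuming a 32-bit integer parameter. `store_le32`, `load_be32` and `swap32` are illustrative helpers. Storing 5 in little-endian order and reading it back as big-endian gives 0x05000000. Swapping the bytes of that integer restores 5. Inverting every byte of a message would also scramble its strings, so only integer fields can be swapped. This is why the stubs must know the type of each parameter.

```c
#include <stdint.h>

/* Lay out a 32-bit integer the way a little-endian machine (e.g. the
   Pentium) stores it: least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t v) {
    for (int k = 0; k < 4; k++)
        out[k] = (uint8_t) (v >> (8 * k));
}

/* Interpret 4 bytes the way a big-endian machine (e.g. the SPARC) does:
   the lowest address holds the most significant byte. */
uint32_t load_be32(const uint8_t in[4]) {
    return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
           ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
}

/* Reverse the byte order of one integer field. */
uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```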

  Doors

The principle of using doors as IPC mechanism.

Asynchronous RPC (1)

  2-12

  traditional RPC

  Asynchronous RPC (2)

  

A client and server interacting through two asynchronous RPCs

2-13

  Writing a Client and a Server

  2-14

  Binding a Client to a Server

  Client-to-server binding in DCE.

  2-15

  Distributed Objects

Common organization of a remote object with client-side proxy.

  2-16

  Binding a Client to an Object

  a) Example with implicit binding using only global references

  b) Example with explicit binding using global and local references

  (a)
    Distr_object* obj_ref;       // Declare a systemwide object reference
    obj_ref = …;                 // Initialize the reference to a distributed object
    obj_ref->do_something();     // Implicitly bind and invoke a method

  (b)
    Distr_object obj_ref;        // Declare a systemwide object reference
    Local_object* obj_ptr;       // Declare a pointer to local objects
    obj_ref = …;                 // Initialize the reference to a distributed object
    obj_ptr = bind(obj_ref);     // Explicitly bind and obtain a pointer to the local proxy
    obj_ptr->do_something();     // Invoke a method on the local proxy

  Parameter Passing

The situation when passing an object by reference or by value.

  2-18

  The DCE Distributed-Object Model

  a) Distributed dynamic objects in DCE.

  2-19

Persistence and Synchronicity in Communication (1)

  

General organization of a communication system in which hosts are connected through a network.

2-20

  

Persistence and Synchronicity in Communication (2)

Persistent communication of letters back in the days of the Pony Express.

  Persistence and Synchronicity in Communication (3)

  a) Persistent asynchronous communication

  b) Persistent synchronous communication

  2-22.1

  Persistence and Synchronicity in Communication (4)

  c) Transient asynchronous communication

  d) Receipt-based transient synchronous communication

  2-22.2

  Persistence and Synchronicity in Communication (5)

  e) Delivery-based transient synchronous communication

  f) Response-based transient synchronous communication

  Berkeley Sockets (1)

  Socket primitives for TCP/IP.

  Primitive | Meaning
  Socket | Create a new communication endpoint
  Bind | Attach a local address to a socket
  Listen | Announce willingness to accept connections
  Accept | Block caller until a connection request arrives
  Connect | Actively attempt to establish a connection
  Send | Send some data over the connection
  Receive | Receive some data over the connection
  Close | Release the connection
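These primitives map directly onto POSIX calls. The sketch below runs the connection-oriented pattern in a single process over the loopback interface, an assumption made to keep it self-contained. `loopback_roundtrip` is a hypothetical helper, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg from a client socket to a server socket on 127.0.0.1 and
   returns the number of bytes the server received (stored, NUL-terminated,
   in out), or -1 on error. */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t outlen) {
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    ssize_t n = -1;
    int srv = -1, cli = -1, conn = -1;

    if (outlen == 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                /* let the OS pick a port */

    if ((srv = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;          /* Socket */
    if (bind(srv, (struct sockaddr *) &addr, sizeof addr) < 0) goto done; /* Bind */
    if (listen(srv, 1) < 0) goto done;                                    /* Listen */
    if (getsockname(srv, (struct sockaddr *) &addr, &alen) < 0) goto done;

    if ((cli = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    /* Connect succeeds once the kernel has queued the connection, even
       before the server calls accept, so one thread suffices here. */
    if (connect(cli, (struct sockaddr *) &addr, sizeof addr) < 0) goto done;
    if ((conn = accept(srv, NULL, NULL)) < 0) goto done;                 /* Accept */

    if (send(cli, msg, strlen(msg), 0) < 0) goto done;                   /* Send */
    close(cli);                                                           /* Close */
    cli = -1;
    n = recv(conn, out, outlen - 1, 0);                                   /* Receive */
    if (n >= 0)
        out[n] = '\0';
done:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return n;
}
```

In a real server, the Socket/Bind/Listen/Accept side and the Socket/Connect side would run in different processes, usually on different machines.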

  Berkeley Sockets (2)

Connection-oriented communication pattern using sockets.

The Message-Passing Interface (MPI)

Some of the most intuitive message-passing primitives of MPI.

  Primitive | Meaning
  MPI_bsend | Append outgoing message to a local send buffer
  MPI_send | Send a message and wait until copied to local or remote buffer
  MPI_ssend | Send a message and wait until receipt starts
  MPI_sendrecv | Send a message and wait for reply
  MPI_isend | Pass reference to outgoing message, and continue
  MPI_issend | Pass reference to outgoing message, and wait until receipt starts
  MPI_recv | Receive a message; block if there are none
  MPI_irecv | Check if there is an incoming message, but do not block

  Message-Queuing Model (1)

Four combinations for loosely-coupled communications using queues.

  2-26

  Message-Queuing Model (2)

  Primitive | Meaning
  Put | Append a message to a specified queue
  Get | Block until the specified queue is nonempty, and remove the first message
  Poll | Check a specified queue for messages, and remove the first. Never block.
  Notify | Install a handler to be called when a message is put into the specified queue.

  

Basic interface to a queue in a message-queuing system.
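Here is a minimal in-process sketch of this interface. It assumes POSIX threads, integer messages and a fixed capacity. `Queue`, `q_put`, `q_get`, `q_poll`, `q_notify` and the example handler are hypothetical names. A real message-queuing system would also make its queues persistent and reachable from other machines.

```c
#include <pthread.h>
#include <stdbool.h>

#define QCAP 16                       /* hypothetical fixed capacity */

typedef void (*Handler)(int msg);

typedef struct {
    int buf[QCAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    Handler handler;                  /* installed by q_notify, may be NULL */
} Queue;

void q_init(Queue *q) {
    q->head = q->count = 0;
    q->handler = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Put: append a message (returns false if the queue is full). */
bool q_put(Queue *q, int msg) {
    pthread_mutex_lock(&q->lock);
    if (q->count == QCAP) {
        pthread_mutex_unlock(&q->lock);
        return false;
    }
    q->buf[(q->head + q->count) % QCAP] = msg;
    q->count++;
    Handler h = q->handler;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    if (h)
        h(msg);                       /* Notify: call the installed handler */
    return true;
}

/* Get: block until the queue is nonempty, then remove the first message. */
int q_get(Queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    int msg = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return msg;
}

/* Poll: remove the first message if there is one; never blocks. */
bool q_poll(Queue *q, int *msg) {
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *msg = q->buf[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Notify: install a handler that is called whenever a message is put. */
void q_notify(Queue *q, Handler h) {
    pthread_mutex_lock(&q->lock);
    q->handler = h;
    pthread_mutex_unlock(&q->lock);
}

/* Example handler for the usage below: counts notifications. */
static int notifications = 0;
void count_notification(int msg) { (void) msg; notifications++; }
```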

  

General Architecture of a Message-Queuing System (1)

The relationship between queue-level addressing and network-level addressing.

  

General Architecture of a Message-Queuing System (2)

The general organization of a message-queuing system with routers.

  2-29

  Message Brokers

  

The general organization of a message broker in a message-queuing system.

2-30

  Example: IBM MQSeries

General organization of IBM's MQSeries message-queuing system.

  2-31

  Channels

Some attributes associated with message channel agents.

  Attribute | Description
  Transport type | Determines the transport protocol to be used
  FIFO delivery | Indicates that messages are to be delivered in the order they are sent
  Message length | Maximum length of a single message
  Setup retry count | Specifies maximum number of retries to start up the remote MCA
  Delivery retries | Maximum times MCA will try to put received message into queue

  Message Transfer (1)

  

The general organization of an MQSeries queuing network using routing tables and aliases.

  Message Transfer (2)

  Primitives available in an IBM MQSeries MQI.

  Primitive | Description
  MQopen | Open a (possibly remote) queue
  MQclose | Close a queue
  MQput | Put a message into an opened queue
  MQget | Get a message from a (local) queue

  Data Stream (1)

Setting up a stream between two processes across a network.

  Data Stream (2)

Setting up a stream directly between two devices.

  

2-35.2

Data Stream (3)

An example of multicasting a stream to several receivers.

Specifying QoS (1)

A flow specification.

  Characteristics of the Input:
  • Maximum data unit size (bytes)
  • Token bucket rate (bytes/sec)
  • Token bucket size (bytes)
  • Maximum transmission rate (bytes/sec)

  Service Required:
  • Loss sensitivity (bytes)
  • Loss interval (µsec)
  • Burst loss sensitivity (data units)
  • Minimum delay noticed (µsec)
  • Maximum delay variation (µsec)
  • Quality of guarantee
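The token bucket rate and size in the flow specification bound how much data a stream may inject. The sketch below checks a token bucket, assuming time is passed in explicitly in seconds rather than read from a clock. `TokenBucket`, `tb_init` and `tb_try_send` are hypothetical names.

```c
/* Tokens accumulate at `rate` bytes per second, up to `size` bytes; a data
   unit may be sent only if enough tokens are in the bucket. */
typedef struct {
    double rate;      /* token bucket rate (bytes/sec) */
    double size;      /* token bucket size (bytes) */
    double tokens;    /* tokens currently in the bucket */
    double last;      /* time of the last update (sec) */
} TokenBucket;

void tb_init(TokenBucket *tb, double rate, double size, double now) {
    tb->rate = rate;
    tb->size = size;
    tb->tokens = size;            /* start with a full bucket */
    tb->last = now;
}

/* Returns 1 and removes tokens if `bytes` may be sent at time `now`;
   returns 0 if the data unit has to wait (or be dropped). */
int tb_try_send(TokenBucket *tb, double bytes, double now) {
    tb->tokens += (now - tb->last) * tb->rate;
    if (tb->tokens > tb->size)
        tb->tokens = tb->size;    /* the bucket never holds more than its size */
    tb->last = now;
    if (bytes <= tb->tokens) {
        tb->tokens -= bytes;
        return 1;
    }
    return 0;
}
```

Take a rate of 1000 bytes/sec and a size of 1500 bytes. A burst of 1500 bytes passes at once, after which only about 1000 bytes per second get through. The bucket size therefore bounds the burst, and the rate bounds the long-term average.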

  Specifying QoS (2)

The principle of a token bucket algorithm.

Setting Up a Stream

  

The basic organization of RSVP for resource reservation in a distributed system.

  Synchronization Mechanisms (1)

The principle of explicit synchronization at the level of data units.

  Synchronization Mechanisms (2)

The principle of synchronization as supported by high-level interfaces.

  2-41

Processes

Thread Usage in Nondistributed Systems

Context switching as the result of IPC.

  Thread Implementation

Combining kernel-level lightweight processes and user-level threads.

Multithreaded Servers (1)

A multithreaded server organized in a dispatcher/worker model.
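Here is a compact sketch of the dispatcher/worker model with POSIX threads. Requests are plain integers, and handling one means squaring it. Both are stand-ins for real client requests. `run_server`, `worker` and the shared arrays are hypothetical, and because they are globals the sketch runs only once per process.

```c
#include <pthread.h>

#define NWORKERS 4
#define MAXREQ 64

/* State shared by the dispatcher and the workers. */
static int requests[MAXREQ];
static int results[MAXREQ];
static int next_in = 0, next_out = 0, shutting_down = 0;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;

/* Worker thread: block until the dispatcher hands over a request,
   handle it, and repeat until the server shuts down. */
static void *worker(void *arg) {
    (void) arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (next_out == next_in && !shutting_down)
            pthread_cond_wait(&work_ready, &queue_lock);
        if (next_out == next_in) {             /* no work left, shutting down */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        int slot = next_out++;
        pthread_mutex_unlock(&queue_lock);
        results[slot] = requests[slot] * requests[slot];   /* "handle" it */
    }
}

/* Dispatcher: start the worker pool, hand every incoming request to it,
   then shut the pool down. Returns the number of requests handled. */
int run_server(const int *incoming, int n) {
    pthread_t pool[NWORKERS];
    if (n > MAXREQ)
        n = MAXREQ;
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&pool[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&queue_lock);
        requests[next_in++] = incoming[i];
        pthread_cond_signal(&work_ready);      /* wake one idle worker */
        pthread_mutex_unlock(&queue_lock);
    }
    pthread_mutex_lock(&queue_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&queue_lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(pool[i], NULL);
    return n;
}
```

A worker blocks in `pthread_cond_wait` while idle, so the dispatcher never busy-waits. Each worker may itself make blocking calls without stalling the others.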

  Multithreaded Servers (2)

Three ways to construct a server.

  Model | Characteristics
  Threads | Parallelism, blocking system calls
  Single-threaded process | No parallelism, blocking system calls
  Finite-state machine | Parallelism, nonblocking system calls

  The X-Window System

  

The basic organization of the X Window System

  

Client-Side Software for Distribution Transparency

A possible approach to transparent replication of a remote object using a client-side solution.

  Servers: General Design Issues

  a) Client-to-server binding using a daemon as in DCE

  

b) Client-to-server binding using a superserver as in UNIX

  3.7

Object Adapter (1)

  Organization of an object server supporting different activation policies.

Object Adapter (2)

    /* Definitions needed by caller of adapter and adapter */
    #define TRUE 1
    #define MAX_DATA 65536

    /* Definition of general message format */
    struct message {
        long source;        /* sender's identity */
        long object_id;     /* identifier for the requested object */
        long method_id;     /* identifier for the requested method */
        unsigned size;      /* total bytes in list of parameters */
        char **data;        /* parameters as sequence of bytes */
    };

    /* General definition of operation to be called at skeleton of object */
    typedef void (*METHOD_CALL)(unsigned, char*, unsigned*, char**);

    long register_object(METHOD_CALL call);         /* register an object */
    void unregister_object(long object_id);         /* unregister an object */
    void invoke_adapter(struct message *request);   /* call the adapter */

  The header.h file used by the adapter and any program that calls an adapter.

  Object Adapter (3)

  

    typedef struct thread THREAD;   /* hidden definition of a thread */

    /* Create a thread by giving a pointer to a function that defines the actual */
    /* behavior of the thread, along with a thread identifier */
    THREAD *CREATE_THREAD(void (*body)(long tid), long thread_id);

    /* Calling get_msg blocks the thread until a message has been put into its */
    /* associated buffer. Putting a message in a thread's buffer is a nonblocking */
    /* operation. */
    void get_msg(unsigned *size, char **data);
    void put_msg(THREAD *receiver, unsigned size, char **data);

The thread.h file used by the adapter for using threads.

Object Adapter (4)

  The main part of an adapter that implements a thread-per-object policy.

  Reasons for Migrating Code

  

The principle of dynamically configuring a client to communicate to a server. The client first fetches the necessary software, and then invokes the server.

  Models for Code Migration

  Alternatives for code migration.

  Migration and Local Resources (1)

  Three types of process-to-resource binding:

  1. Binding by identifier (strongest binding): the process requires precisely the referenced resource.

  Example: a process uses a URL to refer to a specific Web site, or refers to an FTP server by its Internet address.

  2. Binding by value (weaker binding): only the value of the resource is needed. The execution of the process is not affected if another resource provides the same value.

  Example: a program that relies on standard libraries, such as those for programming in C or Java. Such libraries are normally available locally, although their location in the local file system may differ between sites.

  3. Binding by type (weakest binding): the process needs only a resource of a specific type.

  Example: references to local devices, such as monitors and printers.

  Migration and Local Resources (2)

Three types of resource-to-machine binding:

  1. Unattached resources: can easily be moved between machines (for example, data files associated with a program).

  2. Fastened resources: can be moved or copied.

  Example: local databases and complete Web sites.

  3. Fixed resources: bound to a specific machine or environment and cannot be moved. Fixed resources are often local devices.

  Example: local communication endpoints.

  Migration and Local Resources (3)

  Process-to-resource binding (rows) versus resource-to-machine binding (columns):

  Binding | Unattached | Fastened | Fixed
  By identifier | MV (or GR) | GR (or MV) | GR
  By value | CP (or MV, GR) | GR (or CP) | GR
  By type | RB (or GR, CP) | RB (or GR, CP) | RB (or GR)

  GR: Establish a global systemwide reference
  MV: Move the resource
  CP: Copy the value of the resource
  RB: Rebind process to locally available resource

  Actions to be taken with respect to the references to local resources when migrating code to another machine.

  Migration in Heterogeneous Systems

  The principle of maintaining a migration stack to support migration of an execution segment in a heterogeneous environment 3-15

  

Overview of Code Migration in D'Agents (1)

    proc factorial n {
        if {$n <= 1} { return 1 }                  ;# fac(1) = 1
        expr $n * [factorial [expr $n - 1]]        ;# fac(n) = n * fac(n - 1)
    }

    set number …          ;# tells which factorial to compute
    set machine …         ;# identify the target machine

    agent_submit $machine -procs factorial -vars number -script {factorial $number}

    agent_receive …       ;# receive the results (left unspecified for simplicity)

  A simple example of a Tcl agent in D'Agents submitting a script to a remote machine (adapted from [gray.r95]).

  An example of a Tcl agent in D'Agents migrating to different machines where it executes the UNIX who command (adapted from [gray.r95]).

  

Overview of Code Migration in D'Agents (2)

    proc all_users machines {
        set list ""                 ;# Create an initially empty list
        foreach m $machines {       ;# Consider all hosts in the set of given machines
            agent_jump $m           ;# Jump to each host
            set users [exec who]    ;# Execute the who command
            append list $users      ;# Append the results to the list
        }
        return $list                ;# Return the complete list when done
    }

    set machines …                  ;# Initialize the set of machines to jump to
    set this_machine                ;# Set to the host that starts the agent

    # Create a migrating agent by submitting the script to this machine, from where
    # it will jump to all the others in $machines.
    agent_submit $this_machine -procs all_users -vars machines -script { all_users $machines }

    agent_receive …                 ;# receive the results (left unspecified for simplicity)

  Implementation Issues (1)

The architecture of the D'Agents system.

  Implementation Issues (2)

  The parts comprising the state of an agent in D'Agents.

  Status | Description
  Global interpreter variables | Variables needed by the interpreter of an agent
  Global system variables | Return codes, error codes, error strings, etc.
  Global program variables | User-defined global variables in a program
  Procedure definitions | Definitions of scripts to be executed by an agent
  Stack of commands | Stack of commands currently being executed
  Stack of call frames | Stack of activation records, one for each running command

  Software Agents in Distributed Systems

Some important properties by which different types of agents can be distinguished.

  Property | Common to all agents? | Description
  Autonomous | Yes | Can act on its own
  Reactive | Yes | Responds timely to changes in its environment
  Proactive | Yes | Initiates actions that affect its environment
  Communicative | Yes | Can exchange information with users and other agents
  Continuous | No | Has a relatively long lifespan
  Mobile | No | Can migrate from one site to another
  Adaptive | No | Capable of learning

  Agent Technology

The general model of an agent platform (adapted from [fipa98-mgt]).

Agent Communication Languages (1)

  

Examples of different message types in the FIPA ACL [fipa98-acl], giving the purpose of a message, along with the description of the actual message content.

  Message purpose | Description | Message Content
  INFORM | Inform that a given proposition is true | Proposition
  QUERY-IF | Query whether a given proposition is true | Proposition
  QUERY-REF | Query for a given object | Expression
  CFP | Ask for a proposal | Proposal specifics
  PROPOSE | Provide a proposal | Proposal
  ACCEPT-PROPOSAL | Tell that a given proposal is accepted | Proposal ID
  REJECT-PROPOSAL | Tell that a given proposal is rejected | Proposal ID
  REQUEST | Request that an action be performed | Action specification
  SUBSCRIBE | Subscribe to an information source | Reference to source

  Agent Communication Languages (2)

  

A simple example of a FIPA ACL message sent between two agents using Prolog to express genealogy information.

  Field | Value
  Purpose | INFORM
  Sender | max@http://fanclub-beatrix.royalty-spotters.nl:7239
  Receiver | elke@iiop://royalty-watcher.uk:5623
  Language | Prolog
  Ontology | genealogy
  Content | female(beatrix),parent(beatrix,juliana,bernhard)

  Naming

  Name Spaces (1)

A general naming graph with a single root node.

  Name Spaces (2)

  The general organization of the UNIX file system implementation on a logical disk of contiguous disk blocks.

  Linking and Mounting (1)

The concept of a symbolic link explained in a naming graph.

  Linking and Mounting (2)

Mounting remote name spaces through a specific process protocol.

  Linking and Mounting (3)

  

Organization of the DEC Global Name Service

  Name Space Distribution (1)

  An example partitioning of the DNS name space, including Internet-accessible files, into three layers.

  Name Space Distribution (2)

  

A comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer, an administrational layer, and a managerial layer.

  Item | Global | Administrational | Managerial
  Geographical scale of network | Worldwide | Organization | Department
  Total number of nodes | Few | Many | Vast numbers
  Responsiveness to lookups | Seconds | Milliseconds | Immediate
  Update propagation | Lazy | Immediate | Immediate
  Number of replicas | Many | None or few | None
  Is client-side caching applied? | Yes | Yes | Sometimes

Implementation of Name Resolution (1)

The principle of iterative name resolution.

  Implementation of Name Resolution (2)

  The principle of recursive name resolution.
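The practical difference between the two schemes is who contacts whom. Below is a toy sketch over a hard-coded naming graph that mirrors <nl, vu, cs, ftp>. All name servers live in one address space and caching is omitted. `lookup`, `resolve_iterative` and `resolve_recursive` are hypothetical names, and `msgs` counts the requests the client itself sends.

```c
#include <string.h>

/* A toy naming graph: node i has a label and a parent; node 0 is the root.
   The tree is made up for illustration. */
typedef struct { const char *label; int parent; } Node;

static const Node nodes[] = {
    { "",    -1 },   /* 0: root */
    { "nl",   0 },   /* 1 */
    { "vu",   1 },   /* 2 */
    { "cs",   2 },   /* 3 */
    { "ftp",  3 },   /* 4 */
};
#define NNODES ((int) (sizeof nodes / sizeof nodes[0]))

/* What one name server does: look up a single label in its own node. */
static int lookup(int dir, const char *label) {
    for (int i = 0; i < NNODES; i++)
        if (nodes[i].parent == dir && strcmp(nodes[i].label, label) == 0)
            return i;
    return -1;
}

/* Iterative: the client contacts each name server in turn. */
int resolve_iterative(const char **path, int len, int *msgs) {
    int dir = 0;
    *msgs = 0;
    for (int k = 0; k < len && dir >= 0; k++) {
        (*msgs)++;                          /* one client request per label */
        dir = lookup(dir, path[k]);
    }
    return dir;
}

/* Recursive: each server resolves one label and passes the rest of the
   path on to the next server itself. */
static int resolve_at(int dir, const char **path, int len) {
    if (dir < 0 || len == 0)
        return dir;
    return resolve_at(lookup(dir, path[0]), path + 1, len - 1);
}

int resolve_recursive(const char **path, int len, int *msgs) {
    *msgs = 1;                              /* the client sends one request */
    return resolve_at(0, path, len);
}
```

Resolving <nl, vu, cs, ftp> costs the client four requests iteratively and one request recursively. In the recursive case the remaining traffic runs between the name servers instead.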

  

Implementation of Name Resolution (3)

  Server for node | Should resolve | Looks up | Passes to child | Receives and caches | Returns to requester
  cs | <ftp> | #<ftp> | -- | -- | #<ftp>
  vu | <cs,ftp> | #<cs> | <ftp> | #<ftp> | #<cs>, #<cs,ftp>
  nl | <vu,cs,ftp> | #<vu> | <cs,ftp> | #<cs>, #<cs,ftp> | #<vu>, #<vu,cs>, #<vu,cs,ftp>
  root | <nl,vu,cs,ftp> | #<nl> | <vu,cs,ftp> | #<vu>, #<vu,cs>, #<vu,cs,ftp> | #<nl>, #<nl,vu>, #<nl,vu,cs>, #<nl,vu,cs,ftp>

Recursive name resolution of <nl, vu, cs, ftp>. Name servers cache intermediate results for subsequent lookups.

Implementation of Name Resolution (4)

The comparison between recursive and iterative name resolution with respect to communication costs.

  The DNS Name Space

  

The most important types of resource records forming the contents of nodes in the DNS name space.

  Type of record | Associated entity | Description
  SOA | Zone | Holds information on the represented zone
  A | Host | Contains an IP address of the host this node represents
  MX | Domain | Refers to a mail server to handle mail addressed to this node
  SRV | Domain | Refers to a server handling a specific service
  NS | Zone | Refers to a name server that implements the represented zone
  CNAME | Node | Symbolic link with the primary name of the represented node
  PTR | Host | Contains the canonical name of a host
  HINFO | Host | Holds information on the host this node represents
  TXT | Any kind | Contains any entity-specific information considered useful

  DNS Implementation (1)

  An excerpt from the DNS database for the zone cs.vu.nl.

  DNS Implementation (2)

  Part of the description for the vu.nl domain which contains the cs.vu.nl domain.

  Name | Record type | Record value
  cs.vu.nl | NS | solo.cs.vu.nl
  solo.cs.vu.nl | A | 130.37.21.1

  The X.500 Name Space (1)

  

A simple example of an X.500 directory entry using X.500 naming conventions.

  Attribute | Abbr. | Value
  Country | C | NL
  Locality | L | Amsterdam
  Organization | O | Vrije Universiteit
  OrganizationalUnit | OU | Math. & Comp. Sc.
  CommonName | CN | Main server
  Mail_Servers | -- | 130.37.24.6, 192.31.231, 192.31.231.66
  FTP_Server | -- | 130.37.21.11
  WWW_Server | -- | 130.37.21.11

  The X.500 Name Space (2)

  Part of the directory information tree.

  The X.500 Name Space (3)

Two directory entries having Host_Name as RDN.

  Attribute | Entry 1 | Entry 2
  Country | NL | NL
  Locality | Amsterdam | Amsterdam
  Organization | Vrije Universiteit | Vrije Universiteit
  OrganizationalUnit | Math. & Comp. Sc. | Math. & Comp. Sc.
  CommonName | Main server | Main server
  Host_Name | star | zephyr
  Host_Address | 192.31.231.42 | 192.31.231.66

  Naming versus Locating Entities

a) Direct, single level mapping between names and addresses.

  Forwarding Pointers (1)

The principle of forwarding pointers using (proxy, skeleton) pairs.

  Forwarding Pointers (2)

  

Redirecting a forwarding pointer, by storing a shortcut in a proxy.

Home-Based Approaches

The principle of Mobile IP.

  Hierarchical Approaches (1)

Hierarchical organization of a location service into domains, each having an associated directory node.

  Hierarchical Approaches (2)

An example of storing information of an entity having two addresses in different leaf domains.

  

Hierarchical Approaches (3)

Looking up a location in a hierarchically organized location service.

Hierarchical Approaches (4)

  

a) An insert request is forwarded to the first node that knows about entity E.

  b) A chain of forwarding pointers to the leaf node is created.

  Pointer Caches (1)

  

Caching a reference to a directory node of the lowest-level domain in which an entity will reside most of the time.

  Pointer Caches (2)

  

A cache entry that needs to be invalidated because it returns a nonlocal address, while such an address is available.

  Scalability Issues

  The scalability issues related to uniformly placing subnodes of a partitioned root node across the network covered by a location service.

  The Problem of Unreferenced Objects

  An example of a graph representing objects containing references to each other.

  Reference Counting (1)

  

The problem of maintaining a proper reference count in the presence of unreliable communication.

  Reference Counting (2)

  … and incrementing the counter too late. b) A solution.

  Advanced Reference Counting (1)

  a) The initial assignment of weights in weighted reference counting

  b) Weight assignment when creating a new reference

  Advanced Reference Counting (2)

  c) Weight assignment when copying a reference

  Advanced Reference Counting (3)

  Creating an indirection when the partial weight of a reference has reached 1.

  Advanced Reference Counting (4)

  Creating and copying a remote reference in generation reference counting.

  Tracing in Groups (1)

  Initial marking of skeletons.

  Tracing in Groups (2)

After local propagation in each process.

  Tracing in Groups (3)

  Final marking.

  Synchronization

Clock Synchronization

  

When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

  Physical Clocks (1)

Computation of the mean solar day.

Physical Clocks (2)

  

TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.

Clock Synchronization Algorithms

The relation between clock time and UTC when clocks tick at different rates.

  Cristian's Algorithm

Getting the current time from a time server.
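Cristian's algorithm compensates for message delay. It adds half of the measured round-trip time to the time the server returned, optionally after subtracting the server's known handling time. Here is a one-function sketch. The parameter names are illustrative, and times are in seconds.

```c
/* t0: client time when the request was sent
   t1: client time when the reply arrived
   server_time: the time value carried in the reply
   handling: time the server spent handling the request (0 if unknown) */
double cristian_estimate(double t0, double t1, double server_time, double handling) {
    double one_way = (t1 - t0 - handling) / 2.0;   /* estimated propagation delay */
    return server_time + one_way;
}
```

The estimate assumes the request and the reply take equally long. If the client's clock is ahead of the estimate, the usual practice is to slow the clock down gradually rather than set it back.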

The Berkeley Algorithm

  

a) The time daemon asks all the other machines for their clock values

  b) The machines answer

  c) The time daemon tells everyone how to adjust their clock
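The daemon's computation in step c) can be sketched as averaging all clock values, its own included, and telling each machine the difference. This simplified version ignores message delays and the filtering of outlying clocks. `berkeley_adjustments` is a hypothetical name.

```c
/* clocks[0] is the time daemon's own clock, clocks[1..n-1] are the values
   the other machines reported; adjust[i] receives the amount machine i
   should add to its clock. */
void berkeley_adjustments(const double *clocks, int n, double *adjust) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += clocks[i];
    double avg = sum / n;
    for (int i = 0; i < n; i++)
        adjust[i] = avg - clocks[i];
}
```

As an illustration, suppose the clocks read 180, 170 and 205 minutes. The average is 185, so the machines are told to adjust by +5, +15 and -20.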

Lamport Timestamps

  

a) Three processes, each with its own clock. The clocks run at different rates.

  b) Lamport's algorithm corrects the clocks.

  Example: Totally-Ordered Multicasting

  Updating a replicated database and leaving it in an inconsistent state.
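Lamport's correction rule, on which totally-ordered multicasting also relies, fits in a few lines of C. `LamportClock` and the `lc_*` functions are hypothetical names. Ties between equal timestamps are broken by process identifier to obtain a total order.

```c
/* A Lamport logical clock: incremented before every event; on receipt the
   clock jumps past the sender's timestamp, so a receive is always
   timestamped later than the matching send. */
typedef struct { int time; } LamportClock;

int lc_local_event(LamportClock *c) { return ++c->time; }

/* Returns the timestamp to attach to an outgoing message. */
int lc_send(LamportClock *c) { return ++c->time; }

/* Sets the clock to max(local, received) + 1 and returns it. */
int lc_receive(LamportClock *c, int msg_time) {
    if (msg_time > c->time)
        c->time = msg_time;
    return ++c->time;
}

/* Total order for totally-ordered multicasting: compare timestamps,
   breaking ties with the process identifier. */
int lc_ordered_before(int ta, int pid_a, int tb, int pid_b) {
    return ta < tb || (ta == tb && pid_a < pid_b);
}
```

Suppose every replica queues incoming updates in `lc_ordered_before` order and applies them only once all replicas have acknowledged them. Then all replicas apply the updates in the same order, which avoids the inconsistency shown in the figure.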

Global State (1)

  a) A consistent cut

  

b) An inconsistent cut

  Global State (2)

  snapshot

  Global State (3)

  b) Process Q receives a marker for the first time and records its local state

  c) Q records all incoming messages

  

d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel

  The Bully Algorithm (1)

  The bully election algorithm

  • Process 4 holds an election
  • Process 5 and 6 respond, telling 4 to stop
  • Now 5 and 6 each hold an election

The Bully Algorithm (2)

  d) Process 6 tells 5 to stop

  

e) Process 6 wins and tells everyone
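The election above can be simulated recursively. The sketch assumes processes 0 to n-1 and an `alive` array that says which processes respond. Timeouts and real messages are left out, and `bully_election` is a hypothetical name.

```c
#include <stdbool.h>

/* A process holding an election sends ELECTION to every higher-numbered
   process. If none answers, it wins; otherwise each process that answered
   ("stop, I take over") holds its own election. Returns the coordinator. */
int bully_election(const bool *alive, int n, int initiator) {
    int winner = initiator;
    bool answered = false;
    for (int p = initiator + 1; p < n; p++) {
        if (alive[p]) {
            answered = true;
            int w = bully_election(alive, n, p);
            if (w > winner)
                winner = w;
        }
    }
    return answered ? winner : initiator;
}
```

As an illustrative setting, take processes 0 to 7 with process 7 down, and let process 4 start the election. The function returns 6, matching steps d) and e). In general the highest-numbered live process always wins.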

  A Ring Algorithm

Election algorithm using a ring.

  

Mutual Exclusion:

A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.

  b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.

  c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
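The coordinator's bookkeeping can be sketched as a current holder plus a FIFO queue of deferred requests. Message passing is replaced by function calls and return values. `Coordinator`, `coord_init`, `coord_request` and `coord_release` are hypothetical names.

```c
#include <stdbool.h>

#define MAXQ 16

typedef struct {
    int holder;          /* process in the critical region, or -1 */
    int queue[MAXQ];     /* deferred requests, FIFO */
    int head, count;
} Coordinator;

void coord_init(Coordinator *c) {
    c->holder = -1;
    c->head = 0;
    c->count = 0;
}

/* REQUEST from process p: returns true if OK is sent immediately;
   otherwise p is queued and receives no reply yet (it blocks). */
bool coord_request(Coordinator *c, int p) {
    if (c->holder < 0) {
        c->holder = p;
        return true;
    }
    if (c->count < MAXQ) {
        c->queue[(c->head + c->count) % MAXQ] = p;
        c->count++;
    }
    return false;
}

/* RELEASE from the holder: returns the process that now receives OK,
   or -1 if nobody is waiting. */
int coord_release(Coordinator *c) {
    if (c->count == 0) {
        c->holder = -1;
        return -1;
    }
    c->holder = c->queue[c->head];
    c->head = (c->head + 1) % MAXQ;
    c->count--;
    return c->holder;
}
```

Entering and leaving costs three messages: request, grant and release. A crash of the coordinator blocks everyone.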

A Distributed Algorithm

  

a) Two processes want to enter the same critical region at the same moment.

  b) Process 0 has the lowest timestamp, so it wins.

  

c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
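The rule each process applies to an incoming request can be written as a single decision function. The states `RELEASED`, `WANTED` and `HELD` are assumed names for "not interested", "waiting to enter" and "inside the critical region". Ties on equal timestamps are broken by process identifier.

```c
#include <stdbool.h>

typedef enum { RELEASED, WANTED, HELD } State;

/* Called when REQUEST(ts, from) arrives at process my_id.
   my_ts is the timestamp of this process's own pending request (only
   meaningful in state WANTED). Returns true if OK is sent right away,
   false if the request is queued and answered after leaving the region. */
bool reply_ok_now(State me, int my_ts, int my_id, int ts, int from) {
    if (me == RELEASED)
        return true;                       /* not interested: send OK */
    if (me == HELD)
        return false;                      /* inside the region: defer */
    /* Both want to enter: the lowest timestamp wins. */
    return ts < my_ts || (ts == my_ts && from < my_id);
}
```

For example, take timestamps 8 for process 0 and 12 for process 2. Process 2 replies OK to process 0 immediately, while process 0 defers its reply to process 2 until it is done.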

  A Token Ring Algorithm

a) An unordered group of processes on a network.

  Comparison

  Algorithm | Messages per entry/exit | Delay before entry (in message times) | Problems
  Centralized | 3 | 2 | Coordinator crash
  Distributed | 2(n - 1) | 2(n - 1) | Crash of any process
  Token ring | 1 to ∞ | 0 to n - 1 | Lost token, process crash

  A comparison of three mutual exclusion algorithms.

The Transaction Model (1)

Updating a master tape is fault tolerant.

  The Transaction Model (2)

  Examples of primitives for transactions.

  Primitive | Description
  BEGIN_TRANSACTION | Mark the start of a transaction
  END_TRANSACTION | Terminate the transaction and try to commit
  ABORT_TRANSACTION | Kill the transaction and restore the old values
  READ | Read data from a file, a table, or otherwise
  WRITE | Write data to a file, a table, or otherwise

  The Transaction Model (3)

  a) Transaction to reserve three flights commits

  b) Transaction aborts when third flight is unavailable

  (a)
    BEGIN_TRANSACTION
        reserve WP -> JFK;
        reserve JFK -> Nairobi;
        reserve Nairobi -> Malindi;
    END_TRANSACTION

  (b)
    BEGIN_TRANSACTION
        reserve WP -> JFK;
        reserve JFK -> Nairobi;
        reserve Nairobi -> Malindi full =>
    ABORT_TRANSACTION

Distributed Transactions

  a) A nested transaction

  b) A distributed transaction

  Private Workspace

  a) The file index and disk blocks for a three-block file

  

b) The situation after a transaction has modified block 0 and appended block 3

  c) After committing

  Writeahead Log

  (a)
    x = 0;
    y = 0;
    BEGIN_TRANSACTION;
        x = x + 1;
        y = y + 2;
        x = y * y;
    END_TRANSACTION;

  (b) Log: [x = 0/1]

  (c) Log: [x = 0/1] [y = 0/2]

  (d) Log: [x = 0/1] [y = 0/2] [x = 1/4]

  b) to d) The log before each statement is executed

  Concurrency Control (1)

General organization of managers for handling transactions.

Concurrency Control (2)

  General organization of managers for handling distributed transactions.

Serializability

  (a) BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION
  (b) BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION
  (c) BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION

  (d)
  Schedule 1 | x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3; | Legal
  Schedule 2 | x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3; | Legal
  Schedule 3 | x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3; | Illegal

  a) to c) Three transactions T1, T2, and T3

  d) Possible schedules
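The legality column can be checked mechanically for these three schedules. Every serial order ends with one transaction's `x = 0; x = x + k`, so any serial execution leaves x at 1, 2 or 3. A schedule that ends with any other value cannot be equivalent to a serial execution. This final-value test is only a necessary condition, though it is enough for this example. `Op`, `run_schedule` and `matches_serial_result` are hypothetical names.

```c
#include <stdbool.h>

/* One operation of a schedule: either "x = 0" or "x = x + k". */
typedef struct { bool reset; int add; } Op;

/* Executes a schedule and returns the final value of x. */
int run_schedule(const Op *ops, int n) {
    int x = 0;
    for (int i = 0; i < n; i++)
        x = ops[i].reset ? 0 : x + ops[i].add;
    return x;
}

/* A serial execution of T1, T2, T3 in any order leaves x at 1, 2, or 3. */
bool matches_serial_result(int x) {
    return x == 1 || x == 2 || x == 3;
}
```

Schedules 1 and 2 both leave x = 3. Schedule 3 leaves x = 5, which no serial order can produce, so it is illegal.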

  Two-Phase Locking (1)

  Two-phase locking.

  Two-Phase Locking (2)

  Strict two-phase locking.

  Synchronization

Clock Synchronization

  

When each machine has its own clock, an event that

occurred after another event may nevertheless be assigned an earlier time.

  Physical Clocks (1)

Computation of the mean solar day.

Physical Clocks (2)

TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.

Clock Synchronization Algorithms

The relation between clock time and UTC when clocks tick at different rates.
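How often clocks must be resynchronized follows from their drift rate. Writing C for a clock's reading, t for UTC, rho for the maximum drift rate and delta for the largest tolerated skew (symbols introduced here, not in the slides), and assuming two clocks were equal at the last synchronization and drift in opposite directions:

```latex
1 - \rho \le \frac{dC}{dt} \le 1 + \rho
\qquad\Longrightarrow\qquad
|C_p(t) - C_q(t)| \le 2\rho\,\Delta t
\qquad\Longrightarrow\qquad
\Delta t_{\text{resync}} \le \frac{\delta}{2\rho}
```

For example, with rho = 10^-5 and delta = 1 ms, the clocks must be resynchronized at least every 10^-3 / (2 x 10^-5) = 50 seconds.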

  Cristian's Algorithm

Getting the current time from a time server.
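Cristian's estimate adds half the measured round-trip time to the time value returned by the server. The function below is a sketch assuming symmetric network delay; the parameter names are hypothetical, and `handling_time` stands for the server's processing time, subtracted only if it is known.

```python
def cristian_estimate(t0, t1, server_time, handling_time=0.0):
    """Estimate the current time from one request/reply exchange.

    t0, t1: client clock when the request left and when the reply arrived.
    server_time: the time value the server put in its reply.
    handling_time: the server's processing time, if known (otherwise 0).
    Assumes the request and the reply spend equally long in the network.
    """
    one_way = (t1 - t0 - handling_time) / 2
    return server_time + one_way
```

If the estimate is behind the client's current clock, the client should slow its clock down gradually rather than set it backwards, so that local timestamps never decrease.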

The Berkeley Algorithm

a) The time daemon asks all the other machines for their clock values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
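Steps a) to c) can be sketched as a single function: the daemon collects offsets, averages them (its own offset of 0 included), and sends each machine the correction that moves it to that average. The function name is hypothetical and network delay is ignored; a fuller version would correct each reported value for round-trip time.

```python
def berkeley_adjustments(daemon_time, reported_times):
    """Corrections the time daemon sends out: the daemon's own first, then one
    per machine in the order reported. Every clock is moved to the average."""
    offsets = [0] + [t - daemon_time for t in reported_times]
    average = sum(offsets) / len(offsets)
    return [average - off for off in offsets]
```

With illustrative clocks at 3:00 (daemon), 2:50 and 3:25, expressed in minutes as 180, 170 and 205, `berkeley_adjustments(180, [170, 205])` gives `[5.0, 15.0, -20.0]`, so all three clocks end up at 3:05. The agreed time is an average, not UTC.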

Lamport Timestamps

a) Three processes, each with its own clock. The clocks run at different rates.
b) Lamport's algorithm corrects the clocks.

Example: Totally-Ordered Multicasting

Updating a replicated database and leaving it in an inconsistent state.
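Lamport's correction rule, and the (timestamp, process id) ordering that totally-ordered multicasting relies on, can be sketched with a small class. `LamportClock` and its methods are hypothetical names; this version advances by 1 per event, whereas the clocks in the figure tick at different rates.

```python
class LamportClock:
    """Logical clock for one process (sketch; advances by 1 per event)."""

    def __init__(self, pid, time=0):
        self.pid = pid
        self.time = time

    def tick(self):
        """Local event: advance the clock."""
        self.time += 1
        return self.time

    def send(self):
        """Stamp an outgoing message with (timestamp, sender id)."""
        return (self.tick(), self.pid)

    def receive(self, stamp):
        """Move past the sender's timestamp so the receive is later than the send."""
        msg_time, _sender = stamp
        self.time = max(self.time, msg_time) + 1
        return self.time
```

For instance, if a process at time 60 sends to a process whose clock reads 55, the message is stamped 61 and the receiver jumps to 62. For totally-ordered multicasting, every replica queues updates sorted by their `(timestamp, sender id)` tuples, so ties on the timestamp are broken by process id; the acknowledgement step needed before delivery is not shown here.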

Global State (1)

a) A consistent cut
b) An inconsistent cut

Global State (2)

  snapshot

Global State (3)

b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel
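Steps b) to d) describe how one process handles markers during a distributed snapshot. The sketch below models a single process Q; the class `SnapshotProcess` and the channel names are hypothetical, and forwarding markers on outgoing channels is indicated only by a comment.

```python
class SnapshotProcess:
    """Marker handling for one process in a distributed snapshot (hypothetical model)."""

    def __init__(self, state, incoming):
        self.state = state
        self.incoming = list(incoming)  # names of incoming channels
        self.recorded_state = None
        self.recording = {}             # channel -> messages captured for the snapshot
        self.done = set()               # channels whose marker has arrived

    def on_marker(self, channel):
        if self.recorded_state is None:
            # b) first marker: record the local state; that channel is recorded as empty
            self.recorded_state = self.state
            self.recording = {c: [] for c in self.incoming}
            # a real process would now send a marker on each outgoing channel
        self.done.add(channel)          # d) stop recording this channel

    def on_message(self, channel, msg):
        if self.recorded_state is not None and channel not in self.done:
            self.recording[channel].append(msg)  # c) message was in transit at the cut
        # normal processing of msg would continue here

    def finished(self):
        return self.recorded_state is not None and self.done == set(self.incoming)
```

A message that arrives on a channel after the first marker but before that channel's own marker becomes part of the recorded channel state; messages on channels whose marker has already arrived are not recorded.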

The Bully Algorithm (1)

The bully election algorithm

a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election

The Bully Algorithm (2)

d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
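Steps a) to e) can be simulated in a few lines. In this sketch (function name and message tuples are hypothetical) an ELECTION message goes to every higher id, only live processes answer OK and take over, and the process that hears no answer becomes coordinator and announces it.

```python
def bully_election(all_ids, alive, initiator):
    """Simulate one bully election. `alive` holds the ids that still respond.
    Returns (coordinator, messages); each message is (kind, sender, receiver)."""
    messages = []
    pending = [initiator]   # processes that must hold an election
    started = set()
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in started:
            continue
        started.add(p)
        answered = False
        for q in sorted(i for i in all_ids if i > p):
            messages.append(("ELECTION", p, q))
            if q in alive:                   # a crashed process never answers
                messages.append(("OK", q, p))
                pending.append(q)            # q takes over and holds its own election
                answered = True
        if not answered:
            coordinator = p                  # nobody higher is up: p wins
    for q in sorted(alive):
        if q != coordinator:
            messages.append(("COORDINATOR", coordinator, q))
    return coordinator, messages
```

With processes 0 to 7, process 7 crashed, and process 4 starting the election, `bully_election(range(8), {0, 1, 2, 3, 4, 5, 6}, 4)` returns coordinator 6, with OK replies from 5 and 6 to 4 and from 6 to 5, as in the figure.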

  A Ring Algorithm

Election algorithm using a ring.
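In the ring algorithm the ELECTION message travels around the ring, skipping crashed successors and collecting the ids of live processes; when it returns to the initiator, the highest collected id is chosen. The sketch below assumes the ring is given as a list in ring order; the function name is hypothetical.

```python
def ring_election(ring, alive, initiator):
    """Circulate an ELECTION message once around `ring` (ids in ring order),
    skipping crashed processes. Returns (coordinator, ids collected)."""
    n = len(ring)
    members = [initiator]
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:   # a crashed successor is skipped
            members.append(ring[i])
        i = (i + 1) % n
    return max(members), members
```

For a ring 0 to 7 with process 7 down and process 2 starting, the message collects `[2, 3, 4, 5, 6, 0, 1]` and 6 becomes coordinator; the initiator would then circulate a COORDINATOR message carrying that result.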

  

Mutual Exclusion: A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
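The coordinator's behaviour in a) to c) can be sketched as a small class that grants the region to one process and queues the rest in FIFO order. `Coordinator`, `request` and `release` are hypothetical names; a return value of `False` stands for "no reply yet, the requester stays blocked".

```python
from collections import deque


class Coordinator:
    """Grants a critical region to one process at a time (hypothetical model)."""

    def __init__(self):
        self.holder = None
        self.queue = deque()     # blocked requesters, in arrival order

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return True          # reply OK: permission granted
        self.queue.append(pid)   # no reply: the requester blocks
        return False

    def release(self, pid):
        if pid != self.holder:
            raise ValueError("only the current holder may release")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder       # process that now receives OK, if any
```

Each entry and exit costs three messages (request, grant, release), which matches the centralized row of the comparison table.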

A Distributed Algorithm

a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter.
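The decision each process makes on receiving a request in the distributed algorithm fits in one function. In this sketch requests are `(timestamp, pid)` tuples, so comparing tuples gives a total order with ties broken by process id; the state names and the function name are hypothetical.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


def on_request(my_state, my_request, incoming):
    """How a process answers an incoming (timestamp, pid) request.
    Returns "OK" to reply now, or "DEFER" to queue the reply until it exits."""
    if my_state == HELD:
        return "DEFER"           # I am inside the critical region
    if my_state == WANTED and my_request < incoming:
        return "DEFER"           # my own request is older, so I go first
    return "OK"                  # not interested, or the other request is older
```

With hypothetical timestamps 8 for process 0 and 12 for process 2, process 0 defers its reply to 2, while process 2 answers OK to 0; once 0 leaves the region it sends its deferred OK and 2 may enter, as in b) and c).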

A Token Ring Algorithm

a) An unordered group of processes on a network.
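Once the unordered processes are arranged into a logical ring in software, a single token circulates and only its holder may enter the critical region. The toy simulation below (function name and ring order hypothetical) assumes no crashes and no lost token, and returns the order in which waiting processes get in.

```python
def token_ring_order(ring, wants, start=0):
    """Circulate the token over the logical ring `ring` (process ids in ring
    order) starting at position `start`. A process enters the critical region
    only while it holds the token. Returns the order of entries."""
    if not set(wants) <= set(ring):
        raise ValueError("every requester must be on the ring")
    pending = set(wants)
    order = []
    i = start
    while pending:
        holder = ring[i]
        if holder in pending:
            order.append(holder)   # enter, do the work, leave
            pending.remove(holder)
        i = (i + 1) % len(ring)    # pass the token to the successor
    return order
```

The ring order need not match the process ids: for the ring `[0, 2, 4, 9, 7, 1, 6, 5, 8, 3]` with processes 7, 2 and 5 waiting, they enter in the order 2, 7, 5.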

Comparison

Algorithm     Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized   3                         2                                       Coordinator crash
Distributed   2(n – 1)                  2(n – 1)                                Crash of any process
Token ring    1 to ∞                    0 to n – 1                              Lost token, process crash

A comparison of three mutual exclusion algorithms.

Pessimistic Timestamp Ordering

Concurrency control using timestamps.
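With timestamp ordering, each data item remembers the largest timestamp that has read it and the timestamp of its last write, and an operation that arrives "too late" causes its transaction to abort. The sketch below shows only this basic rule; tentative writes and commit handling are omitted, writes take effect immediately, and `Item`, `read`, `write` and `TimestampOrderingError` are hypothetical names.

```python
class TimestampOrderingError(RuntimeError):
    """The operation arrived too late; its transaction must abort and restart."""


class Item:
    def __init__(self, value):
        self.value = value
        self.ts_rd = 0   # largest timestamp of any transaction that read this item
        self.ts_wr = 0   # timestamp of the most recent write


def read(item, ts):
    if ts < item.ts_wr:
        raise TimestampOrderingError("a younger transaction already wrote this item")
    item.ts_rd = max(item.ts_rd, ts)
    return item.value


def write(item, ts, value):
    if ts < item.ts_rd or ts < item.ts_wr:
        raise TimestampOrderingError("a younger transaction already read or wrote it")
    item.ts_wr = ts
    item.value = value
```

Because conflicts are resolved by aborting rather than waiting, this scheme cannot deadlock, at the price of restarts when timestamps arrive out of order.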

  Consistency and Replication

  Object Replication (1)