Load Balancing in Distributed Systems

LOAD BALANCING
DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015

Hermann Härtig

ISSUES
■ starting points
■ independent Unix processes and
■ block synchronous execution

■ who does it
■ load migration mechanism (MOSIX)
■ management algorithms (MOSIX)
■ information dissemination
■ decision algorithms
Hermann Härtig, TU Dresden, Distributed OS, Load Balancing


EXTREME STARTING POINTS
■ independent OS processes

■ block synchronous execution (HPC)
■ sequence: compute - communicate
■ all processes wait for all other processes
■ often: message passing, for example via the Message Passing Interface (MPI)


SINGLE PROGRAM MULTIPLE DATA
■ all processes execute same program
■ while (true) { work; exchange data (barrier) }

■ common in High Performance Computing:
Message Passing Interface (MPI) library

DIVIDE AND CONQUER

[Diagram: a problem is split into parts 1-4; parts 1 and 2 run on CPUs #1 and #2 of node 1, parts 3 and 4 on CPUs #1 and #2 of node 2; the partial results 1-4 are combined into the final result]

MPI BRIEF OVERVIEW
■ Library for message-oriented parallel
programming


■ Programming model:
■ Multiple instances of same program
■ Independent calculation
■ Communication, synchronization

MPI STARTUP & TEARDOWN
■ MPI program is started on all processors
■ MPI_Init(), MPI_Finalize()
■ Communicators (e.g., MPI_COMM_WORLD)
■ MPI_Comm_size()
■ MPI_Comm_rank(): “Rank” of process within
this set

■ Typed messages

■ Dynamically create and spread processes
using MPI_Comm_spawn() (since MPI-2)


MPI EXECUTION
■ Communication
■ Point-to-point
■ Collectives

■ Synchronization
■ Test

int MPI_Send(void *buf, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Status *status)
int MPI_Bcast(void *buffer, int count, MPI_Datatype type, int root, MPI_Comm comm)
int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype type, MPI_Op op, int root, MPI_Comm comm)
int MPI_Barrier(MPI_Comm comm)
int MPI_Wait(MPI_Request *request, MPI_Status *status)
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)

■ Wait
■ Barrier

BLOCK AND SYNC
                            blocking call                  non-blocking call

synchronous                 returns when message           returns immediately; a following
communication               has been delivered             test/wait checks for delivery

asynchronous                returns when send buffer       returns immediately; a following
communication               can be reused                  test/wait checks the send buffer

EXAMPLE
int rank, total;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &total);
MPI_Bcast(...);
/* work on own part, determined by rank */
if (rank == 0) {
    for (int rr = 1; rr < total; ++rr)
        MPI_Recv(...);
    /* generate final result */
} else {
    MPI_Send(...);
}
MPI_Finalize();

AMDAHL’S LAW
interpretation for parallel systems:

■ P: section that can be parallelized
■ 1-P: serial section
■ N: number of CPUs

speedup S(N) = 1 / ((1 - P) + P/N)

AMDAHL’S LAW
Serial section: communicate, longest sequential section

parallel, serial, possible speedup:

■ 1 ms, 100 μs: 1/0.1 → 10
■ 1 ms, 1 μs: 1/0.001 → 1000
■ 10 μs, 1 μs: 0.01/0.001 → 10
■ ...

WEAK VS. STRONG SCALING

Strong:

■ accelerate same problem size
Weak:

■ extend to larger problem size

WHO DOES IT

■ application
■ run-time library
■ operating system


SCHEDULER: GLOBAL RUN QUEUE

[Diagram: CPU 1, CPU 2, …, CPU N all served from one queue holding all ready processes]

obvious first approach: a global run queue

SCHEDULER: GLOBAL RUN QUEUE
■ … does not scale
■ shared memory only
■ contended critical section
■ cache affinity
■ …

⇒ separate run queues with explicit movement of processes


OS/HW & APPLICATION

■ Operating System / Hardware:

“All” participating CPUs: active / inactive

■ Partitioning (HW)
■ Gang Scheduling (OS)
■ Within Gang/Partition:

Applications balance !!!


HW PARTITIONS & ENTRY QUEUE

[Diagram: applications wait in a request queue and are each assigned their own hardware partition]

HW PARTITIONS
■ optimizes usage of network
■ takes OS off critical path
■ best for strong scaling
■ burdens application/library with balancing
■ potentially wastes resources
■ current state of the art in High
Performance Computing (HPC)

TOWARDS OS/RT BALANCING
[Diagram: per-CPU work items over time, meeting at a barrier]

EXECUTION TIME JITTER
[Diagram: work items of varying length; all CPUs idle at the barrier until the slowest finishes]


SOURCES FOR EXECUTION JITTER

■ Hardware ???
■ Application
■ Operating system “noise”


OPERATING SYSTEM “NOISE”
Use common sense to avoid:

■ OS usually not directly on the critical path,
BUT OS controls interference via interrupts, caches,
network, memory bus (RTS techniques)

■ avoid or encapsulate side activities
■ small critical sections (if any)
■ partition networks to isolate traffic of different
applications (HW: Blue Gene)

■ do not run Python scripts or printer daemons in parallel

SPLITTING BIG JOBS
[Diagram: large work items split into smaller ones, packed more evenly before the barrier]

overdecomposition & “oversubscription”

SMALL JOBS (NO DEPS)
[Diagram: many small independent work items running in parallel up to the barrier]

Execute small jobs in parallel (if possible)


BALANCING AT LIBRARY LEVEL
Programming Model

■ many (small) decoupled work items
■ overdecompose:
create more work items than active units

■ run some balancing algorithm

Example: Charm++

BALANCING AT SYSTEM LEVEL
■ create many more processes
■ use OS information on run-time and
system state to balance load

■ examples:
■ create more MPI processes than nodes
■ run multiple applications

CAVEATS
added overhead

■ additional communication between
smaller work items (memory & cycles)

■ more context switches
■ OS on critical path (for example communication)

BALANCING ALGORITHMS
required:

■ mechanism for migrating load
■ information gathering
■ decisions
MOSIX system as an example
→ Barak’s slides now

FORK IN MOSIX

[Diagram: message sequence between Deputy and Remote: the Remote sends a fork() syscall request to the Deputy; the Deputy fork()s, a new link is established between Deputy(child) and Remote(child), the Remote fork()s, and the reply from fork() is delivered on both sides]

PROCESS MIGRATION IN MOSIX

[Diagram: a process sends a migration request to the migdaemon on the target node, which fork()s the new Remote; the process sends its state, memory maps, and dirty pages (acknowledged), the home part transitions to a Deputy, the migration is finalized (acknowledged), and the migration is completed]

SOME PRACTICAL PROBLEMS
■ flooding

all processes jump to one newly empty node

=> decide immediately before migration commitment
(extra communication, piggybacked)

■ ping pong

if thresholds are very close, processes are
moved back and forth

=> report a slightly higher load than the real one

PING PONG
Scenario:
■ compare load on nodes 1 and 2
■ node 1 moves a process to node 2

[Diagram: one process bouncing between node 1 and node 2]

Solutions:
■ add one + a little bit to the load
■ average over time

Averaging solves the short-peaks problem as well
(short cron processes)


LESSONS

■ execution time jitter matters (Amdahl)
■ approaches: partition vs. balance
■ dynamic balance components:
migration of load, information, decision
(who, when, where)

