COMPUTER
COMMUNICATION
Sunggu Lee
EE Dept., POSTECH
Apr. 22, 2009
Computer Communication
Example
Send picture image and message to friend
[Figure: sender (Microsoft Outlook) → system software → communication channel → system software → receiver (Netscape Messenger); the message “Hello!” appears at each stage]
Packetization of Data
For transmission of a stream of data bits
(message), the message is typically
partitioned into “packets”
A packet consists of (at the very least)
Packet
header (destination, routing info, etc.)
Data payload (the bits of the message)
Check bits (redundant bits used to check for
errors in the received packet)
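The three packet fields listed above can be sketched in a few lines of Python. This is a minimal illustration, not a real protocol: the header layout (4-byte destination ID, 2-byte sequence number, 2-byte payload length) and the function names `build_packet` and `parse_packet` are assumptions made for this example, and a CRC-32 from the standard `zlib` module stands in for the check bits.

```python
import struct
import zlib

# Hypothetical header layout (an assumption for illustration only):
# 4-byte destination ID, 2-byte sequence number, 2-byte payload length.
HEADER = struct.Struct("!IHH")
CHECK = struct.Struct("!I")  # 32-bit CRC used as the check bits


def build_packet(dest: int, seq: int, payload: bytes) -> bytes:
    """Concatenate header, data payload, and check bits."""
    header = HEADER.pack(dest, seq, len(payload))
    check = CHECK.pack(zlib.crc32(header + payload))
    return header + payload + check


def parse_packet(packet: bytes):
    """Split a packet into its fields and verify the check bits."""
    dest, seq, length = HEADER.unpack_from(packet, 0)
    body_end = HEADER.size + length
    payload = packet[HEADER.size:body_end]
    (received_crc,) = CHECK.unpack_from(packet, body_end)
    ok = zlib.crc32(packet[:body_end]) == received_crc
    return dest, seq, payload, ok
```

Flipping any single bit of the payload makes `parse_packet` return `ok == False`, which is the condition under which a receiver would request retransmission.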
Communication Protocols
For successful transmission/receipt of a
packet, the transmitter and receiver must
agree on a “communication protocol”
Set of rules on how the packet is interpreted
How to sample the bits of the packet
Signaling method
Synchronization of the transmitter/receiver
How to determine which parts of the packet are the
packet header (destination info, etc.), data payload,
check bits, etc.
How to interpret the bits of the data payload
Integer, floating-point, character string, JPEG picture, etc.
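The last point, that the same payload bits mean different things under different interpretations, can be illustrated with the standard `struct` module. The four payload bytes below are arbitrary values chosen for this sketch.

```python
import struct

# Four arbitrary payload bytes (chosen only for illustration).
payload = bytes([0x42, 0x48, 0x00, 0x00])

# The same bits read under three different agreed interpretations.
as_int = struct.unpack("!I", payload)[0]    # 32-bit unsigned integer, big-endian
as_float = struct.unpack("!f", payload)[0]  # IEEE 754 single-precision float
as_text = payload[:2].decode("ascii")       # first two bytes as ASCII characters

print(as_int, as_float, as_text)  # prints: 1112014848 50.0 BH
```

Unless the transmitter and receiver agree on which interpretation applies, the receiver cannot tell whether it was sent a large integer, the number 50.0, or the start of a character string.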
Computer Communication Models and
Communication Protocol Suites
Most commonly used reference base
communication model is the Open
Systems Interconnection (OSI) model
Standardized by the International Organization
for Standardization (ISO)
Most widely used protocol suite, usually
described in terms of the OSI layers, is
the TCP/IP protocol suite (or stack)
TCP = Transmission Control Protocol
IP = Internet Protocol
Communication Protocols
[Figure: protocol layers L7 (application) down to L1 (physical) [Forouzan 2003]]
Layer-by-Layer (OSI Model) View
[Figure: layer-by-layer view of the OSI model; the data units are labeled “packets” and “frames”]
Activities Required (Sender
Side)
Edit message and enter “send” (e.g., in MS Outlook Express)
Convert into sequence of bits
Tags must be inserted so that original message can be
reconstructed at destination
Encrypt message if necessary for privacy
Compress if necessary
Partition into packets of fixed maximum size
E.g., “string” 01001000 … “JPEG” 110011101010 … “end”
11001100100010 … 101011111100 … 01111110
Attach header information (Packet ID, destination IP address, checksum, …)
Intersperse with packets from messages created by other
applications
On first link of path,
Partition each packet into fixed-size frames (with headers)
Send each frame out onto the network
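A few of the sender-side steps (convert to bits, compress, partition into packets of fixed maximum size, attach header information) might look as follows in Python. This is a sketch under stated assumptions: `sender_side` and `MAX_PAYLOAD` are hypothetical names, the header is reduced to a packet ID and a packet count, and encryption, interleaving with other applications, and framing are left out.

```python
import zlib

MAX_PAYLOAD = 8  # hypothetical maximum payload size in bytes, kept tiny for illustration


def sender_side(text: str, max_payload: int = MAX_PAYLOAD):
    """Return a list of (packet ID, total packets, payload) tuples."""
    # Convert the message into a sequence of bits, then compress it.
    data = zlib.compress(text.encode("utf-8"))
    # Partition into chunks no larger than the maximum payload size.
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
    # Attach minimal header information to each chunk.
    return [(packet_id, len(chunks), chunk) for packet_id, chunk in enumerate(chunks)]
```

Every payload is at most `max_payload` bytes long, and concatenating the payloads in packet-ID order and decompressing recovers the original text.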
Activities Required on
Network
Route each packet to its destination
During each “hop” of the path
Send signals back and forth to coordinate the sending
and receiving of the stream of bits corresponding to a
frame
Check each frame for errors
Handshaking
Request retransmission in the case of errors
Arrange received frames into the proper order
Wait for all frames of the packet to be received
IP address
Once each packet reaches its destination node,
Store packet in a memory buffer at destination
Send signal to destination CPU to inform it of the arrival
of the new packet
Port Number
Activities at Destination
Node
Receive packets
Check each packet for errors and request retransmission
in the case of errors
Arrange received packets into the proper order
Once all packets have been received, form a complete
message
Decompress if necessary
Decrypt if necessary
Check for errors
Use tags in the bit stream to reconstruct the
message
Show message to user using email tool (e.g., MS
Outlook Express)
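The destination-side steps (check each packet, restore the order, form the complete message) can be sketched as below. The packet format, a (sequence number, payload, CRC) tuple, and the names `make_packets` and `receive` are assumptions for this example; a real receiver would also detect missing packets and actually issue the retransmission request instead of raising an error.

```python
import random
import zlib


def make_packets(message: bytes, size: int):
    """Sender stand-in: split into (sequence number, payload, CRC) packets."""
    return [(seq, message[i:i + size], zlib.crc32(message[i:i + size]))
            for seq, i in enumerate(range(0, len(message), size))]


def receive(packets):
    """Check each packet for errors, restore the order, rebuild the message."""
    good = {}
    for seq, payload, crc in packets:
        if zlib.crc32(payload) != crc:
            # This is where a real receiver would request retransmission.
            raise ValueError(f"packet {seq} is corrupted")
        good[seq] = payload
    return b"".join(good[seq] for seq in sorted(good))


packets = make_packets(b"Hello, friend!", 4)
random.shuffle(packets)  # packets may arrive out of order
print(receive(packets))  # prints: b'Hello, friend!'
```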
Network Addresses
IP (Internet Protocol) address
Address used to identify a computing node on the internet
Network layer (L3) address
E.g., 141.223.165.189 (Look up “properties” on “TCP/IP” on “Network”)
MAC (Medium Access Control) address
Address used to identify a LAN card – cannot be changed
Data link layer (L2) address
E.g., abcd1234 (Enter “ipconfig /all” from MS Windows “cmd” window)
Port address
Address used to identify a network interface point for an application
prog.
Corresponds to a memory buffer
Send a message - write to a memory buffer on a remote computer
Receive a message – read from a memory buffer on the local computer
Example: 21 (for FTP control), 3000 (for a user-defined port)
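As a small illustration of how these addresses appear in software, the standard `ipaddress` module shows that the dotted IP address from the slide is a 32-bit number, and a socket endpoint is the pair (IP address, port number). The variable names are chosen for this sketch only.

```python
import ipaddress

addr = ipaddress.IPv4Address("141.223.165.189")
packed = addr.packed   # the 4 bytes that are carried in the IP header
as_int = int(addr)     # the same address viewed as one 32-bit number

# An application endpoint pairs the node's IP address with a port number.
endpoint = (str(addr), 3000)
```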
Connection-Oriented and
Connectionless Networking
Connection-oriented networking
Uses a specific network path that is established for the
duration of a connection
Main advantage: reliable communication
Main implementation method: TCP (Transmission Control Protocol)
Three phases: connection establishment, data transfer,
connection termination
Used in the “parallel merge sort” socket-based program (TCP sockets
interface)
Connectionless networking
Finds a new path for each packet sent
Main advantage: fast communication for short messages
Main implementation method: UDP (user datagram protocol)
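The difference between the two styles is visible in the sockets interface. The sketch below runs both exchanges over the loopback address inside one process, so it illustrates only the API, not real network behavior; the function names `udp_exchange` and `tcp_exchange` are assumptions for this example.

```python
import socket

LOOPBACK = "127.0.0.1"


def udp_exchange(message: bytes) -> bytes:
    """Connectionless: no setup phase; each datagram carries its destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((LOOPBACK, 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(4096)
        return data


def tcp_exchange(message: bytes) -> bytes:
    """Connection-oriented: establishment, data transfer, termination."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((LOOPBACK, 0))
        listener.listen(1)
        # Phase 1: connection establishment.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _ = listener.accept()
            with conn:
                conn.settimeout(2.0)
                # Phase 2: data transfer.
                client.sendall(message)
                # Phase 3: termination begins when the sender closes its side.
                client.shutdown(socket.SHUT_WR)
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

Note that `udp_exchange` sends its data with a single `sendto` call, while `tcp_exchange` needs `listen`, `connect`, and `accept` before any data moves, which is the setup cost that makes UDP attractive for short messages.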
Communication Performance
Parameters (1)
Throughput
Actual number of bits transmitted per second
Note 1: different from latency
Note 2: different from bandwidth
Most important communication performance
parameter
Typical measurement method
Send a data file from a source node to a destination node
Record the time t1 when the first byte of the data is received
Record the time t2 when the last byte of the data is received
Divide amount of data received by (t2 – t1)
Note: Mbps = mega-bits-per-second (not bytes)
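The measurement method above reduces to one line of arithmetic. The helper below is a sketch with a hypothetical name, `throughput_mbps`; it assumes t1 and t2 are in seconds and converts bytes to bits so that the result is in Mbps.

```python
def throughput_mbps(bytes_received: int, t1: float, t2: float) -> float:
    """Throughput in megabits per second; t1 and t2 are timestamps in seconds."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    bits = bytes_received * 8          # Mbps counts bits, not bytes
    return bits / (t2 - t1) / 1e6


# Example: 10,000,000 bytes received over 8 seconds.
print(throughput_mbps(10_000_000, 0.0, 8.0))  # prints: 10.0
```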
Communication Performance
Parameters (2)
Bandwidth
Maximum number of bits that can be transmitted
per second
Note 1: different from latency
Note 2: different from throughput
Measures performance of network only (not the
computer hardware or software)
Typical measurement method
Difficult to measure since effects of small data amounts,
software and hardware at source and destination nodes
must be removed
The “rated” figure stated in the specifications for the
relevant communication protocol is most commonly used
E.g., 11 Mbps for IEEE 802.11b
Communication Performance
Parameters (3)
Latency
Time required for the first byte of a message to be
transferred from the source to the destination node
Should include software processing time
Typical measurement method
At time t1, source node sends a very small message to
destination node
Destination node receives message and sends it back to
the source node
Source node receives message and records the time t2
One-way communication latency is (t2 – t1) / 2
Why can’t we measure latency directly (record time t3 at
destination and measure latency as t3 – t1)?
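One commonly cited reason for the round-trip method is that t1 and t3 would be read from two different clocks, and the clocks of two nodes are generally not synchronized closely enough to time a single message. In the echo test, both timestamps come from the source node's clock. A minimal sketch, assuming a local `socket.socketpair()` in place of a real network (so it measures only software overhead on one machine) and a hypothetical function name `one_way_latency`:

```python
import socket
import threading
import time


def one_way_latency(trials: int = 100) -> float:
    """Estimate one-way latency in seconds as half the echo round-trip time."""
    source, destination = socket.socketpair()
    with source, destination:
        def serve():
            # Destination: receive each small message and send it straight back.
            for _ in range(trials):
                destination.sendall(destination.recv(64))

        worker = threading.Thread(target=serve)
        worker.start()
        samples = []
        for _ in range(trials):
            t1 = time.perf_counter()   # both timestamps use the source's clock
            source.sendall(b"x")       # very small message
            source.recv(64)
            t2 = time.perf_counter()
            samples.append((t2 - t1) / 2)
        worker.join()
    # The minimum is reported to reduce the effect of scheduling noise.
    return min(samples)
```

Dividing by two assumes the forward and return paths take equal time, which is an assumption of the method itself.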
Computer Communication
Example (Revisited)
Send picture image and message to friend
[Figure: the “Hello!” message travels from the sender application (Microsoft Outlook) in user memory space, through system software in OS kernel memory space, to the NIC H/W (LAN card), and then through the receiver's NIC H/W and system software up to Netscape Messenger. Annotations: 1. Polling, 2. Interrupt, 3. DMA; “zero copy” [IBM’08]]
Section 7.8 of [Culler 1999]
Communication Microbenchmarks at 3 levels
Basic network transaction
Shared address space
Message passing using MPI
Network Transaction Performance
Echo test using Active Messages (AM) user-level
software network interface
[Figure: the source sends a k-byte message; the destination receives the message and immediately sends a k-byte reply; the source receives the reply and computes the 1-way communication delay]
Why must this type of echo test be used?
EECE 550
LogP Communication Model
LogP model used for network transaction
performance modeling
L: latency (within the physical network)
o: overhead (= sending overhead + receiving
overhead)
g: gap (the minimum gap between consecutive
message send operations)
P: number of processors
Refer to Figs. 7.30 and 7.31 [Culler 1999]
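A rough sketch of how the L, o, and g parameters combine, following the definition above in which the overhead is split into a sending part and a receiving part. The function names and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from [Culler 1999].

```python
def one_way_time(latency: float, o_send: float, o_recv: float) -> float:
    """Time for one small message: send overhead + network latency + receive overhead."""
    return o_send + latency + o_recv


def echo_round_trip(latency: float, o_send: float, o_recv: float) -> float:
    """An echo test observes two one-way transfers back to back."""
    return 2 * one_way_time(latency, o_send, o_recv)


def time_to_issue(k: int, o_send: float, gap: float) -> float:
    """Time for the source to issue k back-to-back messages.

    Consecutive sends are spaced by the larger of the gap and the send overhead.
    """
    return (k - 1) * max(gap, o_send) + o_send


# Hypothetical values in microseconds.
print(one_way_time(5, 2, 3))     # prints: 10
print(echo_round_trip(5, 2, 3))  # prints: 20
print(time_to_issue(4, 2, 4))    # prints: 14
```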
Message-Passing Operations
Simple model for overall time to send n bytes
T(n) = T0 + n/B
T0 is time to send initial byte of data over the network
Sending overhead + receiving overhead
n is number of bytes
B is the bandwidth of the network link
r∞: asymptotic bandwidth
n½: transfer size at which throughput = ½ × r∞
Table 7.1 of [Culler 1999]: progressive
improvement in T0, B,
MFLOPS/processor
Berkeley NOW
T0 = 6 microseconds
r∞ = 120 MB/s (Megabytes per second)
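As a worked example of the model T(n) = T0 + n/B, the sketch below plugs in the Berkeley NOW figures quoted above, assuming B equals the asymptotic bandwidth r∞. Setting the throughput n / T(n) equal to B/2 gives n½ = T0 × B. The function names are hypothetical.

```python
T0 = 6e-6      # start-up time in seconds (Berkeley NOW figure quoted above)
R_INF = 120e6  # asymptotic bandwidth in bytes per second, used here as B


def transfer_time(n: float, t0: float = T0, b: float = R_INF) -> float:
    """T(n) = T0 + n/B, in seconds."""
    return t0 + n / b


def throughput(n: float, t0: float = T0, b: float = R_INF) -> float:
    """Achieved throughput n / T(n), in bytes per second."""
    return n / transfer_time(n, t0, b)


def half_power_point(t0: float = T0, b: float = R_INF) -> float:
    """n½ = T0 * B: the transfer size at which throughput is half of B."""
    return t0 * b


n_half = half_power_point()
print(round(n_half))                    # prints: 720
print(round(throughput(n_half) / 1e6))  # prints: 60
```

Under these assumptions, messages much smaller than about 720 bytes are dominated by the start-up time T0, while much larger messages approach the full 120 MB/s.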
Application-Level
Performance
How does LogP affect application performance?
Depends on the characteristics of the application
General trends observable
Figures 7.35, 7.36, 7.37, 7.38 and Table 7.2 [Culler
1999]
T0 large → larger messages are preferable
T0 small, B large → small messages are acceptable
Larger numbers of processors → smaller message
sizes, smaller working sets (size of data that fits
into faster memory, such as one cache line)
Synchronization Issues
Message-Passing Model
Locks are not necessary since mutual exclusion is not a
problem
Each process has exclusive access to its local memory
and uses message-passing to send/receive data from/to
other nodes
Group synchronization and group communication are still
problems
Shared-Address-Space Model
Requires basic support for “locks” and “barriers”
Software algorithms execute on top of basic atomic
exchange primitives
Programming environment/hardware must provide
perception of atomic memory operations
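The role of locks and barriers in the shared-address-space model can be sketched with Python threads. Here `threading.Lock` and `threading.Barrier` stand in for lock and barrier algorithms built on atomic hardware primitives; this is an illustration of the programming model, not of how such primitives are implemented, and `run_workers` is a hypothetical name.

```python
import threading


def run_workers(num_threads: int = 4, increments: int = 1000):
    """Threads update one shared counter, then meet at a barrier."""
    state = {"counter": 0}                   # data in the shared address space
    lock = threading.Lock()
    barrier = threading.Barrier(num_threads)
    seen_after_barrier = []

    def worker():
        for _ in range(increments):
            with lock:                       # mutual exclusion around the shared update
                state["counter"] += 1
        barrier.wait()                       # no thread continues until all have finished
        seen_after_barrier.append(state["counter"])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], seen_after_barrier
```

With the defaults, the final count is 4000, and because of the barrier every thread also observes 4000 when it reads the counter afterwards.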
Group Communication
Operations
Unicast (one-to-one)
Multicast (one-to-many)
Broadcast (one-to-all)
All-to-all broadcast
All-to-all personalized multicast (or broadcast)
Also referred to as “gossiping”
Special operations used for performance
improvement
Parallel prefix (used with parallel supercomputers)
Map-reduce (white paper written by Google engineers)
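As an illustration of the parallel prefix operation, the sketch below simulates one well-known scheme (often attributed to Hillis and Steele) in which an inclusive prefix sum over n values completes in about log2(n) rounds. The rounds are executed sequentially here; on a parallel machine each position would be updated by its own processor. The function name `parallel_prefix_sum` is an assumption for this example.

```python
def parallel_prefix_sum(values):
    """Inclusive prefix sum computed in roughly log2(n) rounds."""
    result = list(values)
    step = 1
    while step < len(result):
        # One round: every position i >= step adds the value 'step' places to its left.
        # All positions read the previous round's values, as parallel processors would.
        result = [result[i] + result[i - step] if i >= step else result[i]
                  for i in range(len(result))]
        step *= 2
    return result


print(parallel_prefix_sum([1, 2, 3, 4, 5]))  # prints: [1, 3, 6, 10, 15]
```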
Communication Support in the
ESA Lab Cluster
1Gbps Ethernet cards and switches
Myrinet switches, Myrinet LAN cards (from Myricom)
1.28 Gbps/port
TCP/IP, Myrinet GM and BIP LAN interface software [Kim
2001]
Myrinet2000 switch and Myrinet2000 LAN cards
2.0 Gbps/port bandwidth (= 250MBps)
TCP sockets: > 100 microsecond latency, much less than peak BW
Myrinet GM LAN interface software (www.myricom.com): around 5 microsecond latency, close to peak BW
Note: current (2009) state-of-art is Myrinet10G, MX S/W
Around 2 microsecond latency, close to 10Gbps throughput
References
Behrouz A. Forouzan, TCP/IP Protocol Suite, 2nd
Ed., McGraw-Hill, Boston, 2003.
D. E. Culler, J. P. Singh and A. Gupta, Parallel
Computer Architecture: A Hardware/Software
Approach, Morgan Kaufmann, San Francisco,
1999.
http://www.ibm.com/developerworks/linux/library/j-zerocopy/, 2008.
S. C. Kim and S. Lee, “Measurement and
prediction of communication latencies in Myrinet
networks,” J. Parallel and Distributed Computing,
Vol. 61, No. 11, pp. 1692-1704, November 2001.