BASICS OF PACKET SWITCHING
Fig. 2.19 The mean waiting time for output queuing as a function of the offered load p, for N → ∞ and output FIFO sizes varying from b = 1 to ∞.
2.3.3 Completely Shared-Buffer Switches
With complete buffer sharing, all cells are stored in a common buffer shared by all inputs and outputs. One can expect that less buffer space is needed to achieve a given cell loss probability, due to the statistical nature of cell arrivals. Output queuing can be maintained logically with linked lists, so that no cells are blocked from reaching idle outputs, and we can still achieve the optimal throughput–delay performance, as with dedicated output queuing.
In the following, we will take a look at how this approach improves the cell loss performance. Denote by Q_i^m the number of cells destined for output i in the buffer at the end of the mth time slot. The total number of cells in the shared buffer at the end of the mth time slot is Σ_{i=1}^N Q_i^m. If the buffer size is infinite, then

    Q_i^m = max{0, Q_i^{m−1} + A_i^m − 1},    (2.13)

where A_i^m is the number of cells addressed to output i that arrive during the mth time slot. With a finite buffer size, cell arrivals may fill up the shared buffer, and the resulting buffer overflow makes (2.13) only an approximation. However, we
Fig. 2.20 The cell loss probability for completely shared buffering as a function of the buffer size per output, b, and the switch size N, for offered loads (a) p = 0.8 and (b) p = 0.9.
are only interested in the region of low cell loss probability (e.g., less than 10^−6), in which this approximation is still good.

When N is finite, A_i, the number of cell arrivals destined for output i in the steady state, is not independent of A_j (j ≠ i). This is because at most N cells arrive at the switch, and a large number of cells arriving for one output implies a small number for the remaining outputs. As N goes to infinity, however, A_i becomes an independent Poisson random variable with mean value p. Then Q_i, the number of cells in the buffer that are destined for output i in the steady state, also becomes independent of Q_j (j ≠ i). We will use the Poisson and independence assumptions for finite N. These approximations are good for N ≥ 16.

Therefore we model the steady-state distribution of Σ_{i=1}^N Q_i, the number of cells in the buffer, as the N-fold convolution of N M/D/1 queues. With the assumption of an infinite buffer size, we then approximate the cell loss probability by the overflow probability Pr[Σ_{i=1}^N Q_i ≥ Nb]. Figures 2.20(a) and (b) show the numerical results.
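The queue evolution of (2.13) and the overflow event used to approximate the cell loss probability can be sketched with a short Monte Carlo simulation. This is a hypothetical illustration, not from the text: the function name, its parameters, and the Bernoulli-arrival model (each input sends a cell with probability p to a uniformly chosen output) are our assumptions.

```python
import random

def simulate_shared_buffer(N=16, p=0.8, b=8, slots=20_000, seed=1):
    """Monte Carlo sketch of Eq. (2.13): each of the N inputs independently
    sends a cell with probability p, addressed to a uniformly chosen output.
    Tracks the per-output logical queues Q_i and counts the slots in which
    the total occupancy reaches the shared buffer size N*b (overflow proxy)."""
    rng = random.Random(seed)
    Q = [0] * N
    overflow_slots = 0
    for _ in range(slots):
        A = [0] * N
        for _ in range(N):                  # one potential arrival per input
            if rng.random() < p:
                A[rng.randrange(N)] += 1    # uniformly addressed output
        for i in range(N):                  # Eq. (2.13): Q_i <- max(0, Q_i + A_i - 1)
            Q[i] = max(0, Q[i] + A[i] - 1)
        if sum(Q) >= N * b:
            overflow_slots += 1
    return overflow_slots / slots
```

Since the queue trajectory does not depend on b, running the sketch with a larger b on the same seed can only lower the estimated overflow fraction, mirroring the trend in Figure 2.20.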
H. Jonathan Chao, Cheuk H. Lam, Eiji Oki
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-00454-5 (Hardback); 0-471-22440-5 (Electronic)
CHAPTER 3
INPUT-BUFFERED SWITCHES
When high-speed packet switches were constructed for the first time, they used either internal shared buffers or input buffers and suffered from throughput limitation. As a result, most early research focused on the output-buffering architecture. Since the initial demand for switch capacity was in the range of a few to 10–20 Gbit/s, output-buffered switches seemed a good choice for their high delay–throughput performance and memory utilization (for shared-memory switches). In the first few years of deploying ATM switches, output-buffered switches (including shared-memory switches) dominated the market. However, as the demand for large-capacity switches increases rapidly (either line rates or the switch port number increases), the speed requirement for the memory must increase accordingly. This limits the capacity of output-buffered switches. Therefore, in order to build larger-scale and higher-speed switches, people have focused on input-buffered or combined input–output-buffered switches with advanced scheduling and routing techniques, which are the main subjects of this chapter.
Input-buffered switches have two problems: (1) throughput limitation due to head-of-line (HOL) blocking, and (2) the need to arbitrate cells due to output port contention. The first problem can be circumvented by moderately increasing the switch fabric's operation speed or the number of routing paths to each output port (i.e., allowing multiple cells to arrive at the output port in the same time slot). The second problem is resolved by novel, fast arbitration schemes that will be described in this chapter. According to Moore's law, memory density doubles every 18 months, but memory speed increases at a much slower rate. For instance, the memory speed in 2001 is 5 ns for state-of-the-art CMOS static RAM, compared with 6 ns one
or two years ago. On the other hand, the speed of logic circuits increases at a higher rate than that of memory. Recently, much research has been devoted
to devising fast scheduling schemes to arbitrate cells from input ports to output ports.
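The HOL throughput limitation mentioned above can be illustrated with a short simulation of saturated FIFO input queues under uniform traffic, where the throughput of a large N × N switch approaches 2 − √2 ≈ 0.586. This is a minimal sketch, not an implementation from the text; the function name and parameters are our assumptions.

```python
import random

def hol_throughput(N=32, slots=20_000, seed=7):
    """Saturated FIFO input queues: every input always has a HOL cell with a
    uniformly random destination. Each contended output serves one HOL cell
    per slot, chosen at random; losing inputs stay blocked (HOL blocking).
    Returns cells delivered per output per time slot."""
    rng = random.Random(seed)
    hol = [rng.randrange(N) for _ in range(N)]   # HOL destination of each input
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inps in contenders.items():
            winner = rng.choice(inps)            # random arbitration per output
            delivered += 1
            hol[winner] = rng.randrange(N)       # next queued cell becomes HOL
    return delivered / (N * slots)
```

For moderate N the estimate settles near the classical 0.586 saturation limit, which is why FIFO input buffering alone cannot sustain full load.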
Here are the factors used to compare different scheduling schemes: (1) throughput, (2) delay, (3) fairness for cells, independent of port positions, (4) implementation complexity, and (5) scalability as the line rate or the switch size increases. Furthermore, some scheduling schemes even consider per-flow scheduling at the input ports to meet the delay–throughput requirements for each flow, which of course greatly increases implementation complexity and cost. Scheduling cells on a per-flow basis at input ports is much more difficult than at output ports. For example, at an output port, cells (or packets) can be timestamped with values based on their allocated bandwidth and transmitted in ascending order of their timestamp values. However, at an input port, scheduling cells must take output port contention into account. This makes the problem so complicated that so far no feasible scheme has been devised.
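The output-port timestamping idea above can be sketched as follows. This is a hypothetical, virtual-clock-style illustration, not the book's scheme: the function name, the per-flow rate table, and the stamp formula (k-th cell of flow f stamped k / rate[f]) are our assumptions.

```python
import heapq

def timestamp_order(cells, rates):
    """Stamp each arriving cell of flow f with a finish time proportional to
    its sequence number within the flow and inversely proportional to the
    flow's allocated rate, then emit cells in ascending stamp order.
    `cells` is the arrival order of flow ids; `rates` maps flow id -> rate."""
    heap = []
    count = {}
    for seq, flow in enumerate(cells):
        count[flow] = count.get(flow, 0) + 1
        stamp = count[flow] / rates[flow]        # allocated-bandwidth timestamp
        heapq.heappush(heap, (stamp, seq, flow)) # seq breaks ties by arrival
    return [flow for _, _, flow in (heapq.heappop(heap) for _ in range(len(cells)))]
```

A flow with twice the allocated rate receives stamps that grow half as fast, so its cells interleave ahead, which is the bandwidth-proportional departure order the text describes.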
A group of researchers attempted to use input-buffered switches to emulate output-buffered switches by moderately increasing the switch fabric operation speed (e.g., to twice the input line rate), together with some scheduling scheme. Although this has been shown to be possible, its implementation complexity is still too high to be practical.

The rest of this chapter is organized as follows. Section 3.1 describes a simple switch model with input buffers (optional) and output buffers, and an on–off traffic model for performance study. Section 3.2 presents several methods to improve the switch performance; the degradation is mainly caused by HOL blocking. Section 3.3 describes several schemes to resolve output port contention among the input ports. Section 3.4 shows how an input-buffered switch can emulate an output-buffered switch. Section 3.5 presents a new scheduling scheme that can achieve a delay bound for an input-buffered switch without trying to emulate an output-buffered switch.
3.1 A SIMPLE SWITCH MODEL