
Fig. 2.19 The mean waiting time for output queuing as a function of the offered load p, for N → ∞ and output FIFO sizes varying from b = 1 to ∞.
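For N → ∞ and b = ∞, each output queue behaves as a discrete-time M/D/1 queue, and the mean waiting time has the well-known closed form W = p/(2(1 − p)). The following sketch (the helper name is ours, not from the text) evaluates this limiting curve of Fig. 2.19:

```python
def mean_wait_output_queuing(p):
    """Mean waiting time (in time slots) for output queuing with
    N -> infinity and infinite FIFOs: the discrete-time M/D/1 result
    W = p / (2 * (1 - p))."""
    if not 0 <= p < 1:
        raise ValueError("offered load must be in [0, 1)")
    return p / (2 * (1 - p))

# The waiting time grows without bound as p -> 1, matching the
# steep rise of the b = infinity curve in Fig. 2.19.
waits = {p: mean_wait_output_queuing(p) for p in (0.5, 0.8, 0.9)}
```

For finite N the same analysis gives W = ((N − 1)/N) · p/(2(1 − p)), so the N → ∞ curve is an upper bound over all switch sizes.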

2.3.3 Completely Shared-Buffer Switches

With complete buffer sharing, all cells are stored in a common buffer shared by all inputs and outputs. Owing to the statistical nature of cell arrivals, one can expect that less buffering will be needed to achieve a given cell loss probability. Output queuing can be maintained logically with linked lists, so that no cells are blocked from reaching idle outputs, and we can still achieve the optimal throughput-delay performance of dedicated output queuing. In the following, we look at how this approach improves the cell loss performance.

Denote by Q_m^i the number of cells destined for output i in the buffer at the end of the mth time slot. The total number of cells in the shared buffer at the end of the mth time slot is then Σ_{i=1}^{N} Q_m^i. If the buffer size is infinite, then

    Q_m^i = max{0, Q_{m-1}^i + A_m^i − 1},    (2.13)

where A_m^i is the number of cells addressed to output i that arrive during the mth time slot.

With a finite buffer size, cell arrivals may fill up the shared buffer, and the resulting buffer overflow makes (2.13) only an approximation. However, we are only interested in the region of low cell loss probability (e.g., less than 10^-6), in which this approximation is still good. When N is finite, A^i, which is the number of cell arrivals destined for output i in the steady state, is not independent of A^j (j ≠ i). This is because at most N cells arrive at the switch, and a large number of cells arriving for one output implies a small number for the remaining outputs. As N goes to infinity, however, A^i becomes an independent Poisson random variable (with mean value p).

Fig. 2.20 The cell loss probability for completely shared buffering as a function of the buffer size per output, b, and the switch size N, for offered loads (a) p = 0.8 and (b) p = 0.9.
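As a rough illustration (ours, not from the text), the recurrence (2.13) with Poisson arrivals defines a discrete-time M/D/1-type queue whose stationary distribution can be solved from the balance equations; convolving N independent copies and summing the tail beyond Nb then approximates the shared-buffer cell loss probability, along the lines of the analysis behind Fig. 2.20. All helper names are hypothetical:

```python
import math

def queue_dist(p, K=60):
    """Stationary distribution of Q = max(0, Q + A - 1), A ~ Poisson(p),
    truncated at K states, solved from the balance equations and
    normalized. (Sketch; the forward recursion is adequate for modest K.)"""
    a = [math.exp(-p)]                       # Poisson(p) pmf a[0..K+1]
    for j in range(1, K + 2):
        a.append(a[-1] * p / j)
    pi = [1.0]
    for j in range(K - 1):
        c = a[0] + a[1] if j == 0 else a[j + 1]
        s = pi[0] * c + sum(pi[k] * a[j - k + 1] for k in range(1, j + 1))
        pi.append(max(0.0, (pi[j] - s) / a[0]))   # clip float noise
    total = sum(pi)
    return [x / total for x in pi]

def convolve(f, g):
    """Distribution of the sum of two independent discrete variables."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def shared_buffer_loss(p, N, b):
    """Approximate cell loss probability of a completely shared buffer:
    Pr[sum of N i.i.d. per-output queues >= N*b]."""
    pi = queue_dist(p)
    dist = pi
    for _ in range(N - 1):
        dist = convolve(dist, pi)
    return sum(dist[N * b:])
```

For p = 0.8 and N = 16 the estimate drops steeply as b grows, consistent with the qualitative trend of Fig. 2.20(a); the independence assumption makes it an approximation for finite N.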
Then Q^i, which is the number of cells in the buffer destined for output i in the steady state, also becomes independent of Q^j (j ≠ i). We will use the Poisson and independence assumptions for finite N; these approximations are good for N ≥ 16. We therefore model the steady-state distribution of Σ_{i=1}^{N} Q^i, the number of cells in the buffer, as the N-fold convolution of N M/D/1 queues. With the assumption of an infinite buffer size, we then approximate the cell loss probability by the overflow probability Pr[Σ_{i=1}^{N} Q^i ≥ Nb]. Figures 2.20(a) and (b) show the numerical results.

H. Jonathan Chao, Cheuk H. Lam, Eiji Oki
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-00454-5 (Hardback); 0-471-22440-5 (Electronic)

CHAPTER 3

INPUT-BUFFERED SWITCHES

When high-speed packet switches were first constructed, they used either internal shared buffers or input buffers, and they suffered from limited throughput. As a result, most early research focused on the output-buffering architecture. Since the initial demand for switch capacity was in the range of a few to 10–20 Gbit/s, output-buffered switches seemed a good choice for their high delay-throughput performance and memory utilization (for shared-memory switches). In the first few years of deploying
ATM switches, output-buffered switches (including shared-memory switches) dominated the market. However, as the demand for large-capacity switches increases rapidly (either line rates or switch port counts increase), the speed requirement for the memory must increase accordingly. This limits the capacity of output-buffered switches. Therefore, in order to build larger-scale and higher-speed switches, attention has turned to input-buffered or combined input-output-buffered switches with advanced scheduling and routing techniques, which are the main subjects of this chapter.

Input-buffered switches have two problems: (1) throughput limitation due to head-of-line (HOL) blocking, and (2) the need to arbitrate cells because of output port contention. The first problem can be circumvented by moderately increasing the switch fabric's operation speed or the number of routing paths to each output port (i.e., allowing multiple cells to arrive at an output port in the same time slot). The second problem is resolved by novel, fast arbitration schemes that will be described in this chapter.

According to Moore's law, memory density doubles every 18 months, but memory speed increases at a much slower rate. For instance, the memory speed in 2001 is 5 ns for state-of-the-art CMOS static RAM, compared with 6 ns one or two years ago. On the other hand, the speed of logic circuits increases at a higher rate than that of memory. Recently, much research has been devoted to devising fast scheduling schemes to arbitrate cells from input ports to output ports. The factors used to compare different scheduling schemes are: (1) throughput, (2) delay, (3) fairness for cells, independent of port positions, (4) implementation complexity, and (5) scalability as the line rate or the switch size increases.
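The memory-speed limit on output-buffered switches can be made concrete with a quick calculation. In an N-port shared-memory switch, the memory must absorb up to N writes and N reads per cell time, so the required cycle time shrinks as 1/(2N). The numbers below are illustrative (ours, not from the text), assuming a memory one 53-byte ATM cell wide:

```python
CELL_BITS = 53 * 8  # one ATM cell is 53 bytes

def required_cycle_time_ns(n_ports, line_rate_bps, word_bits=CELL_BITS):
    """Memory cycle time (ns) needed if a cell-wide shared memory must
    perform N writes + N reads within one cell slot."""
    cell_time_s = word_bits / line_rate_bps
    return cell_time_s / (2 * n_ports) * 1e9

# A 32-port switch at 10 Gbit/s needs ~0.66 ns cycles, far beyond the
# ~5 ns state-of-the-art SRAM cited above; at 155.52 Mbit/s (OC-3) the
# requirement is a comfortable ~42.6 ns.
t_fast = required_cycle_time_ns(32, 10e9)
t_oc3 = required_cycle_time_ns(32, 155.52e6)
```

Widening the memory word reduces the pressure linearly, but the 2N accesses per cell slot remain, which is exactly the scaling limit described above.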
Furthermore, some scheduling schemes even consider per-flow scheduling at the input ports to meet the delay-throughput requirements of each flow, which of course greatly increases implementation complexity and cost. Scheduling cells on a per-flow basis at input ports is much more difficult than at output ports. For example, at an output port, cells (or packets) can be timestamped with values based on their allocated bandwidth and transmitted in ascending order of their timestamp values. At an input port, however, scheduling must also take output port contention into account. This makes the problem so complicated that so far no feasible scheme has been devised.

A group of researchers attempted to use input-buffered switches to emulate output-buffered switches by moderately increasing the switch fabric operation speed (e.g., to twice the input line rate) together with some scheduling scheme. Although this has been shown to be possible, its implementation complexity is still too high to be practical.

The rest of this chapter is organized as follows. Section 3.1 describes a simple switch model with input buffers (optional) and output buffers, and an on-off traffic model for performance study. Section 3.2 presents several methods to improve switch performance, whose degradation is mainly caused by HOL blocking. Section 3.3 describes several schemes to resolve output port contention among the input ports. Section 3.4 shows how an input-buffered switch can emulate an output-buffered switch. Section 3.5 presents a new scheduling scheme that can achieve a delay bound for an input-buffered switch without trying to emulate an output-buffered switch.
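The output-port timestamping described above can be sketched as follows: each arriving cell gets a virtual finishing time derived from its flow's allocated rate, and cells depart in ascending timestamp order. This is a minimal sketch with hypothetical names, not a scheme from the text:

```python
import heapq

class TimestampScheduler:
    """Per-flow timestamp scheduling at an output port (sketch).
    Each flow's next cell is stamped max(now, last stamp) + 1/rate
    for a unit-length cell; cells leave in ascending stamp order."""

    def __init__(self):
        self.finish = {}   # flow id -> last assigned timestamp
        self.heap = []     # (timestamp, arrival sequence, flow id)
        self.seq = 0       # tie-breaker for equal timestamps

    def enqueue(self, flow, rate, now=0.0):
        ts = max(now, self.finish.get(flow, 0.0)) + 1.0 / rate
        self.finish[flow] = ts
        heapq.heappush(self.heap, (ts, self.seq, flow))
        self.seq += 1

    def dequeue(self):
        _, _, flow = heapq.heappop(self.heap)
        return flow
```

With flow A allocated rate 0.5 and flow B rate 0.25, A's cells receive stamps 2, 4, 6 and B's first cell receives stamp 4, so departures interleave roughly in proportion to the allocated bandwidth. At an input port this simple ordering is not enough, because two inputs may stamp cells for the same output in the same slot, which is the contention problem discussed above.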

3.1 A SIMPLE SWITCH MODEL