
TDXP STRUCTURE 241

speed of the switch fabric is the same as the input line rate. However, it suffers from the HOL blocking problem. The performance of the input-buffered switch was analyzed by Karol et al. [5]. They showed that, when the switch size N is infinite, the maximum throughput of the switch is 0.586, assuming that the internal speed of the switch is equal to that of the input/output lines. This throughput limit is caused by HOL blocking in the input buffers. Several problems have to be addressed in order to improve the limited throughput of the input-buffered switch [10].

One possible solution to HOL blocking is to increase the internal line speed of the switch, as shown in Figure 9.1(a). Oie et al. analyzed the performance of the internal speedup switch with input and output buffers for a speedup factor L (1 ≤ L ≤ N) [9]. Yamanaka et al. developed a high-speed switching system with 160-Gbit/s throughput in which the internal line speed was twice that of the input/output lines [14, 4, 6]. In the reported switch, the input/output speed is 10 Gbit/s, so the internal line speed is 20 Gbit/s. To realize these speeds, the switch adopted ultrahigh-speed Si bipolar devices and special high-density multichip module (MCM) techniques [6, 14]. However, for much larger throughputs, this internal speedup crossbar architecture is not cost-effective, given the limitations of current hardware technologies.

Another possible approach to improving the performance of crossbar-type switches is to employ a parallel switch architecture, as shown in Figure 9.1(b) [13, 8]. The parallel switch consists of K identical switch planes. Each switch plane has its own input buffer and shares output buffers with the other planes. The parallel switch with K = 2 achieves the maximum throughput of 1.0. This is because each plane carries only half of the offered load, whereas the maximum throughput of each switch plane is at least 0.586 for any switch size N. Using this concept, Balboni et al.
developed an industrial 160-Gbit/s cross-connect system [1, 7]. However, timestamp values must be placed in each cell header at the input buffers. At the output ports, cells are buffered by a maximum-delay equalization mechanism in order to rebuild the cell sequences, which cannot be guaranteed because of the internal routing algorithm [3]. Thus, this type of parallel switch requires timestamps and also requires cell sequence regeneration at the output buffers. In addition, the hardware resources needed to implement the switch are double those of a single-plane switch. For much larger switches, rebuilding the cell sequences at high speed also makes cost-effective implementation unlikely.
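
The 0.586 limit can be reproduced approximately with a small Monte Carlo experiment. The sketch below is our own illustration, not Karol et al.'s analysis: it assumes saturated FIFO input queues, uniformly random destinations, and random selection among head-of-line cells that contend for the same output. The function name `hol_saturation_throughput` is hypothetical. For large N the estimate should approach the analytical limit 2 − √2 ≈ 0.586; for small N it is higher (0.75 for N = 2).

```python
import random


def hol_saturation_throughput(n_ports: int, n_slots: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the saturation throughput of an N x N input-buffered switch.

    Model assumptions (ours): every input FIFO is always backlogged, each cell
    that reaches the head of line (HOL) picks an output uniformly at random,
    and each output serves one of its contending HOL cells per slot at random.
    Losing cells stay at the head and block the cells queued behind them.
    """
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL destination per input
    delivered = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)           # one cell per output per slot
            hol[winner] = rng.randrange(n_ports)  # the next queued cell moves to the head
            delivered += 1
    return delivered / (n_ports * n_slots)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(hol_saturation_throughput(n, 20_000), 3))
```

Choosing the winner at random is only one arbitration policy; the 2 − √2 result assumes such fair, random contention resolution.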

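To make the output-side cost of the parallel switch concrete, here is a minimal sketch of timestamp-based resequencing by maximum-delay equalization. The class name, the integer slot timestamps, and the `max_delay` bound are our assumptions; the cited system's actual mechanism may differ. Each cell is held until `max_delay` slots after its input timestamp and then released in timestamp order, which restores the sequence provided `max_delay` is at least the largest delay any cell can experience in any plane.

```python
import heapq
from typing import List, Tuple


class ResequencingBuffer:
    """Output-side resequencer based on maximum-delay equalization (sketch).

    Assumptions (ours): timestamps are integer cell-slot numbers written at the
    input buffer, and `max_delay` bounds the delay any cell can see in any plane.
    If release() is called every slot, each cell leaves exactly `max_delay`
    slots after its timestamp, so cells leave in timestamp order.
    """

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay
        self._heap: List[Tuple[int, int, str]] = []  # (timestamp, arrival no., cell)
        self._arrivals = 0

    def receive(self, timestamp: int, cell: str) -> None:
        """Store a cell arriving from any plane."""
        heapq.heappush(self._heap, (timestamp, self._arrivals, cell))
        self._arrivals += 1

    def release(self, now: int) -> List[str]:
        """Return, in timestamp order, every cell whose equalized delay has expired."""
        out: List[str] = []
        while self._heap and self._heap[0][0] + self.max_delay <= now:
            out.append(heapq.heappop(self._heap)[2])
        return out


if __name__ == "__main__":
    buf = ResequencingBuffer(max_delay=4)
    buf.receive(timestamp=1, cell="B")  # B took a faster plane and arrived first
    buf.receive(timestamp=0, cell="A")
    released: List[str] = []
    for now in range(8):
        released += buf.release(now)
    print(released)  # ['A', 'B']
```

Every cell pays the worst-case delay, and the output needs timestamp storage and sorting logic, which is the overhead the text argues against at high speed.
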
9.2 TDXP STRUCTURE

9.2.1 Basic Architecture

Figure 9.2 shows the structure of the TDXP switch. Logically, it has multiple crossbar switch planes; in general there are K of them, and Figure 9.2 shows the case K = 3. The larger K is, the better the switch performance, but at the expense of implementation cost. These switch planes are connected in tandem at every crosspoint, which is why the switch is called a tandem-crosspoint (TDXP) switch. The internal speed of each plane is the same as the input/output line speed. In other words, each switch plane can transmit only one cell to each output port within one cell time slot. If more than one cell goes to the same output port on the same switch plane, the unsuccessful cells that are not transmitted to the output port are stored in the TDXP. However, because the TDXP switch has K switch planes, it can transmit up to K cells to each output port within one time slot.

THE TANDEM-CROSSPOINT SWITCH 242

Fig. 9.2 Tandem-crosspoint switch structure. ©1997 IEEE.
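
The per-slot capacity rule described above fits in a few lines. This is an illustrative sketch: `deliver_one_slot` is a hypothetical name, and the lowest-numbered-input-wins policy is an assumption, since the text leaves the arbitration policy open at this point. For one output port and one time slot, each plane delivers at most one cell, so the port can accept up to K cells.

```python
from typing import Dict, List, Tuple


def deliver_one_slot(requests: List[Tuple[int, int]], k_planes: int) -> Dict[int, int]:
    """Cells delivered to ONE output port in ONE time slot of a K-plane TDXP switch.

    `requests` lists (input_port, plane) for every TDXP holding a cell for this
    output. Each plane runs at the line rate, so it passes at most one cell to the
    output per slot. Assumed tie-break: the lowest-numbered input wins on each plane.
    Returns {plane: winning input}; losers stay buffered in their TDXPs.
    """
    winners: Dict[int, int] = {}
    for inp, plane in sorted(requests):  # visit inputs in increasing order
        if not 1 <= plane <= k_planes:
            raise ValueError(f"plane {plane} is outside 1..{k_planes}")
        winners.setdefault(plane, inp)  # the first input seen on a plane wins it
    return winners


if __name__ == "__main__":
    slot = deliver_one_slot([(2, 1), (1, 1), (3, 2), (4, 3)], k_planes=3)
    print(slot)  # {1: 1, 2: 3, 3: 4}: three cells reach the output in one slot
```

Input 2 loses on plane 1 and stays in its TDXP; with a single plane (the plain crossbar), only one of the four cells would get through.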

9.2.2 Unicasting Operation

The cell transmission algorithm for unicasting in the TDXP switch is explained first.

Step 1 A cell at the head of the input buffer sends a request signal (REQ) to the destination TDXP according to the routing bits written in the cell header. Then go to Step 2.

Step 2 The TDXP that receives an REQ sends a not-acknowledge signal (NACK) back to the input buffer if it is already handling or buffering a cell that cannot be transmitted to the output line because of contention on the output line, as shown in Figure 9.3(a). Then go to Step 3.

Step 3 If no NACK is received within a certain time, the cell at the head of the input buffer that sent the REQ is sent to the destination crosspoint on the first switch plane, as shown in Figure 9.3(b); then set k = 1 and go to Step 4. Otherwise, the cell is not sent to the crosspoint, and the procedure returns to Step 1 at the next cell time.

Step 4 The kth crosspoint sends a request signal to an arbitration controller, asking for transmission to the destination output buffer. (We refer to the crosspoint on the kth plane as the kth crosspoint.) The arbitration control on the kth plane is executed independently of that of the other planes; ring arbitration is one possible approach. If the request is accepted by the arbitration controller, the cell is transmitted to the output buffer; then go to Step 5. Otherwise, if k is not equal to K, the cell is moved to the (k + 1)th crosspoint and k is set to k + 1; if k is equal to K, the cell is moved to the first crosspoint and k is set to 1. Then go back to the beginning of Step 4 after one cell time slot.

Step 5 The cell transmitted from the TDXP is stored in the output buffer. The output buffer can receive more than one cell within one cell time.

Fig. 9.3 Cell transmission mechanism of TDXP switch. ©1997 IEEE.

It is noted that each TDXP needs only one cell buffer for arbitrary K.
This is because, when one cell is stored in a TDXP, the following cell from the same input is not sent to that TDXP, due to the back-pressure mechanism. To clarify the cell transmission mechanism, suppose that K + 1 cells request to be transmitted to the same output port on the first switch plane. On the first switch plane, only one cell is transmitted to the output port, and the other K cells go to the second switch plane. At the next cell time slot, only one cell is transmitted to the output port on the second switch plane, and K - 1 cells go to the third switch plane. In the same way, on the Kth switch plane, one of the remaining two cells is transmitted to the output port. The unsuccessful cell goes back to the first switch plane and tries again to be transmitted to the output port, competing with other cells that request transmission on the first switch plane.

Figure 9.4 shows the behavior of the cell transmission mechanism with K = 3 when four cells request to be transmitted to the same output port on the first switch plane at t = 0. Only the states of one output port are depicted. At t = 0, the cell at the second input port is transmitted on the first switch plane. Then the cell at the third input port is transmitted on the second switch plane at t = 1, the cell at the first input port is transmitted on the third switch plane at t = 2, and the cell at the fifth input port is transmitted on the first plane at t = 3. These procedures are executed in a pipelined manner at every cell time. Therefore, more than one cell can be transmitted to the same output buffer within one cell time slot, even though the internal line speed of each switch plane equals the input/output line speed. When K = 3, at most three cells coming from different input lines can be transmitted to the output buffer in the same time slot, as shown at t = 2 in Figure 9.4.
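
Steps 3 to 5 and the Figure 9.4 walk-through can be replayed with a small simulation of one output port. This is a sketch under stated assumptions, not the switch's actual control logic: the REQ/NACK exchange of Steps 1 to 3 is reduced to a check that the input's TDXP is empty, ring arbitration is modeled as a round-robin pointer per plane, and the initial pointer positions ([2, 3, 1]) and the number of inputs (5) are our choices, picked so that the output matches the order described for Figure 9.4 (the text does not give them). The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ring_arbiter(candidates: Set[int], pointer: int, n_inputs: int) -> int:
    """Ring (round-robin) arbitration over 1-indexed inputs: the first
    candidate found at or after `pointer`, wrapping past n_inputs to 1."""
    for offset in range(n_inputs):
        inp = (pointer - 1 + offset) % n_inputs + 1
        if inp in candidates:
            return inp
    raise ValueError("no requesting input")


def simulate_output_port(
    arrivals: Dict[int, List[int]],
    k_planes: int,
    n_inputs: int,
    pointers: List[int],
    n_slots: int,
) -> List[Tuple[int, int, int]]:
    """Replay the TDXP unicast steps for a single output port.

    `arrivals[t]` lists the inputs whose HOL cell enters the first plane at slot t.
    `pointers[k-1]` is the assumed initial ring-arbiter position on plane k.
    Returns (slot, plane, input) for every cell delivered to the output buffer.
    """
    # waiting[k] = inputs whose cell sits in the kth crosspoint for this output
    waiting: Dict[int, Set[int]] = {k: set() for k in range(1, k_planes + 1)}
    pointers = list(pointers)  # copy so the caller's list is left untouched
    delivered: List[Tuple[int, int, int]] = []
    for t in range(n_slots):
        for inp in arrivals.get(t, []):
            # Steps 1-3: an occupied TDXP answers the REQ with a NACK (back-pressure).
            if any(inp in cells for cells in waiting.values()):
                raise ValueError(f"input {inp} would receive a NACK at t={t}")
            waiting[1].add(inp)  # Step 3: enter the first plane, k = 1
        next_waiting: Dict[int, Set[int]] = {k: set() for k in waiting}
        for k in range(1, k_planes + 1):
            if not waiting[k]:
                continue
            # Step 4: independent ring arbitration on plane k, one winner per slot.
            winner = ring_arbiter(waiting[k], pointers[k - 1], n_inputs)
            pointers[k - 1] = winner % n_inputs + 1
            delivered.append((t, k, winner))  # Step 5: stored in the output buffer
            # Losers move to plane k + 1, or from plane K back to plane 1.
            next_waiting[k % k_planes + 1] |= waiting[k] - {winner}
        waiting = next_waiting
    return delivered


if __name__ == "__main__":
    print(simulate_output_port({0: [1, 2, 3, 5]}, 3, 5, [2, 3, 1], 4))
    # [(0, 1, 2), (1, 2, 3), (2, 3, 1), (3, 1, 5)]
```

Feeding new cells into the first plane on consecutive slots (inputs 1 to 3 at t = 0, inputs 4 and 5 at t = 1, input 6 at t = 2, with eight inputs and all pointers at 1) yields three deliveries at t = 2, one per plane, which is the K = 3 maximum.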
Thus, the TDXP switch achieves a result similar to that of the internal speedup switch in eliminating HOL blocking. However, the effect on HOL blocking in the TDXP switch is not exactly the same as that of the internal speedup switch, because the TDXP switch has one cell buffer at each TDXP and applies back-pressure to the input buffers, whereas the internal speedup switch has no crosspoint buffers. A detailed discussion, considering the effect of this back-pressure mechanism, is given in Section 9.3. In addition, while a TDXP is handling a cell, the input buffer does not send the head-of-line cell to the same TDXP, and a TDXP never transmits more than one cell within the same cell time slot. Cells from a given input to a given output therefore pass through their TDXP one at a time, so cell sequences are completely guaranteed and there is no need to rebuild them at the output buffers.

Fig. 9.4 Behavior of cell transmission mechanism (K = 3).

Another advantage of the TDXP switch is that, although it logically has multiple crossbar switch planes, it requires far fewer hardware resources than the parallel switch. This is because the parallel switch cannot share hardware resources among its planes, while the TDXP switch can share the input buffers, internal input lines, and so on. This is a significant implementation benefit.
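
The sequence-preservation argument can be checked with a toy model. The sketch assumes that an occupied TDXP wins arbitration with a fixed probability each slot (a simplification of ring arbitration) and contrasts it with a parallel-switch flow whose cells see different fixed delays on different planes. Both function names and both models are our own illustrations. Because back-pressure lets only one cell of a flow occupy its TDXP, `tdxp_departures` always returns cells in arrival order, however long each one waits; the parallel model can reorder them.

```python
import random
from collections import deque
from typing import List, Optional


def tdxp_departures(n_cells: int, p_win: float, seed: int = 0) -> List[int]:
    """Departure order of cells 0..n_cells-1 sent from one input to one output via a TDXP.

    Toy model (assumption): each slot, an occupied TDXP wins arbitration on its
    current plane with probability p_win; otherwise the cell rotates planes and waits.
    Back-pressure: the next head-of-line cell enters only once the TDXP is empty.
    """
    if not 0.0 < p_win <= 1.0:
        raise ValueError("p_win must be in (0, 1]")
    rng = random.Random(seed)
    queue = deque(range(n_cells))  # input buffer, in arrival order
    tdxp: Optional[int] = None     # single-cell crosspoint buffer
    order: List[int] = []
    while queue or tdxp is not None:
        if tdxp is None:
            tdxp = queue.popleft()  # no NACK: the HOL cell enters the TDXP
        if rng.random() < p_win:
            order.append(tdxp)      # accepted by the arbiter, sent to the output buffer
            tdxp = None
    return order


def parallel_departures(delays: List[int]) -> List[int]:
    """Contrast model: cell i leaves its input at slot i and spends delays[i] slots
    in whichever plane it was routed to; cells leave the fabric in exit-time order."""
    return sorted(range(len(delays)), key=lambda i: (i + delays[i], i))


if __name__ == "__main__":
    print(tdxp_departures(10, 0.3))   # always cells 0..9 in order
    print(parallel_departures([3, 1]))  # [1, 0]: cell 1 overtakes cell 0
```

The TDXP pays for this guarantee with possible waiting at the input (a NACK), which is exactly the back-pressure effect Section 9.3 is said to analyze.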

9.2.3 Multicasting Operation