Window-Based Lookahead Selection Increasing Scheduling Efficiency

Simulation studies show that a speedup factor of 2 yields 100% throughput [20, 12]. There is another meaning when people talk about "speedup" in the literature: at most one cell can be transferred from an input in a time slot, but during the same period of time an output can accept up to c cells [27, 6, 28]. In bursty traffic mode, a factor of 2 achieves only 82.8% to 88.5% throughput, depending on the degree of input traffic correlation (burstiness) [19].

3.2.1.3 Parallel Switch

The parallel switch consists of K identical switch planes [21]. Each switch plane has its own input buffer and shares output buffers with the other planes. The parallel switch with K = 2 achieves the maximum throughput of 1.0. This is because the maximum throughput of each switch plane is more than 0.586 for arbitrary switch size N, so two planes together can carry more than the full offered load. Since each input port distributes cells to different switch planes, cells can arrive out of order at the output port. This type of parallel switch therefore requires timestamps and cell sequence regeneration at the output buffers. In addition, the hardware resources needed to implement the switch are K times as much as for a single switch plane.
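As an illustration of cell sequence regeneration at an output buffer, the following is a minimal sketch and not the mechanism of [21]. It assumes, for simplicity, that the timestamp is a per-flow sequence number starting at 0 and increasing by 1 for each cell; the class name `Resequencer` and its `push` method are hypothetical.

```python
import heapq


class Resequencer:
    """Hold out-of-order cells and release them in timestamp order.

    Assumption (not from the source): timestamps are consecutive
    integers 0, 1, 2, ... within one input-output flow.
    """

    def __init__(self):
        self._heap = []        # min-heap of (timestamp, payload)
        self._next_stamp = 0   # next timestamp allowed to leave

    def push(self, timestamp, payload):
        """Accept a cell from any switch plane; return the cells now releasable."""
        heapq.heappush(self._heap, (timestamp, payload))
        released = []
        # Release cells as long as the smallest buffered timestamp is the
        # one expected next; later cells wait for the gap to be filled.
        while self._heap and self._heap[0][0] == self._next_stamp:
            released.append(heapq.heappop(self._heap)[1])
            self._next_stamp += 1
        return released
```

A cell that arrives early through a lightly loaded plane is held until every earlier cell of the same flow has been released, which is why the output buffers need extra storage in this arrangement.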

3.2.2 Increasing Scheduling Efficiency

3.2.2.1 Window-Based Lookahead Selection

Throughput can be increased by relaxing the strict FIFO queuing discipline at input buffers. Although each input still sends at most one cell into the switch fabric per time slot, it is not necessarily the first cell in the queue. On the other hand, no more than one cell destined for the same output is allowed to pass through the switch fabric in a time slot. At the beginning of each time slot, the first w cells in each input queue sequentially contend for access to the switch outputs. The cells at the heads of the input queues (HOL cells) contend first. Due to output conflict, some inputs may not be selected to transmit the HOL cells, and they send their second cells in line to contend for access to the remaining outputs that are not yet assigned to receive cells in this time slot. This contention process is repeated up to w times in each time slot. It allows the w cells in an input buffer's window to sequentially contend for any idle outputs until the input is selected to transmit a cell. A window size of w = 1 corresponds to input queuing with FIFO buffers.

Table 3.1 shows the maximum throughput achievable for various switch and window sizes (N and w, respectively). The values were obtained by simulation. The throughput is significantly improved on increasing the window size from w = 1 (i.e., FIFO buffers) to w = 2, 3, and 4. Thereafter, however, the improvement diminishes, and input queuing with even an infinite window (w = ∞) does not attain the optimal delay-throughput performance of output queuing. This is because input queuing limits each input to send at most one cell into the switch fabric per time slot, which prevents cells from reaching idle outputs.
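The contention rounds described above can be sketched as a small saturation simulation. This is an illustrative sketch under assumptions the text does not state: every input always has cells waiting, destinations are independent and uniformly distributed over the outputs, and an output picks uniformly at random among the inputs contending for it. The function name `simulate_window_throughput` and its parameters are hypothetical, and the estimates need not reproduce the published table exactly.

```python
import random


def simulate_window_throughput(n_ports, window, n_slots=20000, seed=0):
    """Estimate the saturation throughput per port for a given window size."""
    rng = random.Random(seed)
    # Assumption: saturated inputs. Only the first `window` cells of a queue
    # can contend, so each queue is stored as exactly `window` destinations.
    queues = [[rng.randrange(n_ports) for _ in range(window)]
              for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        free_outputs = set(range(n_ports))   # outputs not yet assigned
        unmatched = list(range(n_ports))     # inputs not yet selected
        for pos in range(window):            # round pos + 1 of up to w rounds
            # Each unselected input offers the cell at position `pos`.
            requests = {}
            for i in unmatched:
                dest = queues[i][pos]
                if dest in free_outputs:
                    requests.setdefault(dest, []).append(i)
            winners = set()
            for dest, contenders in requests.items():
                winner = rng.choice(contenders)  # assumed random tie-breaking
                winners.add(winner)
                free_outputs.discard(dest)
                # Transmit the chosen cell and refill the window's tail.
                queues[winner].pop(pos)
                queues[winner].append(rng.randrange(n_ports))
                delivered += 1
            unmatched = [i for i in unmatched if i not in winners]
            if not unmatched or not free_outputs:
                break
    return delivered / (n_slots * n_ports)


if __name__ == "__main__":
    for w in (1, 2, 4):
        print(w, round(simulate_window_throughput(8, w), 3))
```

With `window=1` the sketch reduces to FIFO input queuing, so its estimate for a 2 x 2 switch should be close to the analytical value of 0.75; larger windows should give higher estimates with diminishing gains.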
TABLE 3.1 The Maximum Throughput Achievable with Input Queuing for Various Switch Sizes N and Window Sizes w

                        Window Size w
  N      1      2      3      4      5      6      7      8
  2    0.75   0.84   0.89   0.92   0.93   0.94   0.95   0.96
  4    0.66   0.76   0.81   0.85   0.87   0.89   0.91   0.92
  8    0.62   0.72   0.78   0.82   0.85   0.87   0.88   0.89
  16   0.60   0.71   0.77   0.81   0.84   0.86   0.87   0.88
  32   0.59   0.70   0.76   0.80   0.83   0.85   0.87   0.88
  64   0.59   0.70   0.76   0.80   0.83   0.85   0.86   0.88
  128  0.59   0.70   0.76   0.80   0.83   0.85   0.86   0.88

Fig. 3.4 Virtual output queue at the input ports.

3.2.2.2 VOQ-Based Matching