Round-Robin Greedy Scheduling SCHEDULING ALGORITHMS

first input advances to point to output B in time slot 2, and a cell destined for B is chosen and then granted because there are no contenders. The other two inputs have their HOL cells unchanged, both destined for output A. Only one of them (the one from the second input) is granted, and the other has to wait until the third time slot. At that time, the round-robin pointers among the three inputs have been desynchronized and point to C, B, and A, respectively. As a result, all three cells chosen are granted.

Figure 3.12 shows the tail probability under FIFO + RR (FIFO for input selection and RR for round-robin arbitration), DRR, and iSLIP arbitration schemes. The switch size is 256, and the average burst length is 10 cell slots (with the on-off model). The performance of DRR and iSLIP is comparable at a speedup of 2, while all three schemes have almost the same performance at speedups $c \geq 3$.

3.3.5 Round-Robin Greedy Scheduling

Although iSLIP and DRRM are efficient scheduling algorithms that achieve high switch throughput, they have a timing constraint: the scheduling has to be completed within one cell time slot. That constraint can become a bottleneck when the switch size or the port speed increases. For instance, for a 64-byte fixed-length cell at a port speed of 40 Gbit/s (OC-768), the computation time available for maximal-sized matching is only 12.8 ns (512 bits at 40 Gbit/s). To relax the scheduling timing constraint, a pipeline-based scheduling algorithm called round-robin greedy scheduling (RRGS) was proposed by Smiljanić et al. [30].

Before we describe pipelined RRGS, we describe nonpipelined RRGS. Consider an $N \times N$ crossbar switch, where each input port $i$, $i \in \{0, 1, \ldots, N-1\}$, has $N$ logical queues, one for each of the $N$ outputs. All packets are fixed-size cells. The input of the RRGS protocol is the state of all input/output queues, or a set $C = \{(i, j) \mid \text{there is at least one packet at input } i \text{ for output } j\}$. The output of the protocol is a schedule, or a set $S = \{(i, j) \mid \text{a packet will be sent from input } i \text{ to output } j\}$. Note that in each time slot, an input can transmit only one packet, and an output can receive only one packet. In the steps below, the subscript $k$ marks the sets that belong to the $k$th time slot. The schedule for the $k$th time slot is determined as follows:

- Step 1: $I_k = \{0, 1, \ldots, N-1\}$ is the set of all inputs; $O_k = \{0, 1, \ldots, N-1\}$ is the set of all outputs. Set $i = (\text{const} - k) \bmod N$ (this choice of the input that starts a schedule enables a simple implementation).
- Step 2: If $I_k$ is empty, stop; otherwise, choose the next input in a round-robin fashion according to $i = (i + 1) \bmod N$.
- Step 3: Choose in a round-robin fashion the output $j$ from $O_k$ such that $(i, j) \in C_k$. If there is none, remove $i$ from $I_k$ and go to step 2.
- Step 4: Remove input $i$ from $I_k$, and output $j$ from $O_k$. Add $(i, j)$ to $S_k$. Go to step 2.
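The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [30]: the function name `rrgs_schedule`, the per-input output pointers `out_ptr`, and the rule that a pointer moves one position past the granted output are hypothetical, since the text only says that outputs are chosen in a round-robin fashion. The sketch follows the steps literally, so the first input served is the one after $(\text{const} - k) \bmod N$.

```python
def rrgs_schedule(C, N, k, const=0, out_ptr=None):
    """Return a schedule S_k for time slot k (nonpipelined RRGS sketch).

    C       -- set of (i, j) pairs: input i holds at least one cell for output j
    out_ptr -- assumed per-input round-robin pointers over the outputs
    """
    if out_ptr is None:
        out_ptr = [0] * N
    I_k = set(range(N))            # Step 1: all inputs still to be served
    O_k = set(range(N))            # Step 1: all outputs still free
    S_k = set()
    i = (const - k) % N            # Step 1: starting point of the input order

    while I_k:                     # Step 2: stop when no inputs remain
        i = (i + 1) % N            # Step 2: next input in round-robin order

        # Step 3: search the free outputs in round-robin order from the pointer.
        j = None
        for d in range(N):
            cand = (out_ptr[i] + d) % N
            if cand in O_k and (i, cand) in C:
                j = cand
                break

        I_k.discard(i)             # input i is done in Step 3 or Step 4
        if j is None:
            continue               # Step 3: nothing to send, go to Step 2

        O_k.discard(j)             # Step 4: reserve output j
        S_k.add((i, j))
        out_ptr[i] = (j + 1) % N   # assumed pointer update rule

    return S_k


# Usage: three inputs contend for output 0; input 1 also has a cell for output 1.
C = {(0, 0), (1, 0), (2, 0), (1, 1)}
print(sorted(rrgs_schedule(C, N=3, k=0)))   # [(1, 0)]
print(sorted(rrgs_schedule(C, N=3, k=1)))   # [(0, 0), (1, 1)]
```

The two calls differ only in $k$, which shifts the order in which the inputs are served and therefore changes which input wins output 0.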
The scheduling of a given time slot in RRGS consists of $N$ phases. In each phase, one input chooses one of the remaining outputs for transmission during that time slot. A phase consists of the request from the input module (IM) to the RR arbiter, RR selection, and acknowledgement from the RR arbiter to the IM. The RR order in which inputs choose outputs shifts cyclically for each time slot, so that it ensures equal access for all inputs.

Now, we describe pipelined RRGS by applying the nonpipelined concept. We assume that $N$ is an odd number to simplify the discussion. When $N$ is an even number, the basic concept is the same, but there are minor changes. $N$ separate schedules are in progress simultaneously, for $N$ distinct time slots in the future. Each phase of a particular schedule involves just one input. In any given time slot, other inputs may simultaneously perform the phases of schedules for other distinct time slots in the future. While $N$ time slots are required to complete the $N$ phases of a given schedule, $N$ phases of $N$ different schedules may be completed within one time slot using a pipeline approach, and computing the $N$ schedules in parallel. But this is effectively equivalent to the completion of one schedule every time slot.

In RRGS, a specific time slot in the future is made available to the inputs for scheduling in a round-robin fashion. Input $i$, which starts a schedule for the $k$th time slot (in the future), chooses an output in a RR fashion, and sends to the next input, $(i + 1) \bmod N$, a set $O_k$ that indicates the remaining output ports that are still free to receive packets during the $k$th time slot. Any input $i$, on receiving from the previous input, $(i - 1) \bmod N$, the set $O_k$ of available outputs for the $k$th time slot, chooses one output if possible from this set, and sends to the next input, $(i + 1) \bmod N$, the modified set $O_k$, if input $i$ did not complete the schedule for the $k$th time slot, $T_k$.
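The overlap of $N$ schedules can be made concrete with a small sketch. The timing used here is an assumption that is consistent with the description but not spelled out in it: the schedule for slot $k$ is taken to start at slot $k - N$ at input $(\text{const} - k) \bmod N$ and to move to the next input once per time slot. The function name `pipeline_assignment` is hypothetical. Under this assumed timing, an odd $N$ keeps every input busy with exactly one schedule per slot, while an even $N$ produces collisions, which may be why the even case needs the minor changes mentioned above.

```python
def pipeline_assignment(N, t, const=0):
    """Map each input to the future slot(s) whose schedule it handles at slot t.

    Assumed timing (not stated in the text): the schedule for slot k starts at
    slot k - N at input (const - k) mod N and advances one input per slot.
    """
    assignment = {}
    for k in range(t + 1, t + N + 1):   # the N schedules in progress at slot t
        phase = t - (k - N)             # phase index of schedule k, 0 .. N-1
        i = (const - k + phase) % N     # input that handles this phase
        assignment.setdefault(i, []).append(k)
    return assignment


# Usage: with N = 5 every input works on a different future slot.
print(pipeline_assignment(5, t=0))
# With N = 4 two schedules land on the same input under this timing.
print(pipeline_assignment(4, t=0))
```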
An input $i$ that completes a schedule for the $k$th time slot should not forward the modified set $O_k$ to the next input, $(i + 1) \bmod N$. Thus input $(i + 1) \bmod N$, which did not receive the set $O_k$ in the current time slot, will start a new schedule (for a new time slot) in the next time slot. Step 1 of RRGS implies that an input refrains from forwarding $O_k$ once in $N$ time slots. The input $i$ that does not forward $O_k$ should be the last one that chooses an output for the $k$th time slot. Thus, since each RR arbiter has to select only one candidate within one time slot, RRGS dramatically relaxes the scheduling time constraint compared with iSLIP and DRRM.

Figure 3.13 shows an example of a timing diagram of a pipelined $5 \times 5$ RRGS algorithm.

A simple structure for the central controller that executes the RRGS algorithm is shown in Figure 3.14. A RR arbiter associated with each input module communicates only with the RR arbiters associated with adjacent input modules, and the complex interconnection between input and output modules is avoided. It stores addresses of the reserved outputs in the memory (M).

Fig. 3.14 Central controller for RRGS protocol.

The average delay of RRGS, $D_{\mathrm{RRGS}}$, is approximately given as [30]

$$D_{\mathrm{RRGS}} = \frac{1 - p/N}{1 - p} + \frac{N}{2}. \tag{3.1}$$

The price that RRGS pays for its simplicity is the additional pipeline delay, which is on average equal to $N/2$ time slots. This pipeline delay is not critical for the assumed very short packet transmission time. Smiljanić et al. [30] showed that RRGS provides better performance than iSLIP with one iteration for a heavy load.

Smiljanić also proposed so-called weighted RRGS (WRRGS), which guarantees prereserved bandwidth [31], while preserving the advantage of RRGS. WRRGS can flexibly share the bandwidth of any output among the inputs.
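To get a feel for how the pipeline term weighs in the average delay of RRGS, the approximation (3.1), $(1 - p/N)/(1 - p) + N/2$, can be evaluated numerically. The sketch below assumes that $p$ is the offered load with $0 \le p < 1$, which the text does not state explicitly; the function name `rrgs_average_delay` is hypothetical.

```python
def rrgs_average_delay(p, N):
    """Approximate average RRGS delay in time slots, following Eq. (3.1).

    p -- assumed to be the offered load, 0 <= p < 1 (not defined in the text)
    N -- switch size
    """
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    queueing_term = (1 - p / N) / (1 - p)
    pipeline_term = N / 2          # average pipeline delay of N/2 time slots
    return queueing_term + pipeline_term


# Usage: the pipeline term dominates at light load and for large N.
for p in (0.1, 0.5, 0.9):
    print(p, rrgs_average_delay(p, N=5), rrgs_average_delay(p, N=65))
```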

3.3.6 Design of Round-Robin Arbiters r