INPUT-BUFFERED SWITCHES 70

TABLE 3.2 A Part of the Logic Table of the Bidirectional Arbiter

Level Combination
DL   DH   UH   GIᵃ    ACK   Next GIᵃ
H    H    H    H      L     L
H    H    H    L      L     L
H    H    L    H      L     H
H    H    L    L      L     H
L    H    H    H      L     H
L    H    H    L      L     H
L    H    L    H      L     H
L    H    L    L      H     L
H    L    H    H      H     L
H    L    H    L      L     L
H    L    L    H      H     L     (xp4, case 1 in Fig. 3.15)
H    L    L    L      L     H
L    L    H    H      H     L
L    L    H    L      L     L     (xp1, case 1 in Fig. 3.15)
L    L    L    H      H     L
L    L    L    L      H     L     (xp1, case 2 in Fig. 3.15)

ᵃ Group Indication: indicates which group the crosspoint belongs to.

The arbitration procedure can be divided into the two cases shown in Figure 3.15. In case 1, group H is active, or both groups L and H are active; the highest request in group H is selected. In case 2, only group L is active, and the highest request in group L is selected.

In case 1, the DH signal locates the highest crosspoint in group H, xp4, and triggers the ACK signal. The UH signal indicates to group L that there is at least one request in group H. The DL signal finds the highest crosspoint in group L, xp1, but xp1 sends no ACK signal because of the state of the UH signal. In the next cell period, xp1 to xp4 form group L, and xp5 forms group H.

In case 2, the UH signal shows that there is no request in group H; the DL signal finds the highest request in group L, xp1, and triggers the ACK signal. Note that in both cases the selected crosspoint is the same one that a theoretical RR arbiter would select.

The bidirectional arbiter operates twice as fast as a normal ring arbiter (unidirectional arbiter), but it requires twice as many transmitted signals. It can be implemented with simple hardware: only about 200 gates are required at each crosspoint to perform the distributed arbitration according to the logic table in Table 3.2.
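The claim that the two-case group rule picks the same crosspoint as a round-robin arbiter can be checked with a small behavioral model. The sketch below is a minimal Python illustration under stated assumptions, not a gate-level model of the DL/DH/UH circuitry: crosspoints are indexed 0 to N − 1 from the top (index 0 is xp1), "highest" is taken to mean topmost, and a hypothetical `last` index marks the most recently granted crosspoint, so group L holds indices 0..last and group H holds the rest. All function names are illustrative.

```python
from itertools import product
from typing import List, Optional, Sequence


def group_indication(n_ports: int, last: int) -> List[bool]:
    """Assumed GI per crosspoint: True = group H (below the last grant), False = group L."""
    return [i > last for i in range(n_ports)]


def bidirectional_select(requests: Sequence[bool], gi: Sequence[bool]) -> Optional[int]:
    """Case 1: grant the topmost request in group H.
    Case 2: if group H has no request, grant the topmost request in group L."""
    in_h = [i for i, (r, h) in enumerate(zip(requests, gi)) if r and h]
    if in_h:  # case 1: group H has at least one request (what UH tells group L)
        return in_h[0]
    in_l = [i for i, (r, h) in enumerate(zip(requests, gi)) if r and not h]
    return in_l[0] if in_l else None  # case 2, or no request at all


def rr_select(requests: Sequence[bool], last: int) -> Optional[int]:
    """Reference RR arbiter: cyclic search starting just after `last`."""
    n = len(requests)
    for step in range(1, n + 1):
        i = (last + step) % n
        if requests[i]:
            return i
    return None


def check_against_rr(n_ports: int) -> bool:
    """Exhaustively compare both arbiters over all request patterns and pointers."""
    for bits in product([False, True], repeat=n_ports):
        for last in range(n_ports):
            gi = group_indication(n_ports, last)
            if bidirectional_select(bits, gi) != rr_select(bits, last):
                return False
    return True


print(check_against_rr(5))  # True
```

As a usage example, take requests at xp1 and xp4 with a hypothetical `last = 2` (group H = xp4, xp5), a configuration consistent with case 1: `bidirectional_select([True, False, False, True, False], group_indication(5, 2))` grants index 3 (xp4), and `group_indication(5, 3)` then places xp1 to xp4 in group L and xp5 in group H, matching the next-cell-period grouping described above.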

3.3.6.2 Token Tunneling

This section introduces a more efficient arbitration mechanism called token tunneling [4]. Input requests are arranged into groups. A token starts at the RR pointer position and runs through all requests before selecting the one with the highest priority. As shown in Figure 3.16(a), when the token arrives in a group where all requests are 0, it can skip this group by taking the tunnel directly from the input of the group to its output. The arbitration time thus becomes proportional to the number of groups instead of the number of ports.

Fig. 3.16 The token tunneling scheme. © 2000 IEEE.

Suppose each group handles $n$ requests. Each crosspoint unit (XPU) contributes a two-gate delay for arbitration. The token rippling through all the XPUs in the group where it is generated and all the XPUs in the group where it is terminated contributes a $4n$-gate delay. There are altogether $N/n$ groups, and at most $N/n - 2$ groups will be tunneled through, with 2 gates of delay contributed by the tunneling in each group. This contributes a $2(N/n - 2)$-gate delay. Therefore, the worst-case time complexity of the basic token tunneling method is

$D_t = 4n + 2(N/n - 2)$

gates of delay. This occurs when there is only one active request and it is at the farthest position from the round-robin pointer; for example, the request is at the bottommost XPU while the round-robin pointer points to the topmost one.

By tunneling through smaller XPU groups of size $g$ and having a hierarchy of these groups, as shown in Figure 3.16(b), it is possible to further reduce the worst-case arbitration delay to

$D_t = 4g + 5d + 2(N/n - 2)$

gates, where $d = \lceil \log_2 (n/g) \rceil$. The hierarchical method basically decreases the time spent in the group where the token is generated and in the group where the token is terminated. For $N = 256$, $n = 16$, and $g = 2$, the basic token tunneling method requires a 92-gate delay, whereas the hierarchical method requires only a 51-gate delay.
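To make the delay accounting concrete, here is a minimal behavioral sketch of basic token tunneling in Python. It assumes the cost rules used in the counting above: each XPU the token ripples through adds 2 gate delays, and each all-zero group other than the generating group is skipped through its tunnel for 2 gate delays. The handling of the pointer (the token starts at index `start`, inclusive, and moves toward higher indices with wraparound) and all function names are assumptions for illustration, not the circuit of [4].

```python
from typing import List, Optional, Tuple


def token_tunnel(requests: List[bool], n: int, start: int) -> Tuple[Optional[int], int]:
    """Return (granted index, gate delay) for N requests in groups of n (assumed model)."""
    N = len(requests)
    if N % n:
        raise ValueError("N must be a multiple of the group size n")
    if not any(requests):
        return None, 0                    # nothing to grant
    delay, pos, generating = 0, start, True
    while True:
        base = (pos // n) * n             # first XPU of the current group
        if not generating and not any(requests[base:base + n]):
            delay += 2                    # all-zero group: take the tunnel
            pos = (base + n) % N
            continue
        generating = False
        for i in range(pos, base + n):    # ripple XPU by XPU, 2 gates each
            delay += 2
            if requests[i]:
                return i, delay
        pos = (base + n) % N              # leave the group, wrapping at the end


def rr_first(requests: List[bool], start: int) -> Optional[int]:
    """Reference RR search: first request at or after `start`, cyclically."""
    N = len(requests)
    for k in range(N):
        if requests[(start + k) % N]:
            return (start + k) % N
    return None


def worst_single_request(N: int, n: int) -> int:
    """Largest delay over every single-request pattern and pointer position."""
    worst = 0
    for r in range(N):
        requests = [i == r for i in range(N)]
        for start in range(N):
            worst = max(worst, token_tunnel(requests, n, start)[1])
    return worst


requests = [False] * 256
requests[255] = True                      # single request at the bottommost XPU
print(token_tunnel(requests, 16, 0))      # (255, 92)
print(worst_single_request(16, 4), 4 * 4 + 2 * (16 // 4 - 2))  # 20 20
```

Under these assumptions the token grants the same request as a cyclic RR search, and the largest delay over all single-request cases equals $4n + 2(N/n - 2)$, attained when the pointer sits at the top of one group and the request at the bottom of the group just before it.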

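The two closed-form worst-case delays can also be evaluated directly for other switch sizes. The minimal Python sketch below simply transcribes the formulas given above, with $d = \lceil \log_2 (n/g) \rceil$; the function names are illustrative.

```python
import math


def basic_delay(N: int, n: int) -> int:
    """Basic token tunneling: D_t = 4n + 2(N/n - 2) gate delays."""
    return 4 * n + 2 * (N // n - 2)


def hierarchical_delay(N: int, n: int, g: int) -> int:
    """Hierarchical tunneling with subgroups of size g:
    4g + 5d + 2(N/n - 2) gate delays, where d = ceil(log2(n/g))."""
    d = math.ceil(math.log2(n / g))
    return 4 * g + 5 * d + 2 * (N // n - 2)


print(basic_delay(256, 16))            # 92
print(hierarchical_delay(256, 16, 2))  # 51
```

Both expressions share the tunneling term $2(N/n - 2)$, which is 28 gates here; the hierarchy only shrinks the ripple term in the generating and terminating groups, from $4n = 64$ to $4g + 5d = 23$ gates for $n = 16$ and $g = 2$.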
3.4 OUTPUT-QUEUING EMULATION