  1. Round-robin Arbiter Design and Generation
  Eung S. Shin, Prof. Vincent J. Mooney III, Prof. George F. Riley
  Electrical and Computer Engineering, Georgia Institute of Technology

  2. Outline
  • Introduction
  • Terminology
  • Related Work
  • Bus Arbiter (BA) Design
  • Switch Arbiter (SA) Design
  • Round-robin Arbiter Generator (RAG)
  • Comparison with Other Switch Arbiters
  • Conclusion

  3. Introduction
  • As the number of bus masters on a single chip increases, fast and powerful arbiters command more attention.
  • A fast arbiter is one of the dominant factors in achieving terabit switching speeds.
  • Designing arbitration for both high performance and fairness is a tedious and error-prone task.
  • Our goal is to provide a fast and fair arbiter design, together with a tool for its automatic generation.
  [Figure: a 16x16 network switch. Each input port m holds VOQ(m, 0) to VOQ(m, 15); a (16x16)x16 crossbar switch fabric connects the VOQs to the output ports under the control of sixteen 16x16 arbiters, which take requests req(0, 0) to req(15, 15) and return grants grant(0, 0-15) to grant(15, 0-15).]

  4. Terminology
  • MxN switch: an M-input by N-output switch.
    – Example: a 32x32 switch is a 32-input by 32-output switch with 1024 (32²) possible connections between input ports and output ports.
  • Virtual Output Queues (VOQs): a switch uses VOQs to remove head-of-line (HOL) blocking, in which a packet waiting for a contended output port stalls the packets queued behind it.
  • VOQ(m, n): m is the input port index and n is the output port index.
    – Example: VOQ(1, 0) is the VOQ at input port 1 that queues packets destined for output port 0.

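  5. HOL Blocking Example Without VOQs
  [Figure: input port 0 holds packets destined for output ports 1 and 0 in a single queue; input port 1 holds a packet destined for output port 0.]
  With one queue per input port, the packet for output port 1 waits behind a packet contending for output port 0, so output port 1 can sit idle even though a packet is waiting for it.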

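  6. HOL Blocking Example With VOQs
  [Figure: input port 0 holds VOQ(0, 0) with a packet for output port 0 and VOQ(0, 1) with a packet for output port 1; input port 1 holds VOQ(1, 0) with a packet for output port 0 and an empty VOQ(1, 1).]
  With a separate queue per output port, the packet in VOQ(0, 1) no longer waits behind a packet for output port 0.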
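The two HOL examples can be replayed in a few lines of Python. This is a minimal sketch under assumptions not in the slides: the traffic pattern is hypothetical (input port 0 holds one packet for output port 0; input port 1 holds a packet for output port 0 followed by one for output port 1), and the lowest-numbered input wins any contention, standing in for a real arbiter. The names `schedule_fifo` and `schedule_voq` are illustrative only.

```python
from collections import deque


def schedule_fifo(queues):
    """One cycle with a single FIFO per input port (no VOQs).

    queues[m] is a deque of destination output ports, head at the left.
    Only head-of-line packets compete; the lowest-numbered input wins.
    """
    taken, sent = set(), []
    for m, q in enumerate(queues):
        if q and q[0] not in taken:
            taken.add(q[0])
            sent.append((m, q.popleft()))
    return sent


def schedule_voq(voqs, num_outputs):
    """One cycle with VOQs; voqs[(m, n)] is the packet count of VOQ(m, n).

    Each output port n is arbitrated on its own among the non-empty
    VOQ(m, n); the lowest m wins (a stand-in for the round-robin SA).
    """
    sent = []
    for n in range(num_outputs):
        candidates = sorted(m for (m, out), count in voqs.items()
                            if out == n and count > 0)
        if candidates:
            voqs[(candidates[0], n)] -= 1
            sent.append((candidates[0], n))
    return sent


# Input 0: [to 0]; input 1: [to 0, then to 1]
print(schedule_fifo([deque([0]), deque([0, 1])]))                      # [(0, 0)]
print(schedule_voq({(0, 0): 1, (1, 0): 1, (1, 1): 1}, num_outputs=2))  # [(0, 0), (1, 1)]
```

With one FIFO per input port, output port 1 stays idle for the cycle because its packet is stuck behind a blocked head; with VOQs, each output port is arbitrated separately, so both output ports are served.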

  7. Terminology (Continued)
  • (MxV)xN switch:
    – M is the number of input ports of an MxN switch.
    – V is the number of VOQs per input port.
    – N is the number of output ports of an MxN switch.
    – Typically, V is equal to N.
    – The total number of VOQs in an MxN switch is M ∗ N.
  [Figure: a 32x32 network switch. Input ports 0 to 31 hold VOQ(0, 0) to VOQ(31, 31); a (32x32)x32 crossbar switch fabric connects them to output ports 0 to 31 under the control of 32 (32x32) arbiters, which take requests req(0, 0) to req(31, 31) and return grants grant(0, 0-31) to grant(31, 0-31).]

  8. Terminology (Continued)
  • (MxV)xN crossbar switch fabric: there are connections between the MxV inputs (from VOQ(0, 0) to VOQ(M-1, V-1)) and the N outputs, where N is the number of output ports in the switch fabric.
  • MxM Switch Arbiter (SA):
    – An MxM SA controls M specific transmission gates between M VOQs and a particular output port.
    – There are N MxM SAs in an MxN switch.
  [Figure: a (32x32)x32 crossbar switch fabric. VOQ(0, 0) to VOQ(31, 0) connect to output port 0 under SA_0, and VOQ(0, 31) to VOQ(31, 31) connect to output port 31 under SA_31; the thirty-two 32x32 SAs issue grant(0, 0) to grant(31, 31).]
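A small sketch of how the M ∗ N VOQs map onto the N MxM switch arbiters, assuming V = N and a hypothetical `occupancy[m][n]` table holding the packet count of VOQ(m, n). SA_n only looks at column n, that is, VOQ(0, n) through VOQ(M-1, n). The function name `sa_requests` is illustrative, not part of the authors' tool.

```python
def sa_requests(occupancy):
    """Request vector for each of the N MxM SAs.

    occupancy[m][n] is the packet count of VOQ(m, n) (input port m,
    output port n), assuming V = N. SA_n sees request bit m set whenever
    VOQ(m, n) is non-empty.
    """
    num_in, num_out = len(occupancy), len(occupancy[0])
    return [[occupancy[m][n] > 0 for m in range(num_in)]
            for n in range(num_out)]


# A 32x32 switch has VOQ(0, 0) .. VOQ(31, 31)
print(len([(m, n) for m in range(32) for n in range(32)]))  # 1024

# Hypothetical 2x2 switch: only VOQ(0, 1) and VOQ(1, 0) hold packets
print(sa_requests([[0, 2], [1, 0]]))  # [[False, True], [True, False]]
```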

  9. Terminology (Continued)
  • MxM distributed SA (MxM hierarchical SA): plays the same role as an MxM SA.
    – It consists of smaller switch arbiters arranged in a hierarchical tree structure.
  • Bus Arbiter (BA): resolves bus conflicts when multiple bus masters request the bus in the same cycle.
  [Figure: an 8x8 hierarchical SA built from two 4x4 ack-req SAs (SA 0 and SA 1) under a 2x2 root SA, and a 4x4 BA built from a ring counter (clock, reset, with ack passing through a D-FF) whose outputs token[0] to token[3] enable Priority Logic 0 to 3, which turn req[0] to req[3] into grant[0] to grant[3].]

  10. Related Work
  • Centralized switch arbiters:
    – Dual Round-Robin Matching algorithm (DRRM): H. J. Chao and J. S. Park, "Centralized Contention Resolution Schemes for a Larger-capacity Optical ATM Switch," Proceedings of IEEE ATM Workshop, 1998, pp. 11-16.
    – Programmable Priority Encoder (PPE) implementing the iterative round-robin algorithm (iSLIP): P. Gupta and N. McKeown, "Designing and Implementing a Fast Crossbar Scheduler," IEEE Micro, 1999, pp. 20-28; N. McKeown, P. Varaiya, and J. Walrand, "The iSLIP Scheduling Algorithm for Input-Queued Switches," IEEE/ACM Transactions on Networking, 1999, pp. 188-201.
  • Distributed switch arbiter:
    – Ping Pong Arbiter (PPA): H. J. Chao, C. H. Lam, and X. Guo, "A Fast Arbitration Scheme for Terabit Packet Switches," Proceedings of IEEE Global Telecommunications Conference, 1999, pp. 1236-1243.
  • We will show that our generated SA achieves 2.4X higher throughput than PPE and 1.9X higher throughput than PPA (and thus at least 1.9X higher than DRRM, since PPA outperforms DRRM).

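  11. Bus Arbiter Design
  • Implemented with a ring counter, which holds a token, and "priority logic".
  • Priority logic for 4 inputs (• is AND, ' is complement):
    – output[0] = EN•in[0]
    – output[1] = EN•in[0]'•in[1]
    – output[2] = EN•in[0]'•in[1]'•in[2]
    – output[3] = EN•in[0]'•in[1]'•in[2]'•in[3]
  Truth table (X = don't care):
  | EN | in[0] | in[1] | in[2] | in[3] | output[0] | output[1] | output[2] | output[3] |
  |----|-------|-------|-------|-------|-----------|-----------|-----------|-----------|
  | 0  | X     | X     | X     | X     | 0         | 0         | 0         | 0         |
  | 1  | 0     | 0     | 0     | 0     | 0         | 0         | 0         | 0         |
  | 1  | 1     | X     | X     | X     | 1         | 0         | 0         | 0         |
  | 1  | 0     | 1     | X     | X     | 0         | 1         | 0         | 0         |
  | 1  | 0     | 0     | 1     | X     | 0         | 0         | 1         | 0         |
  | 1  | 0     | 0     | 0     | 1     | 0         | 0         | 0         | 1         |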
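The priority logic equations can be checked against the truth table with a short behavioral model. This is an illustrative Python sketch (the function name `priority_logic` is hypothetical), not the gate-level design; it computes the same chain of complemented higher-priority inputs as the equations.

```python
def priority_logic(en, inputs):
    """Behavioral model of the 4-input priority logic.

    output[i] = EN * in[0]' * ... * in[i-1]' * in[i], so in[0] has the
    highest priority and at most one output is 1.
    """
    outputs = []
    blocked = not en  # EN = 0 forces every output to 0
    for bit in inputs:
        # An output fires only if its input is 1 and nothing above it fired.
        outputs.append(1 if bit and not blocked else 0)
        blocked = blocked or bool(bit)
    return tuple(outputs)


# Row "EN=1, in=0 1 X X" of the truth table, with X taken as 1 then 0
print(priority_logic(1, (0, 1, 1, 0)))  # (0, 1, 0, 0)
```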

  12. Example: Bus Arbiter
  [Figure: a 4x4 BA arbitrating the memory bus among Processors 0 to 3; the ring counter's token[2] enables Priority Logic 2 (PL 2), which receives req[0] and req[1] and drives grant[0] from its output[2]; ack feeds back to the ring counter.]
  • Condition:
    – Token = 4'b0100 → Processor 2 has the highest priority.
    – Processors 0 and 1 request the bus.
  • Result:
    – Only Priority Logic 2 is enabled.
    – Processor 0 is granted because the higher-priority parties (processors 2 and 3) do not request the bus.
    – The token is rotated to 4'b1000 after the ring counter receives the ack signal.

  13. Example: Bus Arbiter (Continued)
  [Figure: 4x4 BA schematic for the example. A ring counter (clock, reset, with ack passing through a D-FF) drives token[0] to token[3]; token[k] is the EN input of Priority Logic k, and each priority logic block receives the requests req[0] to req[3] and drives the grants grant[0] to grant[3]. Priority Logic 2 is highlighted.]
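A behavioral sketch of the 4x4 BA, assuming a one-hot token stored as an integer, a reset value of token[0] = 1 (the slides do not state it), and that Priority Logic k examines the requests starting at req[k], so the token holder has the highest priority. Following the example, the token advances one position on each ack rather than jumping past the granted processor. `BusArbiter4x4` and its methods are hypothetical names.

```python
class BusArbiter4x4:
    """Behavioral sketch of a 4x4 round-robin bus arbiter (hypothetical API)."""

    N = 4

    def __init__(self, token=0b0001):
        # One-hot ring-counter token; token[k] = 1 enables Priority Logic k.
        # The reset value 4'b0001 is an assumption.
        self.token = token

    def arbitrate(self, req):
        """Return one-hot grant[0..3] for request bits req[0..3]."""
        k = self.token.bit_length() - 1  # index of the enabled priority logic
        grant = [0] * self.N
        # Priority Logic k checks req[k], req[k+1], ... (wrapping around),
        # so the first requester found from the token position wins.
        for offset in range(self.N):
            i = (k + offset) % self.N
            if req[i]:
                grant[i] = 1
                break
        return grant

    def ack(self):
        """Rotate the token left by one position, e.g. 4'b0100 -> 4'b1000."""
        mask = (1 << self.N) - 1
        self.token = ((self.token << 1) | (self.token >> (self.N - 1))) & mask


ba = BusArbiter4x4(token=0b0100)   # processor 2 holds the token
print(ba.arbitrate([1, 1, 0, 0]))  # [1, 0, 0, 0]: processor 0 is granted
ba.ack()
print(bin(ba.token))               # 0b1000
```

Starting from token = 4'b0100 with processors 0 and 1 requesting, the sketch grants processor 0 and then rotates the token to 4'b1000, as in the example.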

  14. Switch Arbiter Design
  • A hierarchical SA consists of small switch arbiter blocks.
  • There are four types of switch arbiter blocks:
    – 2x2 ack-req SA
    – 4x4 ack-req SA
    – 2x2 root SA
    – 4x4 root SA
  • A root SA is placed at the top of a hierarchy.
  [Figure: a 4x4 (or 2x2) ack-req SA wraps a 4x4 (or 2x2) bus arbiter with clock, reset, and ack inputs, request inputs req0[0..3], grant outputs grant0[0..3], and a request output req0; a 4x4 (or 2x2) root SA consists of a ring counter and a 4x4 (or 2x2) BA without the D flip-flop, taking req0 to req3 and returning ack0 to ack3.]
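To show how the blocks compose, here is a sketch of an 8x8 hierarchical SA as a 2x2 root over two 4x4 ack-req SAs. It assumes each leaf forwards the OR of its requests to the root, that the final grant is the leaf's local choice within the leaf the root selected, and that ack advances only the root and the granted leaf; the slides do not spell out these details, and the class names `RoundRobin` and `HierarchicalSA` are hypothetical.

```python
class RoundRobin:
    """Token-based round-robin block: the requester at the token position
    has the highest priority; the token advances one position per ack."""

    def __init__(self, n):
        self.n, self.pos = n, 0  # pos = token position (reset to 0, assumed)

    def pick(self, req):
        for offset in range(self.n):
            i = (self.pos + offset) % self.n
            if req[i]:
                return i
        return None

    def ack(self):
        self.pos = (self.pos + 1) % self.n


class HierarchicalSA:
    """Root round-robin block over several leaf (ack-req) blocks."""

    def __init__(self, leaf_size, num_leaves):
        self.leaf_size = leaf_size
        self.leaves = [RoundRobin(leaf_size) for _ in range(num_leaves)]
        self.root = RoundRobin(num_leaves)
        self.last = None  # (leaf index, local index) of the last grant

    def arbitrate(self, req):
        """Return the granted input index, or None if nobody requests."""
        groups = [req[j * self.leaf_size:(j + 1) * self.leaf_size]
                  for j in range(len(self.leaves))]
        # Each leaf's aggregate request is the OR of its inputs.
        leaf = self.root.pick([any(g) for g in groups])
        if leaf is None:
            self.last = None
            return None
        local = self.leaves[leaf].pick(groups[leaf])
        self.last = (leaf, local)
        return leaf * self.leaf_size + local

    def ack(self):
        # Assumed ack routing: advance the root and the granted leaf only.
        if self.last is not None:
            self.root.ack()
            self.leaves[self.last[0]].ack()


sa = HierarchicalSA(leaf_size=4, num_leaves=2)  # 8x8 = 2x2 root over two 4x4 SAs
req = [0, 1, 0, 0, 1, 0, 0, 0]                  # inputs 1 and 4 request
print(sa.arbitrate(req))  # 1
sa.ack()
print(sa.arbitrate(req))  # 4
```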

  15. Key Insight
  • With the TSMC 0.25 µm standard-cell library from LEDA Systems, 4x4 is the "sweet spot" for high performance. This is analogous to standard-cell design, where using 4-input gates speeds up a design compared with using only 2-input gates or 8-input gates.
    – Use as many 4x4 blocks as possible.
    – Use 2x2 blocks if needed.
  [Figure: delays of 2x2, 4x4, 8x8, and 16x16 arbiters implemented as PPE, PPA, and SAs generated by RAG, where the generated SAs are hierarchies of 4x4 and 2x2 blocks (for example, a 16x16 SA built from 4x4 blocks).]
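One reading of "use as many 4x4 as possible, use 2x2 if needed" is a bottom-up greedy decomposition. The sketch below is a hypothetical rule, not necessarily the one RAG implements, and it assumes the SA size is a power of two; `decompose` is an illustrative name.

```python
def decompose(n):
    """Split an n-input SA into levels of 4x4 and 2x2 blocks, preferring 4x4.

    Returns a list of levels from the leaves up; each level lists the block
    sizes used there. Assumes n is a power of two >= 2.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    levels = []
    remaining = n  # number of request lines entering the current level
    while remaining > 1:
        size = 4 if remaining % 4 == 0 else 2  # prefer 4x4, fall back to 2x2
        levels.append([size] * (remaining // size))
        remaining //= size
    return levels


print(decompose(8))   # [[4, 4], [2]]
print(decompose(16))  # [[4, 4, 4, 4], [4]]
```

For n = 8 this yields two 4x4 leaves under a 2x2 root, the same shape as the 8x8 hierarchical SA on the terminology slide; for n = 16 it yields four 4x4 leaves under a 4x4 root.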
