Automated Generation of Round-robin Arbitration and Crossbar Switch Logic
Eung S. Shin
Advisor: Professor Vincent J. Mooney III
School of Electrical and Computer Engineering, Georgia Institute of Technology
Overview
[Figure: Multiprocessor System-on-a-Chip (SoC). GPPs, DSPs, a peripheral, and a custom logic core connect through an on-chip network, consisting of a crossbar (Xbar) and a round-robin arbiter, to four memory modules.]
2/12/2004
Arbiter Problems
A fast and powerful arbiter for an SoC
A fast arbiter for terabit switching speeds
A tedious and error-prone task
[Figure: Network Switch (16x16). Input ports with VOQs, labeled VOQ(0,0) through VOQ(16,16), feed a (16x16)x16 Crossbar Switch Fabric that drives the output ports. 16 (16x16 arbiter)s take the req(m, n) signals and produce the grant signals.]
Xbar Problems
Multiple communication channels are demanded in a multiprocessor SoC
Challenge: reducing the productivity gap
Productivity gap reduction techniques:
  - Enhancing IP core reusability
  - Developing a CAD tool
Objective
To design a fast round-robin arbiter for a bus or a network switch and to automate generation of its logic
  - The generated arbiter is employed in the crossbar (Xbar) switch arbitration logic
To automate Xbar generation, providing multiple communication paths among masters
  - The generated Xbar is customized according to user specifications
Outline
Terminology
Origin and history of problems:
  - Arbiter design: PPE and PPA
  - Crossbar switch design: "Smart" Memory
Arbiter design
Arbiter experiments
RAG: Round-robin Arbiter Generator
X-Gt: Xbar Generator
Xbar experiments
Conclusion
Terminology
MxN Switch: M-input by N-output switch
  - Example: a 32x32 switch is a 32-input by 32-output switch with 1024 (32^2) possible connections between input ports and output ports
Virtual Output Queues (VOQs): to remove possible output port contention (Head of Line (HOL) blocking)
VOQ(m, n): m is the input port index; n is the output port index
  - Example: VOQ(1, 0)
[Figure: Network Switch (32x32). Input ports 0 to 31 with VOQ(0,0) through VOQ(31,31) feed a (32x32)x32 Crossbar Switch Fabric that drives output ports 0 to 31. 32 (32x32 arbiter)s take req(0, 0) through req(31, 31) and produce grant(0, 0-31) through grant(31, 0-31).]
Terminology (Continued)
(MxV)xN Switch:
  - M: the number of input ports of an MxN switch
  - V: the number of VOQs per input port
  - N: the number of output ports of an MxN switch
  - Typically, V = N
  - The total number of VOQs in an MxN switch: M * N
[Figure: Network Switch (32x32), as on the previous slide.]
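The counting above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `voq_count` and `voq_labels` are hypothetical and do not come from the slides, and it assumes the typical case V = N.

```python
def voq_count(m_inputs: int, v_per_input: int) -> int:
    """Total VOQs in an (MxV)xN switch: V queues at each of the M input ports."""
    return m_inputs * v_per_input


def voq_labels(m_inputs: int, n_outputs: int) -> list:
    """Enumerate the (m, n) index pairs of VOQ(m, n), assuming V = N."""
    return [(m, n) for m in range(m_inputs) for n in range(n_outputs)]


# The 32x32 example from the slide, with V = N = 32.
M, N = 32, 32
labels = voq_labels(M, N)
total = voq_count(M, N)
print(total, labels[0], labels[-1])
```

For the 32x32 example this gives the 1024 VOQs mentioned earlier, indexed from VOQ(0, 0) to VOQ(31, 31).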
Terminology (Continued)
(MxV)xN crossbar switch fabric:
  - Connections between (MxV) inputs and N outputs
MxM Switch Arbiter (SA):
  - Controlling M specific transmission gates between M VOQs and a particular output port
  - N MxM SAs in an MxN switch
[Figure: (32x32)x32 Crossbar Switch Fabric. VOQ(0, 0) through VOQ(31, 0) connect to output port 0, and VOQ(0, 31) through VOQ(31, 31) connect to output port 31. Thirty-two 32x32 SAs, SA_0 through SA_31, drive grant(0, 0) through grant(31, 31).]
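As a rough illustration of how the request signals are divided among the switch arbiters, the sketch below gives SA_n the M requests req(0, n) through req(M-1, n) and lets it raise at most one grant for output port n. The function names are hypothetical, and `grant_one` is a deliberately simple placeholder (lowest index wins), not the round-robin policy the talk is about.

```python
def requests_for_output(req, n):
    """Column n of the request matrix: the M request lines seen by SA_n."""
    return [row[n] for row in req]


def grant_one(requests):
    """Placeholder policy (assumption): one-hot grant to the lowest-index requester."""
    grant = [0] * len(requests)
    for m, r in enumerate(requests):
        if r:
            grant[m] = 1
            break
    return grant


def arbitrate_switch(req):
    """Run one arbiter per output port; grant[m][n] = 1 opens the gate VOQ(m, n) -> port n."""
    m_inputs, n_outputs = len(req), len(req[0])
    grant = [[0] * n_outputs for _ in range(m_inputs)]
    for n in range(n_outputs):
        column = grant_one(requests_for_output(req, n))
        for m in range(m_inputs):
            grant[m][n] = column[m]
    return grant


# A 3x3 toy switch: req[m][n] = 1 means VOQ(m, n) holds a packet.
req = [[1, 0, 1],
       [1, 1, 0],
       [0, 1, 1]]
grant = arbitrate_switch(req)
print(grant)
```

Each column of `grant` contains at most one 1, which matches the idea that an SA opens a single transmission gate toward its output port at a time.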
Terminology (Continued)
MxM distributed SA (MxM hierarchical SA):
  - Equivalent to an MxM SA
  - Consisting of smaller switch arbiters in the form of a hierarchical tree structure
Bus Arbiter (BA): resolving bus conflicts
[Figure: 8x8 hierarchical SA. Two 4x4 SAs (SA 0 with req0[0..3] and grant0[0..3], SA 1 with req1[0..3] and grant1[0..3]) sit under a 2x2 root SA, with D flip-flops, counters, and ack-req and ack signals.]
[Figure: 4x4 BA. A ring counter (clock, reset, ack) produces token[0..3]; each token bit enables (EN) one of Priority Logic 0 through Priority Logic 3, which take req[0..3] and produce grant[0..3].]
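A behavioral sketch of the 4x4 bus arbiter may help in reading the figure. It assumes that the token bit selects which requester currently has the highest priority and that the ring counter advances by one position on each ack; both are assumptions drawn from the figure labels, and the class and method names are hypothetical.

```python
class RoundRobinBusArbiter:
    """Behavioral model of a token-based round-robin bus arbiter (a sketch, not RTL)."""

    def __init__(self, n: int = 4):
        self.n = n
        self.token = 0  # index of the single set bit in the one-hot ring counter

    def priority_logic(self, start, req):
        """Fixed-priority search starting at `start` and wrapping around."""
        for offset in range(self.n):
            i = (start + offset) % self.n
            if req[i]:
                return i
        return None

    def arbitrate(self, req):
        """Return a one-hot grant vector for the current token position."""
        winner = self.priority_logic(self.token, req)
        grant = [0] * self.n
        if winner is not None:
            grant[winner] = 1
        return grant

    def ack(self):
        """Assumption: each ack rotates the ring counter by one position."""
        self.token = (self.token + 1) % self.n


arb = RoundRobinBusArbiter(4)
req = [1, 0, 1, 1]          # requesters 0, 2 and 3 keep requesting
winners = []
for _ in range(5):
    winners.append(arb.arbitrate(req).index(1))
    arb.ack()
print(winners)
```

Because every requester holds the highest priority once per full rotation of the token, no persistent request waits longer than n acks in this model, which is the starvation-free property a round-robin scheme is meant to provide.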
Requirements for a Terabit Switch Arbiter
Starvation free
Fast arbitration
Simplicity to implement
Low power:
  - Power budget of a single-rack router: ~10 kW
History: Arbiter in PPE
Centralized Switch Arbiters:
  - Programmable Priority Encoder (PPE) implementing an iterative round-robin algorithm (iSLIP)
  - P. Gupta and N. McKeown, "Designing and Implementing a Fast Crossbar Scheduler," IEEE Micro, 1999, pp. 20-28.
  - N. McKeown, P. Varaiya, and J. Walrand, "The iSLIP Scheduling Algorithm for Input-Queued Switches," IEEE/ACM Transactions on Networking, 1999, pp. 188-201.
[Figure: PPE block diagram. The n-bit Req and the log2(n)-bit pointer P_enc enter; a tothermo block converts P_enc to the n-bit P_thermo, which masks Req to form new_Req. A Priority Encoder on Req gives Gnt_PE, and a second priority encoder on new_Req gives Gnt_PE_thermo; any_Gnt_PE_thermo selects between them to produce the n-bit Gnt.]
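The PPE datapath in the figure can be approximated in software as follows. This is a sketch under assumptions: the thermometer mask is taken to have ones at positions at or above the pointer, priority encoders are taken to favor the lowest index, and the function names are hypothetical; only the signal names in the comments come from the figure.

```python
def priority_encode(bits):
    """Plain priority encoder: index of the lowest-numbered set bit, or None."""
    for i, b in enumerate(bits):
        if b:
            return i
    return None


def to_thermo(pointer, n):
    """Thermometer mask (assumed convention): 1 at every position >= pointer."""
    return [1 if i >= pointer else 0 for i in range(n)]


def ppe_grant(req, pointer):
    """Grant the first requester at or after `pointer`, wrapping around if needed."""
    n = len(req)
    p_thermo = to_thermo(pointer, n)                    # P_thermo
    new_req = [r & t for r, t in zip(req, p_thermo)]    # new_Req
    gnt_pe_thermo = priority_encode(new_req)            # Gnt_PE_thermo
    gnt_pe = priority_encode(req)                       # Gnt_PE
    # any_Gnt_PE_thermo: prefer the masked result, else fall back to the plain one.
    return gnt_pe_thermo if gnt_pe_thermo is not None else gnt_pe


print(ppe_grant([1, 0, 1, 0], pointer=1))
print(ppe_grant([1, 0, 0, 0], pointer=1))
```

Running both encoders side by side and selecting afterwards avoids a wrap-around search, at the cost of duplicating the priority encoder.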
History: Arbiter in PPA
Distributed Switch Arbiter:
  - Ping Pong Arbiter (PPA)
  - H. J. Chao, C. H. Lam, and X. Guo, "A Fast Arbitration Scheme for Terabit Packet Switches," Proceedings of IEEE Global Telecommunications Conference, 1999, pp. 1236-1243.
Comparison: our generated SA is 2.3X faster than PPE and 1.8X faster than PPA
[Figure: PPA tree. Four layers of 2x2 PPAs (leaf, intermediate, and root PPAs) arbitrate among inputs 1 to 16. Each 2x2 PPA has request inputs r0 and r1, grants g0 and g1, external grant signals Gg0 and Gg1, flag signals Fi and Fo, and a clocked D flip-flop.]
Why do we need an arbiter for an SoC?
Arbitration is required by all buses
Our arbiter is applicable anywhere arbitration is required
The generated arbiter is utilized in our Xbar