Comparison of Channel Protocols for Fast, Low Energy Communication over Transmission Lines Shomit Das*, Kenneth S. Stevens The University of Utah *now with AMD Research
Exascale Challenges
Cost of data movement relative to cost of a flop*
Data movement energy component**
* J. Shalf et al., Exascale Computing Technology Challenges, LNCS 2011
** G. Kestor et al., Quantifying the energy cost of data movement in scientific applications, IISWC 2013
Interconnect Challenges *Shekhar Borkar, Exascale Computing- Fact or Fiction? IPDPS 2013
Global Interconnect AMD NVidia Exascale System Architecture Examples (proposed)
Transmission Lines
- Transmission lines require thick top-level metals
- They require carefully designed signal and return paths
- Signal integrity depends on interconnect aspect ratio, among many other factors
- Bandwidth per unit area suffers as a result
- Analog signaling techniques such as differential signaling and current-mode signaling are applied
- Higher frequencies can be used
- SerDes means more timing and energy considerations
[Figures: Repeated RC wire; SerDes-based TL interconnect (on-chip transmission lines)]
Transmission Lines – Design and Simulation
[Figures: 7 mm TL; Transmission Line Interconnect Design Environment]
* H.G. Rhew et al., A 22Gb/s, 10 mm on-chip serial link over lossy Transmission Line, ESSCIRC 2012
Communication Protocols
- Dual Rail
- Bundled Data 4-phase
- Bundled Data 2-phase
- Source Asynchronous Signaling
- Clocked latched
- Clocked flopped
- Source Synchronous
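The difference between the 4-phase and 2-phase bundled-data handshakes in the list above can be illustrated with a toy event trace. This is a minimal sketch based on the textbook definitions of return-to-zero and transition signaling, not on the models used in this talk; the function names are hypothetical.

```python
# Toy trace of req/ack wire events for ONE handshake transaction.
# Assumption: textbook definitions of the two handshakes; names are illustrative.

def four_phase_events():
    """Return-to-zero handshake: both wires rise, then both return to zero."""
    return [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]

def two_phase_events(req=0, ack=0):
    """Transition signaling: each wire toggles exactly once per transaction."""
    return [("req", 1 - req), ("ack", 1 - ack)]

def link_traversals(events):
    """Each event must cross the link before the other side can respond."""
    return len(events)

if __name__ == "__main__":
    print("4-phase traversals:", link_traversals(four_phase_events()))
    print("2-phase traversals:", link_traversals(two_phase_events()))
```

On a long link every traversal costs one wire delay, which is why the number of events per transaction matters for the protocols compared later.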
Communication Protocols - SAS Source Asynchronous Signaling (uncoupling req and ack)
Metrics
Models – Cycle Time Cycle Time expressions
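The cycle time expressions themselves are not reproduced in this text. As a rough illustration only, the sketch below uses a first-order approximation in which cycle time is the number of link traversals per transaction times the one-way wire delay, plus a fixed logic overhead. The traversal counts, the "decoupled" entry (standing in for a source-timed scheme whose throughput does not depend on wire delay), and all parameter names are assumptions, not the authors' model.

```python
# First-order cycle time sketch (NOT the expressions from the slides).
# Assumptions: t_wire = one-way link delay, t_logic = fixed controller overhead,
# both in the same (arbitrary) time unit; traversal counts are textbook values.

TRAVERSALS_PER_TRANSACTION = {
    "BD4": 4,        # 4-phase: req up, ack up, req down, ack down
    "BD2": 2,        # 2-phase: one req toggle, one ack toggle
    "decoupled": 0,  # hypothetical source-timed case: wire delay off the cycle
}

def cycle_time(protocol, t_wire, t_logic):
    """Approximate cycle time = traversals * wire delay + logic overhead."""
    return TRAVERSALS_PER_TRANSACTION[protocol] * t_wire + t_logic

if __name__ == "__main__":
    for name in TRAVERSALS_PER_TRANSACTION:
        print(name, cycle_time(name, t_wire=100, t_logic=50))
```

Under these assumptions, the handshake protocols slow down in proportion to wire delay, while a scheme that decouples throughput from wire latency does not.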
Models – Latency Latency expressions
Models – Energy Energy per transaction expressions
Comparisons – Cycle Time
[Figure: Cycle Time Comparison, RC vs TL. Two bar charts (RC and TL) over DualRail, BD4, BD2, SAS, Clock_l, Clock_f, SrcSync]
Comparisons – Latency
[Figure: Latency Comparison, RC vs TL. Two bar charts (RC and TL) over DualRail, BD4, BD2, SAS, Clock_l, Clock_f, SrcSync]
Comparisons – Energy
[Figure: Energy Comparison, RC vs TL. Two bar charts (RC and TL) over DualRail, BD4, BD2, SAS, Clock_l, Clock_f, SrcSync]
Key Observations
- Clocked protocols have better timing characteristics
- Clock distribution energy is a killer
- Single "cycle" communication due to discontinuity-free requirement of TLs
- SAS provides clocked-like timing without the energy overhead of clock distribution
- Longer distances are more "manageable" using Transmission Lines
- SAS outperforms other protocols in almost all metrics
- No wave pipelining
- SAS is robust to variation due to decoupled throughput and wire latency
[Figure: Percentage difference in Cycle Time, TL and RC, over DualRail, BD4, BD2, SAS, Clock_l, Clock_f, SrcSync. Effect of link length (7 mm vs 3 mm)]