Multi-core Architectures Interconnect Technology Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/ E-mail: viren@ee.iitb.ac.in CS-683: Advanced Computer Architecture Lecture 29 (30 Oct 2013) CADSL
Topology Summary • First network design decision • Critical impact on network latency and throughput – Hop count provides first order approximation of message latency – Bottleneck channels determine saturation throughput CADSL 30 Oct 2013 CS-683@IITB 2
Routing Summary • Latency is the paramount concern – Minimal routing most common for NoC – Non-minimal can avoid congestion and deliver low latency • To date: NoC research favors dimension-order routing (DOR) for simplicity and deadlock freedom – On-chip networks often lightly loaded • Only covered unicast routing – Recent work on extending on-chip routing to support multicast
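As an illustration of the DOR bullet above, here is a minimal sketch of XY dimension-order routing, assuming a 2D mesh with nodes addressed by (x, y) coordinates. The function name `xy_route` and the coordinate convention are hypothetical and do not come from the lecture.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, correcting X first, then Y.

    Assumes a 2D mesh whose nodes are (x, y) tuples (an illustrative choice).
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Route fully in the X dimension first.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Only then route in the Y dimension.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
hops = len(path) - 1  # hop count, the first-order latency estimate
```

Because every packet finishes its X hops before taking any Y hop, the route is minimal and fixed for a given source and destination pair.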
Switching/Flow Control Overview • Topology: determines connectivity of network • Routing: determines paths through network • Flow Control: determines allocation of resources to messages as they traverse network – Buffers and links – Significant impact on throughput and latency of network
Packets • Messages: composed of one or more packets – If message size is <= maximum packet size, only one packet is created • Packets: composed of one or more flits • Flit: flow control digit • Phit: physical digit – Subdivides flit into chunks equal to link width – In on-chip networks, flit size == phit size ● Due to very wide on-chip channels
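The message, packet, and flit hierarchy above can be sketched as follows. This is only an illustration: the function `packetize`, the byte sizes, and the "head+tail" label for single-flit packets are assumptions, not definitions from the lecture.

```python
def packetize(message_bytes, max_packet_bytes, flit_bytes):
    """Split a message into packets and label the flits of each packet.

    All sizes are in bytes and are hypothetical example parameters.
    """
    packets = []
    for start in range(0, message_bytes, max_packet_bytes):
        size = min(max_packet_bytes, message_bytes - start)
        num_flits = -(-size // flit_bytes)  # ceiling division
        if num_flits == 1:
            kinds = ["head+tail"]  # a single flit plays both roles
        else:
            kinds = ["head"] + ["body"] * (num_flits - 2) + ["tail"]
        packets.append(kinds)
    return packets


one_packet = packetize(64, 64, 16)   # message fits in one packet
two_packets = packetize(80, 64, 16)  # message exceeds the maximum packet size
```

With a 64-byte maximum packet and 16-byte flits, the 64-byte message yields one four-flit packet, while the 80-byte message yields a second, single-flit packet.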
Switching • Different flow control techniques based on granularity • Circuit-switching: operates at the granularity of messages • Packet-based: allocation made to whole packets • Flit-based: allocation made on a flit-by-flit basis
Virtual Cut Through • Packet-based: similar to Store and Forward (SAF) • Links and buffers allocated to entire packets • Flits can proceed to next hop before tail flit has been received by current router – But only if next router has enough buffer space for entire packet • Reduces the latency significantly compared to SAF • But still requires large buffers
Virtual Cut Through Example (figure omitted) • Lower per-hop latency • Larger buffering required
Flit Level Flow Control • Wormhole flow control • Flit can proceed to next router when there is buffer space available for that flit – Improves over SAF and VCT by allocating buffers on a flit basis • Pros – More efficient buffer utilization (good for on-chip) – Low latency • Cons – Poor link utilization: if head flit becomes blocked, the channels held by the packet remain idle
Wormhole Example (figure annotations) – Red holds this channel: channel remains idle until red proceeds – Channel idle but red packet blocked behind blue – Buffer full: blue cannot proceed – Blocked by other packets • 6 flit buffers/input port
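The difference between store-and-forward, virtual cut-through, and wormhole comes down to when a head flit may advance. The sketch below encodes those conditions under simplifying assumptions (one packet per input, buffer space counted in flits). The function `can_forward_head` and its parameters are hypothetical names chosen for illustration.

```python
def can_forward_head(scheme, downstream_free_slots, packet_flits, flits_buffered_here):
    """Decide whether the head flit may move to the next router.

    scheme: "SAF", "VCT", or "wormhole" (labels chosen for this sketch).
    """
    if scheme == "SAF":
        # Whole packet must have arrived, and the next router must hold all of it.
        return (flits_buffered_here == packet_flits
                and downstream_free_slots >= packet_flits)
    if scheme == "VCT":
        # Head may cut through early, but space for the entire packet is required.
        return flits_buffered_here >= 1 and downstream_free_slots >= packet_flits
    if scheme == "wormhole":
        # Space for a single flit is enough.
        return flits_buffered_here >= 1 and downstream_free_slots >= 1
    raise ValueError(f"unknown scheme: {scheme}")


# A 5-flit packet with 2 flits received so far and 3 free slots downstream.
decisions = {s: can_forward_head(s, 3, 5, 2) for s in ("SAF", "VCT", "wormhole")}
```

In this scenario only wormhole lets the head flit advance, which is why it needs the least buffering of the three.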
Virtual Channel Flow Control • Virtual channels used to combat head-of-line (HOL) blocking in wormhole • Virtual channels (VCs): multiple flit queues per input port – Share same physical link (channel) • Link utilization improved – Flits on different VCs can pass blocked packet
Virtual Channel Example (figure annotations) – Buffer full: blue cannot proceed – Blocked by other packets • 6 flit buffers/input port • 3 flit buffers/VC
Deadlock • Using flow control to guarantee deadlock freedom gives more flexible routing • Escape Virtual Channels – If routing algorithm is not deadlock free, VCs can break resource cycle – Place restriction on VC allocation or require one VC to be DOR • Assign different message classes to different VCs to prevent protocol-level deadlock – Prevent req-ack message cycles
Buffer Backpressure • Need mechanism to prevent buffer overflow – Avoid dropping packets – Upstream nodes need to know buffer availability at downstream routers • Significant impact on throughput achieved by flow control • Credits • On-off
Credit-Based Flow Control • Upstream router stores credit counts for each downstream VC • Upstream router – When flit forwarded ● Decrement credit count – Count == 0, buffer full, stop sending • Downstream router – When flit forwarded and buffer freed ● Send credit to upstream router ● Upstream increments credit count
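The credit bookkeeping described above can be sketched for a single downstream VC. This is a simplified model: the class `CreditLink` is a hypothetical name, and the credit return is treated as instantaneous, which ignores the round-trip delay.

```python
class CreditLink:
    """One upstream credit counter paired with one downstream VC buffer."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # upstream view of free downstream slots
        self.downstream_buffer = []

    def send(self, flit):
        """Upstream side: forward a flit only if a credit is available."""
        if self.credits == 0:
            return False  # buffer full, stop sending
        self.credits -= 1
        self.downstream_buffer.append(flit)
        return True

    def forward_downstream(self):
        """Downstream side: forward a flit, free its slot, return a credit."""
        if not self.downstream_buffer:
            return None
        flit = self.downstream_buffer.pop(0)
        self.credits += 1  # assumption: credit arrives upstream immediately
        return flit


link = CreditLink(buffer_slots=2)
results = [link.send("a"), link.send("b"), link.send("c")]  # third send stalls
freed = link.forward_downstream()
retry = link.send("c")  # succeeds once a credit has come back
```

The upstream router never needs to inspect the downstream buffer directly, because the credit count alone tells it whether a slot is free.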
Credit Timeline (figure: timeline between Node 1 and Node 2 over t1 to t5: flit departs router, credit is returned upstream and processed, next flit is sent and processed; the span is labelled credit round-trip delay) • Round-trip credit delay: – Time between when buffer empties and when next flit can be processed from that buffer entry – If only single entry buffer, would result in significant throughput degradation – Important to size buffers to tolerate credit turnaround
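A rough way to quantify the buffer sizing point above, assuming a link that can carry one flit per cycle and a buffer slot that can be reused only once per credit round trip. The function name and the example numbers are illustrative assumptions.

```python
def max_link_throughput(buffer_slots, credit_round_trip_cycles):
    """Upper bound on flits per cycle under the assumptions stated above."""
    return min(1.0, buffer_slots / credit_round_trip_cycles)


single_entry = max_link_throughput(1, 4)  # one buffer, 4-cycle round trip
well_sized = max_link_throughput(4, 4)    # buffers cover the round trip
```

Under this model, a single-entry buffer with a 4-cycle round trip keeps the link busy only a quarter of the time, while four entries are enough to cover the turnaround.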
On-Off Flow Control • Credit: requires upstream signaling for every flit • On-off: decreases upstream signaling • Off signal – Sent when number of free buffers falls below threshold Foff • On signal – Sent when number of free buffers rises above threshold Fon
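The threshold behaviour above can be sketched from the downstream router's point of view. This is a minimal model that ignores signal propagation delay; the class `OnOffReceiver` and the example thresholds are hypothetical.

```python
class OnOffReceiver:
    """Downstream buffer that emits "off"/"on" signals at the two thresholds."""

    def __init__(self, slots, f_off, f_on):
        self.free = slots
        self.f_off = f_off
        self.f_on = f_on
        self.sender_on = True  # last signal state communicated upstream

    def accept_flit(self):
        """Store an arriving flit; return "off" if the Foff threshold is crossed."""
        if self.free == 0:
            raise OverflowError("flit arrived with no free buffer")
        self.free -= 1
        if self.sender_on and self.free < self.f_off:
            self.sender_on = False
            return "off"
        return None

    def drain_flit(self):
        """Forward a buffered flit; return "on" if the Fon threshold is crossed."""
        self.free += 1
        if not self.sender_on and self.free > self.f_on:
            self.sender_on = True
            return "on"
        return None


rx = OnOffReceiver(slots=8, f_off=3, f_on=5)
arrive_signals = [rx.accept_flit() for _ in range(6)]
drain_signals = [rx.drain_flit() for _ in range(4)]
```

Ten buffer events produce only two upstream signals here, whereas a credit scheme would send one credit per forwarded flit.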
On-Off Timeline (figure: flits flow from Node 1 to Node 2 over t1 to t8; Node 2 sends Off when the Foff threshold is reached and On when the Fon threshold is reached, and Node 1 processes each signal) – Foff set to prevent flits arriving before t4 from overflowing – Fon set so that Node 2 does not run out of flits between t5 and t8 • Less signaling but more buffering – On-chip buffers more expensive than wires
Flow Control Summary • On-chip networks require techniques with lower buffering requirements – Wormhole or Virtual Channel flow control • Dropping packets unacceptable in on-chip environment – Requires buffer backpressure mechanism • Complexity of flow control impacts router microarchitecture (next)
Router Microarchitecture Overview • Consists of buffers, switches, functional units, and control logic to implement routing algorithm and flow control • Focus on microarchitecture of Virtual Channel router • Router is pipelined to reduce cycle time
Virtual Channel Router (figure: block diagram with routing computation, virtual channel allocator, and switch allocator; each input port holds buffers VC 0 to VC x)
Baseline Router Pipeline (stages in order: BW, RC, VA, SA, ST, LT) • Canonical 5-stage (+link) pipeline – BW: Buffer Write – RC: Routing Computation – VA: Virtual Channel Allocation – SA: Switch Allocation – ST: Switch Traversal – LT: Link Traversal
Baseline Router Pipeline
Cycle    1   2   3   4   5   6   7   8   9
Head     BW  RC  VA  SA  ST  LT
Body 1       BW          SA  ST  LT
Body 2           BW          SA  ST  LT
Tail                 BW          SA  ST  LT
• Routing computation performed once per packet • Virtual channel allocated once per packet • Body and tail flits inherit this info from head flit
Router Pipeline Optimizations • Baseline (no-load) delay: $\text{delay} = (5\ \text{cycles} + \text{link delay}) \times \text{hops} + t_{\text{serialization}}$ • Ideally, only pay link delay • Techniques to reduce pipeline stages – Lookahead routing: at current router, perform routing computation for next router (NRC) ● Overlap with BW (pipeline: BW+NRC, VA, SA, ST, LT)
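To make the no-load delay expression concrete, the sketch below evaluates (router stages + link delay) × hops + serialization for a few pipeline depths. The function name, the parameter names, and the example values (3 hops, 1-cycle links, 4 cycles of serialization) are assumptions for illustration only.

```python
def no_load_delay(hops, link_delay, serialization, router_stages=5):
    """No-load packet delay in cycles: per-hop cost times hops, plus serialization."""
    return (router_stages + link_delay) * hops + serialization


baseline = no_load_delay(hops=3, link_delay=1, serialization=4)
lookahead = no_load_delay(hops=3, link_delay=1, serialization=4, router_stages=4)
link_only = no_load_delay(hops=3, link_delay=1, serialization=4, router_stages=0)
```

Each pipeline stage removed saves one cycle per hop, so the benefit of an optimization grows with the hop count.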
Router Pipeline Optimizations • Speculation – Assume that Virtual Channel Allocation stage will be successful ● Valid under low to moderate loads – Entire VA and SA in parallel (pipeline: BW+NRC, VA+SA, ST, LT) – If VA unsuccessful (no virtual channel returned) ● Must repeat VA/SA in next cycle
Router Pipeline Optimizations • Bypassing: when no flits in input buffer – Speculatively enter ST – On port conflict, speculation aborted (pipeline: VA+NRC+Setup, ST, LT) – In the first stage, a free VC is allocated, next routing is performed, and the crossbar is set up
Buffer Organization (figure: buffer layout for physical channels vs. virtual channels) • Physical channels: single buffer per input • Virtual channels: multiple fixed-length queues per physical channel
Arbiters and Allocators • Allocator matches N requests to M resources • Arbiter matches N requests to 1 resource • Resources are VCs (for virtual channel routers) and crossbar switch ports • Virtual-channel allocator (VA) – Resolves contention for output virtual channels – Grants them to input virtual channels • Switch allocator (SA) – Grants crossbar switch ports to input virtual channels
Round Robin Arbiter • Last request serviced given lowest priority • Generate the next priority vector from current grant vector • Exhibits fairness
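A behavioural sketch of the round-robin policy above, written in software rather than as the hardware priority-vector logic. The class `RoundRobinArbiter` and its interface are hypothetical.

```python
class RoundRobinArbiter:
    """Grant one of n requesters; the last one served gets lowest priority next."""

    def __init__(self, n):
        self.n = n
        self.next_priority = 0  # index that currently has highest priority

    def arbitrate(self, requests):
        """requests: list of n booleans. Returns the granted index, or None."""
        for offset in range(self.n):
            i = (self.next_priority + offset) % self.n
            if requests[i]:
                # Requester i moves to lowest priority; i + 1 becomes highest.
                self.next_priority = (i + 1) % self.n
                return i
        return None


arb = RoundRobinArbiter(4)
grants = [arb.arbitrate([True, False, True, True]) for _ in range(4)]
```

With requesters 0, 2, and 3 asserting continuously, grants rotate among them, so no requester is starved.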