Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks
John Kim, Hanjoon Kim
Department of Computer Science, KAIST
Ring Router Microarchitecture, NoCArc’09
Topology
• A good topology efficiently exploits the available packaging technology to meet the requirements at minimum cost
[Figure: latency vs. offered load, marking zero-load latency and saturation throughput]
On-chip networks are different
[Figure: on-chip networks vs. off-chip networks; sources: Scott et al. ISCA’06, Intel Developers Forum]
Topologies for On-Chip Networks
• Crossbar is often sufficient
  – if it can be done efficiently
• 2D mesh topology commonly assumed
• Many different topologies recently proposed
  – CMESH [ICS’06]
  – Flattened butterfly [Micro’07]
  – Express Cubes [HPCA’09]
  – Hierarchical Network [HPCA’09]
  – …
• Recent multicore architectures have used the ring topology
  – Cell processor, Intel processors, …
Why Ring Topology?
• Routing
  – route clockwise or counterclockwise
  – route until the destination is reached
• Low-radix router
  – each “router” requires only 3 ports (local, left & right ports)
• Flow control
  – arbitration can be simplified
  – 3 ports, but at most two requests per output port
• Can be implemented without “routers”
  – bufferless router
  – simple topology
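The routing rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: nodes are assumed to be numbered 0..n-1 in clockwise order, a tie (destination exactly opposite) is broken toward clockwise, and `ring_route` and `next_hop` are hypothetical names.

```python
def ring_route(src: int, dst: int, n: int) -> tuple[str, int]:
    """Minimal-path direction and hop count on an n-node bidirectional ring.

    Assumes nodes are numbered 0..n-1 clockwise (hypothetical numbering);
    a tie (dst exactly opposite src) is broken toward clockwise.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("node id out of range")
    if src == dst:
        return ("local", 0)           # deliver through the local port
    cw = (dst - src) % n              # hops going clockwise
    ccw = (src - dst) % n             # hops going counterclockwise
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)


def next_hop(node: int, direction: str, n: int) -> int:
    """Advance one hop; a packet keeps its direction until it reaches dst."""
    return (node + 1) % n if direction == "cw" else (node - 1) % n


if __name__ == "__main__":
    direction, hops = ring_route(2, 14, 16)
    node = 2
    for _ in range(hops):
        node = next_hop(node, direction, 16)
    print(direction, hops, node)      # ccw 4 14
```

Because a packet only ever continues in its chosen direction or ejects, an in-flight packet never competes for the opposite output, which is why each output sees at most two requesters (the continuing packet and a local injection).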
Today’s Talk
• Background in On-Chip Networks and Topology
• Router Microarchitecture for Ring Topology
• Scalability of Ring Topology
• Summary
Bufferless Router in Ring Topology
• Simplified arbitration
  – priority to packets already in flight
  – guaranteed (deterministic) latency to destination
• No buffers needed
  – no misrouting (unlike the bufferless router [ISCA’09])
  – no packet dropping (unlike SCARAB [Micro’09])
• Only two-input muxes
• No routing deadlock
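The prioritized arbitration can be illustrated with a small cycle-level sketch. To keep it short, it assumes a unidirectional (clockwise) ring with single-flit packets and one cycle per hop, so only one input can ever target the ejection port; the bidirectional design in the talk must also avoid ejection conflicts, which this sketch does not model. `simulate_bufferless_ring` and its traffic format are hypothetical.

```python
from collections import deque

def simulate_bufferless_ring(n, injections, cycles):
    """Cycle-level sketch of a bufferless, unidirectional (clockwise) ring.

    injections: {node: [(ready_cycle, dst), ...]} (hypothetical traffic trace).
    Returns [(src, dst, inject_cycle, eject_cycle), ...] in ejection order.
    """
    # link[i] holds the packet on the link from node i to node (i + 1) % n.
    link = [None] * n
    queues = {i: deque(sorted(injections.get(i, []))) for i in range(n)}
    delivered = []
    for cycle in range(cycles):
        new_link = [None] * n
        for node in range(n):
            out = None
            arriving = link[(node - 1) % n]          # packet arriving this cycle
            if arriving is not None:
                src, dst, t_inj = arriving
                if dst == node:
                    delivered.append((src, dst, t_inj, cycle))   # eject
                else:
                    out = arriving                   # in-flight packet has priority
            q = queues[node]
            if out is None and q and q[0][0] <= cycle:
                _, dst = q.popleft()
                assert dst != node, "self-traffic is not routed through the ring"
                out = (node, dst, cycle)             # inject into the free slot
            new_link[node] = out                     # no buffers: the slot just moves on
        link = new_link
    return delivered


if __name__ == "__main__":
    d = simulate_bufferless_ring(4, {0: [(0, 3)], 1: [(1, 3)]}, cycles=10)
    print(d)   # [(0, 3, 0, 3), (1, 3, 2, 4)]
```

In the usage example, node 1's packet is ready at cycle 1 but waits one cycle while node 0's packet passes, then injects at cycle 2. Once injected, every packet reaches its destination exactly `hops` cycles later, which is the deterministic latency the slide refers to.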
Conventional Router Microarchitecture
Bufferless Ring Topology Router Microarchitecture
No buffers needed
Bufferless Router in Ring Topology (cont.)
• However…
  – requires reserving the path to the destination
  – can reduce performance/throughput
Lightweight Router Microarchitecture
• Add one buffer entry (2 buffer entries per input port)
• Credit-based flow control for backpressure
• Maintain the same prioritized arbitration for packets in flight
• Arbitration needed when ejecting packets
[Figure: lightweight vs. bufferless router]
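A minimal sketch of how one output port of the lightweight router might combine the two mechanisms: in-flight packets keep priority over local injection, and a credit counter initialized to the downstream buffer depth (2 entries, per the slide) provides backpressure. The request trace, credit-return timing, and the name `simulate_output_port` are hypothetical, and ejection arbitration is not modeled.

```python
def simulate_output_port(requests, depth=2, credit_return_cycles=()):
    """Cycle-by-cycle grants for one output port of a lightweight ring router.

    requests: list indexed by cycle of (in_flight_waiting, injection_waiting).
    depth: entries in the downstream input buffer (initial credit count).
    credit_return_cycles: cycles at which the downstream router frees an entry
        and returns one credit (hypothetical trace).
    Returns one grant per cycle: "in_flight", "inject", or None.
    """
    credits = depth
    returns = list(credit_return_cycles)
    grants = []
    for cycle, (in_flight, inject) in enumerate(requests):
        credits += returns.count(cycle)   # credits arriving this cycle
        grant = None
        if credits > 0:                   # backpressure: never send without a credit
            if in_flight:                 # packets already in the ring win
                grant = "in_flight"
            elif inject:                  # local injection only uses an idle output
                grant = "inject"
        if grant is not None:
            credits -= 1                  # one downstream entry is now occupied
        grants.append(grant)
    return grants


if __name__ == "__main__":
    trace = [(True, True), (False, True), (True, True), (True, False)]
    print(simulate_output_port(trace, depth=2, credit_return_cycles=(3,)))
    # ['in_flight', 'inject', None, 'in_flight']
```

With only two entries, the port stalls at cycle 2 until the downstream router returns a credit; unlike the bufferless router, a stalled packet can now wait in its buffer instead of requiring a reserved path.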
Lightweight Router Microarchitecture
• No predetermined routing
  – Bufferless: a packet could be injected into the network only in the appropriate slot
  – Lightweight: a packet can be injected at any time
• Deadlock
  – packets in the bufferless router were guaranteed to make progress
  – routing deadlock still avoided without additional virtual channels (see paper for details)
Evaluation
• Cycle-accurate simulator used to compare the ring router microarchitectures
• Simulator parameters include
  – N = 16
  – single-flit packets (1 flit = 512 bits)
  – synthetic traffic patterns
• Orion 2.0 used to model area/power (results in paper)
• Microarchitectures compared:
  – baseline (3-cycle router)
  – bufferless (1-cycle router)
  – lightweight (1-cycle router)
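The two synthetic patterns used in the comparison (uniform random and bit complement) can be sketched as destination functions. This assumes node IDs 0..N-1 and, for uniform random, excludes the source itself (simulators differ on this point); the function names are hypothetical.

```python
import random

def uniform_random_dest(src: int, n: int, rng: random.Random) -> int:
    """Uniform random: each of the other n-1 nodes is equally likely."""
    dst = rng.randrange(n - 1)            # pick among n-1 candidates...
    return dst if dst < src else dst + 1  # ...and skip over the source id

def bit_complement_dest(src: int, n: int) -> int:
    """Bit complement: invert every address bit (n must be a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("bit complement needs a power-of-two node count")
    return ~src & (n - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    print([uniform_random_dest(0, 16, rng) for _ in range(5)])
    print([bit_complement_dest(s, 16) for s in range(4)])   # [15, 14, 13, 12]
```

Bit complement is deterministic: every node always sends to the same partner, so load concentrates on fixed paths instead of spreading across the ring as uniform random traffic does.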
Performance Comparison
[Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8); panels: uniform random, bit complement]
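Each latency curve starts at the design's zero-load latency. A rough model for single-flit packets is T0 = H_avg × (t_router + t_link), where H_avg is the average minimal hop count on the ring. The sketch assumes uniform random traffic without self-traffic, a 1-cycle link, and no injection/ejection overhead; it plugs in the router delays listed for the evaluation only to illustrate the formula and is not a reproduction of the measured curves.

```python
from fractions import Fraction

def avg_ring_hops(n: int) -> Fraction:
    """Average minimal hop count on an n-node bidirectional ring (uniform random, src != dst)."""
    return Fraction(sum(min(d, n - d) for d in range(1, n)), n - 1)

def zero_load_latency(n: int, router_cycles: int, link_cycles: int = 1) -> Fraction:
    """T0 = H_avg * (t_router + t_link); link_cycles = 1 is an assumption.

    Injection/ejection overhead is ignored in this sketch.
    """
    return avg_ring_hops(n) * (router_cycles + link_cycles)


if __name__ == "__main__":
    for name, t_r in [("baseline", 3), ("bufferless", 1), ("lightweight", 1)]:
        print(name, float(zero_load_latency(16, t_r)))
```

Since every term scales with H_avg, cutting the per-hop router delay from 3 cycles to 1 cycle shrinks the zero-load latency by the same ratio at any network size in this model.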
Impact of Prioritized Arbitration
[Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight]
How Scalable is the Ring Topology?
• Assumption: ring and 2D mesh compared at the same bisection bandwidth
• The bandwidth per channel is higher for the ring than for the 2D mesh
• Trade-off of hop count vs. serialization latency
• Per-hop latency can be higher with the 2D mesh
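Under the equal-bisection assumption, the ring's channel-width advantage and its hop-count penalty can be compared with a simple zero-load model. The sketch assumes a bidirectional ring (a bisection cut crosses 4 unidirectional channels), a square k × k mesh (a cut crosses 2k unidirectional channels), uniform random traffic, and T0 = H_avg · t_hop + ceil(L / w). The per-hop latencies, bisection bandwidth, packet size, and all function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def avg_hops_ring(n):
    """Average minimal hops on an n-node bidirectional ring, uniform random, src != dst."""
    return Fraction(sum(min(d, n - d) for d in range(1, n)), n - 1)

def avg_hops_mesh(k):
    """Average Manhattan distance on a k x k mesh, uniform random, src != dst."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(x1 - x2) + abs(y1 - y2)
                for (x1, y1), (x2, y2) in product(nodes, repeat=2))
    n = k * k
    return Fraction(total, n * (n - 1))   # self-pairs add 0 to the total

def channel_widths(n, bisection_bits):
    """(ring width, mesh width) in bits/cycle at equal bisection bandwidth."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("mesh sketch assumes a square k x k layout")
    return Fraction(bisection_bits, 4), Fraction(bisection_bits, 2 * k)

def zero_load(hops, t_hop, packet_bits, width):
    """Hop latency plus serialization latency ceil(L / w)."""
    return hops * t_hop + math.ceil(Fraction(packet_bits) / width)


if __name__ == "__main__":
    for n in (16, 36, 64):
        w_ring, w_mesh = channel_widths(n, bisection_bits=512)   # hypothetical B
        ring = zero_load(avg_hops_ring(n), t_hop=1, packet_bits=512, width=w_ring)
        mesh = zero_load(avg_hops_mesh(math.isqrt(n)), t_hop=2,
                         packet_bits=512, width=w_mesh)          # hypothetical t_hop
        print(n, float(ring), float(mesh))
```

In this model the ring-to-mesh width ratio is k/2, so the ring's serialization advantage grows with network size, but its average hop count grows roughly as N/4 while the mesh's grows only with k. Which side wins depends on packet size and per-hop delay, which is the hop-count vs. serialization trade-off on the slide.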
Synthetic Workload
[Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64, grouped by maximum outstanding requests r = 2, 4, 8, 16]
Bandwidth Fragmentation
• 2D mesh:
  – short packets (request) = 1 flit
  – long packets (reply) = 4 flits
• Ring:
  – short packets (request) = 1 flit
  – long packets (reply) = 1 flit
• Wide channels result in high bandwidth for the ring
• However, short packets utilize only ¼ of the ring channel bandwidth
• Ring topology is inefficient for short packets
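The ¼ figure follows from padding each packet to whole flits. The sketch below assumes a 512-bit ring flit (as in the evaluation setup), a 4× narrower 128-bit mesh flit, a 128-bit request (1 mesh flit), and a 512-bit reply (4 mesh flits); the sizes other than 512 bits and the name `channel_utilization` are hypothetical.

```python
from fractions import Fraction

def channel_utilization(packet_bits, flit_bits):
    """Fraction of transmitted flit bits that carry payload.

    Each packet is padded up to a whole number of flits, so a packet smaller
    than one flit wastes the rest of that flit.
    """
    payload = sum(packet_bits)
    occupied = sum(-(-size // flit_bits) * flit_bits for size in packet_bits)  # ceil division
    return Fraction(payload, occupied)

REQUEST_BITS, REPLY_BITS = 128, 512   # hypothetical: reply = 4 mesh flits
MESH_FLIT, RING_FLIT = 128, 512       # ring channel 4x wider, as on the slide


if __name__ == "__main__":
    for name, mix in [("requests only", [REQUEST_BITS]),
                      ("1:1 request/reply", [REQUEST_BITS, REPLY_BITS])]:
        print(name, channel_utilization(mix, MESH_FLIT), channel_utilization(mix, RING_FLIT))
```

Under these assumptions, a ring carrying only requests uses 1/4 of its channel bits, and a 1:1 request/reply mix uses 5/8, while the narrower mesh channel stays fully utilized in both cases.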
Bandwidth Fragmentation
[Figure: normalized runtime of ring vs. mesh for N = 16, 36, 64 and maximum outstanding requests r = 2, 4, 8, 16; panels: single-flit packets, bimodal packets]
Limitations of This Study
• “Packaging” of on-chip network topology = 2D layout of the topology
• Layout of topology can impact performance
  – 2D mesh: only requires communicating with neighbors
  – ring: long links can be needed as the network scales
• Hierarchical rings not investigated
• Router complexity (for mesh) not properly modeled
Summary
• On-chip networks present different constraints compared to off-chip networks
  – can exploit different router microarchitectures
• Ring is a simple topology for which a bufferless router microarchitecture can be implemented
• Lightweight router microarchitecture proposed to increase performance with minimal additional complexity
• Ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic
• Can we scale this router microarchitecture to the 2D mesh topology?
Low-Cost Router Microarchitecture (Micro’09)
Thank you
Questions?