OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April 25, 2017
Hardware Development Cost source: Todd Austin, Micro-49 keynote • Low cost challenge 2
Many-IP Heterogeneous System • Figure: many IPs (CPU1, CPU2, GPU, Memory, Accelerator, Sensor, Sensor2, Wireless Network, …) connected through a Network-on-Chip (NoC) • Scalability challenge • Flexibility challenge 3
Diverse System Requirements Throughput Critical Latency Critical source: MNIST, Engadget, TheStack 4
Challenges for NoCs • Low-cost - Low design/verification costs of custom/generic NoCs - Design Automation of high-performance, low-energy NoCs • Scalability - Many-IP heterogeneous system support - Low latency - Low energy - Low area • Flexibility - Diverse connectivity - Diverse latency/throughput requirements 5
OpenSMART • User-configurable SMART NoC, Automatic NoC Generation • Goals: Low Cost, Flexibility, Scalability • High-level HW Language • Arbitrary Topology Support • Area/power-efficient RTL Building Blocks • Verified on FPGA • SMART: Krishna et al, HPCA 2013; Chen et al, DATE 2013; Krishna et al, IEEE Micro Top Picks 2014 6
Outline • Motivation: Scalable, Flexible, and Low-cost NoCs • Background: SMART NoCs • OpenSMART - Design Flow - Building Blocks - Walk-through Examples • Case Studies - Mesh vs. SMART - High-radix vs. Low-radix • Conclusions 7
SMART NoC • Single-cycle Multi-hop Asynchronous Repeated Traversal • SMART: achieve the performance of dedicated connections over a network of shared links • Figure: source S sends SSRs (SMART Setup Requests) and its flit reaches destination D, up to HPCmax hops away, in 1 cycle (no other traffic) • Krishna et al, HPCA 2013; Chen et al, DATE 2013; Krishna et al, IEEE Micro Top Picks 2014 9
Is 1-cycle Network Possible? Yes • Is wire fast enough to support 1-cycle network? • Wire traversal length within 1 ns (1 GHz): 10-16 mm • Wire delay over technology: constant • Chip dimension: remain similar (~20 mm) • Clock frequency: remain similar (1~3 GHz) • Tile dimension: decrease over technology • On-chip wires are fast enough to transmit across the chip within 1-2 cycles at 1 GHz even if technology scales 10
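As a rough back-of-the-envelope check of the numbers on this slide, the sketch below estimates how many cycles a repeated wire needs to span the chip and how many tiles it could pass in one cycle. The function names and the 1 mm tile size are illustrative assumptions, not values from the slides, and router or multiplexer delay is ignored.

```python
import math

def cycles_to_cross(chip_mm, reach_mm_per_cycle):
    """Cycles a repeated wire needs to span chip_mm (wire delay only)."""
    return math.ceil(chip_mm / reach_mm_per_cycle)

def tiles_per_cycle(reach_mm_per_cycle, tile_mm):
    """Upper bound on tiles a signal can pass in one cycle (wire delay only)."""
    return math.floor(reach_mm_per_cycle / tile_mm)

# Slide values: 10-16 mm of wire per 1 ns cycle, ~20 mm chip edge.
print(cycles_to_cross(20, 16))   # 2
print(cycles_to_cross(20, 10))   # 2
# Assumed 1 mm tile (hypothetical): wire reach alone would allow ~10 tiles per cycle.
print(tiles_per_cycle(10, 1.0))  # 10
```

Shrinking the tile while wire reach stays constant raises the second number, which is the scaling argument the slide makes.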
Features of SMART • Low latency network - Dynamic bypass of intermediate routers between any two routers - Limit: HPCmax (hops per cycle max), maximum number of “hops” that the underlying wire allows the flit to traverse within a clock cycle • Separate control path - HPCmax bits from every router along each direction - Arbitration of multiple bypass requests on the same link - No ACK required 11
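The HPCmax limit can be illustrated with a small sketch that splits a path into single-cycle segments. It assumes no contention, so every segment is as long as HPCmax allows; the function name is hypothetical.

```python
def split_into_segments(hops, hpc_max):
    """Split a path of `hops` hops into segments of at most hpc_max hops.

    Each segment is the part of the path a flit could cover in one
    traversal cycle when no other traffic competes for the links.
    """
    segments = []
    while hops > 0:
        step = min(hops, hpc_max)
        segments.append(step)
        hops -= step
    return segments

print(split_into_segments(7, 3))  # [3, 3, 1]
print(split_into_segments(3, 3))  # [3]
```

With contention, a flit may be stopped earlier than HPCmax hops, so the segment count here is a lower bound.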
OpenSMART Design Flow • User Specification (Configuration): Topology, Bandwidth, VC, Routing • OpenSMART: Front-end, Building Block Library (RTL): Input Unit, Output Unit, SMART Unit, Switch Unit, … • External Tool Chains: BSV/Chisel Compiler, ASIC/FPGA Synthesis Tool • Output: Verilog Files • HPCmax Analyzer 13
OpenSMART NoC • Figure: NoC generated by OpenSMART, a network of Routers, each IP attached to a Router through a NIC • Interface: AMBA, Wishbone, Custom 14
OpenSMART Building Blocks • Input Unit (Buffer, Arbiter): input buffer + input VC arbitration • Output Unit (VC Selector, Arbiter): output VC selection + output port arbitration + credit management • Switch Unit (Crossbar, Routing Calculator): switching (via crossbar) + routing calculation • SMART Unit (SSR Controller, Bypass Flag): SSR communication & arbitration + bypass flag 16
OpenSMART Router: Input Unit • Figure: OpenSMART Router (Baseline): incoming flits → Input Units → Switch Unit → Output Units → outgoing flits • Input Unit: Input Buffers and Arbiter; each buffered flit has a flit header and flit data • Parameters: flit size, number of VCs, VC depth 17
OpenSMART Router: Output Unit • Figure: OpenSMART Router (Baseline): incoming flits → Input Units → Switch Unit → Output Units → outgoing flits • Arbiter: output port request → output port grant • VC Selector: VC queue, nextVC • Credit Manager: credit, hasCredit 18
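A behavioral sketch of a FIFO-based VC selector such as the one named on this slide is shown below. It is an assumption-laden model, not the OpenSMART RTL: the class and method names are hypothetical, and the only structure taken from the slide is a queue of VCs with a visible nextVC.

```python
from collections import deque

class VCSelector:
    """Behavioral model: free VCs wait in a FIFO; the head is the next VC."""

    def __init__(self, num_vcs):
        self.free_vcs = deque(range(num_vcs))

    def has_free_vc(self):
        return len(self.free_vcs) > 0

    def next_vc(self):
        """Peek at the VC that would be handed out next (None if none is free)."""
        return self.free_vcs[0] if self.free_vcs else None

    def allocate(self):
        """Hand out the head of the queue to a departing flit."""
        return self.free_vcs.popleft()

    def release(self, vc):
        """Return a VC to the tail of the queue once downstream frees it."""
        self.free_vcs.append(vc)

sel = VCSelector(num_vcs=2)
first = sel.allocate()    # VC 0
second = sel.allocate()   # VC 1
print(sel.has_free_vc())  # False
sel.release(first)
print(sel.next_vc())      # 0
```

Keeping the head of the queue readable without dequeuing mirrors the idea of holding the next VC in its own register, so selection does not sit on the critical path.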
OpenSMART Router: Switch Unit • Figure: OpenSMART Router (Baseline): incoming flits → Input Units → Switch Unit → Output Units → outgoing flits • Switch Unit: Routing Unit (routing algorithm) and Crossbar; flits from Input Units are switched to Output Units as outgoing flits 19
OpenSMART Router (SMART): SMART Unit • Figure: OpenSMART Router (SMART): incoming flits → Input Units → Switch Unit → Output Units → outgoing flits, with a SMART Unit handling incoming and outgoing SSRs • SMART Unit: SSR Controller, SMART Arbiter, Bypass Flag, Bypass MUX Selection • SSR Prioritization: prioritization by distance, so the SSR from a nearer router gets the higher priority; the SSR from the local router (distance = 0) has the highest priority; distances range up to HPCmax 20
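The distance-based prioritization can be sketched as a simple minimum search over the SSRs that arrive at one router. This is a behavioral illustration only; the tuple layout and the function name are assumptions.

```python
def smart_arbitrate(ssrs, hpc_max):
    """Pick the winning SSR at one router.

    ssrs: list of (distance, source) pairs, where distance 0 means the
    request comes from the local router. The nearest source wins.
    Returns the winning pair, or None when no valid SSR is present.
    """
    valid = [(d, s) for d, s in ssrs if 0 <= d <= hpc_max]
    if not valid:
        return None
    return min(valid, key=lambda pair: pair[0])

# SSRs from two upstream routers compete with a local request.
print(smart_arbitrate([(2, "r3"), (0, "local"), (1, "r4")], hpc_max=3))  # (0, 'local')
# Without a local request, the nearer upstream router wins.
print(smart_arbitrate([(2, "r3"), (1, "r4")], hpc_max=3))                # (1, 'r4')
```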
OpenSMART Router (1-cycle) • Figure: OpenSMART Router (Baseline): incoming flits → Input Units → Switch Unit → Output Units → outgoing flits, with pipeline stages labeled Cycle 0 and Cycle 1 21
OpenSMART Router (2cycle/SMART) Cycle 1 OpenSMART Router (SMART) Incoming Outgoing >> SSRs SSRs Priority Incoming Flits Cycle 0 SMART Unit Input Units Output Units >> Outgoing Flits Switch Unit Cycle 3 22
Walk-through Example 1 • Router r4 sends a flit to router r7 • HPCmax = 3 • Cycle 0: SSR Send: r4 sends SSR (SMART Setup Request) 110 to r5, r6, and r7 • Cycle 1: Multi-hop Bypass: r5 bypass, r6 bypass, r7 stop 24
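One possible reading of the 110 pattern in this example is that each SSR bit addresses one downstream router, with 1 meaning bypass and 0 meaning stop. The sketch below decodes an SSR under that assumption; the encoding is inferred from the figure labels, not confirmed by the slides, and the function name is hypothetical.

```python
def decode_ssr(bits, routers):
    """Map SSR bits to per-router actions, nearest router first.

    Assumed encoding (inferred, not confirmed): '1' = bypass, '0' = stop.
    Decoding ends at the first stop, since the flit goes no further.
    """
    actions = []
    for bit, router in zip(bits, routers):
        if bit == "1":
            actions.append((router, "bypass"))
        else:
            actions.append((router, "stop"))
            break
    return actions

# r4 -> r7 with HPCmax = 3, as in the example.
print(decode_ssr("110", ["r5", "r6", "r7"]))
# [('r5', 'bypass'), ('r6', 'bypass'), ('r7', 'stop')]
```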
Walk-through Example 2 • Router r4 sends a flit to router r7 • Router r5 sends a flit to router r7 • HPCmax = 3 • Cycle 0: SSR Send: SSR (SMART Setup Request) 110 from r4, SSR 100 from r5 • SMART Unit in r5: the SMART Arbiter ranks incoming SSRs by distance (Dist = 0, the SSR from the local router, through Dist = 3); winner: Dist = 0, from r5; the result sets the Bypass Flag • Cycle 1: Multi-hop Bypass 25
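The contention in this example can be replayed with a small model in which every router on the path independently grants its outgoing link to the nearest requester. It is a sketch under stated assumptions: routers sit on a one-dimensional line, requests travel in the same direction, and the function name is hypothetical.

```python
def link_winners(requests, hpc_max):
    """Decide, per router, which source wins that router's outgoing link.

    requests: list of (src, dst) router indices with src < dst.
    An SSR from src reaches the links of routers src .. src + hpc_max - 1.
    Returns {router: winning src}; the nearest source (smallest distance) wins.
    """
    winners = {}
    routers = sorted({r for s, d in requests for r in range(s, d)})
    for r in routers:
        contenders = [(r - s, s) for s, d in requests
                      if s <= r < d and r - s < hpc_max]
        if contenders:
            winners[r] = min(contenders)[1]
    return winners

# r4 -> r7 and r5 -> r7 with HPCmax = 3.
print(link_winners([(4, 7), (5, 7)], hpc_max=3))  # {4: 4, 5: 5, 6: 5}
```

In this model r5 keeps its own outgoing link (distance 0 beats distance 1) and also wins the link at r6, so the flit from r4 cannot bypass r5 in this cycle.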
OpenSMART: Features • Language - BSV and Chisel • Flow control - VC and SMART • Buffer Management - Credit-based buffer management • Router Microarchitecture - 1- and 2-cycle state-of-the-art packet switching router - SMART router 26
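Credit-based buffer management can be sketched as one counter per downstream VC: a credit is spent when a flit is sent and returned when the downstream buffer slot frees up. The class below is a behavioral illustration with hypothetical names, not the OpenSMART implementation.

```python
class CreditManager:
    """Tracks free downstream buffer slots, one counter per VC."""

    def __init__(self, num_vcs, vc_depth):
        self.vc_depth = vc_depth
        self.credits = [vc_depth] * num_vcs

    def has_credit(self, vc):
        return self.credits[vc] > 0

    def consume(self, vc):
        """Call when a flit is sent on `vc`; requires an available credit."""
        if not self.has_credit(vc):
            raise RuntimeError("no credit left on VC %d" % vc)
        self.credits[vc] -= 1

    def replenish(self, vc):
        """Call when the downstream router returns a credit for `vc`."""
        if self.credits[vc] >= self.vc_depth:
            raise RuntimeError("credit overflow on VC %d" % vc)
        self.credits[vc] += 1

cm = CreditManager(num_vcs=2, vc_depth=1)
cm.consume(0)
print(cm.has_credit(0))  # False
cm.replenish(0)
print(cm.has_credit(0))  # True
```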
OpenSMART: Features • Routing Calculation - XY, YX, and source routing - One-hot encoding hop count + shift-based routing calculation - For SMART, routing calculation is done during bypasses • VC Selection - FIFO-based dynamic VC selection - Next VC is stored in a separate register - For SMART, VC selection is done during bypasses 27
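One way to picture one-hot hop counts with shift-based routing is shown below for XY routing on a mesh. The remaining hop count in each dimension is held as a one-hot value, each hop shifts it right by one, and a value of 1 means no hops remain. The port names, the coordinate convention (North = increasing y), and the function names are assumptions for illustration.

```python
def xy_step(x_code, y_code, x_dir, y_dir):
    """Return (output port, new x_code, new y_code) for one routing step.

    A one-hot code of 1 (only bit 0 set) means no hops remain in that dimension.
    """
    if x_code != 1:
        return x_dir, x_code >> 1, y_code   # still moving in X
    if y_code != 1:
        return y_dir, x_code, y_code >> 1   # X done, move in Y
    return "Local", x_code, y_code          # arrived: eject to the local port

def xy_route(src, dst):
    """List the output ports taken from src to dst, ending with 'Local'."""
    (sx, sy), (dx, dy) = src, dst
    x_code = 1 << abs(dx - sx)              # one-hot hop count in X
    y_code = 1 << abs(dy - sy)              # one-hot hop count in Y
    x_dir = "East" if dx > sx else "West"
    y_dir = "North" if dy > sy else "South"
    ports = []
    while True:
        port, x_code, y_code = xy_step(x_code, y_code, x_dir, y_dir)
        ports.append(port)
        if port == "Local":
            return ports

print(xy_route((0, 0), (2, 1)))  # ['East', 'East', 'North', 'Local']
```

The per-hop work reduces to a comparison against 1 and a one-bit shift, which is the kind of operation that stays cheap in hardware.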
Latency • Figure: (a) Uniform Random (b) Bit-complement; annotations: 5X, 4X 29
Energy Consumption Repeaters require less energy than clocked latches 30
HPCmax (a) HPCmax on ASIC (b) HPCmax on FPGA 31
Router Area Number of Ports (a) ASIC area (b) FPGA LUTs (c) FPGA FFs 33
Router Power Number of Ports (a) ASIC (b) FPGA 34
Maximum Clock Frequency 35
Conclusion • NoCs are crucial components to support many-IP heterogeneous systems – Providing connectivity while satisfying their diverse requirements • OpenSMART provides automatic generation of NoCs for many-IP heterogeneous systems – Supports the recent low-latency SMART NoC as well as highly optimized 1-cycle routers – Written in high-level HDLs 37