OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Hyoukjun Kwon and Tushar Krishna
Georgia Institute of Technology
Synergy Lab (http://synergy.ece.gatech.edu)
hyoukjun@gatech.edu
April 25, 2017
Hardware Development Cost
source: Todd Austin, Micro-49 keynote
- Low cost challenge
Many-IP Heterogeneous System
Figure: a large array of IP blocks (IP, IP, IP, …)
- Scalability challenge
CPU1 GPU Accelerator Memory Sensor Wireless Network Sensor2 CPU2
Network-on-Chip (NoC)
- Flexibility challenge
Diverse System Requirements
source: MNIST, Engadget, TheStack
Throughput Critical Latency Critical
Challenges for NoCs
- Low-cost
- Low design/verification costs of custom/generic NoCs
- Design Automation of high-performance, low-energy NoCs
- Scalability
- Many-IP heterogeneous system support
- Low latency
- Low energy
- Low area
- Flexibility
- Diverse connectivity
- Diverse latency/throughput requirements
OpenSMART
SMART NoC
Krishna et al., HPCA 2013; Chen et al., DATE 2013; Krishna et al., IEEE Micro Top Picks 2014
User-configurable Automatic NoC Generation High-level HW Language Verified on FPGA
Low Cost Flexibility Scalability
Arbitrary Topology Support Area/power-efficient RTL Building Blocks
OpenSMART
Outline
- Motivation: Scalable, Flexible, and Low-cost NoCs
- Background: SMART NoCs
- OpenSMART
- Design Flow
- Building Blocks
- Walk-through Examples
- Case Studies
- Mesh vs. SMART
- High-radix vs. Low-radix
- Conclusions
SMART NoC
Krishna et al., HPCA 2013; Chen et al., DATE 2013; Krishna et al., IEEE Micro Top Picks 2014
Figure labels: S (source), D (destination), HPCmax, SSR (SMART Setup Request); 1-cycle (no other traffic)
- Single-cycle Multi-hop Asynchronous Repeated Traversal
SMART: achieve the performance of dedicated connections over a network of shared links
Is 1-cycle Network Possible?
Figure labels: ~20mm (chip dimension)
Is wire fast enough to support 1-cycle network?
- Wire traversal length within 1ns (1GHz): 10-16mm
- Wire delay over technology: constant
- Chip dimension: remain similar (~20mm)
- Clock frequency: remain similar (1~3GHz)
- Tile dimension: decrease over technology
Yes: on-chip wires are fast enough to transmit across the chip within 1-2 cycles at 1GHz, even as technology scales
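A rough arithmetic check of this claim, as a sketch only: the wire reach (10-16mm per ns), the ~20mm chip dimension, and the 1GHz clock are taken from the slide; the function name `cycles_to_cross` is hypothetical, and the calculation assumes a single straight wire spanning one chip dimension.

```python
def cycles_to_cross(chip_mm: float, reach_mm_per_ns: float, freq_ghz: float) -> float:
    """Clock cycles a signal needs to travel chip_mm of wire."""
    time_ns = chip_mm / reach_mm_per_ns  # wire delay in nanoseconds
    return time_ns * freq_ghz            # cycles = ns * (cycles per ns)


# Slide figures: ~20mm chip, 10-16mm of wire reach per ns, 1GHz clock.
best = cycles_to_cross(20.0, 16.0, 1.0)
worst = cycles_to_cross(20.0, 10.0, 1.0)
print(f"{best:.2f} to {worst:.2f} cycles at 1GHz")
```

Under these assumptions the result falls between 1.25 and 2 cycles, which is in line with the "1-2 cycles" figure on the slide.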
Features of SMART
- Low latency network
- Dynamic bypass of intermediate routers between any two routers
- Limit: HPCmax (hops per cycle max), the maximum number of “hops” that the underlying wire allows the flit to traverse within a clock cycle
- Separate control path
- HPCmax bits from every router along each direction
- Arbitration of multiple bypass requests on the same link
- No ACK required
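To illustrate the HPCmax limit, here is a minimal sketch that splits a path into multi-hop traversals of at most HPCmax hops each. It assumes a straight path with no competing traffic; the function name `smart_segments` is hypothetical and not part of OpenSMART.

```python
from typing import List


def smart_segments(hops: int, hpc_max: int) -> List[int]:
    """Split a path of `hops` hops into traversals of at most hpc_max hops."""
    if hops < 0 or hpc_max < 1:
        raise ValueError("hops must be >= 0 and hpc_max must be >= 1")
    segments = []
    remaining = hops
    while remaining > 0:
        step = min(remaining, hpc_max)  # a flit cannot go further than HPCmax per cycle
        segments.append(step)
        remaining -= step
    return segments


print(smart_segments(7, 3))  # [3, 3, 1]
```

With HPCmax = 3, a 7-hop path needs three multi-hop traversals, and the flit is buffered at the router where each traversal ends.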
OpenSMART Design Flow
Flow: User Specification -> OpenSMART -> External Tool Chains
User Specification
- Topology
- Bandwidth
- VC
- Routing
- …
Configuration
OpenSMART Front-end
Building Block Library (RTL): Input Unit, Output Unit, SMART Unit, Switch Unit, …
BSV/Chisel Compiler
Verilog Files
ASIC/FPGA Synthesis Tool
HPCmax Analyzer
OpenSMART NoC
Figure: NoC generated by OpenSMART (labels: IP, NIC, Router, …)
Interface: AMBA, Wishbone, Custom
OpenSMART Building Blocks
- Input Unit (Buffer, Arbiter): input buffer + input VC arbitration
- Output Unit (VC Selector, Arbiter): output VC selection + output port arbitration + credit management
- Switch Unit (Crossbar, Routing Calculator): switching (via crossbar) + routing calculation
- SMART Unit (SSR Controller, Bypass Flag): SSR communication & arbitration + bypass flag
OpenSMART Router (Baseline)
Figure labels (router): Incoming Flits, Input Units, Switch Unit, Output Units, Outgoing Flits
Figure labels (Input Unit): Input Buffers, Arbiter, Flit Header, Flit Data, Number of VCs/VC Depth, Flit Size
OpenSMART Router (Baseline)
Figure labels (router): Incoming Flits, Input Units, Switch Unit, Output Units, Outgoing Flits
Figure labels (Output Unit): VC Selector, VC queue, nextVC, Credit Manager, Credit, hasCredit, Arbiter, VC Arbiter, Output Port Request, Output Port Grant
OpenSMART Router (Baseline)
Figure labels (router): Incoming Flits, Input Units, Switch Unit, Output Units, Outgoing Flits
Figure labels (Switch Unit): Crossbar, Routing Unit, Routing Algorithm, From Input Units, Outgoing Flits
OpenSMART Router (SMART)
Figure labels (router): Incoming Flits, Input Units, Switch Unit, Output Units, Outgoing Flits, SMART Unit, Priority, Incoming SSRs, Outgoing SSRs
Figure labels (SMART Unit): SMART Arbiter, SSR Controller, Bypass Flag, Priority, Incoming SSRs, Outgoing SSRs, SSR From Local Router, Bypass MUX Selection, HPCmax
SSR Prioritization
Prioritization by distance
-> An SSR from a nearer router gets higher priority
(Local (distance = 0) has the highest priority)
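The distance-based prioritization can be sketched as a fixed-priority arbiter over a request bit vector. This is an illustrative model, not the OpenSMART RTL: the encoding (bit `d` set means an SSR arrived from a router `d` hops away, with bit 0 the local router) and the name `nearest_ssr_wins` are assumptions made for the example.

```python
from typing import Optional


def nearest_ssr_wins(requests: int, hpc_max: int) -> Optional[int]:
    """Return the distance of the winning SSR, or None if there is no request.

    Assumed encoding: bit d of `requests` is set when an SSR arrived from a
    router d hops away (d = 0 is the local router).
    """
    mask = (1 << (hpc_max + 1)) - 1    # only distances 0..hpc_max are valid
    requests &= mask
    if requests == 0:
        return None
    lowest = requests & -requests      # isolate the lowest set bit (nearest requester)
    return lowest.bit_length() - 1     # convert the one-hot grant to a distance


print(nearest_ssr_wins(0b0011, 3))  # 0: the local request beats distance 1
print(nearest_ssr_wins(0b1100, 3))  # 2: distance 2 beats distance 3
```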
OpenSMART Router (1cycle)
OpenSMART Router (Baseline)
Figure labels (router): Incoming Flits, Input Units, Switch Unit, Output Units, Outgoing Flits
Cycle 0 Cycle 1
OpenSMART Router (SMART)
Figure labels (router): Incoming Flits, Input Units, Switch Unit, Output Units, Outgoing Flits, SMART Unit, Priority, Incoming SSRs, Outgoing SSRs
OpenSMART Router (2cycle/SMART)
Cycle 0 Cycle 1 Cycle 3
Walk-through Example 1
- Router r4 sends a flit to router r7
- HPCmax = 3
Routers in the figure: r4, r5, r6, r7
Cycle 0: SSR (SMART Setup Request) send
Cycle 1: Multi-hop bypass
SSR labels in the figure: 110, 110, 110
bypass, bypass, stop
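The per-router outcome of this example can be reproduced with a small sketch. It models one uncontended flit travelling toward higher router indices; the function name `bypass_plan` and the integer router numbering are assumptions made for illustration, and the SSR bit encoding shown in the figure is not modelled.

```python
from typing import List, Tuple


def bypass_plan(src: int, dst: int, hpc_max: int) -> List[Tuple[str, str]]:
    """Action taken at each router downstream of src during one multi-hop cycle."""
    if dst <= src or hpc_max < 1:
        raise ValueError("expects dst > src and hpc_max >= 1")
    reach = min(dst, src + hpc_max)  # furthest router reachable in this cycle
    plan = []
    for r in range(src + 1, reach + 1):
        action = "stop" if r == reach else "bypass"
        plan.append((f"r{r}", action))
    return plan


print(bypass_plan(4, 7, 3))  # [('r5', 'bypass'), ('r6', 'bypass'), ('r7', 'stop')]
```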
Walk-through Example 2
- Router r4 sends a flit to router r7
- Router r5 sends a flit to router r7
- HPCmax = 3
Routers in the figure: r4, r5, r6, r7
Cycle 0: SSR (SMART Setup Request) send
Cycle 1: Multi-hop bypass
SSR labels in the figure: 110, 110, 110, 100, 100
Figure labels (SMART Arbiter): Priority, Incoming SSRs, SSR From Local Router, Bypass Flag, Dist = 3, Dist = 2, Dist = 1, Dist = 0; From: r5, From: r4
SMART Unit in r5 Winner
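A minimal sketch of how this contention resolves under the nearest-router-wins rule. The model is an assumption for illustration: all flits travel toward higher router indices, each source router injects at most one flit, and a flit is stopped at the first intermediate router where a nearer source (including that router itself) also requests the outgoing link. The name `resolve` is hypothetical.

```python
from typing import Dict, List, Tuple

Flow = Tuple[int, int]  # (source router, destination router), with dst > src


def resolve(flows: List[Flow], hpc_max: int) -> Dict[Flow, int]:
    """Router at which each flit stops after one multi-hop cycle."""
    stops = {}
    for src, dst in flows:
        reach = min(dst, src + hpc_max)
        stop = reach
        for r in range(src + 1, reach):  # intermediate routers only
            # Any flit starting in (src, r] that needs r's outgoing link is
            # nearer to r than this flit, so its SSR wins the arbitration.
            blocked = any(src < s <= r and d > r for s, d in flows)
            if blocked:
                stop = r
                break
        stops[(src, dst)] = stop
    return stops


print(resolve([(4, 7), (5, 7)], 3))  # {(4, 7): 5, (5, 7): 7}
```

In this model the flit from r4 is stopped at r5, where the local request wins, while the flit from r5 bypasses r6 and stops at r7.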
OpenSMART: Features
- Language
- BSV and Chisel
- Flow control
- VC and SMART
- Buffer Management
- Credit-based buffer management
- Router Microarchitecture
- 1- and 2-cycle state-of-the-art packet switching router
- SMART router
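Credit-based buffer management can be sketched as a counter kept on the sending side. This is a generic model of the technique, not the OpenSMART implementation; the class and method names are hypothetical, and the initial credit count is assumed to equal the downstream buffer depth.

```python
class CreditCounter:
    """Sender-side view of free slots in a downstream buffer."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("buffer depth must be >= 1")
        self.depth = depth
        self.credits = depth  # assumption: downstream buffer starts empty

    def has_credit(self) -> bool:
        return self.credits > 0

    def send_flit(self) -> None:
        """Consume one credit when a flit is sent downstream."""
        if not self.has_credit():
            raise RuntimeError("no credit: downstream buffer may be full")
        self.credits -= 1

    def receive_credit(self) -> None:
        """Restore one credit when the downstream router frees a slot."""
        if self.credits >= self.depth:
            raise RuntimeError("more credits returned than flits sent")
        self.credits += 1


link = CreditCounter(depth=2)
link.send_flit()
link.send_flit()
print(link.has_credit())  # False: both downstream slots are in use
link.receive_credit()
print(link.has_credit())  # True: one slot was freed
```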
OpenSMART: Features
- Routing Calculation
- XY, YX, and source-routing
- One-hot encoding hop count + shift-based routing calculation
- For SMART, routing calculation is done during bypasses
- VC Selection
- FIFO-based dynamic VC selection
- Next VC is stored in a separate register
- For SMART, VC selection is done during bypasses
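FIFO-based dynamic VC selection can be sketched as a queue of free VC identifiers whose head is the next VC to hand out. This is a generic illustration rather than the OpenSMART code; the class name `FifoVcSelector` and its methods are hypothetical, and `next_vc` stands in for the separate next-VC register mentioned above.

```python
from collections import deque
from typing import Optional


class FifoVcSelector:
    """Hands out free virtual channels in FIFO order."""

    def __init__(self, num_vcs: int) -> None:
        self.free = deque(range(num_vcs))  # all VCs start free

    def next_vc(self) -> Optional[int]:
        """Peek at the VC that would be allocated next, without allocating it."""
        return self.free[0] if self.free else None

    def allocate(self) -> int:
        """Take the VC at the head of the free queue."""
        if not self.free:
            raise RuntimeError("no free VC")
        return self.free.popleft()

    def release(self, vc: int) -> None:
        """Return a VC to the tail of the free queue."""
        self.free.append(vc)


sel = FifoVcSelector(num_vcs=2)
first = sel.allocate()   # VC 0
second = sel.allocate()  # VC 1
sel.release(second)      # VC 1 is freed first
sel.release(first)
print(sel.next_vc())     # 1: VCs are reused in the order they were released
```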
Latency
Figure: latency under (a) Uniform Random and (b) Bit-complement traffic; annotations: 4X, 5X
Repeaters require less energy than clocked latches
Energy Consumption
HPCmax
(a) HPCmax on ASIC (b) HPCmax on FPGA
Router Area
Figure: (a) ASIC area, (b) FPGA LUTs, (c) FPGA FFs; axis: Number of Ports
Router Power
Figure: (a) ASIC, (b) FPGA; axis: Number of Ports
Maximum Clock Frequency
- NoCs are crucial components to support many-IP heterogeneous systems
– Providing connectivity while satisfying their diverse requirements.
- OpenSMART provides automatic generation of NoCs for many-IP heterogeneous systems
– Supports recent low latency SMART NoC as well as highly-optimized 1-cycle routers
– Written in high-level HDLs
Conclusion
- OpenSMART contributes to the open-source hardware ecosystem!
- Source code will be available in May 2017
- Please sign up via our webpage to request the source code: http://synergy.ece.gatech.edu/tools/opensmart/