
Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm



  1. Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm
     Tutu Ajayi², Khalid Al-Hawaj¹, Aporva Amarnath², Steve Dai¹, Scott Davidson⁴, Paul Gao⁴, Gai Liu¹, Anuj Rao⁴, Austin Rovinski², Ningxiao Sun⁴, Christopher Torng¹, Luis Vega⁴, Bandhav Veluri⁴, Shaolin Xie⁴, Chun Zhao⁴, Ritchie Zhao¹, Christopher Batten¹, Ronald G. Dreslinski², Rajesh K. Gupta³, Michael B. Taylor⁴, Zhiru Zhang¹
     ¹ Cornell University
     ² University of Michigan
     ³ University of California, San Diego
     ⁴ Bespoke Silicon Group (U. Washington / UC San Diego)
     MICRO-50, October 14, 2017

  2. Computer Architecture Research Prototyping
     Prototyping is important to complement the results of simulation-based research.
     Many benefits to prototyping:
     • Validating assumptions
     • Validating design methodologies
     • Measuring real system-level performance and energy efficiency
     • Creating platforms for software research
     • Building credibility with industry
     • Building intuition for physical design
     • Pedagogical benefits
     • Building real things is fun!
     Celerity :: Introduction

  3. The Continuing Need for Building Prototypes
     The rise of the dark silicon era [1], in which an increasing fraction of silicon must remain unpowered, is motivating an increasing trend towards accelerator-centric architectures.
     Specialization research requires:
     • New simulation-based evaluation methodologies based on accelerators [2]
     • New prototyping methodologies for rapidly building accelerator-centric prototypes
     Unfortunately, building research prototypes can be tremendously challenging.
     [Figure: The Four Horsemen of the Coming Dark Silicon Apocalypse: "Shrink", "Dim", "Specialize", "Magic"]
     [1] M. Taylor. "Is Dark Silicon Useful? Harnessing the Four Horsemen of the Coming Dark Silicon Apocalypse." In Design Automation Conference, 2012.
     [2] Y. Shao, et al. "Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures." In ISCA, 2014.

  4. Prototyping with the RISC-V Software/Hardware Ecosystem
     Software Toolchain
     • A complete, off-the-shelf software stack (e.g., binutils, GCC, newlib/glibc, Linux kernel & distros) for both embedded and general-purpose systems
     Architecture
     • RISC-V ISA specification designed to be both modular and extensible, with a small base ISA and optional extensions
     Microarchitecture
     • On-chip network specifications and implementations (NASTI, TileLink)
     • RISC-V processor implementations for both in-order (Berkeley Rocket) and out-of-order (Berkeley BOOM) cores
     Physical Design
     • Previous spins of chips for reference
     Testing
     • Standard core verification test suites + turn-key FPGA gateware
     [Figure: computing stack, top to bottom: Application, Algorithm, Programming Language, Operating System, Compilers, Instruction Set Architecture, Microarchitecture, Register-Transfer Level, Gate-Level, Circuits, Devices, Technology]

  5. The Celerity System-on-Chip
     Celerity is an accelerator-centric SoC with a tiered accelerator fabric that targets highly performant and energy-efficient embedded systems.
     Funded by the DARPA CRAFT program, "Circuit Realization At Faster Timescales". The goal was to develop new methodologies to design chips more quickly.
     We leveraged the RISC-V software/hardware ecosystem as we built Celerity, and we believe it was instrumental in enabling a team of 20 graduate students to tape out a complex SoC in only 9 months.
     [Figure: Celerity block diagram. General-Purpose Tier: five RISC-V Rocket cores, each with I-Cache, D-Cache, NASTI, and RoCC interfaces. Massively Parallel Tier: RISC-V Vanilla-5 cores with I Mem, D Mem, XBAR, and NoC router. Specialization Tier. The chip attaches to the BaseJump FSB and Motherboard.]

  6. Celerity: Chip Overview (http://www.opencelerity.org)
     • TSMC 16nm FFC
     • 25 mm² die area (5 mm × 5 mm)
     • ~385 million transistors
     • 511 RISC-V cores
       • 5 Linux-capable RV64G Berkeley Rocket cores
       • 496-core RV32IM mesh tiled array "manycore"
       • 10-core RV32IM mesh tiled array (low voltage)
     • Binarized Neural Network Specialized Accelerator
     • On-chip synthesizable PLLs and DC/DC LDO
       • Developed in-house
     • 3 clock domains
       • 400 MHz – DDR I/O
       • 625 MHz – Rocket core + Specialized accelerator
       • 1.05 GHz – Manycore array
     • 672-pin flip chip BGA package
     • 9 months from PDK access to tape-out
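The headline figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the slide; the constant and function names are illustrative and do not come from the Celerity project.

```python
# Figures copied from the chip overview slide; all names are illustrative.
ROCKET_CORES = 5          # Linux-capable RV64G cores
MANYCORE_CORES = 496      # RV32IM mesh tiled array
LOW_VOLTAGE_CORES = 10    # RV32IM low-voltage array

DIE_SIDE_MM = 5.0
CLOCKS_MHZ = {
    "DDR I/O": 400,
    "Rocket core + specialized accelerator": 625,
    "Manycore array": 1050,
}


def total_cores():
    """Sum the three groups of RISC-V cores listed on the slide."""
    return ROCKET_CORES + MANYCORE_CORES + LOW_VOLTAGE_CORES


def die_area_mm2():
    """Area of the square die in mm^2."""
    return DIE_SIDE_MM * DIE_SIDE_MM


def clock_period_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    print("cores:", total_cores())
    print("die area (mm^2):", die_area_mm2())
    for domain, mhz in CLOCKS_MHZ.items():
        print(f"{domain}: {clock_period_ns(mhz):.3f} ns")
```

The three core groups add up to the 511 cores quoted, and the 5 mm × 5 mm die matches the 25 mm² area.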

  7. Agenda
     • Introduction
     • For each Tier:
       • What did we build?
       • How did we build it?
       • RISC-V Ecosystem Successes
       • RISC-V Ecosystem Challenges
     • Conclusion
     [Figure: Celerity block diagram with the General-Purpose, Massively Parallel, and Specialization tiers]

  8. Celerity: General-Purpose Tier
     [Figure: Celerity block diagram with the General-Purpose Tier highlighted, alongside the Massively Parallel and Specialization tiers]
     Celerity :: General-Purpose Tier :: What is it? • How did we build it? • Successes with RISC-V • Challenges with RISC-V

  9. General-Purpose Tier Overview
     • 5 Berkeley Rocket Cores (RV64G)
     • Workload
       • General-purpose compute
       • Operating system (e.g. Linux & TCP/IP Stack)
       • Interrupt and exception handling
       • Program dispatch and control flow
     • Interface
       • Interface to off-chip I/O and other peripherals
       • 4 cores connect to the manycore array
       • 1 core interfaces with the BNN
     • Memory
       • Each core executes independently within its own address space
       • Memory management for all tiers
     [Figure: five RISC-V Rocket cores (I-Cache, D-Cache, NASTI, RoCC) connected to the BaseJump FSB and Motherboard, the manycore, and the BNN]

  10. Berkeley Rocket Cores
      • 5 Berkeley Rocket Cores (https://github.com/freechipsproject/rocket-chip)
        • Generated from Chisel
        • RV64G ISA
        • 5-stage, in-order, scalar processor
        • Double-precision floating point
        • I-Cache: 16KB 4-way assoc.
        • D-Cache: 16KB 4-way assoc.
      • Physical Implementation
        • 625 MHz (critical path in FSB)
        • 0.19 mm² per core
      http://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/
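To make the cache parameters concrete, the sketch below derives the number of sets and the address bit split for a 16KB 4-way cache. The slide does not give the line size; the 64-byte line used here is an assumption, and the function name is illustrative.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    All three parameters must be powers of two.
    """
    for name, value in (("size_bytes", size_bytes),
                        ("ways", ways),
                        ("line_bytes", line_bytes)):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two, got {value}")
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the set count
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    # 16KB and 4-way come from the slide; the 64-byte line is an ASSUMPTION.
    sets, offset_bits, index_bits = cache_geometry(16 * 1024, 4, 64)
    print(f"sets={sets}, offset bits={offset_bits}, index bits={index_bits}")
```

Under that assumption each way holds 4KB, so the cache has 64 sets addressed by 6 index bits and 6 offset bits.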

  11. Design Iterations
      1. Loopback: Baseline design to validate FSB and Northbridge
      2. Alpaca: Implemented NASTI bridge and connected Rocket core
      3. Bison: Implemented accelerator connected through blackboxed RoCC
      4. Coyote: Modularized RoCC interface to accelerator
      [Figure: one block diagram per iteration, showing the Motherboard and BaseJump FSB connected to a loopback FIFO (Loopback), to a RISC-V Rocket core with I-Cache and D-Cache through NASTI (Alpaca), and to Rocket cores with RoCC-attached accelerators (Bison, Coyote)]
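The Bison and Coyote iterations attach the accelerator through RoCC, where software reaches the accelerator by issuing instructions in the RISC-V custom opcode space. The slides do not show any instruction encodings, so the following is a minimal sketch of the usual RoCC instruction layout in Rocket Chip (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode), using the custom-0 opcode. The function name and the example field values are hypothetical, and the meaning of funct7 is defined by each accelerator.

```python
CUSTOM0_OPCODE = 0b0001011  # RISC-V custom-0 major opcode


def encode_rocc(funct7, rd, rs1, rs2, xd=True, xs1=True, xs2=True,
                opcode=CUSTOM0_OPCODE):
    """Pack a 32-bit RoCC-style instruction word.

    Bit layout (high to low):
    funct7[31:25] rs2[24:20] rs1[19:15] xd[14] xs1[13] xs2[12] rd[11:7] opcode[6:0]
    """
    fields = (("funct7", funct7, 7), ("rs2", rs2, 5), ("rs1", rs1, 5),
              ("rd", rd, 5), ("opcode", opcode, 7))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (int(xd) << 14) | (int(xs1) << 13) | (int(xs2) << 12)
            | (rd << 7) | opcode)


if __name__ == "__main__":
    # Hypothetical command: funct7=1, result in x10, operands in x11 and x12.
    word = encode_rocc(funct7=1, rd=10, rs1=11, rs2=12)
    print(f"0x{word:08x}")
```

The xd, xs1, and xs2 bits tell the core whether the instruction writes a destination register and reads each source register, which is what lets one generic interface serve different accelerators.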

  12. Off-Chip Interface and Northbridge
      • Open-source BaseJump IP Library
        • http://bjump.org
      • Front Side Bus
        • BaseJump Communication Link
        • High-speed (DDR) source-synchronous communication interface
      • Packaging
        • Modified BaseJump BGA package and I/O ring
      • Validation
        • BaseJump Super Trouble PCB (daughter card)
        • BaseJump Motherboard (ZedBoard)
      [Figure: Celerity SoC (five RISC-V Rocket cores with I-Cache, D-Cache, NASTI, RoCC) connected through the BaseJump FSB and FPGA bridge to the BaseJump Motherboard, which provides the L2 $, DRAM controller, Ethernet, SSD, clocks, and JTAG]

  13. RISC-V Successes
      • Berkeley Rocket Cores
      • Very quickly generated validated designs
      • Vibrant ecosystem to provide feedback and support
      • Test and validation infrastructure
      • Software and toolchain support
      • Flexible memory system and peripheral I/O support
      • Easy integration with BaseJump IP Library
      • Balances extensibility and software support

  14. RISC-V Lessons Learned
      • Component stability, compatibility and versioning
      • Chisel adoption
      • RTL simulation issues
      • Deciphering Chisel-generated RTL
      • Register initialization and X-pessimism
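The slide names X-pessimism without elaborating. As a generic illustration, not the Celerity team's debugging flow, the sketch below models three-valued gate-level simulation in which an uninitialized register appears as X. A mux built from AND/OR/NOT gates reports X when its select is unknown, even when both data inputs agree and real hardware could only produce one value. All names are illustrative.

```python
X = "x"  # unknown value, as seen from an uninitialized register


def v_not(a):
    return X if a == X else 1 - a


def v_and(a, b):
    if a == 0 or b == 0:
        return 0            # a controlling 0 masks any unknown
    if a == X or b == X:
        return X
    return 1


def v_or(a, b):
    if a == 1 or b == 1:
        return 1            # a controlling 1 masks any unknown
    if a == X or b == X:
        return X
    return 0


def gate_mux(sel, a, b):
    """Gate-level 2:1 mux: (sel AND a) OR (NOT sel AND b)."""
    return v_or(v_and(sel, a), v_and(v_not(sel), b))


def hardware_outcomes(a, b):
    """Values real hardware could produce: the select is 0 or 1, never X."""
    return {gate_mux(sel, a, b) for sel in (0, 1)}


def is_pessimistic(a, b):
    """True when simulation reports X although hardware has one possible value."""
    return gate_mux(X, a, b) == X and len(hardware_outcomes(a, b)) == 1


if __name__ == "__main__":
    print("simulated:", gate_mux(X, 1, 1))
    print("possible in hardware:", hardware_outcomes(1, 1))
    print("pessimistic:", is_pessimistic(1, 1))
```

With both data inputs at 1 the simulated output is X while the only possible hardware value is 1. This is why explicit register initialization or reset matters in simulation: a single unknown select can spread X through logic that would behave deterministically on silicon.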
