
JADE Heterogeneous Multiprocessor Design & Simulation



  1. JADE Heterogeneous Multiprocessor Design & Simulation Environment
     Jiang Xu

  2. Acknowledgement
     - Intel Labs: Bin Li, Ravi Iyer, Ramesh Illikkal
     - HP Labs: Qiong Cai
     - Current PhD students: Rafael Kioji Vivas Maeda, Peng Yang, Zhe Wang, Haoran Li, Zhehui Wang, Zhongyuan Tian, Zhifei Wang, Duong Huu Kinh Luan, Xuanqi Chen
     - Past members: Xiaowen Wu, Weichen Liu, Xuan Wang, Yaoyao Ye
     2016-06-09 Jiang Xu (HKUST)

  3. PERFECT Computing Systems
     - Design targets (PERFECT): Performance, Energy efficiency, Reliability, Functionality, Extensibility, Cost, Testability
     - More cores and memory on a chip and in a system
     - Increasingly heterogeneous

  4. Huge Design Space to Explore
     - Application: IoT/IoE, mobile, data center, HPC, mainframe …; wireless communication, multimedia processing, machine learning, database …
     - Interconnect: ad-hoc, bus, NoC, hybrid …; regular vs. irregular topology; protocol: routing, flow control, congestion control …; switch/router architecture; electrical, optical, RF …
     - Processor: CPU, GPU, FPGA, DSP, ASIP, ASIC …; homogeneous vs. heterogeneous multiprocessor; FinFET, FD-SOI, GAA, CNT FET …
     - Support: power delivery and management; clock distribution and management; thermal, aging, noise …
     - Memory and storage: hierarchy; cache coherence; DRAM, SRAM, flash, STT-RAM …
     - Peripherals: network interface, user interface, management …
     [Figure: example organizations: a mesh of cores, a ring, and bus-based SoCs whose processor and peripheral buses (with arbiters and bridges) connect CPU, GPU, DSP, RISC, FPGA, MPEG, SRAM, memory controller, USB, LCD controller, power manager, and GPIO blocks]
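The point of the slide is combinatorial: independent choices multiply. As a rough illustration, the Python sketch below counts configurations for a small, hypothetical subset of the dimensions listed above, assuming one option is picked per dimension and all choices are independent. DESIGN_SPACE and the helper functions are illustrative names, not part of JADE, and continuous parameters (cache sizes, frequencies, core counts) are ignored, so real spaces are far larger.

```python
from itertools import product
from math import prod

# Hypothetical subset of the design dimensions from the slide; one option is
# chosen per dimension and all choices are assumed independent.
DESIGN_SPACE = {
    "processor": ["CPU", "GPU", "FPGA", "DSP", "ASIP", "ASIC"],
    "interconnect": ["ad-hoc", "bus", "NoC", "hybrid"],
    "signaling": ["electrical", "optical", "RF"],
    "memory": ["DRAM", "SRAM", "flash", "STT-RAM"],
    "device": ["FinFET", "FD-SOI", "GAA", "CNT FET"],
}


def count_configurations(space):
    """Number of configurations: the product of the option counts."""
    return prod(len(options) for options in space.values())


def enumerate_configurations(space):
    """Yield every configuration as a dict mapping dimension -> chosen option."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


def per_core_assignments(num_cores, num_types):
    """Ways to give each core its own type in a heterogeneous multiprocessor."""
    return num_types ** num_cores


if __name__ == "__main__":
    print("system-level configurations:", count_configurations(DESIGN_SPACE))
    print("core-type assignments for 16 cores, 6 types:", per_core_assignments(16, 6))
```

Even this toy subset yields 6 x 4 x 3 x 4 x 4 = 1152 system-level configurations, and letting each of 16 cores pick one of 6 types multiplies the space by 6^16, which is why exhaustive cycle-accurate simulation of every point is impractical.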

  5. Simulation-based Architecture Exploration
     - Benchmark applications with sample input data sets
     - System software
     - Cycle-accurate "full-system" architecture simulator
     - Speed-up techniques:
       - Simplify the interconnect, memory, processor, OS, etc.
       - Sample application executions
       - Sample inputs
       - Break causality to better parallelize simulations
       - Hybrids of the above techniques
     [Figure: simulation flow: benchmark applications (programs, sample inputs) → compilation software → system instructions, run with the operating system and device drivers on the architecture simulator → architecture under evaluation]
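"Sampling application executions" trades accuracy for speed: only a fraction of execution intervals is simulated in detail, and whole-run statistics are estimated from them. The Python sketch below is a generic illustration of systematic sampling (every k-th interval) with a confidence interval on the estimate; it assumes the per-interval metric is CPI and uses a synthetic trace in place of a real simulator, and it is not a description of how any particular simulator implements sampling.

```python
import math
import random


def systematic_sample(intervals, period, offset=0):
    """Keep every `period`-th interval starting at index `offset`."""
    return intervals[offset::period]


def estimate_mean(samples, z=1.96):
    """Sample mean and half-width of an approximate 95% confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    return mean, z * math.sqrt(variance / n)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical per-interval CPI trace with phase behavior plus noise, standing
    # in for what a detailed simulator would measure over the full run.
    trace = [1.0 + 0.5 * math.sin(i / 50) + rng.gauss(0, 0.1) for i in range(100_000)]
    samples = systematic_sample(trace, period=1000, offset=17)  # 0.1% simulated in detail
    mean, half_width = estimate_mean(samples)
    full_mean = sum(trace) / len(trace)
    print(f"{len(samples)} of {len(trace)} intervals: CPI {mean:.3f} +/- {half_width:.3f}")
    print(f"full-trace CPI {full_mean:.3f}")
```

The confidence half-width shrinks with the square root of the number of samples, so the sampling period can be chosen to meet an error target rather than fixed in advance.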

  6. The Good, the Bad and the Ugly
     - Good for detailed/late-stage design: tweaking, testing, debugging …
     - Bad for early design space exploration: too slow to provide essential system statistics such as average and worst-case performance, energy efficiency, cost …
     - Ugly for heterogeneous systems:
       - Compilation for heterogeneous ISAs, hardware accelerators, FPGAs …
       - OS support for new large-scale heterogeneous systems without drivers
     [Figure: simulation flow, from benchmark applications through compilation, system software, and the architecture simulator to the architecture under evaluation]

  7. Joint Application/Architecture Design Exploration
     - COSMIC: application models for heterogeneous multiprocessor system exploration
     - JADE: heterogeneous multiprocessor system design and simulation platform
     [Figure: applications (algorithms, programs, sample inputs) → algorithm analysis and application partition → computation, communication, and memory analysis and profiling → COSMIC application models (TCG models, statistical application models, recorded application models) → JADE mapping, routing, and scheduling algorithms (task mapping & scheduling, traffic routing plan, memory space mapping) → architecture under evaluation]
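To show what "task mapping & scheduling" on an application model can look like, the Python sketch below reads TCG as a task communication graph (an assumption; the slide only gives the acronym): nodes are tasks with per-processor-type execution times, and edges carry data volumes. It applies a simplified greedy earliest-finish-time list scheduler, a common textbook heuristic swapped in for illustration; it is not JADE's mapping algorithm. The task names, costs, and bandwidth are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Processor:
    name: str
    kind: str  # processor type, e.g. "CPU", "GPU", "DSP"


def schedule(costs, edges, processors, bandwidth):
    """Greedy earliest-finish-time list scheduling of a task communication graph.

    costs:      task -> {processor kind: execution time}; a missing kind means
                the task cannot run on that kind of processor
    edges:      (src, dst) -> data volume that src sends to dst
    processors: list of Processor
    bandwidth:  data units per time unit between two different processors
    Returns task -> (processor name, start time, finish time).
    """
    preds = {task: set() for task in costs}
    for src, dst in edges:
        preds[dst].add(src)

    free_at = {p.name: 0.0 for p in processors}  # when each processor is next idle
    placement = {}
    for task in TopologicalSorter(preds).static_order():
        best = None
        for p in processors:
            if p.kind not in costs[task]:
                continue  # this processor type cannot run the task
            # Inputs are ready once every predecessor has finished and, if it ran
            # on a different processor, its data has crossed the interconnect.
            ready = 0.0
            for src in preds[task]:
                src_proc, _, src_finish = placement[src]
                transfer = 0.0 if src_proc == p.name else edges[(src, task)] / bandwidth
                ready = max(ready, src_finish + transfer)
            start = max(ready, free_at[p.name])
            finish = start + costs[task][p.kind]
            if best is None or finish < best[2]:
                best = (p.name, start, finish)
        if best is None:
            raise ValueError(f"no processor can run task {task!r}")
        free_at[best[0]] = best[2]
        placement[task] = best
    return placement


if __name__ == "__main__":
    # Hypothetical four-task graph: A feeds B and C, which both feed D.
    costs = {
        "A": {"CPU": 2, "GPU": 4},
        "B": {"CPU": 6, "GPU": 2},
        "C": {"CPU": 3, "DSP": 1},
        "D": {"CPU": 2},
    }
    edges = {("A", "B"): 4, ("A", "C"): 2, ("B", "D"): 2, ("C", "D"): 2}
    procs = [Processor("cpu0", "CPU"), Processor("gpu0", "GPU"), Processor("dsp0", "DSP")]
    for task, (proc, start, finish) in schedule(costs, edges, procs, bandwidth=2).items():
        print(f"{task}: {proc} [{start}, {finish}]")
```

In this example the scheduler places B on the GPU and C on the DSP despite the transfer delays, and the graph finishes at time 9 instead of 13 if all four tasks ran back-to-back on the CPU. Evaluating such mappings on a model, rather than compiling for every heterogeneous ISA, is the kind of early exploration the slide argues for.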

  8. JADE Heterogeneous Multiprocessor Simulation Environment
     - JADE: Joint Application/Architecture Design Exploration
     - Heterogeneous system designs
     - Early design space exploration
     - Systematic system evaluation
     - Highlights:
       - Statistical, recorded, and synthetic application models
       - Network-on-chip and off-chip networks
       - Optical and electrical interconnects
       - Memory subsystem
       - Built-in power analysis
     [Figure: JADE framework. Inputs: hardware architecture (processor architecture, memory hierarchy, coherence protocol), network architecture (optical, electrical), COSMIC benchmark (recorded, statistical, and synthetic application models), architecture template and energy library (processor library, optical and electrical network library, memory and cache coherence library), and MRS (mapping, routing, scheduling) algorithms (task mapping and scheduling, communication traffic routing plan, memory space mapping). JADE models processors, network, memory, and peripherals. Output: memory access trace, system behavior, performance analysis, energy analysis.]
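One common way to build in power analysis is event-based energy accounting: the simulator counts architectural events, and an energy library supplies a per-event energy. The Python sketch below shows that bookkeeping under stated assumptions: the event names, the picojoule values in ENERGY_LIBRARY_PJ, and the single static-power term are all hypothetical, and this is not a description of JADE's internal energy model.

```python
# Hypothetical per-event dynamic energies in picojoules, standing in for an
# architecture energy library; real values depend on technology and design.
ENERGY_LIBRARY_PJ = {
    "instruction": 20.0,
    "l1_access": 5.0,
    "l2_access": 25.0,
    "router_hop": 3.0,
    "dram_access": 2000.0,
}


def energy_breakdown_j(event_counts, library):
    """Dynamic energy per event type in joules: count x per-event energy."""
    breakdown = {}
    for event, count in event_counts.items():
        if event not in library:
            raise KeyError(f"no energy entry for event {event!r}")
        breakdown[event] = count * library[event] * 1e-12  # pJ -> J
    return breakdown


def total_energy_j(event_counts, library, static_power_w, runtime_s):
    """Total energy: sum of dynamic event energies plus static power x runtime."""
    dynamic = sum(energy_breakdown_j(event_counts, library).values())
    return dynamic + static_power_w * runtime_s


if __name__ == "__main__":
    # Hypothetical event counts, as a simulator might report for one run.
    counts = {"instruction": 1e9, "l1_access": 4e8, "l2_access": 2e7,
              "router_hop": 5e7, "dram_access": 1e6}
    runtime = 0.5  # seconds of simulated time
    energy = total_energy_j(counts, ENERGY_LIBRARY_PJ, static_power_w=0.5, runtime_s=runtime)
    print(f"total energy {energy:.4f} J, average power {energy / runtime:.4f} W")
    for event, joules in energy_breakdown_j(counts, ENERGY_LIBRARY_PJ).items():
        print(f"  {event}: {joules:.5f} J")
```

Keeping the per-event energies in a separate library means the same simulated event counts can be re-priced for a different technology or interconnect (for example, optical vs. electrical) without rerunning the simulation.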

  9. COSMIC Heterogeneous Multiprocessor Benchmark
     - Machine Learning - FMP: financial market prediction using machine learning
     - Machine Learning - ALIP: machine-learning-based image indexing
     - Molecular Dynamics: simulating molecular dynamics when molecules hit the surfaces of solid atoms
     - Ray Tracing: 3D scene rendering
     - Ultrasound: medical diagnostics using 2D/3D ultrasound imaging
     - Fast Fourier Transform: Fast Fourier Transform with complex-number inputs
     - LDPC Encoder: low-density parity-check code encoder
     - TURBO Decoder: turbo code decoder
     - Reed-Solomon: Reed-Solomon code encoder and decoder
     - Collaborating with application experts
     - More applications are under development

  10. Exploration Cases
     - I2CON inter/intra-chip optical network
     - SUOR optical NoC
     - Electrical mesh-based NoC
     - Memory hierarchy: private L1 caches; shared L2 cache with 16 banks; 16 memory controllers
     - Processor core: ARMv7-A, 7nm, 1GHz, 0.6V
     [Figure: clusters of cores, each attached to an optical network interface (ONI), connected by optical waveguides and electrical wires to memory controllers around the chip]
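For the electrical mesh-based NoC case, the Python sketch below shows how a shared-L2 access might be turned into network hops. It assumes a 4x4 mesh with one of the 16 L2 banks per tile in row-major order, 64-byte cache lines interleaved across banks by low-order line-address bits, and dimension-order (XY) routing; all of these are hypothetical choices for illustration, since the slide specifies only the 16 banks and the mesh.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh.

    Moves along X until the column matches, then along Y. src and dst are
    (x, y) router coordinates; returns every router visited, both ends included.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def l2_bank(address, line_bytes=64, num_banks=16):
    """Shared-L2 bank for an address, interleaving consecutive cache lines across banks."""
    return (address // line_bytes) % num_banks


def bank_router(bank, mesh_width=4):
    """Router coordinates of a bank, assuming bank i sits at tile i in row-major order."""
    return (bank % mesh_width, bank // mesh_width)


if __name__ == "__main__":
    core = (0, 0)  # hypothetical requesting core's router
    addr = 0x1A40
    bank = l2_bank(addr)
    path = xy_route(core, bank_router(bank))
    print(f"address {addr:#x} -> bank {bank} at {bank_router(bank)}, "
          f"{len(path) - 1} hops: {path}")
```

Here address 0x1A40 falls in line 105, which maps to bank 9 at router (1, 2), three hops from the core at (0, 0). XY routing is deterministic and deadlock-free on a mesh because packets never turn from the Y dimension back to X; summing hop counts like these over an access trace is one simple way to compare interconnect options before detailed simulation.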

  11. Performance and Scalability

  12. Energy Efficiency and Scalability
