Design Space Exploration for Memory Subsystems of VLIW Architectures


  1. HEINZ NIXDORF INSTITUTE, University of Paderborn, Schaltungstechnik, Dr.-Ing. Mario Porrmann
     Design Space Exploration for Memory Subsystems of VLIW Architectures
     Thorsten Jungeblut (1), Gregor Sievers, Mario Porrmann (1), Ulrich Rückert (2)
     (1) System and Circuit Technology, University of Paderborn
     (2) Cognitive Interaction Technology – Center of Excellence, Bielefeld University

  2. Motivation (1)
     HEINZ NIXDORF INSTITUT, Universität Paderborn, Schaltungstechnik, Prof. Dr.-Ing. Ulrich Rückert
     • Increasing complexity of mobile applications
       – More functionality
       – New algorithms (LTE, LTE Advanced)
       – Multimedia applications (video, 3-D, …)
     • Inflexible hardware → flexible software implementation (Software-Defined Radio, SDR)
     • → Powerful CPU necessary
     • High demands on resource efficiency

  3. Motivation (2)
     • In embedded processors, the size of on-chip memories is limited
     • External (SDRAM) memory
       – Low cost per bit
       – Slow, high latency
     • Intermediate storage of accesses in the cache
       – Loading of entire cache lines from the external memory
       – Use of temporal and spatial locality
     • Size of the caches is limited by the operating frequency of the processor core
       – Cache hierarchy
       – Level-1 cache is matched to the core frequency → additional levels with higher latency
     [Figure: memory hierarchy: register, level-1 cache, level-2 cache, main memory, hard disk]
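The effect of loading entire cache lines can be illustrated with a small hit-rate model. This is a minimal sketch, not the authors' simulator: it assumes a direct-mapped cache with 16 kB capacity and 32-byte lines (the data-cache values from the cache overview slide), and the function name `hit_rate` and the access patterns are hypothetical.

```python
def hit_rate(addresses, cache_bytes=16 * 1024, line_bytes=32):
    """Return the hit rate of a direct-mapped cache for a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # one stored tag per cache line, None = invalid
    hits = 0
    total = 0
    for addr in addresses:
        line_number = addr // line_bytes
        index = line_number % num_lines    # which cache line the address maps to
        tag = line_number // num_lines     # identifies the memory block in that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag              # miss: the whole line is (re)loaded
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    sequential = list(range(0, 4096, 4))        # consecutive 32-bit words
    strided = list(range(0, 4096 * 8, 32))      # one access per cache line
    print("sequential:", hit_rate(sequential))  # spatial locality
    print("strided:   ", hit_rate(strided))     # no reuse within a line
    print("repeated:  ", hit_rate(sequential * 2))  # temporal locality
```

With sequential word accesses, only the first access to each 32-byte line misses; the remaining seven words in the line hit. A stride equal to the line size removes this benefit entirely.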

  4. Outline
     • Concurrent design flow for DSE
     • VLIW architecture / cache architecture
     • Prototyping environment
     • Performance results and resource requirements
     • Conclusion / outlook
     [Figure: overview of the design-space exploration tool flow and the VLIW pipeline with L1 instruction and data caches]

  5. Design Space Exploration: Tool Flow
     • Goal: highly automated design flow
     [Figure: tool flow. Box labels: Specification, Benchmarks, Vice-UPSLA, UPSLA, Source Code, RTL Description, Compiler, RTL Code, Assembler Code, RTL Simulator, Assembler, Object Files, Synthesis Tool, Linker, Netlist, Executables, Emulator (Prototype), ASIC Realization, Software Simulator, Profiling Data, Visualization, Functional Verification, Resource Efficiency]

  6. The CoreVA architecture: Modular Design
     [Figure: CoreVA pipeline with the stages FE (Instruction Fetch), DC (Instruction Decode), RD (Register Read), EX, ME and WR (Register Write). Blocks: Instruction Memory / L1 Instruction Cache, Bypass, four ALUs, Condition Register, units labeled "/" and "*", four LD/ST units, Register, Data Memory / L1 Data-Cache]

  7. Dynamically Reconfigurable Platform RAPTOR-X64
     Prototypic Implementation of Microelectronic Circuits on FPGAs
     • Up to 200 million transistors emulated
     • Flexible, modular concept: PCI bus motherboard with up to six modules
     • Partial dynamic reconfiguration at high reconfiguration bandwidth
     [Figure: RAPTOR-X64 block diagram: USB controller (USB 2.0 High-Speed, USB-OTG), system monitor (voltage, temperature, analog inputs), clock synthesis and distribution, control and configuration logic, Xilinx SystemACE CF, PCI-X bus (64-bit data / 32-bit address), PCI bus bridge, local bus (32-bit data / 32-bit address), dual-port SRAM, broadcast bus, and module slots configured via SelectMAP and CFG-JTAG]

  8. System Environment
     • Multi-master system bus
     • Generic I/D cache interfaces to external memory
     • 4 GB SDRAM
     • Penalty cycles on cache misses:
       – Instruction cache: >73 clock cycles
       – Data cache: >61 clock cycles
     • Internal memories can be accessed from the host system
     • Generic interface for dedicated hardware extensions
     • 9.1 Gbit/s external bandwidth
     [Figure: system architecture: SDRAM, SDRAM controller, system bus, arbiter, system bus controller, instruction cache, CoreVA CPU, data cache, local bus, MMIO interface, FIFO, CRC, UART, host PC; Xilinx FPGA / ASIC; RAPTOR2000 system]
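The miss penalties above allow a rough, first-order estimate of how much run time is lost to cache stalls. The sketch below is an illustration only. It assumes that every miss stalls the core for the full penalty (which ignores the non-blocking data cache), treats the quoted lower bounds of 73 and 61 cycles as exact values, and uses made-up access counts and hit rates. All function and constant names are hypothetical.

```python
# Miss penalties from the slide (quoted there as lower bounds, used here as exact values).
INSTR_MISS_PENALTY = 73   # clock cycles per instruction-cache miss
DATA_MISS_PENALTY = 61    # clock cycles per data-cache miss


def stall_cycles(accesses, hit_rate, miss_penalty):
    """Stall cycles caused by misses, assuming each miss costs the full penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return accesses * (1.0 - hit_rate) * miss_penalty


def stall_share(base_cycles, instr_accesses, instr_hit_rate,
                data_accesses, data_hit_rate):
    """Fraction of the total execution time that is spent in cache stalls."""
    stalls = (stall_cycles(instr_accesses, instr_hit_rate, INSTR_MISS_PENALTY)
              + stall_cycles(data_accesses, data_hit_rate, DATA_MISS_PENALTY))
    return stalls / (base_cycles + stalls)


if __name__ == "__main__":
    # Hypothetical workload: 1M cycles without stalls, 1M instruction fetches,
    # 300k data accesses, and assumed hit rates of 99.9% and 98%.
    share = stall_share(1_000_000, 1_000_000, 0.999, 300_000, 0.98)
    print(f"stall share of execution time: {share:.1%}")
```

Even with high hit rates, penalties of this size can make stalls a large part of the execution time, which is why the hit rate and the stall share are reported separately in the results.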

  9. Cache Architecture: Overview
     • I-Cache:
       – 32 bit per issue slot → 4-slot configuration: 128-bit interface
       – Direct mapped (low latency/power/area)
       – 16 kB cache size, 64-byte line width (configurable)
     • D-Cache:
       – 1-/2-port configuration possible
       – Direct mapped
       – 16 kB cache size, 32-byte line width (configurable)
       – Write-back policy, non-blocking
       – Two programmable allocation modes: fetch-on-write-miss / allocate-on-write-miss
     • I-/D-caches can be dynamically configured as scratch-pad memories
       – Higher performance for timing-critical parts of an application (cache misses are avoided)
       – Energy savings because external memory accesses are avoided
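In a direct-mapped cache, each address maps to exactly one cache line, selected by a bit field of the address. The following sketch shows how an address would be split into tag, index, and offset for the cache sizes listed above. It assumes power-of-two sizes and a 32-bit address in the example; the names `AddressFields` and `split_address` are hypothetical and not taken from the CoreVA design.

```python
from typing import NamedTuple


class AddressFields(NamedTuple):
    tag: int      # compared against the stored tag to detect a hit
    index: int    # selects the cache line
    offset: int   # selects the byte within the line


def split_address(addr, cache_bytes=16 * 1024, line_bytes=32):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    num_lines = cache_bytes // line_bytes
    for value in (line_bytes, num_lines):
        if value <= 0 or value & (value - 1):
            raise ValueError("line size and line count must be powers of two")
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_lines.bit_length() - 1     # log2(num_lines)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return AddressFields(tag, index, offset)


if __name__ == "__main__":
    # D-cache: 16 kB, 32-byte lines -> 512 lines, 5 offset bits, 9 index bits.
    print(split_address(0x12345678))
    # I-cache: 16 kB, 64-byte lines -> 256 lines, 6 offset bits, 8 index bits.
    print(split_address(0x12345678, line_bytes=64))
```

Two addresses that are exactly 16 kB apart share the same index but differ in the tag, so they evict each other. This is the conflict behavior that the future-work item on associativity would address.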

  10. Cache Architecture: Synthesis Results

  11. Application Evaluation: Different Cache Configurations
     • Applications: synthetic benchmarks, baseband, cryptography, multimedia, LTE protocol stack
     • One LD/ST unit per two functional units (50% of #FUs) is the best trade-off
     • Concurrent LD/ST ≠ speedup: the speedup depends on scheduling

  12. Results (1): Hit Rates
     • High hit rates for all applications
     • Allocate-on-write-miss ↔ fetch-on-write-miss

  13. Results (2): Share of Stall Cycles in Execution Time
     • Latencies of SDRAM accesses may vary depending on the order, distribution and frequency of the accesses.

  14. Results (3): Performance

  15. Results (4): Energy

  16. Results (5): Energy-Delay

  17. The CoreVA VLIW architecture: ASIC realization
     • 4-issue VLIW processor, 2x MLA, DIV
     • 1-port I-cache (16 kByte, 128 bit)
     • 2-port D-cache (16 kByte, 32 bit)
     • 65 nm STMicroelectronics, low power (thick oxide), 1.2 V mixed VT, 1.8 V I/Os (configurable pull-ups)
     • Hardware extensions (incl. ECC)
     Key figures:
       – Frequency: 400 MHz
       – Area (32 kB SRAM): 2.7 mm²
       – Power consumption: 0.1 W
       – 1.6 GOP/s in scalar mode
       – 3.2 GOP/s in SIMD mode
     [Figure: chip layout, 1.66 mm × 1.66 mm, with the labeled blocks Instruction Cache, Register File, Execute, ECC, Data Cache and Comp. Cell]
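The throughput figures can be cross-checked with simple arithmetic. The sketch below assumes that peak throughput equals clock frequency times issue slots times operations per slot, and that SIMD mode performs two operations per slot; the factor of two is inferred from the ratio of the two quoted figures and is not stated on the slide. The energy-per-operation value is a derived peak estimate, not a measured result, and all function names are hypothetical.

```python
def peak_gops(freq_mhz, issue_slots, ops_per_slot=1):
    """Peak throughput in giga-operations per second."""
    return freq_mhz * 1e6 * issue_slots * ops_per_slot / 1e9


def energy_per_op_pj(power_w, gops):
    """Energy per operation in picojoules at the given power and throughput."""
    return power_w / (gops * 1e9) * 1e12


if __name__ == "__main__":
    scalar = peak_gops(400, 4)                 # 4 issue slots at 400 MHz
    simd = peak_gops(400, 4, ops_per_slot=2)   # assumed 2 operations per slot
    print(f"scalar: {scalar:.1f} GOP/s, SIMD: {simd:.1f} GOP/s")
    print(f"energy per scalar operation at peak: {energy_per_op_pj(0.1, scalar):.1f} pJ")
    print(f"die area from edge length: {1.66 * 1.66:.2f} mm^2")
```

Under these assumptions the computed values match the 1.6 GOP/s and 3.2 GOP/s quoted above, and the 1.66 mm edge length is consistent with the stated area of about 2.7 mm².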

  18. Conclusion/Outlook
     • Framework for the design-space exploration of processor architectures and memory subsystems
     • Rapid prototyping environment RAPTOR
     • Dynamically configurable cache architecture
     • The 2-slot configuration with allocate-on-write-miss shows the best energy trade-off
     • Performance/energy gains of up to 25%
     • Future work:
       – Include associativity
       – Combination of caches and scratch-pad memories to enhance memory bandwidth

  19. Questions?
     [Figure: summary panels repeated from earlier slides: Design space exploration (tool flow), VLIW architecture, Rapid prototyping, System Architecture]

  20. Thank you for your attention!
     Heinz Nixdorf Institute, University of Paderborn, System and Circuit Technology
     Dipl.-Ing. Thorsten Jungeblut
     Fürstenallee 11, 33102 Paderborn
     Tel.: 0 52 51/60 63 39
     Fax: 0 52 51/60 63 51
     Email: tj@hni.upb.de
     http://wwwhni.upb.de/sct
