
Reconfigurable and Adaptive Systems (RAS) 7. Adaptive Reconfigurable Processors

Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel. Lecture in summer semester 2014: Reconfigurable and Adaptive Systems (RAS) 7.


  1. Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel. Lecture in summer semester 2014: Reconfigurable and Adaptive Systems (RAS), 7. Adaptive Reconfigurable Processors. Lars Bauer, Jörg Henkel.

     RAS Topic Overview
     1. Introduction
     2. Overview
     3. Special Instructions
     4. Fine-Grained Reconfigurable Processors
     5. Configuration Prefetching
     6. Coarse-Grained Reconfigurable Processors
     7. Adaptive Reconfigurable Processors
        • RISPP
        • WARP
        • Dynamic Instruction Merging (DIM)
        • Further relevant architectures / domains
     8. Fault-tolerance by Reconfiguration

     7.1 RISPP: Rotating Instruction Set Processing Platform

     L. Bauer, CES, KIT, 2014

  2. Overview RISPP
     • Developed at CES, KIT
     • Tightly-coupled fine-grained reconfigurable fabric
     • Introduces and implements modular SIs
       • Provide different performance/area trade-offs at run-time
     • Realizes high run-time adaptivity, i.e. a run-time system decides which reconfigurations shall be performed and when they shall be performed

     Recall
     • Some parts were already introduced as case-study in previous lectures
     • Instruction Format (up to 4 read and 2 write registers, immediate values, 10-bit virtual opcode)
     • Using the core ISA (cISA) to implement SIs when their reconfiguration is not completed yet (trap handler)
     • Special Instructions have access to main memory and to a fast on-chip scratch-pad memory
       • Using two independent 128-bit ports
     • Pipeline stalls when SI executes in hardware
     • Dynamic Prefetching (called 'Forecasting') using weighted error-back propagation

     RISPP HW Architecture Overview
     [Figure: block diagram with Core Pipeline, Data Cache, Memory Arbiter, System Bus, Off-Chip Memory, On-chip Memory, Load / Store Units, reconfigurable Containers with their Interconnect, ICAP and VGA. Connections are labeled 32 or 128 bits wide. Legend: added parts; Special Instruction Container (SIC).]

     Analysis of Special Instruction Execution
     • Partition the reconfigurable fabric into so-called SI Containers
       • aka 'Reconfigurable Functional Unit'
     • An SI may be loaded into any free Container
     • Corresponds to OneChip, Chimaera, Proteus, …
     • Problems:
       • Relatively long reconfiguration time
       • Limited Resource Sharing
       • Fragmentation (not the entire available space may be usable)
     [Figure: Core Pipeline (scaled down) with stages IF, ID, EXE, MEM, WB next to the reconfigurable area, which consists of several Containers joined by an Interconnect.]
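The SI Container scheme and the cISA fallback recalled above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `SIContainerFabric`, its methods, the SI names and all cycle counts are hypothetical and do not come from the slides. It shows that an SI runs via the trap handler (core ISA) until it occupies a container, and that loading fails once every container is taken.

```python
from dataclasses import dataclass, field


@dataclass
class SIContainerFabric:
    """Toy model of a fabric partitioned into SI Containers (hypothetical API)."""
    num_containers: int
    loaded: dict = field(default_factory=dict)  # container index -> SI name

    def load(self, si_name: str) -> bool:
        """Place an SI into any free container; return False if none is free."""
        if si_name in self.loaded.values():
            return True  # already reconfigured
        for idx in range(self.num_containers):
            if idx not in self.loaded:
                self.loaded[idx] = si_name
                return True
        return False  # every container is occupied

    def execute(self, si_name: str, hw_cycles: int, cisa_cycles: int) -> int:
        """Return the latency of one SI execution in cycles."""
        if si_name in self.loaded.values():
            return hw_cycles  # hardware execution in its container
        return cisa_cycles  # trap handler emulates the SI with the core ISA


# Cycle counts below are made-up illustration values.
fabric = SIContainerFabric(num_containers=2)
before = fabric.execute("SATD", hw_cycles=12, cisa_cycles=300)
fabric.load("SATD")
after = fabric.execute("SATD", hw_cycles=12, cisa_cycles=300)
print(before, after)
```

In this model every SI occupies a whole container on its own, which is the limited resource sharing listed among the problems.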

  3. Analysis of Special Instruction Execution (cont'd)
     [Figure: accumulated SI executions (in thousands) over execution time (million cycles, 0 to 2.0). Curves: no cISA exec.; with cISA exec.; with cISA exec. & smaller SIs; with cISA exec. & upgrades (RISPP's modular SIs). Annotation: all 31,977 SI executions completed. src: [BSH08a]]

     Fundamental Processor Extension: Atom / Molecule Model
     • Definition Atom:
       • A computational data path
       • Smallest block that can be reconfigured ('atomic' in that sense)
       • Example: Transform Atom
     [Figure: data path of the Transform Atom with control inputs DCT and HT, inputs X00, X10, X20, X30 and outputs Y00, Y10, Y20, Y30, built from adders and shifts (>> 1, << 1).]

     Fundamental Processor Extension: Atom / Molecule Model
     • Definition Special Instruction:
       • An assembly instruction
       • Dataflow graph of Atoms
       • Example: Sum of Absolute Transformed Differences (SATD)
     [Figure: SATD dataflow graph from INPUT to OUTPUT using the Atoms QSub, Repack, Transform (with DCT=0, HT=0 and DCT=0, HT=1) and SAV (Sum of Absolute Values).]

     Fundamental Processor Extension: Atom / Molecule Model
     • Definition Molecule:
       • Implementation of an SI
       • Using the available (i.e. at that time reconfigured) Atoms
       • Similar to HLS scheduling after allocating a certain number of Atoms
     [Figure: schedule of the SATD dataflow graph with Repack (2 instances), Transform (2 instances) and SAV (2 instances); time steps 10 to 17 are shown.]
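To make the SATD example more concrete, here is a minimal software sketch of a 4x4 Sum of Absolute Transformed Differences using an unnormalised Hadamard transform. The function names are hypothetical, and the comments mapping each step to the QSub, Repack, Transform and SAV Atoms are an interpretation of the slide, not the actual RISPP data paths.

```python
def hadamard4(v):
    """Unnormalised 4-point Hadamard transform of four integers."""
    a, b, c, d = v
    s0, s1 = a + b, c + d
    d0, d1 = a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def satd4x4(cur, ref):
    """SATD of two 4x4 blocks (lists of four rows with four ints each)."""
    # Pixel-wise difference (assumed role of the QSub Atom).
    diff = [[c - r for c, r in zip(crow, rrow)] for crow, rrow in zip(cur, ref)]
    # Horizontal transform pass (assumed role of the Transform Atom with HT=1).
    rows = [hadamard4(row) for row in diff]
    # Rearranging the data between passes (assumed role of Repack), then vertical pass.
    cols = [hadamard4(col) for col in transpose(rows)]
    # Sum of absolute values (assumed role of the SAV Atom).
    return sum(abs(x) for col in cols for x in col)


zeros = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
print(satd4x4(ones, zeros))
```

Real encoders usually scale the result; that normalisation is left out here on purpose.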

  4. Fundamental Processor Extension: Atom / Molecule Model
     [Figure: three-level hierarchy. SPECIAL INSTRUCTIONS (SIs): SI A, SI B, SI C. MOLECULES (an SI can be implemented by any of its Molecules): cISA A, A1, A2, A3; cISA B, B1, B2; cISA C, C1, C2. ATOMS: Atom 1 to Atom 6; the numbers on the edges denote the number of Atom instances required for this Molecule.]
     • For each SI there are different implementations (Molecules)
     • Implementation of a particular SI can be gradually upgraded by loading more Atoms
     • Atoms can be shared among different Molecules and SIs
     • There is one Molecule that does not need any Atom (Software Implementation with core-ISA: cISA)

     Difference to SI Containers
     [Figure: Core Pipeline with SI Containers next to Core Pipeline with Atom Containers.]
     • Multiple SIs may share common Atoms
     • There is no predetermined maximum of supported SIs
     • But: it is not possible/easy to execute two SIs at the same time (as they are no longer independent)
       • Not necessarily a problem, see Molen (single controller unit) and OneChip (memory coherency problems)
     • SIs can be upgraded (step-by-step by loading more Atoms)

     Adaptivity Through Dynamic Performance vs. Area Trade-off
     [Figure: "SI Molecules: Performance vs. Reconfigurable Resources". Execution Time [Cycles] and Area requirements [# loaded Atoms] plotted over Hardware Resources [Atom Containers] (0 to 13) for IPred VDC 16x16 (I-MB), IPred HDC 16x16 (I-MB) and MC Hz 4 (P-MB).]

     Summary Modular SIs
     • Concept improves the efficiency and flexibility
       • Atom sharing
       • Reduced fragmentation
       • Reduced reconfiguration overhead (due to SI upgrading)
     • Decision how many Atom Containers shall be spent for which SI can be adapted at run time
     • However, this adaptivity demands a run-time system that determines the decision and that implies overhead (to execute it)
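The gradual upgrade of an SI can be sketched as follows. The Molecule names follow the figure (cISA, A1, A2, A3), but the Atom requirements, latencies and helper functions are hypothetical illustration values, not figures from the lecture. A Molecule counts as available when every Atom type it needs is loaded in at least the required number of instances; the cISA Molecule needs no Atom and is therefore always available.

```python
# Hypothetical Molecules of one SI: (name, required Atom instances, latency in cycles).
MOLECULES = [
    ("cISA", {}, 300),  # software implementation, needs no Atom
    ("A1", {"Transform": 1, "SAV": 1}, 40),
    ("A2", {"Transform": 2, "SAV": 1}, 25),
    ("A3", {"Transform": 2, "SAV": 2}, 18),
]


def is_available(required, loaded):
    """A Molecule is available if all its Atom instances are currently loaded."""
    return all(loaded.get(atom, 0) >= count for atom, count in required.items())


def fastest_molecule(molecules, loaded):
    """Pick the lowest-latency Molecule among the available ones."""
    candidates = [m for m in molecules if is_available(m[1], loaded)]
    return min(candidates, key=lambda m: m[2])


# Load Atoms one by one and record which Molecule executes the SI after each step.
loaded = {}
trace = []
for atom in ["Transform", "SAV", "Transform", "SAV"]:
    loaded[atom] = loaded.get(atom, 0) + 1
    trace.append(fastest_molecule(MOLECULES, loaded)[0])
print(trace)
```

Because Atoms are counted per type rather than per SI, a second SI that also needs Transform Atoms could reuse the loaded instances, which is the Atom sharing named in the summary.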

  5. Run-time System: Simplified Overview
     [Figure: the Core Pipeline, Instruction Memory and Reconfigurable HW exchange Status / Control information with the Run-time System, which consists of Decode, Execution Control, Monitoring, Prediction, Selection, Reconf. Sequence and Replacing. Below it, an execution timeline of the kernels ME, EE, LF, ME, EE, shown again with a Prefetching Point (P) in front of each kernel. Legend: P: Prefetching Point; EE: Encoding Engine; ME: Motion Estimation; LF: Loop Filter.]

     Run-time System: Simplified Overview (cont'd)
     • Decode: detects SIs and Forecasts (for prefetching) and sends them to the Execution Control (only SIs) and Monitoring (SIs and Forecasts)
     • Execution Control: executes SIs by determining their fastest currently available Molecule (state is maintained in a look-up table) and triggers the hardware execution (using the Atoms) or the software emulation (using the trap handler)
     • Monitoring: Counts the executions for each SI
     • Prediction: Fine-tunes the Forecasts (recall: dynamic prefetching; see below) and resets the monitoring values

     Run-time System: Simplified Overview (cont'd)
     • Selection: Select Molecules to implement the forecasted SIs
     • Reconfiguration Sequence Scheduling: Determine the reconfiguration sequence of the Atoms that are required to implement the selected Molecules
     • Replacing: Determines which currently configured Atom shall be replaced by a new Atom that is scheduled to be reconfigured

     Formal Atom/Molecule Model
     • Representing the Molecules as a vector of Atoms
       • The example only shows 2 Atom Types (A0 and A1), thus each vector has 2 entries; in general: ℕ^n
     • Basic operators
       • How many Atoms are needed for a Molecule
       • Which Atoms have two Molecules in common
       • Which Atoms are needed to fulfill the demands of two Molecules
     [Figure: plane spanned by # Instances of Atom A0 (horizontal) and # Instances of Atom A1 (vertical). Molecules o = (1, 4) with 5 Atoms and p = (5, 2) with 7 Atoms; their common Atoms x = (1, 2) with 3 Atoms; the Atoms needed for both y = (5, 4) with 9 Atoms.]
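The three basic operators of the formal model can be written down in a few lines. The sketch below assumes that "Atoms in common" is the element-wise minimum, "Atoms needed for both" is the element-wise maximum, and "how many Atoms" is the sum of the entries. This reading is an assumption, but it reproduces the values that can be read from the figure for o = (1, 4) and p = (5, 2). The function names are hypothetical.

```python
def atom_count(m):
    """How many Atoms are needed for Molecule m (sum over all Atom types)."""
    return sum(m)


def common(m, n):
    """Atoms that two Molecules have in common (assumed: element-wise minimum)."""
    return tuple(min(a, b) for a, b in zip(m, n))


def needed_for_both(m, n):
    """Atoms needed to fulfill the demands of two Molecules (assumed: element-wise maximum)."""
    return tuple(max(a, b) for a, b in zip(m, n))


# Vectors hold (# instances of A0, # instances of A1), as in the figure.
o = (1, 4)
p = (5, 2)
x = common(o, p)
y = needed_for_both(o, p)
print(x, atom_count(x), y, atom_count(y))
```

The same functions work unchanged for n Atom types, since they only iterate over the vector entries.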
