Institut für Technische Informatik
Chair for Embedded Systems - Prof. Dr. J. Henkel
Lecture, summer semester 2013
Reconfigurable and Adaptive Systems (RAS)
7. Adaptive Reconfigurable Processors
Lars Bauer, Jörg Henkel
L. Bauer, CES, KIT, 2013

RAS Topic Overview
1. Introduction
2. Overview
3. Special Instructions
4. Fine-Grained Reconfigurable Processors
5. Configuration Prefetching
6. Coarse-Grained Reconfigurable Processors
7. Adaptive Reconfigurable Processors
  • RISPP
  • WARP
  • Dynamic Instruction Merging (DIM)
  • Further relevant architectures / domains
8. Fault-tolerance by Reconfiguration

Overview RISPP
• Developed at CES, KIT
• Tightly-coupled fine-grained reconfigurable fabric
• Introduces and implements modular SIs
  ◦ Provide different performance/area trade-offs at run-time
• Realizes high run-time adaptivity, i.e. a run-time system decides which reconfigurations shall be performed and when they shall be performed

Recall
• Some parts were already introduced as case-study in previous lectures
• Instruction Format (up to 4 read and 2 write registers, immediate values, 10-bit virtual opcode)
• Using the core ISA (cISA) to implement SIs when their reconfiguration is not completed yet (trap handler)
• Special Instructions have access to main memory and to a fast on-chip scratch-pad memory
  ◦ Using two independent 128-bit ports
  ◦ Pipeline stalls when SI executes in hardware
• Dynamic Prefetching (called 'Forecasting') using weighted error-back propagation

Analysis of Special Instruction Execution
• Partition the reconfigurable fabric into so-called SI Containers
  ◦ aka 'Reconfigurable Functional Unit'
• An SI may be loaded into any free Container
• Problems:
  ◦ Relatively long reconfiguration time
  ◦ Limited Resource Sharing
  ◦ Fragmentation (not the entire available space may be usable)
[Figure: scaled-down core pipeline (IF, ID, EXE, MEM, WB) next to a reconfigurable area that is partitioned into Special Instruction Containers (SICs), each attached through an interconnect. Corresponds to OneChip, Chimaera, Proteus, …]

RISPP HW Architecture Overview
[Figure: block diagram with on-chip memory system, memory arbiter, data cache, off-chip memory, core pipeline with Load/Store Units, reconfigurable Containers with their interconnects, ICAP and VGA, connected by 32-bit and 128-bit buses. The legend distinguishes the core pipeline from the added parts.]

Analysis of Special Instruction Execution (cont'd)
[Figure: accumulated SI executions (in thousands, 0 to 35) over execution time (0 to 2.0 million cycles), ending when all 31,977 SI executions are completed. Curves: No cISA exec.; With cISA exec.; With cISA exec. & smaller SIs; With cISA exec. & upgrades (RISPP's modular SIs). src: [BSH08a]]

Fundamental Processor Extension: Atom / Molecule Model
• Definition Atom:
  ◦ A computational data path
  ◦ Smallest block that can be reconfigured ('atomic' in that sense)
• Example: Transform Atom
[Figure: data path of the Transform Atom with inputs X00, X10, X20, X30 and outputs Y00, Y10, Y20, Y30, built from adders, subtractors and shifts (<< 1, >> 1) and controlled by the DCT and HT signals.]

Fundamental Processor Extension: Atom / Molecule Model
• Definition Special Instruction:
  ◦ An assembly instruction
  ◦ Dataflow graph of Atoms
• Example: Sum of Absolute Transformed Differences (SATD)
[Figure: SATD dataflow graph from INPUT to OUTPUT, built from the Atom types QSub, Repack, Transform (configured as DCT=0 HT=0 or DCT=0 HT=1) and SAV (Sum of Absolute Values).]

Fundamental Processor Extension: Atom / Molecule Model
• Definition Molecule:
  ◦ Implementation of an SI
  ◦ Using the available (i.e. at that time reconfigured) Atoms
  ◦ Similar to HLS scheduling after allocating a certain number of Atoms
[Figure: schedule of the SATD dataflow graph using Repack (2 instances), Transform (2 instances) and SAV (2 instances).]
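
The SATD example can also be read as plain software, which is roughly what a cISA implementation has to compute. The following is only a sketch under assumptions that are not taken from the slides: blocks are 4x4, the Transform Atom in HT mode is modelled as an unnormalized 4-point Hadamard butterfly, and Repack is modelled as a transpose between the row pass and the column pass. All function names are hypothetical.

```python
# Software sketch of the SATD dataflow: QSub -> Transform -> Repack -> Transform -> SAV.
# Assumptions (not from the slides): 4x4 blocks, unnormalized Hadamard transform,
# Repack modelled as a transpose.

def qsub(a, b):
    """Element-wise difference of two 4x4 blocks given as lists of rows."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def repack(block):
    """Transpose the block so the same 1-D transform can process the columns."""
    return [list(col) for col in zip(*block)]

def transform_ht(v):
    """1-D 4-point Hadamard butterfly using only additions and subtractions."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]

def sav(block):
    """Sum of absolute values of all entries."""
    return sum(abs(x) for row in block for x in row)

def satd_4x4(a, b):
    """SATD of two 4x4 blocks: transform the difference block, then accumulate."""
    diff = qsub(a, b)
    row_pass = [transform_ht(row) for row in diff]
    col_pass = [transform_ht(col) for col in repack(row_pass)]
    return sav(col_pass)

if __name__ == "__main__":
    ones = [[1] * 4 for _ in range(4)]
    zeros = [[0] * 4 for _ in range(4)]
    print(satd_4x4(ones, zeros))
```

Each helper corresponds to one Atom type in the dataflow graph, so a Molecule with more Atom instances would run several of these calls in parallel, while the software version runs them one after another.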

Fundamental Processor Extension: Atom / Molecule Model
[Figure: three-level hierarchy. SPECIAL INSTRUCTIONS (SIs): SI A, SI B, SI C. MOLECULES (an SI can be implemented by any of its Molecules): A1, A2, A3, A_cISA; B1, B2, B_cISA; C1, C2, C_cISA. ATOMS: Atom 1 to Atom 6; the numbers on the edges denote the number of Atom instances required for this Molecule.]
• For each SI there are different implementations (Molecules)
• Atoms can be shared among different Molecules and SIs
• The implementation of a particular SI can be gradually upgraded by loading more Atoms
• There is one Molecule that does not need any Atom (software implementation with core-ISA: cISA)

Difference to SI Containers
[Figure: core pipeline with SI Containers compared to core pipeline with Atom Containers.]
• Multiple SIs may share common Atoms
• There is no predetermined maximum of supported SIs
• But: it is not possible/easy to execute two SIs at the same time (as they are no longer independent)
  ◦ Not necessarily a problem, see Molen (single controller unit) and OneChip (memory coherency problems)
• SIs can be upgraded (step-by-step by loading more Atoms)

Adaptivity Through Dynamic Performance vs. Area Trade-off
[Figure: SI Molecules: Performance vs. Reconfigurable Resources. x-axis: Hardware Resources [Atom Containers], 0 to 13; axes: Execution Time [Cycles] and Area requirements [# loaded Atoms]. Series: IPred VDC 16x16 (I-MB), IPred HDC 16x16 (I-MB), MC Hz 4 (P-MB).]

Summary Modular SIs
• Concept improves the efficiency and flexibility
  ◦ Atom sharing
  ◦ Reduced fragmentation
  ◦ Reduced reconfiguration overhead (due to SI upgrading)
• The decision how many Atom Containers shall be spent for which SI can be adapted at run time
• However, this adaptivity demands a run-time system that determines the decision, and executing that run-time system implies overhead
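
The gradual upgrade of an SI can be illustrated with a small model: an SI owns several Molecules, each needing a certain number of Atom instances, and at any time the fastest Molecule whose Atoms are all loaded is used. This is a minimal sketch, not the RISPP implementation; the Molecule names M1 and M2, their Atom counts and all latencies are invented for illustration, and only the Atom type names are taken from the SATD example.

```python
from collections import Counter

# Hypothetical Molecules of one SI: (name, required Atom instances, latency in cycles).
# The cISA Molecule needs no Atoms, so it is always available.
SATD_MOLECULES = [
    ("cISA", Counter(), 300),
    ("M1", Counter(QSub=1, Repack=1, Transform=1, SAV=1), 24),
    ("M2", Counter(QSub=1, Repack=2, Transform=2, SAV=2), 14),
]

def is_available(required, loaded):
    """A Molecule is available if every required Atom instance is loaded."""
    return all(loaded[atom] >= count for atom, count in required.items())

def fastest_available(molecules, loaded):
    """Return the available Molecule with the lowest latency."""
    candidates = [m for m in molecules if is_available(m[1], loaded)]
    return min(candidates, key=lambda m: m[2])

if __name__ == "__main__":
    loaded = Counter()  # no Atom is loaded at start
    load_order = ["QSub", "Repack", "Transform", "SAV", "Repack", "Transform", "SAV"]
    print(0, fastest_available(SATD_MOLECULES, loaded)[0])
    for step, atom in enumerate(load_order, start=1):
        loaded[atom] += 1  # one reconfiguration has finished
        print(step, fastest_available(SATD_MOLECULES, loaded)[0])
```

With this load order the SI runs in software until the fourth Atom has arrived, then switches to M1, and reaches M2 after the seventh Atom. An SI Container, in contrast, would deliver no speed-up until the whole SI has been loaded.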

Run-time System: Simplified Overview
[Figure: block diagram. The Core Pipeline, the Instruction Memory and the Reconfigurable HW exchange instructions and status/control signals with the Run-time System, which consists of Decode, Execution Control, Monitoring, Prediction, Selection, Reconf. Sequence and Replacing. A timeline shows Prefetching Points ahead of the ME, EE and LF phases. P: Prefetching Point; EE: Encoding Engine; ME: Motion Estimation; LF: Loop Filter.]

Run-time System: Simplified Overview (cont'd)
• Decode: detects SIs and Forecasts (for prefetching) and sends them to the Execution Control (only SIs) and to the Monitoring (SIs and Forecasts)
• Execution Control: executes SIs by determining their fastest currently available Molecule (state is maintained in a look-up table) and triggers the hardware execution (using the Atoms) or the software emulation (using the trap handler)
• Monitoring: counts the executions for each SI
• Prediction: fine-tunes the Forecasts (recall: dynamic prefetching; see below) and resets the monitoring values

Run-time System: Simplified Overview (cont'd)
• Selection: selects Molecules to implement the forecasted SIs
• Reconfiguration Sequence Scheduling: determines the reconfiguration sequence of the Atoms that are required to implement the selected Molecules
• Replacing: determines which currently configured Atom shall be replaced by a new Atom that is scheduled to be reconfigured

Formal Atom/Molecule Model
• Representing the Molecules as a vector of Atoms
  ◦ The example only shows 2 Atom Types (A_0 and A_1), thus each vector has 2 entries; in general: ℕ^n
• Basic operators
  ◦ How many Atoms are needed for a Molecule
  ◦ Which Atoms two Molecules have in common
  ◦ Which Atoms are needed to fulfill the demands of two Molecules
[Figure: Molecules plotted as points in the plane. x-axis: # Instances of Atom A_0 (1 to 6); y-axis: # Instances of Atom A_1 (1 to 5). Labelled points include (1, 4), (5, 2), (5, 4) and (1, 2).]
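
The three basic operators of the formal model can be written down directly once a Molecule is a vector in ℕ^n. The sketch below assumes the usual element-wise reading, which the slides do not spell out: the Atom count is the sum of the entries, the common Atoms are the element-wise minimum, and the combined demand is the element-wise maximum. The function names are hypothetical, and the two example vectors are read from the garbled figure, so their roles are an assumption too.

```python
# A Molecule is a tuple (instances of A_0, instances of A_1, ...), i.e. a vector in N^n.

def atom_count(m):
    """How many Atoms are needed for a Molecule (assumed: sum of all entries)."""
    return sum(m)

def common_atoms(a, b):
    """Which Atoms two Molecules have in common (assumed: element-wise minimum)."""
    return tuple(min(x, y) for x, y in zip(a, b))

def combined_demand(a, b):
    """Which Atoms fulfill the demands of both Molecules (assumed: element-wise maximum)."""
    return tuple(max(x, y) for x, y in zip(a, b))

if __name__ == "__main__":
    o = (1, 4)  # assumed example Molecule: 1 x A_0, 4 x A_1
    p = (5, 2)  # assumed example Molecule: 5 x A_0, 2 x A_1
    print(atom_count(o), atom_count(p))
    print(common_atoms(o, p), atom_count(common_atoms(o, p)))
    print(combined_demand(o, p), atom_count(combined_demand(o, p)))
```

Under these assumptions the combined demand of both Molecules needs fewer Atoms than loading them independently, because the Atoms they have in common are counted only once. This is the Atom sharing that the modular SIs rely on.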