Institut für Technische Informatik
Chair for Embedded Systems - Prof. Dr. J. Henkel
Lecture, Summer Semester (SS) 2012
Reconfigurable and Adaptive Systems (RAS)
Lars Bauer, Jörg Henkel

Organisation
• Lecture time: Mon., 14.00 - 15.30, Bld. 50.34, HS -101
• Homepage: http://ces.itec.kit.edu/ → Teaching → Slides
  ◦ Login: “student”, Passwd: “CES-Student”
• Contact: lars.bauer@kit.edu
  ◦ Haid-und-Neu-Str. 7, Bld. 07.21, Rm. 316.2 (2nd floor!)

L. Bauer, CES, KIT, 2012
CES @ Technologiefabrik (TFI)
[Map: location of the TFI relative to the Info-Bau and the Mensa]

RAS Examination
• CS Diploma:
  ◦ Specialization subject (Vertiefungsfach) 8: Design of Embedded Systems and Computer Architectures
• CS Master:
  ◦ Module: Reconfigurable and Adaptive Systems [IN4INRAS] (3 ECTS)
  ◦ Module: Embedded Systems: Advanced Topics [IN4INESWT] (8 ECTS)
  ◦ Module: Advanced Computer Architecture [IN4INACA] (10 ECTS)
• Other study programs (e.g., EE): please ask individually
Teaching @ CES, SS 2012
• Lectures
  ◦ RAS
  ◦ Low Power Design
• Labs
  ◦ Design of Embedded Systems
  ◦ Design of Embedded Application-Specific Processors
  ◦ Software Development
• Seminars
  ◦ Dependability in Embedded Systems
  ◦ Dependable Embedded Software
  ◦ Stereo Video Processing
  ◦ Embedded Multimedia
  ◦ Wireless Sensor Networks
  ◦ Processor Modeling at Transaction-Level
  ◦ Low Power Design for Embedded Systems
  ◦ Reconfigurable Embedded Systems
  ◦ Design Tools for Embedded Processors
  ◦ Distributed Decision Making
  ◦ Organic Computing
  ◦ Multicore for Multimedia Processors
More info: ces.itec.kit.edu/teaching

Theses @ CES
• Note: the information on the homepage is often not up to date
  ◦ If you are interested in a particular topic, it is better to ask individually
• Student research, diploma, bachelor's, and master's theses (SA/DA/BA/MA) or student assistant (Hiwi) jobs in the scope of reconfigurable systems are nearly always available
• Main projects:
  ◦ i-Core (invasive Core)
  ◦ OTERA (Online Test Strategies for Reliable Reconfigurable Architectures)
• Topics:
  ◦ Hardware prototype
  ◦ Simulation environment / algorithms for the runtime system
• Examples: fault emulation, adaptive redundancy schemes, online monitoring, bitstream manipulation, multicore integration, compiler tools, …
Beneficial Previous Knowledge
• Computer Structures (Rechnerstrukturen)
  ◦ Prerequisite
• Embedded Systems
  ◦ ES1: Optimization and Synthesis of Embedded Systems
  ◦ ES2: Design and Architectures for Embedded Systems
  ◦ The core topics (e.g., details about FPGA architectures) will be recapitulated in this lecture
  ◦ Thus, the contents of ES1 and ES2 are beneficial but not required in full detail

General Literature
• “Fine- and Coarse-Grain Reconfigurable Computing”, S. Vassiliadis and D. Soudris, Springer, 2007.
• “Runtime adaptive extensible embedded processors – a survey”, H. P. Huynh and T. Mitra, SAMOS, pp. 215–225, 2009.
• “Reconfigurable computing: architectures and design methods”, T. J. Todman et al., IEE Proceedings Computers & Digital Techniques, vol. 152, no. 2, pp. 193–207, 2005.
• “Reconfigurable Instruction Set Processors from a Hardware/Software Perspective”, F. Barat et al., IEEE Transactions on Software Engineering, vol. 28, no. 9, pp. 847–862, 2002.
1. Introduction and Motivation: The Demand for Adaptivity

Designing Embedded Systems
• Typical approach:
  ◦ Static analysis of the system requirements (e.g., computational hot spots)
  ◦ Build an optimized system
• Today's requirements:
  ◦ Increasing complexity
  ◦ More functionality
• Problem:
  ◦ A statically chosen design point has to match all requirements
  ◦ Such a design point is typically inefficient for individual components (e.g., tasks or hot spots)
Definition ‘Computational Hot Spot’
• A relatively small part of the application that accounts for a relatively large part of the execution time
  ◦ Also called ‘computational kernel’
  ◦ Typically: an inner loop
  ◦ 80/20 rule (90/10 rule, etc.)
[Figure: code size vs. execution time; the hot spot makes up 20% of the code size but 80% of the execution time]

Typical Implementation Alternatives
[Figure: efficiency (MIPS/$, MHz/mW, MIPS/area, …) plotted over flexibility (1/time-to-market, …); src: Henkel, ESII]
• ASIC (“hardware solution”): highest efficiency, lowest flexibility
  ◦ Non-programmable
  ◦ Highly specialized
• ASIP (application-specific instruction set processor): in between
  ◦ Instruction set extension
  ◦ Parameterization
  ◦ Inclusion/exclusion of functional blocks
• GPP (general-purpose processor, “software solution”): highest flexibility, lowest efficiency
Example Application: H.324 Video Conferencing
• Video en-/decoding
• Audio en-/decoding
• Data (de-)multiplexing
• Control protocol
[Figure: H.324 block diagram with audio input and output (G.723 audio encoder/decoder), video input (CVBS/CVHS, digital video input) and video output to the display (H.263/H.264 video encoder/decoder), H.245 control, H.223 multiplexer/demultiplexer, modem interface to the PSTN, and IR remote control; src: cityrockz.com]

Hotspots in H.324 Video Conferencing
[Figure: bar chart of processing time [%] (0–12%) per processing function: I_ME, S_ME, PMV, TQ_PL, TQ_IL, TQ_C, LF, MC_L, MC_C, IP_L16, MD_I4, CABAC, CAVLC, Dec_MB, get_pos, IDQ_PL, CABAC_d, CAVLC_d, FM, Q, UP, Enc, Qt, Pred_0, Reconst, ED, BC, TC, BA, CS, NF, LPF, HPF, EE, DRF, Dt, FGA, H245_C, H223_M, H223_DM, V34Mod, USB, MAC]
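A per-function profile like the one in the chart above is what the hot-spot definition is applied to. The Python sketch below ranks functions by execution time and takes the smallest prefix whose combined time reaches a chosen share (80% by default) of the total, then reports how much of the code that prefix represents. The function name `find_hot_spots` and all profile values are hypothetical, made up for illustration and not read from the chart.

```python
"""Minimal hot-spot finder (illustrative sketch, hypothetical data)."""


def find_hot_spots(profile, time_share=0.8):
    """Return (hot functions, their code-size share, their execution-time share).

    profile maps a function name to (code_size, exec_time). Functions are
    taken in order of decreasing execution time until their combined time
    reaches `time_share` of the total.
    """
    total_size = sum(size for size, _ in profile.values())
    total_time = sum(exec_time for _, exec_time in profile.values())
    ranked = sorted(profile.items(), key=lambda item: item[1][1], reverse=True)

    hot, hot_size, hot_time = [], 0, 0.0
    for name, (size, exec_time) in ranked:
        if hot_time >= time_share * total_time:
            break  # the selected functions already cover the requested share
        hot.append(name)
        hot_size += size
        hot_time += exec_time
    return hot, hot_size / total_size, hot_time / total_time


# Hypothetical profile: function -> (code size in bytes, execution time in ms)
PROFILE = {
    "A": (200, 50.0),
    "B": (150, 25.0),
    "C": (100, 10.0),
    "D": (1500, 8.0),
    "E": (2000, 4.0),
    "F": (1050, 3.0),
}

if __name__ == "__main__":
    hot, size_share, time_share = find_hot_spots(PROFILE)
    print(hot, f"{size_share:.0%} of code, {time_share:.0%} of time")
    # prints: ['A', 'B', 'C'] 9% of code, 85% of time
```

With this made-up profile, 9% of the code accounts for 85% of the execution time, which is the kind of skew the 80/20 rule describes. Because functions are added greedily in order of decreasing time, the selected set always covers at least the requested share.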
ASIP Implementation
• Design accelerators for the hot spots
• Connect them as execution units, register files, and interfaces
[Figure; src: Tensilica, Inc.: “Xtensa LC Product Brief”]

ASIP Implementation (cont’d)
• Provides noticeably improved performance after targeting the major hot spots (I_ME, TQ_PL, MC_L)
• However, the performance is still not sufficient to meet the real-time requirements
  ◦ More hot spots need to be accelerated
[Figure: ASIP with accelerators for I_ME, TQ_PL, and MC_L; src: Tensilica, Inc.: “Xtensa LC Product Brief”]
ASIP Implementation (cont’d)
• Scalability problem when relatively many hot spots exist
  ◦ Note: still not all relevant hot spots are covered
[Figure: ASIP with accelerators for CAVLC, CABAC, I_ME, TQ_PL, Dec_MB, MC_L, H245_C, MAC, FM, V34mod, and S_ME; src: Tensilica, Inc.: “Xtensa LC Product Brief”]

ASIP Implementation (cont’d)
• ASIPs perform well when
  1. relatively few hot spots need to be accelerated, and
  2. those hot spots are well known in advance
• ASIPs are less efficient when targeting relatively many hot spots
  ◦ All accelerators are provided statically (i.e., they require area and consume power), even though typically only a few of them are needed at any given time
• ASIPs are less efficient when targeting unknown hot spots
  ◦ Their performance degrades to that of a GPP
  ◦ Note that even for a given application it is not necessarily clear which parts of it are ‘hot’ during execution, as this may depend on the input data (as demonstrated in the following)
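The area argument against statically providing many accelerators can be made concrete with a little arithmetic. The sketch below compares the area of a static ASIP, which holds every accelerator at all times, with the largest area any single application phase actually uses, and with the average utilization of the static area. All accelerator names, areas, and phases are hypothetical; equal phase durations are assumed, and the peak demand is only an idealized lower bound that ignores any cost of changing accelerators at runtime.

```python
"""Static ASIP area vs. per-phase accelerator demand (hypothetical numbers)."""

# Hypothetical accelerator areas in arbitrary area units.
ACCEL_AREA = {"ME": 40, "MC": 25, "TQ": 20, "IPRED": 15, "CAVLC": 30, "LF": 10}

# Hypothetical application phases, each listing the accelerators it needs.
PHASES = [
    {"ME", "MC", "TQ"},  # e.g. encoding P-MBs
    {"IPRED", "TQ"},     # e.g. encoding I-MBs
    {"CAVLC"},           # entropy coding
    {"LF"},              # loop filter
]


def static_area(areas):
    """A static ASIP provides every accelerator all the time."""
    return sum(areas.values())


def phase_area(areas, phase):
    """Area of the accelerators that a single phase actually uses."""
    return sum(areas[name] for name in phase)


def peak_demand(areas, phases):
    """Largest area needed at any one time (ignoring any switching cost)."""
    return max(phase_area(areas, phase) for phase in phases)


def average_utilization(areas, phases):
    """Mean fraction of the static area in use, assuming equal phase durations."""
    total = static_area(areas)
    return sum(phase_area(areas, phase) / total for phase in phases) / len(phases)


if __name__ == "__main__":
    print("static area:", static_area(ACCEL_AREA))
    print("peak demand:", peak_demand(ACCEL_AREA, PHASES))
    print(f"average utilization: {average_utilization(ACCEL_AREA, PHASES):.0%}")
```

With these numbers the static design provides 140 area units, no phase needs more than 85, and on average only about 29% of the accelerator area is in use.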
Example Application: H.264 Video Encoder
[Figure: MB encoding loop of the H.264 encoder]
    Loop over MBs (MB encoding loop):
        MB-type decision (I or P)
        Mode decision (for I or P)
        if MB_Type = P_MB then
            ME: SA(T)D, RD; MC; DCT / Q; IDCT / IQ
        else
            IPRED; DCT / HT / Q; IDCT / IHT / IQ
    Loop over MBs: CAVLC encoding engine
    Loop over MBs: in-loop deblocking filter
• Iterates over macroblocks (MBs, i.e., 16x16 pixels)
• 2 different MB types → different computational paths with different computational requirements
  ◦ I-MB (spatial prediction)
  ◦ P-MB (temporal prediction)

Example: Football Video
[Figure: frame of the Football sequence with I-MBs and P-MBs marked]
• Note: 16x16 MBs can be partitioned into sub-MBs down to 4x4
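The input dependence of the MB encoding loop can be illustrated by counting which kernels each MB type triggers. The Python sketch below models the kernels by name only (no video processing takes place) and treats the MB-type decisions as given input; the function name `kernel_calls` and the frame compositions in the usage are hypothetical. A QCIF frame (176 × 144 pixels) contains 11 × 9 = 99 MBs.

```python
"""Kernel usage of the H.264 MB encoding loop, modeled by name only."""
from collections import Counter

# Kernel sequences of the two computational paths in the MB encoding loop.
P_MB_PATH = ["ME: SA(T)D", "MC", "DCT / Q", "IDCT / IQ"]
I_MB_PATH = ["IPRED", "DCT / HT / Q", "IDCT / IHT / IQ"]


def kernel_calls(mb_types):
    """Count kernel invocations for one frame.

    mb_types holds one entry per MB, 'I' or 'P', i.e. the result of the
    MB-type decision, which is treated as given here.
    """
    calls = Counter()
    for mb_type in mb_types:  # loop over MBs: MB encoding loop
        calls.update(P_MB_PATH if mb_type == "P" else I_MB_PATH)
    calls["CAVLC"] += len(mb_types)  # loop over MBs: entropy coding
    calls["Deblocking filter"] += len(mb_types)  # loop over MBs: in-loop filter
    return calls


if __name__ == "__main__":
    QCIF_MBS = 11 * 9  # (176 x 144 pixels) / (16 x 16 pixels per MB)
    slow_scene = ["P"] * 89 + ["I"] * 10  # hypothetical: few I-MBs
    fast_scene = ["P"] * 29 + ["I"] * 70  # hypothetical: many I-MBs
    for name, frame in [("slow", slow_scene), ("fast", fast_scene)]:
        assert len(frame) == QCIF_MBS
        calls = kernel_calls(frame)
        print(name, "ME:", calls["ME: SA(T)D"], "IPRED:", calls["IPRED"])
```

Between these two hypothetical frames, motion estimation runs about three times less often while intra prediction runs seven times more often, so which kernels are ‘hot’ shifts with the video content.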
Example: Distribution of I-MBs in Medium-to-Very-High Motion Scenes
[Figure: percentage of INTRA MBs per frame (0–100%) over frame number (1–301) for the Rafting, Rugby, and Football sequences; annotated phases: scene with very high motion, scene with medium-to-slow motion, scene with high-to-medium motion]

Example: Changing Energy Consumption at Frame and MB Level
[Figure, top: energy consumption per frame [µWs] (0–35) over frame number (0–140) for Carphone_QCIF, Clair_QCIF, and SusieTable_QCIF]
[Figure, bottom: per-MB energy consumption maps (11 × 9 MBs) for frames #99 and #100; color scale 0.0–0.5 µWs in steps of 0.1 µWs]