Institut für Technische Informatik
Chair for Embedded Systems - Prof. Dr. J. Henkel

Reconfigurable and Adaptive Systems (RAS)
Lecture in the summer semester (SS) 2014
Lars Bauer, Jörg Henkel

Organisation
• Lecture time: Wed., 15.45 - 17.15, Bld. 50.34, HS -102
• Homepage: http://ces.itec.kit.edu/teaching/ (you can also find the slides from previous years there)
• Slides login: Login: "student", Passwd: "CES-Student"
• Contact: lars.bauer@kit.edu, Haid-und-Neu-Str. 7, Bld. 07.21, Rm. 316.2 (2nd floor!)

L. Bauer, CES, KIT, 2014

CES @ Technologiefabrik (TFI)
[Figure: site map showing the Info-Bau, the TFI, and the Mensa (cafeteria)]

Questions during the lecture
• Simply let me know / interrupt me
Teaching @ CES, SS 2014
• Lectures:
  ◦ RAS
  ◦ Low Power Design
  ◦ Embedded Systems for Multimedia and Image Processing
• Seminars:
  ◦ Rekonfigurierbare und Eingebettete Systeme (Reconfigurable and Embedded Systems)
  ◦ Dependability in Embedded Systems
  ◦ Distributed Decision Making
  ◦ Stereo Video Processing
  ◦ Multicore for Multimedia Processors
  ◦ Sensor Networks
• Labs:
  ◦ Entwurf eingebetteter Systeme (Design of Embedded Systems)
  ◦ Entwurf von eingebetteten applikationsspezifischen Prozessoren (Design of Embedded Application-Specific Processors)
  ◦ Low Power Design and Embedded Systems

RAS Exam
• CS Diploma:
  ◦ Vertiefungsfach 8 (specialization subject 8): Entwurf eingebetteter Systeme und Rechnerarchitekturen (Design of Embedded Systems and Computer Architectures)
• CS Master:
  ◦ Module: Rekonfigurierbare und Adaptive Systeme (Reconfigurable and Adaptive Systems) [IN4INRAS] (3 ECTS)
  ◦ Module: Eingebettete Systeme: Weiterführende Themen (Embedded Systems: Advanced Topics) [IN4INESWTN] (10 ECTS)
  ◦ Module: Advanced Computer Architecture [IN4INACA] (10 ECTS)
• Other study courses (e.g. EE): ask individually
More info: ces.itec.kit.edu/teaching

Theses @ CES
• Note: the info on the homepage is typically not up to date
  ◦ If you are interested in a particular topic, better ask individually
• There are nearly always SA/DA/BA/MA theses (student research projects, diploma, bachelor's, and master's theses) or Hiwi (student assistant) jobs available in the scope of reconfigurable systems
• Main projects:
  ◦ i-Core: invasive Core
  ◦ OTERA: Online Test Strategies for Reliable Reconfigurable Architectures
  ◦ Compilers for reconfigurable architectures
• Topics:
  ◦ Algorithms for runtime system, operating system, …
  ◦ Toolchain, compiler, synthesis, …
  ◦ Architecture, hardware prototype, simulation environment, …

Beneficial Previous Knowledge
• Rechnerstrukturen (Computer Structures)
  ◦ Prerequisite
• Eingebettete Systeme (Embedded Systems)
  ◦ ES1: Optimierung und Synthese Eingebetteter Systeme (Optimization and Synthesis of Embedded Systems)
  ◦ ES2: Entwurf und Architekturen für Eingebettete Systeme (Design and Architectures for Embedded Systems)
  ◦ The core topics (e.g. details about FPGA architectures) will be recapitulated in the scope of this lecture
  ◦ Thus, the contents of ES1 and ES2 are beneficial but not required in full detail
General Literature
• "Fine- and Coarse-Grain Reconfigurable Computing", S. Vassiliadis and D. Soudris, Springer, 2007.
• "Runtime adaptive extensible embedded processors – a survey", H. P. Huynh and T. Mitra, SAMOS, pp. 215–225, 2009.
• "Reconfigurable computing: architectures and design methods", T. J. Todman et al., IEE Proceedings Computers & Digital Techniques, vol. 152, no. 2, pp. 193–207, 2005.
• "Reconfigurable Instruction Set Processors from a Hardware/Software Perspective", F. Barat et al., IEEE Transactions on Software Engineering, vol. 28, no. 9, pp. 847–862, 2002.

Reconfigurable and Adaptive Systems (RAS)
1. Introduction and Motivation: The Demand for Adaptivity

Designing Embedded Systems
• Typical approach:
  ◦ Static analysis of system requirements (e.g. computational hot spots)
  ◦ Build an optimized system
• Today's requirements:
  ◦ Increasing complexity
  ◦ More functionality
• Problem:
  ◦ A statically chosen design point has to match all requirements
  ◦ Typically inefficient for individual components (e.g. tasks or hot spots)

Definition 'Computational Hot Spot'
• A rather small part of the application that corresponds to a rather large part of the execution time
  ◦ Also called 'Computational Kernel'
  ◦ Typically: an inner loop
  ◦ 80/20 rule (90/10 rule etc.)
[Figure: two bars, "Code Size" and "Execution Time", each split 20/80: the hot spot is 20 % of the code size but 80 % of the execution time]
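The 80/20 rule can be made concrete with a profile. The following is a minimal sketch, assuming a hypothetical per-function profile (function names, code sizes, and times are invented for illustration, not measured): it sorts functions by execution time, takes the fewest functions that together reach a target share of the total time (here 80 %), and reports how much of the code size they occupy.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    name: str
    code_size: int     # bytes (hypothetical values)
    exec_time: float   # ms spent in this function (hypothetical values)


def find_hot_spots(profile, time_share=0.8):
    """Return the fewest functions whose summed execution time reaches
    `time_share` of the total, their time share, and their code-size share."""
    total_time = sum(f.exec_time for f in profile)
    total_size = sum(f.code_size for f in profile)
    hot, covered = [], 0.0
    # Greedy: taking the most time-consuming functions first yields the
    # smallest number of functions that reaches the target share.
    for f in sorted(profile, key=lambda f: f.exec_time, reverse=True):
        if covered >= time_share * total_time:
            break
        hot.append(f)
        covered += f.exec_time
    size_share = sum(f.code_size for f in hot) / total_size
    return [f.name for f in hot], covered / total_time, size_share


# Hypothetical profile: one inner-loop kernel dominates the run time.
profile = [
    FunctionProfile("motion_estimation", 2_000, 800.0),
    FunctionProfile("parse_config", 3_000, 20.0),
    FunctionProfile("user_interface", 3_000, 100.0),
    FunctionProfile("file_io", 2_000, 80.0),
]

names, time_share, size_share = find_hot_spots(profile)
print(names, f"{time_share:.0%} of time", f"{size_share:.0%} of code")
# prints: ['motion_estimation'] 80% of time 20% of code
```

In a real design flow the profile would come from a profiler or simulator run on representative input data; the slides that follow show why a single such profile may not be representative.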
Example Application: H.324 Video Conferencing
• Video en-/decoding
• Audio en-/decoding
• Data (de-)multiplexing
• Control protocol
[Figure: block diagram of an H.324 terminal: audio input (microphone) and video input (CVBS, CVHS, digital video input), G.723 audio encoder/decoder, H.263 / H.264 video encoder/decoder, H.245 control, H.223 multiplexer/demultiplexer, modem / PSTN interface to the phone line, audio output (speakers), video output (display screen), IR remote control; src: cityrockz.com]

Typical Implementation Alternatives
[Figure: efficiency (MIPS/$, MHz/mW, MIPS/area, …) versus flexibility (1/time-to-market, …); src: Henkel, ESII]
• "Hardware solution": ASIC
  ◦ Non-programmable, highly specialized
• ASIP: application-specific instruction set processor
  ◦ Instruction set extension
  ◦ Parameterization
  ◦ Inclusion/exclusion of functional blocks
• "Software solution": GPP, general-purpose processor

Hot Spots in H.324 Video Conferencing
[Figure: bar chart of processing time [%] (0–12 %) per processing function, ranging from I_ME, S_ME, PMV, TQ_PL, … to H245_C, H223_M, H223_DM, V34Mod, USB, MAC]

ASIP Implementation
• Design accelerators for the hot spots
• Connect them as execution units, register files, and interfaces
src: Tensilica, Inc.: "Xtensa LC Product Brief"
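How much an ASIP gains from accelerating only some hot spots can be estimated with Amdahl's law (a standard model, not taken from the slides): if hot spot i takes the fraction f_i of the original execution time and its accelerator speeds it up by s_i, the overall speedup is S = 1 / ((1 - Σ f_i) + Σ f_i / s_i). The sketch below assumes hypothetical fractions and a uniform 10x accelerator speedup; it shows that the part left in software caps the benefit, which is why more hot spots need to be accelerated.

```python
def overall_speedup(hot_spots):
    """Amdahl's law for several independently accelerated hot spots.

    hot_spots: list of (fraction_of_time, accelerator_speedup) pairs, where the
    fractions refer to the original, software-only execution time.
    """
    accelerated = sum(f for f, _ in hot_spots)
    if not 0.0 <= accelerated <= 1.0:
        raise ValueError("time fractions must sum to a value in [0, 1]")
    new_time = (1.0 - accelerated) + sum(f / s for f, s in hot_spots)
    return 1.0 / new_time


# Hypothetical numbers: the major hot spots cover 50 % of the run time.
major = [(0.25, 10.0), (0.15, 10.0), (0.10, 10.0)]
# Accelerating two further (hypothetical) hot spots covers 80 % of the run time.
more = major + [(0.20, 10.0), (0.10, 10.0)]

print(f"major hot spots only: {overall_speedup(major):.2f}x")  # 1.82x (at most 2x, since 50 % stays in software)
print(f"more hot spots:       {overall_speedup(more):.2f}x")   # 3.57x
```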
ASIP Implementation (cont'd)
[Figure: processor with accelerators for I_ME, TQ_PL, and MC_L]
• Provides noticeably improved performance after targeting the major hot spots
• However, performance is still not sufficient to achieve real-time requirements
  ◦ More hot spots need to be accelerated
src: Tensilica, Inc.: "Xtensa LC Product Brief"

ASIP Implementation (cont'd)
[Figure: processor with accelerators for CAVLC, CABAC, I_ME, S_ME, TQ_PL, MC_L, Dec_MB, H245_C, MAC, FM, and V34 mod]
• Scalability problem when rather many hot spots exist
  ◦ Note: still not all relevant hot spots are covered
src: Tensilica, Inc.: "Xtensa LC Product Brief"

Example Application: H.264 Video Encoder
[Figure: flow chart of the MB encoding loop. Loop over MBs: MB-type decision (I or P) and mode decision (for I or P); if MB_Type = P_MB then ME: SA(T)D and MC, else IPRED; then DCT / IDCT / HT / IHT / Q / IQ; encoding engine with CAVLC and RD; loop over MBs: in-loop deblocking filter]
• Iterates on MacroBlocks (MBs, i.e. 16x16 pixels)
• 2 different MB types → different computational paths with different computational requirements
  ◦ I-MB (spatial prediction)
  ◦ P-MB (temporal prediction)

Summary of ASIP Implementation
• ASIPs perform well when
  1. rather few hot spots need to be accelerated and
  2. those hot spots are well known in advance
• ASIPs are less efficient when targeting rather many hot spots
  ◦ All accelerators are provided statically (i.e. they require area and consume power) even though typically just a few of them are needed at a certain time
• ASIPs are less efficient when targeting unknown hot spots
  ◦ Even for a given application it is not necessarily clear which parts of it are 'hot' during execution, as this may depend on input data (as demonstrated in the following)
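The two computational paths of the H.264 encoding loop can be sketched as a per-MB dispatch. This is a minimal illustration, assuming hypothetical per-MB cycle costs (placeholders, not measurements) and taking the MB type as given instead of running a real mode decision; the kernel names follow the flow chart. It shows that which kernel is the hottest depends on the mix of I-MBs and P-MBs, i.e. on the input data.

```python
from collections import Counter

# Hypothetical per-MB cycle costs of each kernel (placeholders, not measurements).
KERNEL_COST = {
    "ME": 9000,             # motion estimation (SA(T)D search), P-MBs only
    "MC": 1500,             # motion compensation, P-MBs only
    "IPRED": 2500,          # intra prediction, I-MBs only
    "DCT/Q/IQ/IDCT": 1200,  # transform and quantization, all MBs
    "CAVLC": 800,           # entropy coding, all MBs
    "LF": 700,              # in-loop deblocking filter, all MBs
}


def kernels_for_mb(mb_type):
    """Computational path of one macroblock, following the flow chart."""
    if mb_type == "P":
        path = ["ME", "MC"]   # temporal prediction
    elif mb_type == "I":
        path = ["IPRED"]      # spatial prediction
    else:
        raise ValueError(f"unknown MB type {mb_type!r}")
    return path + ["DCT/Q/IQ/IDCT", "CAVLC", "LF"]


def profile_frame(mb_types):
    """Accumulate cycles per kernel over all MBs of one frame."""
    cycles = Counter()
    for mb_type in mb_types:
        for kernel in kernels_for_mb(mb_type):
            cycles[kernel] += KERNEL_COST[kernel]
    return cycles


# Hypothetical frames with 10 MBs each: mostly P-MBs vs. mostly I-MBs.
mostly_p = ["P"] * 9 + ["I"]
mostly_i = ["I"] * 9 + ["P"]
for name, frame in [("mostly_p", mostly_p), ("mostly_i", mostly_i)]:
    hottest, _ = profile_frame(frame).most_common(1)[0]
    print(name, hottest)
# prints: mostly_p ME
#         mostly_i IPRED
```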
Example: Distribution of I-MBs in Medium-to-Very-High Motions
[Figure: share of intra-coded MBs per frame ("INTRA MB in a Frame [%]", 0–100 %) over frame number (1–301) for the sequences Rafting, Rugby, and Football; annotated regions: scene with very high motion, scene with medium-to-slow motion, scene with high-to-medium motion]

Example: Football Video
[Figure: frame of the Football sequence with I-MBs and P-MBs marked]
• Note: 16x16 MBs can be partitioned into sub-MBs, e.g. 16x8, 8x8, down to 4x4

Potentials of RAS
[Figure: the efficiency (MIPS/$, MHz/mW, MIPS/area, …) versus flexibility (1/time-to-market, …) chart with ASIC ("hardware solution"; non-programmable, highly specialized), ASIP (application-specific instruction set processor), and GPP ("software solution"; general-purpose processor), extended by reconfigurable and adaptive systems]

Conclusion: Demand for Adaptivity
• Even for a well-known application it is not always clear which parts will be 'hot' (e.g. according to computational complexity) and thus benefit from accelerators
  ◦ This depends on changing input data and control flow
• Even more complex: multi-tasking scenarios
  ◦ Not clear which applications will execute at the same time
  ◦ Not clear which applications will execute at all (the user can download new applications)
  ◦ This significantly increases the number of potential hot spots → hardly possible to address this with an ASIP
• Systems that fulfill the demand for adaptivity may lead to
  ◦ Better performance (absolute criterion)
  ◦ Higher efficiency (relative criterion, e.g. performance per area)
  ◦ Lower cost (no redesign if specifications change, no overdesign to cover all scenarios)
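The difference between a statically chosen accelerator set and an adaptive one can be sketched with a toy model. It assumes, hypothetically, that the hardware offers room for only a fixed number of accelerators (`SLOTS`) and that per-phase kernel profiles are known; the profiles below are invented. The static (ASIP-like) set is chosen once from one profile at design time, while the adaptive set is re-selected for each phase, which mirrors the demand for adaptivity stated above. This is an illustration under these assumptions, not a model of any particular reconfigurable architecture.

```python
from collections import Counter


def select_accelerators(kernel_cycles, slots):
    """Pick the `slots` kernels with the highest cycle counts."""
    return {kernel for kernel, _ in Counter(kernel_cycles).most_common(slots)}


def covered_share(kernel_cycles, accelerators):
    """Fraction of all cycles spent in kernels that have an accelerator."""
    total = sum(kernel_cycles.values())
    return sum(c for k, c in kernel_cycles.items() if k in accelerators) / total


# Hypothetical per-phase profiles (cycles per kernel, arbitrary units).
phases = {
    "mostly_P_MBs": {"ME": 80, "MC": 14, "IPRED": 3, "DCT": 12, "CAVLC": 8},
    "mostly_I_MBs": {"ME": 10, "MC": 2, "IPRED": 22, "DCT": 12, "CAVLC": 8},
}
SLOTS = 2  # hypothetical capacity: at most two accelerators at a time

# Static choice, fixed at design time from one profile.
static = select_accelerators(phases["mostly_P_MBs"], SLOTS)
for name, cycles in phases.items():
    adaptive = select_accelerators(cycles, SLOTS)  # re-selected per phase
    print(name,
          f"static {covered_share(cycles, static):.0%}",
          f"adaptive {covered_share(cycles, adaptive):.0%}")
# prints: mostly_P_MBs static 80% adaptive 80%
#         mostly_I_MBs static 22% adaptive 63%
```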