Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel
Lecture, summer semester (SS) 2014
Lars Bauer, Jörg Henkel
• Lecture time: Wed., 15:45 - 17:15, Bld. 50.34, HS -102
• Homepage: http://ces.itec.kit.edu/teaching/ (the slides from previous years are also available there)
• Slides login: Login: “student”, Passwd: “CES - Student”
• Contact: lars.bauer@kit.edu, Haid-und-Neu-Str. 7, Bld. 07.21, Rm. 316.2 (2nd floor!)
L. Bauer, KIT, 2014
[Map: Info-Bau, TFI, Mensa]
• Simply let me know / interrupt me
• CS Diploma:
  ◦ Vertiefungsfach 8 (specialization subject 8): Entwurf eingebetteter Systeme und Rechnerarchitekturen (Design of Embedded Systems and Computer Architectures)
• CS Master:
  ◦ Module: Rekonfigurierbare und Adaptive Systeme (Reconfigurable and Adaptive Systems) [IN4INRAS] (3 ECTS)
  ◦ Module: Eingebettete Systeme: Weiterführende Themen (Embedded Systems: Advanced Topics) [IN4INESWTN] (10 ECTS)
  ◦ Module: Advanced Computer Architecture [IN4INACA] (10 ECTS)
• Other study courses (e.g. EE): ask individually
• Lectures
  ◦ RAS
  ◦ Low Power Design
  ◦ Embedded Systems for Multimedia and Image Processing
  ◦ Stereo Video Processing
  ◦ Multicore for Multimedia Processors
  ◦ Sensor Networks
• Seminars
  ◦ Rekonfigurierbare Eingebettete Systeme (Reconfigurable Embedded Systems)
  ◦ Dependability in Embedded Systems
  ◦ Distributed Decision Making
• Labs
  ◦ Entwurf eingebetteter Systeme (Design of Embedded Systems)
  ◦ Entwurf von eingebetteten applikationsspezifischen Prozessoren (Design of Embedded Application-Specific Processors)
  ◦ Low Power Design and Embedded Systems
More info: ces.itec.kit.edu/teaching
• Note: the information on the homepage is typically not up to date
  ◦ If you are interested in a particular topic, it is better to ask individually
• There are nearly always SA/DA/BA/MA theses (student research, diploma, bachelor's, and master's theses) or Hiwi (student assistant) jobs available in the scope of reconfigurable systems
• Main projects:
  ◦ i-Core: invasive Core
  ◦ OTERA: Online Test Strategies for Reliable Reconfigurable Architectures
  ◦ Compilers for reconfigurable architectures
• Topics:
  ◦ Algorithms for Runtime System, Operating System, …
  ◦ Toolchain, Compiler, Synthesis, …
  ◦ Architecture, Hardware Prototype, Simulation Environment, …
• Rechnerstrukturen (Computer Structures)
  ◦ Prerequisite
• Eingebettete Systeme (Embedded Systems)
  ◦ ES1: Optimierung und Synthese Eingebetteter Systeme (Optimization and Synthesis of Embedded Systems)
  ◦ ES2: Entwurf und Architekturen für Eingebettete Systeme (Design and Architectures of Embedded Systems)
  ◦ The core topics (e.g. details about FPGA architectures) will be recapitulated in this lecture
  ◦ Thus, the contents of ES1 and ES2 are beneficial but not required in full detail
• “Fine- and Coarse-Grain Reconfigurable Computing”, S. Vassiliadis and D. Soudris, Springer, 2007.
• “Runtime adaptive extensible embedded processors – a survey”, H. P. Huynh and T. Mitra, SAMOS, pp. 215-225, 2009.
• “Reconfigurable computing: architectures and design methods”, T.J. Todman et al., IEE Proceedings Computers & Digital Techniques, vol. 152, no. 2, pp. 193-207, 2005.
• “Reconfigurable Instruction Set Processors from a Hardware/Software Perspective”, F. Barat et al., IEEE Transactions on Software Engineering, vol. 28, no. 9, pp. 847-862, 2002.
1. Introduction and Motivation: The Demand for Adaptivity
• Typical approach:
  ◦ Static analysis of system requirements (e.g. computational hot spots)
  ◦ Build optimized system
• Today’s requirements:
  ◦ Increasing complexity
  ◦ More functionality
• Problem:
  ◦ Statically chosen design point has to match all requirements
  ◦ Typically inefficient for individual components (e.g. tasks or hot spots)
• Hot spot: a rather small part of the application that accounts for a rather large part of the execution time
  ◦ Also called ‘Computational Kernel’
  ◦ Typically: inner loop
  ◦ 80/20 rule (90/10 rule etc.)
[Figure: 80/20 rule; 20% of the code size accounts for 80% of the execution time]
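To see why hot spots are attractive acceleration targets, the 80/20 rule can be combined with Amdahl's law. The slides do not name Amdahl's law; it is used here only as a standard back-of-the-envelope model. The function name `overall_speedup` and the example numbers are assumptions for illustration.

```python
def overall_speedup(hot_fraction: float, hot_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the hot spot is accelerated.

    hot_fraction: share of the execution time spent in the hot spot (0..1)
    hot_speedup:  factor by which the hot spot itself gets faster (> 0)
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0.0:
        raise ValueError("hot_speedup must be positive")
    # The non-hot part keeps its runtime; the hot part shrinks by hot_speedup.
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


if __name__ == "__main__":
    # Hypothetical numbers: 80% of the time in the hot spot, accelerator is 10x faster.
    print(overall_speedup(0.8, 10.0))
```

Under this model, the remaining 20% of the execution time caps the achievable speedup at 5x, no matter how fast the accelerator is. Once the major hot spots are covered, further gains require accelerating additional ones.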
[Figure: efficiency (MIPS/$, MHz/mW, MIPS/area, …) plotted against flexibility (1/time-to-market, …). src: Henkel, ESII]
• ASIC (“hardware solution”): non-programmable, highly specialized
• ASIP (application-specific instruction set processor): instruction set extension, parameterization, inclusion/exclusion of functional blocks
• GPP (general-purpose processor; “software solution”)
• Video En-/Decoding
• Audio En-/Decoding
• Data (De-)Multiplexing
• Control protocol
[Figure: block diagram of the example application. Inputs: Mic, Phone, CVBS, CVHS, Digital Video Input, Remote Control (IR). Blocks: AUDIO INPUT, VIDEO INPUT, AUDIO ENCODER (G.723), VIDEO ENCODER (H.263 / H.264), H.245 CONTROL, MULTIPLEXER (H.223), DE-MULTIPLEXER (H.223), AUDIO DECODER (G.723), VIDEO DECODER (H.263 / H.264), MODEM, PSTN INTERFACE, AUDIO OUTPUT, VIDEO OUTPUT. Outputs: Line Phone, Speakers, Display Screen. src: cityrockz.com]
[Figure: bar chart of Processing Time [%] (axis range 0 to 12) per Processing Function; the individual function labels are not recoverable]
• Design accelerators for the hot spots
• Connect them as Execution Units, Register Files, and Interfaces
[Figure. src: Tensilica, Inc.: “Xtensa LC Product Brief”]
• Provides noticeably improved performance after targeting the major hot spots
• However, performance is still not sufficient to meet real-time requirements
  ◦ More hot spots need to be accelerated
[Figure: processor with accelerators for I_ME, TQ_PL, MC_L. src: Tensilica, Inc.: “Xtensa LC Product Brief”]
• Scalability problem when rather many hot spots exist
  ◦ Note: still not all relevant hot spots are covered
[Figure: processor with accelerators for CAVLC, CABAC, I_ME, TQ_PL, Dec_MB, MC_L, H245_C, MAC, FM, V34 mod, S_ME. src: Tensilica, Inc.: “Xtensa LC Product Brief”]
• ASIPs perform well when
  1. rather few hot spots need to be accelerated and
  2. those hot spots are well known in advance
• ASIPs are less efficient when targeting rather many hot spots
  ◦ All accelerators are provided statically (i.e. they require area and consume power) even though typically just a few of them are needed at a certain time
• ASIPs are less efficient when targeting unknown hot spots
  ◦ Even for a given application it is not necessarily clear which parts of it are ‘hot’ during execution, as this may depend on input data (as demonstrated in the following)
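The cost of providing all accelerators statically can be illustrated with a small calculation. This is a minimal sketch under invented assumptions: the accelerator names, the area units, and the two execution phases are hypothetical and are not taken from the lecture.

```python
def static_vs_peak_area(accel_area, phases):
    """Compare the area an ASIP provides statically with the area needed at any one time.

    accel_area: dict mapping accelerator name -> area (arbitrary units)
    phases:     non-empty list of sets; each set names the accelerators
                that are needed during one execution phase
    Returns (static_area, peak_area).
    """
    # An ASIP has to provide every accelerator that is used in any phase.
    all_needed = set().union(*phases)
    static_area = sum(accel_area[name] for name in all_needed)
    # At any point in time only the accelerators of the current phase are used.
    peak_area = max(sum(accel_area[name] for name in phase) for phase in phases)
    return static_area, peak_area


if __name__ == "__main__":
    # Hypothetical accelerators and area figures, for illustration only.
    area = {"ME": 4, "MC": 2, "IPRED": 3, "CAVLC": 1}
    phases = [{"ME", "MC", "CAVLC"}, {"IPRED", "CAVLC"}]
    print(static_vs_peak_area(area, phases))
```

With these made-up numbers the ASIP provides 10 area units, while no single phase needs more than 7. The gap grows with the number of hot spots that are never active at the same time.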
MB Encoding Loop
[Figure: flow of the MB encoding loop, organized as three “Loop Over MB” loops. Blocks: ME: SA(T)D; RD; MB-Type Decision (I or P); Mode Decision (for I or P); if MB_Type = P_MB then MC, DCT / Q, IDCT / IQ, else IPRED, DCT / HT / Q, IDCT / IHT / IQ; Encoding Engine: CAVLC; In-Loop De-Blocking Filter]
• Iterates on MacroBlocks (MBs, i.e. 16x16 pixels)
• 2 different MB-types → different computational paths with different computational requirements
  ◦ I-MB (spatial prediction)
  ◦ P-MB (temporal prediction)
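The branch on the MB type can be sketched as follows to show how the MB types in a frame determine which kernels are exercised. This is a simplified sketch, not the encoder's actual structure. It collapses the three loops into one and omits ME and RD, because their placement relative to the type decision is not clear from the slide text. The names `P_PATH`, `I_PATH`, `COMMON`, and `kernel_invocations` are hypothetical.

```python
from collections import Counter

# Kernel groups as named on the slide; the grouping into tuples is an assumption.
P_PATH = ("MC", "DCT/Q", "IDCT/IQ")             # P-MB: temporal prediction
I_PATH = ("IPRED", "DCT/HT/Q", "IDCT/IHT/IQ")   # I-MB: spatial prediction
COMMON = ("CAVLC", "In-Loop De-Blocking Filter")


def kernel_invocations(mb_types):
    """Count how often each kernel group runs for a sequence of MB types ('I' or 'P')."""
    counts = Counter()
    for mb_type in mb_types:
        if mb_type == "P":
            path = P_PATH
        elif mb_type == "I":
            path = I_PATH
        else:
            raise ValueError(f"unknown MB type: {mb_type!r}")
        counts.update(path)    # kernels on the type-specific path
        counts.update(COMMON)  # kernels that run for every MB
    return counts


if __name__ == "__main__":
    # Hypothetical frames: one dominated by P-MBs, one dominated by I-MBs.
    print(kernel_invocations(["P"] * 9 + ["I"]))
    print(kernel_invocations(["I"] * 9 + ["P"]))
```

The same code yields very different kernel counts for the two frames, so which kernels count as ‘hot’ depends on the input data.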
[Figure: I-MB and P-MB]
Note: 16x16 MBs can be partitioned into sub-MBs, e.g. 16x8, 8x8, down to 4x4
[Figure: share of INTRA MBs in a frame [%] (0% to 100%) over the frame number (1 to 301); sequence labels: Rugby, Rafting, Football; annotated regions: scene with very high motion, scene with high-to-medium motion, scene with medium-to-slow motion]
• Even for a well-known application it is not always clear which parts will be ‘hot’ (e.g. in terms of computational complexity) and thus benefit from accelerators
  ◦ This depends on changing input data and control flow
• Even more complex: multi-tasking scenarios
  ◦ Not clear which applications will execute at the same time
  ◦ Not clear which applications will execute at all (the user can download new applications)
  ◦ This significantly increases the number of potential hot spots → hardly possible to address this with an ASIP
• Systems that fulfill the demand for adaptivity may lead to
  ◦ Better performance (absolute criterion)
  ◦ Higher efficiency (relative criterion, e.g. performance per area)
  ◦ Lower cost (no redesign if specifications change, no overdesign to cover all scenarios)
[Figure: efficiency (MIPS/$, MHz/mW, MIPS/area, …) plotted against flexibility (1/time-to-market, …), showing ASIC (“hardware solution”: non-programmable, highly specialized), ASIP (application-specific instruction set processor), and GPP (general-purpose processor; “software solution”), with Reconfigurable and Adaptive Systems added to the chart]