  1. Hardware-Software Codesign: 7. Design Space Exploration. Lothar Thiele, Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology

  2. System Design [Figure: design flow from the specification through system synthesis and estimation to SW-compilation (instruction set, intellectual property code, machine code) and HW-synthesis (intellectual property blocks, net lists)]

  3. Optimization-Analysis Cycle: an optimization algorithm chooses a decision vector X, covering the allocation (CPU0, ..., CPU3, bus, memory), the binding of processes p0, ..., p3 to resources, and the schedule. An evaluation model (e.g., simulation or analytic) computes the objective vector f(X), e.g., cost, throughput and delay. The optimization algorithm makes decisions only by knowing (and comparing) f.
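The cycle can be sketched in Python as a black-box loop. This is a minimal illustration, not the method used in the lecture. Random sampling stands in for the optimization algorithm, and all objectives are assumed to be minimized. The names `dominates`, `update_archive` and `explore` are hypothetical. The sketch shows that the optimizer only ever compares objective vectors f(X), here via Pareto dominance, and never looks inside the evaluation model.

```python
import random
from typing import Callable, List, Tuple

Objectives = Tuple[float, ...]            # objective vector f(X); every entry is minimized
Archive = List[Tuple[tuple, Objectives]]  # (decision vector X, f(X)) pairs


def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: Archive, x: tuple, fx: Objectives) -> None:
    """Keep only non-dominated design points; the decision uses f alone."""
    if any(fy == fx or dominates(fy, fx) for _, fy in archive):
        return                                      # x adds nothing new to the front
    archive[:] = [(y, fy) for y, fy in archive if not dominates(fx, fy)]
    archive.append((x, fx))


def explore(evaluate: Callable[[tuple], Objectives],
            sample: Callable[[random.Random], tuple],
            iterations: int, seed: int = 0) -> Archive:
    """Optimization-analysis cycle with random sampling as a placeholder optimizer."""
    rng = random.Random(seed)
    archive: Archive = []
    for _ in range(iterations):
        x = sample(rng)                             # decision: propose a design point X
        update_archive(archive, x, evaluate(x))     # analysis: f(X) from the evaluation model
    return archive


if __name__ == "__main__":
    # Toy evaluation model (hypothetical): X = (number of CPUs,), f(X) = (cost, delay)
    front = explore(lambda x: (float(x[0]), 12.0 / x[0]),
                    lambda rng: (rng.randint(1, 4),), iterations=100)
    print(sorted(front))
```

You can replace the random sampling with an evolutionary algorithm or another search strategy without changing the evaluation interface. That separation is exactly what the cycle describes.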

  4. Three Examples

  5. Example 1: Remember … the specification consists of an application data flow graph $G_P = (V_P, E_P)$, an architecture graph $G_A = (V_A, E_A)$, and the (all possible) mapping relations $E_M$.

  6. Example 1: Remember …

  7. Example 1: Simple Mapping Model. An evolutionary algorithm (EA) searches the design space. Selection, recombination and mutation operate on a population of solutions, and each individual solution is analyzed separately. The “chromosome” is a design point that encodes the allocation and the (implementation) binding. For the analysis, the chromosome is decoded into an allocation α and a binding β, and a schedule τ is determined. The fitness is then evaluated against the user constraints.
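Here is a minimal Python sketch of this mapping model. It assumes a hypothetical specification with three resources and three processes; the resource names, costs and mapping options are invented for illustration.

- **Chromosome:** one bit per resource (allocation) and one gene per process that selects one of its mapping options (binding).
- **Decoding:** `decode` recovers α and β.
- **Fitness:** `fitness` returns the allocation cost. It returns infinity when a process is bound to an unallocated resource or when the user cost constraint is violated.

Scheduling τ is left out. The EA operators are one common choice, not necessarily the ones used in the lecture: binary tournament selection, uniform crossover, per-gene mutation and one elite individual.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical toy specification (not from the slides): resources, costs and E_M.
RESOURCES = ["RISC", "DSP", "HWA"]
COST = {"RISC": 3, "DSP": 4, "HWA": 2}
MAPPINGS: Dict[str, List[str]] = {"p0": ["RISC", "DSP"], "p1": ["DSP", "HWA"], "p2": ["RISC", "HWA"]}
PROCESSES = list(MAPPINGS)

Chromosome = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (one bit per resource, one gene per process)


def decode(chrom: Chromosome) -> Tuple[Set[str], Dict[str, str]]:
    """Decode allocation alpha and binding beta (scheduling tau is omitted here)."""
    bits, genes = chrom
    alpha = {r for r, bit in zip(RESOURCES, bits) if bit}
    beta = {p: MAPPINGS[p][g] for p, g in zip(PROCESSES, genes)}
    return alpha, beta


def fitness(chrom: Chromosome, max_cost: float) -> float:
    """Allocation cost to minimize; infinity marks infeasible or constraint-violating points."""
    alpha, beta = decode(chrom)
    if any(r not in alpha for r in beta.values()):
        return float("inf")               # a process is bound to an unallocated resource
    cost = sum(COST[r] for r in alpha)
    return float(cost) if cost <= max_cost else float("inf")  # user constraint


def random_chrom(rng: random.Random) -> Chromosome:
    return (tuple(rng.randint(0, 1) for _ in RESOURCES),
            tuple(rng.randrange(len(MAPPINGS[p])) for p in PROCESSES))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Uniform recombination: each gene comes from either parent with probability 1/2."""
    def mix(xs: Tuple[int, ...], ys: Tuple[int, ...]) -> Tuple[int, ...]:
        return tuple(x if rng.random() < 0.5 else y for x, y in zip(xs, ys))
    return mix(a[0], b[0]), mix(a[1], b[1])


def mutate(c: Chromosome, rng: random.Random, rate: float = 0.2) -> Chromosome:
    """Flip allocation bits and resample binding genes, each with probability `rate`."""
    bits = tuple(1 - b if rng.random() < rate else b for b in c[0])
    genes = tuple(rng.randrange(len(MAPPINGS[p])) if rng.random() < rate else g
                  for p, g in zip(PROCESSES, c[1]))
    return bits, genes


def evolve(generations: int = 40, pop_size: int = 16,
           max_cost: float = 10.0, seed: int = 1) -> Chromosome:
    """EA loop: tournament selection, recombination, mutation, one elite kept per generation."""
    rng = random.Random(seed)

    def score(c: Chromosome) -> float:
        return fitness(c, max_cost)

    pop = [random_chrom(rng) for _ in range(pop_size)]

    def select() -> Chromosome:           # binary tournament on the current population
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    best = min(pop, key=score)
    for _ in range(generations):
        pop = [mutate(crossover(select(), select(), rng), rng)
               for _ in range(pop_size - 1)] + [best]   # elitism
        best = min(pop, key=score)
    return best


if __name__ == "__main__":
    best = evolve()
    print(decode(best), fitness(best, 10.0))
```

With this simple encoding, many decoded chromosomes are infeasible and just receive infinite fitness. That is the weakness the next slide addresses.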

  8. Challenges of EAs in DSE
  - encoding of allocation + binding:
    - simple encoding, e.g., one bit per resource, one variable per binding
      • easy to implement
      • … however, it may lead to (many) infeasible partitioning solutions
    - encoding + repair, e.g., simple encoding plus a repair of the allocation such that for each $v_p \in V_P$ there exists at least one $v_a \in \alpha$ with $(v_p, v_a) \in E_M$
      • reduces the number of infeasible partitioning solutions
  - (“smart”) generation of the initial population
  - (“smart”) neighborhood operations, e.g., mutation, crossover
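The repair step for the allocation can be sketched as follows. This is a hypothetical greedy variant, and it assumes $E_M$ is stored as a map from each process to its permitted resources. It works in two steps:

1. Every process without an allocated option gets its cheapest option added to the allocation.
2. Processes bound to unallocated resources are rebound to an allocated option.

The result satisfies the condition above, but it is not necessarily the cheapest possible repair.

```python
from typing import Dict, List, Set

Mapping = Dict[str, List[str]]  # E_M: process v_p -> resources v_a with (v_p, v_a) in E_M


def repair_allocation(alpha: Set[str], e_m: Mapping, cost: Dict[str, float]) -> Set[str]:
    """Ensure every process has at least one allocated mapping option.

    Greedy heuristic: for each uncovered process, its cheapest option is added."""
    repaired = set(alpha)
    for v_p, options in e_m.items():
        if not any(v_a in repaired for v_a in options):
            repaired.add(min(options, key=lambda r: cost[r]))
    return repaired


def repair_binding(beta: Dict[str, str], alpha: Set[str], e_m: Mapping) -> Dict[str, str]:
    """Rebind processes whose resource is not allocated to their first allocated option.

    Assumes alpha was already repaired, so such an option always exists."""
    return {v_p: v_a if v_a in alpha else next(r for r in e_m[v_p] if r in alpha)
            for v_p, v_a in beta.items()}


if __name__ == "__main__":
    # Hypothetical toy instance: permitted resources per process, relative costs.
    e_m = {"p0": ["RISC", "DSP"], "p1": ["DSP", "HWA"], "p2": ["RISC", "HWA"]}
    cost = {"RISC": 3.0, "DSP": 4.0, "HWA": 2.0}
    alpha = repair_allocation({"DSP"}, e_m, cost)   # p2 is uncovered, so HWA is added
    beta = repair_binding({"p0": "RISC", "p1": "HWA", "p2": "RISC"}, alpha, e_m)
    print(sorted(alpha), beta)
```

Applying the repair before evaluation turns many infeasible chromosomes into feasible ones. This is how repair reduces the number of infeasible partitioning solutions.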

  9. Example 2: Network Processors - Definition. Typically, network processors serve as a bridge between the network and the source/sink audio/video device (or set of devices).
  - implementation: high-performance, programmable devices optimized for (real-time) network packet processing
  - features: complex packet processing capabilities at high line speeds (routing; forwarding; de-/encryption; de-/compression; …) and means to guarantee quality-of-service
  [Figure: network processor with cores (core1, core2) and memories (mem1, mem2) on an internal shared bus, on-/off-chip links, I/Os, and Tiles 0 to 7]

  10. Network Processor Architecture (*): a network processor is a heterogeneous hardware/software architecture. The available processing units …
  - are described in the resource set R = {ARM9, PowerPC, DSP, MEngine, Classifier, Cipher, LookUp, CheckSum}
  - have a relative implementation cost cost(r) ≥ 0, r ∈ R
  - are selected for a specific architecture during the allocation step, with alloc(r) = 1 if a resource is selected and 0 otherwise
  (*) Note: example from Simon Künzli: Efficient Design Space Exploration for Embedded Systems, Shaker Verlag, ISBN 3-8322-5246-0, 2006.

  11. Network Processor Task Model: the application structure consists of a set of streams s ∈ S and a set of tasks t ∈ T, where each stream includes an ordered sequence of tasks $V(s) = [t_0, \dots, t_n]$. Example: S = {RTSend, NRTDecrypt, NRTEncrypt, RTRecv, NRTForward}

  12. Problem: Optimal Design of Network Processor
  - mappings M ⊆ T × R: all possible bindings of tasks, i.e., if (t, r) ∈ M, then task t could be executed on resource r
  - request w(r, t) ≥ 0, i.e., executing one packet in task t would use w(r, t) computing units of resource r
  - resource allocation cost c(r) ≥ 0
  - binding Z ⊆ M of tasks to resources (leading to the actual implementation): a subset of the mappings M such that every task t ∈ T is bound to exactly one allocated resource r ∈ R, i.e., alloc(r) = 1 and r = bind(t)
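The constraints on a binding Z can be checked directly. In this sketch, `bind` is a dictionary from tasks to resources, so every task automatically has exactly one resource. The check then only needs full coverage of T, membership in M, and alloc(r) = 1.

`stream_demand` sums w(r, t) per resource over the tasks of one stream. That is one plausible way to use the request values; it is an assumption, not something stated on the slide. The function names, task names and all numbers are hypothetical. Only the resource names come from the set R above.

```python
from typing import Dict, List, Set, Tuple


def is_valid_binding(bind: Dict[str, str], alloc: Dict[str, int],
                     tasks: Set[str], mappings: Set[Tuple[str, str]]) -> bool:
    """Check Z = {(t, bind(t))}: it covers T, lies in M, and uses only allocated resources.

    Because bind is a dict, each task is bound to exactly one resource by construction."""
    if set(bind) != tasks:
        return False
    return all((t, r) in mappings and alloc.get(r, 0) == 1 for t, r in bind.items())


def allocation_cost(alloc: Dict[str, int], cost: Dict[str, float]) -> float:
    """Sum of c(r) over all resources with alloc(r) = 1."""
    return sum(cost[r] for r, a in alloc.items() if a == 1)


def stream_demand(stream: List[str], bind: Dict[str, str],
                  w: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Computing units one packet of the stream requests per resource (sum of w(r, t))."""
    demand: Dict[str, float] = {}
    for t in stream:
        r = bind[t]
        demand[r] = demand.get(r, 0.0) + w[(r, t)]
    return demand


if __name__ == "__main__":
    # Hypothetical tasks, mappings M as (task, resource), costs c(r) and requests w(r, t).
    tasks = {"lookup", "encrypt", "checksum"}
    M = {("lookup", "ARM9"), ("lookup", "LookUp"), ("encrypt", "ARM9"),
         ("encrypt", "Cipher"), ("checksum", "ARM9")}
    alloc = {"ARM9": 1, "Cipher": 1, "LookUp": 0}
    bind = {"lookup": "ARM9", "encrypt": "Cipher", "checksum": "ARM9"}
    w = {("ARM9", "lookup"): 4.0, ("Cipher", "encrypt"): 2.0, ("ARM9", "checksum"): 3.0}
    print(is_valid_binding(bind, alloc, tasks, M),
          allocation_cost(alloc, {"ARM9": 3.0, "Cipher": 2.0, "LookUp": 1.0}),
          stream_demand(["lookup", "encrypt", "checksum"], bind, w))
```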

  13. NP Design Constraints: the design of network processors typically faces conflicting goals:
  - delay constraints, e.g., the maximal time a packet is processed within the NP
  - throughput maximization, e.g., the maximum throughput of the NP (packets per second)
  - cost minimization, i.e., an implementation with a small amount of resources (e.g., processing units, memory, and communication networks)
  … and conflicting usage scenarios: usually, a packet processor is used in several different systems (e.g., a router or a consumer multimedia processing device) and might have different implementations with different throughput/delay requirements.

  14. NP Design Space Exploration: issues to be considered during system-level design (and synthesis):
  - allocation: determine the hardware components of the network processor
  - binding: for each process of the software application, choose an allocated hardware unit that executes it
  - scheduling: for the set of tasks mapped onto a specific resource, choose a scheduling policy and its parameters from the available run-time environment, e.g., a fixed priority prio(s) > 0 for each stream s

  15. Design Space Exploration Flow [Figure: exploration flow over the decision variables alloc(r) = 0/1, r = bind(t), and prio(s) > 0]
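A design point in this flow can be represented as the triple (alloc, bind, prio). The Python sketch below is hypothetical: the class and method names are invented. It also assumes that a smaller prio(s) means a higher priority, which the slides do not specify. `schedule` lists, for each allocated resource, the streams that have a task bound to it, in fixed-priority order. That is the information a fixed-priority run-time environment would need.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DesignPoint:
    """Decision variables of one design point (hypothetical representation)."""
    alloc: Dict[str, int]   # alloc(r) in {0, 1}
    bind: Dict[str, str]    # bind(t) = r
    prio: Dict[str, int]    # prio(s) > 0; ASSUMPTION: smaller value = higher priority

    def validate(self, streams: Dict[str, List[str]]) -> None:
        """Raise ValueError if the point violates the domains of alloc, bind or prio."""
        if any(a not in (0, 1) for a in self.alloc.values()):
            raise ValueError("alloc(r) must be 0 or 1")
        if any(self.alloc.get(self.bind.get(t), 0) != 1 for ts in streams.values() for t in ts):
            raise ValueError("every task must be bound to an allocated resource")
        if any(self.prio.get(s, 0) <= 0 for s in streams):
            raise ValueError("prio(s) must be positive for every stream")

    def schedule(self, streams: Dict[str, List[str]]) -> Dict[str, List[str]]:
        """Per allocated resource, the streams using it in priority order (validate first)."""
        order: Dict[str, List[str]] = {r: [] for r, a in self.alloc.items() if a == 1}
        for s in sorted(streams, key=lambda name: self.prio[name]):
            for r in dict.fromkeys(self.bind[t] for t in streams[s]):  # unique, order kept
                order[r].append(s)
        return order


if __name__ == "__main__":
    # Hypothetical streams V(s) with invented task names.
    streams = {"RTSend": ["rt_in", "rt_out"], "NRTForward": ["nrt_in", "nrt_out"]}
    dp = DesignPoint(alloc={"ARM9": 1, "DSP": 1, "Cipher": 0},
                     bind={"rt_in": "ARM9", "rt_out": "DSP", "nrt_in": "ARM9", "nrt_out": "ARM9"},
                     prio={"RTSend": 1, "NRTForward": 2})
    dp.validate(streams)
    print(dp.schedule(streams))
```

An optimizer would vary all three fields of `DesignPoint` together. An evaluation model would then use `schedule` to estimate delay and throughput per stream.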

  16. Tools and a Small Demo

  17. … Some Results [Figure: trade-off between cost, performance of encryption/decryption, and performance of RT voice processing]

  18. Example 3: Wave Field Synthesis. What is wave field synthesis (WFS)? A high-quality spatial sound reproduction system for huge listening areas: 32 sound sources and 300 loudspeakers for medium-sized reproduction rooms. [Figure: recording room with microphones, mixing console, WFS processor, and reproduction room]

  19. System Specification: WFS Application. The parallel application is modeled as a Kahn Process Network. Structure: XML; functionality: ANSI C & DOL(*) API. [Figure: process network with control, source, convolution, and loudspeaker processes]
  (*) DOL, distributed operation layer: http://www.tik.ee.ethz.ch/~shapes/dol.html
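In a Kahn Process Network, processes communicate only through unbounded FIFO channels, with blocking reads and non-blocking writes. The following Python sketch mimics the source → convolution → loudspeaker chain using threads and `queue.Queue`. It is not the DOL API, which is C-based. The filter taps, the `None` end-of-stream token and the process bodies are hypothetical, and the control process is omitted.

```python
import threading
from queue import Queue
from typing import List, Optional

EOS = None  # hypothetical end-of-stream token


def source(out: Queue, samples: List[float]) -> None:
    for x in samples:
        out.put(x)                       # writes never block: the FIFO is unbounded
    out.put(EOS)


def convolution(inp: Queue, out: Queue, taps: List[float]) -> None:
    """FIR convolution y[n] = sum_k taps[k] * x[n - k], one output token per input token."""
    history = [0.0] * len(taps)
    while True:
        x: Optional[float] = inp.get()   # blocking read: the only way a Kahn process waits
        if x is EOS:
            out.put(EOS)
            return
        history = [x] + history[:-1]
        out.put(sum(h * c for h, c in zip(history, taps)))


def loudspeaker(inp: Queue, played: List[float]) -> None:
    while (y := inp.get()) is not EOS:
        played.append(y)


def run(samples: List[float], taps: List[float]) -> List[float]:
    mic_to_conv: Queue = Queue()         # unbounded FIFO channels
    conv_to_spk: Queue = Queue()
    played: List[float] = []
    procs = [threading.Thread(target=source, args=(mic_to_conv, samples)),
             threading.Thread(target=convolution, args=(mic_to_conv, conv_to_spk, taps)),
             threading.Thread(target=loudspeaker, args=(conv_to_spk, played))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return played


if __name__ == "__main__":
    print(run([1.0, 0.0, 0.0, 2.0], [0.5, 0.25]))
```

Each process blocks on reads and never polls its channels. As a result, the output sequence does not depend on how the threads are scheduled. This determinism lets the same process network be mapped onto different processors without changing its functional result.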

  20. System Specification: Architecture. The architecture is modeled at an abstract level in XML format. Modeled elements:
  - processors, buses, memories
  - communication paths between these elements
  - … parameters are included in the model
  [Figure: architecture block diagram with RISC, DSP, DXM, DDM, DMA, PBUS, CBUS, and buses AHB 0, AHB 1, AHB 2]

  21. Application-to-Architecture Mapping: the parallel application (microphones, convolution, sum, loudspeakers) is mapped onto the heterogeneous architecture by design space exploration (performance analysis & mapping optimization), followed by software synthesis. [Figure: application processes mapped onto the RISC/DSP architecture with buses AHB 0, AHB 1, AHB 2]

  22. Simple Analysis Model: for each processor c, the worst total runtime is computed from two inputs: the number of activations of each process p bound to c, and the runtime of process p on processor c. The maximum over all processors is the max processor load. Likewise, for each communication link g, the worst load is computed from the communication requests of the channels s mapped to g and from the bandwidth of g. The maximum over all links is the max bus load.
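The formulas of this model did not survive extraction. One plausible reading uses the parameter names from slide 25 and is stated here as an assumption:

- The worst total runtime of processor c is $\sum_{p \mapsto c} n(p)\, r(p,c)$.
- The worst load of link g is $\sum_{s \mapsto g} b(s) / t(g)$.

Each quantity is then maximized over all processors or all links. The Python sketch below implements exactly that reading with hypothetical numbers. The process-to-processor binding and the channel-to-link routing are inputs.

```python
from typing import Dict, List, Tuple


def max_processor_load(bind: Dict[str, str], n: Dict[str, int],
                       r: Dict[Tuple[str, str], float]) -> Tuple[float, Dict[str, float]]:
    """ASSUMED model: worst total runtime of c = sum of n(p) * r(p, c) over p bound to c."""
    total: Dict[str, float] = {}
    for p, c in bind.items():
        total[c] = total.get(c, 0.0) + n[p] * r[(p, c)]
    return max(total.values()), total


def max_bus_load(route: Dict[str, List[str]], b: Dict[str, float],
                 t: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """ASSUMED model: worst load of g = sum of b(s) / t(g) over channels s routed over g."""
    load: Dict[str, float] = {}
    for s, links in route.items():
        for g in links:
            load[g] = load.get(g, 0.0) + b[s] / t[g]
    return max(load.values()), load


if __name__ == "__main__":
    # Hypothetical numbers; ARM, mAgic and AHB0 are labels borrowed from the slide figures.
    bind = {"source": "ARM", "convolution": "mAgic", "loudspeaker": "ARM"}
    n = {"source": 100, "convolution": 100, "loudspeaker": 100}
    r = {("source", "ARM"): 2.0, ("convolution", "mAgic"): 5.0, ("loudspeaker", "ARM"): 1.0}
    route = {"src_to_conv": ["AHB0"], "conv_to_spk": ["AHB0"]}
    b = {"src_to_conv": 400.0, "conv_to_spk": 400.0}
    t = {"AHB0": 100.0}
    print(max_processor_load(bind, n, r)[0], max_bus_load(route, b, t)[0])
```

Both maxima are cheap to compute. Such a model could therefore serve as the analytic evaluation model in the optimization-analysis cycle, with simulation reserved for the most promising mappings.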

  25. Where Are Data Obtained From?
  - static parameters: bandwidth of buses t(g)
  - functional simulation: number of activations of each process n(p), amount of data for each channel b(s)
  - instruction-set simulation: runtime of each process on different processors r(p,c), obtained by using benchmark mappings
  [Figure: benchmark mappings on the ARM and mAgic processors connected via AHB 0]
