Hardware-Software Codesign
7. Design Space Exploration
Lothar Thiele
Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology

System Design
[Figure: design flow. Labels: specification, system synthesis, estimation, SW-compilation, instruction set, HW-synthesis, intellectual property code, intellectual property block, machine code, net lists]

Optimization-Analysis Cycle
• decision vector X: allocation, binding, schedule
• evaluation model (e.g., simulation, analytic): maps X to the objective vector f(X)
• objective vector f(X): e.g., cost, throughput, delay
• the optimization algorithm makes decisions only by knowing (and comparing) f
[Figure: processes p0 to p3 allocated, bound and scheduled on CPU0 to CPU3, which are connected by a bus and a memory]
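
The cycle can be sketched as a black-box loop: the optimizer proposes a decision vector, the evaluation model returns f, and only f is compared. This is a minimal sketch under stated assumptions, not the method used in the lecture: the toy `evaluate` model, the weighted-sum comparison in `better`, and the bit-flip search are all hypothetical.

```python
import random

def evaluate(x):
    """Hypothetical evaluation model: decision vector X -> objective vector f(X)."""
    cost = sum(x)                    # every selected option costs one unit
    delay = 10 - 2 * sum(x[:3])      # toy model: the first three options reduce delay
    return (cost, delay)

def better(f_a, f_b):
    """Compare two objective vectors (assumed preference: smaller cost + delay)."""
    return f_a[0] + f_a[1] < f_b[0] + f_b[1]

def optimize(n_bits=5, iterations=200, seed=0):
    """Bit-flip local search that never looks inside the evaluation model."""
    rng = random.Random(seed)
    best_x = [rng.randint(0, 1) for _ in range(n_bits)]
    best_f = evaluate(best_x)
    for _ in range(iterations):
        candidate = best_x[:]
        i = rng.randrange(n_bits)
        candidate[i] = 1 - candidate[i]   # flip one decision
        f = evaluate(candidate)
        if better(f, best_f):             # the decision uses f only
            best_x, best_f = candidate, f
    return best_x, best_f
```

Swapping `evaluate` for a simulation or an analytic model leaves the search loop unchanged, which is the point of the cycle.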

Three Examples

Example 1: Remember …
• application: data flow graph G_P(V_P, E_P)
• architecture: architecture graph G_A(V_A, E_A)
• (all possible) mapping relations E_M

Example 1: Simple Mapping Model
• search algorithm (EA): selection, recombination, mutation
• “chromosome” = encoded allocation + binding
• analysis of individual solutions: decode allocation, decode binding, scheduling
• design point (implementation): allocation α, binding β, scheduling τ
• fitness evaluation: computes the fitness, taking user constraints into account
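
A minimal sketch of the decode and fitness steps, assuming a chromosome made of one allocation bit per resource followed by one binding gene per process. The resource names, costs, mapping relation and the penalty-based fitness are hypothetical; scheduling is left out.

```python
# Hypothetical toy instance: resources, costs and mapping relation E_M.
RESOURCES = ["cpu", "dsp", "asic"]
COST = {"cpu": 3, "dsp": 2, "asic": 5}
E_M = {"p0": ["cpu", "dsp"], "p1": ["cpu", "asic"], "p2": ["dsp"]}
PROCESSES = sorted(E_M)

def decode(chromosome):
    """Split a chromosome into allocation alpha and binding beta."""
    n = len(RESOURCES)
    alloc_bits, bind_genes = chromosome[:n], chromosome[n:]
    # decode allocation: one bit per resource
    alpha = {r for r, bit in zip(RESOURCES, alloc_bits) if bit}
    # decode binding: each gene selects one entry of the process's mapping list
    beta = {p: E_M[p][g % len(E_M[p])] for p, g in zip(PROCESSES, bind_genes)}
    return alpha, beta

def fitness(chromosome, penalty=100):
    """Smaller is better: allocation cost plus a penalty per infeasible binding."""
    alpha, beta = decode(chromosome)
    cost = sum(COST[r] for r in alpha)
    violations = sum(1 for r in beta.values() if r not in alpha)
    return cost + penalty * violations
```

For example, `fitness([1, 1, 0, 0, 0, 0])` allocates cpu and dsp, binds p0 and p1 to cpu and p2 to dsp, and evaluates to the plain allocation cost because no binding is violated.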

Challenges of EAs in DSE
encoding allocation + binding
• simple encoding, e.g., one bit per resource, one variable per binding
  • easy to implement
  • … however, it may lead to (many) infeasible partitioning solutions
• encoding + repair, e.g., simple encoding and repair for allocation s.t. for each v_p ∈ V_P there exists at least one v_a ∈ α with (v_p, v_a) ∈ E_M
  • reduces number of infeasible partitioning solutions
(“smart”) generation of initial population
(“smart”) neighborhood operations, e.g., mutation, crossover
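
The repair condition above can be sketched as follows. The function name `repair_allocation`, the example data and the "add the cheapest candidate" policy are assumptions made for illustration; other repair policies satisfy the same condition.

```python
def repair_allocation(alpha, e_m, cost):
    """Extend alpha until every process has at least one allocated mapping target.

    alpha: set of allocated resources
    e_m:   dict mapping each process v_p to the resources v_a with (v_p, v_a) in E_M
    cost:  dict mapping each resource to its cost (used only to pick a candidate)
    """
    repaired = set(alpha)
    for process in sorted(e_m):
        candidates = e_m[process]
        if not any(r in repaired for r in candidates):
            # assumed policy: add the cheapest candidate, ties broken by name
            repaired.add(min(candidates, key=lambda r: (cost[r], r)))
    return repaired

# Hypothetical example data
e_m = {"p0": ["cpu", "dsp"], "p1": ["cpu", "asic"], "p2": ["dsp"]}
cost = {"cpu": 3, "dsp": 2, "asic": 5}
print(repair_allocation({"asic"}, e_m, cost))
```

An allocation that already satisfies the condition is returned unchanged, so the repair can be applied after every mutation or crossover.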

Example 2: Network Processors - Definition
Typically, network processors serve as bridge between the network and the source/sink audio/video device (or set of devices)
• implementation: high-performance, programmable devices optimized for (real-time) network packet processing
• features: complex packet processing capabilities at high line speeds (routing; forwarding; de-/encryption; de-/compression; …) and means to guarantee quality-of-service
[Figure: network processor with Tile 0 to Tile 7 and on-/off-chip links; labels: core1, mem1, core2, mem2, internal shared bus, I/Os]

Network Processor Architecture (*)
Network processor: heterogeneous hardware/software architecture. The available processing units …
• are described in the resource set R = {ARM9, PowerPC, DSP, MEngine, Classifier, Cipher, LookUp, CheckSum}
• have a relative implementation cost cost(r) ≥ 0, r ∈ R
• are selected for a specific architecture during the allocation step, with alloc(r) = 1 if a resource is selected and 0 otherwise
(*) Note: example from Simon Künzli: Efficient Design Space Exploration for Embedded Systems, Shaker Verlag, ISBN 3-8322-5246-0, 2006.

Network Processor Task Model
• application structure: set of streams s ∈ S and set of tasks t ∈ T
• each stream includes an ordered sequence of tasks V(s) = [t_0, ..., t_n]
• example: S = {RTSend, NRTDecrypt, NRTEncrypt, RTRecv, NRTForward}

Problem: Optimal Design of Network Processor
• mappings M ⊆ T × R: all possible bindings of tasks, i.e., if (t,r) ∈ M, then task t could be executed on resource r
• request w(r,t) ≥ 0, i.e., execution of one packet in t would use w computing units of r
• resource allocation cost c(r) ≥ 0
• binding Z of tasks to resources, Z ⊆ M (leading to actual implementation): subset of mappings M s.t. every task t ∈ T is bound to exactly one allocated resource r ∈ R, with alloc(r) = 1 and r = bind(t)
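
A minimal sketch of these definitions, assuming plain dictionaries for alloc, bind, w and c. The task names, the numeric values and the helper names are hypothetical; only the resource names come from the resource set R.

```python
def is_valid_binding(tasks, M, alloc, bind):
    """Every task is bound to one resource that is in M for it and is allocated."""
    return all(
        t in bind and (t, bind[t]) in M and alloc.get(bind[t], 0) == 1
        for t in tasks
    )

def allocation_cost(alloc, c):
    """Sum of c(r) over all resources with alloc(r) = 1."""
    return sum(c[r] for r, selected in alloc.items() if selected == 1)

def request_per_resource(bind, w):
    """Computing units requested on each resource per packet, from w(r, t)."""
    load = {}
    for t, r in bind.items():
        load[r] = load.get(r, 0) + w[(r, t)]
    return load

# Hypothetical instance
tasks = ["Classify", "Decrypt", "Route"]
M = {("Classify", "Classifier"), ("Classify", "ARM9"),
     ("Decrypt", "Cipher"), ("Decrypt", "ARM9"), ("Route", "ARM9")}
alloc = {"ARM9": 1, "Cipher": 1, "Classifier": 0}
bind = {"Classify": "ARM9", "Decrypt": "Cipher", "Route": "ARM9"}
c = {"ARM9": 10, "Cipher": 4, "Classifier": 3}
w = {("ARM9", "Classify"): 4, ("Cipher", "Decrypt"): 2, ("ARM9", "Route"): 3}
```

Storing bind as a dictionary enforces "exactly one resource per task" by construction; the validity check then only needs to test membership in M and the allocation flag.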

NP Design Constraints
The design of network processors typically faces conflicting goals:
• delay constraints, e.g., maximal time a packet is processed within NP
• throughput maximization, e.g., maximum throughput of NP (packets per second)
• cost minimization: implementation with small amount of resources (e.g., processing units, memory, and communication networks)
… and conflicting usage scenarios:
• usually, a packet processor is used in several different systems (e.g., router or consumer multimedia processing device) and might have different implementations with different throughput/delay requirements

NP Design Space Exploration
Issues to be considered during system-level design (and synthesis):
• allocation: determine hardware components of the network processor
• binding: for each process of the software application, choose an allocated hardware unit which executes it
• scheduling: for the set of tasks mapped onto a specific resource, choose scheduling policy/parameters from the available run-time environment, e.g., a fixed priority for each stream s: prio(s) > 0

Design Space Exploration Flow
[Figure: exploration flow over the decision variables alloc(r) = 0/1, r = bind(t), prio(s) > 0]

Tools and a Small Demo

… Some Results
[Figure: result plot; labels: cost, performance of encryption/decryption, performance of RT voice processing]

Example 3: Wave Field Synthesis
What is wave field synthesis (WFS)?
• high quality spatial sound reproduction system for huge listening areas
• 32 sound sources and 300 loudspeakers for medium sized reproduction rooms
[Figure: recording room with microphones and mixing console, WFS processor, reproduction room]

System Specification: WFS Application
Parallel application modeled as Kahn Process Network
• structure: XML
• functionality: ANSI C & DOL(*) API
[Figure: process network; labels: control, source, convolution, loudspeaker]
(*) DOL (distributed operation layer): http://www.tik.ee.ethz.ch/~shapes/dol.html

System Specification: Architecture
Architecture is modeled at abstract level in XML format
Modeled elements:
• processors, buses, memories
• communication paths between these elements
• … parameters are included in the model
[Figure: architecture block diagram; legible labels include AHB 0, AHB 1, AHB 2, DXM, DDM and DMA]

Application-to-Architecture Mapping
• parallel application (microphones, convolution, sum, loudspeakers)
• heterogeneous architecture
• design space exploration (performance analysis & mapping optimization)
• software synthesis
[Figure: the parallel application mapped onto the heterogeneous architecture]

Simple Analysis Model
max processor load: processor c with worst total runtime, based on
• number of activations of process p
• runtime of process p on processor c
max bus load: communication link g with worst load, based on
• communication request from channel s
• bandwidth of communication link g
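
A minimal sketch of the two load figures. It assumes that the max processor load is the largest sum of (number of activations × runtime) over the processes bound to one processor, and that the max bus load is the largest sum of (communication request / bandwidth) over the channels routed on one link. The formulas, the dictionaries `proc_of` and `link_of`, the one-link-per-channel simplification and all numbers are assumptions made for illustration.

```python
def max_processor_load(n, r, proc_of):
    """Worst total runtime over all processors.

    n[p]:       number of activations of process p
    r[(p, c)]:  runtime of process p on processor c
    proc_of[p]: processor that process p is bound to (hypothetical mapping)
    """
    totals = {}
    for p, c in proc_of.items():
        totals[c] = totals.get(c, 0.0) + n[p] * r[(p, c)]
    return max(totals.values())

def max_bus_load(b, t, link_of):
    """Worst load over all communication links.

    b[s]:       communication request (amount of data) of channel s
    t[g]:       bandwidth of communication link g
    link_of[s]: link that channel s is routed on (hypothetical, one link per channel)
    """
    totals = {}
    for s, g in link_of.items():
        totals[g] = totals.get(g, 0.0) + b[s] / t[g]
    return max(totals.values())

# Hypothetical numbers
n = {"source": 10, "convolution": 10, "loudspeaker": 10}
r = {("source", "ARM"): 1.0, ("convolution", "mAgic"): 4.0, ("loudspeaker", "ARM"): 2.0}
proc_of = {"source": "ARM", "convolution": "mAgic", "loudspeaker": "ARM"}
b = {"s0": 100, "s1": 300}
t = {"AHB0": 200}
link_of = {"s0": "AHB0", "s1": "AHB0"}
```

With these numbers the ARM accumulates 30 time units and the mAgic 40, so the mAgic determines the max processor load; both channels share AHB0, which determines the max bus load.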

Where Are Data Obtained From?
• Static parameters: bandwidth of buses t(g)
• Functional simulation: number of activations for each process n(p), amount of data for each channel b(s)
• Instruction-set simulation: runtime of each process on different processors r(p,c), by using benchmark mappings
[Figure: benchmark mappings onto ARM and mAgic processors attached to AHB 0]