
Philips Parallel Programming Models for Heterogeneous MPSoCs



  1. Philips Parallel Programming Models for Heterogeneous MPSoCs Pieter van der Wolf Philips Research MPSoC’05 July 11-15, 2005

  2. Outline
     • Introduction
     • Task Transaction Level interface: TTL
       – Abstract interface for streaming in MPSoCs
     • Programming TTL multiprocessors
       – Constraint-driven code transformations
     • Design cases
       – Sea-of-DSP
       – Smart Camera
       – Cake / Wasabi
     • Conclusion
     MPSoC’05 2 Philips Confidential

  3. MPSoC Design
     • Need for MPSoCs:
       – Implement advanced functionalities
       – Low cost
       – Power efficient
       – Flexible
     • Increasing complexity of MPSoCs:
       – Increasing design efforts
       – SW effort overtaking HW effort
       – Increasing time-to-market
     • Productivity increase through:
       – Raise level of abstraction
       – Structured design
       – IP reuse
       – EDA support
     [Figure: log-scale plot over process nodes 0.35µ, 0.25µ, 0.18µ, 0.15µ, 0.12µ, 0.1µ. Curves: Gates/cm² (Moore’s Law, 59% CAGR), Design Productivity (20-25% CAGR), Software Productivity (8-10% CAGR)]

  4. Example TV application
     [Figure: task graph of a TV application. Recoverable block labels: Audio in 1, Audio in 2, AC-3 audio decoding, Audio out 1, Audio out 2; analog video, NR; picture rate up-conversion (ME, MC, DEINT, UPC); spatial scaling (VS, HS); sharpness improvement (LTI, CTI, PEAK, PCOMP, DA); MPEG bit stream, MPEG video decoding; VCR; Video out]
     Many task graphs like this have to be supported

  5. Example MPSoC Hardware
     • Philips's advanced set-top box and digital TV SoC (Viper2)
     • 0.13 µm
     • 50 M transistors
     • 100 clock domains
     • > 60 IP blocks
     [Figure: chip layout. Block labels: MBS, VMPG, TM3260, TDCS, VIP, MIPS3960, TM3260, QVCP5L, MSP, MDCS, QVCP2L]

  6. Example MPSoC Software Stack
     [Figure: layered software stack. Labels, top to bottom:
       Applications
       Middleware: JavaTV, TVPAK, OpenTV, MHP/Java, proprietary ...
       Kernel: pSOS, WinCE, JavaOS
       Streaming Components
       Streaming Infrastructure
       Nexperia Hardware]

  7. MPSoC Integration
     • Current practice
       – Ad hoc approaches
       – Low-level interfaces
     • Examples
       – Synchronization via low-level primitives
         • Interrupts, MMIO, semaphores
       – Data access services partly in IP
         • Buffering, DMA control, address generation
     • Consequence
       – Part of IP is specific for underlying communication infrastructure
         • IP just wants the next pixel or block or …
         • But also knows about burst transfers, interrupts, semaphores, ….
     [Figure labels: Computation, IP Module, Communication, DTL, AXI, …]

  8. MPSoC Integration
     • Low-level interfaces
       – Hardware / software IP designer must deal with low-level issues
         • Increases design effort
         • Same problems solved again and again: error prone
       – IP becomes specific for particular use
         • Hampers reusability
       – IP integrator must deal with low-level issues
         • Increases design effort
       – Infrastructures cannot evolve
         • Changes in infrastructure affect hardware / software IP

  9. Interface Centric Design: TTL
     • Aim: Improve MPSoC integration
     • Means: Raise level of abstraction
     • TTL Task Transaction Level interface:
       – Parallel application models
         • Executable specifications
       – Platform interface
         • Integration of HW and SW tasks
     • Mapping technology
       – Structured design & programming
       – Based on TTL
     [Figure labels: Task, Task, Task, TTL, Mapping, TASKS, TTL, Platform Infrastructure]

  10. TTL Requirements
      • Well-defined semantics for application modeling
        – Focus: stream processing applications
        – Make concurrency and communication explicit
      • High-level interface
        – Make high-level services available
          • Inter-task communication
          • Multi-tasking
        – Easy to use for IP development
        – Facilitate reuse and integration of IP
        – Provide implementation freedom
      • Allow efficient and cheap implementations
        – E.g. supporting fine grain synchronization for on-chip memory
      • Support integration of hardware and software tasks
      [Figure labels: Computation, IP module, IP Module, TTL, Communication Shell]

  11. TTL in Example Architecture
      • Platform interface for integration of HW and SW tasks
        – Enable communication in heterogeneous MPSoCs
      [Figure labels: Task 1, Task 2, Task 3; TTL SW-API, SW Shell, CPU; TTL HW-interface, HW Shell, ASP; DTL, AHB, AXI, OCP; Interconnect]

  12. TTL Inter-Task Communication
      Logical model and terminology
      [Figure: task graph illustrating the terms private variable, full token with value, empty token, channel, port, task, TTL interface]
      • Communicating tasks are organized as task graph
      • Tasks communicate by invoking TTL interface functions on their ports
      • Uni-directional channels with reliable ordered communication
      • Arbitrary data types, but single type per channel
      • Support for multi-cast

  13. Example: Message Passing Interface
      Producer side
      • write(port, data, …)
        – Write data into channel connected to port
      Consumer side
      • data = read(port, …)
        – Read data from channel connected to port
      • Abstract interface for tasks
      • Right interface?
        – Appropriate for modeling application?
        – Appropriate for implementation on architecture?

  14. TTL Interface Types
      • Different needs for communication arising from:
        – Different applications
          • In-order – out-of-order
        – Different implementation styles
          • Hardware – software
          • Shared memory – message passing
      • Support set of interface types
        – Each interface type offers narrow interface
          • Easy to use
          • Simple to implement
        – Each interface type supports particular communication style
        – Offer multiple interface types in one framework
        – Based on single model for interoperability

  15. TTL Interface Types
      • TTL offers a number of different interface types
      • Allow selection of interface type per port of task
      • Enable interoperability by allowing mix & match
      [Figure: task graph of tasks T1 to T7]

  16. TTL Interface Types
      Acronym   Full name
      CB        Combined Blocking
      RB        Relative Blocking
      RN        Relative Non-blocking
      DBI       Direct Blocking In-order
      DNI       Direct Non-blocking In-order
      DBO       Direct Blocking Out-of-order
      DNO       Direct Non-blocking Out-of-order

  17. Interface Type CB
      Producer side
      • write(port, vector, size)
        – Write vector of size values into channel
      Consumer side
      • read(port, vector, size)
        – Read vector of size values from channel
      • Most abstract TTL interface type
      • Blocking semantics
      • Combined synchronization and data transfer
      • Vector operations
      • Based on earlier work on YAPI for KPN style modeling
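As a rough illustration of the CB semantics listed on this slide (blocking, combined synchronization and data transfer, vector operations), the following Python sketch models a bounded channel shared by a producer task and a consumer task. It is not the TTL implementation: the names `CBChannel`, `run_pipeline`, `capacity` and `chunk` are hypothetical, the channel object stands in for the `port` argument, and `read` returns the vector instead of filling a caller-supplied one. The sketch also assumes that a vector larger than the channel may be written, with the call blocking until every value has been transferred.

```python
import threading
from collections import deque


class CBChannel:
    """Bounded FIFO channel with blocking vector write/read (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity          # number of tokens the channel can hold
        self.tokens = deque()             # full tokens, oldest first
        self.cond = threading.Condition()

    def write(self, vector, size):
        """Block until the first `size` values of `vector` are in the channel."""
        for value in vector[:size]:
            with self.cond:
                # Wait for an empty token, then fill it: synchronization and
                # data transfer happen in the same call.
                self.cond.wait_for(lambda: len(self.tokens) < self.capacity)
                self.tokens.append(value)
                self.cond.notify_all()

    def read(self, size):
        """Block until `size` values have been taken from the channel; return them."""
        vector = []
        for _ in range(size):
            with self.cond:
                self.cond.wait_for(lambda: len(self.tokens) > 0)
                vector.append(self.tokens.popleft())
                self.cond.notify_all()
        return vector


def run_pipeline(samples, capacity=2, chunk=4):
    """Run a producer task and a consumer task concurrently over one channel."""
    channel = CBChannel(capacity)
    received = []

    def producer():
        for i in range(0, len(samples), chunk):
            block = samples[i:i + chunk]
            channel.write(block, len(block))

    def consumer():
        remaining = len(samples)
        while remaining:
            n = min(3, remaining)         # consumer uses a different vector size
            received.extend(channel.read(n))
            remaining -= n

    tasks = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return received
```

Calling `run_pipeline(list(range(10)))` returns the samples in their original order. Neither task contains any explicit synchronization code, which is the ease-of-use argument for CB; the price is that every value is copied between the channel and a private vector.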

  18. Pros / Cons Interface Type CB
      + Easy to use
      + Reusable tasks
      – Copying overhead if private variables not in local buffers
        – Smart compiler may help in some cases
      – If local buffers:
        – Large tokens / vectors → large local buffers
        – Small tokens / vectors → large synchronization overhead
      [Figure labels: Task 1, Task 2, TTL, SW Shell, Mem, ASP, TTL, CPU, HW Shell, Interconnect; transfer steps numbered 1 to 4]
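The buffer-size versus synchronization trade-off named on this slide can be made concrete with a small calculation. The cost model below is an assumption made for illustration only: it charges one synchronization per `read` or `write` call and sizes the local buffer to hold exactly one vector. The function name `cb_cost` and the stream length of 4096 values are hypothetical.

```python
import math


def cb_cost(stream_length, vector_size):
    """Return (synchronizing calls, local buffer size in values) for one port.

    Assumed model: each CB call synchronizes once and moves one vector, and
    the task keeps one vector's worth of values in its local buffer.
    """
    calls = math.ceil(stream_length / vector_size)
    return calls, vector_size


# Sweep the vector size for a stream of 4096 values.
for size in (1, 64, 4096):
    calls, buffer_values = cb_cost(4096, size)
    print(f"vector size {size:5d}: {calls:5d} calls, local buffer of {buffer_values} values")
```

Under this model the two costs move in opposite directions as the vector size grows, so with CB alone a task cannot have both a small local buffer and few synchronizations.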

  19. Separate Synchronization and Data Transfer
      Producer             Consumer
      acquireRoom (2)      acquireData (2)
      store/dereference    load/dereference
      releaseData (2)      releaseRoom (2)

  20. Interface Types RB and RN
      Producer side
      • reAcquireRoom(port, count) (RB)
      • tryReAcquireRoom(port, count) (RN)
        – Acquire count empty tokens, blocking (RB) / non-blocking (RN)
      • store(port, offset, vector, size)
        – Store vector of size values into the tokens at offsets offset..offset+size-1 relative to the oldest acquired token
      • releaseData(port, count)
        – Release count oldest acquired tokens as full tokens
      • Separate synchronization and data transfer
      • Vector operations
      • Re-acquire operations do not change state of the channel
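A single-threaded Python sketch of the relative, non-blocking (RN) style may help to show how synchronization and data transfer are separated. It is a model under stated assumptions, not the TTL implementation: `RelativeChannel` and `demo` are hypothetical names, the channel object replaces the `port` argument, and the consumer-side operations (`try_reacquire_data`, `load`, `release_room`) are assumed by symmetry with the producer side, since this slide lists only producer functions.

```python
class RelativeChannel:
    """Single-threaded model of a channel with relative (RN-style) access."""

    def __init__(self, capacity):
        self.buffer = [None] * capacity   # token storage shared by both sides
        self.capacity = capacity
        self.write_base = 0               # index of the producer's oldest acquired token
        self.read_base = 0                # index of the consumer's oldest acquired token
        self.full = 0                     # number of full tokens in the channel

    # Producer side
    def try_reacquire_room(self, count):
        """Report whether `count` empty tokens are available; state is unchanged."""
        return self.capacity - self.full >= count

    def store(self, offset, vector, size):
        """Store `size` values at offset..offset+size-1 from the oldest acquired token."""
        for i in range(size):
            self.buffer[(self.write_base + offset + i) % self.capacity] = vector[i]

    def release_data(self, count):
        """Release the `count` oldest acquired tokens as full tokens."""
        self.write_base = (self.write_base + count) % self.capacity
        self.full += count

    # Consumer side (assumed mirror operations)
    def try_reacquire_data(self, count):
        return self.full >= count

    def load(self, offset, size):
        return [self.buffer[(self.read_base + offset + i) % self.capacity]
                for i in range(size)]

    def release_room(self, count):
        self.read_base = (self.read_base + count) % self.capacity
        self.full -= count


def demo():
    ch = RelativeChannel(capacity=4)
    assert ch.try_reacquire_room(3)
    assert ch.try_reacquire_room(3)           # repeating the call changes nothing
    ch.store(2, ["c"], 1)                     # out-of-order: fill the third token first
    ch.store(0, ["a", "b"], 2)
    ch.release_data(3)                        # one synchronization for three tokens
    room_for_two = ch.try_reacquire_room(2)   # only one empty token is left
    last = None
    if ch.try_reacquire_data(3):
        last = ch.load(2, 1)                  # load only a subset of the tokens
        ch.release_room(3)
    return room_for_two, last, ch.full
```

In `demo`, three tokens are covered by a single acquire/release pair while the stores happen in arbitrary order and in small pieces, which corresponds to coarse-grain synchronization combined with fine-grain data transfer. A blocking (RB) variant would wait inside the acquire call instead of returning `False`.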

  21. Pros / Cons Interface Types RB / RN
      + Coarse grain synchronization with fine grain data transfer
        – Low synchronization overhead with small local buffers
      + Out-of-order data accesses
        – Reduce cost of private variables
      + Load only subset of tokens from channel
        – Reduce cost of data transfers
      – Less abstract than CB
        – Increases programming effort
        – Makes tasks less reusable
      – Inefficiencies upon data transfers
        – Function call, access to channel admin, address calculations
        – Copying may still occur

  22. Interface Types DBI and DNI
      Producer side
      • acquireRoom(port, &token) (DBI)
      • tryAcquireRoom(port, &token) (DNI)
        – Acquire empty token, blocking (DBI) / non-blocking (DNI)
      • token->field = value;
        – Assign value to (part of) token
      • releaseData(port)
        – Release oldest acquired token as full token
      • Separate synchronization and data transfer
      • Direct access to data via token references (pointers)
      • Scalar operations only
      • Tokens are released in same order as they are acquired
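The direct, in-order style can be sketched in the same way. The Python model below is an illustration under stated assumptions rather than the TTL implementation: `DirectChannel`, `Token` and `demo` are hypothetical names, a Python object reference stands in for the token pointer, and the consumer-side operations (`try_acquire_data`, `release_room`) are assumed by symmetry because this slide lists only producer functions. For brevity the consumer side does not track multiple outstanding acquisitions.

```python
class Token:
    """A token with one payload field (the field name is a placeholder)."""

    def __init__(self):
        self.field = None


class DirectChannel:
    """Single-threaded model of in-order direct access (DNI-style)."""

    def __init__(self, capacity):
        self.tokens = [Token() for _ in range(capacity)]
        self.capacity = capacity
        self.head = 0         # oldest token acquired (or next to be acquired) by the producer
        self.tail = 0         # oldest full token not yet released by the consumer
        self.acquired = 0     # tokens acquired by the producer, not yet released
        self.full = 0         # full tokens visible to the consumer

    def try_acquire_room(self):
        """Return a reference to the next empty token, or None if none is free."""
        if self.full + self.acquired >= self.capacity:
            return None
        token = self.tokens[(self.head + self.acquired) % self.capacity]
        self.acquired += 1
        return token

    def release_data(self):
        """Release the oldest acquired token as a full token."""
        assert self.acquired > 0, "no acquired token to release"
        self.head = (self.head + 1) % self.capacity
        self.acquired -= 1
        self.full += 1

    def try_acquire_data(self):
        """Return a reference to the oldest full token, or None if there is none."""
        if self.full == 0:
            return None
        return self.tokens[self.tail]

    def release_room(self):
        """Release the oldest full token as an empty token."""
        assert self.full > 0, "no full token to release"
        self.tail = (self.tail + 1) % self.capacity
        self.full -= 1


def demo():
    ch = DirectChannel(capacity=2)
    first = ch.try_acquire_room()
    second = ch.try_acquire_room()
    third = ch.try_acquire_room()         # None: both tokens are already acquired
    first.field = 10                      # write directly into channel storage
    second.field = 20
    ch.release_data()                     # releases `first`
    ch.release_data()                     # releases `second`
    received = []
    while (token := ch.try_acquire_data()) is not None:
        received.append(token.field)      # read in place, no copy into a vector
        ch.release_room()
    return third, received
```

Because the task works on the token through a reference, no `store` or `load` call is needed and no value is copied into a private vector. Each operation handles one token, and tokens are released in the order in which they were acquired.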
