
HiPEAC'11, Heraklion, Crete - DDM-VMc: The Data-Driven Multithreading Virtual Machine for the Cell Processor

HiPEAC'11, Heraklion, Crete. DDM-VMc: The Data-Driven Multithreading Virtual Machine for the Cell Processor. Samer Arandi and Skevos (Paraskevas) Evripidou, University of Cyprus, Computer Science Department.


  1. HiPEAC'11, Heraklion, Crete. DDM-VMc: The Data-Driven Multithreading Virtual Machine for the Cell Processor. Samer Arandi and Skevos (Paraskevas) Evripidou, University of Cyprus, Computer Science Department.

  2. Outline
     - Motivation
     - Data Driven Multithreading
     - The DDM-VMc
     - Programming Toolchain
     - Evaluation
     - Conclusion

  4. Motivation
     - The adoption of multicore architectures ushered in the "Concurrency Era", which gave rise to new challenges:
       - Traditional programming models do not allow efficient utilization of the large multicore resources.
       - Heterogeneous multicores (motivated by more power- and area-efficient designs) make this task even more complex.
       - Multicores still suffer from the effects of the Memory Wall.
     - One technique to combat the memory wall is to use explicitly managed on-chip local memories (scratchpads).
       - This offers great opportunities for optimization but burdens the programmer with managing the memory hierarchy.

  5. Our Take:
     - Instead of extending sequential models with concurrency constructs, which is mostly an ad hoc solution,
     - revisit alternative models that are inherently parallel and offer distributed concurrency, i.e. Dataflow.
     Our Goal:
     - Exploit data-flow concurrency on commercial multicores with performance as good as or better than that of similar systems.

  7. Data Driven Multithreading (DDM)
     - Execution model that combines:
       - Distributed data-flow concurrency for scheduling threads
       - Efficient sequential execution within a thread
     - Decouples synchronization from computation
     - Non-blocking: threads execute to completion
     - The core of DDM is the Thread Scheduling Unit (TSU)
       - Holds the metadata of the threads (dependency graph)
       - Uses the graph to schedule threads dynamically at runtime based on data availability
     - CacheFlow: data-driven prefetching drastically improves the cache hit ratio and requires much smaller caches
       - The Ready Queue (RQ) gives the near-future execution pattern.
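The CacheFlow bullet can be illustrated with a toy model. The sketch below is not the authors' CacheFlow implementation; it only shows why knowing the next entries of the Ready Queue lets a small cache serve most accesses. The `TinyCache` class, the `run` function, the `(thread_id, data_block)` tuples and the one-block-per-thread assumption are all hypothetical simplifications introduced here.

```python
from collections import OrderedDict, deque


class TinyCache:
    """A tiny LRU cache that counts demand hits and misses only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _fill(self, block):
        # Insert (or refresh) a block and evict the least recently used one.
        self.lines[block] = True
        self.lines.move_to_end(block)
        while len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def prefetch(self, block):
        # Bring a block in ahead of time; not counted as a hit or a miss.
        self._fill(block)

    def access(self, block):
        # Demand access made by the running thread.
        if block in self.lines:
            self.hits += 1
        else:
            self.misses += 1
        self._fill(block)


def run(ready_queue, capacity, lookahead):
    """Run threads in RQ order; while one runs, prefetch for the next ones.

    Assumption: each RQ entry is (thread_id, data_block) and a thread
    touches exactly one data block.
    """
    cache = TinyCache(capacity)
    rq = deque(ready_queue)
    while rq:
        _, block = rq.popleft()
        cache.access(block)
        # The RQ exposes the near-future execution pattern, so the data of
        # the upcoming threads can be fetched while this thread executes.
        for _, next_block in list(rq)[:lookahead]:
            cache.prefetch(next_block)
    return cache.hits, cache.misses


# Six hypothetical threads, each working on its own data block.
rq_entries = [(tid, "block%d" % tid) for tid in range(1, 7)]
no_prefetch = run(rq_entries, capacity=2, lookahead=0)
with_prefetch = run(rq_entries, capacity=2, lookahead=1)
```

In this toy setting a two-line cache misses on every access without lookahead, while a lookahead of one leaves only the cold miss of the first thread. Real workloads touch many blocks per thread, so the effect there depends on the access pattern.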

  8. Data Driven Multithreading (DDM) - Projects
     - Data-Driven Network of Workstations (D2Now)
       - A simulated cluster of distributed machines augmented with a hardware Thread Scheduling Unit
       - Explored CacheFlow optimizations and showed that data-driven scheduling could generally improve locality
     - ThreadFlux (TFlux)
       - Developed a portable software platform that runs on a variety of commercial multicore systems
       - The first full-system simulation of a DDM machine
       - TFlux Pragmas: data-flow-specific directives
     - Data Driven Multithreading Virtual Machine (DDM-VM)
       - A virtual machine that supports DDM execution on homogeneous and heterogeneous multicores

  9. Data-Driven Multithreading Execution: DDM PE with Hardware TSU
     [Figure, repeated on slides 9 to 16. Labels: PC Motherboard, Processor, L1 Cache, L2 Cache, Memory, Thread Synchronization Unit (TSU), Snooping Unit, Graph Memory (GM), Synchronization Memory (SM), Ready Queue (RQ), Ack. Queue (AQ), and a Threads Dependency Graph with nodes 31, 32, 33, 34 and 36.]
     SM and GM contents shown in the figure:

     Thread  Ready Counts (SM)  Con1  Con2  IFP   DFP
     0031    1                  0033  0034  0100  3A00
     0032    1 1 1 1            0033  0036  0108  3A00
     0033    3 3 3 3            0034  0000  011C  3A00
     0034    2 2 2 2            0032  0033  0122  3A00

  10. Data-Driven Multithreading Execution
      The GM contains the IFP, the DFP and the two consumers (Con1 and Con2) of each thread.

  11. Data-Driven Multithreading Execution
      The SM contains the Ready Counts, one value for each loop iteration.

  12. Data-Driven Multithreading Execution
      The processor reads the pointers (IFP, DFP and index) of ready threads from the RQ and executes them.

  13. Data-Driven Multithreading Execution
      After executing a thread, the processor stores information about it (Thread#, index and status) in the AQ.

  14. Data-Driven Multithreading Execution
      The TSU determines the consumers of completed threads from the GM.

  15. Data-Driven Multithreading Execution
      The TSU updates the SM and checks whether any of the consumers is ready (Ready Count = 0).

  16. Data-Driven Multithreading Execution
      The TSU loads into the RQ the pointers (IFP, DFP) of the ready thread from the GM and the index from the SM.
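The scheduling cycle walked through on slides 9 to 16 can be sketched in software. This is a minimal illustration under stated assumptions, not the hardware TSU or the DDM-VM code: it keeps one Ready Count per thread (the slides keep one per loop iteration and also carry an index), treats the IFP as a key into a table of Python callables, and uses a hypothetical four-thread graph rather than the one in the figure. The names `GMEntry`, `TSU` and `run` are invented for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class GMEntry:
    """One Graph Memory row: code and data pointers plus the consumers."""
    ifp: int          # stands in for the IFP column (here: key of a callable)
    dfp: int          # stands in for the DFP column (passed to the callable)
    consumers: tuple  # ids of threads that depend on this one (Con1, Con2)


class TSU:
    def __init__(self, gm, ready_counts):
        self.gm = gm                  # Graph Memory: thread id -> GMEntry
        self.sm = dict(ready_counts)  # Synchronization Memory: id -> Ready Count
        self.rq = deque()             # Ready Queue
        self.aq = deque()             # Ack. Queue
        for tid, count in self.sm.items():
            if count == 0:            # threads with no producers start ready
                self._enqueue(tid)

    def _enqueue(self, tid):
        # Slide 16: load the pointers of a ready thread from the GM into the RQ.
        entry = self.gm[tid]
        self.rq.append((tid, entry.ifp, entry.dfp))

    def process_acks(self):
        while self.aq:
            done = self.aq.popleft()
            # Slide 14: look up the consumers of the completed thread in the GM.
            for consumer in self.gm[done].consumers:
                # Slide 15: update the SM and test for Ready Count = 0.
                self.sm[consumer] -= 1
                if self.sm[consumer] == 0:
                    self._enqueue(consumer)


def run(tsu, bodies):
    """Processor side: execute ready threads to completion, then acknowledge."""
    order = []
    while tsu.rq:
        tid, ifp, dfp = tsu.rq.popleft()  # slide 12: read pointers from the RQ
        bodies[ifp](dfp)                  # non-blocking: runs to completion
        order.append(tid)
        tsu.aq.append(tid)                # slide 13: report completion in the AQ
        tsu.process_acks()
    return order


# Hypothetical graph: thread 1 feeds 2 and 3, which both feed 4.
gm = {
    1: GMEntry(0x100, 0x3A00, (2, 3)),
    2: GMEntry(0x108, 0x3A00, (4,)),
    3: GMEntry(0x10C, 0x3A00, (4,)),
    4: GMEntry(0x110, 0x3A00, ()),
}
ready_counts = {1: 0, 2: 1, 3: 1, 4: 2}
trace = []
bodies = {entry.ifp: (lambda dfp, t=tid: trace.append(t)) for tid, entry in gm.items()}
order = run(TSU(gm, ready_counts), bodies)
```

Thread 4 is enqueued only after both of its producers have been acknowledged, which is the decoupling of synchronization from computation that the slides describe: the thread bodies contain no synchronization code at all.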

  18. The Data-Driven Virtual Machine (DDM-VM)
      - The DDM-VM is a virtual machine that supports DDM execution on homogeneous and heterogeneous multicores.
      [Figure: two variants side by side, as far as the labels can be recovered. The DDM-VMs: Core 1 to Core n, each with a DDM-VMs runtime and DDM thread execution, with TSU + CacheFlow, a cache hierarchy and a bus. The DDM-VMc: a PPE (PPU) with the DDM-VMc PPE runtime and TSU + S-CacheFlow, and SPE 1 to SPE 8 (SPUs), each with a DDM-VMc SPE runtime, DDM thread execution and an LS, connected by a bus. In both variants: main memory holding the TSU memory structures and program data, I/O, and a network link to other nodes.]

  19. The Data-Driven Virtual Machine (DDM-VM)
      Same slide, with an additional TSU label on each variant.
