Many-Core Scheduling of Data Parallel Applications using SMT Solvers

Many-Core Scheduling of Data Parallel Applications using SMT Solvers. Pranav Tendulkar, Peter Poplavko, Ioannis Galanommatis, Oded Maler (Verimag, France). Sections: Motivation, Application Model, Hardware Platform, Scheduling, Experiments, Conclusions.


  1. Many-Core Scheduling of Data Parallel Applications using SMT Solvers. Pranav Tendulkar, Peter Poplavko, Ioannis Galanommatis, Oded Maler. Verimag, France. August 2014. Running footer: Tendulkar, Poplavko, Galanommatis, Maler, "Mapping/scheduling for many-core" (26 slides).

  2. Multi-core Processors Everywhere: tablets, laptops, space shuttle, phones, cameras, cars, smart TVs.

  3. Context
     - The number of mapping and scheduling solutions is exponential.
     - Many-core platforms involve extra complexity factors.
     - Explicit modeling of network communication is necessary.
     - Orchestration of processor and network resources is non-trivial.
     - [Figure: number of solutions (axis scale 1e14) plotted against the number of processors (0 to 300) and the number of tasks (1 to 6).]
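
The growth can be illustrated with a rough sketch. It assumes that each of T tasks may be mapped independently to any of P processors, which gives P^T mappings; the slide does not state its counting rule, so this rule and the name `mapping_count` are assumptions made for illustration.

```python
def mapping_count(processors: int, tasks: int) -> int:
    """Number of ways to assign each task to one processor.

    Assumed counting rule: every task can go on any processor,
    independently of the others, so the count is processors ** tasks.
    """
    return processors ** tasks


# Mapping counts for 300 processors and 1 to 6 tasks.
for tasks in range(1, 7):
    print(tasks, mapping_count(300, tasks))
```

Under this assumption, 300 processors and 6 tasks give 300^6 = 7.29e14 mappings, the same order of magnitude as the 1e14 scale on the plot. Scheduling (ordering) choices would only increase the count.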

  4. Design Problems. How to:
     - Maximize the performance of the application
     - Optimally utilize memory resources
     - Orchestrate shared resources such as processors, DMA, etc.
     - Load-balance the processors
     - Minimize communication costs
     - Schedule tasks in parallel while sharing limited resources

  5. Outline: (1) Motivation, (2) Application Model, (3) Hardware Platform, (4) Scheduling, (5) Experiments, (6) Conclusions.

  9. Model of Computation
     - Synchronous dataflow (SDF) graphs, introduced by E. Lee and D. Messerschmitt in 1987
     - Task graph + symbolic representation of data parallelism
     - Used for signal-processing and video-coding applications
     - A 'standard' in academic multicore compilers, e.g. the StreamIt compiler of MIT
     - We use split-join graphs: a restriction of SDF that still covers perhaps 90% of use cases
     - Reference: Pranav Tendulkar, Peter Poplavko, and Oded Maler. "Symmetry Breaking for Multi-criteria Mapping and Scheduling on Multicores". In: Formal Modeling and Analysis of Timed Systems. Lecture Notes in Computer Science. 2013.

  10. Split-Join Graphs: a simple split-join graph example (figure not reproduced). Edge labels:
     - α : spawn and split
     - 1/α : wait and join
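
The two edge labels can be read as multipliers on the number of task instances. The following is a minimal sketch under that reading: an edge with rate α means each producer instance spawns α consumer instances, and an edge with rate 1/α means one consumer instance waits for α producer instances. The function `instance_counts`, the three-task chain, and the value of `ALPHA` are hypothetical and not taken from the slides.

```python
from fractions import Fraction


def instance_counts(source, edges):
    """Propagate instance counts through a split-join graph.

    edges: list of (producer, consumer, rate) in topological order.
    rate = alpha    -> split: each producer instance spawns alpha consumers.
    rate = 1/alpha  -> join: one consumer waits for alpha producer instances.
    """
    counts = {source: Fraction(1)}
    for producer, consumer, rate in edges:
        expected = counts[producer] * rate
        if expected.denominator != 1:
            raise ValueError(f"join on {producer}->{consumer} does not divide evenly")
        # A task reached over several edges must get the same count on each.
        if counts.setdefault(consumer, expected) != expected:
            raise ValueError(f"inconsistent rates at {consumer}")
    return {task: int(n) for task, n in counts.items()}


ALPHA = 4  # hypothetical split factor
edges = [("A", "B", Fraction(ALPHA)), ("B", "C", Fraction(1, ALPHA))]
print(instance_counts("A", edges))  # {'A': 1, 'B': 4, 'C': 1}
```

Here task A splits into four parallel instances of B, which are then joined by a single instance of C.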

  13. Kalray MPPA-256: many-core platform = network of clusters. Efficient orchestration of network communication and cluster scheduling is non-trivial. [Figure: MPPA-256 block diagram. Recoverable labels: cluster with cores C0 to C15, shared memory, DMA, system core, DSU, D-NoC and C-NoC routers; quad-core I/O blocks with DDR, PCIe, Interlaken, Ethernet, GPIOs, and 512 KB memories.]

  14. Platform characteristics
     - 16 symmetric processors in a cluster
     - Shared memory within a cluster (2 MB)
     - 8 KB data cache per core (disabled)
     - Inter-cluster communication using DMA and NoC
     - NoC with toroidal 2D topology
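
On a toroidal 2D topology each row and column wraps around, so the shortest route between two clusters may use a wrap-around link. The sketch below computes the minimal hop count under assumptions that are not stated on the slide: a 4 x 4 grid of clusters, row-major cluster numbering, and hop count as the distance measure. The name `torus_hops` is hypothetical.

```python
def torus_hops(a, b, width=4, height=4):
    """Minimal hop count between clusters a and b on a width x height 2D torus.

    Clusters are assumed to be numbered row by row (row-major order).
    """
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    dx = abs(ax - bx)
    dy = abs(ay - by)
    # In each dimension, go the direct way or around the wrap-around link.
    return min(dx, width - dx) + min(dy, height - dy)


print(torus_hops(0, 3))   # 1: the row wraps around
print(torus_hops(0, 10))  # 4: two hops in each dimension
```

On a plain (non-toroidal) mesh the first distance would be 3 hops instead of 1, which is why the topology matters when estimating communication cost.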

  20. Design Flow: Partitioning
     - Problem inputs: application graph; hardware architecture model
     - Goals: load-balance the groups; minimize communication between groups
     - Output: application graph partitioned into groups (figure: example graph with tasks A to F)
     - Cost criteria (3D Pareto solutions): max workload per group, #groups, estimated comm. cost
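
The three cost criteria and the Pareto filtering can be sketched as follows. This is only an illustration: the workloads, the edges and traffic volumes between tasks A to F, and the candidate partitions are invented, and the candidates are enumerated by hand rather than produced by an SMT solver. The names `partition_cost` and `pareto_front` are hypothetical.

```python
def partition_cost(groups, workload, traffic):
    """Return (max workload per group, number of groups, inter-group traffic)."""
    group_of = {task: g for g, tasks in enumerate(groups) for task in tasks}
    max_load = max(sum(workload[t] for t in tasks) for tasks in groups)
    # Only edges whose endpoints sit in different groups cost communication.
    comm = sum(w for (src, dst), w in traffic.items()
               if group_of[src] != group_of[dst])
    return (max_load, len(groups), comm)


def pareto_front(costs):
    """Keep the cost vectors not dominated by another (all criteria minimized)."""
    def dominates(p, q):
        return p != q and all(x <= y for x, y in zip(p, q))
    return [c for c in costs if not any(dominates(o, c) for o in costs)]


# Hypothetical workloads and edge traffic for tasks A..F.
workload = {"A": 2, "B": 4, "C": 3, "D": 3, "E": 4, "F": 2}
traffic = {("A", "B"): 1, ("B", "C"): 2, ("B", "D"): 2,
           ("C", "E"): 2, ("D", "E"): 2, ("E", "F"): 1}
candidates = [
    [["A", "B", "C", "D", "E", "F"]],
    [["A", "B", "C"], ["D", "E", "F"]],
    [["A", "B"], ["C", "D"], ["E", "F"]],
    [["A", "F"], ["B", "E"], ["C", "D"]],
]
costs = [partition_cost(g, workload, traffic) for g in candidates]
print(pareto_front(costs))  # [(18, 1, 0), (9, 2, 4), (6, 3, 8)]
```

The last candidate, with cost (8, 3, 10), is dropped because the third one is at least as good on every criterion. The remaining three show the trade-off: fewer groups mean less communication but a higher maximum workload.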

  23. Design Flow: Placement
     - Flow so far: application graph, then partitioning (3D Pareto solutions over max workload per group, #groups, estimated comm. cost), then placement (solution with minimal communication cost)
     - Output: group-to-platform-cluster assignment (figure: example graph with tasks A to F)
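
A minimal sketch of placement as an optimization problem is given below. It uses exhaustive search in place of a solver-based formulation, so it is not the authors' method. The cost model (traffic volume times hop count on a 4 x 4 torus with row-major cluster numbering), the traffic values, and the names `torus_hops` and `place` are all assumptions made for illustration.

```python
from itertools import permutations


def torus_hops(a, b, width=4, height=4):
    """Minimal hop count between clusters a and b on a 2D torus (row-major ids)."""
    dx = abs(a % width - b % width)
    dy = abs(a // width - b // width)
    return min(dx, width - dx) + min(dy, height - dy)


def place(group_traffic, n_groups, n_clusters=16):
    """Try every injective group-to-cluster assignment; return (cost, clusters).

    clusters[g] is the cluster chosen for group g.
    Assumed cost: sum over group pairs of traffic * hop distance.
    """
    best = None
    for clusters in permutations(range(n_clusters), n_groups):
        cost = sum(w * torus_hops(clusters[g], clusters[h])
                   for (g, h), w in group_traffic.items())
        if best is None or cost < best[0]:
            best = (cost, clusters)
    return best


# Hypothetical traffic between three groups: 0 <-> 1 and 1 <-> 2 communicate.
group_traffic = {(0, 1): 4, (1, 2): 4}
cost, clusters = place(group_traffic, n_groups=3)
print(cost)  # 8: both communicating pairs end up one hop apart
```

Exhaustive search is only workable for a handful of groups, since the number of assignments grows factorially; this is the kind of search that a constraint or SMT solver is meant to handle more efficiently.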
