Data-Centric Execution of Speculative Parallel Programs


Mark Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, Daniel Sanchez. MICRO 2016.


  1. Data-Centric Execution of Speculative Parallel Programs
     Mark Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, Daniel Sanchez
     MICRO 2016

  2. Executive summary
     Many-cores must exploit cache locality to scale
     Current speculative systems, e.g. TLS or TM, do not exploit locality
     Spatial Hints: run tasks likely to access the same data in the same place
     ◦ A software-given hint denotes the data a new task is likely to access
     ◦ Hardware maps tasks with the same hint to the same place
     ◦ Hardware uses hints to perform locality-aware load balancing
     Our techniques make speculative parallelism practical at large scale
     ◦ It is easy to modify programs to convey locality through hints
     ◦ Performance improves by 3.3x at 256 cores
     ◦ We reduce network traffic by 6.4x and wasted work by 3.5x
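The second Spatial Hints bullet (tasks with the same hint go to the same place) can be pictured with a small software sketch. Everything here is an assumption made for illustration: the slides do not say how the hardware turns a hint into a location, so the hash (the SplitMix64 finalizer), the helper names `mix_hint` and `tile_for_hint`, and the modulo reduction are all hypothetical. Locality-aware load balancing is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit mixer (the SplitMix64 finalizer). The slides do not say
// which hash, if any, the hardware uses; this choice is an assumption.
inline std::uint64_t mix_hint(std::uint64_t hint) {
    hint = (hint ^ (hint >> 30)) * 0xbf58476d1ce4e5b9ULL;
    hint = (hint ^ (hint >> 27)) * 0x94d049bb133111ebULL;
    return hint ^ (hint >> 31);
}

// Hypothetical helper: picks the tile that should run a task carrying this
// hint. Equal hints always yield the same tile; distinct hints may collide.
inline std::uint32_t tile_for_hint(std::uint64_t hint, std::uint32_t num_tiles) {
    assert(num_tiles > 0);
    return static_cast<std::uint32_t>(mix_hint(hint) % num_tiles);
}
```

Under these assumptions, two tasks that pass the same hint (for example, the address of the object they expect to touch) are always sent to the same tile, so that object's data is likely to stay in one tile's caches.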

  3-4. Prior speculative systems scale poorly
     Transactional memory (TM) schedulers
     ◦ Reduce wasted work of coarse-grain txns
     ◦ Limit concurrency: When to run a task?
     Spatial Hints
     ◦ Make accesses local for fine-grain tasks
     ◦ Less data movement: Where to run a task?
     Spatially map tasks for improved locality and less waste

  5. Prior non-speculative locality techniques do not work for speculation
     Static task mapping
     Data dependences known a priori
     ◦ Linear algebra, Anton 2 [ASPLOS ‘13]
     ◦ Limited application types
     Graph partitioning
     ◦ Localizes communication and scheduling
     ◦ Slow preprocessing step
     ◦ Cannot adapt to imbalance
     Dynamic task mapping
     Work stealing
     ◦ Cheap, local enqueues
     ◦ Steals to adapt to imbalance
     ◦ Stealing interferes with speculation

  6. Baseline Architecture: Swarm [MICRO ‘15]

  7. Baseline Swarm execution model
     Programs consist of timestamped tasks
     ◦ Tasks can create children tasks with >= timestamp
     ◦ Tasks appear to execute in timestamp order
     swarm::enqueue(function_pointer, timestamp, arguments...);
     General execution model supports ordered and unordered parallelism
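The two rules on this slide (children carry a timestamp no smaller than their parent's, and tasks appear to run in timestamp order) can be captured in a sequential software model. This is only a sketch of the semantics, not the Swarm API or its hardware: the class `TaskModel`, its method signatures, and the first-in-first-out tie-break among equal timestamps are all assumptions introduced here. Real Swarm hardware runs tasks speculatively and out of order; this model simply runs them one at a time in the order they must appear to execute.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sequential model of timestamp-ordered tasks.
class TaskModel {
public:
    using Timestamp = std::uint64_t;
    using Body = std::function<void(TaskModel&, Timestamp)>;

    // Queue a task. A task created by a running task must not carry a
    // timestamp smaller than that of its parent.
    void enqueue(Body body, Timestamp ts) {
        assert(ts >= now_ && "child timestamp must be >= parent timestamp");
        pending_.push(Task{ts, next_seq_++, std::move(body)});
    }

    // Run all tasks, lowest timestamp first, until none remain.
    void run() {
        while (!pending_.empty()) {
            Task task = pending_.top();
            pending_.pop();
            now_ = task.ts;
            task.body(*this, task.ts);
        }
    }

private:
    struct Task {
        Timestamp ts;
        std::uint64_t seq;  // creation order, used only to break ties
        Body body;
    };
    struct Later {
        bool operator()(const Task& a, const Task& b) const {
            return a.ts != b.ts ? a.ts > b.ts : a.seq > b.seq;
        }
    };
    std::priority_queue<Task, std::vector<Task>, Later> pending_;
    Timestamp now_ = 0;
    std::uint64_t next_seq_ = 0;
};
```

A task body receives the model and its own timestamp, so it can create children with `model.enqueue(child_body, ts + delay)`.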

  8. Baseline Swarm architecture
     Speculatively executes tasks out of order
     ◦ Large hardware task queues
     ◦ Scalable ordered speculation
     ◦ Scalable ordered commits
     Efficiently supports tiny speculative tasks
     [Figure: 64-tile, 256-core chip with Mem / IO at the edges. Tile organization: four cores, each with L1I/D; shared L2; L3 slice; router; task unit.]

  9. Spatial Hints in Action: Combining speculation and locality

  10-31. Example: Discrete event simulation (DES)
     [Animated figure built up over 22 frames: a circuit with inputs r and s, nodes labeled A to E, and output t = r XOR s.]
     Truth table (r s t): 0 0 0, 1 0 1, 0 1 1, 1 1 0
     Tasks shown in the final frame, as extracted: s=1, C1=1, r=1, A=1, C0=0, B=1, D1=0, E1=0, t=0, D0=1, E1=1, t=1
     Axis: Order = Simulated time (ns), ticks 0 to 6
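The idea behind the DES example, where each task is a signal change and task order is simulated time, can be sketched for a much smaller circuit than the one on the slides. The code below simulates a single XOR gate (t = r XOR s) rather than the A to E circuit, whose wiring cannot be recovered from the extracted text. The names `Wire`, `Event`, and `simulate_xor`, the fixed gate delay, and the creation-order tie-break are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

enum Wire { WIRE_R = 0, WIRE_S = 1, WIRE_T = 2 };

// One task: "set this wire to this value at this simulated time".
struct Event {
    std::uint64_t time_ns;
    Wire wire;
    bool value;
};

// Simulates one XOR gate with a fixed delay (an assumed parameter) and
// returns the changes observed on the output wire t, in time order.
inline std::vector<Event> simulate_xor(const std::vector<Event>& inputs,
                                       std::uint64_t gate_delay_ns) {
    assert(gate_delay_ns > 0);
    struct Pending { Event ev; std::uint64_t seq; };
    auto later = [](const Pending& a, const Pending& b) {
        return a.ev.time_ns != b.ev.time_ns ? a.ev.time_ns > b.ev.time_ns
                                            : a.seq > b.seq;
    };
    std::priority_queue<Pending, std::vector<Pending>, decltype(later)> pending(later);
    std::uint64_t next_seq = 0;
    for (const Event& ev : inputs) pending.push(Pending{ev, next_seq++});

    bool wires[3] = {false, false, false};  // all wires start at 0
    std::vector<Event> output_changes;
    while (!pending.empty()) {
        Event ev = pending.top().ev;  // earliest simulated time first
        pending.pop();
        if (wires[ev.wire] == ev.value) continue;  // no toggle, nothing to do
        wires[ev.wire] = ev.value;
        if (ev.wire == WIRE_T) {
            output_changes.push_back(ev);
            continue;
        }
        // An input toggled: schedule a child event on the output, later in time.
        bool t = wires[WIRE_R] != wires[WIRE_S];
        pending.push(Pending{Event{ev.time_ns + gate_delay_ns, WIRE_T, t}, next_seq++});
    }
    return output_changes;
}
```

With r rising at 0 ns, s rising at 3 ns, and a 1 ns delay, this model reports t rising and then falling again, which matches the t=1 and t=0 tasks in the slide's final frame in spirit, though the times here come from the assumed delay rather than from the slides.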
