SAM: Optimizing Multithreaded Cores for Speculative Parallelism - PowerPoint PPT Presentation

Maleen Abeydeera, Suvinay Subramanian, Mark Jeffrey, Joel Emer, Daniel Sanchez. PACT 2017.


  1. SAM: Optimizing Multithreaded Cores for Speculative Parallelism
     Maleen Abeydeera, Suvinay Subramanian, Mark Jeffrey, Joel Emer, Daniel Sanchez
     PACT 2017

  2. Executive Summary
     - Analyzes the interplay between hardware multithreading and speculative parallelism (e.g., Thread-Level Speculation and Transactional Memory).
     - Conventional multithreading causes performance pathologies on speculative workloads:
       - Increase in aborted work
       - Inefficient use of speculation resources
     - Why? All threads are treated equally.
     - Speculation-Aware Multithreading (SAM): prioritize threads running tasks more likely to commit.
     - SAM makes multithreading more useful.

  4. Outline
     - Background on speculative parallelism
     - Pitfalls of speculative parallelism with conventional multithreading
     - SAM on in-order cores
     - SAM on out-of-order cores

  7. Background on Speculative Parallelism
     - Parallelize tasks when the dependences are not known in advance.
     - Hardware executes all tasks in parallel, aborting upon conflicts.
     - Which task to abort? This is decided by the conflict resolution policy.
     - Speculative parallelism comes in two forms:
       - Ordered, e.g., Thread-Level Speculation (TLS): program order dictates the conflict resolution order.
       - Unordered, e.g., Hardware Transactional Memory: any execution order is valid, but high-performance conflict resolution policies define an order.
     - There is an implicit order among all tasks in any speculative system.

  10. Baseline System - Swarm [Jeffrey, MICRO'15]

          // Discrete event simulation task; each task carries a timestamp.
          void desTask(Timestamp ts, GateInput* input) {
              Gate* g = input->gate();
              bool toggledOutput = g->simulateToggle(input);
              if (toggledOutput) {
                  for (GateInput* i : g->connectedInputs()) {
                      // Tasks create children tasks: (function ptr, timestamp, args)
                      swarm::enqueue(desTask, ts + delay(g, i), i);
                  }
              }
          }

      - Tasks are timestamped.
      - Tasks create children tasks, passing (function ptr, timestamp, args).
      - Tasks appear to execute in timestamp order.
      - Unordered execution is expressed via equal timestamps.
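The timestamp-order semantics on the Swarm slide above can be illustrated with a small sequential model. This is only a sketch under stated assumptions: the `toy` namespace, `Scheduler`, and `demoOrder` are hypothetical names invented for illustration, not part of the Swarm API, and the model runs tasks one at a time rather than speculatively in parallel. It shows the order that a speculative execution must appear to follow.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

namespace toy {  // hypothetical namespace, not the real Swarm runtime

using Timestamp = std::uint64_t;

struct Task {
    Timestamp ts;
    std::uint64_t seq;                  // insertion order, used only to break ties
    std::function<void(Timestamp)> fn;
};

// Orders the priority queue so the lowest (ts, seq) is on top.
struct Later {
    bool operator()(const Task& a, const Task& b) const {
        if (a.ts != b.ts) return a.ts > b.ts;
        return a.seq > b.seq;
    }
};

class Scheduler {
public:
    void enqueue(std::function<void(Timestamp)> fn, Timestamp ts) {
        queue_.push(Task{ts, nextSeq_++, std::move(fn)});
    }

    // Runs tasks one at a time, lowest timestamp first.
    // Returns the timestamps in the order the tasks ran.
    std::vector<Timestamp> run() {
        std::vector<Timestamp> order;
        while (!queue_.empty()) {
            Task t = queue_.top();
            queue_.pop();
            order.push_back(t.ts);
            t.fn(t.ts);  // a task may enqueue children with later timestamps
        }
        return order;
    }

private:
    std::priority_queue<Task, std::vector<Task>, Later> queue_;
    std::uint64_t nextSeq_ = 0;
};

// Two root tasks (timestamps 5 and 1); every task with ts < 6
// creates one child at ts + 4.
inline std::vector<Timestamp> demoOrder() {
    Scheduler s;
    std::function<void(Timestamp)> task;
    task = [&s, &task](Timestamp ts) {
        if (ts < 6) s.enqueue(task, ts + 4);
    };
    s.enqueue(task, 5);
    s.enqueue(task, 1);
    return s.run();
}

}  // namespace toy
```

Calling `toy::demoOrder()` runs the root at timestamp 1 before the root at timestamp 5, even though it was enqueued second. A speculative system may execute these tasks in any order internally, as long as the committed result matches this one.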

  13. Swarm Microarchitecture
      - Equal timestamps: global order via Virtual Time (VT), where Virtual Time = (Timestamp, Tiebreaker).
      - [Figure: 16-tile, 64-core CMP and tile organization. Each tile contains four cores with L1I/D caches, an L2, an L3 slice, a router, and a task unit; Mem / IO blocks surround the tiles.]
      - Tasks execute out of order, but commit in VT order.
      - Commit queue: holds the state of tasks waiting to commit.
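The (timestamp, tiebreaker) pair on the slide above amounts to a lexicographic comparison. The sketch below illustrates that idea only: the struct layout, field widths, and the `commitOrder` helper are assumptions made for this example, and the slides do not say how tiebreakers are assigned.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

struct VirtualTime {
    std::uint64_t timestamp;   // assigned by the program
    std::uint64_t tiebreaker;  // assumed unique per task; assignment not modeled here
};

// Lexicographic order: compare timestamps first, then tiebreakers.
inline bool operator<(const VirtualTime& a, const VirtualTime& b) {
    return std::tie(a.timestamp, a.tiebreaker) <
           std::tie(b.timestamp, b.tiebreaker);
}

// Hypothetical helper: given the virtual times of finished tasks,
// return them in the order they would be allowed to commit.
inline std::vector<VirtualTime> commitOrder(std::vector<VirtualTime> finished) {
    std::sort(finished.begin(), finished.end());
    return finished;
}
```

With this ordering, two tasks that share a timestamp still have a well-defined relative order, which gives a total order even among tasks the program left unordered.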

  20. Pitfalls of Speculation-Oblivious Multithreading
      - System configuration:
        - 64-core SMT system
        - In-order core with 2-wide issue
        - Speculation-oblivious round-robin order
      - [Figure legend: micro-ops issued from committed tasks; micro-ops issued from aborted tasks; resource stalls; no ready micro-ops to issue.]
      - Insights:
        1. Multithreading can be highly beneficial.
        However, multithreading can also lead to:
        2. Increased aborts
        3. Inefficient use of speculation resources
      - Unlikely-to-commit tasks hurt the throughput of likely-to-commit ones.

  21. Speculation-Aware Multithreading
      - Prioritize threads according to their conflict resolution priorities.
      - Reduce aborts (focus resources on tasks likely to commit).
      - Reduce speculation resource stalls (tasks commit early).
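To make the contrast with round-robin concrete, here is a minimal sketch of the two thread-selection policies. It assumes that a lower virtual time means a higher conflict resolution priority (consistent with tasks committing in VT order), and it models only which thread is chosen in a cycle. `ThreadState`, `pickRoundRobin`, and `pickSpeculationAware` are hypothetical names for illustration, not the hardware design from the talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

struct ThreadState {
    bool ready;                // thread has a micro-op ready to issue this cycle
    std::uint64_t timestamp;   // virtual time of the task the thread is running
    std::uint64_t tiebreaker;
};

// Speculation-oblivious policy: next ready thread after the last one served.
inline std::optional<std::size_t> pickRoundRobin(
        const std::vector<ThreadState>& threads, std::size_t last) {
    const std::size_t n = threads.size();
    for (std::size_t step = 1; step <= n; ++step) {
        const std::size_t i = (last + step) % n;
        if (threads[i].ready) return i;
    }
    return std::nullopt;  // no ready micro-ops to issue
}

// Speculation-aware policy (assumed): the ready thread whose task has the
// lowest virtual time, i.e. the task most likely to commit.
inline std::optional<std::size_t> pickSpeculationAware(
        const std::vector<ThreadState>& threads) {
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < threads.size(); ++i) {
        if (!threads[i].ready) continue;
        if (!best ||
            std::tie(threads[i].timestamp, threads[i].tiebreaker) <
            std::tie(threads[*best].timestamp, threads[*best].tiebreaker)) {
            best = i;
        }
    }
    return best;
}
```

Round-robin ignores the virtual times entirely, so a thread running a late task gets the same share of issue slots as one running the earliest task. The speculation-aware variant gives the slot to the earliest task whenever that task has work ready.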

  25. SAM on in-order cores
      - [Figure: in-order SMT pipeline with Fetch, Decode, thread micro-op queues, SMT issue, register files, and two pipes. Pipe 0: Int ALU, FP ALU. Pipe 1: Int ALU, Mem/DCache.]
      - The conflict resolution unit supplies task priority updates (virtual times).
