
SAM: Optimizing Multithreaded Cores for Speculative Parallelism

Maleen Abeydeera, Suvinay Subramanian, Mark Jeffrey, Joel Emer, Daniel Sanchez. PACT 2017.


  1. SAM: Optimizing Multithreaded Cores for Speculative Parallelism
     Maleen Abeydeera, Suvinay Subramanian, Mark Jeffrey, Joel Emer, Daniel Sanchez
     PACT 2017

  2. Executive Summary
     Analyzes the interplay between hardware multithreading and speculative parallelism (e.g., Thread-Level Speculation and Transactional Memory).
     Conventional multithreading causes performance pathologies on speculative workloads:
     • Increase in aborted work
     • Inefficient use of speculation resources
     Why? All threads are treated equally.
     Speculation-Aware Multithreading (SAM):
     • Prioritize threads running tasks more likely to commit
     SAM makes multithreading more useful.

  3. Outline
     • Background on speculative parallelism
     • Pitfalls of speculative parallelism with conventional multithreading
     • SAM on in-order cores
     • SAM on out-of-order cores

  4. Background on Speculative Parallelism
     Parallelize tasks when the dependences are not known in advance.
     Hardware executes all tasks in parallel, aborting upon conflicts.
     Which task to abort? The conflict resolution policy decides.
     Speculative parallelism comes in two forms:
     • Ordered, e.g. Thread-Level Speculation (TLS): program order dictates the conflict resolution order.
     • Unordered, e.g. Hardware Transactional Memory: any execution order is valid, but high-performance conflict resolution policies define an order.
     There is an implicit order among all tasks in any speculative system.

  5. Baseline System - Swarm [Jeffrey, MICRO’15]
     Timestamped tasks:

         void desTask(Timestamp ts, GateInput* input) {
             Gate* g = input->gate();
             bool toggledOutput = g->simulateToggle(input);
             if (toggledOutput) {
                 for (GateInput* i : g->connectedInputs()) {
                     // Tasks create children tasks: (function ptr, timestamp, args)
                     swarm::enqueue(desTask, ts + delay(g, i), i);
                 }
             }
         }

     Tasks appear to execute in timestamp order.
     Unordered execution via equal timestamps.
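The task model on this slide can be illustrated with a small sequential sketch. It is not Swarm's implementation or API: `TaskQueue`, `enqueue`, and `runAll` are hypothetical names, and the sketch only models the guarantee that tasks appear to execute in timestamp order, without any speculation or parallelism. It assumes that a child's timestamp is never smaller than its parent's, as in the `ts + delay(g, i)` call on the slide.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sequential model of timestamp-ordered tasks (not Swarm itself).
using Timestamp = std::uint64_t;

class TaskQueue {
public:
    // A task receives the queue (so it can create children) and its timestamp.
    // Task arguments are assumed to be captured inside the callable.
    using TaskFn = std::function<void(TaskQueue&, Timestamp)>;

    void enqueue(TaskFn fn, Timestamp ts) {
        pending_.push(Task{ts, nextSeq_++, std::move(fn)});
    }

    // Runs every task in timestamp order; equal timestamps run in enqueue
    // order here, which is just one of the valid orders for unordered tasks.
    // Returns the timestamps in the order the tasks ran.
    std::vector<Timestamp> runAll() {
        std::vector<Timestamp> order;
        while (!pending_.empty()) {
            Task t = pending_.top();
            pending_.pop();
            order.push_back(t.ts);
            t.fn(*this, t.ts);  // the task may enqueue children
        }
        return order;
    }

private:
    struct Task {
        Timestamp ts;
        std::uint64_t seq;  // enqueue sequence number, used only to break ties
        TaskFn fn;
    };
    struct Later {
        bool operator()(const Task& a, const Task& b) const {
            if (a.ts != b.ts) return a.ts > b.ts;
            return a.seq > b.seq;
        }
    };
    std::priority_queue<Task, std::vector<Task>, Later> pending_;
    std::uint64_t nextSeq_ = 0;
};
```

A speculative system such as the one described in the deck would run these tasks in parallel and abort on conflicts; the sketch only shows the order the results must be consistent with.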

  6. Swarm Microarchitecture
     16-tile, 64-core CMP.
     [Figure: chip and tile organization. Labels: Mem / IO, Router, L3 Slice, L2, four cores each with L1I/D, Task Unit.]
     Equal timestamps: global order via Virtual Time (VT), formed from the timestamp and a tiebreaker.
     Tasks execute out of order, but commit in VT order.
     Commit queue: holds the state of tasks waiting to commit.

  7. Outline (as on slide 3)

  8. Pitfalls of Speculation-Oblivious Multithreading
     System configuration:
     • 64-core SMT system
     • In-order cores with 2-wide issue
     • Speculation-oblivious round-robin order
     [Figure: issue slot breakdown. Legend: micro-ops issued from committed tasks, micro-ops issued from aborted tasks, resource stalls, no ready micro-ops to issue.]
     Insights:
     1. Multithreading can be highly beneficial.
     However, multithreading can also lead to:
     2. Increased aborts
     3. Inefficient use of speculation resources
     Unlikely-to-commit tasks hurt the throughput of likely-to-commit ones.

  9. Speculation-Aware Multithreading
     Prioritize threads according to their conflict resolution priorities:
     • Reduce aborts (focus resources on tasks likely to commit)
     • Reduce speculation resource stalls (tasks commit early)

  10. Outline (as on slide 3)

  11. SAM on in-order cores
      [Figure: SMT in-order pipeline. Labels: Fetch, Decode, thread micro-op queues, Issue, Register Files, Pipe 0 (Int ALU, FP ALU), Pipe 1 (Int ALU, Mem/DCache).]
      The task unit sends conflict resolution priority updates (virtual times) to the issue stage.
      The issue stage sorts the virtual times into SAM issue priorities (higher is better) and selects the ThreadID of the ready thread with the maximum priority.
      Values shown in the figure: virtual times 52:9, 52:7, 17:1, 95:4; SAM issue priorities 3, 2, 4, 1.
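The selection step on this slide can be sketched as follows. This is a minimal model, not the hardware design: `VirtualTime`, `ThreadState`, and `selectThreadsToIssue` are hypothetical names, the sketch issues at most one micro-op per thread per cycle, and it assumes that an earlier virtual time means a task is more likely to commit (since tasks commit in VT order) and therefore deserves higher issue priority.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Virtual time: timestamp plus tiebreaker, compared lexicographically.
struct VirtualTime {
    std::uint64_t timestamp;
    std::uint64_t tiebreaker;
};

inline bool earlier(const VirtualTime& a, const VirtualTime& b) {
    return std::tie(a.timestamp, a.tiebreaker) <
           std::tie(b.timestamp, b.tiebreaker);
}

// Hypothetical per-thread state visible to the issue stage.
struct ThreadState {
    int threadId;
    VirtualTime vt;  // virtual time of the task this thread is running
    bool ready;      // true if the thread has a micro-op ready this cycle
};

// Returns up to issueWidth thread IDs, earliest virtual time first.
// Threads without a ready micro-op are skipped.
std::vector<int> selectThreadsToIssue(std::vector<ThreadState> threads,
                                      std::size_t issueWidth) {
    threads.erase(std::remove_if(threads.begin(), threads.end(),
                                 [](const ThreadState& t) { return !t.ready; }),
                  threads.end());
    std::sort(threads.begin(), threads.end(),
              [](const ThreadState& a, const ThreadState& b) {
                  return earlier(a.vt, b.vt);
              });
    std::vector<int> chosen;
    for (std::size_t i = 0; i < threads.size() && i < issueWidth; ++i) {
        chosen.push_back(threads[i].threadId);
    }
    return chosen;
}
```

With the four virtual times from the figure and a 2-wide issue, this sketch picks the threads holding 17:1 and 52:7. If the thread holding 17:1 has nothing ready, the slots go to 52:7 and 52:9 instead, so a stalled high-priority thread does not leave issue slots idle.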

  12. Experimental Methodology
      Baseline system:
      • Swarm + Wait-N-GoTM [Jafri et al. ASPLOS’13] conflict resolution techniques
      • Cycle-accurate, event-driven, Pin-based simulator
      • Models systems of up to 64 cores
      • Cores: 2-wide issue, up to 8 threads per core
      Benchmarks:
      • Ordered: Swarm [Jeffrey et al. MICRO’15, MICRO’16], 8 benchmarks
      • Unordered: STAMP [Minh et al. IISWC’08], 8 benchmarks

  13. SAM makes multithreading more effective
      [Figure: performance on ordered and unordered benchmarks for 1 thread, 8-thread round robin, and 8-thread SAM.]
      8-threaded cores outperform single-threaded cores by 1.85x.
      With SAM, the benefit increases to 2.33x.

  14. Why does SAM help?
      [Figure: issue slot breakdown. Micro-ops issued: committed, aborted. Unused issue slots (by reason): resource, not ready, other.]
      • SAM matches round robin (RR) when there are no pathologies
      • SAM reduces wasted work
      • SAM reduces resource stalls

  15. Outline (as on slide 3)

  16. SAM on out-of-order cores
      Unlike in-order cores, priorities affect pipeline efficiency:
      • A single thread can clog core resources
      • Increased wrong-path execution
      Despite these, prioritizing tasks is better.
      Need for aggressive prioritization affects core design:
      • Shared, not partitioned ROBs
      [Figure: SMT out-of-order pipeline. Labels: Fetch, Decode, thread micro-op queues, Issue, Issue Buffer, Physical Reg File, Reorder Buffer, Pipe 0, Pipe 1. Also shown: conflict resolution priorities (updated from the task unit), in-flight uops (for ICount), and SAM priorities.]
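The contrast between the two prioritization inputs in the figure can be sketched as two selection functions. This is an illustration under assumptions, not the core design: `pickByICount` and `pickBySAM` are hypothetical names, ICount is modeled in its usual form (prefer the thread with the fewest in-flight micro-ops), and the SAM priority is treated as an opaque number where higher is better.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// ICount-style choice: the thread with the fewest in-flight micro-ops.
// Returns -1 if there are no threads; ties go to the lowest thread index.
int pickByICount(const std::vector<int>& inFlightUops) {
    if (inFlightUops.empty()) return -1;
    auto it = std::min_element(inFlightUops.begin(), inFlightUops.end());
    return static_cast<int>(std::distance(inFlightUops.begin(), it));
}

// SAM-style choice: the thread with the highest conflict resolution priority.
// Returns -1 if there are no threads; ties go to the lowest thread index.
int pickBySAM(const std::vector<int>& samPriority) {
    if (samPriority.empty()) return -1;
    auto it = std::max_element(samPriority.begin(), samPriority.end());
    return static_cast<int>(std::distance(samPriority.begin(), it));
}
```

The two policies can disagree. With illustrative (made-up) in-flight counts of 9, 4, 2, 3 and SAM priorities of 4, 2, 1, 3, ICount picks thread 2 while SAM picks thread 0, the thread that already holds the most in-flight micro-ops. That is one way to see how a single prioritized thread can clog core resources.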

  17. SAM tradeoffs with out-of-order cores
      Baseline policy: ICount (IC).
      [Figure: issue slot breakdown for sssp with 8 threads. Micro-ops issued: committed, aborted, wrong path. Unused issue slots (by reason): resource, not ready, other.]
      SAM is more beneficial with dynamically shared ROBs.
      • Reduces aborts + resource stalls
      • But reduces pipeline efficiency: increase in wrong-path issues + not-ready stalls

  18. Adaptive SAM policy
      Hardware counters track cycles in the issue slot categories (micro-ops issued: committed, aborted, wrong path; unused issue slots: resource, not ready, other).
      • Cycles lost to task-level speculation: aborted + resource
      • Cycles lost to pipeline inefficiencies: wrong path + not ready
      Use SAM when the cycles lost to task-level speculation dominate; use ICount when the cycles lost to pipeline inefficiencies dominate.
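The decision on this slide can be sketched as a comparison of two counter sums. The names `LostCycleCounters`, `IssuePolicy`, and `choosePolicy` are hypothetical, and the grouping of counters is a reading of the slide: aborted and resource cycles count as lost to task-level speculation, wrong-path and not-ready cycles as lost to pipeline inefficiencies. The tie case is not recoverable from the slide, so falling back to ICount on a tie is an arbitrary choice made here.

```cpp
#include <cstdint>

enum class IssuePolicy { SAM, ICount };

// Hypothetical counter set; the slide only states that hardware counters
// track cycles in these categories.
struct LostCycleCounters {
    std::uint64_t aborted;    // cycles spent on micro-ops of aborted tasks
    std::uint64_t resource;   // cycles lost to speculation resource stalls
    std::uint64_t wrongPath;  // cycles spent on wrong-path micro-ops
    std::uint64_t notReady;   // cycles with no ready micro-op to issue
};

// Picks SAM when task-level speculation losses exceed pipeline losses.
// Assumption: ties fall back to ICount.
IssuePolicy choosePolicy(const LostCycleCounters& c) {
    const std::uint64_t speculationLoss = c.aborted + c.resource;
    const std::uint64_t pipelineLoss = c.wrongPath + c.notReady;
    return speculationLoss > pipelineLoss ? IssuePolicy::SAM
                                          : IssuePolicy::ICount;
}
```

How often the counters are sampled and reset is not stated on the slide, so the sketch leaves that to the caller.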

  19. SAM on OoO cores (all benchmarks)
      [Figure: average over all benchmarks. Micro-ops issued: committed, aborted, wrong path. Unused issue slots (by reason): resource, not ready, other.]
      At 8 threads / core:
      • Multithreading improves performance over single-threaded cores by 1.1x
      • With SAM, the improvement rises to 1.5x
      The adaptive policy slightly increases performance at 2 and 4 threads.

  20. Conclusion
      Conventional multithreading causes performance pathologies on speculative workloads:
      • Increase in aborted work
      • Inefficient use of speculation resources
      Speculation-Aware Multithreading (SAM):
      • Prioritize threads running tasks more likely to commit
      SAM makes multithreading more useful.

  21. Questions?
