
VIRTUES AND LIMITATIONS OF COMMODITY HARDWARE TRANSACTIONAL MEMORY



  1. VIRTUES AND LIMITATIONS OF COMMODITY HARDWARE TRANSACTIONAL MEMORY. Nuno Diegues, Paolo Romano and Luís Rodrigues. PACT 2014.

  2. The multi-core (r)evolution
     • Multi-cores are now ubiquitous: CPU1, CPU2, CPU3 and CPU4 over shared memory
     • Concurrent programming is complex
     • Classic approach: locking. Hard to get right: fine-grained locks, deadlocks, correctness
     • Transactional Memory abstraction: atomic { withdraw(acc1,val); deposit(acc2,val); }
     • Programmer identifies atomic blocks; runtime (the Transactional Memory system) implements synchronization
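To make the "hard to get right" point concrete, here is a minimal sketch of the same withdraw/deposit pair written with fine-grained locks. The `account_t` type and the `account_init` and `transfer` names are hypothetical, introduced only for this illustration. The sketch assumes POSIX threads and avoids deadlock by always taking the two locks in address order, a rule that a TM atomic block would not need.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical account type, for illustration only. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

void account_init(account_t *a, long balance) {
    pthread_mutex_init(&a->lock, NULL);
    a->balance = balance;
}

/* Moves val between two accounts using one lock per account.
 * Locks are taken in address order: taking them in argument order could
 * deadlock when two threads transfer in opposite directions. */
void transfer(account_t *from, account_t *to, long val) {
    assert(from != to);
    int from_first = (uintptr_t)from < (uintptr_t)to;
    account_t *first  = from_first ? from : to;
    account_t *second = from_first ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= val;   /* withdraw(acc1, val) */
    to->balance   += val;   /* deposit(acc2, val)  */
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Every function that touches two accounts has to follow the same ordering rule, which is the kind of global convention that makes fine-grained locking error-prone.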

  3. TM is now available in commodity processors
     • Intel: Haswell in desktops, laptops, tablets, servers …
     • IBM: BG/Q, zEC12, Power8
     Over 10 years of:
     • Software implementations (STMs)
     • Simulations of HTMs and HybridTMs
     Where does commodity HTM stand in the big picture?
     Our contribution: largest TM study to date
     • Framework with 4 STMs, Intel HTM, 2 HyTMs and locking strategies
     • Metrics for performance and power consumption
     • 10 benchmarks

  4. HTM: Intel Transactional Synchronization Extensions (TSX)
     Widely available in millions of machines. Similar in nature to IBM's HTMs.
     • L1 modified to be transactional
     • Cache coherence detects conflicts eagerly
     • Strong atomicity
     Diagram: CPU1 and CPU2 each with an L1 cache (64KB) and an L2 cache (256KB), plus an L3 cache and the memory bus.

  5. HTM: Intel Transactional Synchronization Extensions (TSX)
     Walk-through with two CPUs, each with its own L1 and L2 cache, plus an L3 cache and the memory bus:
     CPU 1:
     • xbegin (TSX: on)
     • read x: 0 // Set bit read on x cache line (L1 holds x: 0 -- r)
     • write y = 1 // Buffer write in L1 cache (L1 holds y: 1 -- w)
     • xend // Atomically clean bits and publish (L1 holds x: 0 and y: 1)
     CPU 2:
     • xbegin
     • read y: 1 (L1 holds y: 1 -- r)
     CPU 1:
     • write y = 2 (L1 holds y: 2)
     CPU 2:
     • invalidation snooped: write invalidates tx read
     • xabort
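The walk-through above can be replayed in software. The following is a toy model and not TSX itself. All names (`machine_t`, `tx_begin`, `snoop`, and so on) are hypothetical. It tracks one read bit and one write bit per cache line per CPU and aborts a transaction when another CPU's access conflicts with a tracked line. It assumes that the final `write y = 2` on CPU 1 happens outside a transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NUM_CPUS = 2, NUM_LINES = 2, LINE_X = 0, LINE_Y = 1 };

typedef struct {
    bool in_tx;                 /* "TSX: on" in the walk-through */
    bool read_bit[NUM_LINES];
    bool write_bit[NUM_LINES];
    int  buffered[NUM_LINES];   /* speculative writes held in the private L1 */
} cpu_t;

typedef struct {
    int   memory[NUM_LINES];    /* globally visible values */
    cpu_t cpu[NUM_CPUS];
} machine_t;

/* Drops all speculative state of CPU c (what an abort does in this model). */
void tx_reset(machine_t *m, int c) {
    memset(&m->cpu[c], 0, sizeof m->cpu[c]);
}

/* Coherence check: an access by CPU c aborts every other running transaction
 * that wrote the line, or that read it when the access is a write. */
void snoop(machine_t *m, int c, int line, bool is_write) {
    for (int o = 0; o < NUM_CPUS; o++) {
        cpu_t *other = &m->cpu[o];
        if (o == c || !other->in_tx) continue;
        if (other->write_bit[line] || (is_write && other->read_bit[line]))
            tx_reset(m, o);
    }
}

void tx_begin(machine_t *m, int c) {
    tx_reset(m, c);
    m->cpu[c].in_tx = true;
}

int tx_read(machine_t *m, int c, int line) {
    cpu_t *cpu = &m->cpu[c];
    assert(cpu->in_tx);
    snoop(m, c, line, false);
    cpu->read_bit[line] = true;             /* set read bit on the line */
    return cpu->write_bit[line] ? cpu->buffered[line] : m->memory[line];
}

void tx_write(machine_t *m, int c, int line, int val) {
    cpu_t *cpu = &m->cpu[c];
    assert(cpu->in_tx);
    snoop(m, c, line, true);
    cpu->write_bit[line] = true;            /* buffer the write locally */
    cpu->buffered[line] = val;
}

/* Returns true on commit, false if the transaction was aborted earlier. */
bool tx_end(machine_t *m, int c) {
    cpu_t *cpu = &m->cpu[c];
    if (!cpu->in_tx) return false;
    for (int line = 0; line < NUM_LINES; line++)
        if (cpu->write_bit[line]) m->memory[line] = cpu->buffered[line];
    tx_reset(m, c);                         /* clean bits and publish */
    return true;
}

/* A write outside any transaction still aborts conflicting transactions. */
void plain_write(machine_t *m, int c, int line, int val) {
    snoop(m, c, line, true);
    m->memory[line] = val;
}
```

Replaying the slide: CPU 0 runs `tx_begin`, `tx_read` on `LINE_X`, `tx_write` of 1 to `LINE_Y` and `tx_end`. CPU 1 then runs `tx_begin` and `tx_read` on `LINE_Y`. A `plain_write` of 2 to `LINE_Y` by CPU 0 clears CPU 1's transaction, so its `tx_end` returns false.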

  6. In an ideal world …
     xbegin; withdraw(acc1,val); deposit(acc2,val); xend
     • Transactions may abort because of contention on the same memory locations
     • Transactions restart
     • … and every transaction shall eventually succeed

  7. … in practice: Best-Effort Nature
     No progress guarantees:
     • A transaction may always abort
     … due to a number of reasons:
     • Forbidden instructions
     • Capacity of caches
     • Faults and signals
     • Contending transactions, aborting each other
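Because a best-effort transaction may never commit, code built on it usually needs a non-speculative fallback. The sketch below shows one common shape for that policy: retry a bounded number of times, then run the same body under a global lock. It is an illustration under stated assumptions and not the framework evaluated in the talk. `htm_attempt_fn`, `run_atomic`, `always_abort` and `always_commit` are hypothetical names, and the hardware attempt is abstracted behind a callback so the example runs on machines without TSX.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical hook: runs body once as a hardware transaction and reports
 * whether it committed. On TSX hardware it would wrap xbegin/xend. */
typedef bool (*htm_attempt_fn)(void (*body)(void *), void *arg);

static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

/* Tries the hardware path up to max_retries times, then falls back to a
 * global lock. Returns true if the body committed in hardware. */
bool run_atomic(htm_attempt_fn attempt, int max_retries,
                void (*body)(void *), void *arg) {
    for (int tries = 0; tries < max_retries; tries++)
        if (attempt(body, arg))
            return true;

    pthread_mutex_lock(&fallback_lock);   /* guaranteed progress */
    body(arg);
    pthread_mutex_unlock(&fallback_lock);
    return false;
}

/* Stand-in for a transaction that can never commit, for example because
 * the body contains a forbidden instruction. The body is not executed. */
bool always_abort(void (*body)(void *), void *arg) {
    (void)body;
    (void)arg;
    return false;
}

/* Stand-in for a transaction that commits on the first attempt. */
bool always_commit(void (*body)(void *), void *arg) {
    body(arg);
    return true;
}

/* Example body: increments the int that arg points to. */
void increment(void *arg) {
    ++*(int *)arg;
}
```

In a real implementation the hardware path would also have to read the fallback lock inside the transaction, so that a thread holding the lock aborts concurrent transactions. The callback abstraction here hides that detail.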

  8. Restrictions of TSX
     • Writes:
       • size of L1 cache: 32KB
       • non-negligible aborts for >8KB
       • cache associativity
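The associativity restriction can be illustrated with a little arithmetic. The sketch below assumes 64-byte cache lines and an 8-way set-associative L1. Neither figure is stated on the slide; only the 32KB size is. Under those assumptions the cache has 32768 / (64 × 8) = 64 sets, and a write set overflows as soon as more than 8 of its lines map to the same set, however small it is in total. The `set_index` and `write_set_fits` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    L1_BYTES   = 32 * 1024,                     /* from the slide */
    LINE_BYTES = 64,                            /* assumption */
    WAYS       = 8,                             /* assumption */
    NUM_SETS   = L1_BYTES / (LINE_BYTES * WAYS) /* 64 under these assumptions */
};

/* Cache set that the line containing addr maps to. */
size_t set_index(uintptr_t addr) {
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* True if no set receives more than WAYS written lines.
 * Assumes every address in addrs lies on a distinct cache line. */
bool write_set_fits(const uintptr_t *addrs, size_t n) {
    size_t per_set[NUM_SETS] = {0};
    for (size_t i = 0; i < n; i++)
        if (++per_set[set_index(addrs[i])] > WAYS)
            return false;
    return true;
}
```

For instance, nine writes spaced 4096 bytes apart all land in the same set and do not fit, while nine writes to consecutive lines do. This simple model does not by itself account for the abort rates reported above 8KB.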
