
Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume



  1. Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume. Shady Issa, Pascal Felber, Alexander Matveev, Paolo Romano

  2. Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume: POWER8-TM (P8TM). Shady Issa, Pascal Felber, Alexander Matveev, Paolo Romano

  3. Transactional Memory
      • an alternative paradigm for parallel programming
      • easy to use
      • potential to match the performance of fine-grained locking
      Example; the __transaction block is executed by the transactional memory implementation:
        withdraw(account, value) {
            __transaction {
                if (account.balance > value) {
                    account.balance -= value;
                    return account.balance;
                } else {
                    return -1;
                }
            }
        }
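A minimal Python sketch of the `withdraw` example, assuming a single global lock as a stand-in for `__transaction`. This is the simplest software fallback, not a transactional memory implementation, and `_global_lock` and `Account` are hypothetical names. The lock gives the isolation the atomic block promises, but it runs every transaction one at a time, which is the scalability cost TM tries to avoid.

```python
import threading

# Hypothetical stand-in for the TM runtime: one global lock makes every
# "transaction" atomic by running them one at a time.
_global_lock = threading.Lock()


class Account:
    def __init__(self, balance):
        self.balance = balance


def withdraw(account, value):
    """Body of the __transaction block from the slide."""
    with _global_lock:
        if account.balance > value:
            account.balance -= value
            return account.balance
        return -1


if __name__ == "__main__":
    acct = Account(100)
    workers = [threading.Thread(target=withdraw, args=(acct, 10)) for _ in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(acct.balance)  # 10: nine withdrawals succeed, the rest return -1
```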

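  4. Hardware Transactional Memory
      • available in Intel and IBM processors
      • implemented in the cache coherence protocol
      • conflicts are detected at cache-line granularity
      • best effort: a transaction is never guaranteed to commit
      • hence a software (S/W) fallback path is needed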
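Because HTM is best effort, every transaction needs a software fallback. The sketch below simulates that retry-then-fallback loop in Python. The capacity limit, retry budget and `simulated_htm` function are hypothetical stand-ins for real hardware; a real hardware transaction would also have to read the fallback lock (not modeled here) so that it aborts while the lock is held.

```python
import threading

CAPACITY_LINES = 64   # hypothetical: cache lines one hardware transaction can track
MAX_RETRIES = 5       # hypothetical retry budget before taking the fallback path

fallback_lock = threading.Lock()


class Abort(Exception):
    def __init__(self, persistent):
        super().__init__("capacity" if persistent else "conflict")
        self.persistent = persistent


def simulated_htm(body, footprint_lines, transient_aborts, attempt):
    """Pretend hardware transaction: may abort instead of running body."""
    if footprint_lines > CAPACITY_LINES:
        raise Abort(persistent=True)    # capacity abort: recurs on every retry
    if attempt < transient_aborts:
        raise Abort(persistent=False)   # e.g. a conflict abort: may succeed later
    return body()


def run_transaction(body, footprint_lines, transient_aborts=0):
    """Return (path, result), where path is "htm" or "fallback"."""
    for attempt in range(MAX_RETRIES):
        try:
            return "htm", simulated_htm(body, footprint_lines, transient_aborts, attempt)
        except Abort as abort:
            if abort.persistent:
                break                   # retrying a capacity abort is pointless
    with fallback_lock:                 # software fallback: single global lock
        return "fallback", body()


print(run_transaction(lambda: 42, footprint_lines=10))    # ('htm', 42)
print(run_transaction(lambda: 42, footprint_lines=100))   # ('fallback', 42)
```

Capacity aborts break out of the retry loop immediately, since retrying a transaction that does not fit cannot help; only transient aborts consume the retry budget.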

  5. Capacity Limitations
      [Figure: throughput (10^6 Tx/s) and abort rate (%) versus transaction size; legend: HTM-SGL, HTM tx, HTM non-tx, HTM capacity, Lock aborts, ROT conflicts, ROT capacity]

  6. Capacity Limitations
      [Figure: as on slide 5 (throughput and abort rate versus transaction size), highlighting the capacity aborts]

  7. Capacity Limitations
      [Figure: as on slide 5, highlighting the capacity aborts and the resulting activation of the fallback path]

  8. POWER8-TM
      • hardware/software co-design
      • utilises two features specific to POWER8:
        • suspend/resume
        • rollback-only transactions (ROTs)
      • to support the execution of larger transactions

  9. Rollback-only Transaction (ROT)
      • a lightweight transaction type
      • updates are applied atomically
      • reads are not tracked
      • hence a theoretically unbounded read-set
      • but ROTs are not serialisable

  10. ROTs (initially X = 0)
      Both threads: Begin ROT
      Thread 1: read X returns 0
      Thread 2: X = 1; End ROT
      Thread 1: read X returns 1 (inconsistent value)

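  11. ROTs (initially X = 0)
      Both threads: Begin ROT
      Thread 1: read X returns 0
      Thread 2: X = 1; End ROT (a write-after-read, WAR, conflict on X that goes undetected because ROTs do not track reads)
      Thread 1: read X returns 1 (inconsistent value)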

  12. ROTs (initially X = 0)
      Both threads: Begin ROT
      Thread 1: read X returns 0
      Thread 2: X = 1; End ROT
      Thread 1: read X

  13. ROTs (initially X = 0)
      Both threads: Begin ROT
      Thread 1: read X returns 0
      Thread 2: X = 1
      Thread 1: read X returns 0 (consistent)
      Thread 2: End ROT (the new value can only appear now)

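  14. ROTs (initially X = 0)
      Both threads: Begin ROT
      Thread 1: read X returns 0
      Thread 2: X = 1
      Thread 1: read X returns 0 (consistent; a read-after-write, RAW, conflict on X, which the hardware detects because ROTs do track writes)
      Thread 2: End ROT (the new value can only appear now)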

  15. ROTs (initially X = 0)
      Both threads: Begin ROT
      Thread 1: read X
      Thread 2: X = 1; End ROT
      Thread 1: read X
      Annotation: wait for concurrent ROTs non-transactionally
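The single-variable schedules of slides 10 to 14 can be replayed on a toy model. This Python sketch rests on stated assumptions and does not reproduce POWER8 semantics. `Memory` and `ROT` are hypothetical classes. Writes are buffered and tracked while reads are not. When a conflict is detected, the model simply aborts the writer; the real hardware's conflict-resolution policy is not modeled.

```python
class Memory:
    def __init__(self, **values):
        self.committed = dict(values)   # globally visible values
        self.active = []                # live (uncommitted, unaborted) ROTs


class ROT:
    """Toy rollback-only transaction: tracks its writes, never its reads."""

    def __init__(self, mem):
        self.mem = mem
        self.writes = {}                # buffered, tracked write-set
        self.aborted = False
        mem.active.append(self)

    def _abort_writers_of(self, addr):
        # A conflict is only visible if another live ROT has *written* addr.
        for other in [t for t in self.mem.active if t is not self and addr in t.writes]:
            other.aborted = True        # assumption: the writer loses
            other.writes.clear()
            self.mem.active.remove(other)

    def read(self, addr):
        self._abort_writers_of(addr)    # RAW on a tracked write: detected
        return self.writes.get(addr, self.mem.committed[addr])

    def write(self, addr, value):
        self._abort_writers_of(addr)    # WAW: detected
        self.writes[addr] = value       # WAR: nobody tracked the earlier read

    def commit(self):
        if self.aborted:
            return False
        self.mem.committed.update(self.writes)
        self.mem.active.remove(self)
        return True


def war_schedule():
    """Slides 10-11: the WAR on X goes unnoticed."""
    mem = Memory(X=0)
    t1, t2 = ROT(mem), ROT(mem)
    first = t1.read("X")
    t2.write("X", 1)
    t2.commit()
    second = t1.read("X")
    return first, second               # inconsistent: the value changed mid-ROT


def raw_schedule():
    """Slides 13-14: the RAW on X is detected while T2 is still live."""
    mem = Memory(X=0)
    t1, t2 = ROT(mem), ROT(mem)
    first = t1.read("X")
    t2.write("X", 1)
    second = t1.read("X")              # conflicts with T2's buffered write
    return first, second, t2.commit()


print(war_schedule())   # (0, 1)
print(raw_schedule())   # (0, 0, False)
```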

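  16. ROTs (initially X = 0, Y = 0)
      Both threads: Begin ROT
      Thread 1: read X, then Thread 2: X = 1 (WAR on X)
      Thread 2: read Y, then Thread 1: Y = 1 (WAR on Y)
      Both threads: End ROT
      Thread 1 must be serialised before Thread 2 (it read the old X), yet Thread 2 must also be serialised before Thread 1 (it read the old Y), so no serial order explains this execution.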

  17. ROTs (initially X = 0, Y = 0): the same execution, showing each thread's view
      Thread 1 sees X = 0 and Y = 1 (its own write)
      Thread 2 sees X = 1 (its own write) and Y = 0
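Why slides 16 and 17 show a non-serialisable execution can be checked mechanically with the standard precedence-graph (conflict-serialisability) test, which is textbook material rather than part of P8TM. In this Python sketch a schedule is a hypothetical list of `(transaction, op, address)` tuples in execution order. An edge Ti → Tj is added whenever an operation of Ti precedes a conflicting operation of Tj, and a cycle means no equivalent serial order exists. The example interleaving has Thread 1 reading X and writing Y while Thread 2 writes X and reads Y.

```python
from itertools import combinations


def precedence_edges(schedule):
    """schedule: (txn, op, addr) tuples in execution order, op is "r" or "w"."""
    edges = set()
    for (ti, op_i, a_i), (tj, op_j, a_j) in combinations(schedule, 2):
        # Two operations conflict if they come from different transactions,
        # touch the same address, and at least one of them is a write.
        if ti != tj and a_i == a_j and "w" in (op_i, op_j):
            edges.add((ti, tj))        # ti's operation came first
    return edges


def has_cycle(edges):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    on_path, finished = set(), set()

    def dfs(node):
        on_path.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_path or (nxt not in finished and dfs(nxt)):
                return True
        on_path.discard(node)
        finished.add(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in finished)


# T1 reads X, T2 writes X, T2 reads Y, T1 writes Y.
write_skew = [("T1", "r", "X"), ("T2", "w", "X"), ("T2", "r", "Y"), ("T1", "w", "Y")]
# A serial run of the same two transactions, T1 first.
serial = [("T1", "r", "X"), ("T1", "w", "Y"), ("T2", "w", "X"), ("T2", "r", "Y")]

print(sorted(precedence_edges(write_skew)))  # [('T1', 'T2'), ('T2', 'T1')]
print(has_cycle(precedence_edges(write_skew)), has_cycle(precedence_edges(serial)))  # True False
```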

  18. Touch-to-Validate (T2V)
      • the core algorithm of P8TM
      • makes the concurrent execution of ROTs safe and serialisable
      • basic intuition: convert WAR conflicts, which ROTs miss, into RAW conflicts, which the hardware detects

  19. T2V (initially X = 0, Y = 0)
      Both threads: Begin ROT
      Thread 1: read X
      Thread 2: write X
      Thread 2: read Y
      Thread 2: End ROT
      Thread 1: write Y
      Thread 1: End ROT

  20. T2V (initially X = 0, Y = 0)
      Both threads: Begin ROT
      Thread 1: read X
      Thread 2: write X
      Thread 2: read Y
      Thread 1: write Y
      Both threads: End ROT

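  21. T2V (initially X = 0, Y = 0)
      Both threads: Begin ROT
      Thread 1: read X
      Thread 2: write X
      Thread 2: read Y
      Thread 1: write Y
      Thread 1: re-read X; Thread 2: re-read Y
      Both threads: End ROT
      Each re-read now hits the other thread's uncommitted write, i.e. a RAW conflict that the hardware detects.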



  24. T2V
      • only the read addresses need to be tracked, not their values
      • this tracking must be done in software
      • so how can software tracking outperform hardware tracking?
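A toy Python model of the T2V idea: a ROT whose writes are tracked, extended with a software log of read addresses and a `touch()` step that re-reads them before committing. It is a sketch under assumptions. The class names are hypothetical, conflicts are resolved by aborting the other writer, and the wait for concurrent ROTs mentioned on slide 15 is not modeled. The schedule therefore keeps both ROTs live until they have touched their read-sets, as on slide 21.

```python
class Memory:
    def __init__(self, **values):
        self.committed = dict(values)
        self.active = []


class T2VRot:
    """Toy ROT: hardware-tracked writes, software-logged read addresses."""

    def __init__(self, mem):
        self.mem = mem
        self.writes = {}
        self.read_log = []              # addresses only, no values
        self.aborted = False
        mem.active.append(self)

    def _access(self, addr):
        """Return False if this ROT is already aborted; else detect conflicts."""
        if self.aborted:
            return False                # an aborted ROT stops executing
        for other in [t for t in self.mem.active if t is not self and addr in t.writes]:
            other.aborted = True        # assumption: the other writer loses
            other.writes.clear()
            self.mem.active.remove(other)
        return True

    def read(self, addr):
        if not self._access(addr):
            return None
        self.read_log.append(addr)      # the software part of T2V
        return self.writes.get(addr, self.mem.committed[addr])

    def write(self, addr, value):
        if self._access(addr):
            self.writes[addr] = value

    def touch(self):
        # Re-reading turns each missed WAR into a detectable RAW.
        for addr in self.read_log:
            self._access(addr)

    def commit(self):
        if self.aborted:
            return False
        self.mem.committed.update(self.writes)
        self.mem.active.remove(self)
        return True


def run(with_touch):
    mem = Memory(X=0, Y=0)
    t1, t2 = T2VRot(mem), T2VRot(mem)
    t1.read("X")
    t2.write("X", 1)
    t2.read("Y")
    t1.write("Y", 1)
    if with_touch:
        t1.touch()                      # finds T2's live write to X
        t2.touch()                      # no-op: T2 was aborted
    return t1.commit(), t2.commit(), mem.committed


print(run(with_touch=False))  # (True, True, {'X': 1, 'Y': 1}): write skew
print(run(with_touch=True))   # (True, False, {'X': 0, 'Y': 1}): serialisable
```

The log stores only addresses because the touch does not compare values: re-reading a line is enough to let the hardware raise the conflict.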

  25. TMCAM
      An HTM transaction: Begin HTM; read A; read B; read C; read D; write E; End HTM
      TMCAM: 64 entries, all empty

  26. TMCAM
      An HTM transaction: Begin HTM; read A; read B; read C; read D; write E; End HTM
      TMCAM: one entry per accessed cache line (&A, &B, &C, &D, &E), i.e. 5 of the 64 entries
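The TMCAM cost of an HTM transaction can be estimated by counting the distinct cache lines it touches. A Python sketch, assuming 128-byte lines, the 64 entries drawn on the slide, and one entry per line that is read or written; the byte addresses are hypothetical.

```python
LINE_SIZE = 128      # bytes per cache line
TMCAM_ENTRIES = 64   # entries drawn on the slide


def tmcam_entries_needed(addresses):
    """Distinct cache lines among the byte addresses a transaction reads or writes."""
    return len({addr // LINE_SIZE for addr in addresses})


def fits_in_htm(addresses):
    return tmcam_entries_needed(addresses) <= TMCAM_ENTRIES


# Hypothetical addresses for A, B, C, D (reads) and E (write), one line apart.
a, b, c, d, e = (0x10000 + i * LINE_SIZE for i in range(5))
print(tmcam_entries_needed([a, b, c, d, e]))                         # 5
print(tmcam_entries_needed(range(0x10000, 0x10000 + LINE_SIZE, 8)))  # 1: same line
print(fits_in_htm([i * LINE_SIZE for i in range(65)]))               # False
```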

  27. Read-set Tracking
      A ROT: Begin ROT; read A; read B; read C; read D; write E; End ROT
      TMCAM: 64 entries, all empty

  28. Read-set Tracking
      A ROT that logs its reads: Begin ROT; read A; store &A; read B; store &B; read C; store &C; read D; store &D; write E; End ROT

  29. Read-set Tracking
      Begin ROT; read A; store &A; read B; store &B; read C; store &C; read D; store &D; write E; End ROT
      TMCAM: entry 1 holds the logged addresses &A, &B, &C, &D; entry 2 holds &E

  30. Read-set Tracking
      Begin ROT; read A; store &A; read B; store &B; read C; store &C; read D; store &D; write E; End ROT
      TMCAM: entry 1 holds the logged addresses &A, &B, &C, &D; entry 2 holds &E
      Each logged address takes 8 bytes; each entry covers a 128-byte cache line.

  31. Read-set Tracking
      Begin ROT; read A; store &A; read B; store &B; read C; store &C; read D; store &D; write E; End ROT
      TMCAM: entry 1 holds the logged addresses &A, &B, &C, &D; entry 2 holds &E
      Each logged address takes 8 bytes; each entry covers a 128-byte cache line, so one entry holds up to 128 / 8 = 16 addresses: an up to 16x larger read-set.
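The arithmetic behind "up to 16x": with reads untracked, the only TMCAM cost of a ROT's read-set is the software log, and one 128-byte line holds sixteen 8-byte addresses. A Python sketch, assuming a 64-entry TMCAM, one logged address per distinct cache line read, and a given number of lines written by the transaction itself; the function names are hypothetical.

```python
LINE_SIZE = 128      # bytes per cache line (slide)
ADDR_SIZE = 8        # bytes per logged address (slide)
TMCAM_ENTRIES = 64   # entries in the TMCAM


def htm_max_read_lines(write_lines):
    """In HTM every line read or written occupies its own entry."""
    return TMCAM_ENTRIES - write_lines


def rot_max_read_lines(write_lines):
    """In a ROT only writes occupy entries, including the log of read addresses."""
    addrs_per_log_line = LINE_SIZE // ADDR_SIZE   # 16
    log_lines = TMCAM_ENTRIES - write_lines
    return log_lines * addrs_per_log_line


# One written line (E), as on the slide:
print(htm_max_read_lines(1), rot_max_read_lines(1))     # 63 1008
print(rot_max_read_lines(1) // htm_max_read_lines(1))   # 16
```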

  32. HTM
      • many transactions do fit in HTM
      • for those, we need to avoid the extra overheads of ROTs
      • so first try HTM and, if the transaction overflows its capacity, fall back to a ROT
      • how can HTMs and ROTs run concurrently?

  33. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X
      Thread 2: X = 1
      Thread 1: Y = 1; End HTM
      Thread 2: End ROT

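  34. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X
      Thread 2: X = 1
      Thread 1: Y = 1; End HTM
      Thread 2: End ROT
      The HTM is protected by hardware (H/W): it tracks its read of X, so the ROT's write X = 1 is detected as a conflict.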

  35. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X; Y = 1; End HTM
      Thread 2: End ROT
      The HTM is protected by H/W.

  36. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X
      Thread 2: read Y
      Thread 1: Y = 1; End HTM
      Thread 2: read Y; End ROT
      The HTM is protected by H/W.

  37. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X
      Thread 2: read Y returns 0
      Thread 1: Y = 1; End HTM
      Thread 2: read Y returns 1 (inconsistent value); End ROT
      The HTM is protected by H/W.

  38. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X
      Thread 2: read Y returns 0
      Thread 1: Y = 1; End HTM
      Thread 2: read Y returns 1 (inconsistent value); T2V; End ROT
      The HTM is protected by H/W.

  39. HTM + ROT (initially X = 0, Y = 0)
      Thread 1: Begin HTM; Thread 2: Begin ROT
      Thread 1: read X
      Thread 2: read Y returns 0
      Thread 1: Y = 1; then End HTM, using suspend/resume (S/R)
      Thread 2: read Y returns 0 (consistent value); T2V; End ROT
      The HTM is protected by H/W.
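One way to realise the wait that suspend/resume enables is a quiescence check over per-thread status words. The Python sketch below is only an analogy built on assumptions. `rot_epoch`, `rot_begin`, `rot_end` and `wait_for_active_rots` are hypothetical names, an odd epoch means "inside a ROT", and the suspend and resume steps appear only as comments. A committing HTM transaction would suspend, wait until every ROT that was active at that moment has ended, then resume and commit. Presumably the suspend is what keeps the status reads non-transactional, so ROTs updating their status words do not abort the HTM.

```python
import threading
import time

N_THREADS = 4
# Hypothetical per-thread status: odd = currently inside a ROT, even = not.
# Each slot is written only by its owning thread.
rot_epoch = [0] * N_THREADS


def rot_begin(tid):
    rot_epoch[tid] += 1    # becomes odd


def rot_end(tid):
    rot_epoch[tid] += 1    # becomes even


def wait_for_active_rots(self_tid):
    """Return once every ROT active at call time has ended."""
    snapshot = list(rot_epoch)
    for tid, epoch in enumerate(snapshot):
        if tid == self_tid or epoch % 2 == 0:
            continue                    # not in a ROT when we looked
        while rot_epoch[tid] == epoch:  # same ROT still running
            time.sleep(0.001)


def demo():
    events = []
    rot_started = threading.Event()

    def rot_worker():
        rot_begin(1)
        rot_started.set()
        time.sleep(0.05)                # the ROT keeps running for a while
        events.append("ROT end")
        rot_end(1)

    worker = threading.Thread(target=rot_worker)
    worker.start()
    rot_started.wait()
    # Thread 0 is an HTM transaction about to commit:
    # suspend -> wait (non-transactionally) -> resume -> commit.
    wait_for_active_rots(0)
    events.append("HTM commit")
    worker.join()
    return events


print(demo())  # ['ROT end', 'HTM commit']
```

ROTs that begin after the snapshot are not waited for, which keeps the wait bounded; whether that alone suffices for P8TM's correctness argument is beyond what the slides show.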

  40. Uninstrumented Read-only Transactions (UROs)
      • read-only transactions that run without any instrumentation
      • outside the context of HTM or ROT
      • hence no bound on transaction size
      • HTMs and ROTs must wait for UROs

  41. POWER8-TM overview
      • read-only transactions: run without instrumentation (URO)
      • update transactions: HTM, then ROT, then the global lock (GL)
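The path selection in the overview can be sketched as a small dispatcher. In this Python sketch the retry budgets, the `Outcome` enum and the `attempt_htm`/`attempt_rot` callables are hypothetical stand-ins; real attempts would run the transaction body in HTM, or in a ROT with T2V. The ordering follows slides 32 and 40: read-only transactions run uninstrumented, and update transactions try HTM first, then a ROT, with the global lock assumed to be the last resort.

```python
from enum import Enum


class Outcome(Enum):
    COMMIT = "commit"
    CONFLICT = "conflict"   # transient abort: worth retrying
    CAPACITY = "capacity"   # persistent abort: retrying will not help


HTM_RETRIES = 5   # hypothetical budgets
ROT_RETRIES = 5


def try_mode(retries, attempt_fn):
    """Retry attempt_fn; True on commit, False once the budget or capacity is hit."""
    for _ in range(retries):
        outcome = attempt_fn()
        if outcome is Outcome.COMMIT:
            return True
        if outcome is Outcome.CAPACITY:
            return False
    return False


def execute(read_only, attempt_htm, attempt_rot):
    """Return the name of the path that finally committed the transaction."""
    if read_only:
        return "URO"        # uninstrumented read-only: no size bound
    if try_mode(HTM_RETRIES, attempt_htm):
        return "HTM"
    if try_mode(ROT_RETRIES, attempt_rot):
        return "ROT"
    return "GL"             # global lock: always succeeds


def always(outcome):
    return lambda: outcome


print(execute(True, always(Outcome.CAPACITY), always(Outcome.CAPACITY)))   # URO
print(execute(False, always(Outcome.COMMIT), always(Outcome.COMMIT)))      # HTM
print(execute(False, always(Outcome.CAPACITY), always(Outcome.COMMIT)))    # ROT
print(execute(False, always(Outcome.CAPACITY), always(Outcome.CAPACITY)))  # GL
```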
