Extending Hardware Transactional Memory Capacity via Rollback-Only Transactions and Suspend/Resume
POWER8-TM
Shady Issa, Pascal Felber, Alexander Matveev, Paolo Romano
Transactional Memory
• alternative paradigm for parallel programming
• easy to use
• potential to match the performance of fine-grained locking

withdraw(account, value){
  __transaction{
    if account.balance > value:
      account.balance -= value;
      return account.balance;
    else
      return -1;
  }
}

The __transaction block is executed by the transactional memory implementation.
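The withdraw pseudocode can be made runnable as a minimal Python sketch. Python has no transactional block, so a per-account lock stands in for the atomicity that `__transaction` would provide; the `Account` class and the `run_demo` helper are hypothetical names introduced only for this illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int
    # Assumption: a lock stands in for the atomic block of the slide.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def withdraw(account: Account, value: int) -> int:
    """Return the new balance, or -1 if the balance is not larger than value."""
    with account.lock:  # plays the role of __transaction { ... }
        if account.balance > value:
            account.balance -= value
            return account.balance
        return -1


def run_demo(n_threads: int = 8, per_thread: int = 100) -> int:
    """Run concurrent withdrawals of 1 unit each and return the final balance."""
    account = Account(balance=1000)

    def worker() -> None:
        for _ in range(per_thread):
            withdraw(account, 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance
```

With a real transactional memory the lock would disappear and the body would run speculatively; the observable behaviour of `withdraw` is meant to be the same.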
Hardware Transactional Memory
• Intel and IBM processors
• implemented in the cache coherence protocol
• cache line granularity
• best effort
• a software fallback is needed
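Because HTM is best effort, a common pattern is to retry in hardware a few times and then run under a single global lock. The sketch below simulates that control flow only; `HtmAbort`, `run_with_fallback`, and `make_hardware` are hypothetical names, and the "hardware" is a plain function that aborts when a made-up footprint exceeds a made-up capacity.

```python
import threading


class HtmAbort(Exception):
    """Hypothetical signal that a simulated hardware attempt aborted."""


_global_lock = threading.Lock()  # single global lock used by the fallback path


def run_with_fallback(body, try_in_hardware, max_retries=3):
    """Try body() in (simulated) hardware, then fall back to the global lock.

    Returns (result, path) where path is "htm" or "sgl".
    A real implementation would also make hardware transactions observe the
    lock so they cannot run alongside the fallback; that is omitted here.
    """
    for _ in range(max_retries):
        try:
            return try_in_hardware(body), "htm"
        except HtmAbort:
            continue  # retry in hardware until the budget is spent
    with _global_lock:
        return body(), "sgl"


def make_hardware(capacity, footprint):
    """Build a simulated hardware path that aborts on capacity overflow."""
    def attempt(body):
        if footprint > capacity:
            raise HtmAbort("capacity")
        return body()
    return attempt
```

For example, `run_with_fallback(lambda: 42, make_hardware(64, 100))` exhausts its retries and completes on the lock path.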
Capacity Limitations
[Figure: throughput (10^6 Tx/s) and abort rate (%) versus transaction size for HTM-SGL. Abort breakdown in the legend: ROT capacity, ROT conflicts, lock aborts, HTM capacity, HTM non-tx, HTM tx. Annotations: capacity aborts; activation of the fallback path.]
POWER8-TM
• hardware/software co-design
• utilises specific features available in POWER8:
  • suspend/resume
  • ROTs
• to support execution of larger transactions
Rollback-only Transaction (ROT)
• lightweight transaction type
• updates are applied atomically
• does not track reads
• theoretically infinite read-set
• not serialisable
ROTs: write after read (WAR)
Initially X = 0.
  Thread 1                 Thread 2
  Begin ROT                Begin ROT
  read X (returns 0)
                           X = 1
                           End ROT
  read X (returns 1)
The second read returns an inconsistent value.
ROTs: read after write (RAW)
Initially X = 0.
  Thread 1                 Thread 2
  Begin ROT                Begin ROT
  read X (returns 0)
                           X = 1
  read X (returns 0)
                           End ROT
The second read returns 0, which is consistent; the new value can only appear at End ROT.
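The two interleavings can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: the hypothetical `Rot` class buffers writes until commit and does not track reads, which mirrors the bullets on the ROT slide but does not model real hardware conflict detection.

```python
class Rot:
    """Toy rollback-only transaction: buffered writes, untracked reads."""

    def __init__(self, memory):
        self.memory = memory        # shared dict acting as main memory
        self.write_buffer = {}      # stores stay private until commit

    def read(self, addr):
        # Reads are not tracked: own buffered value if any, else memory.
        return self.write_buffer.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        # All buffered updates become visible in one step.
        self.memory.update(self.write_buffer)


def war_interleaving():
    """Thread 2 writes and commits between Thread 1's two reads."""
    memory = {"X": 0}
    t1, t2 = Rot(memory), Rot(memory)
    first = t1.read("X")
    t2.write("X", 1)
    t2.commit()
    second = t1.read("X")
    return first, second


def raw_interleaving():
    """Thread 2 writes but commits only after Thread 1's second read."""
    memory = {"X": 0}
    t1, t2 = Rot(memory), Rot(memory)
    first = t1.read("X")
    t2.write("X", 1)
    second = t1.read("X")
    t2.commit()
    return first, second
```

In this model `war_interleaving()` yields two different values for X inside one ROT, while `raw_interleaving()` yields the same value twice.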
ROTs: waiting for concurrent ROTs
Initially X = 0.
  Thread 1                 Thread 2
  Begin ROT                Begin ROT
  read X
                           X = 1
  read X
                           wait for concurrent ROTs (non-transactionally)
                           End ROT
ROTs: two locations
Initially X = 0, Y = 0.
  Thread 1                 Thread 2
  Begin ROT                Begin ROT
  read X
                           X = 1    (WAR)
  read Y
                           Y = 1    (WAR)
  End ROT                  End ROT
Value annotations shown on the slide: X = 0, X = 1; Y = 1, Y = 0.
Touch-to-Validate (T2V)
• core algorithm of P8TM
• makes concurrent execution of ROTs safe and serialisable
• basic intuition: convert WAR to RAW
T2V
Initially X = 0, Y = 0.
  Thread 1                 Thread 2
  Begin ROT                Begin ROT
  read X
                           write X
  read Y
                           write Y
  re-read X
  re-read Y
  End ROT                  End ROT
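The timeline can be replayed with a toy model of touch-to-validate. This is a sketch under loud assumptions, not the P8TM implementation: the classes `Memory` and `Rot` are hypothetical, the wait for concurrent ROTs is not modelled, and the model assumes that a load hitting a location buffered by another live ROT is detected as a RAW conflict that aborts the writer.

```python
class Memory:
    def __init__(self, **cells):
        self.cells = dict(cells)
        self.active = []            # ROTs that have begun and not finished


class Rot:
    def __init__(self, mem):
        self.mem = mem
        self.writes = {}            # buffered stores
        self.read_set = []          # addresses only, tracked in software
        self.aborted = False
        mem.active.append(self)

    def _load(self, addr):
        # Assumed RAW detection: loading a cell that another live ROT has
        # buffered aborts that writer (modelling choice for this sketch).
        for other in list(self.mem.active):
            if other is not self and addr in other.writes:
                other.abort()
        return self.writes.get(addr, self.mem.cells[addr])

    def read(self, addr):
        self.read_set.append(addr)  # software read-set tracking
        return self._load(addr)

    def write(self, addr, value):
        self.writes[addr] = value   # a later write after a read goes unnoticed

    def abort(self):
        self.aborted = True
        self.writes.clear()
        self.mem.active.remove(self)

    def commit(self):
        if self.aborted:
            return False
        for addr in self.read_set:  # touch: re-read turns WAR into RAW
            self._load(addr)
        self.mem.cells.update(self.writes)
        self.mem.active.remove(self)
        return True


def demo():
    mem = Memory(X=0, Y=0)
    t1, t2 = Rot(mem), Rot(mem)
    x = t1.read("X")
    t2.write("X", 1)
    y = t1.read("Y")
    t2.write("Y", 1)
    ok1 = t1.commit()               # re-reads X and Y
    ok2 = t2.commit()
    return (x, y), ok1, ok2, mem.cells
```

In this model Thread 1 commits having seen a consistent snapshot, and Thread 2 fails to commit because the touch of X exposed the earlier write-after-read.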
T2V
• needs to track only the addresses
• this must be done in software
• how can software outperform hardware?
TMCAM
Transaction: Begin HTM; read A; read B; read C; read D; write E; End HTM
TMCAM (64 entries): entries 1 to 5 hold &A, &B, &C, &D, &E, one cache line per entry; entries 6 to 64 remain free.
Read-set Tracking
Transaction: Begin ROT; read A; store &A; read B; store &B; read C; store &C; read D; store &D; write E; End ROT
TMCAM (64 entries): entry 1 holds the line containing the stored addresses &A, &B, &C, &D (8 bytes each in a 128-byte line); entry 2 holds &E; entries 3 to 64 remain free.
• up to 16x larger read-set
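The 16x figure follows from the sizes on the slide: a 128-byte line holds sixteen 8-byte addresses. The short sketch below just works through that arithmetic; the function names are hypothetical, and the totals are upper bounds that assume every one of the 64 TMCAM entries is available for the address log.

```python
CACHE_LINE_BYTES = 128   # line size shown on the slide
ADDRESS_BYTES = 8        # size of one stored address, from the slide
TMCAM_ENTRIES = 64       # number of entries shown on the TMCAM slide


def addresses_per_line():
    """How many read addresses fit in one tracked cache line."""
    return CACHE_LINE_BYTES // ADDRESS_BYTES


def max_logged_reads(log_lines=TMCAM_ENTRIES):
    """Upper bound on reads that can be logged in log_lines tracked lines."""
    return log_lines * addresses_per_line()


def lines_needed_for_log(num_reads):
    """Tracked lines consumed by a log of num_reads addresses (ceiling)."""
    return -(-num_reads // addresses_per_line())
```

The four reads of the example therefore consume a single tracked line instead of four.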
HTM
• transactions may fit in HTM
• we need to avoid the extra overheads of using ROTs
• try first in HTM; if it overflows, fall back to ROT
• how can HTMs and ROTs run concurrently?
HTM + ROT
Initially X = 0, Y = 0. Thread 1 runs an HTM transaction; Thread 2 runs a ROT.
• HTM reads X, ROT writes X = 1: the HTM is protected by H/W.
• ROT reads Y (returns 0), HTM writes Y = 1 and ends, ROT reads Y again (returns 1): inconsistent value.
• using suspend/resume (S/R): ROT reads Y (returns 0), HTM writes Y = 1, ROT reads Y again (returns 0, consistent value), and only then End HTM; the ROT runs T2V before End ROT.
Uninstrumented Read-only (URO)
• read-only transactions without any instrumentation
• run outside the context of HTM or ROT
• no bounds on Tx size
• HTMs and ROTs must wait for UROs
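The requirement that update transactions wait for UROs can be sketched as a simple quiescence barrier. This is a simplified illustration, not the mechanism used by P8TM: the hypothetical `UroRegistry` counts active read-only transactions and lets a committer block until the count reaches zero, whereas a real design would more likely wait only for the readers that were active when the commit started.

```python
import threading


class UroRegistry:
    """Counts active uninstrumented read-only transactions (toy model)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0

    def begin_read_only(self):
        with self._cond:
            self._active += 1

    def end_read_only(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()   # wake any committer that is waiting

    def wait_for_readers(self, timeout=None):
        """Block until no URO is active; return False if the wait timed out."""
        with self._cond:
            return self._cond.wait_for(lambda: self._active == 0, timeout)
```

An update transaction would call `wait_for_readers()` just before making its writes visible, while readers bracket their work with `begin_read_only()` and `end_read_only()`.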
POWER8-TM
Transaction
• read-only Tx: runs w/o instrumentation
• update Tx: HTM, ROT, or GL (global lock)
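The execution paths can be summarised as a small classifier. This is a sketch under stated assumptions rather than the actual policy: the deck describes trying HTM and falling back on overflow, while the hypothetical `execute` function below decides up front from a made-up footprint, using the 64-entry TMCAM and the 16 addresses per line from the earlier slides.

```python
import math

TMCAM_ENTRIES = 64     # entries shown on the TMCAM slide
ADDRS_PER_LINE = 16    # 128-byte line / 8-byte address


def htm_fits(read_lines, write_lines):
    """Assumed HTM limit: every read and written line takes one entry."""
    return read_lines + write_lines <= TMCAM_ENTRIES


def rot_fits(read_lines, write_lines):
    """Assumed ROT limit: written lines plus the lines of the address log."""
    log_lines = math.ceil(read_lines / ADDRS_PER_LINE)
    return write_lines + log_lines <= TMCAM_ENTRIES


def execute(read_lines, write_lines):
    """Return the path a transaction of this footprint would end up on."""
    if write_lines == 0:
        return "URO"   # read-only: no instrumentation, no size bound
    if htm_fits(read_lines, write_lines):
        return "HTM"
    if rot_fits(read_lines, write_lines):
        return "ROT"
    return "GL"        # global lock as the last resort
```

For instance, a transaction reading 200 lines and writing one overflows the assumed HTM limit but fits in a ROT, since its address log needs only 13 lines.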