Self-Tuning Intel TSX Nuno Diegues and Paolo Romano HTDC 2014
PhD Thesis: Protocols and Abstractions for Efficient Transactional Systems • Reduce aborts • preserving consistency • shared memory / distributed • Efficient transactional indexation/search • Energy-efficiency of TM systems • More recently: how can we leverage hardware for efficient transactional systems?
Using TSX
    _xbegin
    // your transactional code
    _xend
May abort (transparently restarts). Causes: • Data contention • Forbidden instructions • Hardware buffers’ capacity • Signals and faults
Using TSX Best-effort nature: we cannot rely exclusively on TSX
Best-effort nature
Not *that* specific to Intel TSX: this partly applies to IBM HTMs too.
    begin:
      unsigned int status = _xbegin
      if (status == ok) goto code
      // retry policy
      if (shouldRetry) goto begin
      else acquire(lock)    // transactions need to be aware of this
    code:
      // your transactional code
      if (shouldRetry) _xend
      else release(lock)
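The fall-back pattern on this slide can be sketched in C. This is a minimal, single-threaded sketch and not the authors' code: `hw_begin_fn` is a hypothetical stand-in for `_xbegin()`, so the retry logic also runs on machines without TSX, and the fall-back lock is reduced to a flag. The names `tm_state`, `atomic_run`, `flaky_begin` and `increment` are assumptions made for illustration.

```c
/* Hypothetical stand-in for the value _xbegin() returns when the
   hardware transaction starts (immintrin.h calls it _XBEGIN_STARTED). */
#define HW_STARTED (~0u)

/* Hypothetical hook replacing _xbegin(): returns HW_STARTED or an abort status. */
typedef unsigned (*hw_begin_fn)(void *ctx);

typedef struct {
    int lock_held;      /* fall-back lock, 1 while held (a flag in this sketch) */
    int hw_commits;     /* atomic blocks completed in hardware */
    int fallback_runs;  /* atomic blocks completed under the lock */
} tm_state;

/* Runs body(arg) atomically: up to max_attempts hardware attempts,
   then the fall-back lock. */
static void atomic_run(tm_state *s, hw_begin_fn hw_begin, void *ctx,
                       int max_attempts, void (*body)(void *), void *arg)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (hw_begin(ctx) == HW_STARTED) {
            body(arg);
            s->hw_commits++;     /* real code would call _xend() here */
            return;
        }
        /* aborted: the loop is the retry policy */
    }
    s->lock_held = 1;            /* real code: acquire(lock) */
    body(arg);
    s->lock_held = 0;            /* real code: release(lock) */
    s->fallback_runs++;
}

/* Test double: aborts while *ctx > 0 (decrementing it), then starts. */
static unsigned flaky_begin(void *ctx)
{
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return 0u;               /* abort status with no cause bits set */
    }
    return HW_STARTED;
}

/* Example atomic block body: increments an int. */
static void increment(void *arg) { (*(int *)arg)++; }
```

With a budget of 4 attempts, a block that aborts twice still commits in hardware, while one that keeps aborting ends up under the lock.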
Summary of issues • Lemming effect • Number of attempts • Retry policy • Management of fall-back
Abort code: retry (transient failure) • conflict (data contention) • capacity (exceeded cache capacity) • explicit (_xabort invoked) • other …
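A retry policy has to decode the abort status before deciding what to do. The sketch below classifies a status word; the bit positions follow Intel's RTM abort status (the same values as the `_XABORT_*` macros in `immintrin.h`), but the `ABORT_*` and `CAUSE_*` names and the `worth_retrying` rule are assumptions of this sketch, not something the slides prescribe.

```c
#include <stdbool.h>

/* Status bits, mirroring _XABORT_EXPLICIT/_RETRY/_CONFLICT/_CAPACITY.
   Local names so the sketch builds without TSX support. */
enum {
    ABORT_EXPLICIT = 1 << 0,   /* _xabort invoked */
    ABORT_RETRY    = 1 << 1,   /* transient failure, may succeed on retry */
    ABORT_CONFLICT = 1 << 2,   /* data contention */
    ABORT_CAPACITY = 1 << 3    /* exceeded cache capacity */
};

typedef enum {
    CAUSE_EXPLICIT, CAUSE_CAPACITY, CAUSE_CONFLICT, CAUSE_RETRY, CAUSE_OTHER
} abort_cause;

/* Maps a status word to one dominant cause (the ordering is a choice of this sketch). */
static abort_cause classify_abort(unsigned status)
{
    if (status & ABORT_EXPLICIT) return CAUSE_EXPLICIT;
    if (status & ABORT_CAPACITY) return CAUSE_CAPACITY;
    if (status & ABORT_CONFLICT) return CAUSE_CONFLICT;
    if (status & ABORT_RETRY)    return CAUSE_RETRY;
    return CAUSE_OTHER;
}

/* One possible rule (an assumption): retry in hardware unless capacity was exceeded. */
static bool worth_retrying(unsigned status)
{
    return (status & ABORT_CAPACITY) == 0;
}
```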
Lemming Effect
    begin:
      wait until lock is free
      unsigned int status = _xbegin
      if (status == ok) goto code
      // retry policy
      if (shouldRetry) goto begin
      else acquire(lock)
      ...
Programming with HLE, or Afek et al. PPoPP 2013
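The "wait until the lock is free" step can be sketched with C11 atomics. This is an illustrative sketch under assumptions: `fallback_lock`, `lock_is_free` and `wait_until_free` are hypothetical names, and the spin is bounded only so that the function always returns. In real TSX code the transaction would also read the lock right after `_xbegin()` and abort if it is held, so that a later acquisition of the lock aborts the transaction.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fall-back lock: 0 when free, 1 when held. */
typedef struct { atomic_int held; } fallback_lock;

static bool lock_is_free(fallback_lock *l)
{
    return atomic_load_explicit(&l->held, memory_order_acquire) == 0;
}

/* Spins until the lock looks free or max_spins is reached, and returns
   the number of spins. Waiting here keeps a thread from burning its
   hardware attempts, and then taking the lock itself, while another
   thread is still running in the fall-back path. */
static int wait_until_free(fallback_lock *l, int max_spins)
{
    int spins = 0;
    while (!lock_is_free(l) && spins < max_spins)
        spins++;
    return spins;
}
```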
Number of attempts [Figure: Kmeans from STAMP. Speedup vs. number of retries, for high and low contention.] Gradient Descent for exploration
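The exploration of the number of retries can be illustrated with a one-dimensional hill climber, used here as a simple discrete stand-in for the gradient descent named on the slide. Everything in the sketch is an assumption for illustration: the `climber` struct, the step size of one, and `toy_throughput`, a synthetic curve that peaks at 6 attempts and is not measured data.

```c
/* State of a one-dimensional hill climber over the retry budget. */
typedef struct {
    int attempts;       /* current number of hardware attempts */
    int step;           /* +1 or -1: direction being explored */
    double last_perf;   /* performance seen at the previous setting */
    int min, max;       /* bounds of the search space */
} climber;

/* Feed the performance measured with c->attempts; returns the next setting. */
static int climber_next(climber *c, double perf)
{
    if (perf < c->last_perf)
        c->step = -c->step;              /* got worse: turn around */
    c->last_perf = perf;
    c->attempts += c->step;
    if (c->attempts < c->min) { c->attempts = c->min; c->step = 1; }
    if (c->attempts > c->max) { c->attempts = c->max; c->step = -1; }
    return c->attempts;
}

/* Runs a number of measure-and-adjust rounds. */
static int tune(climber *c, double (*measure)(int), int rounds)
{
    for (int i = 0; i < rounds; i++)
        climber_next(c, measure(c->attempts));
    return c->attempts;
}

/* Synthetic performance curve with its maximum at 6 attempts. */
static double toy_throughput(int attempts)
{
    double d = (double)(attempts - 6);
    return -d * d;
}
```

Starting from 1 attempt, the climber walks up to the peak and then oscillates within one step of it, which is the usual behaviour of a fixed-step search.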
Retry policy • How many attempts in hardware? • Give up on capacity aborts? • How to manage the fall-back?
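One way to make "give up on capacity aborts?" concrete is to treat it as a rule for the remaining budget of hardware attempts. The policy names below are borrowed from labels that appear in the deck's retry-policy plot (giveup, half, stubborn); the meaning given to each is an assumption of this sketch, and `budget_after_capacity_abort` is a hypothetical helper.

```c
/* Hypothetical reactions to a capacity abort. */
typedef enum { POLICY_GIVEUP, POLICY_HALF, POLICY_STUBBORN } capacity_policy;

/* Remaining hardware attempts after a capacity abort (assumes remaining > 0). */
static int budget_after_capacity_abort(capacity_policy p, int remaining)
{
    switch (p) {
    case POLICY_GIVEUP:   return 0;              /* go straight to the fall-back */
    case POLICY_HALF:     return remaining / 2;  /* drop half of the budget */
    case POLICY_STUBBORN: return remaining - 1;  /* treat it like any other abort */
    }
    return 0;
}
```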
Retry policy [Figure: speedup vs. threads (1 to 8) for the policies wait-stubborn-4, wait-giveup-4, wait-stubborn-11, wait-half-8, wait-half-11, aux-giveup-3 and none-giveup-1, compared with GCC and (Possible) Self-Tuning.] Reinforcement learning: Upper Confidence Bound
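Choosing among a few discrete retry policies with an Upper Confidence Bound rule can be sketched as a textbook UCB1 bandit. This is not the authors' implementation: the three arms, the assumption that rewards are normalised to [0, 1], and all names are illustrative. `sketch_sqrt` and `sketch_ln` are small stand-ins so the sketch builds without linking the math library; real code would use `sqrt` and `log` from `<math.h>`.

```c
#define N_ARMS 3   /* e.g. three candidate retry policies (an assumption) */

typedef struct {
    int    pulls[N_ARMS];         /* times each arm was tried */
    double total_reward[N_ARMS];  /* sum of rewards per arm, each in [0, 1] */
    int    total_pulls;
} ucb_state;

/* Newton iteration for the square root of x >= 0. */
static double sketch_sqrt(double x)
{
    if (x <= 0.0) return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Natural log for x >= 1: halve x into [1, 2], then use the atanh series. */
static double sketch_ln(double x)
{
    int k = 0;
    while (x > 2.0) { x *= 0.5; k++; }
    double t = (x - 1.0) / (x + 1.0), t2 = t * t, term = t, sum = 0.0;
    for (int n = 1; n < 40; n += 2) { sum += term / n; term *= t2; }
    return k * 0.6931471805599453 + 2.0 * sum;
}

/* UCB1: pick the arm maximising mean reward + sqrt(2 ln N / n_arm). */
static int ucb_select(const ucb_state *s)
{
    int best = 0;
    double best_score = -1.0;
    for (int a = 0; a < N_ARMS; a++) {
        if (s->pulls[a] == 0) return a;          /* try every arm once first */
        double mean  = s->total_reward[a] / s->pulls[a];
        double bonus = sketch_sqrt(2.0 * sketch_ln((double)s->total_pulls)
                                   / s->pulls[a]);
        if (mean + bonus > best_score) { best_score = mean + bonus; best = a; }
    }
    return best;
}

/* Records the reward observed after running with the chosen arm. */
static void ucb_update(ucb_state *s, int arm, double reward)
{
    s->pulls[arm]++;
    s->total_reward[arm] += reward;
    s->total_pulls++;
}
```

The bonus term shrinks as an arm is tried more often, so rarely tried policies keep getting occasional chances while the best-performing one is used most of the time.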
Self-Tuning TSX [Diagram: tuning workflow inside gcc libitm. The application's atomic block calls atomic_begin: fetch the atomic block's stats; Re-optimize? yes: profile cycles; no: fetch last configuration; Begin Tx, with the configuration governing retry management on abort. The application logic then executes the atomic block. atomic_end: End Tx; Re-optimize? yes: profile cycles, run grad(), run ucb(), which change the next configuration; no: continue the program.]
Quick flavour on results [Figure: Yada from STAMP. Throughput (1000 txs/sec) vs. execution time (sec) for GCC, Heuristic, AdaptiveLocks and Tuner, with a marker where the benchmark finished.]
Quick flavour on results [Figure: Intruder from STAMP. Speedup vs. threads (1 to 8), “ideal” vs. self-tuning.]
Summary • Best-effort HTMs need proper tuning • No one size fits all • We used lightweight exploration/learning techniques • Transparent to the programmer