Self-Tuning Intel TSX. Nuno Diegues and Paolo Romano, HTDC 2014


  1. Self-Tuning Intel TSX. Nuno Diegues and Paolo Romano, HTDC 2014

  2. PhD Thesis: Protocols and Abstractions for Efficient Transactional Systems
     • Reduce aborts while preserving consistency (shared memory / distributed)
     • Efficient transactional indexation/search
     • Energy-efficiency of TM systems
     • More recently: how can we leverage hardware for efficient transactional systems?

  3. Using TSX
     _xbegin
       // your transactional code
     _xend

  4. Using TSX
     _xbegin
       // your transactional code
     _xend
     May Abort

  5. Using TSX
     _xbegin
       // your transactional code
     _xend
     May Abort:
     • Data contention
     • Forbidden instructions
     • Hardware buffers’ capacity
     • Signals and faults

  6. Using TSX
     _xbegin
       // your transactional code
     _xend
     May Abort:
     • Data contention
     • Forbidden instructions
     • Hardware buffers’ capacity
     • Signals and faults
     Transparently restarts

  7. Using TSX
     Best-effort nature: we cannot rely exclusively on TSX

  8. Best-effort nature
     Not *that* specific to Intel TSX; this partly applies to IBM HTMs too.

  9. Best-effort nature
     Not *that* specific to Intel TSX; this partly applies to IBM HTMs too.
     begin:
       unsigned int status = _xbegin
       if (status == ok) goto code
       // retry policy
       if (shouldRetry) goto begin
       else acquire(lock)
     code:
       // your transactional code
       if (shouldRetry) _xend
       else release(lock)

  10. Best-effort nature
      Not *that* specific to Intel TSX; this partly applies to IBM HTMs too.
      begin:
        unsigned int status = _xbegin
        if (status == ok) goto code
        // retry policy
        if (shouldRetry) goto begin
        else acquire(lock)    <- Transactions need to be aware of this
      code:
        // your transactional code
        if (shouldRetry) _xend
        else release(lock)
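The begin/code pattern on this slide can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_atomic`, `htm_begin`, `htm_end`, `htm_abort`, `fallback_lock` and the attempt budget are hypothetical names. When the compiler defines `__RTM__` (for example GCC with `-mrtm`), the wrappers call the `_xbegin`, `_xend` and `_xabort` intrinsics from `<immintrin.h>`; otherwise a stub reports every attempt as aborted, so the sketch still runs through the lock path. Making transactions "aware" of the lock is done here by reading it inside the transaction and aborting if it is held.

```c
#include <stdatomic.h>

#ifdef __RTM__
#include <immintrin.h>
#define HTM_STARTED _XBEGIN_STARTED
static inline unsigned htm_begin(void) { return _xbegin(); }
static inline void htm_end(void) { _xend(); }
static inline void htm_abort(void) { _xabort(0xff); }
#else
/* Assumption: no RTM support at compile time, so every attempt "aborts". */
#define HTM_STARTED (~0u)
static inline unsigned htm_begin(void) { return 0u; }
static inline void htm_end(void) {}
static inline void htm_abort(void) {}
#endif

/* Hypothetical single global fall-back lock: 0 = free, 1 = held. */
static atomic_int fallback_lock;

static void lock_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0; /* a failed CAS overwrites expected with the observed value */
}

static void lock_release(void) { atomic_store(&fallback_lock, 0); }

typedef void (*critical_fn)(void *arg);

/* Runs body atomically.
 * Returns 1 if it committed in hardware, 0 if it ran under the lock. */
static int run_atomic(critical_fn body, void *arg, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* Wait until the lock is free before starting a transaction. */
        while (atomic_load(&fallback_lock)) { }

        unsigned status = htm_begin();
        if (status == HTM_STARTED) {
            /* Reading the lock puts it in the read set, so a later
             * acquire by another thread aborts this transaction. */
            if (atomic_load(&fallback_lock))
                htm_abort();
            body(arg);
            htm_end();
            return 1;
        }
        /* Aborted: status holds the abort code. A real retry policy
         * would inspect it here instead of blindly retrying. */
    }
    lock_acquire();
    body(arg);
    lock_release();
    return 0;
}

/* Example critical section: increments the int that arg points to. */
static void increment(void *arg) { ++*(int *)arg; }
```

A caller would write something like `run_atomic(increment, &counter, 4)`. Whether a given call commits in hardware or under the lock depends on the machine, so only the effect on `counter` is deterministic.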

  11. Summary of issues
      • Lemming effect
      • Number of attempts
      • Retry policy
      • Management of fall-back

      Abort codes:
      retry      Transient failure
      conflict   Contention to data
      capacity   Exceeded cache capacity
      explicit   _xabort invoked
      other      …

  12. Lemming Effect
      begin:
        wait lock is free
        unsigned int status = _xbegin
        if (status == ok) goto code
        // retry policy
        if (shouldRetry) goto begin
        else acquire(lock)
        ...
      Callout: Programming with HLE, or Afek et al., PPoPP 2013

  13. Number of attempts
      [Figure: Kmeans from STAMP; speedup versus number of retries, for high and low contention]

  14. Number of attempts
      [Figure: Kmeans from STAMP; speedup versus number of retries, for high and low contention]
      Gradient Descent for exploration
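One way to picture this exploration is a one-dimensional hill climber over the retry budget: keep moving the budget in the same direction while measured performance improves, and turn around when it drops. The sketch below is a simplified stand-in for the gradient descent named on the slide, not the authors' algorithm; the `climber` type, its fields and the bounds are hypothetical.

```c
#include <assert.h>

typedef struct {
    int attempts;     /* retry budget currently being evaluated */
    int direction;    /* +1 or -1: direction of the last move */
    double last_perf; /* performance measured for the previous budget */
    int has_last;     /* 0 until the first measurement arrives */
    int lo, hi;       /* inclusive bounds on the budget */
} climber;

static void climber_init(climber *c, int start, int lo, int hi) {
    assert(lo <= start && start <= hi);
    c->attempts = start;
    c->direction = +1;
    c->last_perf = 0.0;
    c->has_last = 0;
    c->lo = lo;
    c->hi = hi;
}

/* Feed the performance (for example throughput) measured while running
 * with c->attempts. Returns the next budget to try. */
static int climber_step(climber *c, double perf) {
    if (c->has_last && perf < c->last_perf)
        c->direction = -c->direction; /* the last move hurt: turn around */
    c->last_perf = perf;
    c->has_last = 1;

    int next = c->attempts + c->direction;
    if (next < c->lo) { next = c->lo; c->direction = +1; }
    if (next > c->hi) { next = c->hi; c->direction = -1; }
    c->attempts = next;
    return next;
}
```

On a curve with a single peak, such as the high-contention line in the figure, this climber walks up to the peak and then oscillates within one step of it. It does not escape local maxima, which is one reason a real tuner may need something more careful.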

  15. Retry policy • How many attempts in hardware? • Give up on capacity aborts? • How to manage the fall-back?
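These questions can be made concrete as a function that, after each abort, decides how much of the hardware retry budget remains. The sketch below is an assumed reading, not the authors' code: the policy names `giveup`, `half` and `stubborn` are borrowed from the configuration labels that appear later in the talk, and the semantics given to them here (drop the whole budget, halve it, or spend one attempt) are an assumption. The bit positions mirror the `_XABORT_*` status flags returned by `_xbegin`; they are redefined locally so the sketch compiles without `<immintrin.h>`.

```c
#include <assert.h>

/* Abort status bits, mirroring the _XABORT_* flags (redefined locally). */
enum {
    ABORT_EXPLICIT = 1u << 0, /* _xabort was invoked */
    ABORT_RETRY    = 1u << 1, /* transient: the transaction may succeed on retry */
    ABORT_CONFLICT = 1u << 2, /* data conflict with another thread */
    ABORT_CAPACITY = 1u << 3  /* hardware buffers overflowed */
};

/* Hypothetical policies for capacity aborts. */
typedef enum { CAP_GIVEUP, CAP_HALF, CAP_STUBBORN } capacity_policy;

/* Returns how many hardware attempts remain after an abort with the given
 * status. When the result reaches 0 the caller takes the fall-back path. */
static int attempts_left_after_abort(unsigned status, int left,
                                     capacity_policy policy) {
    assert(left >= 0);
    if (left == 0)
        return 0;
    if (status & ABORT_CAPACITY) {
        switch (policy) {
        case CAP_GIVEUP:   return 0;        /* capacity aborts rarely go away */
        case CAP_HALF:     return left / 2; /* burn the budget faster */
        case CAP_STUBBORN: return left - 1; /* treat like any other abort */
        }
    }
    return left - 1; /* conflict, explicit, or other abort: spend one attempt */
}
```

With a budget of 8, a capacity abort leaves 0, 4 or 7 attempts under the three policies, while a conflict abort always leaves 7.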

  16. Retry policy
      [Figure: speedup versus threads (1 to 8) for GCC, (Possible) Self-Tuning, and the static configurations wait-stubborn-4, wait-giveup-4, wait-stubborn-11, wait-half-8, wait-half-11, aux-giveup-3, none-giveup-1]

  17. Retry policy
      [Figure: speedup versus threads (1 to 8) for GCC, (Possible) Self-Tuning, and the static configurations wait-stubborn-4, wait-giveup-4, wait-stubborn-11, wait-half-8, wait-half-11, aux-giveup-3, none-giveup-1]
      Reinforcement learning: Upper Confidence Bound
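Upper Confidence Bound treats each candidate configuration as an arm of a multi-armed bandit and picks the arm whose average reward plus an uncertainty bonus is highest. The sketch below implements the textbook UCB1 rule under stated assumptions: mapping the three arms to retry policies, normalising rewards to [0, 1], and all names (`ucb_state`, `ucb_select`, `ucb_update`) are hypothetical, and this is not claimed to be the exact variant used in the talk. The small `ln_approx` and `sqrt_approx` helpers exist only to keep the sketch free of a libm link dependency; `log` and `sqrt` from `<math.h>` would normally be used instead.

```c
#include <assert.h>

#define NUM_ARMS 3 /* hypothetical: one arm per candidate configuration */

typedef struct {
    double reward_sum[NUM_ARMS];  /* total reward observed per arm */
    unsigned long pulls[NUM_ARMS];/* times each arm was chosen */
    unsigned long total_pulls;    /* sum of pulls over all arms */
} ucb_state;                      /* zero-initialise before use */

/* Natural log for x >= 1: halve x into [1, 2), then use
 * ln(x) = 2 * atanh((x - 1) / (x + 1)) as a power series. */
static double ln_approx(double x) {
    int halvings = 0;
    while (x >= 2.0) { x /= 2.0; halvings++; }
    double y = (x - 1.0) / (x + 1.0), y2 = y * y, term = y, sum = 0.0;
    for (int i = 1; i < 40; i += 2) { sum += term / i; term *= y2; }
    return 2.0 * sum + halvings * 0.6931471805599453; /* + halvings * ln 2 */
}

/* Square root by Newton iteration. */
static double sqrt_approx(double v) {
    if (v <= 0.0) return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 40; i++) r = 0.5 * (r + v / r);
    return r;
}

/* Returns the arm to try next: any untried arm first, otherwise the arm
 * maximising mean reward + sqrt(2 * ln(total pulls) / pulls of that arm). */
static int ucb_select(const ucb_state *s) {
    int best = 0;
    double best_index = -1.0;
    for (int a = 0; a < NUM_ARMS; a++) {
        if (s->pulls[a] == 0)
            return a;
        double mean = s->reward_sum[a] / (double)s->pulls[a];
        double bonus = sqrt_approx(2.0 * ln_approx((double)s->total_pulls)
                                   / (double)s->pulls[a]);
        if (mean + bonus > best_index) { best_index = mean + bonus; best = a; }
    }
    return best;
}

/* Records the reward (in [0, 1]) observed after running with the given arm. */
static void ucb_update(ucb_state *s, int arm, double reward) {
    assert(arm >= 0 && arm < NUM_ARMS);
    assert(reward >= 0.0 && reward <= 1.0);
    s->reward_sum[arm] += reward;
    s->pulls[arm]++;
    s->total_pulls++;
}
```

In use, each tuning round calls `ucb_select`, runs the atomic block with that configuration for a while, converts the measured performance into a reward, and reports it with `ucb_update`. Arms that perform poorly are still sampled occasionally, which lets the tuner notice if the workload changes.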

  18. Self-Tuning TSX
      [Flowchart: the application's atomic block runs between atomic_begin and atomic_end, implemented in gcc libitm.
       atomic_begin procedure: fetch atomic block's stats; Re-optimize? (yes: profile cycles; no: fetch last configuration); Begin Tx; govern retry management on abort/retry.
       atomic_end procedure: End Tx; Re-optimize? (yes: profile cycles, run grad(), run ucb(), changes next configuration; no: continue program).]
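The flow on this slide can be approximated by keeping per-atomic-block statistics and invoking a tuner every so often. The sketch below is a guess at the overall shape, not the libitm integration from the talk: `block_stats`, `tx_config`, `REOPT_PERIOD`, the `tuner_fn` callback and `example_tuner` are all hypothetical, the cycle counts are passed in by the caller (for example from a timestamp counter), and the placeholder tuner merely stands where the slide's grad() and ucb() steps would run.

```c
#include <assert.h>

typedef struct {
    int attempts;        /* hardware retry budget */
    int capacity_policy; /* opaque policy identifier */
} tx_config;

typedef struct {
    tx_config config;              /* configuration used for the next run */
    unsigned long executions;      /* completed runs of this atomic block */
    unsigned long long cycles;     /* cycles accumulated in the current period */
    unsigned long reoptimizations; /* how many times the tuner ran */
} block_stats;

/* Hypothetical tuner: given the current configuration and the average
 * cycles per transaction in the last period, returns the next one. */
typedef tx_config (*tuner_fn)(tx_config current, double avg_cycles);

enum { REOPT_PERIOD = 100 }; /* assumed: re-optimize every 100 executions */

/* atomic_begin side: fetch the configuration that governs the retries. */
static tx_config atomic_begin(const block_stats *b) { return b->config; }

/* atomic_end side: record the profile and re-optimize once per period. */
static void atomic_end(block_stats *b, unsigned long long elapsed_cycles,
                       tuner_fn tune) {
    assert(tune != 0);
    b->executions++;
    b->cycles += elapsed_cycles;
    if (b->executions % REOPT_PERIOD != 0)
        return; /* not time to re-optimize: continue the program */

    double avg_cycles = (double)b->cycles / REOPT_PERIOD;
    b->config = tune(b->config, avg_cycles); /* changes next configuration */
    b->cycles = 0;
    b->reoptimizations++;
}

/* Placeholder tuner for illustration only: bumps the budget by one. */
static tx_config example_tuner(tx_config current, double avg_cycles) {
    (void)avg_cycles;
    current.attempts++;
    return current;
}
```

Profiling only once per period keeps the overhead on the common path to a counter increment and an addition, which matches the slide's split between the "Re-optimize? no" and "Re-optimize? yes" branches.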

  19. Quick flavour on results
      [Figure: Yada from STAMP; throughput (1000 txs/sec) over execution time (sec) for GCC, Heuristic, AdaptiveLocks and Tuner; annotation: benchmark finished]

  20. Quick flavour on results
      [Figure: Intruder from STAMP; speedup versus threads (1 to 8) for “ideal” and self-tuning]

  21. Summary
      • Best-effort HTMs need proper tuning
      • No one size fits all
      • We used lightweight exploration/learning techniques
      • Transparent to the programmer
