  1. Self-Tuning Intel TSX. 3rd Euro-TM Workshop on Transactional Memory. Nuno Diegues and Paolo Romano. To appear at the 11th USENIX ICAC 2014.

  5. Using TSX
     _xbegin
       // your transactional code
     _xend
     May abort, due to:
     • Data contention
     • Forbidden instructions
     • Hardware buffers' capacity
     • Signals and faults
     Transparently restarts

  6. Using TSX: best-effort nature, so we cannot rely exclusively on TSX

  12. Best-effort nature
      Not *that* specific to Intel TSX; it partly applies to IBM HTMs too.
      begin:
        unsigned int status = _xbegin
        if (status == ok)
          goto code            // fast path
        if (shouldRetry)       // retry policy
          goto begin
        else
          acquire(lock)        // fallback
      code:
        // your transactional code
        if (shouldRetry)
          _xend                // fast path
        else
          release(lock)        // fallback
      Transactions need to be aware of this fallback.
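
The retry-then-fallback pattern on slide 12 can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: `atomic_run`, `fallback_lock`, `lock_acquire`, `lock_release` and `increment` are hypothetical names, the fallback is a single global spin lock, and the retry policy is simply "retry until the budget runs out". The RTM intrinsics (`_xbegin`, `_xend`, `_xabort`) are only compiled in when the compiler defines `__RTM__` (for example with `-mrtm`); otherwise every call takes the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#endif

atomic_int fallback_lock;   /* hypothetical single global lock: 0 = free, 1 = held */

void lock_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0;       /* spin until we manage to swap 0 -> 1 */
}

void lock_release(void) { atomic_store(&fallback_lock, 0); }

/* Runs body(arg) atomically: at most max_attempts hardware attempts, then the
   lock. Returns 1 if it committed in hardware, 0 if it used the fallback. */
int atomic_run(void (*body)(void *), void *arg, int max_attempts) {
    assert(max_attempts >= 0);
#if defined(__RTM__)
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        while (atomic_load(&fallback_lock))
            ;               /* wait for the lock to be free before starting */
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Read the lock inside the transaction: if another thread takes
               it later, this transaction is aborted by the conflict. */
            if (atomic_load(&fallback_lock))
                _xabort(0xff);
            body(arg);
            _xend();        /* fast path */
            return 1;
        }
        /* aborted: fall through and retry while budget remains */
    }
#endif
    lock_acquire();         /* fallback */
    body(arg);
    lock_release();
    return 0;
}

/* Example body: increment a shared counter. */
void increment(void *arg) { ++*(int *)arg; }
```

A call such as `atomic_run(increment, &counter, 4)` tries the hardware path up to four times before falling back. Reading the lock inside the transaction is one common way to make transactions aware of the fallback, and waiting for the lock to be free before starting is a usual mitigation for the lemming effect.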

  13. Summary of issues
      • Lemming effect
      • Number of attempts
      • Retry policy
      • Management of fall-back

  14. Summary of issues
      [Plot: Genome from STAMP suite; speedup vs. threads (1 to 8); series: GCC, (Possible) Self-Tuning. Configuration labels on the plot: wait-stubborn-4, wait-giveup-4, wait-stubborn-11, wait-stubborn-4, wait-half-8, wait-half-11, aux-giveup-3, none-giveup-1]

  16. Number of attempts
      [Plot: Kmeans from STAMP; speedup vs. retries (1 to 16); series: high contention, low contention]
      Gradient Descent for exploration

  27. Gradient Descent: tuning the number of attempts
      [Plot: performance vs. #attempts, with optimization rounds numbered 1 to 7; annotations: threshold for stabilization, random jump, memorize maxima, recover from unlucky jumps]
      • Randomly search some direction; explore it while profitable
      • Revert direction when not profitable
      • Random jumps to avoid local optima
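
One way to read the steps on this slide is as a hill-climbing loop over the number of attempts. The sketch below is an illustration under assumptions, not the tuner from the talk: `grad_state`, `grad_init`, `grad_step`, `grad_jump`, `clamp_attempts` and the 1 to 16 range are all hypothetical, the caller is assumed to measure performance itself and to pick random jump targets, and the stabilization threshold is left out.

```c
#include <assert.h>

#define MIN_ATTEMPTS 1    /* hypothetical bounds on the attempts budget */
#define MAX_ATTEMPTS 16

typedef struct {
    int attempts;         /* configuration currently being measured */
    int direction;        /* +1 or -1: direction being explored */
    int samples;          /* measurements fed so far */
    int jumped;           /* 1 if `attempts` was reached by a random jump */
    double last_perf;     /* performance of the previous configuration */
    int best_attempts;    /* memorized maximum ... */
    double best_perf;     /* ... and its performance */
} grad_state;

int clamp_attempts(int a) {
    if (a < MIN_ATTEMPTS) return MIN_ATTEMPTS;
    if (a > MAX_ATTEMPTS) return MAX_ATTEMPTS;
    return a;
}

void grad_init(grad_state *s, int attempts, int direction) {
    assert(direction == 1 || direction == -1);
    s->attempts = clamp_attempts(attempts);
    s->direction = direction;
    s->samples = 0;
    s->jumped = 0;
    s->last_perf = 0.0;
    s->best_attempts = s->attempts;
    s->best_perf = 0.0;
}

/* Feed the performance measured with s->attempts; returns the next value to try. */
int grad_step(grad_state *s, double perf) {
    if (s->samples == 0 || perf > s->best_perf) {   /* memorize maxima */
        s->best_perf = perf;
        s->best_attempts = s->attempts;
    }
    if (s->jumped && perf < s->best_perf) {         /* recover from an unlucky jump */
        s->jumped = 0;
        s->samples++;
        s->last_perf = s->best_perf;
        s->attempts = s->best_attempts;
        return s->attempts;
    }
    s->jumped = 0;
    if (s->samples > 0 && perf < s->last_perf)
        s->direction = -s->direction;               /* not profitable: revert direction */
    s->last_perf = perf;
    s->samples++;
    int next = clamp_attempts(s->attempts + s->direction);
    if (next == s->attempts) {                      /* hit a bound: turn around */
        s->direction = -s->direction;
        next = clamp_attempts(s->attempts + s->direction);
    }
    s->attempts = next;
    return next;
}

/* Random jump: the caller picks the target, e.g. with rand(). */
int grad_jump(grad_state *s, int target) {
    s->jumped = 1;
    s->attempts = clamp_attempts(target);
    return s->attempts;
}
```

On a performance curve with a single peak, repeated calls to `grad_step` walk toward the peak and then oscillate around it, while `best_attempts` keeps the best configuration seen so far.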

  28. Retry policy
      • Give up on capacity aborts?
      • How should we “consume” the attempts’ budget?
      • How to manage the fall-back?

  29. Retry policy: Reinforcement learning, Upper Confidence Bound

  32. UCB: tuning the retry policy
      Three levers with unknown payoff: Lever A, Lever B, Lever C
      • A quest for exploration vs. benefit from current knowledge
      • UCB adapts the strategy to maximize reward
      • Logarithmic bound on the optimization error

  33. UCB: tuning the retry policy
      Model the belief about capacity aborts:
      • giveup: exhaust attempts
      • half: drops half the attempts
      • stubborn: decrements attempts
      Reward: function of processor cycles (RDTSC)
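
The three ways of consuming the attempts' budget after a capacity abort can be written down as a small function. This is a sketch with hypothetical names (`capacity_policy`, `budget_after_abort`, `attempts_until_fallback`); treating every non-capacity abort as costing exactly one attempt is an assumption. On real TSX, a capacity abort is reported through the `_XABORT_CAPACITY` bit of the status returned by `_xbegin`.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { POLICY_GIVEUP, POLICY_HALF, POLICY_STUBBORN } capacity_policy;

/* Returns the attempts left after one aborted hardware attempt. */
int budget_after_abort(capacity_policy policy, int remaining, bool capacity_abort) {
    assert(remaining > 0);
    if (!capacity_abort)
        return remaining - 1;              /* assumption: other aborts cost one attempt */
    switch (policy) {
    case POLICY_GIVEUP:   return 0;              /* exhaust attempts */
    case POLICY_HALF:     return remaining / 2;  /* drop half the attempts */
    case POLICY_STUBBORN: return remaining - 1;  /* decrement attempts */
    }
    return 0;                              /* not reached for valid policies */
}

/* Counts the hardware attempts made before falling back when every attempt
   ends in a capacity abort. */
int attempts_until_fallback(capacity_policy policy, int budget) {
    int used = 0;
    while (budget > 0) {
        used++;
        budget = budget_after_abort(policy, budget, true);
    }
    return used;
}
```

With a budget of 8 and only capacity aborts, giveup makes one hardware attempt, half makes four (8, 4, 2, 1), and stubborn makes all eight.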

  35. Adaptation of one atomic block in Yada: the optimizers are *not* independent

  36. Transparency to the User
      [Flowchart across application, gcc and libitm]
      • atomic_begin: fetch the atomic block's stats; Re-optimize? (yes: profile cycles; no: fetch last configuration); Begin Tx procedure, which governs retry management (retry / abort)
      • application: execute the logic of the atomic block
      • atomic_end: End Tx procedure; Re-optimize? (yes: profile cycles, run grad(), run ucb(), change the next configuration; no: continue program)
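
The "Re-optimize?" and "Profile cycles" boxes suggest that only some executions of an atomic block are timed. The sketch below shows one possible shape for that bookkeeping; it is not taken from libitm. `block_stats`, `block_begin`, `block_end` and `SAMPLE_PERIOD` are hypothetical, and the caller is assumed to pass in a cycle count (for example from `__rdtsc()` on x86) and to run the tuners whenever `block_end` returns 1.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_PERIOD 100   /* hypothetical: profile one execution in every 100 */

typedef struct {
    uint64_t executions;       /* completed executions of this atomic block */
    uint64_t start_cycles;     /* timestamp taken at atomic_begin, if profiling */
    uint64_t profiled_cycles;  /* duration of the last profiled execution */
    int profiling;             /* 1 while the current execution is being timed */
} block_stats;

/* Called at atomic_begin. Returns 1 if this execution is being profiled. */
int block_begin(block_stats *b, uint64_t now) {
    b->profiling = (b->executions % SAMPLE_PERIOD == 0);
    if (b->profiling)
        b->start_cycles = now;
    return b->profiling;
}

/* Called at atomic_end. Returns 1 when a fresh measurement is available,
   i.e. when the caller should run its tuners on b->profiled_cycles. */
int block_end(block_stats *b, uint64_t now) {
    b->executions++;
    if (!b->profiling)
        return 0;              /* keep the last configuration and continue */
    assert(now >= b->start_cycles);
    b->profiled_cycles = now - b->start_cycles;
    b->profiling = 0;
    return 1;
}
```

Sampling keeps the profiling cost off most executions; the period of 100 is arbitrary here and would itself need tuning.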

  39. Summary of Evaluation

  41. Peek view on results
      [Plot: Intruder from STAMP; speedup vs. threads (1 to 8); series: “ideal”, self-tuning]

  42. Peek view on results
      [Plot: Yada with 8 threads; throughput (1000 txs/sec) vs. execution time (sec); series: GCC, Heuristic, AdaptiveLocks, Tuner; marker: benchmark finished]
