
Exploiting Semantic Commutativity in Hardware Speculation - PowerPoint PPT Presentation

Guowei Zhang, Virginia Chiu, Daniel Sanchez. MICRO 2016.


  1. Exploiting Semantic Commutativity in Hardware Speculation. Guowei Zhang, Virginia Chiu, Daniel Sanchez. MICRO 2016.

  2. Executive summary
     - Exploiting commutativity benefits update-heavy apps.
       - Software techniques that exploit commutativity incur high run-time overheads (STM is 2-6x slower than HTM).
       - Prior hardware exploits only single-instruction commutative operations (e.g., addition).
     - CommTM exploits multi-instruction commutativity.
       - Extends the coherence protocol to perform commutative operations locally and concurrently.
       - Leverages HTM to support multi-instruction updates.
       - Benefits speculative execution by reducing conflicts.
       - Accelerates full applications by up to 3.4x at 128 cores.

  3. Commutativity
     - Commutative operations produce equivalent results when reordered.
       - No true data dependence → no need for communication.
       - Software exploits commutativity but incurs high run-time overheads.
     [Figure: single-instruction commutativity (ADD, MIN, OR), exploited by Coup [Zhang et al., MICRO 2015], versus multi-instruction commutativity (set insertion, ordered put, top-K insertion), exploited by CommTM.]
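A minimal C sketch of what "produce equivalent results when reordered" means for the single-instruction operations named on the slide: applying two ADD, MIN, or OR updates in either order yields the same value, while a plain overwrite (the hypothetical `op_set`) does not. All function names here are illustrative, not part of CommTM.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-instruction updates from the slide: new value = f(current, operand). */
static int op_add(int v, int x) { return v + x; }
static int op_min(int v, int x) { return x < v ? x : v; }
static int op_or(int v, int x)  { return v | x; }

/* A plain overwrite, included only as a non-commutative contrast. */
static int op_set(int v, int x) { (void)v; return x; }

typedef int (*update_fn)(int, int);

/* True if applying updates x and y to init gives the same result in either order. */
static bool commutes(update_fn f, int init, int x, int y) {
    int xy = f(f(init, x), y);   /* x first, then y */
    int yx = f(f(init, y), x);   /* y first, then x */
    return xy == yx;
}
```

For instance, `commutes(op_add, 20, 1, 5)` holds, but `commutes(op_set, 0, 1, 2)` does not: the final value depends on which store came last.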

  4. Commutativity
     - Multi-instruction example: set (linked-list) insertion.
       - Starting from an empty list (head → null), insert(a); insert(b); produces head → a → b → null.
       - insert(b); insert(a); produces head → b → a → null.
       - These are different but semantically equivalent states: both lists represent the set {a, b}.
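The set-insertion example can be sketched in C as follows. This is an illustrative model, not code from the deck: it assumes `insert()` appends at the tail (matching the drawing, where insert(a); insert(b) gives a → b) and skips keys already present, and it uses a small fixed node pool to stay allocation-free. The two insertion orders leave different lists that hold the same set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Singly linked list used as a set of chars (hypothetical, for illustration). */
typedef struct Node {
    char key;
    struct Node* next;
} Node;

typedef struct {
    Node* head;
    Node pool[POOL_SIZE];   /* fixed node storage keeps the sketch malloc-free */
    int used;
} List;

static void list_init(List* l) {
    l->head = NULL;
    l->used = 0;
}

static bool contains(const List* l, char key) {
    for (const Node* n = l->head; n != NULL; n = n->next)
        if (n->key == key) return true;
    return false;
}

/* Set insertion: ignore duplicates, otherwise append at the tail (assumption). */
static void insert(List* l, char key) {
    if (contains(l, key)) return;
    assert(l->used < POOL_SIZE);
    Node* n = &l->pool[l->used++];
    n->key = key;
    n->next = NULL;
    Node** p = &l->head;
    while (*p != NULL) p = &(*p)->next;   /* walk to the tail */
    *p = n;
}

static int length(const List* l) {
    int len = 0;
    for (const Node* n = l->head; n != NULL; n = n->next) len++;
    return len;
}

/* Semantic equivalence: same number of (distinct) keys, and every key of a is in b. */
static bool same_set(const List* a, const List* b) {
    if (length(a) != length(b)) return false;
    for (const Node* n = a->head; n != NULL; n = n->next)
        if (!contains(b, n->key)) return false;
    return true;
}
```

Inserting a then b into one list and b then a into another gives lists whose heads differ (a versus b), yet `same_set()` reports them equal. Lists must be passed by pointer, since each node points into its own list's pool.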

  5. Example: addition in conventional HTM

         void add(int* counter, int delta) {
             tx_begin();
             int v = load(counter);
             int nv = v + delta;
             store(counter, nv);
             tx_end();
         }


  7. Example: addition in conventional HTM. A initially holds 20, and Core 0 and Core 1 each call add(A, 1) twice:

         Core 0: add(A, 1); add(A, 1);
         Core 1: add(A, 1); add(A, 1);

  8. Example: addition in conventional HTM. Txn 0 (add(A, 1) on Core 0) and Txn 1 (add(A, 1) on Core 1) both execute load A, so A is in both transactions' read sets.

  9. Example: addition in conventional HTM. Txn 0 then executes store A, adding A to its write set while A is still in Txn 1's read set: conflict!

  10. Example: addition in conventional HTM. The conflict aborts Txn 1.

  11. Example: addition in conventional HTM. Txn 0 commits and Txn 1 restarts, loading A again. The later transactions (Txn 2 and Txn 3) keep conflicting in the same way, causing further aborts and restarts; once all four increments commit, A = 24.

  12. Example: addition in conventional HTM. Costs of this execution: coherence traffic, serialization of the updates, and wasted transactional work in the aborted transactions.
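The conflict pattern above can be reproduced with a toy, deterministic C model. It is a sketch under simplifying assumptions, not the paper's simulator: two cores interleave round-robin, each transaction is a load of A followed by a combined store-and-commit, and a store aborts any other transaction that has A in its read set (eager conflict detection where the writer wins). The names `simulate_htm`, `Core`, and `Result` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 2

/* Hypothetical per-core state in a toy HTM model. */
typedef struct {
    int remaining;   /* add(A, 1) transactions still to commit */
    bool has_read;   /* A is in the running transaction's read set */
    int v;           /* value returned by the transactional load of A */
} Core;

typedef struct {
    int A;
    int aborts;
    int commits;
} Result;

/* Round-robin interleaving: on its turn, a core either loads A or, if it
   already loaded A, stores v + 1 and commits. The store aborts every other
   transaction with A in its read set; an aborted transaction restarts by
   loading A again on its next turn. */
static Result simulate_htm(int initial, int adds_per_core) {
    assert(adds_per_core >= 0);
    Core cores[NCORES];
    Result r = { initial, 0, 0 };
    for (int c = 0; c < NCORES; c++)
        cores[c] = (Core){ adds_per_core, false, 0 };

    bool work_left = true;
    while (work_left) {
        work_left = false;
        for (int c = 0; c < NCORES; c++) {
            Core* me = &cores[c];
            if (me->remaining == 0) continue;
            work_left = true;
            if (!me->has_read) {
                me->v = r.A;                 /* load(counter) */
                me->has_read = true;
            } else {
                for (int o = 0; o < NCORES; o++) {
                    if (o != c && cores[o].has_read) {
                        cores[o].has_read = false;   /* conflict: abort */
                        r.aborts++;
                    }
                }
                r.A = me->v + 1;             /* store(counter, nv); tx_end() */
                me->has_read = false;
                me->remaining--;
                r.commits++;
            }
        }
    }
    return r;
}
```

Starting from A = 20 with two add(A, 1) calls per core, this interleaving commits all four increments (A = 24) but aborts three transactions along the way.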

  13. Example: addition in CommTM. Same add() function and workload: A initially holds 20, and Core 0 and Core 1 each call add(A, 1) twice.

  14. Example: addition in CommTM. The load and store in add() are labeled with the commutative ADD operation:

          void add(int* counter, int delta) {
              tx_begin();
              int v = load[ADD](counter);
              int nv = v + delta;
              store[ADD](counter, nv);
              tx_end();
          }

  15. Example: addition in CommTM. Both cores hold a copy of A labeled ADD: one copy holds 20 and the other holds 0, the identity for addition.

  16. Example: addition in CommTM. Txn 0 and Txn 1 each execute load[ADD] A, which reads the local copy on their own core.

  17. Example: addition in CommTM. Both transactions then execute store[ADD] A. Each reads and writes A, yet there is no conflict, because the labeled accesses commute.

  18. Example: addition in CommTM. Both transactions commit; the local copies now hold 21 and 1.

  19. Example: addition in CommTM. Txn 2 and Txn 3 repeat the same load[ADD]/store[ADD] steps locally and commit; the copies now hold 22 and 2.
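The local updates can be modeled in C as follows, assuming (as the slides show) that one core's copy of A starts with the memory value 20 and the other with 0, the identity for ADD. This is a hypothetical model: `AddLine`, `acquire_add`, and `tx_add_local` are illustrative names, and plain array accesses stand in for load[ADD]/store[ADD]. Because each transaction touches only its own core's partial value, there is nothing to conflict on.

```c
#include <assert.h>

#define NCORES 2

/* Toy model of A held in the ADD-labeled state: one partial value per core. */
typedef struct {
    int partial[NCORES];
} AddLine;

/* Assumption: core 0's copy starts with the memory value, the rest with 0. */
static void acquire_add(AddLine* line, int memory_value) {
    line->partial[0] = memory_value;
    for (int c = 1; c < NCORES; c++)
        line->partial[c] = 0;   /* identity element for ADD */
}

/* add(A, delta) with labeled accesses: reads and writes only the local
   partial value, so concurrent adds on other cores never conflict. */
static void tx_add_local(AddLine* line, int core, int delta) {
    assert(core >= 0 && core < NCORES);
    int v = line->partial[core];    /* load[ADD](counter) */
    int nv = v + delta;
    line->partial[core] = nv;       /* store[ADD](counter, nv) */
}
```

Two rounds of `tx_add_local` on cores 0 and 1 leave partial values of 22 and 2, as in the slide.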

  20. Example: addition in CommTM. A conventional (unlabeled) load A is then issued.

  21. Example: addition in CommTM. The conventional load triggers a user-defined reduction that merges the partial copies (22 + 2), so the load observes A = 24.
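A sketch of the reduction step in C, following the shape of the deck's reduce[ADD] handler with plain accesses in place of the labeled load/store. The function `load_with_reduction` is hypothetical: it models a conventional load that gathers every core's partial value and folds them together, which is how the partial copies 22 and 2 become the single value 24.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as reduce[ADD]; plain accesses stand in for load[ADD]/store[ADD]. */
static void reduce_add(int* counter, int delta) {
    int v = *counter;
    int nv = v + delta;
    *counter = nv;
}

/* Hypothetical model of a conventional load of a value held in the ADD state:
   fold every core's partial value into one coherent value. */
static int load_with_reduction(const int* partials, size_t ncores) {
    assert(ncores > 0);
    int value = partials[0];
    for (size_t c = 1; c < ncores; c++)
        reduce_add(&value, partials[c]);
    return value;
}
```

Because ADD is commutative and associative, the order in which the partial values are folded does not change the result.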

  22. Example: addition in CommTM. Benefits of this execution:
      - Concurrent updates
      - Less traffic
      - Less wasted transactional work
      - Lower run-time and memory overheads than STM

  23. CommTM

  24. Programming interface
      - Transactional update, using labeled loads/stores:

            void add(int* counter, int delta) {
                tx_begin();
                int v = load[ADD](counter);
                int nv = v + delta;
                store[ADD](counter, nv);
                tx_end();
            }

      - Non-transactional reduction handler; for example, reducing a delta of 20 into a counter holding 16 yields 36:

            void reduce[ADD](int* counter, int delta) {
                int v = load[ADD](counter);
                int nv = v + delta;
                store[ADD](counter, nv);
            }
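To illustrate that labels are user-defined rather than fixed to ADD, here is a hedged C sketch of the same two-part interface for a hypothetical MIN label: a transactional update that would run between tx_begin() and tx_end() using load[MIN]/store[MIN], and a non-transactional handler that merges a partial value. The identity `INT_MAX` for fresh local copies is an assumption by analogy with ADD's 0, and plain pointer accesses stand in for the labeled instructions.

```c
#include <assert.h>
#include <limits.h>

/* Assumed identity for a hypothetical MIN label (analogous to 0 for ADD). */
#define MIN_IDENTITY INT_MAX

static int min_int(int a, int b) { return a < b ? a : b; }

/* Transactional update: in CommTM this body would sit between tx_begin() and
   tx_end() and use load[MIN]/store[MIN]; plain accesses stand in here. */
static void update_min(int* local, int x) {
    int v = *local;               /* load[MIN](local) */
    *local = min_int(v, x);       /* store[MIN](local, nv) */
}

/* Non-transactional reduction handler: merge one partial value into counter. */
static void reduce_min(int* counter, int partial) {
    int v = *counter;
    *counter = min_int(v, partial);
}
```

A core whose local copy sees updates 30 and 9 holds 9; reducing it into a memory value of 16 gives 9, and reducing an untouched copy (`INT_MAX`) leaves the result unchanged.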

  25. Handling arbitrary object sizes
      - For objects smaller than a cache line, assume lines are full of aligned elements and reduce all of them:

            void reduce[ADD](int* counterLine, int* deltas) {
                for (int i = 0; i < intsPerCacheLine; i++) {
                    int v = load[ADD](&counterLine[i]);
                    int nv = v + deltas[i];
                    store[ADD](&counterLine[i], nv);
                }
            }
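A runnable C rendering of the line-granularity reduce[ADD] handler, with plain accesses in place of load[ADD]/store[ADD]. The 64-byte line size is an assumption (the deck does not state one), so `INTS_PER_LINE` is 16 on a platform with 4-byte `int`. Assuming, as in the CommTM example, that a core's local copy starts at ADD's identity 0, elements it never updated contribute deltas of 0, so reducing the whole line element-wise leaves them unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; the deck leaves the line size unspecified. */
#define LINE_BYTES 64
#define INTS_PER_LINE (LINE_BYTES / sizeof(int))

/* Line-granularity reduce[ADD]: treat the line as an array of aligned ints
   and add each delta into the matching element. */
static void reduce_add_line(int* counterLine, const int* deltas) {
    for (size_t i = 0; i < INTS_PER_LINE; i++) {
        int v = counterLine[i];           /* load[ADD](&counterLine[i]) */
        int nv = v + deltas[i];
        counterLine[i] = nv;              /* store[ADD](&counterLine[i], nv) */
    }
}
```

For example, a line whose first element is 16, reduced with a delta line whose first two elements are 20 and 5, ends up holding 36 and 5 in those positions and 0 elsewhere.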

