Exploiting Semantic Commutativity in Hardware Speculation
Guowei Zhang, Virginia Chiu, Daniel Sanchez
MICRO 2016
Executive summary
- Exploiting commutativity benefits update-heavy applications
  - Software techniques that exploit commutativity incur high run-time overheads (STM is 2-6x slower than HTM)
  - Prior hardware exploits only single-instruction commutative operations (e.g., addition)
- CommTM exploits multi-instruction commutativity
  - Extends the coherence protocol to perform commutative operations locally and concurrently
  - Leverages HTM to support multi-instruction updates
  - Benefits speculative execution by reducing conflicts
  - Accelerates full applications by up to 3.4x at 128 cores
Commutativity
- Commutative operations produce equivalent results when reordered
  - No true data dependence → no need for communication
  - Software exploits commutativity but incurs high run-time overheads
- Single-instruction commutativity (ADD, MIN, OR): Coup [Zhang et al., MICRO 2015]
- Multi-instruction commutativity (top-K insertion, set insertion, ordered put): CommTM
Commutativity: multi-instruction example
- Set (linked-list) insertion, starting from an empty list (head → null)
  - insert(a); insert(b); leaves the list holding a, b
  - insert(b); insert(a); leaves the list holding b, a
- Different but semantically equivalent states
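The set-insertion example can be made concrete with a small sketch in plain C. The `node` type and the `set_insert`, `set_contains`, and `set_equivalent` helpers are hypothetical names introduced here, and inserting at the head of the list is an assumption (the slide does not say where new nodes go). The point is only that the two insertion orders produce different node layouts that represent the same set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked list node representing a set of ints. */
typedef struct node {
    int key;
    struct node *next;
} node;

/* Returns true if key is present in the list. */
bool set_contains(const node *head, int key) {
    for (const node *n = head; n != NULL; n = n->next)
        if (n->key == key) return true;
    return false;
}

/* Inserts key at the head unless already present; returns the new head. */
node *set_insert(node *head, int key) {
    if (set_contains(head, key)) return head;
    node *n = malloc(sizeof *n);
    assert(n != NULL); /* sketch: treat allocation failure as fatal */
    n->key = key;
    n->next = head;
    return n;
}

/* Returns true if every key in a also appears in b. */
static bool set_subset(const node *a, const node *b) {
    for (const node *n = a; n != NULL; n = n->next)
        if (!set_contains(b, n->key)) return false;
    return true;
}

/* Two lists are equivalent as sets if each contains the other's keys. */
bool set_equivalent(const node *a, const node *b) {
    return set_subset(a, b) && set_subset(b, a);
}
```

Building one list with insert(1); insert(2); and another with insert(2); insert(1); gives lists whose head nodes differ, yet `set_equivalent` reports them as the same set.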
Example: addition in conventional HTM
void add(int* counter, int delta) {
    tx_begin();
    int v = load(counter);
    int nv = v + delta;
    store(counter, nv);
    tx_end();
}
Scenario: shared counter A starts at 20; Core 0 and Core 1 each run add(A, 1) twice.
- Txn 0 and Txn 1 both load A
- One transaction stores A, which conflicts with the other core's read; the other transaction aborts, and the storing transaction commits
- The aborted transaction restarts and reloads A, then conflicts with the next transaction in the same way
- After three aborts and four commits, A is 24
Costs: traffic, serialization, wasted transactional work
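The abort pattern above can be reproduced with a rough software model. This is a sketch under strong assumptions, not a description of any real HTM: cores run in lockstep rounds, every core with pending work reads the counter each round, exactly one transaction commits per round, and the winner is the transaction with the most retries (ties go to the lowest core id). The names `htm_stats` and `simulate_conventional_htm` are hypothetical.

```c
#include <assert.h>

#define MAX_CORES 8

typedef struct {
    int final_value; /* counter value after all adds commit */
    int commits;     /* committed transactions */
    int aborts;      /* aborted (and later restarted) transactions */
} htm_stats;

/* Lockstep model: each core performs adds_per_core increments of 1. */
htm_stats simulate_conventional_htm(int initial, int cores, int adds_per_core) {
    assert(cores > 0 && cores <= MAX_CORES && adds_per_core >= 0);
    int remaining[MAX_CORES];
    int retries[MAX_CORES];
    for (int c = 0; c < cores; c++) {
        remaining[c] = adds_per_core;
        retries[c] = 0;
    }

    htm_stats s = { initial, 0, 0 };
    for (;;) {
        /* Every core with pending work loads the counter; pick a winner. */
        int winner = -1;
        for (int c = 0; c < cores; c++) {
            if (remaining[c] == 0) continue;
            if (winner < 0 || retries[c] > retries[winner]) winner = c;
        }
        if (winner < 0) break; /* all adds have committed */

        /* The winner's store conflicts with the other readers: they abort. */
        for (int c = 0; c < cores; c++) {
            if (c == winner || remaining[c] == 0) continue;
            retries[c]++;
            s.aborts++;
        }
        s.final_value += 1;
        s.commits++;
        remaining[winner]--;
        retries[winner] = 0;
    }
    return s;
}
```

With an initial value of 20, two cores, and two adds per core, this model ends at 24 after four commits and three aborts, so nearly as much work is thrown away as is kept.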
Example: addition in CommTM
void add(int* counter, int delta) {
    tx_begin();
    int v = load[ADD](counter);
    int nv = v + delta;
    store[ADD](counter, nv);
    tx_end();
}
Scenario: shared counter A starts at 20; Core 0 and Core 1 each run add(A, 1) twice.
- Both cores hold A in the ADD state: one copy holds 20, the other holds 0
- Txn 0 and Txn 1 each load[ADD] A and store[ADD] A on their local copy, and both commit without conflict (copies: 21 and 1)
- Txn 2 and Txn 3 do the same (copies: 22 and 2)
- A later unlabeled load A triggers a user-defined reduction that merges the copies: A is 24
Benefits: less traffic, concurrent updates, less wasted transactional work, less run-time/memory overhead than STM
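The same scenario can be mimicked in software to show why the labeled updates never conflict. This is only a model of the idea, assuming one private partial value per core and a reduction on an unlabeled read; the real mechanism lives in the coherence protocol. `add_line`, `line_init`, `add_local`, and `load_reduced` are hypothetical names.

```c
#include <assert.h>

#define NUM_CORES 2

/* Hypothetical model of one counter cached by every core in the ADD state. */
typedef struct {
    int partial[NUM_CORES]; /* each core's private copy */
} add_line;

/* Core 0 starts with the real value; the others start at 0,
 * the identity value for addition. */
void line_init(add_line *line, int value) {
    line->partial[0] = value;
    for (int c = 1; c < NUM_CORES; c++) line->partial[c] = 0;
}

/* Models a load[ADD]/store[ADD] pair: touches only this core's copy. */
void add_local(add_line *line, int core, int delta) {
    assert(core >= 0 && core < NUM_CORES);
    line->partial[core] += delta;
}

/* Models an unlabeled load: merge all copies, then return the total. */
int load_reduced(add_line *line) {
    int total = 0;
    for (int c = 0; c < NUM_CORES; c++) total += line->partial[c];
    line_init(line, total); /* the merged value now lives in one copy */
    return total;
}
```

Starting from 20, two `add_local` calls per core leave the copies at 22 and 2, and `load_reduced` then returns 24, matching the values on the slide.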
CommTM
Programming interface
Transactional update (labeled loads/stores):
void add(int* counter, int delta) {
    tx_begin();
    int v = load[ADD](counter);
    int nv = v + delta;
    store[ADD](counter, nv);
    tx_end();
}
Non-transactional reduction handler:
void reduce[ADD](int* counter, int delta) {
    int v = load[ADD](counter);
    int nv = v + delta;
    store[ADD](counter, nv);
}
Example: counter = 16, delta = 20; after reduce[ADD], counter = 36
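One way to picture user-defined reduction handlers in ordinary C is a table of functions indexed by label. This is an illustrative sketch only: the `label` enum, the `reduce_fn` type, and the dispatch through `reduce` are assumptions made here, and MIN is included simply because the deck lists it among commutative operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels, one per commutative operation. */
typedef enum { LABEL_ADD, LABEL_MIN, LABEL_COUNT } label;

/* A handler merges an incoming partial value into the local copy. */
typedef void (*reduce_fn)(int *local, int incoming);

static void reduce_add(int *local, int incoming) {
    *local += incoming;
}

static void reduce_min(int *local, int incoming) {
    if (incoming < *local) *local = incoming;
}

/* Handler table; the order must match the label enum. */
static const reduce_fn handlers[LABEL_COUNT] = { reduce_add, reduce_min };

/* Dispatches to the handler registered for the given label. */
void reduce(label l, int *local, int incoming) {
    assert(local != NULL && l < LABEL_COUNT);
    handlers[l](local, incoming);
}
```

With `counter` at 16, calling `reduce(LABEL_ADD, &counter, 20)` leaves it at 36, the same numbers used in the slide's diagram.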
Handling arbitrary object sizes
- For objects smaller than a cache line, assume lines are full of aligned elements and reduce all of them
void reduce[ADD](int* counterLine, int* deltas) {
    // Reduce every element in the line, one delta per counter
    for (int i = 0; i < intsPerCacheLine; i++) {
        int v = load[ADD](&counterLine[i]);
        int nv = v + deltas[i];
        store[ADD](&counterLine[i], nv);
    }
}
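The whole-line reduction can be written as ordinary C for experimentation. This sketch assumes a 64-byte cache line and replaces the labeled loads and stores with plain memory accesses; `CACHE_LINE_BYTES`, `INTS_PER_CACHE_LINE`, and `reduce_add_line` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64 /* assumption: typical line size */
#define INTS_PER_CACHE_LINE (CACHE_LINE_BYTES / sizeof(int))

/* Adds each delta to the matching counter across one full cache line. */
void reduce_add_line(int *counter_line, const int *deltas) {
    assert(counter_line != NULL && deltas != NULL);
    for (size_t i = 0; i < INTS_PER_CACHE_LINE; i++)
        counter_line[i] += deltas[i];
}
```

In this model, any slot whose delta is 0 is left unchanged, which is why reducing every element in the line is harmless even when only one counter was actually updated.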