27th Symposium on Parallel Architectures and Algorithms
Seer: Probabilistic Scheduling for Commodity Hardware Transactional Memory
Nuno Diegues, Paolo Romano and Stoyan Garbatov
Seer: Scheduling for Commodity HTM SPAA 2015
The multi-core (r)evolution
• Shared-memory multi-cores are now ubiquitous
• Concurrent programming is complex
Transactional Memory
[Figure: System with CPU 1, CPU 2, CPU 3, CPU 4]
Classic approach: Locking
Hard to get right:
• fine-grained locks
• deadlocks
• correctness
Transactional Memory abstraction
    atomic {
        withdraw(acc1,val);
        deposit(acc2,val);
    }
• Programmer identifies atomic blocks
• Runtime implements synchronization
Too much optimism
[Figure: concurrent operations y = x and x++]
• Problem: CPU time is wasted
  • run other computations instead
  • inhibit parallelism
  • improve cache usage
  • increase core frequency
  • reduce power consumption
Identify likely conflicts before they happen
Scheduler
• Software TM (STM): library has full concurrency control
  • can pinpoint precisely the culprit of the conflict
• Hardware TM (HTM): feedback is quite limited
  • rough categorization for the type of conflict
HTM available in commodity processors
Objective: Scheduling for Commodity HTM
How to find the root cause for the data conflict?
Avoid running T1 and T2 concurrently
In an ideal world for HTMs …
    xbegin
    withdraw(acc1,val)
    deposit(acc2,val)
    xend
Transactions may abort:
• because of contention on same memory locations
Transactions restart
… and every transaction shall eventually succeed
… in practice: HTMs are Best-Effort
No progress guarantees:
• A transaction may always abort
… due to a number of reasons:
• Forbidden instructions
• Capacity of caches (for reads and writes)
• Faults and signals
• Contending transactions, aborting each other
Single Global Lock
SGL fall-back path for HTM
• Hardware transaction executes if SGL is free
• Acquire SGL depending on retry policy
• SGL is a very simple scheduler
  • Ignores the root cause
  • Takes a global decision: the SGL
• Adaptive Transaction Scheduling [SPAA08]
We need better Scheduling for Commodity HTMs
Related Work
Scheduler            Support for HTM?   Support for Imprecise Information?   Schedules Transactions in a Fine-Grained Fashion?
ATS [SPAA08]         Yes                Yes                                  No
CAR-STM [PODC08]     No                 No                                   Yes
Shrink [PODC09]      No                 No                                   Yes
ProPS [Euro-Par14]   No                 No                                   Yes
SER [PPoPP10]        No                 No                                   Yes
TxLinux [SOSP07]     Yes                No                                   Yes
SOA [HiPEAC09/10]    Yes                No                                   Yes
Seer                 Yes                Yes                                  Yes
Key Idea
• Transactions to be executed are announced
• Many observations are collected
  • upon transaction commit and abort
  • which transactions were active at the same time?
• Over time, the outliers will be identifiable w.h.p.
• A dynamic, fine-grained locking scheme is devised
Seer: overview
[Figure labels: Transaction = source code transaction; active transactions]
Seer: details
• Threads collect lightweight events independently (low overhead)
• Locking scheme (re-)calculated periodically
  • One lock per transaction (atomic block in the application)
  • T1 lock (L1) taken by T2 if they are deemed to conflict
  • T1 waits for L1 to be free before executing
• Calculate conditional probabilities of commit/abort
• Relevance threshold based on mean/stdev
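The last two bullets can be illustrated with a small sketch. It is not Seer's actual implementation: the counter layout (`aborts[x][y]`, `commits[x][y]`), the function name `conflict_pairs`, and the default values of `th_prob` and `th_sigma` are all assumptions made for illustration. It estimates the probability that x aborts given that y was active, then keeps only the pairs that stand out from the per-transaction mean by some number of standard deviations.

```python
from statistics import mean, pstdev

def conflict_pairs(aborts, commits, th_prob=0.3, th_sigma=1.0):
    """Return the set of (x, y) pairs judged likely to conflict.

    aborts[x][y]  = times x aborted while y was announced as active
    commits[x][y] = times x committed while y was announced as active
    (hypothetical counter layout, assumed for this sketch)
    """
    n = len(aborts)
    pairs = set()
    for x in range(n):
        # Estimate P(x aborts | y running concurrently) from the counters.
        probs = []
        for y in range(n):
            total = aborts[x][y] + commits[x][y]
            probs.append(aborts[x][y] / total if total else 0.0)
        mu, sigma = mean(probs), pstdev(probs)
        for y in range(n):
            common_enough = probs[y] > th_prob            # absolute cut-off
            main_cause = probs[y] > mu + th_sigma * sigma  # outlier for x
            if common_enough and main_cause:
                pairs.add((x, y))
    return pairs
```

With three atomic blocks where blocks 0 and 1 mostly abort when the other is active, only the pairs (0, 1) and (1, 0) are selected; block 2 never clears the absolute cut-off.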
Seer: details
For each pair of transactions (x, y), each acquires the lock of the other if:
• Are abort events of x common enough with y running concurrently?
• Is y one of the main causes for x to abort?
Hill-climbing-based adaptive loop for optimal threshold search.
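A hill-climbing threshold search of the kind mentioned here might look like the sketch below. The slides do not give the loop itself, so the function name `hill_climb`, the step-halving rule, the bounds, and the `measure` callback (standing in for a throughput measurement taken with a given threshold) are assumptions for illustration only.

```python
def hill_climb(measure, start=0.5, step=0.1, lo=0.0, hi=1.0, rounds=20):
    """Search for the threshold in [lo, hi] that maximizes measure(threshold).

    measure is a hypothetical callback; in a running system it would report
    observed performance (and would be noisy), here it is any function.
    """
    best, best_score = start, measure(start)
    direction = 1
    for _ in range(rounds):
        candidate = min(hi, max(lo, best + direction * step))
        score = measure(candidate)
        if score > best_score:
            best, best_score = candidate, score  # improvement: keep moving
        else:
            direction = -direction  # got worse: try the other direction
            step /= 2               # and look closer to the current best
    return best
```

For example, `hill_climb(lambda t: -(t - 0.3) ** 2)` moves from the starting value 0.5 towards 0.3 and then narrows its step around it.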
Seer: optimizations
• Only one thread (re-)calculates the locking scheme:
  • Whenever it is waiting for the SGL (some thread is on the fallback path)
  • If the SGL is rarely taken, then scheduling will not improve
• Capacity Aborts: another limitation from best-effort nature
  • Per-core lock
  • Taken when capacity aborts occur
  • Tailored for hyper-thread usage
• Lock acquisition
  • Hardware transaction used as multi-CAS for 2+ locks
Evaluation
Intel Haswell, 4 cores (8 hyper-threads)
• HLE: Intel Hardware Lock Elision, i.e., no scheduling
• RTM: Intel Commodity HTM with an SGL
• SCM: Software-assisted Contention Management [PODC14]
  • schedules with a (single) auxiliary lock
  • aux lock is not read speculatively (in hw tx)
• Seer: our Probabilistic Scheduler on top of Intel RTM
How much can we gain with Seer?
[Figures: speedup vs. number of threads for Genome and for Intruder]
[Figure: geometric mean speedup in STAMP, annotated with a 50% speedup]
What motivates these gains?
• HLE: 77% with fall-back lock
• RTM: 37% with SGL
• SCM: 5% with SGL, 29% with (single) auxiliary lock
• Seer:
  • 3% with at least one tx lock (fine-grained lock)
  • 4% with core lock (fine-grained lock)
  • 12% with tx + core locks (fine-grained locks)
  • 1% with SGL
Geometric mean over STAMP w/ 8 threads
Relevance of each mechanism?
• Transaction locks: detect conflicts inherent to benchmarks
• Core locks: only relevant for >4t (hyper-threading)
• HTM lock acquisition: small improvement, benchmark dependent; the more locks, the better
• Threshold tuning for probabilities: consistent/small improvement
Baseline: Seer with all mechanisms enabled (i.e., their overhead) but without any lock acquisitions.
Summary
First scheduler tailored for Commodity HTMs:
• Copes with imprecise information
• Schedules transactions in a fine-grained manner
• 50% performance improvement with 8 threads
• 0-8% overhead from monitoring/calculation
  • Measured by running Seer without acquiring locks
Thank you
Questions?
Nuno Diegues, Paolo Romano and Stoyan Garbatov
Backup slides
HTM with a fall-back path
    start: int status = htm_begin
    code:  application logic
           htm_end                      // fast-path
HTM with a fall-back path
    start: int status = htm_begin
           if (status == ok)            // != ok when aborted
             if (fallback-in-use())
               htm_abort                // fall-back in use
             else goto code             // fast-path
           ??
    code:  application logic
           if (inFastPath)
             htm_end                    // fast-path
           else
             ??
HTM with a fall-back path
    start: int status = htm_begin
           if (status == ok)            // != ok when aborted
             if (fallback-in-use())
               htm_abort                // fall-back in use
             else goto code             // fast-path
           if (shouldRetry())           // retry policy
             goto start
           else
             use-fallback()             // use fall-back
    code:  application logic
           if (inFastPath)
             htm_end                    // fast-path
           else
             quit-fallback()            // fall-back
HTM with a fall-back: a single lock
Still simple enough.
    start: int status = htm_begin
           if (status == ok)            // != ok when aborted
             if (isTaken(lock))
               htm_abort                // fall-back in use
             else goto code             // fast-path
           if (shouldRetry())           // retry policy: e.g., limit retries to 10
             goto start
           else
             acquire(lock)              // use fall-back
    code:  application logic
           if (inFastPath)              // fast-path
             htm_end
           else                         // fall-back
             release(lock)
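The control flow of this slide can be exercised without HTM hardware using the simulation below. It is only a sketch of the retry policy: `run_atomic` and `try_htm` are hypothetical names, `try_htm` stands in for the `htm_begin`/`htm_end` pair and simply reports whether the simulated hardware transaction committed, and a `threading.Lock` plays the single global lock. On real hardware the lock is checked inside the transaction, so a later acquisition also aborts running transactions; this simulation does not model that.

```python
import threading

fallback_lock = threading.Lock()  # the single global lock (SGL)
MAX_RETRIES = 10                  # retry budget, as in the slide's example

def run_atomic(body, try_htm):
    """Run body() on the fast path if possible, else under the SGL.

    try_htm(body) is a hypothetical stand-in for htm_begin/htm_end: it
    returns True if the simulated transaction committed, False on abort.
    """
    for _ in range(MAX_RETRIES):
        if fallback_lock.locked():
            continue  # fall-back in use: treated as an abort, then retried
        if try_htm(body):
            return "fast-path"
    # Retry budget exhausted: serialize behind the global lock.
    with fallback_lock:
        body()
    return "fall-back"
```

A transaction whose simulated hardware attempt always aborts ends up on the fall-back path after ten tries, while one that commits on the first attempt never touches the lock.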