Performance Evaluation of Adaptivity in STM Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich
Motivation ● STM systems rely on many assumptions ● These assumptions often conflict across different programs ● Systems are statically tuned to a baseline ● Instead, use self-optimizing systems ● Adapt to different workloads ● What parameters can be adapted? ● How can effectiveness be measured? ISPASS'11 / 2011-04-12 Mathias Payer / ETH Zürich 2
Outline ● Introduction ● STM System ● STM Baseline ● Adaptive Parameters ● Evaluation ● Related work ● Conclusion
Introduction ● Software Transactional Memory (STM) applies transactions to memory ● (Optimistic) concurrency control mechanism ● Alternative to lock-based synchronization ● Multiple concurrent threads run transactions ● Concurrent memory modifications
Introduction ● Concurrent transactions modify memory without synchronization ● Transaction is verified after completion ● Conflicts are detected and resolved ● Changes committed for conflict-free transactions ● Modifications only visible after commit
Introduction
withdraw { tmp = balance; tmp = tmp - 100; balance = tmp; }
deposit { tmp = balance; tmp = tmp + 100; balance = tmp; }
[Diagram: TX starts; balance enters the read-set; balance enters the write-set; conflict detection, data committed]
● What happens when balance is accessed concurrently? ● Either locking or STM is needed to ensure a correct final balance ● The STM system decides which transaction commits first; a conflicting transaction is aborted and retried
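To make the read-set / write-set / commit sequence concrete, here is a minimal single-threaded C sketch, assuming a simplified model in which one shared word carries a version number that is bumped on every commit. The names tm_word, tm_tx, tx_read, tx_write and tx_commit are hypothetical and not AdaptSTM's API; real STMs track many words and run truly concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A shared word plus a version that is bumped on every commit. */
typedef struct {
    long value;
    uint64_t version;
} tm_word;

/* Per-transaction state: the version it read and its buffered write. */
typedef struct {
    uint64_t read_version;  /* read-set entry for 'balance' */
    long write_value;       /* write-set entry, invisible until commit */
} tm_tx;

/* Read: remember which version was observed (read-set). */
static long tx_read(tm_tx *tx, const tm_word *w) {
    tx->read_version = w->version;
    return w->value;
}

/* Write: buffer the new value; other transactions cannot see it yet. */
static void tx_write(tm_tx *tx, long v) {
    tx->write_value = v;
}

/* Validate the read-set, then publish the write-set. Returns false on a
 * conflict, in which case the caller re-executes the transaction. */
static bool tx_commit(tm_tx *tx, tm_word *w) {
    assert(tx != NULL && w != NULL);
    if (w->version != tx->read_version)
        return false;       /* someone committed since our read */
    w->value = tx->write_value;
    w->version++;
    return true;
}
```

Interleaving withdraw and deposit on a balance of 1000: both read version 0; withdraw commits 900 (version 1); deposit fails validation, re-reads 900, and commits 1000. Without the version check, deposit would overwrite the balance with 1100 and the withdrawal would be lost.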
STM Baseline ● Many efficient STM implementations agree on important design decisions: ● Word-based locking ● Global locking / version table ● Eager locking ● (Almost) no contention management ● Simple write-set and read-set implementations
STM Baseline [Figure: a combined global write-lock / version array shared by all transactions; each transaction keeps a read list/buffer, a lock list, a write list/buffer, a read hash, and a write hash]
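The combined global lock/version array can be sketched as follows, in the style of word-based STMs such as TL2. This is a minimal C11 sketch, assuming 8-byte words, a table of 2^16 entries, and an entry layout with the lock flag in bit 0 and the version in the remaining bits; LOCK_BITS, ADDR_SHIFT, lock_for, try_lock and unlock_with_version are hypothetical names, not AdaptSTM's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_BITS  16                          /* hypothetical: 2^16 entries */
#define LOCK_SIZE  ((size_t)1 << LOCK_BITS)
#define LOCK_MASK  (LOCK_SIZE - 1)
#define ADDR_SHIFT 3                           /* hypothetical: 8-byte words */

/* Each entry: bit 0 = write-lock flag, bits 1..63 = version number. */
static _Atomic uint64_t lock_table[LOCK_SIZE];

/* Map a memory word to its entry. Several words can share an entry,
 * which is the over-locking side of the table-size trade-off. */
static _Atomic uint64_t *lock_for(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    return &lock_table[(a >> ADDR_SHIFT) & LOCK_MASK];
}

static bool is_locked(uint64_t e) { return (e & 1u) != 0; }
static uint64_t version_of(uint64_t e) { return e >> 1; }

/* Eager locking: acquire the entry when the word is first written.
 * On success, the pre-lock entry value is stored in *old_out. */
static bool try_lock(_Atomic uint64_t *l, uint64_t *old_out) {
    uint64_t old = atomic_load(l);
    if (is_locked(old))
        return false;
    if (!atomic_compare_exchange_strong(l, &old, old | 1u))
        return false;
    *old_out = old;
    return true;
}

/* At commit: release the lock and publish a new version in one store. */
static void unlock_with_version(_Atomic uint64_t *l, uint64_t new_version) {
    assert(is_locked(atomic_load(l)));
    atomic_store(l, new_version << 1);
}
```

Packing the lock bit and version into one word lets readers check "unlocked and unchanged" with a single atomic load.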
Adaptive STM Parameters ● Global adaptivity ● Synchronization needed ● Optimizes toward a global optimum ● Averages over all concurrent transactions ● (Thread-)local adaptivity ● No synchronization needed ● Limits the set of adaptable parameters ● Best parameters for each thread/transaction
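The synchronization difference between the two kinds of adaptivity shows up in how statistics are gathered. Here is a minimal C11 sketch, assuming conflicts are the measured quantity; global_conflicts, local_conflicts, record_conflict, global_view and local_view are hypothetical names. The shared counter needs an atomic read-modify-write that every thread contends on, while the thread-local counter is a plain increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Global statistic: one shared counter, updated atomically by all threads. */
static atomic_uint_fast64_t global_conflicts;

/* Thread-local statistic: each thread has its own private copy. */
static _Thread_local uint64_t local_conflicts;

static void record_conflict(void) {
    atomic_fetch_add_explicit(&global_conflicts, 1, memory_order_relaxed);
    local_conflicts++;   /* no synchronization needed */
}

/* A global policy reads the shared value (and must coordinate any change
 * with all threads); a local policy only inspects its own counter. */
static uint64_t global_view(void) {
    return (uint64_t)atomic_load_explicit(&global_conflicts, memory_order_relaxed);
}

static uint64_t local_view(void) {
    return local_conflicts;
}
```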
Adaptive STM Parameters ● Different adaptive parameters measured: ● Size of global locking/version-table *G ● Size of local hash-tables *L ● Write strategy *L ● Locality tuning for hash-functions *L ● Contention management *L *L – local, *G – global
Adaptive Hash-Table ● Global hash-table: trade-off between over-locking and locality ● Global strategy: coordinate lock collisions and over-locking between threads ● Adapt size based on global information ● Local hash-table: trade-off between reset cost and number of hash collisions ● Local strategy: sample a moving average of unique write locations ● Adapt size based on the trend
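The local strategy can be sketched as an exponential moving average of unique write locations per transaction, compared against the current capacity. This is one hypothetical reading of "adapt size based on the trend": the EMA weight (1/8), the grow/shrink thresholds (1/2 and 1/8 of the slots), MIN_BITS/MAX_BITS, and the names write_hash_tuner and tune_write_hash are illustrative assumptions, not AdaptSTM's values.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_BITS 4    /* hypothetical lower bound: 2^4 slots */
#define MAX_BITS 16   /* hypothetical upper bound: 2^16 slots */

typedef struct {
    unsigned bits;      /* the write hash has 2^bits slots */
    double avg_unique;  /* moving average of unique writes per transaction */
} write_hash_tuner;

/* Called after each transaction with its number of unique written words. */
static void tune_write_hash(write_hash_tuner *t, size_t unique_writes) {
    /* EMA with weight 1/8 smooths out single outlier transactions. */
    t->avg_unique += ((double)unique_writes - t->avg_unique) / 8.0;
    double slots = (double)((size_t)1 << t->bits);
    if (t->avg_unique > slots / 2 && t->bits < MAX_BITS)
        t->bits++;      /* fewer hash collisions, higher reset cost */
    else if (t->avg_unique < slots / 8 && t->bits > MIN_BITS)
        t->bits--;      /* cheaper reset at commit/abort */
    assert(t->bits >= MIN_BITS && t->bits <= MAX_BITS);
}
```

The gap between the two thresholds adds hysteresis, so a workload near one boundary does not make the table oscillate between two sizes.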
Adaptive Write Strategy ● Different costs depending on strategy ● Write-back: cheap abort, expensive commit ● Write-through: expensive abort, cheap commit ● Adapt strategy to per-thread workload ● Measure abort rate
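Because write-back makes aborts cheap and write-through makes commits cheap, a per-thread tuner can pick the strategy from the measured abort rate. Here is a minimal C sketch, assuming a fixed sampling window and two thresholds with hysteresis; WINDOW, HIGH_ABORTS, LOW_ABORTS, ws_tuner and ws_record are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { WRITE_THROUGH, WRITE_BACK } write_strategy;

#define WINDOW      64   /* hypothetical: transactions per sampling window */
#define HIGH_ABORTS 20   /* hypothetical: switch to write-back above this % */
#define LOW_ABORTS   5   /* hypothetical: switch to write-through below this % */

typedef struct {
    write_strategy mode;
    unsigned commits, aborts;   /* outcomes in the current window */
} ws_tuner;

/* Record one transaction outcome; re-evaluate the strategy per window. */
static void ws_record(ws_tuner *t, bool aborted) {
    if (aborted)
        t->aborts++;
    else
        t->commits++;
    unsigned total = t->commits + t->aborts;
    assert(total <= WINDOW);
    if (total < WINDOW)
        return;
    unsigned rate = t->aborts * 100 / total;
    if (rate > HIGH_ABORTS)
        t->mode = WRITE_BACK;      /* aborts are frequent: make them cheap */
    else if (rate < LOW_ABORTS)
        t->mode = WRITE_THROUGH;   /* aborts are rare: make commits cheap */
    t->commits = t->aborts = 0;    /* start a new window */
}
```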
Adaptive Locality Tuning ● Different applications have different data access patterns ● No single hash function is optimal for all data accesses ● Measure the number of hash collisions in thread-local hash tables ● Cycle through different hash functions
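A minimal C sketch of collision-driven hash cycling, assuming a small hypothetical family of three functions that differ in which address bits they fold into the index, plus a per-window collision limit. hash_addr, count_collisions, hash_tuner, note_collision, NUM_HASH_FNS and COLLISION_LIMIT are illustrative names; a real system would also rehash the table after switching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_HASH_FNS    3
#define COLLISION_LIMIT 16   /* hypothetical per-window threshold */

/* Hypothetical hash family over 8-byte word indices. */
static size_t hash_addr(unsigned fn, uintptr_t addr, size_t mask) {
    uintptr_t w = addr >> 3;                             /* word index */
    switch (fn) {
    case 0:  return (size_t)(w & mask);                  /* low bits only */
    case 1:  return (size_t)((w ^ (w >> 8)) & mask);     /* fold in higher bits */
    default: return (size_t)((w ^ (w >> 16)) & mask);    /* fold in even higher bits */
    }
}

/* Count how many of n addresses land in an already occupied slot.
 * 'occupied' is caller-provided scratch space of 'slots' bytes. */
static unsigned count_collisions(unsigned fn, const uintptr_t *addrs, size_t n,
                                 unsigned char *occupied, size_t slots) {
    assert(slots > 0 && (slots & (slots - 1)) == 0);     /* power of two */
    memset(occupied, 0, slots);
    unsigned c = 0;
    for (size_t i = 0; i < n; i++) {
        size_t h = hash_addr(fn, addrs[i], slots - 1);
        if (occupied[h])
            c++;
        else
            occupied[h] = 1;
    }
    return c;
}

typedef struct {
    unsigned fn;          /* index of the hash function in use */
    unsigned collisions;  /* collisions observed in the current window */
} hash_tuner;

/* Called on each collision in the thread-local write hash. */
static void note_collision(hash_tuner *t) {
    if (++t->collisions >= COLLISION_LIMIT) {
        t->fn = (t->fn + 1) % NUM_HASH_FNS;   /* cycle to the next function */
        t->collisions = 0;
    }
}
```

For 32 writes with a 2048-byte stride into 256 slots, function 0 maps every address to slot 0 (31 collisions), while function 1 spreads them over 32 distinct slots.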
Adaptive Contention Management ● No single strategy works in all environments ● Measure contention and implement an adaptive back-off strategy ● Wait and retry ● Abort later
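The wait-and-retry / abort-later policy can be sketched as randomized exponential back-off whose window grows under contention and shrinks after commits. This is a minimal C sketch under stated assumptions: the bounds BACKOFF_MIN and BACKOFF_MAX, the retry limit, the xorshift32 generator, and the names backoff, on_lock_busy and on_commit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BACKOFF_MIN 16u      /* hypothetical bounds, in spin iterations */
#define BACKOFF_MAX 65536u

typedef enum { CM_RETRY, CM_ABORT } cm_action;

typedef struct {
    uint32_t window;       /* upper bound for the randomized wait */
    unsigned max_retries;  /* waits allowed before giving up */
    uint32_t rng;          /* xorshift32 state, must be non-zero */
} backoff;

static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Called when a needed lock is held by another transaction. */
static cm_action on_lock_busy(backoff *b, unsigned attempt) {
    assert(b->rng != 0 && b->window > 0);
    if (attempt >= b->max_retries)
        return CM_ABORT;                  /* abort later, not immediately */
    uint32_t spins = xorshift32(&b->rng) % b->window;
    for (volatile uint32_t i = 0; i < spins; i++)
        ;                                 /* wait ... */
    if (b->window < BACKOFF_MAX)
        b->window *= 2;                   /* more contention: wait longer */
    return CM_RETRY;                      /* ... and retry */
}

/* Called after a successful commit: contention has eased. */
static void on_commit(backoff *b) {
    if (b->window > BACKOFF_MIN)
        b->window /= 2;
}
```

Randomizing the wait keeps two conflicting threads from retrying in lockstep.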
Local Adaptive STM Parameters (for local hash-table) [Figure: y-axis # writes vs. hash-table space, from 0 upward: shrink write-hash (low), no change (medium), enlarge write-hash (high)]
Local Adaptive STM Parameters (for local hash-table) [Figure: x-axis # hash collisions, from 0: no change (few), change hash-function (many)]
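Local Adaptive STM Parameters (for local hash-table) [Figure: decision grid, y-axis # writes vs. hash-table space, x-axis # hash collisions] ● High writes, few collisions: enlarge write-hash ● High writes, many collisions: enlarge write-hash & change hash-function ● Medium writes, few collisions: no change ● Medium writes, many collisions: change hash-function ● Low writes, few collisions: shrink write-hash ● Low writes, many collisions: shrink write-hash & change hash-function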
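The two-axis decision grid maps directly onto a small classification function. This is a minimal C sketch, assuming the vertical axis is measured as unique writes per slot (fill) and the horizontal axis as collisions per sampling window; the thresholds and the names hash_action and decide are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool enlarge;      /* grow the write hash */
    bool shrink;       /* shrink the write hash */
    bool change_hash;  /* switch to another hash function */
} hash_action;

/* fill: unique writes per hash-table slot (the "# writes vs. space" axis).
 * collisions: hash collisions seen in the last sampling window. */
static hash_action decide(double fill, unsigned collisions) {
    const double HIGH_FILL = 0.5, LOW_FILL = 0.125;  /* hypothetical */
    const unsigned HIGH_COLLISIONS = 16;             /* hypothetical */
    hash_action a = { fill > HIGH_FILL, fill < LOW_FILL,
                      collisions > HIGH_COLLISIONS };
    assert(!(a.enlarge && a.shrink));   /* rows of the grid are disjoint */
    return a;
}
```

Each of the six grid cells corresponds to one combination of the three flags; the middle row sets neither enlarge nor shrink.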
AdaptSTM ● Adaptive STM system built on presented features ● Statically tuned competitive baseline – Static global hash function and hash table ● Mature and stable implementation ● Different local adaptive parameters – Write-set hash function and size of hash table – Write-through and write-back write strategy – Adaptive contention management
Evaluation ● Benchmark: STAMP 0.9.10 ● ++ configuration (increased workload for kmeans) ● AdaptSTM version 0.5.1 ● Intel 4-core Xeon E5520 CPU ● 8 cores @ 2.27 GHz, 12 GB RAM ● 64-bit Ubuntu 9.04
Evaluation: Global Hash-Table [Figure: runtime (Time [s]) vs. # shifts (0 to 10) for kmeans and Genome, 4 threads each; global hash-table sizes 2^16, 2^18, 2^20, 2^22, 2^24, 2^26]
Evaluation: Global Adaptivity ● Global optimizations have limited potential ● Small optimization potential ● High synchronization cost ● Reasonable baseline outperforms global optimization
Evaluation: Local Adaptivity ● Different configurations: ● naWB: no adaptivity, use write-back ● aWBT: adaptivity, adjust write-through / write-back ● aWWH: aWBT plus an adaptive hash-table for the write-set ● aWHH: aWWH plus different hash functions ● aALL: all adaptive parameters plus Bloom filter for write-entries ● Adaptation system starts with best 'average' parameters, improves from there
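The Bloom filter in aALL summarizes the write-set so that most reads of unwritten locations can skip the write-hash lookup. Here is a minimal C sketch, assuming a single 64-bit filter with two bit positions per address derived by Fibonacci hashing; write_bloom, bloom_bit, bloom_add, bloom_maybe_contains and bloom_clear are hypothetical names, not AdaptSTM's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t bits; } write_bloom;

/* Bit index k (0 or 1) for an address: two 6-bit slices of a
 * multiplicative (Fibonacci) hash of the word index. */
static unsigned bloom_bit(uintptr_t addr, unsigned k) {
    assert(k < 2);
    uint64_t h = (uint64_t)(addr >> 3) * 0x9E3779B97F4A7C15ull;
    return (unsigned)((h >> (58 - 6 * k)) & 63u);
}

static uint64_t bloom_mask(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return (1ull << bloom_bit(a, 0)) | (1ull << bloom_bit(a, 1));
}

/* Record a written address. */
static void bloom_add(write_bloom *f, const void *p) {
    f->bits |= bloom_mask(p);
}

/* false: definitely not in the write-set, read shared memory directly.
 * true: possibly written, so search the write hash. */
static bool bloom_maybe_contains(const write_bloom *f, const void *p) {
    uint64_t m = bloom_mask(p);
    return (f->bits & m) == m;
}

/* Reset at commit or abort. */
static void bloom_clear(write_bloom *f) {
    f->bits = 0;
}
```

False positives only cost an extra write-hash lookup; false negatives cannot occur, so correctness does not depend on the filter.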
Evaluation: Local Adaptivity [Figure: speedup relative to the non-adaptive configuration (%) vs. number of threads (1, 2, 4, 8, 16) for kmeans and Labyrinth; series aWBT, aWWH, aWHH, aALL] ● aWBT: adaptive, write-back/-through ● aWWH: adaptive, write-back/-through, write-hash ● aWHH: adaptive, write-back/-through, write-hash, hash-function ● aALL: adaptive, write-back/-through, write-hash, hash-function, Bloom filter
Evaluation: Local Adaptivity [Figure: speedup relative to the non-adaptive configuration (%) vs. number of threads (1, 2, 4, 8, 16) for Genome and Vacation; series aWBT, aWWH, aWHH, aALL]
Evaluation: Local Adaptivity ● No single optimization works for all benchmarks ● Combining all options leads to the best performance ● Notable speedups for individual benchmarks compared to the globally optimized case
Related Work ● TL2 (Dice et al.): baseline STM system ● Related work on static tuning of global parameters (Harris, Dice, Ennals, Felber) ● Crucial for an efficient baseline ● TinySTM (Felber et al.): adapts size and hash function of the global locking table ● ASTM (Marathe et al.): adapts lazy/eager locking strategies and different metadata formats
Conclusions ● Adaptivity in STM is important for good performance ● Speedups of up to 10% possible ● Global optimizations are limited ● Low potential, high synchronization cost ● Local optimizations tune thread-local parameters ● High correlation with workload
Questions? ● Contact: mathias.payer@nebelwelt.net ● Source: http://nebelwelt.net/projects/adaptSTM/