
On the Performance of Window-Based Contention Managers for Transactional Memory

Gokarna Sharma and Costas Busch, Louisiana State University


  1. On the Performance of Window-Based Contention Managers for Transactional Memory
     Gokarna Sharma and Costas Busch, Louisiana State University

  2. Agenda
     • Introduction and Motivation
     • Previous Studies and Limitations
     • Execution Window Model
       ➢ Theoretical Results
       ➢ Experimental Results
     • Conclusions and Future Directions

  3. Retrospective
     1993
     • A seminal paper by Maurice Herlihy and J. Eliot B. Moss:
       ➢ “Transactional Memory: Architectural Support for Lock-Free Data Structures”
     Today
     • Several STM/HTM implementation efforts by Intel, Sun, IBM
       ➢ growing attention
     Why TM?
     • Many drawbacks of traditional approaches using locks and monitors:
       ➢ error-prone, difficult, poor composability, …
     Lock: only one thread can execute
         lock data
         modify/use data
         unlock data
     TM: many threads can execute
         atomic {
           modify/use data
         }

  4. Transactional Memory
     • Transactions perform a sequence of read and write operations on shared resources and appear to execute atomically
     • TM may allow transactions to run concurrently, but the results must be equivalent to some sequential execution
     Example: initially, x == 1, y == 2
         T1: atomic { x = 2; y = x+1; }
         T2: atomic { r1 = x; r2 = y; }
         T1 then T2: r1 == 2, r2 == 3
         T2 then T1: r1 == 1, r2 == 2
         Incorrect: r1 == 1, r2 == 3 (T2 reads x, then T1 runs x = 2; y = 3, then T2 reads y)
     • ACI(D) properties to ensure correctness
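The example on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original talk, that enumerates every interleaving of the two transactions' operations and separates the outcomes that match a serial order from those that do not. The helper names (`run`, `all_interleavings`) are illustrative assumptions.

```python
from itertools import combinations

# Each operation mutates the shared state dict in place.
T1 = [
    lambda s: s.update(x=2),           # x = 2
    lambda s: s.update(y=s["x"] + 1),  # y = x + 1
]
T2 = [
    lambda s: s.update(r1=s["x"]),     # r1 = x
    lambda s: s.update(r2=s["y"]),     # r2 = y
]

def run(order):
    """Execute one interleaving, e.g. ('T2', 'T1', 'T1', 'T2'); return (r1, r2)."""
    state = {"x": 1, "y": 2, "r1": None, "r2": None}
    ops = {"T1": iter(T1), "T2": iter(T2)}
    for name in order:
        next(ops[name])(state)
    return state["r1"], state["r2"]

def all_interleavings():
    """Yield every order that keeps each transaction's own operations in sequence."""
    total = len(T1) + len(T2)
    for t1_slots in combinations(range(total), len(T1)):
        yield tuple("T1" if i in t1_slots else "T2" for i in range(total))

# Outcomes of the two serial executions: T1 then T2, and T2 then T1.
serial = {run(("T1", "T1", "T2", "T2")), run(("T2", "T2", "T1", "T1"))}
# Outcomes reachable if operations interleave freely, with no TM protection.
outcomes = {run(order) for order in all_interleavings()}
non_serializable = outcomes - serial

print("serial outcomes:", sorted(serial))
print("outcomes a TM must prevent:", sorted(non_serializable))
```

The slide's incorrect result, r1 == 1 and r2 == 3, is one of the outcomes the sketch reports as not equivalent to any serial order.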

  5. Software TM Systems
     Conflicts:
     ➢ A contention manager (CM) decides
     ➢ Aborts or delays a transaction
     Centralized or distributed:
     ➢ Each thread may have its own CM
     Example: initially, x == 1, y == 1
         T1: atomic { … x = 2; … }
         T2: atomic { y = 2; x = 3; }
         T1 and T2 conflict on x
         Option 1: abort T1, undo its changes (set x == 1) and restart, OR wait and retry
         Option 2: abort T2 (set y == 1) and restart

  6. Transaction Scheduling
     The most common model:
     ➢ m concurrent transactions on m cores that share s objects
     ➢ Each transaction is a sequence of operations, and an operation takes one time unit
     ➢ Duration is fixed
     Throughput guarantees:
     ➢ Makespan: the time needed to commit all m transactions
     ➢ Competitive ratio: (makespan of my CM) / (makespan of optimal CM)
     Problem complexity:
     ➢ NP-hard (related to vertex coloring)
     Challenge:
     ➢ How to schedule transactions so that the makespan is minimized?
     [Figure: example graph with nodes numbered 1 to 8]
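The link between scheduling and vertex coloring can be made concrete with a small Python sketch. It assumes unit-length transactions and a known conflict graph, and it uses plain greedy coloring, which is a heuristic and not the optimal scheduler (the optimal problem is NP-hard, as the slide notes). The function names and the four-transaction example graph are illustrative assumptions.

```python
def greedy_schedule(conflicts, num_txns):
    """Give each transaction the smallest time slot not used by a conflicting one."""
    neighbours = {t: set() for t in range(num_txns)}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)
    slot = {}
    for t in range(num_txns):
        taken = {slot[u] for u in neighbours[t] if u in slot}
        s = 0
        while s in taken:
            s += 1
        slot[t] = s
    return slot

def makespan(slot, tau=1):
    """Time to commit everything: number of slots used times the duration tau."""
    return tau * (max(slot.values()) + 1)

def competitive_ratio(my_makespan, optimal_makespan):
    """Makespan of the CM under study divided by the makespan of an optimal CM."""
    return my_makespan / optimal_makespan

# Hypothetical conflict graph: four transactions conflicting in a cycle.
conflicts = [(0, 1), (1, 2), (2, 3), (3, 0)]
slot = greedy_schedule(conflicts, 4)
print(slot, makespan(slot))
# A 4-cycle needs at least 2 slots, so the optimal makespan here is 2.
print(competitive_ratio(makespan(slot), 2))
```

Transactions that share a slot form an independent set in the conflict graph, so they can commit in the same time step without aborting each other.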

  7. Literature
     • Lots of proposals
       ➢ Polka, Priority, Karma, SizeMatters, …
     • Drawbacks
       ➢ Some need globally shared data (e.g., a global clock)
       ➢ Workload dependent
       ➢ Many have no provable theoretical properties
         ✓ e.g., Polka, which nevertheless has good overall empirical performance
     • Mostly empirical evaluation using different benchmarks
       ➢ Choice of a contention manager significantly affects performance
       ➢ Do not perform well in the worst case (i.e., as contention, system size, and number of threads increase)

  8. Literature on Theoretical Bounds
     • Guerraoui et al. [PODC’05]: first contention manager, GREEDY, with O(s²) competitive bound
     • Attiya et al. [PODC’06]: bound of GREEDY improved to O(s)
     • Schneider and Wattenhofer [ISAAC’09]: RandomizedRounds with O(C · log m) (C is the maximum degree of a transaction in the conflict graph)
     • Attiya et al. [OPODIS’09]: Bimodal scheduler with O(s) bound for read-dominated workloads
     • Sharma and Busch [OPODIS’10]: two algorithms with O(√s) and O(√s · log n) bounds for balanced workloads

  9. Objectives
     Scalable transactional memory scheduling:
     ➢ Design contention managers that exhibit both good theoretical and empirical performance guarantees
     ➢ Design contention managers that scale well with the system size and complexity

  10. Execution Window Model
      • Collection of n sets of m concurrent transactions that share s objects
      [Figure: m × n execution window; rows are threads 1 … m, columns are transactions 1 … n]
      Assuming maximum degree C in the conflict graph and execution time duration τ:
      ➢ Serialization upper bound: τ · min(Cn, mn)
      ➢ One-shot bound: O(sn) [Attiya et al., PODC’06]
      ➢ Using RandomizedRounds: O(τ · Cn log m)
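A short Python sketch makes the serialization bound easier to read. The window sizes below are hypothetical, and the RandomizedRounds function returns only the growth term inside the O(), since the slide gives no constant factor.

```python
import math

def serialization_bound(tau, C, m, n):
    """tau * min(C*n, m*n): the serialization upper bound stated on the slide."""
    return tau * min(C * n, m * n)

def randomized_rounds_growth(tau, C, m, n):
    """tau * C * n * log(m): growth term only; the hidden O() constant is unknown."""
    return tau * C * n * math.log(m)

# Hypothetical window: 8 threads, 50 transactions per thread, unit duration.
m, n, tau = 8, 50, 1
low = serialization_bound(tau, C=3, m=m, n=n)    # min(3*50, 8*50)
high = serialization_bound(tau, C=20, m=m, n=n)  # min(20*50, 8*50)
print(low, high)
```

When C is smaller than m, the C·n term decides the bound. Once C reaches m, the bound is capped at m·n, which corresponds to running all m·n transactions one after another.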

  11. Theoretical Results
      • Offline Algorithm (maximal independent sets)
        ➢ For scheduling-with-conflicts environments, e.g., traffic intersection control, dining philosophers problem
        ➢ Makespan: O(τ · (C + n log(mn))), where C is the conflict measure
        ➢ Competitive ratio: O(s + log(mn)) whp
      • Online Algorithm (random priorities)
        ➢ For online scheduling environments
        ➢ Makespan: O(τ · (C log(mn) + n log²(mn)))
        ➢ Competitive ratio: O(s log(mn) + log²(mn)) whp
      • Adaptive Algorithm
        ➢ Conflict graph and maximum degree C both not known
        ➢ Adaptively guesses C starting from 1
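The slide names random priorities as the idea behind the online algorithm but gives no further detail. The Python sketch below is a simplified, round-based illustration of that idea and is not the authors' algorithm: in every round each pending transaction draws a fresh random priority, and it commits only if its priority beats those of all pending transactions it conflicts with. The function name, the seed parameter, and the example graph are assumptions.

```python
import random

def random_priority_rounds(conflicts, num_txns, seed=0):
    """Return {transaction: round in which it commits} for a given conflict graph."""
    rng = random.Random(seed)
    neighbours = {t: set() for t in range(num_txns)}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)
    pending = set(range(num_txns))
    commit_round = {}
    current = 0
    while pending:
        current += 1
        # Every pending transaction draws a fresh random priority (lower wins).
        prio = {t: rng.random() for t in pending}
        winners = {
            t for t in pending
            if all(prio[t] < prio[u] for u in neighbours[t] if u in pending)
        }
        for t in winners:
            commit_round[t] = current
        pending -= winners
    return commit_round

# Hypothetical conflict graph on five transactions; transaction 4 has no conflicts.
example_conflicts = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
rounds = random_priority_rounds(example_conflicts, 5, seed=42)
print(rounds)
```

The winners of a round never conflict with each other, and the pending transaction with the lowest priority overall always wins, so every round commits at least one transaction.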

  12. Intuition (1)
      • Introduce random delays at the beginning of the execution window
      [Figure: the m × n window with each thread's transactions shifted by a random delay drawn from a random interval; the shifted window spans n’ columns]
      • Random delays help conflicting transactions shift, avoiding many conflicts
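The effect of the delays can be illustrated with a small Python sketch. It assumes that delays are measured in whole columns and that thread i's j-th transaction lands in column delay[i] + j. The number of transactions sharing a column is used here only as a rough proxy for potential conflicts. The function names and the delay range are hypothetical.

```python
import random
from collections import Counter

def frame_loads(delays, n):
    """Count transactions per column when thread i's j-th transaction
    lands in column delays[i] + j."""
    loads = Counter()
    for q in delays:
        for j in range(n):
            loads[q + j] += 1
    return loads

def random_delays(m, max_delay, seed=None):
    """Draw one delay per thread, uniformly from 0..max_delay (an assumed range)."""
    rng = random.Random(seed)
    return [rng.randrange(max_delay + 1) for _ in range(m)]

m, n = 4, 3
aligned = frame_loads([0] * m, n)         # no delays: every column holds m transactions
staggered = frame_loads([0, 1, 2, 3], n)  # hand-picked delays for illustration
print(max(aligned.values()), max(staggered.values()))
print(frame_loads(random_delays(m, max_delay=3, seed=1), n))
```

With the hand-picked delays the busiest column holds three transactions instead of four, at the price of a window that is three columns longer.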

  13. Intuition (2)
      • Frame-based execution to handle conflicts
      [Figure: threads 1 … m start after delays q1 … qm; thread i then executes its transactions in frames Fi1 … Fin, all of the same frame size]
      Makespan: max{qi} + number of frames × frame size
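The makespan expression can be evaluated directly. This Python sketch assumes the delays q_i and the frame size are measured in the same time unit; the numbers are hypothetical.

```python
def window_makespan(delays, num_frames, frame_size):
    """max{q_i} + (number of frames) * (frame size), as stated on the slide."""
    return max(delays) + num_frames * frame_size

# Hypothetical values: three threads with delays of 0, 4 and 2 time steps,
# and 5 frames of 3 time steps each.
example = window_makespan([0, 4, 2], num_frames=5, frame_size=3)
print(example)  # 4 + 5 * 3 = 19
```

Only the largest delay enters the expression, because the thread that starts last determines when the final frame ends.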

  14. Experimental Results (1)
      • Platform used
        ➢ Intel i7 (4-core processor) with 8GB RAM and hyperthreading on
      • Implemented window algorithms in DSTM2, an eager conflict management STM implementation
      • Benchmarks used
        ➢ List, RBTree, SkipList, and Vacation from the STAMP suite
      • Experiments were run for 10 seconds, and the data plotted are the average of 6 experiments
      • Contention managers used for comparison
        ➢ Polka – published best CM, but no provable theoretical properties
        ➢ Greedy – first CM with both theoretical and empirical properties
        ➢ Priority – simple priority-based CM

  15. Experimental Results (2)
      Performance throughput:
      ➢ Number of transactions committed per second
      ➢ Measures the useful work done by a CM each time step
      [Figures: List Benchmark and SkipList Benchmark; committed transactions/sec versus number of threads (0 to 35) for Polka, Greedy, Priority, Online, and Adaptive]

  16. Experimental Results (3)
      Performance throughput:
      [Figures: Vacation Benchmark and RBTree Benchmark; committed transactions/sec versus number of threads (0 to 35) for Polka, Greedy, Priority, Online, and Adaptive]
      Conclusion #1: Window CMs always improve throughput over Greedy and Priority
      Conclusion #2: Throughput is comparable to Polka (Window CMs outperform it in Vacation)

  17. Experimental Results (4)
      Aborts per commit ratio:
      ➢ Number of transactions aborted per transaction commit
      ➢ Measures the efficiency of a CM in utilizing computing resources
      [Figures: SkipList Benchmark and List Benchmark; number of aborts/commit versus number of threads (0 to 35) for Polka, Greedy, Priority, Online, and Adaptive]
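Both metrics used in these experiments are simple ratios. The Python sketch below shows how they could be computed from per-run counters; the counter values are invented for illustration and are not measurements from the talk. The 10-second run length follows the setup slide.

```python
def throughput(commits, seconds):
    """Committed transactions per second."""
    return commits / seconds

def aborts_per_commit(aborts, commits):
    """Aborted transactions per committed transaction."""
    if commits == 0:
        raise ValueError("no committed transactions")
    return aborts / commits

def average(values):
    """Arithmetic mean, used to combine repeated runs into one data point."""
    return sum(values) / len(values)

RUN_SECONDS = 10  # run length taken from the experimental setup

# Hypothetical (commits, aborts) counters from three repeated runs.
runs = [(120_000, 6_000), (118_000, 5_900), (122_000, 6_100)]

avg_throughput = average([throughput(c, RUN_SECONDS) for c, _ in runs])
avg_ratio = average([aborts_per_commit(a, c) for c, a in runs])
print(avg_throughput, avg_ratio)
```

A ratio below 1 means that most transactions commit on their first attempt, while a ratio well above 1 means that much of the executed work is discarded.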

  18. Experimental Results (5)
      Aborts per commit ratio:
      [Figures: RBTree Benchmark and Vacation Benchmark; number of aborts/commit versus number of threads (0 to 35) for Polka, Greedy, Priority, Online, and Adaptive]
      Conclusion #3: Window CMs always reduce the number of aborts over Greedy and Priority
      Conclusion #4: The number of aborts is comparable to Polka (Window CMs outperform it in Vacation)

  19. Experimental Results (6)
      Execution time overhead:
      ➢ Total time needed to commit all transactions
      ➢ Measures the scalability of a CM in different contention scenarios
      [Figures: SkipList Benchmark and List Benchmark; total execution time (in seconds) versus amount of contention (Low, Medium, High) for Polka, Greedy, Priority, Online, and Adaptive]

  20. Experimental Results (7)
      Execution time overhead:
      [Figures: Vacation Benchmark and RBTree Benchmark; total execution time (in seconds) versus amount of contention (Low, Medium, High) for Polka, Greedy, Priority, Online, and Adaptive]
      Conclusion #5: Window CMs generally reduce execution time over Greedy and Priority (except SkipList)
      Conclusion #6: Window CMs are good at high contention, where the randomization overhead pays off

  21. Future Directions
      • Encouraging theoretical and practical results
      • Plan to explore (experimental)
        ➢ Wasted work
        ➢ Repeat conflicts
        ➢ Average response time
        ➢ Average committed transaction durations
      • Plan to do experiments using more complex benchmarks
        ➢ E.g., STAMP, STMBench7, and other STM implementations
      • Plan to explore (theoretical)
        ➢ Other contention managers with both theoretical and empirical guarantees
