On the design space of Parallel Nesting, by Nuno Diegues and João Cachopo

  1. On the design space of Parallel Nesting. Nuno Diegues, João Cachopo (nmld, joao.cachopo@ist.utl.pt), INESC-ID/Technical University of Lisbon, July 19, 2012.

  2-3. Introduction. Selling point of TM: composability. Parallel nesting.

  4. Time complexity analysis may be deceiving in TMs.

  5. Outline. Compare three parallel nesting approaches: (1) JVSTM, (2) NesTM [1], (3) PNSTM [2].
       [1] W. Baek, N. Bronson, C. Kozyrakis, and K. Olukotun. Implementing and evaluating nested parallel transactions in software transactional memory. In SPAA '10.
       [2] J. Barreto, A. Dragojević, P. Ferreira, R. Guerraoui, and M. Kapalka. Leveraging parallel nesting in transactional memory. In PPoPP '10.

  6. Outline: (1) JVSTM ←, (2) NesTM [1], (3) PNSTM [2].

  7. Worst-case complexities, JVSTM: read O(maxDepth), write O(1), commit O(r + children).

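  8. Worst-case complexities, JVSTM, on a transaction tree (A at the root, B and C below it, D and E one level further down): the read cost O(maxDepth) grows with the depth of the tree; write O(1), commit O(r + children).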

  9. Worst-case complexities, JVSTM: in the commit cost O(r + children), the children term counts the committed child transactions (D and E are marked committed in the tree); read O(maxDepth), write O(1).

  10. Worst-case complexities, NesTM (next to JVSTM): read O(1), write O(txDepth), commit O(r + w).

  11. Worst-case complexities, NesTM, on the same tree: the write cost O(txDepth) grows with the depth of the writing transaction; read O(1), commit O(r + w).

  12. Worst-case complexities
                 JVSTM             NesTM        PNSTM
       read      O(maxDepth)       O(1)         O(1)
       write     O(1)              O(txDepth)   O(1)
       commit    O(r + children)   O(r + w)     O(1)

  13. Worst-case complexities: PNSTM's O(1) read, write, and commit are the lowest bounds in the table. Best one?

  14-17. Practical comparison: STMBench7, running a given number of transactions; implementation of the STMs with the same API; 48-core machine.

  18-19. STMBench7 [chart: throughput (txs/sec) of jvstm, nestm, and pnstm versus # threads, given as top-level (nested): 1(1), 1(2), 1(3), 2(3), 4(3), 8(3), 16(3)]. Annotation on slide 19: 5 and 15 times with 48 threads/parallel nested.

  20. STMBench7, large depth count [chart: throughput (txs/sec) of jvstm, nestm, and pnstm versus nesting depth: 1, 8, 32, 128].

  21. Discussion: what is causing this?

  22. Complexities of the fast paths: read and write are O(1) in JVSTM, NesTM, and PNSTM.

  23-24. Fast-path occurrence
                 Fast path   Slow path   Time (µs)
         JVSTM   0.99        0.01        1046
         NesTM   0.39        0.61        5200
         PNSTM   0.39        0.61        7357
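A simple average-cost model, offered as an illustration rather than as the authors' analysis, shows why the worst-case table can mislead: if a fraction p of operations take the fast path at cost c_fast and the rest take the slow path at cost c_slow, the expected cost per operation is

```latex
\mathbb{E}[c] = p \, c_{\text{fast}} + (1 - p) \, c_{\text{slow}}
```

With p = 0.99, JVSTM pays its O(maxDepth) slow path on only 1% of operations, while NesTM and PNSTM, with p = 0.39, pay their slow paths on 61% of operations, whatever the asymptotic bound of those paths.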

  25. Conflicts detected
         JVSTM     845
         NesTM    1627
         PNSTM   84496
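The time and conflict numbers from the fast-path and conflict slides can be normalized against JVSTM; the snippet below only restates that arithmetic (the helper name `relative_to` is ours).

```python
times_us = {"JVSTM": 1046, "NesTM": 5200, "PNSTM": 7357}
conflicts = {"JVSTM": 845, "NesTM": 1627, "PNSTM": 84496}


def relative_to(base: str, table: dict) -> dict:
    """Divide every entry by the `base` STM's entry, rounded to one decimal."""
    return {stm: round(v / table[base], 1) for stm, v in table.items()}


print(relative_to("JVSTM", times_us))   # {'JVSTM': 1.0, 'NesTM': 5.0, 'PNSTM': 7.0}
print(relative_to("JVSTM", conflicts))  # {'JVSTM': 1.0, 'NesTM': 1.9, 'PNSTM': 100.0}
```

Relative to JVSTM, NesTM and PNSTM take about 5 and 7 times as long, and PNSTM detects about 100 times as many conflicts.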

  26-27. Conflict detection
               JVSTM             NesTM   PNSTM
         r-r   no                no      yes
         r-w   yes               yes     yes
         w-w   yes (if nested)   yes     yes
       Cheaper complexity bounds, more conflicts detected?

  28. Summary: parallel nesting design is coupled with the baseline TM; complexity analysis may be deceiving; consider the average case and conflict detection.

  29. Thank you. Questions?

  30. PNSTM: transaction A has no bitnum yet (bn: ?). Pool of free bitnums: 0 1 2 3.

  31. PNSTM: A takes bitnum 0 from the pool (bn: 0).

  32. PNSTM: T_A reads X. The access stack of variable X gets the entry (T_A, 1 0 0 0, Ok), with bits indexed 0 to 3.

  33. PNSTM: T_A spawns two children, B (bn: 1) and C (bn: 2). Access stack of X: (T_A, 1 0 0 0, Ok).

  34. PNSTM: T_B reads X. Candidate entry (T_B, 1 1 0 0, ?) above (T_A, 1 0 0 0, Ok); the check is pending.

  35. PNSTM: T_B reads X. The check succeeds: (T_B, 1 1 0 0, Ok) now sits above (T_A, 1 0 0 0, Ok).

  36. PNSTM: T_B spawns a child, D (bn: 3). Access stack of X: (T_B, 1 1 0 0, Ok) above (T_A, 1 0 0 0, Ok).

  37. PNSTM: T_D reads X. Candidate entry (T_D, 1 0 1 1, ?); the top of the stack is (T_B, 1 1 0 0, Ok).

  38. PNSTM: T_D reads X. Result: Conflict.
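The PNSTM walkthrough can be sketched as below. It assumes the rule the slides illustrate: a transaction's bit vector is its parent's vector plus its own bitnum, and an access to X is allowed only when the transaction on top of X's access stack is the accessor itself or one of its ancestors, which is a single bit test and hence O(1). `BitnumPool`, `PTx`, and `access` are hypothetical names; popping stack entries at commit and returning bitnums to the pool are not modeled. In the usage part D is created under C, which yields the vector 1 0 1 1 listed for T_D on slide 37.

```python
from __future__ import annotations


class Conflict(Exception):
    """Raised when an access violates the ancestor rule sketched here."""


class BitnumPool:
    """Finite pool of free bitnums; the slides use four (0 to 3)."""

    def __init__(self, size: int = 4) -> None:
        self.free = list(range(size))

    def take(self) -> int:
        if not self.free:
            # Hypothetical: what PNSTM does when bitnums run out is not modeled.
            raise RuntimeError("no free bitnum")
        return self.free.pop(0)


class PTx:
    """A transaction with its bitnum and ancestor bit vector (an int mask)."""

    def __init__(self, name: str, pool: BitnumPool, parent: PTx | None = None) -> None:
        self.name = name
        self.bn = pool.take()
        inherited = parent.mask if parent is not None else 0
        self.mask = inherited | (1 << self.bn)  # own bit plus all ancestors' bits

    def bits(self, width: int = 4) -> list[int]:
        """Bit vector as printed on the slides, index 0 first."""
        return [(self.mask >> i) & 1 for i in range(width)]


def access(tx: PTx, stack: list[PTx]) -> None:
    """O(1) check: the top of X's access stack must be tx or one of its ancestors."""
    if stack and stack[-1] is tx:
        return  # tx already on top: nothing new to record
    if stack and not (tx.mask >> stack[-1].bn) & 1:
        raise Conflict(f"{tx.name} conflicts with {stack[-1].name}")
    stack.append(tx)


pool = BitnumPool()
a = PTx("A", pool)                  # bn 0, bits [1, 0, 0, 0]
x_stack: list[PTx] = []
access(a, x_stack)                  # T_A reads X: Ok
b = PTx("B", pool, a)               # bn 1, bits [1, 1, 0, 0]
c = PTx("C", pool, a)               # bn 2, bits [1, 0, 1, 0]
access(b, x_stack)                  # T_B reads X: Ok, A is an ancestor of B
d = PTx("D", pool, c)               # bn 3, bits [1, 0, 1, 1]
try:
    access(d, x_stack)              # B (bit 1) is on top, but bit 1 is not in D's vector
except Conflict as err:
    print(err)                      # D conflicts with B
```

Because the test only looks at the top of the stack, in this sketch a read by D after a read by its uncle B is reported as a conflict even though both only read X, which matches the r-r conflict detection listed for PNSTM.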

  39. NesTM: global clock 0. T_1 starts (tid 1, ts 0). Variable X: timestamp 0, tid 0.

  40. NesTM: T_1 writes to X. Ok: X now has timestamp 0, tid 1.

  41. NesTM: T_1 spawns two children, T_2 (tid 2, ts 0) and T_3 (tid 3, ts 0). Variable X: timestamp 0, tid 1.

  42. NesTM read operation: T_3 reads X, which is recorded in the read set (RS: X). Global clock 0; X: timestamp 0, tid 1.

  43. NesTM write operation: T_3 spawns a child, T_4 (tid 4, ts 0).

  44. NesTM write operation: T_4 writes to X. First check: T_4 did not read X.

  45. NesTM write operation: T_4 writes to X. Next check, at T_3: X's timestamp ≤ T_3's timestamp.

  46. NesTM write operation: T_4 writes to X. The walk reaches the previous owner (T_1) and stops.

  47. NesTM write operation: T_4 writes to X. Ok: X now has timestamp 0, tid 4.
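A minimal sketch of the NesTM write walkthrough, assuming each variable carries a (timestamp, tid) lock word with tid 0 meaning unowned, and that the writer walks up its ancestor chain until it reaches the previous owner, checking X's timestamp against the start timestamp of every transaction on the way that read X. That walk is what makes the write O(txDepth). `NTx`, `LockWord`, and `nestm_write` are hypothetical names, and NesTM's actual data layout may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


class Conflict(Exception):
    pass


@dataclass
class NTx:
    tid: int
    ts: int                                   # start timestamp
    parent: Optional["NTx"] = None
    read_set: set = field(default_factory=set)


@dataclass
class LockWord:
    timestamp: int = 0
    tid: int = 0                              # 0 means no owner


def nestm_write(tx: NTx, name: str, word: LockWord) -> int:
    """Acquire `word` for `tx`; returns how many transactions the walk visited."""
    visited = 0
    t = tx
    while t is not None:
        visited += 1
        if t.tid == word.tid:
            break                             # previous owner reached: stop
        if name in t.read_set and word.timestamp > t.ts:
            raise Conflict(f"T_{t.tid} read a stale {name}")
        t = t.parent
    else:
        # The walk found no owning ancestor, so the variable must be free.
        if word.tid != 0:
            raise Conflict(f"{name} is owned by T_{word.tid}")
    word.tid = tx.tid
    return visited


x = LockWord()                                # X: timestamp 0, tid 0
t1 = NTx(tid=1, ts=0)
nestm_write(t1, "X", x)                       # X: timestamp 0, tid 1
t2 = NTx(tid=2, ts=0, parent=t1)
t3 = NTx(tid=3, ts=0, parent=t1)
t3.read_set.add("X")                          # T_3 reads X
t4 = NTx(tid=4, ts=0, parent=t3)
steps = nestm_write(t4, "X", x)               # checks T_4, then T_3, stops at T_1
print(x, steps)                               # LockWord(timestamp=0, tid=4) 3
```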

  48. NesTM commit operation: T_4 prepares to commit. Global clock 0; X: timestamp 0, tid 4.

  49. NesTM commit operation: T_4 commits. Global clock 1; X: timestamp 1, tid 3 (T_4's parent).
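Slides 48 and 49 suggest that a nested commit advances the global clock, stamps each written variable with the new clock value, and passes its ownership to the parent. The sketch below reproduces that end state under those assumptions; read-set validation is only noted in a comment, the top-level case is a guess, and the names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlobalClock:
    value: int = 0


@dataclass
class LockWord:
    timestamp: int = 0
    tid: int = 0                      # 0 means no owner


@dataclass
class NTx:
    tid: int
    ts: int
    parent: Optional["NTx"] = None
    write_set: set = field(default_factory=set)


def nestm_commit(tx: NTx, words: dict, clock: GlobalClock) -> None:
    # Read-set validation (the r term of O(r + w)) is omitted in this sketch.
    clock.value += 1
    # Assumption: a top-level commit releases ownership (tid 0).
    new_owner = tx.parent.tid if tx.parent is not None else 0
    for name in tx.write_set:         # the w term: one update per written variable
        words[name].timestamp = clock.value
        words[name].tid = new_owner


clock = GlobalClock()
x = LockWord(timestamp=0, tid=4)      # state after T_4's write (slide 47)
t1 = NTx(tid=1, ts=0)
t3 = NTx(tid=3, ts=0, parent=t1)
t4 = NTx(tid=4, ts=0, parent=t3, write_set={"X"})
nestm_commit(t4, {"X": x}, clock)
print(clock.value, x)                 # 1 LockWord(timestamp=1, tid=3)
```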

  50. JVSTM write operation: T_A (Orec 1) writes to X. Variable X: Orec 1.

  51. JVSTM write operation: T_A spawns two children, B (Orec 2) and C (Orec 3). Variable X: Orec 1.

  52. JVSTM write operation: T_C writes to X. Variable X: Orec 3, then Orec 1.

  53. JVSTM write operation: T_D is spawned (Orec 4) and writes to X. Variable X: Orec 4, then Orec 3, then Orec 1.
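One reading of the JVSTM write slides, sketched below: every transaction has its own Orec, and a write pushes an (Orec, value) entry onto the front of the variable's list of tentative writes, which is O(1). The names (`Orec`, `JTx`, `TentativeWrite`, `VBox`, `jvstm_write`) are hypothetical, overwriting an entry the writer already heads is an assumption, and write-write conflict detection is left out.

```python
from __future__ import annotations


class Orec:
    """Ownership record; `owner` is the transaction the write currently belongs to."""

    def __init__(self, owner: JTx) -> None:
        self.owner = owner


class JTx:
    def __init__(self, name: str, parent: JTx | None = None) -> None:
        self.name = name
        self.parent = parent
        self.orec = Orec(self)


class TentativeWrite:
    def __init__(self, orec: Orec, value: object, next_: TentativeWrite | None) -> None:
        self.orec = orec
        self.value = value
        self.next = next_


class VBox:
    def __init__(self, committed: object) -> None:
        self.committed = committed
        self.head: TentativeWrite | None = None   # newest tentative write first


def jvstm_write(tx: JTx, box: VBox, value: object) -> None:
    # Conflict detection between siblings is not modeled here.
    if box.head is not None and box.head.orec is tx.orec:
        box.head.value = value                     # tx already heads the list
    else:
        box.head = TentativeWrite(tx.orec, value, box.head)  # O(1) push


def owners(box: VBox) -> list[str]:
    """Names of the transactions owning each entry, newest first."""
    names, node = [], box.head
    while node is not None:
        names.append(node.orec.owner.name)
        node = node.next
    return names


x = VBox(committed=0)
a = JTx("A")
jvstm_write(a, x, 1)                 # X: A's Orec (Orec 1)
b, c = JTx("B", a), JTx("C", a)
jvstm_write(c, x, 3)                 # X: C's Orec, then A's
d = JTx("D", c)
jvstm_write(d, x, 4)                 # X: D's Orec, then C's, then A's
print(owners(x))                     # ['D', 'C', 'A']
```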

  54. JVSTM read operation: T_D reads X. Variable X: Orec 4, then Orec 3, then Orec 1.

  55. JVSTM read operation: T_B reads X. Variable X: Orec 4, then Orec 3, then Orec 1.
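The read slides can be sketched as a walk down X's tentative-write list that stops at the first entry whose Orec belongs to the reader or one of its ancestors, falling back to the committed value if none does. T_D finds its own entry at once, while T_B has to skip D's and C's entries before reaching A's; presumably it is the length of this walk that the O(maxDepth) read bound captures. The list is built by hand to match slide 53, and all names are hypothetical.

```python
from __future__ import annotations


class Orec:
    def __init__(self, owner: JTx) -> None:
        self.owner = owner


class JTx:
    def __init__(self, name: str, parent: JTx | None = None) -> None:
        self.name = name
        self.parent = parent
        self.orec = Orec(self)

    def lineage(self):
        """Yield the transaction itself, then each ancestor up to the root."""
        t = self
        while t is not None:
            yield t
            t = t.parent


class TentativeWrite:
    def __init__(self, orec: Orec, value: object, next_: TentativeWrite | None = None) -> None:
        self.orec, self.value, self.next = orec, value, next_


class VBox:
    def __init__(self, committed: object, head: TentativeWrite | None = None) -> None:
        self.committed, self.head = committed, head


def jvstm_read(tx: JTx, box: VBox) -> tuple[object, int]:
    """Return (value, entries inspected); the first visible tentative write wins."""
    visible = set(tx.lineage())
    node, steps = box.head, 0
    while node is not None:
        steps += 1
        if node.orec.owner in visible:
            return node.value, steps
        node = node.next
    return box.committed, steps      # no visible tentative write: use committed value


a = JTx("A")
b, c = JTx("B", a), JTx("C", a)
d = JTx("D", c)
# X as on slide 53: D's entry, then C's, then A's.
x = VBox(0, TentativeWrite(d.orec, 4, TentativeWrite(c.orec, 3, TentativeWrite(a.orec, 1))))
print(jvstm_read(d, x))              # (4, 1)
print(jvstm_read(b, x))              # (1, 3)
```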

  59. JVSTM commit operation: T_D commits. Orec 4 (ts: 0) is marked committed and now also appears under C, next to Orec 3. Variable X: Orec 4, then Orec 3, then Orec 1.

  60. JVSTM commit operation: T_C commits. A now holds Orec 1, Orec 3, and Orec 4; Orec 3 and Orec 4 are marked committed (ts: 0).
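Slides 59 and 60 suggest that a nested commit hands the committing transaction's Orec, together with the Orecs it received from its own committed children, to its parent: after T_C commits, A holds Orec 1, Orec 3, and Orec 4. The sketch models that hand-over by re-pointing each Orec's owner, which is where an O(children) term would come from; read-set validation is omitted and all names are hypothetical.

```python
from __future__ import annotations


class Orec:
    def __init__(self, owner: JTx, label: int) -> None:
        self.owner = owner
        self.label = label            # the number shown on the slides ("Orec 3", ...)
        self.committed = False


class JTx:
    def __init__(self, name: str, label: int, parent: JTx | None = None) -> None:
        self.name = name
        self.parent = parent
        self.orec = Orec(self, label)
        self.inherited: list[Orec] = []   # Orecs received from committed children


def jvstm_commit_nested(tx: JTx) -> None:
    # Read-set validation (the r term of O(r + children)) is omitted.
    parent = tx.parent
    if parent is None:
        raise ValueError("this sketch covers nested commits only")
    handed = [tx.orec] + tx.inherited
    for orec in handed:               # the children term: re-point each Orec
        orec.owner = parent
    tx.orec.committed = True
    parent.inherited.extend(handed)


a = JTx("A", 1)
c = JTx("C", 3, a)
d = JTx("D", 4, c)
jvstm_commit_nested(d)                # slide 59: C now also holds Orec 4
jvstm_commit_nested(c)                # slide 60: A now holds Orecs 1, 3 and 4
print(sorted([a.orec.label] + [o.label for o in a.inherited]))  # [1, 3, 4]
print(d.orec.owner.name)              # A
```

Under this model, D's tentative write to X now belongs to A, so a later read by B, A's other child, would see it.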

  61. Evaluation: top-level transactions only [chart: throughput (txs/sec) of jvstm, nestm, and pnstm versus # threads: 1, 2, 3, 6, 12, 24, 48].
