Parallel Functional Arrays

Parallel Functional Arrays. Ananya Kumar, Guy Blelloch, Robert Harper


  1. Parallel Functional Arrays • Ananya Kumar, Guy Blelloch, Robert Harper • Carnegie Mellon University

  2. Goals • Functional arrays • Efficient (constant time) • Parallel • Well defined cost semantics

  3. Previous Work - Monads • Thread mutable state • Enforce single reference to array • Need completely different code • Not parallel

  4. Previous Work – Specialized Type System • Enforce single threadedness of arrays • Not available in most languages • Hard to reason about

  5. Previous Work – Reference Counting • Check reference counts • If one, update in place, else copy • Depends on compiler • Hard to reason about

  6. Sequences • A = NEW(5, 0) gives 0 0 0 0 0 • B = SET(A, 0, 3) gives 3 0 0 0 0 • C = SET(A, 2, 3) gives 0 0 3 0 0 • D = SET(C, 4, 14) gives 0 0 3 0 14 • E = SET(D, 1, 11) gives 0 11 3 0 14
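
The values on this slide can be reproduced with a deliberately naive reference model. The sketch below is an assumption made for illustration, not the implementation from the talk: every SET copies the whole array, so it costs O(N) where the talk aims for O(1) on leaves. The names `new`, `set_`, and `get` are hypothetical Python stand-ins for NEW, SET, and GET.

```python
# Reference semantics only: each SET returns a fresh tuple (an O(N) copy).

def new(n, v):
    """NEW(n, v): a sequence of n copies of v."""
    return (v,) * n

def set_(a, i, v):
    """SET(a, i, v): a new sequence equal to a except at index i."""
    return a[:i] + (v,) + a[i + 1:]

def get(a, i):
    """GET(a, i): the element of a at index i."""
    return a[i]

A = new(5, 0)
B = set_(A, 0, 3)
C = set_(A, 2, 3)   # A is still usable after B was derived from it
D = set_(C, 4, 14)
E = set_(D, 1, 11)
```

Because old sequences are never mutated, A, B, C, D, and E can all be read after the program finishes, which is the behaviour any efficient implementation has to preserve.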

  16. Previous Work • N = size of array • Dietz – O(log log N) per operation • Trailer arrays – O(1) for leaves • Improvements by Chuang, O'Neill • No support for concurrency

  17. Our Approach • Functional • Efficient – O(1) for leaves, fast for interior • Parallel – wait-free • Well defined cost semantics

  18. Sequence Implementation • (Diagram: array data 0 11 3 0 14; sequence labels C 2, D 3, E 4)

  19. Main Sections • Cost dynamics • Concurrent implementation

  20. Fork-Join Parallelism (1+2) || (3+4)

  21. Fork-Join Parallelism (1+2) || (3+4) Fork

  22. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2

  23. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2 7 3

  24. Fork-Join Parallelism (1+2) || (3+4) 3+4 1+2 7 3 Join

  25. Fork-Join Parallelism • (1+2) || (3+4) • Branches: 1+2 and 3+4 • Branch results: 3 and 7 • Joined result: (3, 7)
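
The fork-join step on these slides can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper `par` is a hypothetical name, and Python's standard `concurrent.futures` thread pool stands in for whatever runtime the talk has in mind.

```python
from concurrent.futures import ThreadPoolExecutor

def par(f, g):
    """Fork f and g, wait for both to finish (join), and return the pair of results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(f)    # fork the left branch
        right = pool.submit(g)   # fork the right branch
        return (left.result(), right.result())  # join: block until both are done

# (1+2) || (3+4)
result = par(lambda: 1 + 2, lambda: 3 + 4)
```

The joined result is a pair with the left branch's value first, regardless of which branch finishes first.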

  26. Work and Span • Work: size of cost tree • Span: depth of cost tree • (Cost tree node labels: N, log(N), 1, 1, 1, 1)

  27. Work and Span • Work: N + log(N) + 4 • Span: N + log(N) + 2 • (Cost tree node labels: N, log(N), 1, 1, 1, 1)
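
Work and span can be computed by a short recursion over a cost tree. The sketch below assumes one tree shape that is consistent with the totals on the slide: a unit-cost fork, a branch costing N then log(N), a sibling branch of two unit steps, and a unit-cost join. The exact shape of the slide's diagram is not recoverable from the text, and the class names `Leaf`, `Then`, and `Par` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Leaf:
    cost: int          # a single step with the given cost

@dataclass(frozen=True)
class Then:
    parts: Tuple       # parts run one after another

@dataclass(frozen=True)
class Par:
    parts: Tuple       # parts run in parallel

def work(t):
    """Work: the sum of all step costs in the tree."""
    if isinstance(t, Leaf):
        return t.cost
    return sum(work(p) for p in t.parts)

def span(t):
    """Span: the cost of the most expensive root-to-leaf dependency chain."""
    if isinstance(t, Leaf):
        return t.cost
    if isinstance(t, Then):
        return sum(span(p) for p in t.parts)
    return max(span(p) for p in t.parts)

N, LOG_N = 1024, 10    # illustrative sizes, with log base 2
tree = Then((
    Leaf(1),                                   # fork
    Par((Then((Leaf(N), Leaf(LOG_N))),         # expensive branch
         Then((Leaf(1), Leaf(1))))),           # cheap branch
    Leaf(1),                                   # join
))
total_work = work(tree)   # N + log(N) + 4
total_span = span(tree)   # N + log(N) + 2
```

Under this assumed shape the parallel branches contribute their sum to the work but only their maximum to the span.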

  28. Scheduling Theorems • Work + Span gives execution cost on P processor machine • Goal: evaluate cost of using sequences on a P processor machine • Sufficient to evaluate work and span
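
The slide does not name a specific theorem. As an illustration, the sketch below uses the standard greedy-scheduler bound (often attributed to Brent), under which the running time on P processors lies between max(work / P, span) and work / P + span. The function name `greedy_time_bounds` is hypothetical, and the inputs are the slide 27 totals with N = 1024 and log(N) = 10 taken as sample values.

```python
def greedy_time_bounds(work, span, p):
    """Lower and upper bounds on greedy-scheduled running time with p processors."""
    lower = max(work / p, span)   # cannot beat perfect speedup or the critical path
    upper = work / p + span       # greedy scheduling bound
    return lower, upper

lower, upper = greedy_time_bounds(work=1038, span=1036, p=4)
```

In this example the span dominates, so adding processors barely helps: the computation is almost entirely sequential.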

  29. Parallel Structural Dynamics • Cost of running program with ∞ processors • Deterministic

  30. Interleaved Structural Dynamics • Cost of running program with 1 processor • Non-deterministic

  31. Interleaved Structural Dynamics • Store which sequences are interior and leaf

  32. Work = Non-Deterministic A (leaf), size N GET GET GET SET Join

  33. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 1 GET SET Join

  34. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 2 GET SET Join

  35. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 3 GET SET Join

  36. Work (Good Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 4 GET SET Join

  37. Work = Non-Deterministic A (leaf), size N GET GET GET SET Join

  38. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 1 GET SET Join

  39. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: 1 Total Work: 2 GET SET Join

  40. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: log(N) Total Work: 2 + log(N) GET SET Join

  41. Work (Bad Interleaving) A (leaf), size N GET GET Current Work: log(N) Total Work: 2 + 2log(N) GET SET Join
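
The good and bad interleavings can be checked by brute force. The sketch below assumes the diagram has one branch doing GET, GET and the other doing GET, SET on the same leaf A, and it uses a simplified cost model: a GET on a leaf costs 1, the SET costs 1 and turns A into an interior sequence, and every later GET on A costs log(N). The helper names are hypothetical.

```python
from itertools import combinations
from math import log2

def work_of(schedule, n):
    """Total work of one interleaving; A starts as a leaf of size n."""
    is_leaf = True
    total = 0
    for op in schedule:
        if op == "GET":
            total += 1 if is_leaf else int(log2(n))
        else:                     # "SET": cheap on a leaf, then A becomes interior
            total += 1
            is_leaf = False
    return total

def interleavings(left, right):
    """Yield every merge of the two branches that keeps each branch's own order."""
    n = len(left) + len(right)
    for left_slots in combinations(range(n), len(left)):
        l, r = iter(left), iter(right)
        yield [next(l) if k in left_slots else next(r) for k in range(n)]

N = 1024
costs = sorted({work_of(s, N) for s in interleavings(["GET", "GET"], ["GET", "SET"])})
```

With N = 1024 the cheapest schedule costs 4, matching the good interleaving, and the most expensive costs 2 + 2log(N) = 22, matching the bad one.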

  42. GET-GET Case A (leaf), size N GET GET GET GET Join

  43. SET-GET Case A (leaf), size N GET GET GET SET Join

  44. SET-SET Case A (leaf), size N SET GET GET SET Join

  45. Upper Bounding Work • Deterministic evaluational dynamics • Store which sequences are leaf and interior • Store the number of “cheap” (cost = 1) GETs on each sequence • At the join, if sequence was modified on one side, make the GETs expensive (cost = log(N))
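
The join rule can be sketched as a small deterministic calculation. This is an interpretation, not the dynamics from the talk: it assumes each branch is summarised as (cheap GETs, other work, modified?) for a single sequence, that a branch's GETs become expensive only when a different branch modified the sequence, and that the SET-SET case is out of scope. The name `join_upper_bound` is hypothetical.

```python
from math import log2

def join_upper_bound(branches, n):
    """Upper bound on work at a join for one sequence of size n.

    branches: list of (cheap_gets, other_work, modified) tuples, one per branch.
    """
    expensive = int(log2(n))
    total = 0
    for k, (cheap_gets, other_work, _modified) in enumerate(branches):
        # Assumption: only a SET on some *other* branch can make these GETs expensive.
        modified_elsewhere = any(m for j, (_, _, m) in enumerate(branches) if j != k)
        get_cost = expensive if modified_elsewhere else 1
        total += cheap_gets * get_cost + other_work
    return total

# SET-GET case: one branch does GET, GET; the other does GET, SET.
set_get_bound = join_upper_bound([(2, 0, False), (1, 1, True)], n=1024)
# GET-GET case: nothing is modified, so every GET stays cheap.
get_get_bound = join_upper_bound([(2, 0, False), (2, 0, False)], n=1024)
```

Under these assumptions the SET-GET bound equals 2 + 2log(N), the total shown for the bad interleaving, without enumerating any schedules.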

  46. Upper Bounding Work • Showed that upper bounds are valid for all interleavings • Showed that the upper bound is tight *

  47. A = NEW(5, 0) • Seq A: Version 1 • ArrayData 1 (Version = 1): 0 0 0 0 0

  48. B = SET(A, 2, 5) • Seq A: Version 1 • Seq B: Version 2 • ArrayData 1 (Version = 2): 0 0 5 0 0 • Log entry: Version 1, Value 0

  49. Naïve SET • Implementation of SET(A, i, v) • First set values[i] = v • Then add a log entry to arraydata
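
A sequential version of the versioned array data from slides 47 to 49 might look like the sketch below. It is an assumption-laden reconstruction, not the talk's code: a sequence is a (array data, version) pair, a leaf is a sequence whose version equals the array data's version, each index keeps a log of (version, old value) entries, and a SET on an interior sequence copies. The SET follows the naive order (write the value, then log), which is only safe without concurrency.

```python
from bisect import bisect_left

class ArrayData:
    def __init__(self, values):
        self.values = list(values)
        self.version = 1
        self.logs = [[] for _ in values]  # logs[i]: (version, old value), versions increasing

class Seq:
    def __init__(self, data, version):
        self.data, self.version = data, version

def new(n, v):
    return Seq(ArrayData([v] * n), 1)

def get(s, i):
    d = s.data
    if s.version == d.version:           # leaf: read in place, O(1)
        return d.values[i]
    log = d.logs[i]
    # (s.version,) sorts just before any (s.version, value) entry, so this finds
    # the first entry whose version is >= s.version.
    k = bisect_left(log, (s.version,))
    return log[k][1] if k < len(log) else d.values[i]

def set_(s, i, v):
    d = s.data
    if s.version == d.version:           # leaf: update in place, O(1)
        old = d.values[i]
        d.values[i] = v                  # naive order: write first ...
        d.logs[i].append((d.version, old))  # ... then log the overwritten value
        d.version += 1
        return Seq(d, d.version)
    # interior: copy everything s can see into fresh array data
    copy = ArrayData([get(s, j) for j in range(len(d.values))])
    copy.values[i] = v
    return Seq(copy, 1)

def to_list(s):
    return [get(s, j) for j in range(len(s.data.values))]

A = new(5, 0)
B = set_(A, 2, 5)
```

After these two calls the array data holds 0 0 5 0 0 at version 2 with one log entry (version 1, value 0) at index 2, as in the slide 48 diagram, while A still reads as all zeros through the log.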

  50. GET-SET Race Sequence A, version 1 Array data AD, version 1 Values = [0, 0, 0, 0, 0] Logs = empty Thread 1 Thread 2 Result Step 1 Values[2] = 5 Step 2 GET(A, 2) Step 3 Add log entry to Logs[i]

  51. GET-SET Race Sequence A, version 1 Array data AD, version 1 Values = [0, 0, 0, 0, 0] Logs = empty Thread 1 Thread 2 Result Step 1 Values[2] = 5 ✓ Step 2 GET(A, 2) Step 3 Add log entry to Logs[i]

  52. GET-SET Race Sequence A, version 1 Array data AD, version 1 Values = [0, 0, 0, 0, 0] Logs = empty Thread 1 Thread 2 Result Step 1 Values[2] = 5 ✓ Step 2 GET(A, 2) 5 Step 3 Add log entry to Logs[i]

  53. GET-SET Race • Sequence A, version 1 • Array data AD, version 1 • Values = [0, 0, 0, 0, 0] • Logs = empty • Step 1 (Thread 1): Values[2] = 5, result ✓ • Step 2 (Thread 2): GET(A, 2), result 5 • Step 3 (Thread 1): Add log entry to Logs[i], result ✓

  54. A Wait-Free Solution • Can be fixed by adding log entry before mutating values array • Other issues in GET require careful ordering • Other issues in SET require compare & swap
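
The effect of reordering the two SET steps can be shown by replaying the race by hand. The sketch below is single-threaded and only simulates the interleaving from the GET-SET race slides. It assumes that adding the log entry also bumps the array data's version, and it does not address the remaining GET ordering issues or the compare & swap that the slide mentions. All names are hypothetical.

```python
class ArrayData:
    def __init__(self, n):
        self.values = [0] * n
        self.version = 1
        self.logs = [[] for _ in range(n)]   # logs[i]: (version, old value)

def get(ad, seq_version, i):
    """GET for a sequence with the given version."""
    if seq_version == ad.version:            # looks like a leaf: read in place
        return ad.values[i]
    for ver, old in ad.logs[i]:              # interior: consult the log
        if ver >= seq_version:
            return old
    return ad.values[i]

def write_value(ad, i, v):
    ad.values[i] = v

def add_log(ad, i, old):
    ad.logs[i].append((ad.version, old))
    ad.version += 1                          # assumption: logging publishes the new version

def run(order):
    """Run SET(A, 2, 5) as two steps with another thread's GET(A, 2) in between."""
    ad = ArrayData(5)
    old = ad.values[2]
    steps = {"write": lambda: write_value(ad, 2, 5),
             "log": lambda: add_log(ad, 2, old)}
    steps[order[0]]()
    seen = get(ad, 1, 2)                     # A has version 1 and must still read 0
    steps[order[1]]()
    return seen

naive_seen = run(["write", "log"])   # value written before the log entry exists
fixed_seen = run(["log", "write"])   # log entry added before the value changes
```

In the naive order the GET on A observes 5, the value that belongs only to the new sequence. With the log entry added first, the same GET is routed through the log and returns A's original 0.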

  55. Experimental Results • Compared sequences to regular arrays • Random & sequential accesses • Writing: 2-3 times slower • Reading: under 10% slower

  56. Concurrent Results • Compared – 1 thread reading a million times – 2 threads reading half a million times • 2 threads were > 1.75 times faster

  57. Summary • Functional array implementation • O(1) operations for leaf • Wait-free concurrent • Well defined cost semantics

  58. Future Work • Prove concurrent costs of sequence implementation • Tighter cost bounds • Extend to disjoint sets, unordered sets • Lower bound for functional array costs

  59. Acknowledgements • Joe Tassarotti for lots of advice on correctness proof • Danny Sleator for ideas on lower bounds for functional array costs • NSF, Air Force Office, Intel for grants
