Efficient Parallel Functional Programming with Hierarchical Memory Management

  1. Efficient Parallel Functional Programming with Hierarchical Memory Management. Sam Westrick, Carnegie Mellon University. Joint work with: Ram Raghunathan, Adrien Guatto, Stefan Muller, Rohan Yadav, Umut Acar, Guy Blelloch, Matthew Fluet

  2. Setting the Stage
     • functional programming is good for expressing parallelism (no side effects, no concurrency, no race conditions)
     • the point of parallelism is to make things faster …
     • absolute efficiency is paramount (speedup w.r.t. the fastest sequential solution)
     • is parallel functional programming efficient?
     • existing implementations achieve good scalability but not absolute efficiency
     • standard challenges: high rate of allocation, heavy reliance upon garbage collection

  3. The Problem: we need more efficient memory management for parallel programs (not just functional)

  4. Example: Mergesort

       fun msort A =
         if length A < 2 then A
         else
           let
             val (L, R) = splitMid A
             val (L', R') = par (fn () => msort L, fn () => msort R)
             val B = merge L' R'
           in
             B
           end

  5. Example: Mergesort, evaluation step: msort [2,4,3,1]

  6. Example: Mergesort, evaluation step: par (fn () => msort [2,4], fn () => msort [3,1])

  7. Example: Mergesort, evaluation step: msort [2,4], msort [3,1]

  8. Example: Mergesort, evaluation step: msort [2], msort [4], msort [3], msort [1]

  9. Example: Mergesort, evaluation step: [2], [4], [3], [1]

  10. Example: Mergesort, evaluation step: merge [2] [4], merge [3] [1]

  11. Example: Mergesort, evaluation step: [2,4], [1,3]

  12. Example: Mergesort, evaluation step: merge [2,4] [1,3]

  13. Example: Mergesort, evaluation step: [1,2,3,4]
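
The slides use `splitMid`, `merge`, and `par` without showing their definitions. The following is a minimal sketch that makes the example runnable, under stated assumptions: `splitMid` and `merge` are plausible list-based definitions (not taken from the talk), and `par` is a sequential stand-in with the same type as a fork-join primitive, so it demonstrates the result but not any parallel speedup.

```sml
(* Assumption: sequential stand-in for the parallel primitive. It has the
   type (unit -> 'a) * (unit -> 'b) -> 'a * 'b but runs f, then g. *)
fun par (f : unit -> 'a, g : unit -> 'b) : 'a * 'b = (f (), g ())

(* Assumption: split a list into a front half and a back half. *)
fun splitMid A =
  let
    val n = length A div 2
  in
    (List.take (A, n), List.drop (A, n))
  end

(* Assumption: merge two sorted integer lists into one sorted list. *)
fun merge [] ys = ys
  | merge xs [] = xs
  | merge (x :: xs) (y :: ys) =
      if x <= y then x :: merge xs (y :: ys)
      else y :: merge (x :: xs) ys

(* The msort function from the slides, unchanged apart from layout. *)
fun msort A =
  if length A < 2 then A
  else
    let
      val (L, R) = splitMid A
      val (L', R') = par (fn () => msort L, fn () => msort R)
      val B = merge L' R'
    in
      B
    end

val sorted = msort [2, 4, 3, 1]   (* [1, 2, 3, 4], as in the trace above *)
```

Swapping this `par` for a real fork-join primitive of the same type would leave `msort` itself untouched, which is the point of the example.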

  14. Hierarchical Memory Management (msort code repeated from the mergesort example)

  15. Hierarchical Memory Management [diagram labels: X, Y, Z, join]

  16. Hierarchical Memory Management [diagram labels: X, Y, Z, join]

  17. Hierarchical Memory Management [diagram labels: X, fork (spawn)]

  18. Hierarchical Memory Management [diagram labels: X, fork (spawn), fresh empty heaps]

  19. Hierarchical Memory Management [diagram labels: A, L, R]

  20. Hierarchical Memory Management [diagram labels: A, L, R, fork (spawn)]

  21. Hierarchical Memory Management [diagram labels: A, L, R, join, L1, R1, L2, R2, B1, B2]

  22. Hierarchical Memory Management [diagram labels: A, L, R, L1, R1, L2, R2, B1, B2]

  23. Hierarchical Memory Management [diagram labels: A, L, R, L1, R1, L2, R2, B1, B2, B]

  24. Hierarchical Memory Management • give each task its own heap • tasks allocate new data inside their own heaps • organize heaps to mirror the nesting structure of tasks • fork (spawn, async, etc): fresh heaps for children • join (sync, finish, etc): merge heaps into parent
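
The fork and join rules above can be modelled in a few lines of Standard ML. This is only a toy sketch: a heap is just a list of object names, and `alloc`, `fork`, and `join` are hypothetical helpers, not the runtime's actual interface. The object names follow the labels in the msort diagrams.

```sml
(* Toy model (assumption): a heap is the list of objects allocated in it,
   in allocation order. *)
type heap = string list

val emptyHeap : heap = []

(* A task allocates new data inside its own heap. *)
fun alloc (h : heap, obj : string) : heap = h @ [obj]

(* fork: each of the two children starts with a fresh, empty heap. *)
fun fork () : heap * heap = (emptyHeap, emptyHeap)

(* join: both child heaps are merged into the parent heap. *)
fun join (parent : heap, left : heap, right : heap) : heap =
  parent @ left @ right

(* One level of msort: the parent holds A and allocates L and R... *)
val root = alloc (alloc (alloc (emptyHeap, "A"), "L"), "R")

(* ...then forks; each child allocates its own split halves and result... *)
val (h1, h2) = fork ()
val h1 = foldl (fn (obj, h) => alloc (h, obj)) h1 ["L1", "R1", "B1"]
val h2 = foldl (fn (obj, h) => alloc (h, obj)) h2 ["L2", "R2", "B2"]

(* ...and after the join the parent owns everything and allocates B. *)
val root = alloc (join (root, h1, h2), "B")
(* root = ["A","L","R","L1","R1","B1","L2","R2","B2","B"] *)
```

The list append in `join` is linear in this model; it only illustrates ownership, not the cost of a real heap merge.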

  25. Disentanglement: in strict purely functional programs, all pointers either point up or are internal [Raghunathan et al, ICFP’16]
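
One way to make "point up or are internal" concrete is to name each heap by its path from the root of the heap hierarchy and check every pointer against that naming. The sketch below is an illustration under assumptions: the path encoding and the names `heapId`, `isPrefix`, `pointerOk`, and `disentangled` are hypothetical, not taken from the cited paper.

```sml
(* Assumption: a heap is named by its path from the root. [] is the root
   heap, [0] and [1] are its children, [0, 1] is a grandchild, etc. *)
type heapId = int list

(* isPrefix (p, q): heap p is q itself or an ancestor of q. *)
fun isPrefix ([] : heapId, _ : heapId) = true
  | isPrefix (_, []) = false
  | isPrefix (x :: xs, y :: ys) = (x = y) andalso isPrefix (xs, ys)

(* A pointer from an object in heap src to an object in heap dst is
   allowed if it is internal (dst = src) or points up (dst is an
   ancestor of src). *)
fun pointerOk (src : heapId, dst : heapId) = isPrefix (dst, src)

(* A set of pointers is disentangled if every pointer is allowed. *)
fun disentangled pointers = List.all pointerOk pointers

val upAndInternal = disentangled [([0], []), ([0, 1], [0]), ([1], [1])]  (* true *)
val crossPointer  = disentangled [([0], [1])]                           (* false *)
val downPointer   = disentangled [([], [0])]                            (* false *)
```

Cross pointers between siblings and down pointers from a parent into a child are the two shapes this check rejects.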

  29. Local Garbage Collection: pick a subtree; reorganize, compact, etc. inside the subtree

  32. Local Garbage Collection. Disentanglement is necessary: a pointer into the subtree from a heap outside it would be left dangling once the local collection moves or frees its target. [diagram label: dangling pointer]

  33. Local Garbage Collection
      • localized within a subtree of heaps
      • independent of
          • tasks whose heaps are outside the subtree
          • other local collections (on disjoint subtrees)
      • can easily apply any existing GC algorithm
          • just ignore pointers that exit the subtree

  34. In-place Updates
      • often crucial for efficiency, especially under the hood
      • but, can break disentanglement (not always)

        let
          val r = ref []
          fun f () = (r := 0 :: !r)
          fun g () = (r := 1 :: !r)
        in
          par (f, g)
        end

      [diagram labels: r, []]

  35. In-place Updates (same bullets and code) [diagram labels: r, [], 0]

  37. In-place Updates (same bullets and code) [diagram labels: r, [], 0, 1]

  39. In-place Updates
      • often crucial for efficiency, especially under the hood
      • but, can break disentanglement (not always)
      • options:
          • enforce disentanglement dynamically with promotion [Guatto et al, PPoPP’18]
          • weaken to permit important classes of effects [Westrick et al, work in progress]
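
To see why the `ref` example is problematic, it helps to run it. The sketch below assumes a sequential stand-in for `par` (left thunk first, then right), so it shows only one possible outcome; the comments describe, informally, which pointers would violate disentanglement if `f` and `g` ran as separate tasks with their own heaps.

```sml
(* Assumption: sequential stand-in for par; runs f, then g. *)
fun par (f : unit -> 'a, g : unit -> 'b) : 'a * 'b = (f (), g ())

(* r is allocated by the parent, so it lives in the parent's heap. *)
val r : int list ref = ref []

(* f allocates the cell 0 :: ... in its own heap, then stores it in r:
   the parent's r now points down into f's heap. *)
fun f () = (r := 0 :: !r)

(* g reads r and may see f's cell: g's new cell 1 :: ... would then
   point across to a sibling's heap. *)
fun g () = (r := 1 :: !r)

val ((), ()) = par (f, g)

val final = !r   (* [1, 0] under this sequential par *)
```

Under a truly parallel `par` the two updates race, so the final contents of `r` are not determined by the program text.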


  40. Implementation
      • extend MLton compiler with fork-join library: val par : (unit -> 'a) * (unit -> 'b) -> 'a * 'b
      • block-structured heaps
          • heaps are lists of blocks: merge heaps in O(1) time
      • no read barrier; write barrier only on mutable pointer data
      • local collections: sequential Cheney-style copying/compacting
      • work-stealing scheduler
      • GC policy influenced by scheduler decisions
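
The O(1) merge follows from keeping both ends of the block list. The sketch below illustrates that idea only; the `block` and `heap` types and the helpers `newHeap`, `addBlock`, `mergeInto`, and `toList` are hypothetical and much simpler than a real runtime's block structure.

```sml
(* Assumption: a block holds some data and a mutable link to the next block. *)
datatype block = Block of { data : int list, next : block option ref }

(* A heap keeps its first and last block, so appending never walks the list. *)
type heap = { first : block option ref, last : block option ref }

fun newHeap () : heap = { first = ref NONE, last = ref NONE }

(* Append one fresh block to the end of a heap. *)
fun addBlock (h : heap, data : int list) : unit =
  let
    val b = SOME (Block { data = data, next = ref NONE })
  in
    (case !(#last h) of
       NONE => #first h := b
     | SOME (Block { next, ... }) => next := b);
    #last h := b
  end

(* Splice the child's block list onto the parent's: a constant number of
   pointer updates, regardless of how many blocks either heap has. *)
fun mergeInto (parent : heap, child : heap) : unit =
  let
    val () =
      case (!(#last parent), !(#first child)) of
        (_, NONE) => ()
      | (NONE, SOME _) =>
          (#first parent := !(#first child); #last parent := !(#last child))
      | (SOME (Block { next, ... }), SOME _) =>
          (next := !(#first child); #last parent := !(#last child))
  in
    #first child := NONE;
    #last child := NONE
  end

(* Walk the blocks in order (linear; used here only for inspection). *)
fun toList (h : heap) : int list =
  let
    fun go NONE = []
      | go (SOME (Block { data, next })) = data @ go (!next)
  in
    go (!(#first h))
  end

val parent = newHeap ()
val child = newHeap ()
val () = (addBlock (parent, [1, 2]); addBlock (parent, [3]))
val () = (addBlock (child, [4]); addBlock (child, [5, 6]))
val () = mergeInto (parent, child)
val merged = toList parent   (* [1, 2, 3, 4, 5, 6] *)
```

After the merge the child heap is left empty, matching the idea that a joined task's heap no longer exists separately.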

  41. Runtime Overhead [bar chart: Ours / MLton, 1 core; vertical axis 0.00 to 2.00; per-benchmark labels not recoverable]

  42. Speedups [bar chart: MLton / Ours, 72 cores; vertical axis 0 to 90; per-benchmark labels not recoverable]
