Efficient Parallel Functional Programming with Hierarchical Memory Management
Sam Westrick, Carnegie Mellon University
Joint work with: Ram Raghunathan, Adrien Guatto, Stefan Muller, Rohan Yadav, Umut Acar, Guy Blelloch, Matthew Fluet

Setting the Stage
• functional programming is good for expressing parallelism (no side-effects, no concurrency, no race conditions)
• the point of parallelism is to make things faster …
• absolute efficiency is paramount (speedup w.r.t. fastest sequential solution)
• is parallel functional programming efficient?
• existing implementations achieve good scalability but not absolute efficiency
• standard challenges: high rate of allocation, heavy reliance upon garbage collection

The Problem
we need more efficient memory management for parallel programs (not just functional)

Example: Mergesort

    fun msort A =
      if length A < 2 then A
      else
        let
          val (L, R) = splitMid A
          val (L', R') = par (fn () => msort L, fn () => msort R)
          val B = merge L' R'
        in
          B
        end

Evaluation of msort [2,4,3,1], step by step:

    msort [2,4,3,1]
    par (fn () => msort [2,4], fn () => msort [3,1])
    msort [2,4]                    msort [3,1]
    msort [2]    msort [4]         msort [3]    msort [1]
    [2]          [4]               [3]          [1]
    merge [2] [4]                  merge [3] [1]
    [2,4]                          [1,3]
    merge [2,4] [1,3]
    [1,2,3,4]

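
The slide leaves `splitMid`, `merge`, and `par` undefined. A minimal runnable sketch in Standard ML, assuming list inputs, integer elements, and a sequential stand-in for `par` (these three definitions are assumptions for illustration, not the talk's library), might look like this:

```sml
(* Assumed stand-in: runs the two thunks one after the other.
   A parallel runtime would be free to run them on different cores. *)
fun par (f, g) = (f (), g ())

(* Assumed helper: split a list into two halves of nearly equal length. *)
fun splitMid xs =
  let val n = length xs div 2
  in (List.take (xs, n), List.drop (xs, n)) end

(* Assumed helper: merge two sorted integer lists (curried, as on the slide). *)
fun merge [] ys = ys
  | merge xs [] = xs
  | merge (x :: xs) (y :: ys) =
      if x <= y then x :: merge xs (y :: ys)
      else y :: merge (x :: xs) ys

(* Same structure as the msort on the slide. *)
fun msort A =
  if length A < 2 then A
  else
    let
      val (L, R) = splitMid A
      val (L', R') = par (fn () => msort L, fn () => msort R)
      val B = merge L' R'
    in
      B
    end

val sorted = msort [2, 4, 3, 1]
```

Because the two recursive calls share no mutable state, replacing the sequential `par` with a parallel one should not change the result.
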
Hierarchical Memory Management (heap diagram shown next to the msort code)
• a task with heap X has two child tasks with heaps Y and Z
• fork (spawn): the children start with fresh empty heaps
• join: the child heaps Y and Z are merged into X
• in msort: the parent heap holds A, L, R; the two children allocate L1, R1, B1 and L2, R2, B2 in their own heaps; after the join these all belong to the parent heap, where B is then allocated

Hierarchical Memory Management
• give each task its own heap
• tasks allocate new data inside their own heaps
• organize heaps to mirror the nesting structure of tasks
  • fork (spawn, async, etc.): fresh heaps for children
  • join (sync, finish, etc.): merge heaps into parent

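
The fork and join rules can be illustrated with a toy model. The sketch below assumes a heap is nothing more than a list of object names; `alloc`, `fork`, and `join` are hypothetical helpers for illustration and say nothing about how the real runtime represents heaps.

```sml
(* Toy model (assumption): a heap is the list of names allocated in it. *)
type heap = string list

val empty : heap = []

(* Allocation adds an object to the task's own heap. *)
fun alloc (h : heap, obj : string) : heap = obj :: h

(* fork: each child starts with a fresh, empty heap. *)
fun fork () : heap * heap = (empty, empty)

(* join: the children's heaps are merged into the parent's heap.
   List append is linear time; it only models the effect of the merge. *)
fun join (parent : heap, left : heap, right : heap) : heap =
  left @ right @ parent

(* One level of msort: the parent allocates A, L, R, then forks. *)
val parent = alloc (alloc (alloc (empty, "A"), "L"), "R")
val (left, right) = fork ()
val left = alloc (alloc (alloc (left, "L1"), "R1"), "B1")
val right = alloc (alloc (alloc (right, "L2"), "R2"), "B2")

(* After the join, the parent owns everything and allocates B. *)
val parent = alloc (join (parent, left, right), "B")
```
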
Disentanglement: in strict purely functional programs, every pointer either points up (into an ancestor heap) or is internal (stays within its own heap) [Raghunathan et al., ICFP'16]
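
The "up or internal" condition can be phrased as a small predicate. The sketch below assumes each heap is named by its path from the root of the heap hierarchy (`[]` for the root, `[0]` and `[1]` for its children, and so on); `heapId`, `isPrefix`, and `allowed` are hypothetical names introduced only for this illustration.

```sml
(* Assumption: a heap is identified by its path from the root heap. *)
type heapId = int list

(* isPrefix (p, q) holds when p is q itself or an ancestor of q. *)
fun isPrefix ([], _) = true
  | isPrefix (_, []) = false
  | isPrefix (x :: xs, y :: ys) = (x = y) andalso isPrefix (xs, ys)

(* A pointer from an object in heap src to an object in heap dst is
   disentangled when dst is src (internal) or an ancestor of src (up). *)
fun allowed (src : heapId, dst : heapId) : bool = isPrefix (dst, src)

val upOk = allowed ([0, 1], [0])     (* child to ancestor *)
val internalOk = allowed ([0], [0])  (* same heap *)
val downBad = allowed ([], [0])      (* parent to child *)
val crossBad = allowed ([0], [1])    (* sibling to sibling *)
```

Under this model, down-pointers and cross-pointers are exactly the ones the predicate rejects.
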
Local Garbage Collection
• pick a subtree of heaps
• reorganize, compact, etc. inside the subtree

Local Garbage Collection
Disentanglement is necessary: if an object outside the subtree pointed into it, moving or reclaiming the target during a local collection would leave a dangling pointer

Local Garbage Collection
• localized within a subtree of heaps
• independent of
  • tasks whose heaps are outside the subtree
  • other local collections (on disjoint subtrees)
• can easily apply any existing GC algorithm
  • just ignore pointers that exit the subtree

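
The "ignore pointers that exit the subtree" rule can be sketched as a tracing pass. The toy model below assumes objects are records carrying an id, the path of the heap they live in, and the ids they point to; `obj`, `lookup`, and `localLive` are hypothetical names, and the code only computes which objects survive rather than copying them as a real collector would.

```sml
(* Assumption: heaps are named by their path from the root heap. *)
type obj = {id : int, heap : int list, fields : int list}

(* isPrefix (p, q): heap q lies in the subtree rooted at heap p. *)
fun isPrefix ([], _) = true
  | isPrefix (_, []) = false
  | isPrefix (x :: xs, y :: ys) = (x = y) andalso isPrefix (xs, ys)

fun lookup (objs : obj list, i : int) : obj option =
  List.find (fn (ob : obj) => #id ob = i) objs

(* Ids of the objects inside the subtree that are reachable from roots. *)
fun localLive (objs : obj list, subtree : int list, roots : int list) : int list =
  let
    fun visit (i, seen) =
      if List.exists (fn j => j = i) seen then seen
      else
        case lookup (objs, i) of
            NONE => seen
          | SOME ob =>
              if isPrefix (subtree, #heap ob)
              then List.foldl visit (i :: seen) (#fields ob)
              else seen  (* pointer exits the subtree: ignore it *)
  in
    List.foldl visit [] roots
  end

val objs : obj list =
  [ {id = 1, heap = [],  fields = []},      (* in the root heap *)
    {id = 2, heap = [0], fields = [1, 3]},  (* points up and internally *)
    {id = 3, heap = [0], fields = []},
    {id = 4, heap = [0], fields = []},      (* unreachable: garbage *)
    {id = 5, heap = [1], fields = []} ]     (* in a sibling heap *)

(* Collect the subtree rooted at heap [0], starting from object 2. *)
val live = localLive (objs, [0], [2])
```

In this example objects 2 and 3 are kept, object 4 is garbage, and objects 1 and 5 are never visited because they lie outside the chosen subtree.
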
In-place Updates
• often crucial for efficiency, especially under the hood
• but, can break disentanglement (not always)

    let
      val r = ref []
      fun f () = (r := 0 :: !r)
      fun g () = (r := 1 :: !r)
    in
      par (f, g)
    end

  (diagram: r lives in the parent heap and initially holds []; after an update it points to a cons cell, 0 or 1, allocated by a child task, i.e. a pointer that points down)
• options:
  • enforce disentanglement dynamically with promotion [Guatto et al., PPoPP'18]
  • weaken to permit important classes of effects [Westrick et al., work in progress]

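
To experiment with the example, it can be run with a sequential stand-in for `par` (an assumption for illustration; the real primitive may run `f` and `g` in parallel, in which case the final contents of `r` depend on the schedule).

```sml
(* Assumed stand-in: runs f, then g. *)
fun par (f, g) = (f (), g ())

(* r is allocated by the parent task. *)
val r : int list ref = ref []

(* Each child allocates a fresh cons cell and stores a pointer to it in r.
   Under hierarchical heaps the cell would live in the child's heap while r
   lives in the parent's heap, so the write creates a down-pointer. *)
fun f () = (r := 0 :: !r)
fun g () = (r := 1 :: !r)

val _ = par (f, g)
val result = !r
```

With this sequential `par`, `f` runs first and `g` second, so `result` is `[1, 0]`.
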
Implementation
• extend MLton compiler with fork-join library
      val par : (unit -> 'a) * (unit -> 'b) -> 'a * 'b
• block-structured heaps
  • heaps are lists of blocks: merge heaps in O(1) time
• no read barrier; write barrier only on mutable pointer data
• local collections: sequential Cheney-style copying/compacting
• work-stealing scheduler
  • GC policy influenced by scheduler decisions

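
The constant-time merge follows from keeping each heap as a linked list with a pointer to its end. The sketch below is one way to model that idea in Standard ML, assuming blocks are identified by integers; `cell`, `heap`, `newHeap`, `addBlock`, `mergeInto`, and `toList` are hypothetical names, and the actual runtime data structures may differ.

```sml
(* A chain of blocks; each link is a mutable slot. *)
datatype cell = Nil | Cons of int * cell ref

(* A heap remembers the first slot and the empty slot at the end. *)
type heap = {head : cell ref, tail : cell ref ref}

fun newHeap () : heap =
  let val slot = ref Nil
  in {head = slot, tail = ref slot} end

(* Append one block at the end of the heap's chain. *)
fun addBlock ({tail, ...} : heap, id : int) : unit =
  let val next = ref Nil
  in !tail := Cons (id, next); tail := next end

(* Merge child into parent by linking the two chains.
   No blocks are traversed or copied, so the cost is O(1). *)
fun mergeInto (parent : heap, child : heap) : unit =
  case !(#head child) of
      Nil => ()
    | first =>
        ( !(#tail parent) := first
        ; #tail parent := !(#tail child) )

(* Read the block ids back, for inspection only. *)
fun toList ({head, ...} : heap) : int list =
  let
    fun go Nil = []
      | go (Cons (id, next)) = id :: go (!next)
  in
    go (!head)
  end

val parent = newHeap ()
val child = newHeap ()
val _ = (addBlock (parent, 1); addBlock (parent, 2))
val _ = (addBlock (child, 3); addBlock (child, 4))
val _ = mergeInto (parent, child)
val _ = addBlock (parent, 5)
val blocks = toList parent
```

After the merge the child's blocks belong to the parent's chain, and later allocations in the parent continue from the combined end.
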
Runtime Overhead
[bar chart: Ours / MLton on 1 core, one bar per benchmark; y-axis from 0.00 to 2.00]

Speedups
[bar chart: MLton / Ours on 72 cores, one bar per benchmark; y-axis from 0 to 90]