Concurrency and Parallelism in ML
John Reppy, University of Chicago
MacQueen Fest — May 12, 2012
History: Personal history
- ML on Unix (Cardelli ML)
- ML + Amber =⇒ Pegasus ML
- Standard ML of New Jersey (Version 0.15 on tape)
- Pegasus ML + SML/NJ =⇒ Concurrent ML
- =⇒ Ph.D.!!!
- =⇒ Department 11261 at Bell Labs
Why ML?: What makes parallelism and concurrency hard?
The sequential core matters!
- The combination of shared mutable state and concurrency leads to data races and non-determinism (see the sketch below).
- Adding synchronization to avoid these problems leads to deadlock.
- Shared memory does not scale well to NUMA and distributed-memory architectures.
- Scaling is hard.
Claim: traditional imperative programming languages are a bad fit for concurrent and parallel programming.
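To make the data-race point concrete, here is a minimal sketch in SML using the CML library; the counter and worker threads are invented for illustration. The read-modify-write on a shared ref cell is not atomic, so increments can be lost depending on how the scheduler interleaves the threads.

    (* A minimal data-race sketch, assuming the CML library that ships
       with SML/NJ; the counter and workers are hypothetical. *)
    val counter = ref 0

    fun bump () =
          let val n = !counter       (* read *)
          in counter := n + 1 end    (* write may clobber a concurrent update *)

    fun worker () = (bump (); bump ())

    val _ = CML.spawn worker
    val _ = CML.spawn worker
    (* The final value of !counter depends on the interleaving:
       it can be anything from 2 to 4. *)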
Why ML?: Alternatives
- Java, C#, etc.
- Haskell
- X10
Why ML?: Standard ML
Claim: what we want is a strict, statically typed, functional language, i.e., Standard ML.
- Strict CBV semantics
- The type system distinguishes between mutable and immutable values.
- Programming style is value-oriented.
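For instance, mutability must be requested explicitly through ref (and array) types, so the type of a value records whether it can be updated. A small illustrative snippet (not from the talk):

    (* Bindings and ordinary data structures are immutable values. *)
    val xs = [1, 2, 3]           (* int list: safe to share freely *)

    (* Mutation is confined to explicit ref cells and arrays, so the
       type system marks exactly where side effects can occur. *)
    val r : int ref = ref 0
    val () = r := !r + 1         (* only ref-typed values can be updated *)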
Why ML?: Challenges
SML does not come without challenges:
- Polymorphism
- Higher-order functions
- Garbage collection
- Exceptions
Parallel ML: The Manticore Project
- The Manticore project is our effort to address the programming needs of commodity applications running on multicore SMP systems.
- No shared memory.
- Preserve determinism where possible.
- Declarative mechanisms for fine-grain parallelism.
Parallel ML: The Manticore Project (continued)
Our initial language is called Parallel ML (PML).
- Sequential core language based on a subset of SML: strict, with no mutable storage.
- A variety of lightweight implicitly-threaded constructs for fine-grain parallelism.
- Explicitly-threaded parallelism based on CML: message passing with first-class synchronization (see the sketch below).
- Prototype implementation with good scaling on 48-way parallel hardware for a range of applications.
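To illustrate what "first-class synchronization" means in CML: synchronous operations are reified as event values that can be composed with choose and wrap before being committed with sync. A minimal sketch using the standard CML combinators; the channels and messages are made up for the example.

    (* A minimal CML sketch: events are first-class values.
       The channel names and messages are hypothetical.
       In SML/NJ this code would run under RunCML.doit. *)
    val reqCh : string CML.chan = CML.channel ()
    val ctlCh : unit CML.chan = CML.channel ()

    (* Build an event that waits for either a request or a shutdown
       signal; no communication happens until we sync on it. *)
    fun serverLoop () =
          CML.sync (CML.choose [
            CML.wrap (CML.recvEvt reqCh,
                      fn msg => (print (msg ^ "\n"); serverLoop ())),
            CML.wrap (CML.recvEvt ctlCh,
                      fn () => print "shutting down\n")
          ])

    val _ = CML.spawn serverLoop
    val () = CML.send (reqCh, "hello")
    val () = CML.send (ctlCh, ())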
Parallel ML: Implicit threading
PML provides several lightweight syntactic forms for introducing parallel computation (sketched after this list):
- Parallel tuples provide basic fork-join parallel computation.
- Nested data-parallel arrays provide fine-grain data-parallel computations over sequences.
- Parallel bindings provide data-flow parallelism with cancellation of unused subcomputations.
- Parallel cases provide non-deterministic speculative parallelism.
These forms are annotations that mark a computation as a good candidate for parallel execution; the details are left to the implementation.
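A sketch of the four forms, written in the notation used in the Manticore papers; the functions f, expensive, quickTest, search1, and search2 are hypothetical.

    datatype tree = LF of int | ND of tree * tree

    (* Parallel tuple: the recursive calls may run in parallel, with
       an implicit join before the addition (fork-join parallelism). *)
    fun treeAdd (LF n) = n
      | treeAdd (ND (l, r)) =
          let val (a, b) = (| treeAdd l, treeAdd r |)
          in a + b end

    (* Parallel array comprehension: fine-grain data parallelism. *)
    fun mapP f xs = [| f x | x in xs |]

    (* Parallel binding: expensive y starts eagerly and is cancelled
       if the branch that demands slow is never taken. *)
    fun tryFast y =
          let pval slow = expensive y
          in if quickTest y then 0 else slow end

    (* Parallel case: run both searches speculatively and return
       whichever answer arrives first (nondeterministic). *)
    fun firstOf q =
          pcase search1 q & search2 q
           of ans & ? => ans
            | ? & ans => ans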
Parallel ML: Challenges revisited
SML does not come without challenges; how we address each:
- Polymorphism: whole-program monomorphisation using MLton's front end
- Higher-order functions: advanced CFA techniques
- Garbage collection: DGL split-heap GC and parallel global GC
- Exceptions: reduced use of arithmetic exceptions
Parallel ML: PML performance
[Figure: speedup over sequential PML versus number of processors (up to 48) for the RayTracer, QuickSort, Black-Scholes, and Barnes-Hut benchmarks, plotted against perfect speedup.]
The future: The need for shared mutable state
- Mutable storage is a very powerful communication mechanism: essentially a broadcast mechanism supported by the memory hardware.
- Sequential algorithms and data structures gain significant (asymptotic) performance benefits from shared memory (e.g., union-find with path compression; see the sketch below).
- Some algorithms seem hard or impossible to parallelize without shared state (e.g., mesh refinement).
- But shared memory makes parallel programming hard, so we want to be cautious in adding it to PML.
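To make the union-find point concrete, here is a standard sequential implementation in SML (not from the talk): path compression mutates parent pointers through ref cells, which is what yields the near-constant amortized bound and is awkward to express without shared mutable state.

    (* Standard union-find with union by rank and path compression,
       using mutable refs. A minimal sequential sketch. *)
    datatype node = Node of {parent : node option ref, rank : int ref}

    fun mkNode () = Node {parent = ref NONE, rank = ref 0}

    (* find follows parent links to the root, then compresses the
       path by re-pointing every visited node directly at the root. *)
    fun find (n as Node {parent, ...}) =
          (case !parent
            of NONE => n
             | SOME p =>
                 let val root = find p
                 in parent := SOME root; root end)

    fun union (a, b) =
          let
            val ra as Node {parent = pa, rank = ka} = find a
            val rb as Node {parent = pb, rank = kb} = find b
          in
            if ra = rb then ()                  (* already in the same set *)
            else if !ka < !kb then pa := SOME rb
            else (pb := SOME ra;
                  if !ka = !kb then ka := !ka + 1 else ())
          end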
The future: The design challenge
- How do we add shared memory while preserving PML's declarative programming model for fine-grain parallelism?
- Some races are okay in an implicitly-threaded setting.
- Deadlock is not okay in an implicitly-threaded setting.
The future: Limits on parallel performance: Amdahl's Law
[Figure: parallel efficiency versus number of processors (1 to 48), with curves for programs that are 80%, 90%, 95%, 99%, and 100% parallel.]
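For reference, the standard statement of Amdahl's Law, where p is the parallel fraction of the work and n is the number of processors (each efficiency curve in the figure fixes a value of p):

    S(n) = 1 / ((1 - p) + p/n)              (speedup)
    E(n) = S(n) / n = 1 / (n(1 - p) + p)    (efficiency)

For any p < 1, efficiency falls toward 0 as n grows, which is why the 80-99% curves decay so quickly while the 100% curve stays flat.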
The future: Speculation
- Amdahl's Law tells us that as the number of cores increases, execution time will be dominated by sequential code.
- Speculation is an important tool for introducing parallelism into otherwise sequential code.
- PML supports both deterministic and nondeterministic speculation.
- For many applications, we can relax determinism and still get a correct answer.
Conclusion: Credits
- Matthew Fluet (RIT)
- Claudio Russo (MSR Cambridge)
- Sven Auhagen, Lars Bergstrom, Mike Rainey, Adam Shaw, and Yingqi Xiao (U. of Chicago graduate students)
- Carsen Berger, Stephen Rosen, and Nora Sandler (U. of Chicago undergraduates)
- Chelsea Bingiel, Nic Ford, Korie Klein, Joshua Knox, Jordan Lewis, and Damon Wang (past U. of Chicago undergraduates)
- National Science Foundation
Conclusion: Questions?
http://manticore.cs.uchicago.edu