Parallel functional programming

Basic Idea
"The main task of a functional programmer should be to specify what has to be evaluated in parallel, and not how the parallel evaluation has to be organized."
("Parallelism from a functional angle")
Main goal: speedup (as always).
Thanks to Lennart Edblom (vice head of department) for the material.

Functional Languages
Why functional languages?
• Easier to partition a parallel program into tasks to evaluate
• Simple communication model (data dependence)
  – The rest is (in general) hidden from the user
• Determinism: suppose that the sequential program is correct
  – Deadlock cannot occur
  – The result is independent of the scheduling
• Simpler debugging (= debugging the sequential program)
• Easy to use parallel language constructs at a high level
Is this really always true? We will try to find out...

Problems
Operational aspects (not the semantics):
• Performance monitoring
• Cost modeling
• Locality
A basic concept...
"Pure" functional languages possess "the Church-Rosser property": independent subexpressions can be evaluated in any order (sequentially or in parallel). If the evaluation terminates, the result will be the same (except for memory management)!
• Only data dependencies control the execution order
• No side effects

...and a question of definition
Parallelism – several processes solve parts of a joint problem.
  Aim: speedup
Concurrency – independent processes cooperate; deadlock is possible, execution is often non-deterministic, and communication is explicit.
  Aim: better structure, higher level of abstraction

Independent Tasks are Evaluated in Parallel
Basic idea: every computation needed to produce the final result can be executed as a separate task (in parallel). (No side effects; only data dependencies control the order.)

    parallel x = (f1 x, f2 x)
    f1 y = y + 1
    f2 z = z * 3

(f1 x) and (f2 x) may be evaluated in parallel, but first x must have a value. Data dependence!

    par_g x = g (f1 x) (f2 x)
    g a b = a + b

g's arguments may be evaluated in parallel... before the evaluation of g can start (strict languages!).

Simple Partitioning
Example: compute Fibonacci numbers recursively

    fun nfib n = if n <= 1 then 1 else 1 + nfib(n-1) + nfib(n-2)

The two recursive calls are evaluated in parallel, recursively.
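A minimal sketch of the two examples above in GHC Haskell, assuming the `par` and `pseq` combinators exported by `GHC.Conc` (part of `base`). `a `par` e` only suggests ("sparks") evaluating `a` in parallel, while `pseq` fixes the order so that `g` is applied only after both argument values exist. The `Int`/`Integer` types are assumptions; `par_g` and `nfib` follow the slides.

```haskell
import GHC.Conc (par, pseq)

f1, f2 :: Int -> Int
f1 y = y + 1
f2 z = z * 3

g :: Int -> Int -> Int
g a b = a + b

-- par_g from the slide: spark (f1 x), evaluate (f2 x) here, and apply g
-- only after both argument values exist (as a strict language would).
par_g :: Int -> Int
par_g x = a `par` (b `pseq` g a b)
  where
    a = f1 x
    b = f2 x

-- nfib counts its own calls; the two independent recursive calls are
-- split: one is sparked, the other is evaluated by the current thread.
nfib :: Int -> Integer
nfib n
  | n <= 1 = 1
  | otherwise = l `par` (r `pseq` (1 + l + r))
  where
    l = nfib (n - 1)
    r = nfib (n - 2)

main :: IO ()
main = do
  print (par_g 4)
  print (nfib 20)
```

Built with `ghc -threaded` and run with `+RTS -N`, the sparks may run on several cores; otherwise they are simply evaluated sequentially and the results are identical. Sparking every call of `nfib` down to the leaves is exactly the "(too) fine-grained" partitioning discussed later; a practical version would stop sparking below some cut-off.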
Language Issues – Design
Redex – an expression (often a function application) that can be evaluated.
There are two main classes of functional languages: strict vs non-strict.
Strict – all arguments are evaluated (possibly in parallel) before the body of the function.
Non-strict – arguments are evaluated if/when they are needed => evaluation of the function body may start before the arguments "exist". "Lazy evaluation".
"Data-driven" vs "demand-driven" evaluation.
Strict – the parallelism sometimes has to be limited.
Non-strict – problems finding enough parallelism.
• In non-strict languages, strictness analysis is used to decide (at compile time) which expressions are really needed.
A function is strict if f ⊥ = ⊥, where ⊥ denotes an undefined value. ⊥ is also used for programs that never finish executing.
If a function is strict, it is safe to evaluate its arguments (and the function body) in parallel.

Simple Communication Model
Data dependence = communication channel
"Simple" debugging: the same program executes sequentially or in parallel
=> communication, scheduling, etc. do not have to be considered when debugging.
No deadlocks can be introduced by parallelization! (But an erroneous sequential program is, of course, also erroneous in a parallel execution.)
Performance problems still have to be examined with real tests (or simulations).

How to utilize the parallelism?
• Partitioning into tasks – implicit or explicit?
• Static or dynamic load balancing?
• Task placement?
• Granularity?

Where is the control?
Implicit parallelism – the compiler & runtime system decide about partitioning, distribution of data, load balancing, and communication.
• Strict languages: easy to partition into tasks, often (too) fine-grained
• Non-strict: strictness analysis needed
• Limited implicit parallelism
• Some language constructs match parallel computation schemes
• Data parallelism (SIMD) – the same operation is applied in parallel to every element of a large data structure.
  Powerful in functional languages with advanced data structures and higher-order functions.
Controlled (semi-explicit) parallelism
  – Annotations – directives/suggestions to the compiler
  – Evaluation strategies
Explicit parallelism
  – Language constructs for partitioning, communication, etc.
"Algorithm skeletons" – capture common patterns of parallel computation in higher-order functions. Programs are expressed in terms of these "skeletons".
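Data parallelism, annotations, and skeletons can be combined in a few lines. The sketch below assumes the `par`/`pseq` annotations from `GHC.Conc`; `parMapSkel` and `sumOfSquares` are hypothetical names, not library functions. The skeleton is a higher-order function that captures the pattern "apply the same function to every element in parallel", so the user program supplies only the worker function.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical data-parallel skeleton: apply f to every element and
-- spark each result so the runtime may evaluate them in parallel.
-- Each spark evaluates its element only to weak head normal form.
parMapSkel :: (a -> b) -> [a] -> [b]
parMapSkel _ [] = []
parMapSkel f (x : xs) = y `par` (ys `pseq` (y : ys))
  where
    y = f x
    ys = parMapSkel f xs

-- A program expressed with the skeleton: only the worker function
-- (squaring) is written by the user; the parallel structure is reused.
sumOfSquares :: [Integer] -> Integer
sumOfSquares xs = sum (parMapSkel (\x -> x * x) xs)

main :: IO ()
main = print (sumOfSquares [1 .. 100])
```

The `parallel` package's `Control.Parallel.Strategies` provides a library version of this pattern as `parMap`, which takes an evaluation strategy (for example `parMap rdeepseq f xs`). This keeps what is computed separate from how it is evaluated.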
Languages
"Pure" vs "non-pure" functional languages
Pure – no side effects (assignment, I/O, etc.)
• A pure language is easier to parallelize & partition
• Explicit control is hard to combine with "pureness"
Type system
• Small influence on the parallelism
• Some languages have special types for "parallel data structures".

Computation Models
Data flow
"Data-driven evaluation" – an operation can be performed as soon as all its operands are available.
Can be described by data flow graphs: directed graphs where the nodes represent operations and the arrows represent data dependencies between the operations.
Ex:
    let val x = a * b
        val y = 4.0 * c
    in (x + y) * (x - y) / c
    end;
[Figure: data flow graph for the example]

(Idealized) Behavior
• The values are sent directly between the nodes/instructions. No shared memory.
• Only data dependencies limit the parallelism
• Computations may be "pipelined" through the graph
• Operations cannot have side effects
• "Merge" and "Switch" nodes are used to build conditionals and loops
• A suitable representation of data flow graphs can be used as machine language in a data flow machine.

Reduction
• A (functional) program = one (large) expression
• Evaluation is done by stepwise substituting subexpressions with their values until a "normal form" is reached.
• Expressions are represented as a graph => graph reduction
• Ex: f x = (4 + (2 * x)) / ((2 * x) - 5)
  [Figure: reduction graph for f x]
• Common subexpressions share a graph representation.
• Parallel reductions are possible
• Usually demand-driven evaluation
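Both computation-model examples can be written as ordinary Haskell functions. This is a sketch only: the `Double` types and the names `dataflowEx` and `f` are assumptions, and `par`/`pseq` come from `GHC.Conc`. In `dataflowEx`, the nodes `x` and `y` depend only on the inputs, so they can be computed independently before the final node combines them. In `f`, the common subexpression `2 * x` is bound once, which mirrors the shared node in the reduction graph.

```haskell
import GHC.Conc (par, pseq)

-- Data-flow example: the nodes x and y depend only on the inputs,
-- so x is sparked while y is evaluated; the final node then combines them.
dataflowEx :: Double -> Double -> Double -> Double
dataflowEx a b c = x `par` (y `pseq` ((x + y) * (x - y) / c))
  where
    x = a * b
    y = 4 * c

-- Reduction example: 2 * x is bound once as t, so both uses share one
-- graph node (one thunk) that is evaluated at most once.
f :: Double -> Double
f x = (4 + t) / (t - 5)
  where
    t = 2 * x

main :: IO ()
main = do
  print (dataflowEx 2 3 1)
  print (f 4)
```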
Hardware
For many years there was a lot of research into special architectures that closely matched these computation models.
• Reduction machines – ALICE, Flagship, GRIP, etc.
• Data flow machines – Manchester, TTDA & Monsoon (MIT)
Many ideas from this research have been adopted in modern (parallel) computer architecture. Not everything was implemented, but a lot was simulated on conventional hardware.
The idea of special hardware for parallel functional programming is now "very unfashionable". Most work is nowadays done on traditional hardware (newer things – Cell, GPU??); the programming and communication models may, however, be totally separate from reality (high level of abstraction).

Dataflow vs. Control-Flow
von Neumann or control-flow computing model
– A program is a series of addressable instructions, each of which either
  • specifies an operation along with the memory locations of its operands, or
  • specifies an (un)conditional transfer of control to some other instruction.
– Essentially: the next instruction to be executed depends on what happened during the execution of the current instruction.
– The next instruction to be executed is pointed to and triggered by the PC.
– The instruction is executed even if some of its operands are not available yet.
Dataflow model: the execution is driven only by the availability of operands!
– No PC and no global updatable store
– The two features of the von Neumann model that become bottlenecks in exploiting parallelism are missing.

Implementation Issues
Early implementations were interpreters.
Functional languages are often implemented with the help of an abstract machine. This is often also true for parallel implementations.
The level of abstraction of the abstract machine is important for how easily it can be realized on a concrete architecture.
[Figure: abstract machine scheme relating program, compiler, abstract code, abstract machine, reduction to normal form, input/output, semantics, and interpretation into concrete machine code]

Issues of the ordering
Computations on a von Neumann machine must be performed in some order, which raises the question of reduction order.
Normal order evaluation: evaluates the arguments when they are needed; implemented using call-by-need ≈ lazy evaluation (or call-by-name); realizes non-strict semantics.
• Always terminates if the value of the expression ≠ ⊥
• Often used in graph reduction.
Applicative order evaluation: evaluates the arguments before a function is called; implemented using call-by-value; realizes strict semantics.
• Can get into an infinite loop when evaluating arguments that are not used.
• Often used in data flow.
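The difference between the two orders, and the strictness condition f ⊥ = ⊥, can be observed directly in Haskell, which uses call-by-need. The sketch below uses hypothetical names. A call to `error` stands in for ⊥, and applicative order is simulated by forcing the argument with `seq` before the call. An infinite loop in place of `error` would behave the same way, except that it would never return.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Strict in x (constFirst ⊥ y = ⊥) but not in y, so only x is
-- safe to evaluate early or in parallel.
constFirst :: Int -> Int -> Int
constFirst x _ = x

-- Hypothetical stand-in for ⊥ (an undefined or non-terminating value).
bottom :: Int
bottom = error "bottom was evaluated"

-- Normal order / call-by-need: the unused argument is never evaluated.
normalOrder :: Int
normalOrder = constFirst 1 bottom

-- Applicative order simulated: force the argument first, as
-- call-by-value would, so the result is ⊥ as well.
applicativeOrder :: Int
applicativeOrder = bottom `seq` constFirst 1 bottom

main :: IO ()
main = do
  print normalOrder
  r <- try (evaluate applicativeOrder) :: IO (Either ErrorCall Int)
  putStrLn (either (const "applicative order: bottom") show r)
```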