Asynchronous Parallel DLA in Concurrent Collections
Aparna Chandramowlishwaran, Richard Vuduc – Georgia Tech
Kathleen Knobe – Intel
May 14, 2009
Workshop on Scheduling for Large-Scale Systems @ UTK

Motivation and goals
Motivating recent work for multicore systems
- Tile algorithms for dense linear algebra (DLA), e.g., Buttari et al. (2007); Chan et al. (2007)
- General parallel programming models suited to this algorithmic style, e.g., Concurrent Collections (CnC) by Knobe & Offner (2004)
Goals
- Study: apply and evaluate CnC using parallel DLA (PDLA) examples
- Talk: CnC tutorial and crash course; a platform for your work?
To download CnC, see: whatif.intel.com

Outline
- Overview of the Concurrent Collections (CnC) language
- Asynchronous parallel Cholesky & symmetric eigensolver in CnC
- Experimental results (preliminary)

Concurrent Collections (CnC) programming model
- Separates computation semantics from expression of parallelism
- Program = components + scheduling constraints
  - Components: computation, control, data
  - Constraints: relations among components
- No overwriting of data, no arbitrary serialization, and no side effects
- Combines tuple-space, streaming, and dataflow models

CnC example: Outer product
Z ← x · y^T, i.e., z_{i,j} ← x_i · y_j
(Example only; coarser grain may be more realistic in practice.)
Collections: static representation of dynamic instances
- Step (*): unit of execution; here, the set of all (dynamic) multiplications
- Tag <i,j>: control; <a, b, …> = tuple of tag components
  - Says whether, not when, a step executes
  - Tags prescribe steps
- Item x:<i>, y:<j>, Z:<i,j>: data
Graph: tag <i,j> prescribes step (*); items x:<i>, y:<j> → (*) → item Z:<i,j>
- → shows producer/consumer relations
- The "environment" may produce/consume

Essential properties of a CnC program
(Running example: z_{i,j} ← x_i · y_j)
- Written in terms of values, without overwriting ⇒ race-free (dynamic single assignment)
- No arbitrary serialization, only explicit ordering constraints (avoids analysis)
- Steps are side-effect free (functional)
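
The single-assignment property above can be illustrated with a minimal sketch of an item collection in which each tag is bound to a value at most once. The class name `ItemCollection`, its `put`/`get` methods, and the helper `second_put_is_rejected` are hypothetical names chosen for this illustration; they are not the Intel CnC API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical single-assignment item collection (illustration only,
// not the Intel CnC API).
template <typename Tag, typename Value>
class ItemCollection {
public:
    // Bind a value to a tag. A second put on the same tag is an error,
    // so no value is ever overwritten.
    void put(const Tag& tag, const Value& value) {
        bool inserted = items_.insert(std::make_pair(tag, value)).second;
        if (!inserted) {
            throw std::logic_error("item already assigned for this tag");
        }
    }

    // Copy the value into `out` and return true if the item exists;
    // return false if it has not been produced yet.
    bool get(const Tag& tag, Value& out) const {
        typename std::map<Tag, Value>::const_iterator it = items_.find(tag);
        if (it == items_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<Tag, Value> items_;
};

// Returns true if a second put to the same tag is rejected.
inline bool second_put_is_rejected() {
    ItemCollection<int, double> x;
    x.put(2, 3.0);
    try {
        x.put(2, 4.0);  // attempts to overwrite x:<2>
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Because a tag never changes its value once written, two steps that read the same item cannot observe different values, which is the sense in which the slide calls the program race-free.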

CnC example: Tree search
match ← find(value x in tree T)
Collections: static representation of dynamic instances
- Step: unit of execution (here, a comparison step, =)
- Tag: control (<root>, <node>, <match>)
- Item: data (T, x)
Controller/controlee relations: a step may produce the tags that prescribe further steps
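
The controller/controlee idea can be sketched sequentially: each comparison step consumes one <node> tag and may emit further <node> tags (its children) and a <match> tag. This is only a sketch under stated assumptions: the tree layout (`Node` with integer child ids), the function name `find_matches`, the worklist standing in for the tag collection, and the choice to keep searching after a match are all hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Hypothetical tree node: a value plus child ids (-1 means "no child").
struct Node {
    int value;
    int left;
    int right;
};

// Sequential sketch of the tree search graph. Returns the ids of all
// nodes in T whose value equals x.
std::vector<int> find_matches(const std::map<int, Node>& T, int root, int x) {
    std::deque<int> node_tags;    // stands in for the <node> tag collection
    std::vector<int> match_tags;  // stands in for the <match> tag collection

    // The environment supplies the <root> tag.
    if (T.count(root)) node_tags.push_back(root);

    while (!node_tags.empty()) {
        int id = node_tags.front();
        node_tags.pop_front();
        const Node& n = T.at(id);

        // Comparison step: emit a <match> tag on equality.
        if (n.value == x) match_tags.push_back(id);

        // Controller role: emit <node> tags that prescribe the
        // comparison steps for the children.
        if (n.left >= 0) node_tags.push_back(n.left);
        if (n.right >= 0) node_tags.push_back(n.right);
    }
    return match_tags;
}
```

In a real CnC runtime the comparison steps for different nodes could run in any order or in parallel; the queue here only fixes one legal order.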

Execution model
Recall the outer product example: z_{i,j} ← x_i · y_j
1. Tag <i=2, j=5> available ⇒ step (*) for <2,5> is prescribed
2. Items x:<2>, y:<5> available ⇒ step is inputs-available
3. Prescribed + inputs-available ⇒ enabled
4. Step executes ⇒ item Z:<2,5> available
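
The state transitions above (prescribed, inputs-available, enabled, executed) can be traced with a small sequential simulation of the outer product graph. The struct `OuterProductGraph`, the typedef `Tag2`, and the method names are hypothetical and exist only for this sketch; a real CnC runtime schedules enabled steps itself and may run them in parallel.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

typedef std::pair<int, int> Tag2;  // control tag <i,j>

// Hypothetical, sequential model of the outer product graph.
struct OuterProductGraph {
    std::set<Tag2> tags;          // tag collection <i,j>
    std::map<int, double> x, y;   // input item collections
    std::map<Tag2, double> Z;     // output item collection
    std::set<Tag2> done;          // tags whose step has already executed

    // A step is prescribed once its tag is available.
    bool prescribed(const Tag2& t) const { return tags.count(t) > 0; }

    // A step is inputs-available once both of its input items exist.
    bool inputs_available(const Tag2& t) const {
        return x.count(t.first) > 0 && y.count(t.second) > 0;
    }

    // Prescribed + inputs-available => enabled.
    bool enabled(const Tag2& t) const {
        return prescribed(t) && inputs_available(t);
    }

    // Execute every enabled step that has not run yet.
    // Returns the number of steps executed in this pass.
    int run_enabled() {
        int executed = 0;
        for (std::set<Tag2>::const_iterator it = tags.begin();
             it != tags.end(); ++it) {
            if (done.count(*it) || !inputs_available(*it)) continue;
            Z[*it] = x.at(it->first) * y.at(it->second);  // Z:<i,j> available
            done.insert(*it);
            ++executed;
        }
        return executed;
    }
};
```

Putting tag <2,5> alone leaves the step prescribed but not enabled; it runs only after both x:<2> and y:<5> have been put, regardless of the order in which the tag and the items arrive.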

Coding and execution
[1] Write the specification (graph).
[2] Implement steps in a "base" language (C/C++).
[3] Build using CnC translator + compiler.
[4] Run-time system maintains collections and schedules step execution.

Textual notation
Recall the outer product example: z_{i,j} ← x_i · y_j
Notation: <…> is a tag collection, (…) a step collection, […] an item collection.

    // Input:
    env → <*: i,j>, [x: i], [y: j];

    // Prescription relations:
    <*: i,j> :: (*: i,j);

    // Producer/consumer relations:
    [x: i], [y: j] → (*: i, j);
    (*: i, j) → [Z: i, j];

    // Output:
    [Z: i, j] → env;

Step code written in a sequential base language
Step for the outer product example, z_{i,j} ← x_i · y_j:

    Return_t mult (Graph_t& G, const Tag_t& t) {
      int i = t[0], j = t[1];
      double x_i = G.x.Get (Tag_t(i));
      double y_j = G.y.Get (Tag_t(j));
      G.Z.Put (Tag_t(i, j), x_i*y_j);
      return CNC_Success;
    }

Intel's implementation uses C++; Rice University's uses Java (Habanero)
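
To try the step code outside a CnC installation, one option is to surround it with small stand-in types. In the sketch below, `Tag_t`, `Return_t`, `Items_t`, `Graph_t`, and `outer_product` are mock definitions written only for this illustration, under the assumption that tags are tuples of integers and items are doubles; they are not the types generated by the CnC translator, and the loop in `outer_product` replaces the run-time scheduler with a fixed sequential order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Mock stand-ins (assumptions for this sketch, not the Intel CnC API).
enum Return_t { CNC_Success, CNC_Failure };

class Tag_t {
public:
    explicit Tag_t(int a) : c_(1, a) {}
    Tag_t(int a, int b) { c_.push_back(a); c_.push_back(b); }
    int operator[](std::size_t k) const { return c_[k]; }
    bool operator<(const Tag_t& o) const { return c_ < o.c_; }
private:
    std::vector<int> c_;  // tag components
};

class Items_t {
public:
    // This mock throws std::out_of_range if the item is missing.
    double Get(const Tag_t& t) const { return data_.at(t); }
    // This mock does not enforce single assignment.
    void Put(const Tag_t& t, double v) { data_[t] = v; }
private:
    std::map<Tag_t, double> data_;
};

struct Graph_t {
    Items_t x, y, Z;
};

// Step code as shown on the slide.
Return_t mult(Graph_t& G, const Tag_t& t) {
    int i = t[0], j = t[1];
    double x_i = G.x.Get(Tag_t(i));
    double y_j = G.y.Get(Tag_t(j));
    G.Z.Put(Tag_t(i, j), x_i * y_j);
    return CNC_Success;
}

// Sequential stand-in for the scheduler: runs one step per tag <i,j>
// for an m-by-n outer product. Inputs must already be in G.x and G.y.
void outer_product(Graph_t& G, int m, int n) {
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            mult(G, Tag_t(i, j));
}
```

The step body itself never mentions threads or ordering; only the stand-in driver does, which mirrors the separation between step code and scheduling described earlier in the talk.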