Partitioned Global Address Space Paradigm: ASD Distributed Memory HPC


  1. Partitioned Global Address Space Paradigm
     ASD Distributed Memory HPC Workshop
     Computer Systems Group, Research School of Computer Science
     Australian National University, Canberra, Australia
     November 02, 2017

  2. Day 4 – Schedule Computer Systems (ANU) PGAS Paradigm 02 Nov 2017 2 / 90

  3. Introduction to the PGAS Paradigm and Chapel: Outline
     1. Introduction to the PGAS Paradigm and Chapel
     2. Chapel Programming Strategies for Distributed Memory
     3. Runtime Support for PGAS
     4. MPI One-Sided Communications
     5. Fault Tolerance

  4. Partitioned Global Address Space
     - recall the shared memory model: multiple threads with pointers to a global address space
     - in the partitioned global address space (PGAS) model:
       - multiple threads, each with affinity to some portion of the global address space
       - SPMD or fork-join thread creation
       - remote pointers to access data in other partitions
     - the model maps to a cluster with remote memory access
       - can also map to NUMA domains
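
As a hedged illustration of affinity and remote access, the sketch below uses Chapel, the language introduced on the following slides. The variable name `x` and the choice of the last locale are assumptions made for illustration; with a single locale the read is simply local.

```chapel
// x has affinity to the locale where the program starts (locale 0).
var x = 42;

// Move execution to the last locale; x is still reachable by name
// through the global address space.
on Locales[numLocales-1] {
  const seen = x;   // a remote read when numLocales > 1
  writeln("running on locale ", here.id,
          ", x = ", seen,
          ", x lives on locale ", x.locale.id);
}
```

The point of the sketch is that the code names `x` directly rather than sending a message: where the data lives and where the task runs are separate concerns.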

  5. Chapel: Design Principles
     - Cray High Performance Language
     - originally developed under the DARPA High Productivity Computing Systems program
     - targeted at massively parallel computers
     - object-oriented (Java-like syntax, but influenced by ZPL & HPF)
     - supports exploratory programming
       - implicit (statically-inferable) types, run-time settable parameters (config), implicit main and module wrappings
     - multiresolution design: build higher-level concepts in terms of lower
     - fork-join, not SPMD

  6. Chapel: Language Primitives
     - Task parallelism: concurrent loops and blocks (cobegin, coforall)
     - Data parallelism:
       - concurrent map operations (forall)
       - concurrent fold operations (scan, reduce)
     - Synchronization: task synchronization, sync variables, atomic sections
     - Locality:
       - locales (UMA places to hold data and run tasks)
       - (index) domains used to specify arrays, iteration ranges
       - distributions (mappings of domains to locales)
     - can drastically reduce code size compared to MPI+X
     - more info on Chapel home page

  7. Chapel: Compile Chain
     - chpl compiler generates standard C code, or uses LLVM backend
     [Figure: Chapel compile chain. Image: Cray Inc.]

  8. Chapel: Base Language
     - variables, constants, parameters:

         var timestep: int;
         param pi: real = 3.14159265;
         config const epsilon = 0.05; // $ ./myProgram --epsilon=0.01

     - records:

         record Vector3D {
           var x, y, z: real;
         }
         var pos = new Vector3D(0.0, 1.0, -1.5);
         pos.x = 2.0;
         var copy = pos; // copied by value

     - classes:

         class Person {
           var firstName, surname: string;
           var age: int;
         }
         var patsy = new Person("Patricia", "Stone", 39);

  9. Chapel: Base Language (2)
     - procedures, type inference, generic methods:

         proc square(n) {
           return n * n;
         }

         var x = 2;
         var x2 = square(x);
         writeln(x2, ": ", x2.type:string); // 4: int(64)

         var y = 0.5;
         var y2 = square(y);
         writeln(y2, ": ", y2.type:string); // 0.25: real(64)

  10. Chapel: Base Language (3)
     - iterators:

         iter triangle(n) {
           var current = 0;
           for i in 1..n {
             current += i;
             yield current;
           }
         }

     - tuples, zippered iteration:

         config const n = 10;
         for (i,t) in zip(0..#n, triangle(n)) do
           writeln("triangle number ", i, " is ", t);

  11. Chapel: Task Parallelism
     - task creation:

         begin doStuff(); // spawn task and don't wait
         cobegin {
           doStuff1();
           doStuff2();
         } // wait for completion of all statements in the block

     - synchronisation variables:

         var a$: sync int;
         begin a$ = foo();
         var c = 2 * a$; // suspend until a$ is assigned

  12. Chapel: Synchronization Variables
     - single variables can only be written once; sync variables are reset to empty when read.
     - sync example (producer/consumer):

         var item$: sync int;
         proc produce() {
           for i in 0..#N do
             item$ = i;
         }
         proc consume() {
           for i in 0..#N {
             var x = item$;
             writeln(x);
           }
         }
         begin produce();
         begin consume();

     - single example (latch):

         var latch$: single bool;
         proc await() {
           latch$;
         }
         proc release() {
           latch$ = true;
         }
         begin await();
         begin release();

  13. Chapel: Task Parallelism Example
     - Fibonacci numbers, using begin and one single variable:

         proc fib(n): int {
           if n <= 2 then
             return 1;
           var t1$: single int;
           var t2: int;
           begin t1$ = fib(n-1);
           t2 = fib(n-2);
           // wait for t1$
           return t1$ + t2;
         }

     - Fibonacci numbers, using cobegin and two single variables:

         proc fib(n): int {
           if n <= 2 then
             return 1;
           var t1$, t2$: single int;
           cobegin {
             t1$ = fib(n-1);
             t2$ = fib(n-2);
           }
           // wait for t1$ and t2$
           return t1$ + t2$;
         }

  14. Chapel: Data Parallelism
     - ranges:

         var r1 = 0..3; // 0, 1, 2, 3
         var r2 = 0..#10 by 2; // 0, 2, 4, 6, 8

     - arrays, data parallel loops:

         var A, B: [0..#N] real;
         forall i in 0..#N do // cf. coforall
           A(i) = A(i) + B(i);

     - scalar promotion:

         A = A + B;

  15. Chapel: Data Parallelism (2)
     - example: DAXPY (y is passed by ref because it is updated in place; the procedure returns nothing, so no return type is declared)

         config const alpha = 3.0;
         const MyRange = 0..#N;
         proc daxpy(x: [MyRange] real, ref y: [MyRange] real) {
           forall i in MyRange do
             y(i) = alpha * x(i) + y(i);
         }

     - alternatively, via promotion, the forall loop can be replaced by:

         y = alpha * x + y;

     - reductions and scans:

         var mx = (max reduce A);
         A = (+ scan A); // prefix sum of A - parallel?

     - the target of data parallelism could be SIMD, GPU or normal threads (currently no way to express this)
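
To make the reduce and scan expressions concrete, here is a small worked sketch on a five-element array. The array contents and the names `total` and `prefix` are assumptions chosen for illustration; the comments give the values the expressions evaluate to.

```chapel
var A: [1..5] int = [3, 1, 4, 1, 5];

const mx     = max reduce A;   // 5: the largest element
const total  = + reduce A;     // 14: the sum of all elements
const prefix = + scan A;       // 3 4 8 9 14: inclusive running sums

writeln(mx, " ", total, " ", prefix);
```

A reduction collapses the array to one value, while a scan keeps one partial result per element, so the last element of `prefix` equals `total`.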

  16. Chapel: forall vs. coforall
     - use forall when iterations may be executed in parallel
     - use coforall when iterations must be executed in parallel
     - what's wrong with this code?

         var a$: [0..#N] single int;
         forall i in {0..#N} {
           if i < (N-1) then
             a$[i] = a$[i+1] - 1;
           else
             a$[i] = N;
           var result = a$[i];
           writeln(result);
         }
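
One reading of the question above: iteration i blocks until iteration i+1 has written its element, so every iteration needs its own task. A forall may run several iterations on one task, in which case an early iteration can wait forever on a later one. The sketch below is an assumption-laden rewrite, not the slide's code: it uses coforall, replaces `single` with `sync` accessed through explicit `writeEF`/`readFF` calls, and collects values in a hypothetical `results` array instead of printing from each task.

```chapel
config const N = 8;

var a: [0..#N] sync int;      // one full/empty cell per index
var results: [0..#N] int;

// coforall creates one task per iteration, so a task that blocks on
// a[i+1] cannot prevent the task that fills a[i+1] from running.
coforall i in 0..#N with (ref a, ref results) {
  if i < N-1 then
    a[i].writeEF(a[i+1].readFF() - 1);  // wait until a[i+1] is full
  else
    a[i].writeEF(N);
  results[i] = a[i].readFF();           // read without emptying
}

writeln(results);   // results[i] == i + 1
```

The dependency chain runs from the last index down to the first, so the tasks finish in reverse order even though they all start together.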

  17. Chapel: Task Intents
     - constant (default):

         config const N = 10;
         var race: int;
         coforall i in 0..#N do
           race += 1; // illegal!

     - reference:

         var deliberateRace: int;
         coforall i in 0..#N with (ref deliberateRace) do
           deliberateRace += 1;

     - reduce:

         var sum: int;
         coforall i in 0..#N with (+ reduce sum) do
           sum += i;

  18. Chapel: Domains
     - domain: an index set, can be used to declare arrays
     - dense (rectangular): a tensor product of ranges, e.g.

         config const M = 5, N = 7;
         const D: domain(2) = {0..#M, 0..#N};

     - strided:

         const D1 = {0..#M by 4, 0..#N by 2};
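
A minimal sketch of how the dense and strided domains above might be used, assuming the same `M`, `N`, `D` and `D1`. The array `A` and the fill formula are hypothetical additions for illustration.

```chapel
config const M = 5, N = 7;

const D: domain(2) = {0..#M, 0..#N};           // 5 x 7 dense index set
const D1 = {0..#M by 4, 0..#N by 2};           // rows 0,4 and columns 0,2,4,6

var A: [D] real;                               // one element per index in D

// Iterate over the domain in parallel; each index is an (i, j) tuple.
forall (i, j) in D with (ref A) do
  A[i, j] = i * N + j;

writeln(D.size, " indices in D, ", D1.size, " indices in D1");
writeln(A[2, 3]);
```

Declaring the array over a named domain means the same `D` can drive both the allocation and the loop, so the two cannot drift apart.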

  19. Chapel: Domains (2)
     - sparse:

         const SparseD: sparse subdomain(D) = ((0,0), (1,2), (3,2), (4,4));

     - associative:

         var Colours: domain(string) = {"Black", "Yellow", "Red"};

  20. Chapel: Locales
     - locale: a unit of the target architecture: processing elements with (uniform) local memory

         const Locales: [0..#numLocales] locale = ... ; // built-in
         on Locales[1] do foo();
         coforall (loc, id) in zip(Locales, 1..) do
           on loc do // migrates this task to loc
             coforall tid in 0..#numTasks do
               writeln("Task ", id, " thread ", tid, " on ", loc);
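
The locale loop above can be tried as a complete program along the following lines. This is a sketch under assumptions: `numTasks` is declared here as a hypothetical config constant, and `here.id` is used to report the locale a task is actually running on.

```chapel
config const numTasks = 2;   // hypothetical number of tasks per locale

// One task per locale, each migrated to its locale by the on-clause,
// then numTasks further tasks created locally on that locale.
coforall loc in Locales do
  on loc do
    coforall tid in 0..#numTasks do
      writeln("task ", tid, " on locale ", here.id, " of ", numLocales);
```

With a multi-locale build, the number of locales is chosen at launch, for example `./locales -nl 2 --numTasks=4`, where the executable name `locales` is hypothetical. The order of the output lines is not deterministic.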

  21. Chapel: Domain Maps
     - use domain maps to map indices in a domain to locales:

         use CyclicDist;
         const Dist = new dmap(new Cyclic(startIdx=1, targetLocales=Locales[0..1]));

         const D = {0..#N} dmapped Dist;
         var x, y: [D] real;

  22. Chapel: Domain Maps (2)
     - block:

         use BlockDist;
         const space1D = {0..#N};
         const B = space1D dmapped Block(boundingBox=space1D);
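
To show what the cyclic and block maps mean for individual indices, the sketch below computes an owner for each index using plain arithmetic. It assumes the index-to-locale formulas described in the documentation of the standard Cyclic and Block distributions; the helper names `cyclicOwner` and `blockOwner` and the constant `numTargets` are hypothetical, and this is not the library implementation.

```chapel
config const N = 8;           // number of indices, as in {0..#N}
config const numTargets = 2;  // hypothetical number of target locales

// Cyclic: deal indices round-robin, with startIdx landing on target 0.
proc cyclicOwner(i: int, startIdx: int, n: int): int {
  return ((i - startIdx) % n + n) % n;   // keep the result non-negative
}

// Block: cut the bounding box lo..hi into n contiguous chunks.
proc blockOwner(i: int, lo: int, hi: int, n: int): int {
  const raw = (i - lo) * n / (hi - lo + 1);
  return max(0, min(n - 1, raw));        // clamp indices outside the box
}

for i in 0..#N do
  writeln(i, ": cyclic -> ", cyclicOwner(i, 1, numTargets),
             ", block -> ", blockOwner(i, 0, N-1, numTargets));
```

With the defaults, the cyclic map alternates owners from one index to the next, while the block map gives indices 0..3 to target 0 and 4..7 to target 1. Cyclic tends to balance irregular work, and block keeps neighbouring indices together.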

  23. Hands-on Exercise: Locales in Chapel
