Qualifying Exam
Student: Vinícius Garcia Pinto
Advisor: Nicolas Maillard
Research line: Parallel and Distributed Processing
Breadth area: Parallel Programming
Depth topic: Hybrid Parallel Programming
June 11, 2014
Agenda
1. Why Parallel Programming?
2. Parallel Programming
3. Hybrid Parallel Programming
4. Conclusion
5. References
1. Why Parallel Programming?
Introduction

Parallel computing: the use of two or more processing units to solve a single problem.

Goals:
- solve problems in less time;
- solve larger problems.

Where?
- climate modeling;
- energy research;
- data analysis;
- simulation.

[JáJá 1992; Mattson et al. 2004; Pacheco 2011; Scott et al. 2005]
A current parallel computer

[Diagram: two current parallel platforms, each drawn as groups of cores #0-#3:
the GPPD Orion #1 node (Xeon E5-2630 CPUs, Tesla K20m GPU) and the Samsung
Galaxy S5 (1.9 GHz Cortex-A15 and 1.3 GHz Cortex-A7 clusters, Mali-T628 GPU).]
Introduction

Parallel computers are (now) mainstream:
- vector instructions, multithreaded cores, multicore processors, graphics engines, accelerators;
- not only for scientific (or HPC) applications.

But...
- None of the most popular programming languages was designed for parallel computing;
- Many programmers have never written a parallel program;
- Tools for parallel computing were designed for homogeneous supercomputers.

All programmers could be parallel programmers!

[McCool et al. 2012]
2. Parallel Programming (breadth)
   - Solving the problem in parallel
   - Implementing the solution
   - Testing the solution
3. Hybrid Parallel Programming
   - Programming Accelerators
   - Hot Topics

Solving the problem in parallel
Solving a Problem in Parallel

The programmer's tasks:
1. Identify the concurrency in the problem;
2. Structure an algorithm to exploit this concurrency;
3. Implement the solution with a suitable programming environment.

Main challenges:
- identify and manage dependencies between concurrent tasks;
- manage the additional errors introduced by parallelism;
- improve on the sequential solution (if one exists).

[Mattson et al. 2004; Sottile et al. 2010]
Finding concurrency

Decompose the problem: identify sequences of steps that can be executed together and (possibly) at the same time → tasks. [Sottile et al. 2010]

Granularity: the size of each task vs. the number of tasks [Grama 2003]:
- fine-grained: a large number of smaller tasks;
- coarse-grained: a small number of larger tasks.

Degree of concurrency: the number of tasks that can be executed simultaneously in parallel.
Finding concurrency

Describe the dependencies between tasks:
- logical dependencies: the order in which specific operations must be executed;
- data dependencies: the order in which data elements must be updated.
Structuring the algorithm

Common patterns:

Fork-join
- divide the sequential flow into multiple parallel flows, then join the parallel flows back into the sequential flow;
- commonly used to implement parallel divide-and-conquer;
- Example: merge sort (a sketch follows below).

[JáJá 1992; McCool et al. 2012]
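A minimal fork-join sketch, not taken from the slides: it assumes C with OpenMP tasks, where each task is a fork and taskwait is the join point. A real implementation would add a cutoff to sequential sort for small ranges.

#include <stdlib.h>

/* Merge the two sorted halves a[lo..mid) and a[mid..hi) back into a. */
static void merge(int *a, int lo, int mid, int hi) {
    int *tmp = malloc((hi - lo) * sizeof(int));
    int i = lo, j = mid, k = 0;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    for (k = 0; k < hi - lo; k++) a[lo + k] = tmp[k];
    free(tmp);
}

/* Fork-join merge sort over the range a[lo..hi). */
void merge_sort(int *a, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a)     /* fork: sort the left half  */
    merge_sort(a, lo, mid);
    #pragma omp task shared(a)     /* fork: sort the right half */
    merge_sort(a, mid, hi);
    #pragma omp taskwait           /* join: wait for both halves */
    merge(a, lo, mid, hi);
}

/* Typical call site:
     #pragma omp parallel
     #pragma omp single
     merge_sort(a, 0, n);
*/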
Structuring the algorithm

Common patterns:

Stencil
- apply a function to each element and its neighbors; each output combines the value of the current element with the values of its neighbors;
- Example: computational fluid dynamics.

[JáJá 1992; McCool et al. 2012]
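A minimal 1D stencil sketch in C with OpenMP; the three-point weights are hypothetical and the boundary elements are left untouched for brevity.

/* One stencil step: each interior output element combines the
   corresponding input element with its two neighbors. */
void stencil_step(const double *in, double *out, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];
}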
Structuring the algorithm

Common patterns:

Map
- apply a function to all elements of a collection, producing a new collection;
- Example: parallel for (a sketch follows below).

Recurrence
- similar to map, but elements can use the outputs of adjacent elements as inputs.

[McCool et al. 2012]
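A minimal map sketch as an OpenMP parallel for; sqrt stands in for an arbitrary element-wise function.

#include <math.h>

/* Map: apply the same function to every element independently,
   producing a new collection; iterations carry no dependencies. */
void map_sqrt(const double *in, double *out, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = sqrt(in[i]);
}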
Structuring the algorithm

Common patterns:

Reduction
- combine all elements of a collection into a single element using an associative combiner function;
- Example: MPI_Reduce (a sketch follows below).

[McCool et al. 2012]
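A minimal reduction sketch with MPI_Reduce; each rank's contribution here is just its own rank number, chosen only to keep the example self-contained.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    double local = (double) myrank;   /* each rank's partial value */
    double global = 0.0;
    /* Combine all partial values with the associative operator MPI_SUM;
       the result arrives only at rank 0. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myrank == 0)
        printf("sum of ranks = %f\n", global);
    MPI_Finalize();
    return 0;
}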
Structuring the algorithm

Common patterns:

Scan
- compute all partial reductions of a collection: for every output position, a reduction of the input up to that point is produced;
- Example: prefix sum (a sketch follows below).

[McCool et al. 2012]
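A minimal prefix-sum sketch using MPI_Scan, with one value per rank; the per-rank contribution is hypothetical.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int myrank, prefix;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    int value = myrank + 1;   /* this rank's contribution */
    /* Inclusive scan: each rank receives the reduction (sum) of the
       values contributed by ranks 0..myrank. */
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: prefix sum = %d\n", myrank, prefix);

    MPI_Finalize();
    return 0;
}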
Implementing the solution
Implementing the solution

Parallel programming environments, in general, abstract the hardware organization:
- message passing for distributed-memory platforms, e.g. MPI;
- threads for shared-memory platforms, e.g. OpenMP.

But there are exceptions: Google Go is a multicore programming language in which concurrent routines communicate by message passing over channels (following Hoare's CSP).
Programming distributed-memory platforms

MPI (Message Passing Interface)
- the de facto standard;
- processes communicate by exchanging messages: send/receive;
- point-to-point and collective communications;
- several implementations: Open MPI, MPICH, etc.

[Gropp et al. 1999]
Programming distributed-memory platforms

MPI (Message Passing Interface)
- All processes run the same binary program (MPI-1): SPMD;
- Each process is identified by a rank;
- Processes execute tests (if... then) to run only the parts of the program that are relevant to them;
- Dynamic process creation (MPI-2): MPMD; processes can be created after the MPI application has started (a sketch follows below).

[Forum 1997]
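A minimal sketch of MPI-2 dynamic process creation; the "./worker" binary and the number of spawned processes are hypothetical.

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Comm children;
    MPI_Init(&argc, &argv);
    /* Spawn 4 instances of a separate worker binary (MPMD); the result
       is an intercommunicator connecting the parent to the children. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);
    /* ... communicate with the children through 'children' ... */
    MPI_Finalize();
    return 0;
}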
Qualifying Exam Parallel Programming (breadth) Implementing the solution Programming in distributed-memory platforms ... MPI Init ( &argc, &argv ); MPI Comm rank ( MPI COMM WORLD, &myrank ); if (myrank == 0) { strcpy(message,"Hello, there"); MPI Send (message, strlen(message)+1, MPI CHAR, 1, 99, MPI COMM WORLD); } else if (myrank == 1) { MPI Recv (message, 13, MPI CHAR, 0, 99, MPI COMM WORLD, MPI STATUS IGNORE); printf("Received :%s: \ n", message); } MPI Finalize (); 23/70
Programming distributed-memory platforms

MPI (Message Passing Interface), version 3 (MPI-3)
- nonblocking collective operations (a sketch follows below);
- neighborhood collective communication;
- new one-sided communication operations;
- Fortran 2008 bindings;
- removed/deprecated functionality: C++ bindings.

[Forum 2012]
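A minimal sketch of an MPI-3 nonblocking collective: start an allreduce, let independent local work overlap with it, then wait for completion. The local value is hypothetical.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    double local = 1.0, global = 0.0;
    MPI_Request req;
    MPI_Init(&argc, &argv);

    /* Start the collective without blocking. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    /* ... independent local computation could overlap here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the collective */

    printf("global = %f\n", global);
    MPI_Finalize();
    return 0;
}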
Programming distributed-memory platforms

MPI (Message Passing Interface)

Pros:
- widely used;
- scalability;
- portability;
- no (explicit) locks.

Cons:
- low level;
- explicit data distribution;
- the "assembly code of parallel computing".
Programming distributed-memory platforms

Are there alternatives?
- Low level: sockets (OS);
- High level: RMI/RPC (Java RMI); Charm++; MapReduce (Hadoop);
- Partitioned Global Address Space (PGAS): Unified Parallel C (UPC).

[Dean et al. 2008; Downing 1998; Kale et al. 1993; UPC et al. 2013]