UofM Summer School, June 25-28, 2018

Introduction to Parallel Programming for Shared Memory Machines Using OpenMP

Ali Kerrache
E-mail: ali.kerrache@umanitoba.ca
Outline

- Introduction to parallel programming (OpenMP)
- Definition of the OpenMP API
  - Constitution of an OpenMP program
  - OpenMP programming model
  - OpenMP syntax [C/C++, Fortran]: compiler directives
  - Run or submit an OpenMP job [SLURM, PBS]
- Learn OpenMP by examples
  - Hello World program
    - Work sharing in OpenMP: sections, loops
  - Compute pi = 3.14
    - Serial and parallel versions
    - Race condition
    - SPMD model
    - Synchronization
Download the support material

- Use an ssh client (PuTTY, MobaXterm, or a terminal on Mac or Linux) to connect to cedar and/or graham:

  ssh -Y username@cedar.computecanada.ca
  ssh -Y username@graham.computecanada.ca

- Download the files using wget:

  wget https://ali-kerrache.000webhostapp.com/uofm/openmp.tar.gz
  wget https://ali-kerrache.000webhostapp.com/uofm/openmp-slides.pdf

  or get them from the website: https://westgrid.github.io/manitobaSummerSchool2018/

- Unpack the archive and change into the directory:

  tar -xvf openmp.tar.gz
  cd UofM-Summer-School-OpenMP
Concurrency and parallelism

Concurrency:
- The condition of a system in which multiple tasks are logically active at the same time, but they may not necessarily run in parallel.

Parallelism (a subset of concurrency):
- The condition of a system in which multiple tasks are active at the same time and actually run in parallel.

What do we mean by parallel machines?
Introduction to parallel programming

Serial programming:
- Develop a serial program.
- Then worry about performance & optimization.

But in the real world:
- We run multiple programs.
- Problems are large & complex.
- Serial execution is time consuming.

Solution:
- Use parallel, multi-core machines.

Why parallel?
- Reduce the execution time: a task that runs on 1 core can be split across 4 cores running in parallel, reducing the execution time by up to a factor of 4.
- Run multiple programs at once.
- Obtain the same amount of computation with multiple cores at low frequency.

What is parallel programming?
Parallel machines & parallel programming

Distributed memory machines (CPU-0 ... CPU-3, each with its own MEM-0 ... MEM-3):
- Each processor has its own memory.
- The variables are independent.
- Communication by passing messages (network).
- Multi-processing: difficult to program, but scalable.
- MPI-based programming.

Shared memory machines (CPU-0 ... CPU-3 attached to one shared memory):
- All processors share the same memory.
- The variables can be shared or private.
- Communication via shared memory.
- Multi-threading: portable, easy to program and use, but not very scalable.
- OpenMP-based programming.
Definition of OpenMP: API

- A library used to divide computational work in a program and add parallelism to a serial program (create threads).
- Supported by compilers: Intel (ifort, icc), GNU (gcc, gfortran), ...
- Programming languages: C/C++, Fortran.
- List of compilers: http://www.openmp.org/resources/openmp-compilers/

OpenMP consists of three components:
- Compiler directives: directives added to a serial program, interpreted at compile time.
- Runtime library: routines executed at run time.
- Environment variables: set after compile time to control the execution of an OpenMP program.
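A minimal sketch showing all three components together (assuming a GNU toolchain; the program itself is illustrative, not from the course material):

  #include <omp.h>    /* runtime library header: declares omp_get_thread_num(), etc. */
  #include <stdio.h>

  int main(void) {
      /* compiler directive: create a team of threads for the block below */
      #pragma omp parallel
      {
          /* runtime library calls: query this thread's ID and the team size */
          printf("Hello from thread %d of %d\n",
                 omp_get_thread_num(), omp_get_num_threads());
      }
      return 0;
  }

The third component, the environment variable OMP_NUM_THREADS, is read at run time and controls how many threads the parallel region creates, e.g. export OMP_NUM_THREADS=4 before running the executable.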
Construction of an OpenMP program

Application / serial program (end user)
        |
OpenMP: compiler directives + runtime library + environment variables
        |
Compilation / runtime library / operating system
        |
Thread creation & parallel execution: thread 0, thread 1, thread 2, ..., thread N-1

What is the OpenMP programming model?
OpenMP model: Fork-Join parallelism

- Execution alternates between serial regions, run by the master thread only, and parallel regions, run by all threads. At a FORK the master thread spawns a team of threads; at a JOIN the team synchronizes and only the master thread continues. Parallel regions may be nested.
- The master thread spawns a team of threads as needed.
- Parallelism is added incrementally: the sequential program evolves into a parallel program.
- Starting from a serial program, define the regions to parallelize, then add OpenMP directives.
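A schematic sketch of fork-join execution (the work routines here are placeholders invented for illustration):

  #include <omp.h>
  #include <stdio.h>

  /* placeholder work routines, for illustration only */
  static void do_serial_work(void)     { printf("serial region\n"); }
  static void do_parallel_work(int id) { printf("thread %d working\n", id); }

  int main(void) {
      do_serial_work();          /* serial region: master thread only */

      #pragma omp parallel       /* FORK: master spawns a team of threads */
      {
          do_parallel_work(omp_get_thread_num());
      }                          /* JOIN: implicit barrier, team disbands */

      do_serial_work();          /* back to a serial region: master only */
      return 0;
  }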
Learn OpenMP by examples

- Example_00: thread creation.
  - How to go from a serial code to a parallel code?
  - How to create threads?
  - Introduce some OpenMP constructs.
  - Compile and run an OpenMP program; submit an OpenMP job.
- Example_01: work sharing using loops and sections.
- Example_02: common problems in OpenMP programming: false sharing and race conditions.
- Example_03: the Single Program Multiple Data (SPMD) model, as a solution to avoid race conditions.
- Example_04: more OpenMP constructs; synchronization.
OpenMP: simple syntax

Most of the constructs in OpenMP are compiler directives, or pragmas.

For C/C++, the pragmas take the form:

  #pragma omp construct [clause [clause]...]

For example:

  #include <omp.h>
  ...
  #pragma omp parallel
  {
      /* block of C/C++ code */
  }

For Fortran, the directives take one of the forms:

  !$OMP construct [clause [clause]...]
  C$OMP construct [clause [clause]...]
  *$OMP construct [clause [clause]...]

For example:

  use omp_lib
  ...
  !$omp parallel
      ! block of Fortran code
  !$omp end parallel

- For C/C++, include the header file: #include <omp.h>
- For Fortran 90, use the module: use omp_lib
- For Fortran 77, include the header file: include 'omp_lib.h'
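Putting the C/C++ syntax together, a minimal compilable sketch (the num_threads clause is used here only to illustrate the construct [clause] form; it is not required):

  #include <omp.h>
  #include <stdio.h>

  int main(void) {
      /* "parallel" is the construct; "num_threads(4)" is a clause */
      #pragma omp parallel num_threads(4)
      {
          int id = omp_get_thread_num();   /* local, so private to each thread */
          printf("Hello from thread %d\n", id);
      }
      return 0;
  }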
Parallel regions and structured blocks

Most OpenMP constructs apply to structured blocks.

- Structured block: a block with one point of entry at the top and one point of exit at the bottom.
- The only "branches" allowed are STOP statements in Fortran and exit() in C/C++.

A structured block (the goto stays inside the block):

  #pragma omp parallel
  {
      int id = omp_get_thread_num();
  more: res[id] = do_big_job(id);
      if (!conv(res[id])) goto more;
  }
  printf("All done\n");

Not a structured block (branches into and out of the block):

  if (go_now()) goto more;
  #pragma omp parallel
  {
      int id = omp_get_thread_num();
  more: res[id] = do_big_job(id);
      if (conv(res[id])) goto done;
      goto more;
  }
  done: if (!really_done()) goto more;
Compile and run an OpenMP program

Compile with the OpenMP library enabled:
- GNU: add -fopenmp to the C/C++ & Fortran compilers.
- Intel compilers: add -openmp or -qopenmp (they also accept -fopenmp).
- PGI Linux compilers: add -mp
- Windows (Intel): add /Qopenmp

Set the environment variable OMP_NUM_THREADS (if unset, OpenMP will spawn one thread per hardware thread):
- $ export OMP_NUM_THREADS=value   (bash shell)
- $ setenv OMP_NUM_THREADS value   (tcsh shell)
where value is the number of threads, for example 4.

Execute or run the program:
- $ ./exec_program {options, parameters}   or   ./a.out
Submission script: SLURM

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --mem-per-cpu=2500M
  #SBATCH --time=0-00:30

  # Load compiler module and/or your application module.

  cd $SLURM_SUBMIT_DIR
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

  echo "Starting run at: `date`"
  ./your_openmp_program_exec {options and/or parameters}
  echo "Program finished with exit code $? at: `date`"

Resources:
- nodes=1 and ntasks=1 (an OpenMP program runs on a single node).
- cpus-per-task: from 1 up to the number of cores per node:
  - Cedar: nodes with 32 or 48 cores.
  - Graham: nodes with 32 cores.
  - Niagara: nodes with 40 cores.

Submit the job with: sbatch job_script.sh
Submission script: PBS

  #!/bin/bash
  #PBS -S /bin/bash
  #PBS -l nodes=1:ppn=4
  #PBS -l pmem=2000mb
  #PBS -l walltime=24:00:00
  #PBS -M <your-valid-email>
  #PBS -m abe

  # Load compiler module and/or your application module.

  cd $PBS_O_WORKDIR
  echo "Current working directory is `pwd`"

  export OMP_NUM_THREADS=$PBS_NUM_PPN

  # On systems where $PBS_NUM_PPN is not available, one could use:
  # CORES=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
  # export OMP_NUM_THREADS=$CORES

  ./your_openmp_exec < input_file > output_file
  echo "Program finished at: `date`"

Resources:
- nodes=1; ppn: from 1 up to the maximum number of CPUs (hardware threads) per node, e.g. nodes=1:ppn=4.
Data environment

shared:
- Only a single instance of the variable exists, in shared memory.
- All threads have read and write access to it.

private:
- Each thread allocates its own private copy of the variable.
- These local copies exist only inside the parallel region.
- They are undefined when entering or exiting the parallel region.

firstprivate:
- Variables are declared private; additionally, each copy gets initialized with the value of the original variable.

lastprivate:
- Variables are declared private; the original variable gets the value from the last iteration of the loop.

Default clause:
- C/C++: default(shared | none)
- Fortran: default(private | firstprivate | shared | none)

It is highly recommended to use default(none).
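A small sketch of how these clauses behave (the variable names are arbitrary; the atomic directive is used here only so the shared update is safe):

  #include <omp.h>
  #include <stdio.h>

  int main(void) {
      int s = 0;      /* shared: one instance, visible to all threads */
      int f = 10;     /* firstprivate: each private copy starts at 10 */
      int last = -1;  /* lastprivate: set from the last loop iteration */
      int i;          /* loop variable: predetermined private */

      #pragma omp parallel for default(none) \
              shared(s) firstprivate(f) lastprivate(last)
      for (i = 0; i < 8; i++) {
          f += i;      /* modifies this thread's private copy only */
          last = i;    /* after the loop, last == 7 (last iteration) */
          #pragma omp atomic
          s += 1;      /* shared: update must be synchronized */
      }

      /* s == 8, last == 7, f keeps its original value 10 */
      printf("s = %d, last = %d, f = %d\n", s, last, f);
      return 0;
  }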