CS 4230: Parallel Programming
Lecture 4: OpenMP (Open Multi-Processing)
January 23, 2017
Outline
• OpenMP – another approach to thread-parallel programming
• Fork-Join execution model
• OpenMP constructs – syntax and semantics
  – Work sharing
  – Thread scheduling
  – Data sharing
  – Reduction
  – Synchronization
• 'count_primes' hands-on!
OpenMP: Common Thread-Level Programming Approach in HPC
• Portable across shared-memory architectures
• Incremental parallelization
  – Parallelize individual computations in a program while leaving the rest of the program sequential
• Compiler based
  – The compiler generates the thread program and synchronization
• Extensions to existing programming languages (Fortran, C, and C++)
  – mainly by directives
  – a few library routines
See http://www.openmp.org
Fork-Join Model
• The master thread runs sequentially until it reaches a parallel region; there it forks a team of threads that execute the region concurrently, and at the end of the region the team joins back into the single master thread, which continues alone.
OpenMP HelloWorld

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
    #pragma omp parallel
    {
        printf("Hello World from Thread %d!\n", omp_get_thread_num());
    }
    return 0;
}

Compiling for OpenMP: gcc: -fopenmp, icc: -openmp, pgcc: -mp, …
Number of threads
• Ways to control the team size, in order of precedence (see [1]):
  – if clause (the region runs with a single thread when its condition is false)
  – num_threads clause
  – omp_set_num_threads() library call
  – OMP_NUM_THREADS environment variable
  – Implementation default
• See the sketch below.
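A minimal sketch of the first three mechanisms; the team sizes and the n > 50 condition are illustrative, not from the lecture:

#include <omp.h>
#include <stdio.h>

int main (void) {
    omp_set_num_threads(4);                 /* runtime library call */

    #pragma omp parallel num_threads(2)     /* clause overrides the call above for this region */
    printf("Team of 2: thread %d\n", omp_get_thread_num());

    int n = 100;
    #pragma omp parallel if(n > 50)         /* region runs with one thread if the condition is false */
    printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());

    return 0;
}

From the shell, export OMP_NUM_THREADS=8 sets the default team size when neither of the above is used.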
OpenMP constructs
• Compiler directives (44)
  #pragma omp parallel [clause]
• Runtime library routines (35)
  #include <omp.h>
  int omp_get_num_threads(void)
  int omp_get_thread_num(void)
• Environment variables (13)
  export OMP_NUM_THREADS=x
Work sharing
• Divides the execution of the enclosed code region among the threads of the team
  – for shares the iterations of a loop across the team of threads
    #pragma omp parallel for [clause]
  – Also sections and single (see [1])
Work sharing - for

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
    int i, n = 10;
    /* the directive must be followed immediately by the for loop */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        printf("Hello World!\n");
    return 0;
}
Thread scheduling
• Static: loop iterations are divided into pieces of size chunk and then statically assigned to threads
  – schedule(static [, chunk])
• Dynamic: loop iterations are divided into pieces of size chunk and dynamically scheduled among the threads
  – schedule(dynamic [, chunk])
• More options: guided, runtime, auto
• See the sketch below.
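A minimal sketch of the schedule clause; the chunk size of 4 and n = 16 are illustrative:

#include <omp.h>
#include <stdio.h>

int main (void) {
    int i, n = 16;
    /* dynamic, chunk 4: each thread grabs the next 4 iterations when it finishes its current chunk */
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < n; i++)
        printf("Iteration %d on thread %d\n", i, omp_get_thread_num());
    return 0;
}

Replacing the clause with schedule(static, 4) assigns the chunks to threads once, before the loop runs.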
Data sharing / Data scope
• shared variables are shared among threads
• private variables are private to each thread
• Default is shared
• The index of the parallelized loop is private by default; pay attention to the indices of nested loops
  #pragma omp parallel for private(list) shared(list)
• Can be used with any work-sharing construct
• Also firstprivate, lastprivate, default, copyin, … (see [1])
• A short example follows below.
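A minimal sketch of the shared and private clauses; the array a and the temporary tmp are illustrative names:

#include <omp.h>
#include <stdio.h>

int main (void) {
    int i, n = 8, tmp = 0;
    int a[8];
    /* a and n are shared (one copy); tmp is private (one uninitialized copy per thread) */
    #pragma omp parallel for shared(a, n) private(tmp)
    for (i = 0; i < n; i++) {
        tmp = i * i;     /* each thread writes only its own copy of tmp */
        a[i] = tmp;      /* distinct iterations write distinct elements of the shared array */
    }
    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}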
Reduction
• The reduction clause performs a reduction on the variables that appear in its list
• A private copy of each list variable is created for each thread
• At the end of the region, the reduction operator is applied to all the private copies, and the final result is written to the global shared variable
  reduction(operator : list)
Reduction

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) {
    int i, n = 1000;
    float a[1000], b[1000], sum;

    for (i = 0; i < n; i++)
        a[i] = b[i] = i * 1.0;
    sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum = sum + (a[i] * b[i]);

    printf("Sum = %f\n", sum);
}

Source: http://computing.llnl.gov/tutorials/openMP/samples/C/omp_reduction.c
OpenMP Synchronization
• Recall 'barrier' from pthreads
  – int pthread_barrier_wait(pthread_barrier_t *barrier);
• Implicit barrier
  – At the end of parallel regions and work-sharing constructs
  – The barrier of a work-sharing construct can be removed with the nowait clause (the barrier at the end of the parallel region itself cannot)
    #pragma omp for nowait
• Explicit synchronization
  – single, critical, atomic, ordered, flush
• A short example of critical and atomic follows below.
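A minimal sketch of critical and atomic, using a shared counter as the (illustrative) protected update:

#include <omp.h>
#include <stdio.h>

int main (void) {
    int i, n = 1000, c1 = 0, c2 = 0;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        #pragma omp critical
        { c1++; }        /* only one thread at a time executes this block */

        #pragma omp atomic
        c2++;            /* atomic read-modify-write of a single memory location */
    }
    printf("critical: %d, atomic: %d\n", c1, c2);
    return 0;
}

Without the directives, both increments would be data races and the counts would usually come out below 1000.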
Exercise
• See prime_sequential.c
• How to improve? Write a thread-parallel version using what we discussed
• Observe scalability with the number of threads; record results in a table:
  Threads | Time (s) | Speedup
• One possible sketch follows below.
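One possible thread-parallel version of the exercise, as a sketch only: the is_prime helper and n = 1000000 are illustrative and may not match prime_sequential.c. A reduction accumulates the count, and dynamic scheduling balances the uneven cost of testing larger candidates.

#include <omp.h>
#include <stdio.h>

int is_prime (int x) {
    if (x < 2) return 0;
    for (int d = 2; d * d <= x; d++)
        if (x % d == 0) return 0;
    return 1;
}

int main (void) {
    int n = 1000000, count = 0;
    double t0 = omp_get_wtime();
    #pragma omp parallel for reduction(+:count) schedule(dynamic, 1000)
    for (int i = 2; i <= n; i++)
        count += is_prime(i);
    printf("Primes <= %d: %d (%.3f s)\n", n, count, omp_get_wtime() - t0);
    return 0;
}

Run with different OMP_NUM_THREADS values and fill in the Threads / Time / Speedup table.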
Summary
• What's good?
  – Small changes are required to produce a parallel program from a sequential one (parallel formulation)
  – Avoids having to express low-level mapping details
  – Portable and scalable; correct on 1 processor
• What is missing?
  – Not completely natural if you want to write a parallel code from scratch
  – Not always possible to express certain common parallel constructs
  – Locality management
  – Control of performance
References
[1] Blaise Barney, Lawrence Livermore National Laboratory, OpenMP tutorial: https://computing.llnl.gov/tutorials/openMP
[2] XSEDE HPC Workshop: OpenMP: https://www.psc.edu/index.php/136-users/training/2496-xsede-hpc-workshop-january-17-2017-openmp