Identifying opportunities for parallelization in the hotspots of your code

  1. Identifying opportunities for parallelization: in the hotspots of your code

  2. PARALLWARE SW DEVELOPMENT CYCLE
     1. Understanding the sequential code and profiling
        a. Analyze your code
        b. Focus on profiling, and on where your code can be parallelized correctly
     2. Identifying opportunities for parallelization
        c. Figure out which parts of the code are suitable for parallelization
        d. Often the hardest step!
     3. Introduce parallelism
        e. Decide how to implement the parallelism discovered in your code
     4. Test the correctness of your parallel implementation
        f. Compile & run the parallel versions of your code to check that the numerical result is correct
     5. Test the performance of your parallel implementation
        g. Run the parallel versions of your code to measure the performance increase for real-world workloads
     6. Performance tuning
        h. Repeat steps 1-5 until you meet your performance requirements (a minimal end-to-end sketch of this cycle follows below)

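  To make the cycle concrete, here is a minimal end-to-end sketch (my own, not from the slides): a hypothetical reduction hotspot is parallelized with OpenMP, then checked for numerical correctness and timed against the sequential version. Compile with, e.g., gcc -O2 -fopenmp sketch.c -lm.

    #include <math.h>
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000

    int main(void) {
        double *x = malloc(N * sizeof(double));
        double sum_seq = 0.0, sum_par = 0.0;
        for (int i = 0; i < N; i++) x[i] = sin(i) * sin(i);

        /* Steps 1-2: profiling would identify this reduction loop as the hotspot. */
        double t0 = omp_get_wtime();
        for (int i = 0; i < N; i++) sum_seq += x[i];
        double t1 = omp_get_wtime();

        /* Step 3: introduce parallelism (a scalar reduction). */
        double t2 = omp_get_wtime();
        #pragma omp parallel for reduction(+:sum_par)
        for (int i = 0; i < N; i++) sum_par += x[i];
        double t3 = omp_get_wtime();

        /* Steps 4-5: the difference should be ~0 up to floating-point
           reordering; the timings show the speedup on this workload. */
        printf("diff=%g seq=%.3fs par=%.3fs\n",
               fabs(sum_seq - sum_par), t1 - t0, t3 - t2);
        free(x);
        return 0;
    }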

  3. PARALLWARE SW DEVELOPMENT CYCLE
     Understanding the sequential code and profiling:
     - https://www.exascaleproject.org/event/bssw/
     Identifying opportunities for parallelization:
     - K. Asanovic et al. 2009. A view of the parallel computing landscape. Commun. ACM 52, 10 (October 2009), 56-67. DOI: https://doi.org/10.1145/1562764.1562783
     Why are dependences difficult to use in practice?
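
  Before the worked example on the next slide, a small hypothetical C fragment (not from the slides) illustrating the three kinds of loop-carried dependences named there:

    /* Each loop carries exactly one kind of dependence. */
    void dependence_examples(int n, double *a, double *b, double *c,
                             const int *idx)
    {
        /* FLOW (read-after-write): iteration i reads a[i-1], which the
           previous iteration wrote. */
        for (int i = 1; i < n; i++)
            a[i] = a[i - 1] + 1.0;

        /* ANTI (write-after-read): iteration i reads b[i+1] before
           iteration i+1 overwrites it; reordering breaks the result. */
        for (int i = 0; i < n - 1; i++)
            b[i] = b[i + 1] * 2.0;

        /* OUTPUT (write-after-write): if idx repeats a value, two
           iterations write the same c element; the last write must win. */
        for (int i = 0; i < n; i++)
            c[idx[i]] = (double)i;
    }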

  4. PARALLWARE SW DEVELOPMENT CYCLE
     Understanding the sequential code and profiling
     Identifying opportunities for parallelization
     A loop nest exhibiting FLOW, OUTPUT, and ANTI dependences:

     01 void atmux(double* restrict y, … , int n)
     08 {
     09   for (int t = 0; t < n; t++)
     10     y[t] = 0;
     11
     12   for (int i = 0; i < n; i++) {
     13     for (int k = row_ptr[i]; k < row_ptr[i+1]; k++) {
     14       y[col_ind[k]] += x[i] * val[k];
     15     }
     16   }
     17 }

     $ icc atmux.c -std=c99 -c -O3 -xAVX -Wall -vec-report3 -opt-report3 -restrict -parallel -openmp -guide
     icc (ICC) 13.1.1 20130313
     ...
     HPO THREADIZER REPORT (atmux) LOG OPENED ON Fri Sep 25 18:04:15 2015
     HPO Threadizer Report (atmux)
     atmux.c(9:2-9:2):PAR:atmux: loop was not parallelized : existence of parallel dependence
     atmux.c(10:3-10:3):PAR:atmux: potential ANTI dependence on y. potential FLOW dependence on y.
     atmux.c(9:2-9:2):PAR:atmux: LOOP WAS AUTO-PARALLELIZED
     atmux.c(12:2-12:2):PAR:atmux: loop was not parallelized : existence of parallel dependence
     atmux.c(13:3-13:3):PAR:atmux: loop was not parallelized : existence of parallel dependence
     ...
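
  The report shows that the compiler auto-parallelizes the initialization loop but gives up on the loop nest at lines 12-13: different iterations of i may update the same y[col_ind[k]], an irregular (sparse) reduction it cannot prove safe. One way to parallelize it by hand, sketched here assuming the same CSR-style arrays as the slide, is to serialize only the conflicting updates with an OpenMP atomic:

    void atmux_omp(double *restrict y, const double *restrict x,
                   const double *restrict val, const int *restrict col_ind,
                   const int *restrict row_ptr, int n)
    {
        #pragma omp parallel for
        for (int t = 0; t < n; t++)
            y[t] = 0.0;

        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
                /* Different i can hit the same y element, so make each
                   update atomic instead of serializing the whole loop. */
                #pragma omp atomic
                y[col_ind[k]] += x[i] * val[k];
            }
        }
    }

  With OpenMP 4.5 or later, an array-section reduction (reduction(+:y[:n])) is an alternative that gives each thread a private copy of y and merges the copies at the end, trading memory for less synchronization.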

  5. PARALLWARE SW DEVELOPMENT CYCLE
     Understanding the sequential code and profiling
     Identifying opportunities for parallelization
     [Figure: "tower" illustration of the "implementation gap"; the tower for "code" is missing]

  6. PARALLWARE SW DEVELOPMENT CYCLE
     Understanding the sequential code and profiling
     Identifying opportunities for parallelization
     Classifying the variables of the source code:
     - Outputs: xi, yi, zi
     - Temporaries: dxc, dyc, dzc, m, f
     - Read-only: xx1, yy1, zz1, mass1, fsrrmax2, ...
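
  This classification maps directly onto OpenMP data-sharing clauses: temporaries are privatized, while read-only inputs and the outputs stay shared (each iteration writes distinct elements). The loop body below is hypothetical; only the variable names and their classification come from the slide:

    void classified_loop(int n, const double *xx1, const double *yy1,
                         const double *zz1, const double *mass1,
                         double fsrrmax2, double *xi, double *yi, double *zi)
    {
        double dxc, dyc, dzc, m, f;
        #pragma omp parallel for private(dxc, dyc, dzc, m, f)
        for (int i = 0; i < n; i++) {
            dxc = xx1[i];                 /* temporaries: one copy per thread */
            dyc = yy1[i];
            dzc = zz1[i];
            m = mass1[i];                 /* read-only inputs: safely shared  */
            f = (dxc*dxc + dyc*dyc + dzc*dzc < fsrrmax2) ? m : 0.0;
            xi[i] = f * dxc;              /* outputs: disjoint writes per i   */
            yi[i] = f * dyc;
            zi[i] = f * dzc;
        }
    }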

  7. Making the most of your opportunities to parallelize

  8. The Parallware analysis will help you to:
     - identify regions of the code that are opportunities for parallelization

  9. Two OpenMP parallelizations of the same sparse matrix-vector product loop:

     Parallel forall in the outer loop; parallel scalar reduction in the inner loop, accumulating directly into y[i]:

     #pragma omp parallel for … \
         shared(y)
     for (i = 0; i < n; i++) {
         y[i] = 0;
         for (k = ia[i]; k < ia[i+1]; k++) {
             y[i] = y[i] + a[k] * x[ja[k]];
         }
     }

     Parallel forall in the outer loop; parallel scalar reduction in the inner loop, accumulating into the private temporary t:

     #pragma omp parallel for … \
         private(t)
     for (i = 0; i < n; i++) {
         t = 0;
         for (k = ia[i]; k < ia[i+1]; k++) {
             t = t + a[k] * x[ja[k]];
         }
         y[i] = t;
     }
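
  A self-contained, compilable version of the second variant (my sketch; the array names follow the slide, and CSR-style indexing is assumed):

    /* Sparse matrix-vector product y = A*x in CSR format.
       ia: row pointers (n+1 entries), ja: column indices, a: values. */
    void spmv(int n, const int *ia, const int *ja,
              const double *a, const double *x, double *y)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            double t = 0.0;   /* declared inside the loop: private without a clause */
            for (int k = ia[i]; k < ia[i + 1]; k++)
                t += a[k] * x[ja[k]];
            y[i] = t;         /* each iteration writes a distinct y[i] */
        }
    }

  The second variant is generally preferable: accumulating in a register-resident temporary avoids repeated stores to the shared array y inside the inner loop.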

  10. Three loops from a histogram computation and the parallel pattern each one exhibits:

     Parallel forall:
     for (h = 0; h < Adim; h++) {
         hist[h] = 0;
     }

     Parallel sparse reduction:
     for (h = 0; h < fDim; h++) {
         hist[f[h]] = hist[f[h]] + 1;
     }

     Parallel recurrence (in general not parallelizable, but in many situations it can be parallelized with significant synchronization overhead):
     for (h = 1; h < Adim; h++) {
         hist[h] = hist[h] + hist[h-1];
     }
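
  To illustrate the synchronization cost just mentioned, here is one way (my sketch, not from the slides) to handle the two harder patterns with OpenMP: atomic updates for the sparse reduction, and a two-phase blocked scan for the recurrence.

    #include <omp.h>
    #include <stdlib.h>

    /* Sparse reduction: concurrent increments of hist[f[h]] may collide,
       so each update is atomic (correct, but synchronization-heavy). */
    void histogram(int fDim, const int *f, int *hist)
    {
        #pragma omp parallel for
        for (int h = 0; h < fDim; h++) {
            #pragma omp atomic
            hist[f[h]]++;
        }
    }

    /* Recurrence (an inclusive prefix sum): two-phase blocked scan.
       Phase 1 scans each thread's block locally; after an exclusive scan
       of the block totals, phase 2 adds each block's offset. */
    void inclusive_scan(int n, int *a)
    {
        int *block = NULL;   /* per-block totals, block[0] = 0 */
        #pragma omp parallel
        {
            int nt  = omp_get_num_threads();
            int tid = omp_get_thread_num();
            #pragma omp single
            block = calloc((size_t)nt + 1, sizeof *block);
            /* implicit barrier after single: block is visible to all */

            int lo = (int)((long long)n * tid / nt);
            int hi = (int)((long long)n * (tid + 1) / nt);

            for (int i = lo + 1; i < hi; i++)   /* phase 1: local scan */
                a[i] += a[i - 1];
            block[tid + 1] = (hi > lo) ? a[hi - 1] : 0;
            #pragma omp barrier

            #pragma omp single                  /* scan the nt block totals */
            for (int t = 1; t <= nt; t++)
                block[t] += block[t - 1];

            for (int i = lo; i < hi; i++)       /* phase 2: add offset */
                a[i] += block[tid];
        }
        free(block);
    }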
