Measuring the performance improvements as you parallelize and optimize your software
Multiplication of Transposed Sparse Matrix by Vector: managing race conditions with...

Compiler flags (all versions): -O2 -march=native -mtune=native -ftree-vectorize. The OpenMP versions add -fopenmp.

Version                          First timing        Second timing
Baseline (no -fopenmp)           0.25 s              0.27 s
OpenMP pragma atomic             0.359 s (0.7x)      0.3470 s (0.78x)
OpenMP privatization of arrays   0.1172 s (2.13x)    0.1177 s (2.29x)
- Evaluate the performance of your serial, parallel and optimized code
- Tuning and optimization
- Avoiding ‘too much parallel’

[Figure: speedup versus number of threads/processes/cores, showing the ideal and real curves]
- Serial and parallel performance
- Take regular parallel performance measurements as you progress
- Understand your performance limits
- Use Speedup and Efficiency measures
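One way to take repeatable measurements is to time the same workload several times and keep the shortest run, which reduces noise from other activity on the machine. The following is a minimal sketch, not a prescribed method: `time_best_of` and `example_workload` are hypothetical names, and the workload is only a stand-in for whatever kernel is being measured.

```python
import time


def time_best_of(func, repeats=5):
    """Return the shortest wall-clock time in seconds over `repeats` calls of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def example_workload():
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(100_000))


print(f"best of 5 runs: {time_best_of(example_workload):.4f} s")
```

Recording the best (or median) of several runs alongside the compiler flags and thread count makes later comparisons between versions easier to trust.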
- Measure the relative performance between serial and parallel code.
- Speedup is the improvement in execution speed of a task run on the same architecture but with different resources.

Speedup, S, for problem size N on P processes/threads/cores:

$$S(N, P) = \frac{T(N, 1)}{T(N, P)}$$
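As a quick check of the speedup definition, the sketch below applies it to the first set of timings from the sparse matrix table at the start of the deck (0.25 s baseline, 0.359 s with pragma atomic, 0.1172 s with array privatization). The function name `speedup` is an illustrative choice.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(N, P) = T(N, 1) / T(N, P) for timings taken at the same problem size."""
    return t_serial / t_parallel


# Timings in seconds, taken from the table at the start of the deck.
t_serial = 0.25
timings = {
    "OpenMP pragma atomic": 0.359,
    "OpenMP privatization of arrays": 0.1172,
}

for name, t_parallel in timings.items():
    print(f"{name}: {speedup(t_serial, t_parallel):.2f}x")
# OpenMP pragma atomic: 0.70x
# OpenMP privatization of arrays: 2.13x
```

A speedup below 1, as with the atomic version, means the parallel code is slower than the serial baseline.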
- Measure the efficiency of the parallel code.
- 100% efficiency = using double the resources, but taking half the runtime (i.e. the same resources are used in total).

Parallel efficiency, E, for problem size N on P processes/threads/cores:

$$E(N, P) = \frac{S(N, P)}{P} = \frac{T(N, 1)}{P \, T(N, P)}$$
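The efficiency formula can be sketched in a few lines. The timings below are hypothetical values chosen for illustration: the first case is the "double the resources, half the runtime" situation, the second a less ideal run on four threads.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(N, P) = T(N, 1) / T(N, P)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E(N, P) = S(N, P) / P."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical timings in seconds.
print(efficiency(10.0, 5.0, 2))  # 1.0   (double the resources, half the runtime)
print(efficiency(10.0, 4.0, 4))  # 0.625 (speedup of 2.5 on 4 threads)
```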
- We can never parallelize every single part of the code (e.g. initialising and distributing the data).
- A fraction of the runtime, α, is completely serial, limiting the parallel runtime even with 100% efficiency of the parallel fraction on P processors/threads/cores.

For runtime T, using problem size N on P processes (known as ‘Amdahl’s Law’):

$$T(N, P) = \alpha \, T(N, 1) + \frac{(1 - \alpha) \, T(N, 1)}{P}$$

- Limited by the serial fraction, α
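To see why the serial fraction is the limit, the runtime expression can be combined with the definition of speedup, S(N, P) = T(N, 1) / T(N, P). This is a short worked step using only the symbols already defined:

$$S(N, P) = \frac{T(N, 1)}{\alpha \, T(N, 1) + \frac{(1 - \alpha) \, T(N, 1)}{P}} = \frac{1}{\alpha + \frac{1 - \alpha}{P}}$$

$$\lim_{P \to \infty} S(N, P) = \frac{1}{\alpha}$$

For example, with α = 0.1 the speedup can never exceed 10x, however many processes are used.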
Gene Amdahl, 1967

[Figure: Amdahl's Law, speedup versus number of processors (1, 2, 4, 8) for sequential portions α = 5%, 10%, 25% and 50%. Upper panel: serial α = 10%, parallel (1 - α) = 90%, speedup labels 1x, 2x, 3.7x, 5x. Lower panel: serial α = 50%, parallel (1 - α) = 50%, speedup labels 1x, 1.33x, 1.6x, 1.8x. Source: Wikipedia]
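Curves of this kind can be tabulated directly from Amdahl's Law. The sketch below assumes the standard speedup form 1 / (α + (1 - α) / P), which follows from the runtime formula; `amdahl_speedup` is an illustrative name. For α = 50% it gives 1.33x, 1.60x and about 1.78x on 2, 4 and 8 processors, in line with the 1.33x, 1.6x and 1.8x labels in the figure.

```python
def amdahl_speedup(alpha, p):
    """Predicted speedup on p processors when a fraction alpha of the runtime is serial."""
    return 1.0 / (alpha + (1.0 - alpha) / p)


# Serial fractions and processor counts shown in the figure.
for alpha in (0.05, 0.10, 0.25, 0.50):
    row = ", ".join(
        f"P={p}: {amdahl_speedup(alpha, p):.2f}x" for p in (1, 2, 4, 8)
    )
    print(f"alpha={alpha:.2f} -> {row}")
```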
● Use the spreadsheet assigned to your team to record timings.
● Use this regularly, particularly once you start trying multiple parallelization methods and tuning your implementations.
● Try and gain an understanding of your serial fraction, α.
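One possible way to estimate the serial fraction from recorded timings is to invert Amdahl's Law: dividing T(N, P) = αT(N, 1) + (1 - α)T(N, 1)/P by T(N, 1) and solving for α gives α = (T(N, P)/T(N, 1) - 1/P) / (1 - 1/P). The sketch below assumes the measured runtime follows Amdahl's Law exactly, which real codes only approximate; the function name and the timings are hypothetical.

```python
def estimate_serial_fraction(t_serial, t_parallel, p):
    """Estimate alpha by inverting Amdahl's Law for one measurement on p threads."""
    if p < 2:
        raise ValueError("need p >= 2 to estimate the serial fraction")
    ratio = t_parallel / t_serial  # equals 1 / speedup
    return (ratio - 1.0 / p) / (1.0 - 1.0 / p)


# Hypothetical timings in seconds: serial run and a run on 4 threads.
t1 = 10.0
t4 = 3.25

alpha = estimate_serial_fraction(t1, t4, 4)
print(f"estimated serial fraction: {alpha:.3f}")
print(f"speedup limit 1/alpha: {1.0 / alpha:.1f}x")
```

If the estimate grows as P increases, that usually points to parallel overheads (synchronization, load imbalance) rather than a fixed serial portion.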