Performance metrics: How is my parallel code performing and scaling?


  1. Performance metrics How is my parallel code performing and scaling?

  2. Performance metrics
     • A typical program has two categories of components
       - Inherently sequential sections: can’t be run in parallel
       - Potentially parallel sections
     • Speed-up: S(N, P) = T(N, 1) / T(N, P)
       - typically S(N, P) < P
     • Parallel efficiency: E(N, P) = S(N, P) / P = T(N, 1) / (P * T(N, P))
       - typically E(N, P) < 1
     • Serial efficiency: E(N) = T_best(N) / T(N, 1), where T_best(N) is the best achievable serial runtime
       - typically E(N) <= 1
     where N is the size of the problem and P the number of processors
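As a concrete illustration (not part of the original slides), here is a minimal Python sketch of these definitions; the timings are invented for the example.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: S(N, P) = T(N, 1) / T(N, P)."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, p):
    """Parallel efficiency: E(N, P) = S(N, P) / P."""
    return speedup(t_serial, t_parallel) / p

# Illustrative (made-up) timings: T(N, 1) = 120 s, T(N, 16) = 10 s
t1, t16 = 120.0, 10.0
print(f"S(N, 16) = {speedup(t1, t16):.1f}")                  # 12.0 (< 16, as is typical)
print(f"E(N, 16) = {parallel_efficiency(t1, t16, 16):.2f}")  # 0.75 (< 1, as is typical)
```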

  3. Scaling • Scaling is how the performance of a parallel application changes as the number of processors is increased • There are two different types of scaling: - Strong Scaling – total problem size stays the same as the number of processors increases - Weak Scaling – the problem size increases at the same rate as the number of processors, keeping the amount of work per processor the same • Strong scaling is generally more useful and more difficult to achieve than weak scaling 3

  4. Strong scaling [Plot: speed-up vs number of processors (up to ~300), comparing the 'linear' (ideal) and 'actual' speed-up curves]

  5. Weak scaling [Plot: runtime in seconds vs number of processors (1 to n), comparing 'Actual' and 'Ideal' runtime]

  6. The serial section of code “The performance improvement to be gained by parallelisation is limited by the proportion of the code which is serial” Gene Amdahl, 1967 6

  7. Amdahl’s law
     • A fraction, a, of the code is completely serial
     • Parallel runtime: T(N, P) = a * T(N, 1) + (1 - a) * T(N, 1) / P
       - assuming the parallel part is 100% efficient
     • Parallel speedup: S(N, P) = T(N, 1) / T(N, P) = P / (a * P + 1 - a)
     • We are fundamentally limited by the serial fraction a
       - for a = 0, S = P as expected (i.e. efficiency = 100%)
       - otherwise, speedup is limited to 1/a for any P
     • For a = 0.1: 1/0.1 = 10, therefore a maximum speed-up of 10
     • For a = 0.1: S(N, 16) = 6.4, S(N, 1024) = 9.9
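A small Python sketch (illustrative, not from the slides) evaluating Amdahl's law for the serial fraction a = 0.1 used above:

```python
def amdahl_speedup(a, p):
    """Amdahl's law: S(N, P) = P / (a*P + 1 - a), for serial fraction a."""
    return p / (a * p + 1 - a)

a = 0.1
for p in (16, 1024):
    print(f"P = {p:4d}: S = {amdahl_speedup(a, p):.1f}")
# P =   16: S = 6.4
# P = 1024: S = 9.9   (approaching the 1/a = 10 limit)
```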

  8. Gustafson’s Law • We need larger problems for larger numbers of CPUs • Whilst we are still limited by the serial fraction, it becomes less important 8

  9. Utilising Large Parallel Machines
     • Assume the parallel part is O(N) and the serial part is O(1)
       - time: T(N, P) = T_serial(N, P) + T_parallel(N, P) = a * T(1, 1) + (1 - a) * N * T(1, 1) / P
       - speedup: S(N, P) = T(N, 1) / T(N, P) = (a + (1 - a) * N) / (a + (1 - a) * N / P)
     • Scale the problem size with the number of CPUs, i.e. set N = P (weak scaling)
       - speedup: S(P, P) = a + (1 - a) * P
       - efficiency: E(P, P) = a / P + (1 - a)
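A hedged Python sketch of this model (serial part O(1), parallel part O(N)); the function name and inputs are my own, chosen to reproduce the figures on these slides.

```python
def model_speedup(a, n, p):
    """Speedup for a serial O(1) part plus a parallel O(N) part:
    S(N, P) = (a + (1 - a)*N) / (a + (1 - a)*N/P)."""
    return (a + (1 - a) * n) / (a + (1 - a) * n / p)

a = 0.1
print(f"{model_speedup(a, 1, 1024):.1f}")     # fixed N = 1 on 1024 CPUs: ~9.9 (Amdahl limit)
print(f"{model_speedup(a, 1024, 1024):.1f}")  # N scaled with P = 1024: ~921.7
```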

  10. Gustafson’s Law
     • If you can increase the amount of work done by each process/task then the serial component will not dominate
       - increase the problem size to maintain scaling
       - this can be in terms of adding extra complexity or increasing the overall problem size
     • Scaled speedup: S(N*P, P) = P - a*(P - 1)
       - due to the scaling of N, the serial fraction effectively becomes a/P
     • For instance, with a = 0.1: S(16N, 16) = 14.5, S(1024N, 1024) = 921.7
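Equivalently, the scaled-speedup formula on this slide can be checked directly (again an illustrative sketch, not from the original deck):

```python
def gustafson_speedup(a, p):
    """Gustafson's law: scaled speedup S(N*P, P) = P - a*(P - 1)."""
    return p - a * (p - 1)

a = 0.1
print(f"{gustafson_speedup(a, 16):.1f}")    # 14.5
print(f"{gustafson_speedup(a, 1024):.1f}")  # 921.7 (vs an Amdahl limit of 1/a = 10)
```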

  11. Analogy: Flying London to New York 11

  12. Buckingham Palace to Empire State • By Jumbo Jet - distance: 5600 km; speed: 700 kph - time: 8 hours? • No! - 1 hour by tube to Heathrow + 1 hour for check-in etc. - 1 hour immigration + 1 hour taxi downtown - fixed overhead of 4 hours; total journey time: 4 + 8 = 12 hours • Triple the flight speed with Concorde to 2100 kph - total journey time = 4 hours + 2 hours 40 mins = 6.7 hours - speedup of 1.8, not 3.0 • Amdahl’s law! - a = 4/12 = 0.33; max speedup = 3 (i.e. 4 hours)

  13. Flying London to Sydney 13

  14. Buckingham Palace to Sydney Opera • By Jumbo Jet - distance: 16800 km; speed: 700 kph; flight time: 24 hours - serial overhead stays the same: total time: 4 + 24 = 28 hours • Triple the flight speed - total time = 4 hours + 8 hours = 12 hours - speedup = 2.3 (as opposed to 1.8 for New York) • Gustafson’s law! - bigger problems scale better - increase both distance (i.e. N) and max speed (i.e. P) by three - maintain the same balance: 4 hours “serial” + 8 hours “parallel”
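A tiny Python sketch (mine, not from the slides) of the flight analogy, treating the 4-hour overhead as the serial part and the flight as the parallel part:

```python
def journey_speedup(overhead_h, flight_h, speed_factor):
    """Speedup in total journey time when only the flight gets faster."""
    return (overhead_h + flight_h) / (overhead_h + flight_h / speed_factor)

print(f"London-New York: {journey_speedup(4, 8, 3):.1f}x")   # ~1.8x (Amdahl: small "problem")
print(f"London-Sydney:   {journey_speedup(4, 24, 3):.1f}x")  # ~2.3x (Gustafson: bigger "problem")
```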

  15. Plotting • Think carefully whenever you plot data - what am I trying to show with the graph? - is it easy to interpret? - can it be interpreted quantitatively? • Default plotting options are rarely what you want - default colours can be hard to read (e.g. yellow on white) - default axis limits may not be sensible - ... • Test data - MPI version of traffic model on multiple nodes of ARCHER 15

  16. Hard to interpret small N data here [Plot: time in seconds vs processes on linear axes, for 'Large N' and 'Small N' runs]

  17. log/log can make trends in data look too similar [Plot: time in seconds vs processes on log/log axes (16 to 512 processes), for 'Large N' and 'Small N']

  18. Normalised data easier to compare • use single-node (24-core) performance as baseline here [Plot: speedup vs processes for 'Large N' and 'Small N', normalised to the 24-core baseline]
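For illustration, a hedged matplotlib sketch of this kind of normalised plot; the process counts and runtimes below are invented placeholders, not the ARCHER traffic-model data shown on the slide.

```python
import matplotlib.pyplot as plt

procs = [24, 48, 96, 192, 240]                      # process counts (made up)
times_large_n = [600.0, 310.0, 165.0, 95.0, 80.0]   # runtimes in seconds (made up)

baseline = times_large_n[0]                         # single-node (24-core) baseline
speedup = [baseline / t for t in times_large_n]
ideal = [p / procs[0] for p in procs]               # ideal speedup relative to 24 cores

plt.plot(procs, speedup, "o-", label="Large N (measured)")
plt.plot(procs, ideal, "--", label="Ideal")
plt.xlabel("Processes")
plt.ylabel("Speedup (relative to 24 cores)")
plt.legend()
plt.savefig("speedup.png")
```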

  19. Efficiency plots can be useful too [Plot: parallel efficiency vs processes on linear axes, for 'Large N' and 'Small N']

  20. log/linear useful if many points at small P [Plot: parallel efficiency vs processes with a logarithmic x-axis (16 to 256), for 'Large N' and 'Small N']

  21. Don’t just accept the default options • In this bar chart the x-axis doesn’t have a meaningful scale [Bar chart: speedup vs nodes, with x-axis categories 1, 2, 3, 4, 8]

  22. Summary • A variety of considerations when parallelising code - serial sections - communications overheads - load balance - ... • Scaling is important - the better a code scales the larger machine it can take advantage of • Metrics exist to give you an indication of how well your code performs and scales - important to plot them appropriately 22
