PlaFRIM
Courtès L., Rué F.
November 8, 2019
Table of contents
  1 Introduction
  2 General Exploration
  3 The Roofline model
  4 Performance Methodology
The hard way
  printf("%ld", (long)time(NULL));
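The one-liner above only has one-second resolution. A minimal sketch of the same "by hand" approach with finer resolution is given below. It assumes a C11 library that provides `timespec_get`, and `busy_sum` is a hypothetical workload invented for the illustration. Note that `TIME_UTC` is the wall clock, so a clock adjustment during the run would distort the measurement.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds, read with the C11 timespec_get() call. */
static double now_seconds(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);
    (void)base;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical workload (not from the slides): sum of the integers 1..n. */
static double busy_sum(long n)
{
    double s = 0.0;
    for (long i = 1; i <= n; i++)
        s += (double)i;
    return s;
}

/* Time one call of busy_sum(n), print the elapsed time and return it.
 * The computed sum is stored in *result. */
static double time_busy_sum(long n, double *result)
{
    double t0 = now_seconds();
    *result = busy_sum(n);
    double t1 = now_seconds();
    printf("busy_sum(%ld) took %.6f s\n", n, t1 - t0);
    return t1 - t0;
}
```

Calling `time_busy_sum(100000000L, &r)` from `main` prints one timing line. Every region of interest needs its own pair of calls, which is why this is the hard way.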
The optimization objectives
  Improve the speed of execution
  Reduce memory footprint
  Reduce energy consumption
  Consume fewer resources
The Process
  Identify bottlenecks (Profiling)
  Choose better algorithms or improve implementation (Optimization)
How profilers do it
  Call stack sampling
  Optional function call instrumentation
  Hardware simulation
  Hardware counters
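Of the techniques listed, function call instrumentation is the easiest to imitate by hand. The sketch below is only an illustration of the idea, not how any particular profiler is implemented: `struct probe`, `collatz_steps` and the wrapper are hypothetical names made up for this example. The wrapper counts calls and accumulates CPU time with the standard `clock()` function.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Per-function record: a hand-written stand-in for the bookkeeping
 * an instrumenting profiler keeps for each function. */
struct probe {
    const char *name;
    unsigned long calls;  /* number of calls seen */
    clock_t ticks;        /* accumulated CPU time, in clock() ticks */
};

/* Hypothetical function under study: Collatz steps needed to reach 1. */
static long collatz_steps(long n)
{
    long steps = 0;
    assert(n >= 1);
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

static struct probe collatz_probe = { "collatz_steps", 0, 0 };

/* Instrumented wrapper: count the call and charge its CPU time to the probe. */
static long collatz_steps_probed(long n)
{
    clock_t t0 = clock();
    long r = collatz_steps(n);
    collatz_probe.ticks += clock() - t0;
    collatz_probe.calls++;
    return r;
}

/* Print one line of a flat-profile-like report. */
static void probe_report(const struct probe *p)
{
    printf("%-14s calls=%lu cpu=%.6f s\n", p->name, p->calls,
           (double)p->ticks / CLOCKS_PER_SEC);
}
```

The wrapper itself costs time on every call, which illustrates why instrumentation perturbs short functions more than sampling does.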
Memory
  Understanding memory locality
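A common way to see memory locality at work is to traverse the same array in two different orders. The sketch below assumes a matrix stored in row-major order in a flat array; the function names are chosen for the illustration. Both functions return the same sum, but the first touches adjacent addresses while the second jumps by a whole row between accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order, walking along rows:
 * consecutive accesses are adjacent in memory (stride of 1 element). */
static double sum_along_rows(const double *a, size_t rows, size_t cols)
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j];
    return s;
}

/* Same sum, walking along columns: consecutive accesses are
 * cols * sizeof(double) bytes apart (stride of cols elements). */
static double sum_along_columns(const double *a, size_t rows, size_t cols)
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            s += a[i * cols + j];
    return s;
}
```

On matrices much larger than the caches, the row-wise version is usually noticeably faster because each cache line fetched is fully used. The exact ratio depends on the machine and the compiler, so it is worth measuring rather than assuming.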
General Exploration
  Optimization and granularity
The easiest way
  The time command
    Real, user & sys time
    Best way to evaluate scalability
    Accuracy of the evaluation?
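The three figures reported by the time command can also be sampled from inside a program, which helps when only one region is of interest. The sketch below assumes a POSIX system (`getrusage`) and a C11 library (`timespec_get`); `struct cpu_times` and the function names are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Wall-clock, user-CPU and system-CPU time, all in seconds. */
struct cpu_times {
    double real;
    double user;
    double sys;
};

static double timeval_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Snapshot of the three clocks for the calling process. */
static struct cpu_times sample_times(void)
{
    struct timespec ts;
    struct rusage ru;
    struct cpu_times t;
    int ok = timespec_get(&ts, TIME_UTC) == TIME_UTC
             && getrusage(RUSAGE_SELF, &ru) == 0;
    assert(ok);
    (void)ok;
    t.real = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
    t.user = timeval_seconds(ru.ru_utime);
    t.sys = timeval_seconds(ru.ru_stime);
    return t;
}

/* Time spent since an earlier snapshot. */
static struct cpu_times elapsed_since(struct cpu_times start)
{
    struct cpu_times now = sample_times();
    struct cpu_times d;
    d.real = now.real - start.real;
    d.user = now.user - start.user;
    d.sys = now.sys - start.sys;
    return d;
}

/* Print the result in the same order as the time command. */
static void print_times(struct cpu_times d)
{
    printf("real %.3f s\nuser %.3f s\nsys  %.3f s\n", d.real, d.user, d.sys);
}
```

When user + sys is well above real, several threads were running at once. When real is well above user + sys, the process spent time waiting (I/O, sleeping, or contention).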
Profiler
  static instrumentation - gprof
    Sampling technique
    no manual instrumentation of the source needed
    2 types of view (flat profile and call graph)
    Annotated code
Profiler
  static instrumentation - gprof
    use the -pg option to compile
    evaluate the output: gprof 'binary name' gmon.out
Profiler
  static instrumentation - gprof
    gprof -A -l 'binary name' gmon.out
Profiler
  And for memory usage?
Profiler
  Dynamic instrumentation - valgrind
    Done at execution time
    no instrumentation needed
    different tools for different analyses
      massif - heap profiler
      callgrind - call history among functions
      cachegrind - interactions with machine cache
Profiler
  Dynamic instrumentation - valgrind
    valgrind --tool=massif --time-unit=ms ./bin/wave0 5 5 5 100 100 100 0.0005 50
    ms_print massif.out.<pid>
Profiler
  What kind of expertise?
Profiler
  What kind of image of your program do you need?
The Roofline model
  Roofline
The model
  Cache-aware roofline model
  Figure: IBM - ICSC 2014, Shanghai, China
The model
  Figure: PICSAR Project
The model
  Figure: Thomas Jefferson National Accelerator Facility
The model
  How to construct this model?
  How to evaluate your Arithmetic Intensity?
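Arithmetic intensity is the ratio of floating-point operations to bytes moved to and from memory. For a simple kernel it can be estimated by counting by hand, as in the sketch below. The triad kernel `a[i] = b[i] + s * c[i]` is a hypothetical example chosen for the illustration (it is not the code studied in the slides), and the byte count deliberately ignores cache reuse and write-allocate traffic.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic intensity in FLOP per byte. */
static double arithmetic_intensity(double flops, double bytes)
{
    assert(bytes > 0.0);
    return flops / bytes;
}

/* Hand count for a hypothetical triad kernel on doubles:
 *     a[i] = b[i] + s * c[i]
 * Per iteration: 2 FLOP (one multiply, one add) and
 * 2 loads + 1 store = 3 * sizeof(double) bytes. */
static double triad_intensity(size_t n)
{
    double flops = 2.0 * (double)n;
    double bytes = 3.0 * (double)sizeof(double) * (double)n;
    return arithmetic_intensity(flops, bytes);
}
```

With 8-byte doubles this gives 2 / 24, about 0.083 FLOP/byte, whatever the value of `n`. A tool that measures the actual memory traffic can report a different value, since caches change the number of bytes really moved.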
Roofline evaluation
  Evaluate the performance you can achieve
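In the classic roofline formulation, the attainable performance is the smaller of the compute peak and the arithmetic intensity multiplied by the memory bandwidth. The sketch below encodes that bound. The function and parameter names are chosen for the illustration, and the peak and bandwidth values must come from the machine under study (specification or benchmark). A cache-aware variant would use one bandwidth per memory level with the same formula.

```c
#include <assert.h>

/* Attainable performance in GFLOP/s:
 *     min(peak, arithmetic intensity * bandwidth)
 * peak_gflops   : compute roof, in GFLOP/s
 * bandwidth_gbs : memory roof slope, in GB/s
 * ai            : arithmetic intensity, in FLOP/byte */
static double roofline_gflops(double peak_gflops, double bandwidth_gbs, double ai)
{
    assert(peak_gflops > 0.0 && bandwidth_gbs > 0.0 && ai >= 0.0);
    double memory_roof = ai * bandwidth_gbs;
    return memory_roof < peak_gflops ? memory_roof : peak_gflops;
}

/* Ridge point: the arithmetic intensity where the two roofs meet. */
static double ridge_point(double peak_gflops, double bandwidth_gbs)
{
    assert(bandwidth_gbs > 0.0);
    return peak_gflops / bandwidth_gbs;
}

/* A kernel left of the ridge point is limited by memory bandwidth. */
static int is_memory_bound(double peak_gflops, double bandwidth_gbs, double ai)
{
    return ai < ridge_point(peak_gflops, bandwidth_gbs);
}
```

As a made-up example, with a peak of 100 GFLOP/s and a bandwidth of 10 GB/s the ridge point is 10 FLOP/byte. A kernel at 2 FLOP/byte is then capped at 20 GFLOP/s, far below the compute peak.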
Performance achievement
  Understanding memory locality
  Figure: Memory Bound
  Figure: Compute Bound
Intel Advisor tool
  One tool to do that ...
Intel Advisor tool
  Figure: The 3D stencil: its memory access pattern (a) and the data points it uses (b). - Raul de la Cruz, BSC
Intel Advisor tool
  Figure: Stencil 1 thread - roofline
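To make the memory access pattern of a 3D stencil concrete, here is a minimal sketch of one sweep. It assumes the simplest case, a 7-point stencil with two coefficients on a flat array with `k` as the fastest index. The stencil shown in the figure and the one used by the profiled program may be of higher order, and `idx` and `stencil7` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index of grid point (i, j, k) in an nx x ny x nz grid, k fastest. */
static size_t idx(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* One sweep of a 7-point stencil over the interior points:
 *     out = c0 * centre + c1 * (sum of the 6 face neighbours)
 * Boundary points of out are left untouched. */
static void stencil7(const double *in, double *out,
                     size_t nx, size_t ny, size_t nz,
                     double c0, double c1)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t i = 1; i + 1 < nx; i++)
        for (size_t j = 1; j + 1 < ny; j++)
            for (size_t k = 1; k + 1 < nz; k++)
                out[idx(i, j, k, ny, nz)] =
                    c0 * in[idx(i, j, k, ny, nz)]
                    + c1 * (in[idx(i, j, k - 1, ny, nz)]     /* adjacent in memory */
                            + in[idx(i, j, k + 1, ny, nz)]
                            + in[idx(i, j - 1, k, ny, nz)]   /* nz elements away */
                            + in[idx(i, j + 1, k, ny, nz)]
                            + in[idx(i - 1, j, k, ny, nz)]   /* ny * nz elements away */
                            + in[idx(i + 1, j, k, ny, nz)]);
}
```

Only the two `k` neighbours are adjacent in memory. The `j` and `i` neighbours are `nz` and `ny * nz` elements away, so large grids stress the caches, which is what makes stencils a useful roofline case study.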
Intel Advisor tool
  module load compiler/gcc/9.1.0 compiler/intel/2019_update4 intel/vtune-advisor
  advixe-cl -collect roofline --project-dir=wave0 --ignore-checksums ./bin/wave0 5 5 5 100 100 100 0.0005 500
  advixe-gui