Correlating Performance, Code Location and Memory Access Harald - PowerPoint PPT Presentation

Correlating Performance, Code Location and Memory Access
Harald Servat, Jesus Labarta, Judit Gimenez
Scalable Tools Workshop - Lake Tahoe, Aug 2nd, 2016

Folding: instantaneous metric with minimum overhead
Combine instrumentation and sampling
– Instrumentation delimits regions (routines, loops, …)
– Sampling exposes progression within a region
Capture performance counters and call-stack references
[Figure: timeline of Initialization, Iteration #1, Iteration #2, Iteration #3, Finalization, folded into a synthetic iteration]
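The folding idea on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sample` type, the `fold` function and the tuple layout of `instances` are hypothetical names chosen for illustration. It assumes each instrumented instance records its start/end time and the counter value at both ends, and that each sample carries a cumulative counter value.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    time: float     # timestamp at which the sample was taken
    counter: float  # cumulative hardware counter value at that time


def fold(instances, samples):
    """Project samples from many instances onto one synthetic instance.

    instances: list of (start, end, counter_at_start, counter_at_end),
               as delimited by instrumentation (hypothetical layout).
    Returns a sorted list of (normalized_time, normalized_counter) points,
    both in the range [0, 1).
    """
    folded = []
    for start, end, c_start, c_end in instances:
        duration = end - start
        total = c_end - c_start
        if duration <= 0 or total <= 0:
            continue  # skip degenerate instances
        for s in samples:
            if start <= s.time < end:
                folded.append(((s.time - start) / duration,
                               (s.counter - c_start) / total))
    folded.sort()
    return folded


instances = [(0, 10, 0, 100), (10, 20, 100, 200), (20, 30, 200, 300)]
samples = [Sample(2, 20), Sample(15, 150), Sample(28, 280)]
points = fold(instances, samples)
```

Each instance contributes only one sample here, yet the synthetic iteration ends up with three points spread along its duration, which is why a low sampling rate can still expose progression within a region.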

Adding PEBS to Paraver traces
Memory related data in the trace
– PEBS events
  • Loads: address, cost in cycles, level providing the data
  • Stores: only address
  • Sampling frequency:
    – Possibly different rates for loads and stores
    – One-entry PEBS buffer. Signal Extrae on each individual event.
  • Multiplexing: alternate periods sampling loads and stores
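As a rough illustration of the data described above, the sketch below models one sampled reference and averages the load cost per memory level. The `PebsSample` record and `average_load_cost_by_level` function are hypothetical and do not reflect the Extrae or Paraver trace format; the only assumption taken from the slide is that loads carry address, cost and level while stores carry only an address.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PebsSample:
    kind: str                          # "load" or "store"
    address: int
    cost_cycles: Optional[int] = None  # available for loads only
    level: Optional[str] = None        # available for loads only, e.g. "L2"


def average_load_cost_by_level(samples):
    """Return {level: average cost in cycles} over the sampled loads."""
    sums = defaultdict(lambda: [0, 0])  # level -> [total cycles, count]
    for s in samples:
        if s.kind != "load" or s.cost_cycles is None or s.level is None:
            continue  # stores carry no cost or level information
        acc = sums[s.level]
        acc[0] += s.cost_cycles
        acc[1] += 1
    return {level: total / count for level, (total, count) in sums.items()}


trace = [
    PebsSample("load", 0x7F00, cost_cycles=10, level="L2"),
    PebsSample("load", 0x7F40, cost_cycles=14, level="L2"),
    PebsSample("load", 0x9000, cost_cycles=200, level="DRAM"),
    PebsSample("store", 0x9040),
]
costs = average_load_cost_by_level(trace)
```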

Memory object references
Memory related data in the trace
– Interception of mallocs and frees
  • Emit object id/call stack
  • With threshold on allocated size (potential unresolved objects)
– Identification of memory object on sampled references
  • Static object from symbol table → Identify variable name
  • Dynamic objects from instantaneous memory map → Identify malloc where object was allocated
Observation
– Same source code → different per process address space
  • Randomization (Linux security)
  • Different base addresses for the most frequent buffers
Insight
– Folding should be applied on a per process basis
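The object identification described above can be sketched as a per-process address map. This is a simplified illustration under stated assumptions, not the tool's actual data structure: the `ObjectMap` class, its method names and the `min_size` threshold parameter are hypothetical, and the live-allocation lookup is a plain linear scan for brevity.

```python
import bisect


class ObjectMap:
    """Per-process map from sampled addresses to memory objects."""

    def __init__(self, static_symbols, min_size=0):
        # static_symbols: list of (start_address, size, variable_name),
        # as would be read from the symbol table.
        self.static = sorted(static_symbols)
        self.static_starts = [s[0] for s in self.static]
        self.min_size = min_size  # allocations below this are not tracked
        self.live = {}            # start_address -> (size, call_stack)

    def on_malloc(self, address, size, call_stack):
        if size >= self.min_size:
            self.live[address] = (size, call_stack)

    def on_free(self, address):
        self.live.pop(address, None)

    def resolve(self, address):
        # Static objects: binary search over sorted start addresses.
        i = bisect.bisect_right(self.static_starts, address) - 1
        if i >= 0:
            start, size, name = self.static[i]
            if start <= address < start + size:
                return ("static", name)
        # Dynamic objects: search the allocations live at this instant.
        for start, (size, call_stack) in self.live.items():
            if start <= address < start + size:
                return ("dynamic", call_stack)
        return None  # unresolved, e.g. an allocation below the threshold


m = ObjectMap([(0x1000, 0x100, "grid")], min_size=64)
m.on_malloc(0x8000, 128, "main>alloc_mesh")
m.on_malloc(0x9000, 16, "main>small")  # below threshold, not tracked
```

Because address space randomization gives each process different base addresses, one such map would be kept per process, in line with the insight that folding should be applied on a per process basis.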

Analytics
Identification of coarse grain repetitive structure (prerequisite)
– Computation bursts
  • Between calls to the runtime (MPI, OpenMP)
  • Clustering
– Iteration (longer intervals with runtime calls)
  • Manually:
    – Extrae_event API call
    – Paraver analysis
  • Automatic: Using spectral analysis (WIP)
  • Clustering
    – Isolate different modes, eliminate outliers
Folding generates:
– Gnuplot
– Paraver trace
  • All PEBS related events are projected and ordered into a representative instance of the repetitive region
  • The same Paraver configuration files can be applied
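To make the notion of a computation burst concrete, the sketch below extracts the intervals between runtime calls from a list of call intervals. The `computation_bursts` function and its arguments are hypothetical; it assumes each runtime call (MPI, OpenMP) is available as an (enter, exit) timestamp pair for a single process.

```python
def computation_bursts(runtime_calls, trace_start, trace_end):
    """Return the (start, end) intervals spent outside runtime calls.

    runtime_calls: list of (enter_time, exit_time) for one process.
    """
    bursts = []
    cursor = trace_start  # end of the last runtime call seen so far
    for enter, exit_ in sorted(runtime_calls):
        if enter > cursor:
            bursts.append((cursor, enter))
        cursor = max(cursor, exit_)
    if trace_end > cursor:
        bursts.append((cursor, trace_end))
    return bursts


bursts = computation_bursts([(10, 12), (30, 35)], 0, 50)
```

The resulting bursts are the units that would then be clustered to isolate different modes and eliminate outliers before folding.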

Looking at Lulesh: 1. Performance
27 MPI ranks in 2 nodes (2 sockets x 12 cores each node)
[Figure panels: MPI calls, useful duration, useful instructions]

Looking at Lulesh: 1. Performance
[Figure panels: histogram of useful duration, process mapping, histogram of clock frequency, histogram of useful instructions]

Looking at Lulesh: 1. Performance
[Figure: one iteration, 4 tasks selected]

Looking at Lulesh: 2. Code location
[Figure panels: approximation based on call stack at MPI calls; approximation based on folded call stack]

Looking at Lulesh: 3. Memory access
[Figure: PEBS address]

Looking at Lulesh: 3. Memory access
[Figure: PEBS address]

Looking at Lulesh: 3. Memory access
[Figure: PEBS level providing the data (LFB, L2, L3, DRAM)]

Looking at Lulesh: 3. Memory access
[Figure: PEBS cost in cycles (avg.)]

Looking at Lulesh: Comparing gnuplots
[Figure panels: architecture impact; stalls distribution; Task 21 and Task 23]

Conclusions
Folding can provide low overhead, detailed analysis of accesses to memory
– Wide range of new metrics: access pattern, memory objects, memory level, cost in cycles, …
Paraver provides huge flexibility combining and correlating the new data :)
– Only required to implement new “paint as” punctual information
How far from (or close to) reverse engineering?
