
Martin Schulz, Lawrence Livermore National Laboratory, VAPLS 2013

Alfredo Gimenez (University of California at Davis) and Martin Schulz (Lawrence Livermore National Laboratory). VAPLS 2013, Atlanta, October 14th, 2013. LLNL-PRES-xxxxxx


  1. Alfredo Gimenez, University of California at Davis; Martin Schulz, Lawrence Livermore National Laboratory. VAPLS 2013, Atlanta, October 14th, 2013. LLNL-PRES-xxxxxx

  2. Single view on data is insufficient
     • Different patterns emerge in different domains
     • Patterns help identify performance problems
     Map data from one domain to one of the other domains
     • Comparable data
     • Enable correlation
     • Understand interactions
     • Access to visualization techniques
     Application Domain
     • Intuitive to the application scientist
     • Can employ similar sci viz techniques
     [Diagram labels: Physical, Simulation Data, Communication, Physical layout, Patterns of hardware]
     Working in the Application Domain, Alfredo Gimenez and Martin Schulz

  3. Example: 256-core run of a CFD application, MIRANDA
     • Floating point operations
     Simple step:
     • Map floating point ops onto the application domain
     • Similar L2 cache misses
     Apparent correlations
     • Explains performance
     • Application-specific bottlenecks
     [Figure: panels labeled L2CM, Temperature, and Floating Point; axis labels and tick values were lost in extraction]
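
The "simple step" on this slide, mapping a per-core counter onto the application domain, can be sketched as follows. This is a minimal illustration, not the authors' tool: the function name `map_cores_to_domain`, the 16 x 16 block layout, and the row-major rank-to-block assignment are all assumptions made for the example. The slides only state that each core is responsible for a portion of the domain.

```python
def map_cores_to_domain(per_core_values, blocks_x, blocks_y):
    """Arrange one value per core on a 2D grid of domain blocks.

    Assumption (hypothetical): core r owns the block in column
    r % blocks_x and row r // blocks_x, i.e. a row-major layout.
    """
    if len(per_core_values) != blocks_x * blocks_y:
        raise ValueError("need exactly one value per domain block")
    return [
        list(per_core_values[row * blocks_x:(row + 1) * blocks_x])
        for row in range(blocks_y)
    ]


# Example with made-up counter values for a 256-core run.
fp_ops = list(range(256))
grid = map_cores_to_domain(fp_ops, blocks_x=16, blocks_y=16)
print(grid[1][0])  # prints 16: core 16 owns the first block of row 1
```

The resulting grid can be rendered with the same color-mapping techniques used for the simulation data (temperature, velocity), which is what makes side-by-side comparison possible.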

  4. [Figure panels: Aluminum, Velocity, FP Ops, L1CM, Time, BranchMiss]

  5. Observation: one core per node consistently creates more L1 misses
     • Caused by the execution of collective MPI operations
     • Shows the need for different perspectives to disambiguate causes
     • Feature detection and correlation can automate this process
     [Figure panels: App Domain; HW Domain (L1CM): 16 nodes with 4x4 cores]
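
One way the "one core per node" pattern might be detected automatically is sketched below. The slides do not describe a detection method, so this is an assumed heuristic: `find_outlier_cores` and the threshold `factor` are hypothetical, and the contiguous grouping of 16 cores per node is inferred from the "16 nodes with 4x4 cores" layout.

```python
from statistics import median


def find_outlier_cores(l1_misses, cores_per_node=16, factor=2.0):
    """Flag, per node, the core whose L1 miss count stands out.

    Assumption (hypothetical): cores are numbered contiguously per node,
    and a core is an outlier if it exceeds factor x the node median.
    Returns {node_index: core_index_within_node}.
    """
    outliers = {}
    for start in range(0, len(l1_misses), cores_per_node):
        node_vals = l1_misses[start:start + cores_per_node]
        med = median(node_vals)
        worst = max(range(len(node_vals)), key=node_vals.__getitem__)
        if med > 0 and node_vals[worst] > factor * med:
            outliers[start // cores_per_node] = worst
    return outliers


# Made-up data: two nodes, one noisy core in each.
misses = [100] * 32
misses[0] = 500        # node 0, core 0
misses[16 + 3] = 450   # node 1, core 3
print(find_outlier_cores(misses))  # prints {0: 0, 1: 3}
```

A flagged core is only a candidate: as the slide notes, a second perspective (here, the hardware domain) is needed to confirm that MPI collectives are the cause.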

  6. Same data with linear color map
     [Figure panels: L1 Cache Misses; FP Operations]

  7. Same data with linear color map
     [Figure panels: L1 Cache Misses with MPI worker filtered; FP Operations]

  8. Same data with linear color map
     • L1 Misses per FP operation: Proxy for efficiency
     [Figure panels: L1 Cache Misses with MPI worker filtered; FP Operations]
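
The efficiency proxy named on this slide is a per-core ratio. A minimal sketch, assuming per-core lists of counter values and a caller-supplied set of MPI worker cores to filter out (the name `misses_per_fp_op` and the `excluded` parameter are hypothetical):

```python
def misses_per_fp_op(l1_misses, fp_ops, excluded=()):
    """Return L1 misses per floating point operation for each core.

    Cores listed in `excluded` (e.g. MPI worker cores) and cores with
    zero FP operations yield None instead of a ratio.
    """
    excluded = set(excluded)
    ratios = []
    for core, (misses, ops) in enumerate(zip(l1_misses, fp_ops)):
        if core in excluded or ops == 0:
            ratios.append(None)
        else:
            ratios.append(misses / ops)
    return ratios


# Made-up values for three cores; core 0 is treated as the MPI worker.
print(misses_per_fp_op([10, 20, 30], [100, 0, 60], excluded=[0]))
# prints [None, None, 0.5]
```

Lower values suggest more floating point work per cache miss. The ratio is a proxy only and says nothing about misses at other cache levels.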

  9. Previous example has coarse granularity
     • MIRANDA example uses per-core performance data
     • Each core is responsible for a portion of the application domain
     • Need finer-grained data and more general mapping techniques
     Question: can we get access to finer-grained data?
     • Ideal: per data point measurements
       - Hard to track in hardware (in all details)
       - Hardware simulation has high overhead and is most probably inaccurate
     • Approach: exploit new hardware sampling techniques and develop mechanisms to provide application domain mappings

  10. Target application: LULESH
      • Shock Hydrodynamics challenge problem
      • Unstructured hex mesh
      • Implemented in a wide range of models
      PEBS counters
      • Sampling of memory loads
      • Load address, time to load, cache hierarchy
      • Enables mapping back to data structure
      Experiments
      • OpenMP version of LULESH
      • 4-core Intel Ivy Bridge
      [Figure panels: Calculation step: 616; Calculation step: 2051]
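
Mapping a sampled load address back to a data structure can be sketched as simple address arithmetic. This is an illustration under stated assumptions, not the authors' implementation: it assumes the data of interest lives in one contiguous array with a known base address and element size, and that samples arrive as (load address, latency in cycles) pairs. The function name `attribute_samples` and the sample format are hypothetical.

```python
from collections import defaultdict


def attribute_samples(samples, base_addr, elem_size, num_elems):
    """Sum sampled load latency per array element.

    samples: iterable of (load_address, latency_cycles) pairs.
    Samples outside [base_addr, base_addr + elem_size * num_elems)
    are ignored. Returns {element_index: total_latency_cycles}.
    """
    totals = defaultdict(int)
    for addr, latency in samples:
        offset = addr - base_addr
        if 0 <= offset < elem_size * num_elems:
            totals[offset // elem_size] += latency
    return dict(totals)


# Made-up samples against a 4-element array of 8-byte values at 0x1000.
samples = [(0x1000, 5), (0x1008, 7), (0x1009, 3), (0x2000, 50)]
print(attribute_samples(samples, base_addr=0x1000, elem_size=8, num_elems=4))
# prints {0: 5, 1: 10}
```

Once latency is attributed to element indices, it can be drawn on the mesh like any other per-element field.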

  11. Compulsory cache misses at first element
      [Figure: Total Cycles]

  12. Cache misses due to thread-level parallelism
      [Figure: Total Cycles]

  13. Performance data can be measured in many domains
      • Need to correlate domains
      • Need visualization techniques in each domain
      Application domain
      • Intuitive for the user
      • Exploit existing tools
      Challenges moving forward
      • Automatic analysis within different domains (e.g. feature detection)
      • Emerging domains
        - multivariate, infoviz
