
Computing with Allinea 06 Nov 2015 VSB, Ostrava Florent Lebeau - PowerPoint PPT Presentation



  1. Towards Efficiency
     Computing with Allinea
     06 Nov 2015, VSB, Ostrava
     Florent Lebeau, flebeau@allinea.com

  2. Agenda
     • 09:30-10:00 Registration
     • 10:00-10:15 Introduction to Allinea tools
     • 10:15-11:45 Getting started with Allinea Forge for profiling
     • 11:45-13:00 Lunch break
     • 13:00-14:15 Getting started with Allinea Forge for debugging
     • 14:15-14:45 Coffee break
     • 14:45-16:00 Allinea tools and Intel Xeon Phi coprocessors
     • 16:00-16:30 Questions and wrap-up

  3. Introduction to Allinea Tools

  4. Allinea: an expanding company
     • HPC tools company since 2002
       – Leading in the HPC software tools market worldwide
       – Global customer base
     • Helping the HPC community design the best applications
       – Unrivaled, productive and easy-to-use development environment…
       – …to help reach the highest level of performance and scalability
     • Helping HPC production make the most of their clusters
       – Unique solutions to reduce HPC systems operating costs
       – Innovative approach to facilitate the resolution of cutting-edge challenges

  5. Improve cluster efficiency
     • “Optimization” is not always a synonym for “efficiency”
       – Cluster productivity or cluster usage
     • Possible efficiency needs during production
       – Define and enforce best practices (scale, parameters…)
       – Provision and validate cluster upgrades and changes
       – Detect and resolve hardware or software faults impacting performance
     • Effortless one-touch reports with Allinea
       – Generates explicit and readable reports with metrics and explanations
       – Understand optimized HPC applications effortlessly

  6. Better runs, quickly
     • No source code needed
     • Less than 5% runtime overhead
     • Fully scalable
     • Run regularly – or in regression tests
     • Explicit and usable output

  7. Need to dive into the code?
     • Allinea Forge: a modern integrated environment for HPC developers
       ‒ Rebranding of Allinea Unified (Allinea DDT + Allinea MAP)
     • Supporting the lifecycle of application development and improvement
       ‒ Productively debug code with Allinea DDT
       ‒ Enhance application performance with Allinea MAP
     • Designed for productivity
       ‒ Consistent, easy-to-use tools
       ‒ Fewer failed jobs
     • Available to you

  8. Allinea Forge: One Unified Solution
     • Use Allinea MAP to find a bottleneck
       – Increasing memory usage? Memory leak!
       – Workload imbalance? Possible partitioner bug!
     • Flick to Allinea DDT
       – Common interface and settings files
       – Observe and debug your code step by step

  9. Allinea MAP: Performance made easy
     • Low overhead measurement
       – Accurate, non-intrusive application performance profiling
       – Seamless: no recompilation or relinking required
     • Easy to use
       – Source code viewer pinpoints bottleneck locations
       – Zoom in to explore iterations, functions and loops
     • Deep
       – Measures CPU, communication, I/O and memory to identify problem causes
       – Identifies vectorization and cache performance

  10. Allinea DDT helps to understand
      Workflow: Run with Allinea tools, Identify a problem, Gather info (Who, Where, How, Why), Fix
      • Who had a rogue behaviour?
        ‒ Merges stacks from processes and threads
      • Where did it happen?
        ‒ Allinea DDT leaps to source automatically
      • How did it happen?
        ‒ Detailed error message given to the user
        ‒ Some faults evident instantly from source
      • Why did it happen?
        ‒ Unique “Smart Highlighting”
        ‒ Sparklines comparing data across processes

  11. Latest Changes

  12. Reverse connect: the end of template files

  13. Horizontal & vertical zoom

  14. Energy Metrics: Quantify gains immediately
      (Figure: energy metrics shown for a GPU run and a CPU run)

  15. Allinea Forge and Performance Reports 5.1
      • DDT
        – Reverse connect (no more queue configuration)
        – ARM v8 and Power8 support
      • MAP
        – Energy metrics
        – Start/stop sampling by time
        – Zoomable metrics
        – Stdout/stderr in .map files
        – Limited ARMv8 support
      • Performance Reports
        – Energy metrics
        – Limited ARMv8 support
      • Hardware (system + CPU) provides energy-related data (currently: IPMI-based power sensors)
      • API extracts this data and feeds Allinea’s tools (currently: Intel Energy Checker SDK)
      • Allinea’s tools process data at runtime to bring a unique perspective

  16. Profile and Optimize with Allinea Forge

  17. The quest for the Holy Performance
      Code optimisation can be time-consuming. Efficient tools can help you focus on the most important bottlenecks.

  18. Getting Started with profiling on Salomon
      • Load the environment
        $ module load iimpi/5.5.0
        $ module load Forge/5.1-43967
      • Prepare the code for profiling
        $ mpiicc -g -O3 myapp.c -o myapp.exe
      • Modify job script to prefix the mpirun command
        map --profile mpirun myapp.exe
      • Submit job
        $ qsub myjob.sub
      • View result
        $ map myapp_Xp_Yt_YYYY-MM-DD-HH-MM.map

  19. Hands-on Exercise
      Matrix Multiplication: C = A x B + C
      (Figure: matrices A, B and C of dimension size, with A and C cut into nslices = 4 slices; i, j, k: loop indexes)
      Algorithm
      1- Master initializes matrices A, B & C
      2- Master slices the matrices A & C, sends them to slaves
      3- Master and Slaves perform the multiplication
      4- Slaves send their results back to Master
      5- Master writes the result Matrix C in an output file
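
The algorithm above can be sketched without MPI, as a rough illustration only. This is not the workshop's source code: the function names `init_matrices`, `multiply_slice` and `multiply_sliced`, the initial values and the row-wise slicing are assumptions made for the example, and each slice simply stands in for the work one process would do. The send and receive steps (2 and 4) and the file output (5) are left out.

```c
#include <assert.h>

/* Step 1: fill A, B and C with simple, checkable values.
 * The values are an illustrative choice, not taken from the exercise. */
void init_matrices(int size, double *A, double *B, double *C)
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            A[i * size + j] = (double)(i + j);
            B[i * size + j] = (i == j) ? 1.0 : 0.0; /* identity matrix */
            C[i * size + j] = 1.0;
        }
    }
}

/* Step 3: compute C = A x B + C for rows [row0, row0 + nrows) only.
 * i, j, k are the loop indexes named on the slide. */
void multiply_slice(int size, int row0, int nrows,
                    const double *A, const double *B, double *C)
{
    for (int i = row0; i < row0 + nrows; i++) {
        for (int j = 0; j < size; j++) {
            double sum = C[i * size + j];
            for (int k = 0; k < size; k++)
                sum += A[i * size + k] * B[k * size + j];
            C[i * size + j] = sum;
        }
    }
}

/* Steps 2 to 4 without communication: A and C are cut into nslices
 * row blocks (assumed layout), and every block is multiplied in turn. */
void multiply_sliced(int size, int nslices,
                     const double *A, const double *B, double *C)
{
    assert(nslices > 0 && size % nslices == 0);
    int rows_per_slice = size / nslices;
    for (int s = 0; s < nslices; s++)
        multiply_slice(size, s * rows_per_slice, rows_per_slice, A, B, C);
}
```

With B set to the identity and C to ones, calling `init_matrices(8, A, B, C)` followed by `multiply_sliced(8, 4, A, B, C)` leaves `C[i][j]` equal to `i + j + 1`, which makes the result easy to check by eye.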

  20. Profiling the application
      Exercise objectives:
      – Load Allinea Forge environment
      – Compile a code for Allinea MAP
      – Submit the job through the queue
      – Discover Allinea MAP interface and features
      – Optimize a simple code
      Content
      – Handout with step by step instructions
      – Source code in C and F90 + Makefile
      – Submission script
      Tutorial archive on Salomon in: /scratch/temp/flebeau/allinea_workshop.tar.gz

  21. Resolving Bugs with Allinea Forge

  22. Debugging by Discipline
      Debugging a problem is much easier when you can:
      • Make and undo changes fearlessly
        - Use source control (CVS, …)
      • Track what you’ve tried so far
        - Write logbooks
      • Reproduce bugs with a single command
        - Create and use test scripts

  23. Debugging by Magic Any technology sufficiently advanced is indistinguishable from magic. Unpredictable, dangerous, irresistible.

  24. Learn your spells
      Debugging a problem is much easier if you know debuggers
      • Prepare the code
        $ mpiicc -O0 -g myapp.c -o myapp
      • Start Allinea DDT in interactive mode
        $ ddt mpirun -n 8 ./myapp arg1 arg2
      • Start Allinea DDT in offline mode
        $ ddt --offline report.html mpirun -n 8 ./myapp arg1 arg2

  25. Debugging by Inspiration
      Look at the problem, see the solution. Trust your instincts. Check whether they are right.

  26. Debugging by Inspiration
      Debugging a problem is much easier if you are inspired:
      • Search your inspiration sources
        - Check your past logbooks
        - Explain the problem to a rubber duck
      • Test your instincts
        - Create tests (tracepoints, watchpoints, conditional breakpoints…)
      • Observe what the debugger is telling you
        - Analyse what the debugger communicates
        - Retrieve information from the debugger (advanced magic)

  27. Debugging by Inspiration
      • Memory errors can be obvious (segfaults…)
      • Sometimes they are not
      • The Allinea DDT memory debugging tool enables automatic error detection
        - By activating the dmalloc library
        - By adding guard pages
        - On the host as well as on the Xeon Phi
      • Different levels of detection bring different debugger behaviour
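
To illustrate why a memory error can go unnoticed, here is a minimal sketch of a guard-byte check. It is a simpler, related technique and not how Allinea DDT or dmalloc work internally: guard pages rely on memory protection, whereas this example only places a known pattern after the buffer and checks it later. The names `guarded_alloc`, `guard_intact`, `GUARD_SIZE` and `GUARD_BYTE` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16   /* hypothetical number of guard bytes */
#define GUARD_BYTE 0xAB /* hypothetical fill pattern */

/* Allocate n usable bytes followed by GUARD_SIZE bytes set to GUARD_BYTE.
 * Returns NULL if the allocation fails. Release the block with free(). */
unsigned char *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_SIZE);
    if (p != NULL)
        memset(p + n, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Return 1 if the guard bytes after the n usable bytes are untouched,
 * 0 if something wrote past the end of the usable area. */
int guard_intact(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (p[n + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

An off-by-one write such as `buf[8] = 0` on a buffer from `guarded_alloc(8)` does not crash, because it still lands inside the allocation, but `guard_intact(buf, 8)` then returns 0. A check like this only reports the damage after the fact; a debugger with guard pages can instead stop at the faulting write itself.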

  28. Getting Started on Salomon
      • Load the environment
        $ module load iimpi/5.5.0
        $ module load Forge/5.1-43967
      • Prepare the code for debugging
        $ mpiicc -g -O0 myapp.c -o myapp.exe
      • Modify job script to prefix the mpirun command for reverse connect
        ddt --connect mpirun myapp.exe
      • Launch Allinea DDT in the background on the login node (or use the remote client on your laptop)
        $ ddt &
      • Submit job
        $ qsub myjob.sub
      • When the job runs, it automatically connects to the GUI

  29. Exercise 2: Working on the Optimized code
      Exercise objectives:
      – Compile a code for Allinea Forge
      – Discover underlying bugs with Allinea MAP
      – Use Allinea DDT to debug issues
      Content
      – Handout with step by step instructions
      – Source code in C and F90 + Makefile
      – Submission script

  30. Allinea Tools and Intel Xeon Phi Coprocessors
      From Xeon to Xeon Phi

  31. Determine the right candidates for Intel Xeon Phi
      • Best applications for Intel Xeon Phi*
        – Scalable to over 100 threads
        – Heavy use of vectorization
        – Heavy use of memory bandwidth
        *Source: https://software.intel.com/en-us/articles/is-the-intel-xeon-phi-coprocessor-right-for-me
      • Scientific approach: make a decision based on facts
        – How to retrieve relevant metrics to identify appropriate applications?
        – How to benchmark and analyze lots of applications?
        – How to speed up the benchmarking process to focus on the migration itself?
      Allinea Performance Reports can help.

  32. Painless and quick benchmarks on Intel Xeon
      • Extremely simple to start
      • No source code needed
      • Fully scalable, very low overhead
      • Contains the relevant metrics
      • Helps make informed decisions

  33. Getting Started on Salomon
      • Load the environment
        $ module load PerformanceReports/5.1-43967
      • Modify job script to prefix the mpirun command
        perf-report mpirun -n 8 myapp.exe
      • Submit job
        $ qsub myjob.sub
      • View result
        $ firefox myapp_Xp_Yt_YYYY-MM-DD-HH-MM.html
        $ cat myapp_Xp_Yt_YYYY-MM-DD-HH-MM.txt
