All the things you need to know about Intel MPI Library



  1. All the things you need to know about Intel MPI Library
  Jerome Vienne (viennej@tacc.utexas.edu)
  Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX
  November 12th, 2016

  2. A Heterogeneous Environment
  MPI performance depends on many factors:
  ▶ CPUs (number of cores, cache sizes, frequency)
  ▶ Memory (amount, frequency)
  ▶ Network speed (10, 20, 40 … Gbit/s)
  ▶ Size of the job
  ▶ Type of code: hybrid (e.g., OpenMP+MPI) or pure MPI
  MPI libraries have to make choices:
  ▶ Why? Because the number of combinations is too large.
  ▶ Are these choices optimal for my application? Not necessarily.
  ▶ Can we change them? Yes, and that is why we are here.

  4. Aim of this talk
  ▶ "How to tune MPI" is not easily found in books.
  ▶ Show that MPI libraries are not black boxes.
  ▶ Describe concepts that are common to MPI libraries.
  ▶ Understand the differences between MPI libraries.
  ▶ Provide some useful commands for Intel MPI.
  ▶ Result: help you reduce the run time and memory footprint of your MPI application.

  5. Before we start: Warnings!!!
  ▶ The talk is based on Intel MPI (with a few references to MVAPICH2 and OpenMPI).
  ▶ All experiments were done on the Stampede supercomputer at TACC.
  ▶ Tuning options are specific to an MPI library! But the concepts are common.
  ▶ Options can have counter-effects!
  ▶ MPI libraries have lots of tuning options; we will only cover the most important ones.
  ▶ Tuning can be time-consuming, but in the long term it might be worth it.

  6. Plan
  • Basic Tuning
    ▶ The Choice of the Benchmark
    ▶ Profiling
    ▶ Hostfile
    ▶ Process Placement
    ▶ To conclude
  • Intermediate Tuning
    ▶ Inter-node Point-to-Point Optimization
    ▶ Intra-node Point-to-Point Optimization
    ▶ Collective Tuning
    ▶ To conclude
  • Conclusion

  9. The Choice of Benchmarks
  Different MPI libraries = tuning based on different benchmarks:
  ▶ Intel MPI: Intel MPI Benchmarks (IMB)
  ▶ MVAPICH2: OSU Micro-Benchmarks (OMB)
  IMB or OMB, which one is the best to use?
  ▶ Both are communication-intensive, without computation.
  ▶ It depends on your application.
  ▶ The best benchmark is your application!
  But… let's take a look at them in detail!

  12. Intel MPI Benchmarks (IMB)
  Details (IMB-MPI1):
  ▶ Originally known as the Pallas MPI Benchmarks (PMB)
  ▶ Supports point-to-point and collective operations
  ▶ One program with many options covering the classical MPI functions
  ▶ For collectives, the root changes after each iteration

  13. Intel MPI Benchmarks (IMB)
  [Figure: Intel MPI vs MVAPICH2 using IMB Bcast with 256 cores. Series: Mvapich2 2.2, Intel MPI 2017. Axes: Time (us) vs Message Size (Bytes), from 4 bytes to 4M.]

  14. OSU Micro-Benchmarks (OMB)
  Details:
  ▶ Very simple to use
  ▶ Supports point-to-point and collective operations
  ▶ Multiple programs with simple options
  ▶ Keeps the same root during all iterations and uses a barrier
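The two root policies described for IMB and OMB can be sketched as the sequence of roots a Bcast loop would use. This is a minimal Python sketch, not code taken from either benchmark: the rotation rule `iteration % nprocs` is an assumption standing in for IMB's actual rotation, and the helper names `imb_style_roots` and `omb_style_roots` are hypothetical.

```python
# Sketch: which rank acts as root at each iteration of a Bcast benchmark loop.
# Assumption: the rotating policy is modeled as iteration % nprocs; IMB's exact
# rotation rule may differ. The fixed policy models OMB keeping one root.
from typing import List


def imb_style_roots(iterations: int, nprocs: int) -> List[int]:
    """Root per iteration when the root rotates after each iteration (IMB-like)."""
    return [i % nprocs for i in range(iterations)]


def omb_style_roots(iterations: int, root: int = 0) -> List[int]:
    """Root per iteration when the same root is kept throughout (OMB-like).

    OMB also separates iterations with a barrier; that synchronization is
    not modeled here.
    """
    return [root] * iterations


if __name__ == "__main__":
    print("rotating:", imb_style_roots(6, 4))  # [0, 1, 2, 3, 0, 1]
    print("fixed:   ", omb_style_roots(6))     # [0, 0, 0, 0, 0, 0]
```

With a rotating root, every rank takes its turn as the root; with a fixed root plus a barrier, every iteration measures the same pattern starting from one rank. That difference alone can make the same library look different under the two benchmarks.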

  15. OSU Micro-Benchmarks (OMB)
  [Figure: Intel MPI vs MVAPICH2 using OMB Bcast with 256 cores. Series: Mvapich2 2.2, Intel MPI 2017. Axes: Time (us) vs Message Size (Bytes), from 4 bytes to 1M.]

  16. OSU Micro-Benchmarks (OMB)
  [Figure: Tuned Intel MPI vs MVAPICH2 using OMB Bcast with 256 cores. Series: Mvapich2 2.2, Intel MPI 2017. Axes: Time (us) vs Message Size (Bytes), from 4 bytes to 1M.]

  17. Benchmarks: What you need to know
  To summarize:
  ▶ Don't trust them!
  ▶ They have different behaviors, so KNOW your benchmark!
  ▶ They don't necessarily give you the best results by default.
  ▶ If you want to compare two MPI libraries, make sure you tune things correctly.
  ▶ Collective tuning for a particular benchmark/application can be painful; we will see it later :)


  19. To know what you need to tune first
  Why is MPI profiling important?
  ▶ To identify which MPI functions are used, you have two choices:
    ▶ Look at the code
    ▶ Profile your application
  ▶ Profiling provides all the information regarding MPI communications (message sizes, time spent, functions called, etc.)
    ▶ It can be integrated into the MPI library (e.g., Intel MPI)
    ▶ Many tools can help you profile your application (TAU, Scalasca, IPM, mpiP …)
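At its core, a profile aggregates per-call records into per-function totals. The following minimal Python sketch shows that aggregation on hypothetical records; `CallRecord`, `summarize`, and the sample values are illustrative assumptions and do not mirror the output format of Intel MPI statistics, IPM, or mpiP.

```python
# Sketch: turning raw per-call MPI records into a per-function profile.
# Assumption: the (function, bytes, seconds) records are hypothetical; they do
# not reflect the output format of Intel MPI, IPM, or any other tool.
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class CallRecord(NamedTuple):
    function: str   # e.g. "MPI_Send"
    nbytes: int     # message size of this call
    seconds: float  # time spent inside the call


def summarize(records: Iterable[CallRecord]) -> Dict[str, Tuple[int, int, float]]:
    """Return {function: (calls, total_bytes, total_seconds)}, slowest first."""
    totals = defaultdict(lambda: [0, 0, 0.0])
    for r in records:
        t = totals[r.function]
        t[0] += 1           # number of calls
        t[1] += r.nbytes    # bytes moved
        t[2] += r.seconds   # time spent
    ordered = sorted(totals.items(), key=lambda kv: kv[1][2], reverse=True)
    return {name: tuple(vals) for name, vals in ordered}


if __name__ == "__main__":
    trace = [
        CallRecord("MPI_Allreduce", 8, 0.004),
        CallRecord("MPI_Send", 65536, 0.001),
        CallRecord("MPI_Allreduce", 8, 0.006),
        CallRecord("MPI_Recv", 65536, 0.002),
    ]
    for name, (calls, nbytes, secs) in summarize(trace).items():
        print(f"{name:15s} calls={calls} bytes={nbytes} time={secs:.3f}s")
```

Sorting by total time puts the function that dominates communication at the top, which is a reasonable first candidate for tuning.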

  20. How to profile?
  With Intel MPI, at runtime:
  mpiexec -genv I_MPI_STATS ipm -genv I_MPI_STATS_FILE myprofile.txt …
  Tools:
  ▶ MPI Performance Snapshot (MPS)
  ▶ Intel Trace Analyzer and Collector (ITAC)


  22. Impact of the hostfile
  ▶ The hostfile provides the list of nodes that will be used.
  ▶ Depending on the MPI library, the same hostfile can lead to different results!
  Example of command:
  mpirun -np 4 -hostfile host ./a.out
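To see how one hostfile can yield two different layouts, here is a minimal Python sketch of two hypothetical placement policies: round-robin (one rank per hostfile entry, cycling) and fill-first (fill a node before moving to the next). Neither is claimed to be the exact default of Intel MPI or MVAPICH2, and `cores_per_node=16` is an assumption.

```python
# Sketch: two ways a launcher might map ranks onto the hosts of a hostfile.
# Assumptions: both policies and cores_per_node=16 are illustrative; they are
# not the documented defaults of Intel MPI or MVAPICH2.
from collections import Counter
from typing import List


def round_robin(hosts: List[str], nprocs: int) -> List[str]:
    """One rank per hostfile entry, cycling through the list."""
    return [hosts[r % len(hosts)] for r in range(nprocs)]


def fill_first(hosts: List[str], nprocs: int, cores_per_node: int) -> List[str]:
    """Fill each host up to cores_per_node ranks before moving on."""
    return [hosts[(r // cores_per_node) % len(hosts)] for r in range(nprocs)]


if __name__ == "__main__":
    hosts = ["node1", "node2"]  # hypothetical hostfile contents
    print(Counter(round_robin(hosts, 4)))                      # 2 ranks per node
    print(Counter(fill_first(hosts, 4, cores_per_node=16)))    # 4 ranks on node1
```

With `-np 4` and a two-line hostfile, round-robin puts 2 ranks on each node, while fill-first puts all 4 ranks on node1. If each rank also runs 8 OpenMP threads, the fill-first layout would place 32 threads on a single 16-core node.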

  23. A Quick Performance Example
  NAS SP-MZ on Stampede: 2 nodes, 2 MPI tasks/node with 8 OpenMP threads
  mpirun -np 4 -hostfile host ./sp-mz.C.4   (hostfile host: node1, node2)
  ▶ MVAPICH2: Default: 176 sec. Correct hostfile: 176 sec. + Process placement: 19 sec.
  ▶ Intel MPI: Default: 51 sec. Correct hostfile/command: 19 sec.
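The numbers in this example can be sanity-checked with a little arithmetic. This sketch assumes 16 cores per Stampede node and pairs each library's default timing with its best timing as read from the slide; the `speedup` helper is hypothetical.

```python
# Sketch: resource and speedup arithmetic for the NAS SP-MZ example.
# Assumptions: 16 cores per Stampede node; timings (in seconds) as read
# from the slide, pairing each library's default run with its best run.

nodes, tasks_per_node, threads_per_task = 2, 2, 8
cores_per_node = 16  # assumption, not stated on the slide

threads = nodes * tasks_per_node * threads_per_task
cores = nodes * cores_per_node


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the slower run time to the faster one."""
    return before_s / after_s


print(f"threads={threads} cores={cores}")
print(f"Intel MPI: {speedup(51, 19):.1f}x")
print(f"MVAPICH2:  {speedup(176, 19):.1f}x")
```

2 nodes × 2 tasks × 8 threads gives 32 threads, i.e., one thread per core on two 16-core nodes when placement is right. Read this way, the slide's timings correspond to roughly 2.7x for Intel MPI (51 to 19 sec.) and 9.3x for MVAPICH2 (176 to 19 sec.).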
