  1. Automated Analysis of Time Series Data to Understand Parallel Program Behaviors
     Lai Wei, John Mellor-Crummey
     Rice University, Houston, TX, USA

  2. Background
     Parallel computers of increasing scale
     ◦ Support scientific simulations of increasing ambition
     Performance of many applications fails to scale accordingly
     ◦ Load imbalance, serialization, network congestion, etc.
     Performance tools to understand application behaviors
     ◦ Measure and present performance data
     ◦ Used by experts to manually identify performance inefficiencies

  3. Profile
     Breaks down application run time into sources of costs

     Calling context    P0    P1    P2    P3
     main()             9s    9s    9s    9s
       init()           1s    1s    1s    1s
       solve()          8s    8s    8s    8s
         compute()      4s    4.1s  3.9s  4s
         sync()         4s    3.9s  4.1s  4s

     Performance loss, why?
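The numbers in the profile table already hint at the problem. A minimal Python sketch, using only the values shown on the slide (the variable names are illustrative), computes how much of solve() each process spends in sync() and how little compute() varies across processes:

```python
# Per-process times (seconds) copied from the profile table on the slide.
profile = {
    "solve()":   [8.0, 8.0, 8.0, 8.0],
    "compute()": [4.0, 4.1, 3.9, 4.0],
    "sync()":    [4.0, 3.9, 4.1, 4.0],
}

# Fraction of solve() time that each process spends in sync().
sync_fraction = [s / t for s, t in zip(profile["sync()"], profile["solve()"])]

# Spread of compute() time across processes, relative to the mean.
compute = profile["compute()"]
mean_compute = sum(compute) / len(compute)
relative_spread = (max(compute) - min(compute)) / mean_compute

for rank, frac in enumerate(sync_fraction):
    print(f"P{rank}: {frac:.1%} of solve() spent in sync()")
print(f"compute() spread across processes: {relative_spread:.1%}")
```

Roughly half of solve() goes to sync() on every process, yet the compute() totals differ by only about 5%, so the profile shows that time is lost without showing why.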

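  6. Time series
     Presents application behavior over time
     [Figure: timelines of processes P0-P3 at call path depth 1, time axis marked at 1s, 3s, 5s, 7s, 9s, showing init() and solve(). Calling context: main() calls init() and solve(); solve() calls compute() and sync().]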

  7. Time series
     Presents application behavior over time
     [Figure: timelines of processes P0-P3 at call path depth 2, time axis marked at 1s, 3s, 5s, 7s, 9s, showing init(), compute(), and sync(). Calling context: main() calls init() and solve(); solve() calls compute() and sync().]

  9. Time series
     Presents application behavior over time
     [Figure: timelines of processes P0-P3 at call path depth 2, time axis marked at 1s, 3s, 5s, 7s, 9s, showing init(), compute(), and sync(). Annotation: Load imbalance.]
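To illustrate how a time series can expose load imbalance that a profile averages away, here is a small Python sketch. The per-iteration compute() times are hypothetical, chosen so that every process totals 4s (close to the profile table). The sketch also assumes that sync() behaves like a barrier, so each process waits for the slowest one in every iteration; the slides do not state this.

```python
# Hypothetical compute() time in seconds: rows are iterations, columns are P0..P3.
compute_times = [
    [1.9, 0.7, 0.7, 0.7],
    [0.7, 1.9, 0.7, 0.7],
    [0.7, 0.7, 1.9, 0.7],
    [0.7, 0.7, 0.7, 1.9],
]
num_procs = len(compute_times[0])

# Profile view: total compute() time per process hides the imbalance.
totals = [sum(row[p] for row in compute_times) for p in range(num_procs)]

# Time-series view: in each iteration every process waits for the slowest one
# (assumption: sync() acts as a barrier).
wait = [0.0] * num_procs
for row in compute_times:
    slowest = max(row)
    for p, t in enumerate(row):
        wait[p] += slowest - t

print("total compute per process:", [round(t, 2) for t in totals])
print("total wait per process:   ", [round(w, 2) for w in wait])
```

Every process totals 4s of compute(), yet each also waits about 3.6s, because a different process is the straggler in each iteration. Only the per-iteration view makes that visible.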

  12. Motivation
      Experts manually examine time series
      ◦ Understand how and why performance inefficiencies arise
      Time series of large scale parallel executions
      ◦ Vast in three dimensions
        ◦ Process
        ◦ Time
        ◦ Call path depth
      ◦ Manual analysis is difficult if not impractical

  13. Related work -- automated analysis
      Analysis of profiles [Huck, SC’05] [Tallent, SC’10]
      ◦ Often insufficient for diagnosing how and why parallel inefficiencies arise
      Analysis of execution traces
      ◦ Collecting instrumentation-based traces is costly in time and space
      ◦ Fine-grained traces explode at large scale
      ◦ Analysis at coarse granularity [Gonzalez, IPDPS’09] [Llort, IPDPS’10]
        ◦ Still needs lots of manual effort
      ◦ Analysis at fine granularity for short intervals [Geimer, CCPE’10] [Böhme, TOPC’16]
        ◦ Requires prior knowledge for selective tracing

  14. Our contribution
      Automated analysis of sample-based time-series data
      ◦ Feasible for large-scale programs
      ◦ Data volume is manageable
      ◦ Derive compact top-down summaries
      ◦ Uncover patterns and variance
      ◦ Direct attention to potential performance losses
      ◦ Attribute losses to code regions where they originate

  15. Approach
      1. Collect and prepare sample-based time series for further analysis
         ◦ Collect a time series of call paths with HPCToolkit
         ◦ Organize each time series as a tree of program calling contexts
         ◦ Identify iterative behaviors in the time series
      2. Build clusters across threads and loop iterations
      3. Quantify performance losses and attribute them to call paths
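As a rough illustration of step 2, the Python sketch below groups threads whose sampled behavior looks alike. It is not the authors' algorithm: a simple greedy threshold clustering stands in for whatever method the tool uses, and the sample data, the call path labels, the `threshold` value, and all function names are hypothetical.

```python
from collections import Counter

def signature(samples):
    """Fraction of samples spent in each call path for one thread."""
    counts = Counter(samples)
    total = len(samples)
    return {path: n / total for path, n in counts.items()}

def distance(sig_a, sig_b):
    """L1 distance between two signatures (0 = identical, 2 = disjoint)."""
    paths = set(sig_a) | set(sig_b)
    return sum(abs(sig_a.get(p, 0.0) - sig_b.get(p, 0.0)) for p in paths)

def greedy_cluster(signatures, threshold=0.2):
    """Assign each thread to the first cluster whose representative is close."""
    clusters = []  # list of (representative signature, [thread ids])
    for tid, sig in signatures.items():
        for rep, members in clusters:
            if distance(rep, sig) <= threshold:
                members.append(tid)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append((sig, [tid]))
    return [members for _, members in clusters]

# Hypothetical samples: each entry is the call path seen at one sample time.
threads = {
    0: ["solve/compute"] * 8 + ["solve/sync"] * 2,
    1: ["solve/compute"] * 8 + ["solve/sync"] * 2,
    2: ["solve/compute"] * 3 + ["solve/sync"] * 7,
    3: ["solve/compute"] * 3 + ["solve/sync"] * 7,
}
signatures = {tid: signature(s) for tid, s in threads.items()}
clusters = greedy_cluster(signatures)
print(clusters)
```

With this made-up input, threads 0 and 1 (mostly computing) land in one cluster and threads 2 and 3 (mostly waiting) in another, which is the kind of compact summary the approach aims for.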

  16. Collect call path samples over time
      [Figure, built up across slides 16-38: a 15-line source listing (contents not recoverable) beside the call paths captured at sample times T1 to T5.]

      Sample   Depth 0   Depth 1   Depth 2   Depth 3
      T1       main()    foo@13    C@5
      T2       main()    foo@13    loop@6    A@7
      T3       main()    foo@13    loop@6    B@8
      T4       main()    foo@13    loop@6    A@7
      T5       main()    foo@13    loop@6    A@7
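One way to picture how the call path samples T1 to T5 on this slide could be organized as a tree of calling contexts is sketched below in Python. The sample data is read from the final build of the slide; the nested-dict layout and the function names are illustrative assumptions, not HPCToolkit's data structures.

```python
# Call path samples T1..T5 as shown on the final build of the slide.
samples = [
    ("main()", "foo@13", "C@5"),
    ("main()", "foo@13", "loop@6", "A@7"),
    ("main()", "foo@13", "loop@6", "B@8"),
    ("main()", "foo@13", "loop@6", "A@7"),
    ("main()", "foo@13", "loop@6", "A@7"),
]

def build_tree(samples):
    """Merge call paths into a nested dict; each node records sample indices."""
    root = {"children": {}, "samples": []}
    for index, path in enumerate(samples, start=1):
        node = root
        for frame in path:
            node = node["children"].setdefault(
                frame, {"children": {}, "samples": []}
            )
            node["samples"].append(index)
    return root

def render(node, depth=0, lines=None):
    """Return an indented listing of the tree with per-node sample lists."""
    if lines is None:
        lines = []
    for frame, child in node["children"].items():
        labels = ", ".join(f"T{i}" for i in child["samples"])
        lines.append(f"{'  ' * depth}{frame}: {labels}")
        render(child, depth + 1, lines)
    return lines

tree = build_tree(samples)
print("\n".join(render(tree)))
```

Shared prefixes such as main() and foo@13 are stored once, and each node lists the sample times at which it was on the stack, so the time dimension is preserved alongside the calling context.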
