
Compiler-assisted Performance Analysis, Adam Nemet, Apple


  1. Compiler-assisted Performance Analysis. Adam Nemet, Apple, anemet@apple.com

  2. [Diagram labels: Hotspot; User; Bottleneck; Compiler; Optimization X, Y]

  3. [Diagram labels: Hotspot; Hotspot; Legality; User; Compiler; Bottleneck; Cost Model; Compiler; Optimization X, Y; Some Optimizations?]

  5. [Same diagram as slide 3, plus the label: Disassemble]

  6. [Same diagram as slide 3, plus the label: -debug-only]

  7. [Same diagram as slide 3, plus the label: Optimization Diagnostics]

  8. Optimization Diagnostics in LLVM
      • Supported in LLVM
      • Only a small number of passes emit them
      • -Rpass options to enable them in the compiler output

      foo.c:8:5: remark: accumulate inlined into compute_sum [-Rpass=inline]
        accumulate(arr[i], sum);
        ^

  9. Optimization Diagnostics in LLVM (continued)
      • For large programs, the output of -Rpass is noisy and unstructured

  11. • Messages appear in no particular order
      • Remarks for hot and cold code are intermixed
      • Messages from successful and failed optimizations are dumped together
      How can we make this information accessible and actionable?

  12. Wish List
      • All in one place: Optimizations Dashboard
      • At a glance: See high-level interaction between optimizations for targeted low-level debugging
      • Filtering: Noise level should be minimized by focusing on the hot code
      • Integration: Display hot code and the optimizations side-by-side

  13. opt-viewer

  14. Approach
      • Extend existing optimization remark infrastructure
      • Add the new optimizations
      • Add ability to output remarks to a data file
      • Visualize data in HTML
      • Targeting compiler developers initially

  15. Example

  16. Work Flow
      $ clang -O3 -fsave-optimization-record -c foo.c
      $ utils/opt-viewer/opt-viewer.py foo.opt.yaml html
      $ open html/foo.c.html

  17. Successful Optimizations
      • Remarks appear inline under the referenced line
      • Further details about the optimization
      • Green for successful optimization
      • Name of the pass

  18. Successful Optimizations
      • Column aligned with the expression
      • HTML link to facilitate further analysis

  19. Successful Optimizations
      • Optimizations can expose interesting analyses
      • Remarks in white are Analysis remarks

  21. Missed Optimizations
      • Red means failed optimization
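
The color convention on the slides above (green for successful, red for failed, white for analysis) can be sketched as a small lookup. This is an illustration only, not opt-viewer's actual code: the names `KIND_TO_COLOR` and `color_for` are hypothetical, the kind strings assume the remark kinds are spelled `Passed`, `Missed`, and `Analysis`, and the white fallback for unknown kinds is an assumption.

```python
# Hypothetical mapping from remark kind to the display color named on the slides.
KIND_TO_COLOR = {
    "Passed": "green",    # successful optimization
    "Missed": "red",      # failed (missed) optimization
    "Analysis": "white",  # analysis remark
}


def color_for(kind):
    """Return the display color for a remark kind.

    Falling back to white for unknown kinds is an assumption of this sketch.
    """
    return KIND_TO_COLOR.get(kind, "white")


for kind in ("Passed", "Missed", "Analysis"):
    print(kind, "->", color_for(kind))
```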

  22. LLVM Changes [diagram legend: old, new]
      [Diagram labels: Pass pipeline (IR, Inliner, LoopVectorizer, IR); OptimizationRemarkEmitter]

      ORE.emit(OptimizationRemarkAnalysis("inline", "CanBeInlined", Call)
               << NV("Callee", Callee) << " can be inlined into "
               << NV("Caller", Caller)
               << " with cost=" << NV("Cost", IC.getCost())
               << " (threshold=" << NV("Threshold", Threshold) << ")");

      Output with -Rpass-analysis=inline:

      foo.c:8:5: remark: accumulate can be inlined into compute_sum with cost=-5 (threshold=487) [-Rpass-analysis=inline]
        accumulate(arr[i], sum);
        ^

  23. LLVM Changes [same diagram and code, plus the label: YAML]
      • -fsave-optimization-record enables source line debug info (-gline-tables-only)

  24. LLVM Changes [same diagram and code, plus the YAML record]

      --- !Analysis
      Pass: inline
      Name: CanBeInlined
      DebugLoc: { File: s.cc, Line: 8, Column: 5 }
      Function: compute_sum
      Args:
        - Callee: accumulate
          DebugLoc: { File: s.cc, Line: 1, Column: 0 }
        - String: ' can be inlined into '
        - Caller: compute_sum
          DebugLoc: { File: s.cc, Line: 5, Column: 0 }
        - String: ' with cost='
        - Cost: '-5'
        - String: ' (threshold='
        - Threshold: '487'
        - String: ')'
      ...
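
The record above stores the message as a list of `Args`, alternating literal strings and named values. Below is a minimal sketch of how a consumer might reassemble the human-readable message from that list. It assumes the record has already been loaded into plain Python dictionaries; the `remark` literal, the `Kind` key, and the function name `render_message` are hypothetical and are not taken from opt-viewer.

```python
# Hypothetical in-memory form of the record above. "Kind" stands in for the
# YAML tag (!Analysis); that key name is an assumption of this sketch.
remark = {
    "Kind": "Analysis",
    "Pass": "inline",
    "Name": "CanBeInlined",
    "DebugLoc": {"File": "s.cc", "Line": 8, "Column": 5},
    "Function": "compute_sum",
    "Args": [
        {"Callee": "accumulate",
         "DebugLoc": {"File": "s.cc", "Line": 1, "Column": 0}},
        {"String": " can be inlined into "},
        {"Caller": "compute_sum",
         "DebugLoc": {"File": "s.cc", "Line": 5, "Column": 0}},
        {"String": " with cost="},
        {"Cost": "-5"},
        {"String": " (threshold="},
        {"Threshold": "487"},
        {"String": ")"},
    ],
}


def render_message(remark):
    """Concatenate every argument value in order, skipping source locations."""
    parts = []
    for arg in remark["Args"]:
        for key, value in arg.items():
            if key != "DebugLoc":  # a location is metadata, not message text
                parts.append(str(value))
    return "".join(parts)


print(render_message(remark))
# accumulate can be inlined into compute_sum with cost=-5 (threshold=487)
```

Keeping the named values separate from the literal strings is what lets a tool link `Callee` and `Caller` to their own source locations while still printing the same text as `-Rpass-analysis`.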

  25. opt-viewer [diagram legend: old, new]
      [Diagram labels: YAML; utils/opt-viewer/opt-viewer.py; index.html; foo.o.html]

  27. Index
      • Noisy: Most of this code is not hot
      • Sort by hotness
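
Sorting the index by hotness could look like the sketch below. It assumes each remark is a dictionary with an optional numeric `Hotness` field (the deck shows such a field in its profile-annotated record); the sample remarks and the function name `sort_by_hotness` are hypothetical, and treating a missing value as 0 is an assumption of this sketch.

```python
# Hypothetical sample remarks; only the "Hotness" field matters here.
remarks = [
    {"Pass": "inline", "Function": "compute_sum", "Hotness": 3},
    {"Pass": "loop-vectorize", "Function": "hot_loop", "Hotness": 30},
    {"Pass": "gvn", "Function": "cold_path"},  # no profile data
    {"Pass": "licm", "Function": "warm_loop", "Hotness": 12},
]


def sort_by_hotness(remarks):
    """Return remarks ordered hottest first; remarks without hotness sort last."""
    return sorted(remarks, key=lambda r: r.get("Hotness", 0), reverse=True)


for r in sort_by_hotness(remarks):
    print(r.get("Hotness", 0), r["Pass"], r["Function"])
```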

  28. Use PGO for Hotness [diagram legend: old, new]
      [Diagram labels: Pass pipeline (IR, Inliner, LoopVectorizer, IR); OptimizationRemarkEmitter; LazyBlockFrequencyInfo; BlockFrequencyInfo; YAML]

      --- !Analysis
      Pass: inline
      Name: CanBeInlined
      DebugLoc: { File: s.cc, Line: 8, Column: 5 }
      Function: compute_sum
      Hotness: 3
      Args:
        - Callee: accumulate
          DebugLoc: { File: s.cc, Line: 1, Column: 0 }
        - String: ' can be inlined into '
        - Caller: compute_sum
          DebugLoc: { File: s.cc, Line: 5, Column: 0 }
        - String: ' with cost='
        - Cost: '-5'
        - String: ' (threshold='
        - Threshold: '487'
        - String: ')'
      ...

  29. Hotness
      • Relative to maximum hotness, NOT a percentage of total time
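
As a worked illustration of "relative to maximum hotness", the sketch below scales each remark's hotness against the hottest remark rather than against the sum of all counts. The sample values and the function name `relative_hotness` are hypothetical, and this is not presented as opt-viewer's own computation.

```python
# Hypothetical hotness values per function.
samples = [
    {"Function": "compute_sum", "Hotness": 3},
    {"Function": "hot_loop", "Hotness": 30},
    {"Function": "warm_loop", "Hotness": 12},
]


def relative_hotness(remarks):
    """Map each function to its hotness as a percentage of the maximum hotness."""
    max_hotness = max((r.get("Hotness", 0) for r in remarks), default=0)
    result = {}
    for r in remarks:
        if max_hotness == 0:
            result[r["Function"]] = 0.0  # no profile data at all
        else:
            result[r["Function"]] = 100.0 * r.get("Hotness", 0) / max_hotness
    return result


print(relative_hotness(samples))
# {'compute_sum': 10.0, 'hot_loop': 100.0, 'warm_loop': 40.0}
```

Under this scaling the hottest remark always reads 100%, so the percentages do not sum to 100 and should not be read as shares of total run time.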

  30. Optimizations Recorded
      • LICM
      • Function Inliner
      • GVN
      • Loop Vectorizer
      • Loop Idiom
      • Loop Unroller
      • Loop Deletion
      • LoopDataPrefetch
      • SLP Vectorizer
      • … more to follow

  31. Test Drive on LLVM test suite

  32. Improve & Evaluate
      1. Does the information presented in this high-level view contain sufficient detail to reconstruct what happened?
      2. Can we discover the interactions between optimizations?
      3. With the improved visibility, can we quickly find real performance opportunities?

  33. DhryStone (SingleSource/Benchmark): Interaction of Optimizations

  34. DhryStone: Inlining Context

  35-42. DhryStone [slides with no extractable text]

  43. DhryStone: Summary
      • Without low-level debugging, quickly reconstructed what happened
      • Even though it involved interaction between multiple optimizations
        • Inlining and Alias Analysis/GVN
      • Missed optimizations: Extra analysis to manage false positives
        1. Filter trivially false positives
        2. Expose enough information for quick detection by user

  44. Freebench/distray (MultiSource/Benchmarks): Finding Performance Opportunity

  45. Not modified via LinP, maybe writes through other pointers

  46. Not modified via LinD, maybe writes through other pointers

  47. Reads and writes don’t alias

  48. Reads and writes don’t alias. Loop versioning with array overlap checks?

  49. LICM-based LoopVersioning (-enable-loop-versioning-licm)

  50. LICM-based LoopVersioning (-enable-loop-versioning-licm)
      • Performance opportunity if we can improve this pass

  51. LICM-based LoopVersioning (-enable-loop-versioning-licm)
      • Approximate the opportunity by manually modifying the source

  52. Dynamic Instruction Count: reduced by 11%

  53. Dynamic Instruction Count: reduced by 11%. Performance headroom: 11%

  54. Freebench/distray: Summary
      • Found optimization opportunity while staying in the high-level view
      • Reconstructed the reason for missed optimization
      • High-level view exposed that the gain may be substantial
      • Got immediate feedback of the desired effect on the prototype
      • Identified the pass for low-level debugging

  55. Check Out More Examples: http://lab.llvm.org:8080/artifacts/opt-view_test-suite

  56. Development Timeline
      [Timeline labels: Initial version on LLVM trunk; Now; Compiler Developer Tool; New tools using Optimization Records; Code Author Tool]

  57. Compiler Developer Tool: Status
      • Written in Python
      • Hook up new passes
      • Improve diagnostics quality for existing passes
      • Perform extra analysis for insightful messages
      • Improve UI

  58. Compiler Developer Tool: Status [same list, annotated: Request for Help]
