Compiler-assisted Performance Analysis
Adam Nemet, Apple
anemet@apple.com
[Figure: the user has a hotspot with a bottleneck and asks whether compiler optimizations X, Y apply ("Some optimizations?"). The compiler's answer depends on legality and its cost model. Ways to find out: disassemble, -debug-only, optimization diagnostics.]
Optimization Diagnostics in LLVM
• Supported in LLVM
• Only a small number of passes emit them
• -Rpass options enable them in the compiler output:
    foo.c:8:5: remark: accumulate inlined into compute_sum [-Rpass=inline]
        accumulate(arr[i], sum);
        ^
• For large programs, the output of -Rpass is noisy and unstructured
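For orientation, here is a minimal, hypothetical reconstruction of the kind of code this remark points at. Only the names accumulate, compute_sum, arr and sum come from the slide; the types and signatures are assumptions (the YAML record shown later names the file s.cc, so this sketch assumes C++ with sum passed by reference).

```cpp
// Hypothetical source behind the remark above. Only the names accumulate,
// compute_sum, arr and sum come from the slide; types and signatures are assumed.

// Tiny helper: a typical candidate for inlining.
static void accumulate(int value, int &sum) { sum += value; }

int compute_sum(const int *arr, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    accumulate(arr[i], sum);  // the call site the remark refers to
  return sum;
}
```

Compiling a file like this with -O3 and -Rpass=inline should report the inlining decision for this call; the exact wording and cost values vary between compiler versions.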
• Messages appear in no particular order
• Remarks for hot and cold code are intermixed
• Messages from successful and failed optimizations are dumped together
How can we make this information accessible and actionable?
Wish List
• All in one place: Optimizations Dashboard
• At a glance: see the high-level interaction between optimizations, for targeted low-level debugging
• Filtering: minimize noise by focusing on the hot code
• Integration: display the hot code and the optimizations side by side
opt-viewer
Approach
• Extend the existing optimization remark infrastructure
• Hook up more optimization passes
• Add the ability to output remarks to a data file
• Visualize the data in HTML
• Target compiler developers initially
Example
Work Flow
    $ clang -O3 -fsave-optimization-record -c foo.c
    $ utils/opt-viewer/opt-viewer.py foo.opt.yaml html
    $ open html/foo.c.html
Successful Optimizations
• Remarks appear inline under the referenced line
• Green for successful optimization
• Name of the pass
• Further details about the optimization
• Column aligned with the expression
• HTML link to facilitate further analysis
• Optimizations can expose interesting analyses: remarks in white are Analysis remarks
Missed Optimizations
• Red means failed optimization
LLVM Changes
[Figure: pass pipeline, IR → Inliner, LoopVectorizer → IR; passes report remarks through the new OptimizationRemarkEmitter.]
Example from the inliner:
    ORE.emit(OptimizationRemarkAnalysis("inline", "CanBeInlined", Call)
             << NV("Callee", Callee) << " can be inlined into "
             << NV("Caller", Caller) << " with cost=" << NV("Cost", IC.getCost())
             << " (threshold=" << NV("Threshold", Threshold) << ")");
With -Rpass-analysis=inline, the remark is printed as a diagnostic:
    foo.c:8:5: remark: accumulate can be inlined into compute_sum with cost=-5 (threshold=487) [-Rpass-analysis=inline]
        accumulate(arr[i], sum);
        ^
With -fsave-optimization-record, the same remark is written to a YAML file. The flag also enables source line debug info (-gline-tables-only):
    --- !Analysis
    Pass:            inline
    Name:            CanBeInlined
    DebugLoc:        { File: s.cc, Line: 8, Column: 5 }
    Function:        compute_sum
    Args:
      - Callee:          accumulate
        DebugLoc:        { File: s.cc, Line: 1, Column: 0 }
      - String:          ' can be inlined into '
      - Caller:          compute_sum
        DebugLoc:        { File: s.cc, Line: 5, Column: 0 }
      - String:          ' with cost='
      - Cost:            '-5'
      - String:          ' (threshold='
      - Threshold:       '487'
      - String:          ')'
    ...
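The point of streaming named values is that a single emission serves both outputs: the -Rpass text is the concatenation of the argument values, while the YAML keeps each argument with its key. The sketch below is a simplified, hypothetical model of that idea; the Remark and NV types here are illustrative stand-ins, not LLVM's actual classes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical named value (LLVM's real NV also carries debug locations etc.).
struct NV {
  std::string key, value;
  NV(std::string k, std::string v) : key(std::move(k)), value(std::move(v)) {}
  NV(std::string k, int v) : key(std::move(k)), value(std::to_string(v)) {}
};

// Hypothetical, simplified remark: every streamed piece becomes an argument.
struct Remark {
  std::string pass, name;
  std::vector<NV> args;  // plain strings are stored under the key "String"

  Remark(std::string p, std::string n) : pass(std::move(p)), name(std::move(n)) {}

  Remark &operator<<(const char *s) {
    args.emplace_back("String", std::string(s));
    return *this;
  }
  Remark &operator<<(NV nv) {
    args.push_back(std::move(nv));
    return *this;
  }

  // Text form, as in -Rpass output: concatenate all argument values in order.
  std::string message() const {
    std::string out;
    for (const NV &a : args) out += a.value;
    return out;
  }

  // Structured form: (key, value) pairs in emission order, like the YAML Args.
  std::vector<std::pair<std::string, std::string>> yamlArgs() const {
    std::vector<std::pair<std::string, std::string>> out;
    for (const NV &a : args) out.emplace_back(a.key, a.value);
    return out;
  }
};
```

Streaming the inliner's pieces into such a Remark and calling message() reproduces the diagnostic text, while yamlArgs() yields the eight key/value entries of the Args list.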
opt-viewer
[Figure: YAML → utils/opt-viewer/opt-viewer.py (new) → index.html, foo.o.html]
Index
• Noisy: most of this code is not hot
• Sort by hotness
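Sorting by hotness moves the remarks that matter to the top of the index. A minimal sketch, assuming each YAML record has already been read into a hypothetical IndexEntry struct whose hotness is optional (records produced without profile data carry none):

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical index row built from one YAML remark record.
struct IndexEntry {
  std::string pass, name, function;
  std::optional<unsigned long long> hotness;  // absent without PGO data
};

// Order the index hottest-first; entries without hotness go last.
// stable_sort keeps the original order among equally hot entries.
void sortByHotness(std::vector<IndexEntry> &entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const IndexEntry &a, const IndexEntry &b) {
                     if (a.hotness && b.hotness) return *a.hotness > *b.hotness;
                     return a.hotness.has_value() && !b.hotness.has_value();
                   });
}
```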
Use PGO for Hotness
[Figure: pass pipeline, IR → Inliner, LoopVectorizer → IR; with profile data, the OptimizationRemarkEmitter uses LazyBlockFrequencyInfo/BlockFrequencyInfo to attach a Hotness value to each remark.]
    --- !Analysis
    Pass:            inline
    Name:            CanBeInlined
    DebugLoc:        { File: s.cc, Line: 8, Column: 5 }
    Function:        compute_sum
    Hotness:         3
    Args:
      - Callee:          accumulate
        DebugLoc:        { File: s.cc, Line: 1, Column: 0 }
      - String:          ' can be inlined into '
      - Caller:          compute_sum
        DebugLoc:        { File: s.cc, Line: 5, Column: 0 }
      - String:          ' with cost='
      - Cost:            '-5'
      - String:          ' (threshold='
      - Threshold:       '487'
      - String:          ')'
    ...
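Block frequency analysis stores relative frequencies; with a profile, these are scaled to execution counts using the function's profiled entry count, roughly count(BB) = entryCount × freq(BB) / freq(entry). The helper below is a simplified, hypothetical sketch of that scaling, not LLVM's code (LLVM performs it with wide integer arithmetic inside BlockFrequencyInfo).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper illustrating profile-count scaling:
//   count(BB) = entryCount * freq(BB) / freq(entry)
// The product is computed in 128 bits (unsigned __int128 is a GCC/Clang
// extension) so it cannot overflow; the result saturates at UINT64_MAX.
uint64_t blockProfileCount(uint64_t entryCount, uint64_t blockFreq,
                           uint64_t entryFreq) {
  if (entryFreq == 0) return 0;  // no frequency scale: treat as cold
  unsigned __int128 count =
      static_cast<unsigned __int128>(entryCount) * blockFreq / entryFreq;
  if (count > std::numeric_limits<uint64_t>::max())
    return std::numeric_limits<uint64_t>::max();
  return static_cast<uint64_t>(count);
}
```

For example, if a function is entered 100 times and a loop body's frequency is 8 times the entry block's, the body is estimated to run about 800 times.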
Hotness
• Relative to the maximum hotness, NOT a percentage of total time
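Concretely, the hottest remark is shown as 100% and every other remark is scaled against it. A minimal sketch of that normalization, using a hypothetical helper over plain hotness counts:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: express each remark's hotness as a percentage of the
// maximum hotness in the report (NOT of total execution time).
std::vector<double> relativeHotness(const std::vector<uint64_t> &hotness) {
  std::vector<double> percent(hotness.size(), 0.0);
  if (hotness.empty()) return percent;
  const uint64_t maxHot = *std::max_element(hotness.begin(), hotness.end());
  if (maxHot == 0) return percent;  // no profile signal at all
  for (std::size_t i = 0; i < hotness.size(); ++i)
    percent[i] = 100.0 * static_cast<double>(hotness[i]) /
                 static_cast<double>(maxHot);
  return percent;
}
```

Because the scale is relative, a remark at 50% ran half as often as the hottest one; it says nothing about what fraction of the total run time that code accounts for.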
Optimizations Recorded
• LICM
• Function Inliner
• GVN
• Loop Vectorizer
• Loop Idiom
• Loop Unroller
• Loop Deletion
• LoopDataPrefetch
• SLP Vectorizer
• … more to follow
Test Drive on the LLVM Test Suite
Improve & Evaluate
1. Does the information presented in this high-level view contain sufficient detail to reconstruct what happened?
2. Can we discover the interactions between optimizations?
3. With the improved visibility, can we quickly find real performance opportunities?
Dhrystone (SingleSource/Benchmarks): Interaction of Optimizations
Dhrystone: Inlining Context
Dhrystone: Summary
• Without low-level debugging, quickly reconstructed what happened
• Even though it involved interaction between multiple optimizations
  • Inlining and Alias Analysis/GVN
• Missed optimizations: extra analysis is needed to manage false positives
  1. Filter trivially false positives
  2. Expose enough information for quick detection by the user
Freebench/distray (MultiSource/Benchmarks): Finding a Performance Opportunity
• Not modified via LinP, but may be written through other pointers
• Not modified via LinD, but may be written through other pointers
• Reads and writes don't alias: loop versioning with array overlap checks?
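The question above can be illustrated with a small hypothetical kernel (scaleAll, not the actual distray code). A hand-versioned copy checks at run time whether the loaded location overlaps the written range; if it does not, the invariant value is loaded once before the loop, otherwise the original loop runs. This is the general shape of LICM-based loop versioning, sketched by hand under these assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel (not the distray source): the store to out[i] might
// write *scale, so without more information *scale must be reloaded on every
// iteration and cannot be hoisted out of the loop.
void scaleAll(float *out, const float *in, const float *scale, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i] * *scale;
}

// Hand-versioned form: a runtime overlap check selects a fast loop with the
// invariant load hoisted, or falls back to the original loop.
void scaleAllVersioned(float *out, const float *in, const float *scale,
                       std::size_t n) {
  if (n == 0) return;  // the original loop would not touch *scale either
  auto addr = [](const void *p) { return reinterpret_cast<std::uintptr_t>(p); };
  const std::uintptr_t lo = addr(out);
  const std::uintptr_t hi = addr(out + n);  // one past the last written float
  const std::uintptr_t s = addr(scale);
  // Does [s, s + sizeof(float)) overlap the written range [lo, hi)?
  const bool overlaps = s < hi && s + sizeof(float) > lo;
  if (!overlaps) {
    const float k = *scale;  // no store in this loop can change *scale
    for (std::size_t i = 0; i < n; ++i)
      out[i] = in[i] * k;
  } else {
    scaleAll(out, in, scale, n);  // original loop, reloads *scale each time
  }
}
```

The check costs a few comparisons per loop entry, which is why versioning pays off when the loop runs many iterations and the non-overlapping case is the common one.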
LICM-based LoopVersioning (-enable-loop-versioning-licm)
• Performance opportunity if we can improve this pass
• Approximate the opportunity by manually modifying the source
Dynamic Instruction Count
• Reduced by 11%
• Performance headroom: 11%
Freebench/distray: Summary
• Found an optimization opportunity while staying in the high-level view
• Reconstructed the reason for the missed optimization
• The high-level view exposed that the gain may be substantial
• Got immediate feedback on the desired effect with the prototype
• Identified the pass for low-level debugging
Check Out More Examples
http://lab.llvm.org:8080/artifacts/opt-view_test-suite
Development Timeline
[Figure: timeline from the initial version on LLVM trunk to now, followed by new tools using optimization records: a Compiler Developer Tool and a Code Author Tool.]
Compiler Developer Tool: Status (Request for Help)
• Written in Python
• Hook up new passes
• Improve diagnostics quality for existing passes
• Perform extra analysis for insightful messages
• Improve UI