263-2810: Advanced Compiler Design
Compilation with dynamic information
Thomas R. Gross
Computer Science Department, ETH Zurich, Switzerland
Outline
§ Dynamic information: information obtained at runtime
  § During program execution
§ Why use dynamic information?
  § Guidance for optimizations
  § Starting point for speculation
Outline
7.1 Review of compiler models
  § Structure to obtain and use dynamic information
7.2 Techniques to obtain dynamic information
7.3 Classification of dynamic information
  § What is a “profile”?
7.4 Types of profiles
7.5 Practical issues
7.6 Optimization based on dynamic information
7.1 Compiler model
[Diagram: Src → Frontend → IR → Backend → a.out → Execution Platform (OS/Core)]
§ So far we used a simple model
  § “Ahead of time” compiler
§ Use of dynamic information requires re-compilation
Re-compilation
[Diagram: Src → Frontend → IR → Backend → a.out → Execution Platform (OS/Core)]
§ Compiler can (re)use dynamic information from earlier executions
§ Many questions
§ Off-line model preserved: compiler reads source plus historical information
Continuous compilation
[Diagram labels: Src, Frontend, IR, Continuous compiler, Backend, a.out, Execution Platform (OS/Core)]
§ No reason to limit compiler to a single recompilation
§ Continuous compilation: compiler works in parallel with program execution
§ Some issues remain
Just-in-time compilation
[Diagram labels: Src, Frontend, IR, Execution Platform (OS/Core), Backend, Just-in-time compiler]
§ Delay compilation to first execution
§ Compilation time matters
§ Combine interpretation and compilation
Issues
§ Unit of compilation
  § Basic block
  § Trace
  § Method
  § Package
§ Delay compilation to first execution
  § Or to N-th execution
  § Interpret first (N-1) executions
§ Use multi-tier compilation model
  § More than one “just-in-time” compiler
  § Optimizer: later stages of multi-tier compilation system
§ How to switch between interpreted and compiled code
§ Is there ever a need to de-optimize?
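A minimal sketch of the "interpret the first N-1 executions, compile on the N-th" policy, assuming the method is the unit of compilation. The class `TieredFunction`, its parameters, and the toy "interpreter" and "compiler" are hypothetical names introduced only for illustration; a real runtime would also have to handle switching while a method is active.

```python
class TieredFunction:
    """Interpret the first N-1 calls, switch to compiled code on the N-th (toy model)."""

    def __init__(self, interpret, compile_fn, threshold):
        self.interpret = interpret    # slow path: stands in for the interpreter
        self.compile_fn = compile_fn  # returns a fast callable: stands in for the JIT
        self.threshold = threshold    # N: the call that triggers compilation
        self.calls = 0
        self.compiled = None

    def __call__(self, *args):
        self.calls += 1
        if self.compiled is None and self.calls >= self.threshold:
            self.compiled = self.compile_fn()
        if self.compiled is not None:
            return self.compiled(*args)
        return self.interpret(*args)

    def deoptimize(self):
        """Discard the compiled code and start counting again."""
        self.compiled = None
        self.calls = 0


trace = []  # records which tier executed each call

def slow_square(x):
    trace.append("interpreted")
    return x * x

def make_fast_square():
    def fast_square(x):
        trace.append("compiled")
        return x * x
    return fast_square

square = TieredFunction(slow_square, make_fast_square, threshold=3)
results = [square(i) for i in range(4)]
```

With `threshold=3`, the first two calls run in the interpreter and the third and fourth run compiled code. `deoptimize()` models the simplest possible answer to the de-optimization question: throw the compiled code away and fall back to interpretation.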
7.2 Techniques to obtain dynamic information
§ Get compiler to collect information
  § Produces code to collect information
  § Instrument program
  § Precise (collect only what is requested)
§ Get platform (hardware, OS) to collect information
  § Modern processors contain monitoring/measurement units
Program instrumentation
§ Compiler adds operations to program
§ Can be done “early” during compilation
  § Add IR operations
  § Subject to optimization
  § But no guarantee that added operations can be removed
§ Issues
  § Overhead: extra operations take time to execute
  § Extra operations may perturb measurements
    § Cache hit rate
    § Register spill traffic
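The idea of adding counting operations can be sketched at source level. This is only an illustration: the counters are inserted by hand, whereas a compiler would add equivalent operations to the IR. The names `profile`, `count_calls`, and `collatz_steps` are hypothetical.

```python
from collections import Counter
import functools

profile = Counter()  # hypothetical profile store: event name -> count

def count_calls(fn):
    """Wrap fn so that every invocation increments a counter (the added operation)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        profile["call:" + fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper

@count_calls
def collatz_steps(n):
    steps = 0
    while n != 1:
        if n % 2 == 0:
            profile["branch:even"] += 1  # added counter for the taken branch
            n //= 2
        else:
            profile["branch:odd"] += 1   # added counter for the not-taken branch
            n = 3 * n + 1
        steps += 1
    return steps

steps_for_6 = collatz_steps(6)
```

Every counter update is an extra memory operation on the measured path, which is exactly the overhead and perturbation the slide warns about.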
Hardware-based monitoring
§ Program (processor) monitoring unit (PMU) collects data on the fly
  § Usually no noticeable overhead for collection
  § May have to store collected data
    § Overhead
  § Can be configured to monitor various aspects
§ PMUs use sampling
  § Select “event” to monitor
    § Execution of instruction (any instruction, control flow transfers, method invocation, …)
    § Cache miss
    § …
  § Select frequency of monitoring
§ Sampling allows software to tune the overhead
  § Overhead too high: increase sampling interval
  § Information not precise enough: shorten sampling interval
  § No guarantee that sampling provides meaningful data, but it almost always does
§ Issues
  § PMU may be processor-dependent
    § Different implementations of the same architecture use different PMUs
  § Often many restrictions on what can be observed at the same time
    § May have to execute program multiple times (for same input) to get all the data you need
  § Documentation is often incomplete (or incorrect)
    § PMUs don’t sell processors; performance and energy consumption do
  § Use may require special privileges
    § Must not influence other jobs on the same system
    § Must not leak information
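The trade-off between sampling interval and precision can be illustrated with a small simulation. This is not a PMU interface: the event stream, the function names, and the "every k-th event" rule are simplifying assumptions made for the sketch.

```python
from collections import Counter

def sample_events(events, interval):
    """Record every `interval`-th event, roughly as a counter-overflow sample would."""
    samples = Counter()
    for i, event in enumerate(events, start=1):
        if i % interval == 0:
            samples[event] += 1
    return samples

def estimate_counts(samples, interval):
    """Scale the sample counts back up to estimated event counts."""
    return {event: n * interval for event, n in samples.items()}

# Hypothetical event stream: method 'f' executes 9 times for every execution of 'g'.
events = (["f"] * 9 + ["g"]) * 1000
exact = Counter(events)

estimate = estimate_counts(sample_events(events, 7), 7)
aliased = sample_events(events, 10)  # interval equals the period of the stream
```

With an interval of 7 the estimate is close to the exact counts (9000 and 1000) at one seventh of the recording cost. With an interval of 10 every sample lands on `g`, a small instance of sampling that does not provide meaningful data.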
Instrumentation vs. PMU usage
§ PMU wins if it can get the information you need
  § Usually faster
  § Usually more precise
  § Often accurate enough
7.3 Classification of dynamic information
§ Brief discussion, not a complete treatment of all aspects
§ An attempt to sort out different dimensions
§ Independent of how the information is collected
  § Measurement by instrumentation
  § Measurement by processor/operating system/runtime system
Deterministic execution
§ Assume deterministic program execution
  § May need to capture/replay input and output (incl. network-based communication or message passing)
§ Design harness to capture environment
  § Packets arrive in the same order, with the same contents
  § Interrupts/signals arrive in the same order
  § I/O
§ Non-deterministic programs can be made deterministic for monitoring
  § Record random numbers generated and always replay this list
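Recording random numbers and replaying the list can be sketched as follows. The classes `RecordingRandom` and `ReplayRandom` and the toy program `monte_carlo_hits` are hypothetical; the sketch assumes the program draws all its randomness through one object.

```python
import random

class RecordingRandom:
    """Forwards to a real generator and logs every value handed out."""
    def __init__(self, source):
        self.source = source
        self.log = []

    def random(self):
        value = self.source.random()
        self.log.append(value)
        return value

class ReplayRandom:
    """Hands out the previously recorded values in the same order."""
    def __init__(self, log):
        self._values = iter(log)

    def random(self):
        return next(self._values)

def monte_carlo_hits(rng, trials):
    """Toy non-deterministic program: count draws below 0.5."""
    return sum(1 for _ in range(trials) if rng.random() < 0.5)

recorder = RecordingRandom(random.Random())  # unseeded: differs from run to run
first = monte_carlo_hits(recorder, 100)
second = monte_carlo_hits(ReplayRandom(recorder.log), 100)
```

The first run is non-deterministic, but every monitored re-run that uses the replay object sees exactly the same sequence, so `first == second`.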
7.3.1 What is measured/observed?
§ Program properties vs. platform (hardware) properties
§ Program properties
  § Do not change if program is executed on a different platform (for the same input)
  § Examples
    § # of times a method is invoked
    § # of times a branch is taken
    § Loop trip count
§ “Profile”: record of program properties
§ Platform (hardware) properties
  § Depend on specific system used for program execution
  § Results on platform X may be different from results on platform Y
  § Examples
    § Cache hit rate (may depend on cache size, replacement policy, mapping strategy)
    § TLB hit rate
    § Prefetching effectiveness
§ Hardware properties require use of PMU or simulator
  § Simulator may be slow
7.3.2 Granularity of information
§ “Fine-grained” vs. “coarse-grained”
§ Fine-grained: information about single resource/effect of individual operations
  § Examples:
    § Basic blocks
    § Instructions/operations
    § Resources like registers, individual variables, branch outcomes
§ Coarse-grained: summary information
  § For aggregation/collection
  § Examples:
    § Method properties
    § Package/application/library information
    § Summary information on cache behavior, …
    § Wall-clock timing
7.3.3 Discussion
4 questions/issues to think about
1. What kind of information do you need to capture?
§ Depends on the relationship between dynamic information (observed at runtime) and the compiler’s optimization/transformation
  § Connection not always obvious
§ Dynamic information must provide compiler guidance
§ Profiling or reporting of summary information tells you what happened
  § Time spent
  § Memory region accessed
§ Not clear what compiler can do to improve program
  § Some operations take time
  § … even if implementation is efficient
2. What accuracy is needed?
§ What cost is acceptable?
§ How much perturbation can be tolerated?
  § See Question 1
§ Are observations repeatable? To what extent?
§ All measurements (“live” observations) must deal with cost of collection, overhead, measurement errors, perturbation, …
  § Devise measurement strategy
3. How can we bridge the gap between what is collected and what is needed by the compiler?
§ Compiler works with data structures (and/or abstractions); instrumentation or PMUs work with addresses
§ Example: map virtual (or physical) addresses to symbol table that is used by the compiler
  § Objects may be moved by garbage collector
  § Must constantly update map of addresses and (application program) objects
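Mapping a sampled address back to a compiler-level name can be sketched with a sorted table and binary search. The symbol table contents, the tuple layout, and the function `resolve` are assumptions made for this illustration, not the format of any particular tool.

```python
import bisect

# Hypothetical symbol table: (start address, size in bytes, name), sorted by start.
symbols = [
    (0x1000, 0x40, "main"),
    (0x1040, 0x100, "parse"),
    (0x1200, 0x80, "emit"),
]
starts = [start for start, _, _ in symbols]

def resolve(address):
    """Return the symbol whose address range contains `address`, or None."""
    i = bisect.bisect_right(starts, address) - 1  # last symbol starting at or before address
    if i < 0:
        return None
    start, size, name = symbols[i]
    return name if address < start + size else None

sampled = [0x1000, 0x1044, 0x1180, 0x127F, 0x0FFF]
resolved = [resolve(a) for a in sampled]
```

Addresses that fall in a gap between symbols (here `0x1180`) or before the first symbol resolve to `None`. If a garbage collector moves objects, a table like this must be rebuilt or updated before later samples are resolved.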
4. Is the information obtained stable?
§ Stability: similar setups provide similar results
§ Variations on this theme: how much does the information depend on the input data?
  § Record decisions on dynamic method resolution for input A
  § Same information for input B
  § Is there (any) overlap?
§ Variation: how does scaling the size of the input set influence the information obtained?
  § Example: execution time for quadratic vs. exponential algorithms
  § Small input sets may mislead due to constant factors
  § “Algorithmic profiling”
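One way to put a number on "is there any overlap?" is to compare the sets of hottest methods observed for two inputs. The overlap measure used here (Jaccard index of the top-k sets) and the invocation counts are assumptions chosen for illustration; the slides do not prescribe a specific metric.

```python
def hot_set(profile, k):
    """Names of the k most frequently invoked methods (ties broken by name)."""
    ranked = sorted(profile, key=lambda name: (-profile[name], name))
    return set(ranked[:k])

def overlap(profile_a, profile_b, k):
    """Jaccard index of the two top-k sets: 1.0 means identical, 0.0 means disjoint."""
    a, b = hot_set(profile_a, k), hot_set(profile_b, k)
    return len(a & b) / len(a | b)

# Hypothetical invocation counts for two different inputs (non-empty by assumption).
input_a = {"parse": 900, "lookup": 700, "emit": 50, "log": 5}
input_b = {"parse": 850, "emit": 600, "lookup": 40, "log": 8}

stability = overlap(input_a, input_b, 2)
```

Here only `parse` is hot for both inputs, so the top-2 overlap is one third. An optimization driven by input A's profile would favor `lookup`, which input B hardly uses.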
§ Profiles are of most interest to the compiler
§ Hardware designers should deal with processor properties
  § Helps if the hardware designers understand compilers
Return on investment
§ The cost of obtaining and processing dynamic information must be recovered
  § Speedup of execution
  § Otherwise why bother?
§ May be difficult if
  § Program executed rarely
  § Dynamic information not stable
  § Information gathered may be unconnected to compiler optimization
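The recovery condition can be written as a simple break-even inequality. All symbols are introduced here for illustration and do not appear in the slides: $C_{\text{collect}}$ is the cost of gathering the information, $C_{\text{compile}}$ the cost of processing it and recompiling, $t_{\text{base}}$ and $t_{\text{opt}}$ the execution times per run before and after optimization, and $N$ the number of runs of the optimized program.

```latex
C_{\text{collect}} + C_{\text{compile}} \;\le\; N \,\bigl(t_{\text{base}} - t_{\text{opt}}\bigr)
\quad\Longleftrightarrow\quad
N \;\ge\; \frac{C_{\text{collect}} + C_{\text{compile}}}{t_{\text{base}} - t_{\text{opt}}},
\qquad t_{\text{base}} > t_{\text{opt}}
```

As a hypothetical example, if collection and recompilation cost 30 s in total and each run becomes 0.5 s faster, the investment pays off only after 60 runs. A rarely executed program never reaches that point, and an unstable profile shrinks $t_{\text{base}} - t_{\text{opt}}$.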
7.4 Types of profiles
§ A wide variety of events can be observed
7.4.1 Method invocation
§ Frequency of method invocation
  § Which methods are invoked / functions called?
  § Breakdown as function of all methods invoked/functions called
  § Absolute counts
§ Sampling usually works well
  § Some inaccuracy can be tolerated
  § Compiler often needs a ranking (from often invoked to never invoked)
§ Good tool support
  § gprof – ancient but still relevant on Unix/Linux systems
  § vtune – powerful tool for IA32, connects to managed runtimes
§ Variations
  § Time spent in a method
    § Time spent in the body of method f()
  § Time spent in a method plus (including) time spent in called methods
    § f() → g() → h()
    § Time in g and h is included in time for f
  § Time spent in a method in a given context
    § Time spent in method when invoked at call site 1, call site 2, …
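The difference between time spent in the body of a method and time that includes called methods can be sketched on the f() → g() → h() chain. The call tree representation and the timings are hypothetical, and the sketch assumes each method appears only once in the tree (no recursion, one call site each).

```python
# Hypothetical call tree for f() -> g() -> h(): (name, self time in ms, callees).
call_tree = ("f", 5, [("g", 3, [("h", 2, [])])])

def inclusive_time(node):
    """Time in the method body plus the time of everything it calls."""
    _, self_time, callees = node
    return self_time + sum(inclusive_time(callee) for callee in callees)

def report(node, out=None):
    """Collect self and inclusive time per method name."""
    out = {} if out is None else out
    name, self_time, callees = node
    out[name] = {"self": self_time, "inclusive": inclusive_time(node)}
    for callee in callees:
        report(callee, out)
    return out

times = report(call_tree)
```

Here `f` has 5 ms of self time but 10 ms of inclusive time, because the time in `g` and `h` is charged to it as well. Context-sensitive timing would key the result by call site instead of by method name.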
§ Method information useful, but often compiler needs finer-grained information
  § Profiles that look inside a method