
Control-Flow Profiles; Code Motion Using Control Flow Profiles (PowerPoint presentation)



CS553 Lecture Profile-Guided Optimizations

Profile-Guided Optimizations
Recall
– Instruction scheduling
– List scheduling
– Register renaming
– Loop unrolling
– Software pipelining
– Alias analysis
  – How can we use alias analysis for instruction scheduling?
  – What causes conservative results?
Today
– More instruction scheduling
– Profiling
– Trace scheduling

Motivation for Profiling
Limitations of static analysis
– Compilers can analyze possible paths but must behave conservatively
– Frequency information cannot be obtained through static analysis
How runtime information helps
– Control flow information
  (Figure: a branch on c whose two outcomes occur 10% and 90% of the time.)
  Optimize the more frequent path (perhaps at the expense of the less frequent path)
– Memory conflicts
      st r1,0(r5)
      ld r2,0(r4)
  If r5 and r4 always have different values, we can move the load above the store

Profile-Guided Optimizations
Basic idea
– Instrument and run the program on sample inputs to get its likely runtime behavior
– Can use this information to improve instruction scheduling
– Many other uses
  – Code placement
  – Inlining
  – Value speculation
  – Branch prediction
  – Class-based optimization (static method lookup)

Profiling Issues
Profile data
– Collected over the whole program run
– May not be useful (unbiased branches)
– May not reflect all runs
– May be expensive and inconvenient to gather
  – Continuous profiling [Anderson 97]
– May interfere with the program
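The memory-conflict bullet can be checked mechanically. The Python sketch below is an illustration, not code from the lecture: it models registers and memory as dictionaries, encodes the two instructions as hypothetical `(op, reg, base)` tuples with an implicit offset of 0, and compares the value loaded into r2 with and without hoisting the load above the store.

```python
def run(ops, regs, mem):
    """Execute ('st', src, base) / ('ld', dst, base) tuples with offset 0."""
    regs, mem = dict(regs), dict(mem)  # work on copies of the machine state
    for op, reg, base in ops:
        addr = regs[base]
        if op == "st":
            mem[addr] = regs[reg]          # st reg,0(base)
        else:
            regs[reg] = mem.get(addr, 0)   # ld reg,0(base)
    return regs


ORIGINAL = [("st", "r1", "r5"), ("ld", "r2", "r4")]
HOISTED = [("ld", "r2", "r4"), ("st", "r1", "r5")]  # load moved above store


def hoist_is_safe(r4, r5):
    """True if hoisting the load leaves r2 unchanged for these addresses."""
    regs = {"r1": 99, "r2": 0, "r4": r4, "r5": r5}
    mem = {r4: 7, r5: 7}  # both addresses initially hold 7
    return run(ORIGINAL, regs, mem)["r2"] == run(HOISTED, regs, mem)["r2"]
```

`hoist_is_safe(100, 200)` returns True and `hoist_is_safe(100, 100)` returns False, which is why the compiler needs either a proof or a profile suggesting that r4 and r5 differ before it reorders the pair.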

Control-Flow Profiles
Commonly gather two types of information
– Execution frequencies of basic blocks
– Branch frequencies of conditional branches
– Represent information in a weighted flow graph
  (Figure: weighted flow graph over blocks 1–4, annotated with execution frequencies and branch frequencies; the values shown include 100, 70, and 30.)
Instrumentation
– Insert instrumentation code at basic block entrances and before each branch
– Take the average of values from multiple training runs
– Fairly inexpensive

Code Motion Using Control Flow Profiles
Code motion across basic blocks
– Increased scheduling freedom
Move code above a join (A and B both flow into the block that contains s1)
– If we want to move s1 to A, we must move s1 to both A and B
Move code below a split (A, which contains s1, branches to B and C)
– If we want to move s1 to B, we must move s1 to both B and C

Code Motion Using Control Flow Profiles (cont)
Code motion across basic blocks
– Increased scheduling freedom
Move code below a join
  (Figure: A and B both flow into C; s1 moves down from A, and C is tail-duplicated into C and C′.)
– Tail duplication prevents the path B → C from seeing s1
Move code above a split (A branches to B, which contains s1, and to C)
– If we want to move s1 from B to A and s1 would destroy a value along the A → C path, do renaming
– What if s1 might cause an exception?

Memory-Dependence Profiles
Gather information about memory conflicts
– Frequencies of address matches between pairs of loads and stores
– Attempts to answer the question: Are two references independent of one another?
– Concentrate on ambiguous reference pairs (those that the compiler cannot figure out)
      st1: store r5
      ld2: load r4
      profile entry: (st1, ld2, 7)
  If this number is low, we can speculatively assume that st1 and ld2 do not conflict
Instrumentation
– Much more expensive (in both space and time) to gather than control-flow information
– First perform control-flow profiling
– Apply only to the most frequently executed blocks
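A hand-instrumented toy program makes the control-flow profiling recipe concrete: one counter at each basic-block entrance, one counter per branch outcome recorded just before the branch, and an average over several training runs. The function `profiled_abs_sum`, its block names, and the training inputs are hypothetical; a real compiler would insert such counters automatically.

```python
from collections import Counter


def profiled_abs_sum(xs, block_counts, branch_counts):
    """Toy program, instrumented by hand with hypothetical counter names."""
    block_counts["entry"] += 1               # block entrance
    total = 0
    for x in xs:
        block_counts["loop_body"] += 1       # block entrance
        taken = x < 0
        branch_counts[("x<0", taken)] += 1   # recorded just before the branch
        if taken:
            block_counts["negate"] += 1
            total -= x
        else:
            block_counts["keep"] += 1
            total += x
    block_counts["exit"] += 1
    return total


def average_profile(training_inputs):
    """Run each training input, then average every counter over the runs."""
    blocks, branches = Counter(), Counter()
    for xs in training_inputs:
        profiled_abs_sum(xs, blocks, branches)
    n = len(training_inputs)
    return ({b: c / n for b, c in blocks.items()},
            {e: c / n for e, c in branches.items()})
```

For example, `average_profile([[1, 2, -3, 4], [5, -6, 7, 8, 9], [10]])` reports the loop body executing 10/3 times per run on average and the `x<0` branch taken on 2 of the 10 iterations, so the `keep` side is the path worth optimizing.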

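The memory-dependence profiling slide can be sketched in the same style. This Python is an illustration under explicit assumptions: it walks a hypothetical dynamic trace of `(instr_id, kind, address)` events, counts a conflict when a load reads the address its paired store most recently wrote, and reports `(store, load, matches, executions)` tuples, which extend the slide's `(st1, ld2, 7)` entry with an execution count so that a rate can be computed. The 5% threshold in `can_speculate` is arbitrary.

```python
from collections import Counter


def memory_dependence_profile(events, pairs):
    """Count address matches for ambiguous (store, load) pairs.

    events: dynamic trace of (instr_id, kind, address), kind "st" or "ld".
    pairs:  (store_id, load_id) pairs the compiler could not disambiguate.
    Assumed definition: a match is a load reading the address most
    recently written by its paired store.
    """
    stores_for_load = {}
    for st, ld in pairs:
        stores_for_load.setdefault(ld, []).append(st)

    last_store_addr = {}
    matches, execs = Counter(), Counter()
    for instr, kind, addr in events:
        if kind == "st":
            last_store_addr[instr] = addr
        else:
            for st in stores_for_load.get(instr, []):
                execs[(st, instr)] += 1
                if last_store_addr.get(st) == addr:
                    matches[(st, instr)] += 1
    return [(st, ld, matches[(st, ld)], execs[(st, ld)]) for st, ld in pairs]


def can_speculate(match_count, exec_count, threshold=0.05):
    """Speculate that the pair does not conflict only if matches are rare."""
    return exec_count > 0 and match_count / exec_count <= threshold
```

If ld2 hits st1's address on only 1 of 100 executions, the profile entry is `("st1", "ld2", 1, 100)` and `can_speculate(1, 100)` allows the load to be hoisted, provided repair code is inserted for the rare conflicting case.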
Trace Scheduling [Fisher 81] and [Ellis 85]
Basic idea
– We want large blocks to create large scheduling windows, but basic blocks are small because branches are frequent
– Create superblocks to increase the scheduling window
– Use profile information to create good superblocks
– Optimize each superblock independently
Superblocks
– A sequence of basic blocks with a single entrance and multiple exits
  (Figure: blocks 1–4, with one path through them marked as a superblock.)
Goals
– Want large superblocks
– Want to avoid early exits
– Want blocks that match actual execution paths

Trace Scheduling (example)
Original code:
      b[i] = "old"
      a[i] = ...
      if (a[i]>0) then
        b[i] = "new";
      else
        stmt X
        stmt Y
      endif
      c[i] = ...
Trace:
      b[i] = "old"
      a[i] = ...
      b[i] = "new";
      c[i] = ...
      if (a[i]<=0) then goto repair
    continue:
      ...
    repair:
      restore old b[i]
      stmt X
      stmt Y
      recalculate c[i]?
      goto continue

Trace Scheduling (cont)
Three steps
1. Create superblocks
2. Enlarge superblocks
3. Compact (optimize) superblocks
1. Superblock formation
– Create traces using the mutual-most-likely heuristic (two blocks A and B are mutual-most-likely if B is the most likely successor of A, and A is the most likely predecessor of B)
– A trace is a maximal sequence of mutual-most-likely blocks that does not contain a back edge
– Each block belongs to exactly one trace
  (Figure: flow graph over blocks A–E with edge weights 70, 30, and 10.)

Trace Scheduling (cont)
1. Superblock formation (cont)
– Convert traces into superblocks
– Use tail duplication to eliminate side entrances
  (Figure: left, a trace through blocks A–E; right, the corresponding superblock, in which E is duplicated as E′. Edge weights shown include 70, 30, and 10.)
– Tail duplication increases code size
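The mutual-most-likely heuristic translates fairly directly into code. This Python sketch assumes the profile is given as a dictionary of edge weights and that back edges have already been identified; ties are broken by dictionary order, which is an arbitrary choice, and all names are hypothetical.

```python
def form_traces(blocks, edges, back_edges=frozenset()):
    """Partition blocks into traces with the mutual-most-likely heuristic.

    blocks:     block names in a fixed order (used to pick trace starts).
    edges:      dict mapping (src, dst) -> profiled edge weight.
    back_edges: edges a trace must not contain.
    """
    best_succ, best_pred = {}, {}
    for (src, dst), w in edges.items():
        if (src, dst) in back_edges:
            continue
        if src not in best_succ or w > edges[(src, best_succ[src])]:
            best_succ[src] = dst
        if dst not in best_pred or w > edges[(best_pred[dst], dst)]:
            best_pred[dst] = src

    # A -> B is mutual-most-likely if B is A's most likely successor
    # and A is B's most likely predecessor.
    mml_next = {a: b for a, b in best_succ.items() if best_pred.get(b) == a}
    has_mml_pred = set(mml_next.values())

    traces, placed = [], set()

    def grow(start):
        trace = [start]
        placed.add(start)
        while trace[-1] in mml_next and mml_next[trace[-1]] not in placed:
            trace.append(mml_next[trace[-1]])
            placed.add(trace[-1])
        traces.append(trace)

    for b in blocks:  # start traces where no mutual-most-likely edge enters
        if b not in placed and b not in has_mml_pred:
            grow(b)
    for b in blocks:  # leftovers (only possible on an unbroken cycle)
        if b not in placed:
            grow(b)
    return traces
```

With edges A→B 70, A→C 30, B→D 60, B→E 10, C→D 30, and D→E 90, the traces are `[A, B, D, E]` and `[C]`: C→D is not mutual-most-likely because D's most likely predecessor is B.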

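Tail duplication can likewise be sketched on an unweighted edge set. In this hypothetical Python helper, a side entrance is any edge into a non-head trace block from somewhere other than its trace predecessor; everything from the first side-entered block to the end of the trace is copied (copies get a trailing `'`), and the side entrances are redirected to the copies. Splitting the profile counts between originals and copies is omitted.

```python
def tail_duplicate(trace, edges):
    """Turn a trace into a superblock by tail duplication.

    trace: block names along the trace, head first.
    edges: set of (src, dst) control-flow edges (weights omitted).
    Returns a new edge set; the copy of block X is named X + "'".
    """
    pos = {b: i for i, b in enumerate(trace)}

    def is_side_entrance(src, dst):
        # Entering a non-head trace block from anywhere but its trace predecessor.
        return dst in pos and pos[dst] > 0 and src != trace[pos[dst] - 1]

    side = {(s, d) for s, d in edges if is_side_entrance(s, d)}
    if not side:
        return set(edges)  # already a superblock
    first = min(pos[d] for _, d in side)  # first block with a side entrance
    tail = set(trace[first:])

    def dup(b):
        return b + "'"

    new_edges = set()
    for src, dst in edges:
        # Side entrances are redirected into the duplicated tail.
        new_edges.add((src, dup(dst)) if (src, dst) in side else (src, dst))
        if src in tail:
            # Each copy keeps its original's successors, staying inside the
            # duplicated tail whenever the successor is itself a tail block.
            new_edges.add((dup(src), dup(dst) if dst in tail else dst))
    return new_edges
```

For the trace `[A, B, E]` with edges A→B, A→C, B→E, C→E, and E→F, the side entrance C→E becomes C→E′ and the copy E′ gets its own edge to F; that extra block is the code-size cost the slide mentions.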
Trace Scheduling (cont)
2. Superblock enlargement
– Enlarge superblocks that are too small
– Code expansion can hurt i-cache performance
Three techniques for enlargement
– Branch target expansion
  – If the last branch in a superblock is likely to jump to the start of another superblock, append the contents of the target superblock to the first superblock
– Loop peeling
– Loop unrolling
  – These last two techniques apply to superblock loops, which are superblocks whose last blocks are likely to jump to their first blocks
  – Assume that each loop body has a single dominant path

Trace Scheduling (cont)
3. Optimizations
– Perform list scheduling for each superblock
– Memory-dependence profiles can be used to speculatively assume that load/store pairs do not conflict
  – Insert repair code in case the assumption is incorrect
– Software pipelining

Enhancements to Profile-Guided Code Scheduling
Path profiling [Ball and Larus 96]
– Collect information about entire paths instead of about individual edges
  (Figure: an edge profile compared with two path profiles; every count shown is 50.)
– Limit paths to some specified length (can thus handle loops)
– Can also stop paths at back edges
– Disadvantages of path profiling?

Speculation based on memory-dependence profiles (example)
Original code:
      b[i] = "old"
      a[i] = ...
      if (a[i]>0) then
        b[i] = "new";
      else
        stmt X
        stmt Y
      endif
      c[i] = a[j]
Trace:
      b[i] = "old"
      c[i] = a[j]
      a[i] = ...
      b[i] = "new";
      if (i==j) then goto deprepair
      if (a[i]<=0) then goto repair
    continue:
      ...
    deprepair:
      c[i] = a[i]
      if (a[i]<=0) then goto repair
      goto continue
    repair:
      restore old b[i]
      stmt X
      stmt Y
      goto continue
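The two versions of the speculation example can be compared by rendering them as Python functions, with the gotos replaced by structured `if` statements. This is a sketch under stated assumptions: `value` stands for whatever `a[i] = ...` computes, and `stmt X` and `stmt Y` are modelled as appends to a log, which assumes they do not touch `a`, `b`, or `c` (the slide's repair code does not recompute `c[i]` either).

```python
def original(a, b, c, i, j, value, log):
    b[i] = "old"
    a[i] = value
    if a[i] > 0:
        b[i] = "new"
    else:
        log.append("stmt X")
        log.append("stmt Y")
    c[i] = a[j]


def speculated(a, b, c, i, j, value, log):
    b[i] = "old"
    c[i] = a[j]          # load hoisted above the store to a[i]
    a[i] = value
    b[i] = "new"         # frequent (then) path executed unconditionally
    if i == j:           # deprepair: the hoisted load read a stale a[i]
        c[i] = a[i]
    if a[i] <= 0:        # repair: the infrequent (else) path was needed
        b[i] = "old"     # restore old b[i]
        log.append("stmt X")
        log.append("stmt Y")


def outcome(fn, i, j, value):
    """Run one version on fresh arrays and return the resulting state."""
    a, b, c, log = [1, 2, 3], [None] * 3, [0] * 3, []
    fn(a, b, c, i, j, value, log)
    return a, b, c, log
```

Comparing `outcome(original, ...)` with `outcome(speculated, ...)` for every combination of `i`, `j`, and the sign of `value` shows that the optimistic trace, together with `deprepair` and `repair`, leaves the arrays and the log exactly as the original code does.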

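Ball and Larus make path profiling cheap by numbering paths so that each acyclic path's id is the sum of values placed on its edges. The Python sketch below implements that numbering for an acyclic CFG (loops are assumed to have been cut at back edges, as the slide suggests) and adds a hypothetical `edge_profile` helper that collapses a path profile into edge counts, illustrating why edge profiles lose information.

```python
from collections import Counter


def ball_larus_numbering(succs, entry):
    """Assign Ball-Larus edge values on an acyclic CFG.

    succs: dict block -> list of successors (blocks with none are exits).
    Returns (num_paths, edge_val); summing edge_val along any entry-to-exit
    path yields a unique id in range(num_paths[entry]).
    """
    num_paths, edge_val = {}, {}

    def visit(v):
        if v not in num_paths:
            total = 0
            for w in succs.get(v, []):
                edge_val[(v, w)] = total  # paths through earlier successors
                total += visit(w)
            num_paths[v] = total or 1     # an exit block ends exactly one path
        return num_paths[v]

    visit(entry)
    return num_paths, edge_val


def path_id(path, edge_val):
    """Sum the edge values along a path given as a sequence of blocks."""
    return sum(edge_val[e] for e in zip(path, path[1:]))


def edge_profile(path_counts):
    """Collapse a path profile into the edge profile it implies."""
    edges = Counter()
    for path, n in path_counts.items():
        for e in zip(path, path[1:]):
            edges[e] += n
    return edges
```

On the two-diamond CFG A→{B,C}→D→{E,F}→G there are four paths, numbered 0 through 3. The path profiles {ABDEG: 50, ACDFG: 50} and {ABDFG: 50, ACDEG: 50} are different, yet both give every edge a count of 50, so only a path profile can tell them apart.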
Lessons
Larger scope helps
– How can we increase scope?
– Trace scheduling
  – Uses profile information
  – Looks at scopes beyond basic blocks
Static information is limited
– Use profiles
– How else can profiles be used in optimization?
– Can we do these kinds of optimizations at runtime?

Concepts
Instruction scheduling
– How do we schedule across control dependences?
Miscellany
– Path profiling
