Intermediate Code & Local Optimizations

Lecture Outline
• Intermediate code
• Local optimizations

Code Generation Summary
• We have so far discussed
  – Runtime organization
  – Simple stack machine code generation
  – Improvements to stack machine code generation
• Our compiler goes directly from the abstract syntax tree (AST) to assembly language
  – And does not perform optimizations
  – (optimization is the last compiler phase, which is by far the largest and most complex these days)

Why Intermediate Languages?
ISSUE: When to perform optimizations
  – On abstract syntax trees
    • Pro: Machine independent
    • Con: Too high level
  – On assembly language
    • Pro: Exposes most optimization opportunities
    • Con: Machine dependent
    • Con: Must re-implement optimizations when re-targeting
  – On an intermediate language
    • Pro: Exposes optimization opportunities
    • Pro: Machine independent
• Most real compilers use intermediate languages
Why Intermediate Languages? (Cont.)
• Have many front-ends into a single back-end
  – gcc can handle C, C++, Java, Fortran, Ada, ...
  – each front-end translates source to the same generic language (called GENERIC)
• Have many back-ends from a single front-end
  – Do most optimization on the intermediate representation before emitting code targeted at a single machine

Kinds of Intermediate Languages
High-level intermediate representations:
  – closer to the source language; e.g., syntax trees
  – easy to generate from the input program
  – code optimizations may not be straightforward
Low-level intermediate representations:
  – closer to the target machine; e.g., P-Code, U-Code (used in PA-RISC and MIPS), GCC's RTL, 3-address code
  – easy to generate code from
  – generation from the input program may require effort
"Mid"-level intermediate representations:
  – Java bytecode, Microsoft CIL, LLVM IR, ...

Intermediate Code Languages: Design Issues
• Designing a good ICode language is not trivial
• The set of operators in ICode must be rich enough to allow the implementation of source language operations
• ICode operations that are closely tied to a particular machine or architecture make retargeting harder
• A small set of operations
  – may lead to long instruction sequences for some source language constructs,
  – but on the other hand makes retargeting easier

Intermediate Languages
• Each compiler uses its own intermediate language
  – IL design is still an active area of research
• Nowadays, an intermediate language is usually a high-level assembly language
  – Uses register names, but has an unlimited number of them
  – Uses control structures like assembly language
  – Uses opcodes, but some are higher level
    • E.g., push translates to several assembly instructions (see the sketch below)
    • Most opcodes correspond directly to assembly opcodes
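The following is a small illustration (my sketch, not from the slides) of how one higher-level IL opcode can expand to several assembly instructions: on MIPS, a push of a register can be expanded to a store followed by a stack-pointer adjustment. The table-driven macro style and the exact instruction sequence are assumptions.

    # Hypothetical macro table mapping one IL opcode to several MIPS
    # instructions. The exact expansion is an assumption (full-descending
    # stack, 4-byte words).
    EXPANSIONS = {
        "push": lambda r: [f"sw {r}, 0($sp)",      # store the register on the stack
                           "addiu $sp, $sp, -4"],  # move the stack pointer down
    }

    print("\n".join(EXPANSIONS["push"]("$a0")))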
Architecture of gcc
[Figure: diagram of gcc's architecture; not reproduced in this text version.]

Three-Address Intermediate Code
• Each instruction is of the form
    x := y op z
  – y and z can be only registers or constants
  – Just like assembly
• Common form of intermediate code
• The expression x + y * z is translated as
    t1 := y * z
    t2 := x + t1
  – temporary names are made up for internal nodes
  – each sub-expression has a "home"

Generating Intermediate Code
• Similar to assembly code generation
• Major difference
  – Use any number of IL registers to hold intermediate results
• Example: if (x + 2 > 3 * (y - 1) + 42) then z := 0;
    t1 := x + 2
    t2 := y - 1
    t3 := 3 * t2
    t4 := t3 + 42
    if t1 <= t4 goto L
    z := 0
    L:

Generating Intermediate Code (Cont.)
• igen(e, t) is a function that generates code to compute the value of e in register t
• Example:
    igen(e1 + e2, t) =
      igen(e1, t1)      (t1 is a fresh register)
      igen(e2, t2)      (t2 is a fresh register)
      t := t1 + t2
• Unlimited number of registers ⇒ simple code generation
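The slides give igen as pseudocode; what follows is a minimal runnable Python sketch of the same scheme, assuming expressions are nested tuples such as ('+', 'x', ('*', 'y', 'z')). The tuple representation and helper names are assumptions, not part of the lecture.

    _counter = 0

    def fresh():
        # Unlimited supply of IL registers: simply invent a new name.
        global _counter
        _counter += 1
        return f"t{_counter}"

    def igen(e, t, code):
        # Emit instructions into `code` that leave the value of e in register t.
        if isinstance(e, (int, str)):        # constant or variable: copy it into t
            code.append(f"{t} := {e}")
        else:                                # compound expression (op, e1, e2)
            op, e1, e2 = e
            t1, t2 = fresh(), fresh()        # fresh registers, as in the slides
            igen(e1, t1, code)
            igen(e2, t2, code)
            code.append(f"{t} := {t1} {op} {t2}")

    code = []
    igen(('+', 'x', ('*', 'y', 'z')), fresh(), code)
    print("\n".join(code))                   # 3-address code for x + y * z

Because the number of registers is unlimited, no allocation decisions are needed at this stage; a production igen would also skip the redundant copies of plain variables into fresh registers.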
An Intermediate Language
    P → S P | ε
    S → id := id op id
      | id := op id
      | id := id
      | push id
      | id := pop
      | if id relop id goto L
      | L:
      | goto L
• id's are register names
• Constants can replace id's
• Typical operators: +, -, *

From 3-address code to machine code
This is almost a macro expansion process:

    3-address code        MIPS assembly code
    x := A[i]             load i into r1
                          la  r2, A
                          add r2, r2, r1
                          lw  r2, (r2)
                          sw  r2, x
    x := y + z            load y into r1
                          load z into r2
                          add r3, r1, r2
                          sw  r3, x
    if x >= y goto L      load x into r1
                          load y into r2
                          bge r1, r2, L

Basic Blocks
• A basic block is a maximal sequence of instructions with:
  – no labels (except at the first instruction), and
  – no jumps (except in the last instruction)
• Idea:
  – Cannot jump into a basic block (except at the beginning)
  – Cannot jump out of a basic block (except at the end)
  – Each instruction in a basic block is executed after all the preceding instructions have been executed

Basic Block Example
Consider the basic block
    (1) L:
    (2) t := 2 * x
    (3) w := t + x
    (4) if w > 0 goto L'
• There is no way for (3) to be executed without (2) having been executed right before
  – We can change (3) to w := 3 * x
  – Can we eliminate (2) as well? (See the sketch below.)
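To sketch an answer to that last question: yes, once (3) has been rewritten to w := 3 * x, instruction (2) is dead provided t is not live after the block. Below is a minimal backward liveness scan in Python; the (dest, used-variables) representation is my assumption, not from the slides.

    def eliminate_dead(block, live_out):
        # block: list of (dest, used) pairs; dest None marks a branch or other
        # instruction whose effect must be kept. live_out: variables live after
        # the block. Returns the block without dead assignments.
        live = set(live_out)
        kept = []
        for dest, used in reversed(block):   # scan backwards, tracking liveness
            if dest is None or dest in live:
                kept.append((dest, used))
                live.discard(dest)           # this definition kills dest...
                live.update(used)            # ...and uses its operands
        return list(reversed(kept))

    # The example block after rewriting (3) to w := 3 * x:
    block = [("t", ["x"]),     # (2) t := 2 * x
             ("w", ["x"]),     # (3) w := 3 * x
             (None, ["w"])]    # (4) if w > 0 goto L'
    print(eliminate_dead(block, live_out=[]))   # (2) disappears: t is dead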
Identifying Basic Blocks
• Determine the set of leaders, i.e., the first instruction of each basic block:
  – The first instruction of a function is a leader
  – Any instruction that is the target of a branch is a leader
  – Any instruction immediately following a (conditional or unconditional) branch is a leader
• For each leader, its basic block consists of itself and all instructions up to, but not including, the next leader (or the end of the function)

Control-Flow Graphs
A control-flow graph is a directed graph with
  – Basic blocks as nodes
  – An edge from block A to block B if execution can flow from the last instruction in A to the first instruction in B
    • E.g., the last instruction in A is goto LB, where LB labels the first instruction of B
    • E.g., execution can fall through from block A to block B
Frequently abbreviated as CFGs

Control-Flow Graphs: Example
• The body of a function (or procedure) can be represented as a control-flow graph
• There is one initial node
• All "return" nodes are terminal

Example body:
    x := 1
    i := 1
    L:
    x := x * x
    i := i + 1
    if i < 42 goto L

Constructing the Control Flow Graph
• Identify the basic blocks of the function
• There is a directed edge from block B1 to block B2 if
  – there is a (conditional or unconditional) jump from the last instruction of B1 to the first instruction of B2, or
  – B2 immediately follows B1 in the textual order of the program, and B1 does not end in an unconditional jump
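Here is a minimal Python sketch of both steps, exactly as described above: find the leaders, cut the instruction list into blocks, then add jump edges and fall-through edges. The dict-based instruction representation is an assumption, not from the slides.

    def find_leaders(instrs, label_at):
        # label_at maps a label name to the index of the instruction it precedes.
        leaders = {0}                                  # first instruction
        for i, ins in enumerate(instrs):
            if ins["op"] in ("goto", "if"):
                leaders.add(label_at[ins["target"]])   # branch target
                if i + 1 < len(instrs):
                    leaders.add(i + 1)                 # instruction after a branch
        return sorted(leaders)

    def build_cfg(instrs, label_at):
        leaders = find_leaders(instrs, label_at)
        bounds = list(zip(leaders, leaders[1:] + [len(instrs)]))
        blocks = [instrs[a:b] for a, b in bounds]
        block_of = {a: n for n, (a, _) in enumerate(bounds)}
        edges = set()
        for n, block in enumerate(blocks):
            last = block[-1]
            if last["op"] in ("goto", "if"):           # jump edge
                edges.add((n, block_of[label_at[last["target"]]]))
            if last["op"] != "goto" and n + 1 < len(blocks):
                edges.add((n, n + 1))                  # fall-through edge
        return blocks, edges

    # The loop example above: x := 1; i := 1; L: x := x * x;
    # i := i + 1; if i < 42 goto L
    instrs = [{"op": "assign"}, {"op": "assign"},
              {"op": "assign"}, {"op": "assign"},
              {"op": "if", "target": "L"}]
    blocks, edges = build_cfg(instrs, {"L": 2})
    print(len(blocks), sorted(edges))   # 2 blocks; edges (0, 1) and (1, 1)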
Optimization Overview
• Optimization seeks to improve a program's utilization of some resource
  – Execution time (most often)
  – Code size
  – Network messages sent
  – (Battery) power used, etc.
• Optimization should not alter what the program computes
  – The answer must still be the same
  – Observable behavior must be the same
    • this typically also includes termination behavior

A Classification of Optimizations
For languages like C there are three granularities of optimizations:
  (1) Local optimizations
      • Apply to a basic block in isolation
  (2) Global optimizations
      • Apply to a control-flow graph (function body) in isolation
  (3) Inter-procedural optimizations
      • Apply across method boundaries
Most compilers do (1), many do (2), and very few do (3)

Cost of Optimizations
• In practice, a conscious decision is made not to implement the fanciest optimization known
• Why?
  – Some optimizations are hard to implement
  – Some optimizations are costly in terms of compilation time
  – Some optimizations have low benefit
  – Many fancy optimizations are all three!
• Goal: maximum benefit for minimum cost

Local Optimizations
• The simplest form of optimizations
• No need to analyze the whole procedure body
  – Just the basic block in question
• Example: algebraic simplification (see the sketch below)
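As a taste of what algebraic simplification does, here is a minimal Python sketch (representation and choice of identities are mine, not from the slides) that rewrites single 3-address instructions using identities such as x + 0 = x and x * 1 = x:

    def simplify(dest, op, left, right):
        # Rewrite one instruction `dest := left op right` using algebraic
        # identities; return the instruction unchanged if none applies.
        if op == "+" and right == 0: return (dest, "copy", left, None)   # x + 0 = x
        if op == "+" and left == 0:  return (dest, "copy", right, None)  # 0 + x = x
        if op == "*" and right == 1: return (dest, "copy", left, None)   # x * 1 = x
        if op == "*" and right == 0: return (dest, "copy", 0, None)      # x * 0 = 0
        if op == "*" and right == 2: return (dest, "+", left, left)      # x * 2 = x + x
        return (dest, op, left, right)

    print(simplify("t1", "+", "x", 0))   # ('t1', 'copy', 'x', None)
    print(simplify("t2", "*", "y", 2))   # ('t2', '+', 'y', 'y')

A local optimizer would run rewrites like this, together with passes such as the dead-code scan sketched earlier, to a fixed point over the basic block.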