CS502: Compiler Design Intermediate Code Generation Manas Thakur Fall 2020
Midway through the course!
[Figure: phases of a compiler, with the Symbol Table alongside.
Front end: character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation.
Back end: Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code.]
Roles of IR Generator
● Act as a glue between front-end and back-end
– Or source and machine codes
● Lower abstraction from source level
– To make life simple
● Maintain some high-level information
– To keep life interesting
● Make the dream of m+n components for m languages and n platforms look like a possibility
– Scala to Java Bytecode, for example
● Enable machine-independent optimization
– Next phase
Intermediate Representations (IR)
● IR design affects compiler speed and capabilities
● Some important IR properties:
– Ease of generation, manipulation, optimization
– Size of the representation
– Level of abstraction: level of detail in the IR
  ● How close is the IR to source code? To the machine?
  ● What kinds of operations are represented?
● Often, different IRs for different jobs:
– High-level IR: close to the source language
– Low-level IR: close to the assembly code
– Some compilers even have mid-level IRs!
Kinds of IRs
● Structural
– Graph oriented
– Heavily used in IDEs, source-to-source translators
– Tend to be large
– Examples: ASTs, DAGs
● Linear
– Pseudo-code for an abstract machine
– Level of abstraction varies
– Simple, compact data structures
– Examples: 3 address code, Bytecode (Stack machine)
● Hybrid
– Combination of graphs and linear code
– Examples: Control-flow graphs, Ideal IR (HotSpot C2)
Abstract Syntax Tree (AST)
● Parse tree with some intermediate nodes removed
Example: x – 2 * y

      -
     / \
    x   *
       / \
      2   y

● Advantages:
– Easy to evaluate
  ● Postfix form: x 2 y * -
  ● Useful for interpretation
– Source code can be reconstructed
  ● Helpful in program understanding
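The two advantages named on this slide (easy evaluation, postfix form) can be sketched as one recursive traversal each. This is a minimal illustration, not the course's AST: the names AstDemo, Node, Num, Var and BinOp are hypothetical.

```java
import java.util.Map;

// Illustrative sketch; all class and method names here are hypothetical.
class AstDemo {
    interface Node {
        int eval(Map<String, Integer> env);  // bottom-up evaluation
        String postfix();                    // post-order traversal
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
        public String postfix() { return Integer.toString(value); }
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> env) { return env.get(name); }
        public String postfix() { return name; }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }
        public int eval(Map<String, Integer> env) {
            int l = left.eval(env), r = right.eval(env);
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                default: throw new IllegalArgumentException("unknown operator: " + op);
            }
        }
        // Children first, operator last: this is exactly the postfix form.
        public String postfix() {
            return left.postfix() + " " + right.postfix() + " " + op;
        }
    }

    // Builds the tree for x - 2 * y from the slide.
    static Node example() {
        return new BinOp('-', new Var("x"), new BinOp('*', new Num(2), new Var("y")));
    }

    public static void main(String[] args) {
        Node tree = example();
        System.out.println(tree.postfix());
        System.out.println(tree.eval(Map.of("x", 10, "y", 3)));
    }
}
```

With x = 10 and y = 3 (arbitrary sample values), the tree evaluates to 10 - 2 * 3 = 4.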
Directed Acyclic Graph (DAG)
● AST with a unique node for each value
Example: a + a * (b – c) + (b – c) * d
[Figure: the AST for this expression becomes a DAG in which the leaf a and the subexpression b – c each appear once and are shared by their parents.]
● Advantages:
– Compact (reduces redundancy)
– Won’t have to evaluate the same expression twice
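One common way to get "a unique node for each value" is to look up every node in a table before creating it, keyed by its operator and children. The sketch below assumes that technique; DagBuilder and its methods are hypothetical names, and node identity is reduced to an integer id.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; DagBuilder is a hypothetical helper, not from the slides.
class DagBuilder {
    // Maps a structural key such as "-(1,2)" to the id of the unique node for it.
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the id of the leaf called `name`, creating it only once.
    int leaf(String name) { return intern(name); }

    // Returns the id of the node op(left, right), reusing an existing node if present.
    int node(char op, int left, int right) {
        return intern(op + "(" + left + "," + right + ")");
    }

    private int intern(String key) {
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size();
            ids.put(key, id);
        }
        return id;
    }

    int size() { return ids.size(); }

    // Builds the DAG for a + a * (b - c) + (b - c) * d.
    static DagBuilder example() {
        DagBuilder g = new DagBuilder();
        int sum = g.node('+', g.leaf("a"),
                g.node('*', g.leaf("a"), g.node('-', g.leaf("b"), g.leaf("c"))));
        int product = g.node('*', g.node('-', g.leaf("b"), g.leaf("c")), g.leaf("d"));
        g.node('+', sum, product);
        return g;
    }

    public static void main(String[] args) {
        System.out.println(example().size());
    }
}
```

The AST for this expression has 13 nodes (7 leaves and 6 operators); the DAG built here has 9, because a and b - c are each created once.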
Three Address Code (3AC or TAC)
● At most
– Three addresses (names/constants) in the instruction
– One operator on the right hand side of assignment
● General statement form: x = y op z
● Longer expressions are simplified by introducing temporaries
z = x – 2 * y becomes
    t1 = 2 * y
    t2 = x – t1
    z = t2
or
    t1 = 2 * y
    z = x – t1
● Advantages:
– Easy to understand
– Names for intermediate values
More about 3AC
● Allows variety of instructions:
– Assignments
  ● x = y op z
  ● x = op y
  ● x = y
  ● x = y[i] and x[i] = y
  ● x = y.f and x.f = y
– Branches
  ● goto L
  ● if x goto L
– Procedure calls
  ● param x1; param x2; ...; param xn; call p, n
– Pointer assignments
Classwork: Generate 3AC
● r = a + a * (b – c) + (b – c) * d
    t1 = b – c
    t2 = t1 * d
    t3 = b – c
    t4 = a * t3
    t5 = t4 + t2
    r = a + t5
● if (x < y) S1 else S2
    t1 = x < y
    if !t1 goto L1
    S1
    goto L2
    L1: S2
    L2:
● while (x < 10) S1
    L1: c = x < 10
    t = !c
    if t goto L2
    S1
    goto L1
    L2:
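The branch exercises follow a fixed template per construct, which can be sketched as a small generator. This is an assumption-laden illustration: ControlFlowLowering is a hypothetical class, conditions and statement bodies are passed as opaque strings (like S1 and S2 on the slide), and the loop tests the condition temporary directly with "if !t goto".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; class and method names are hypothetical.
class ControlFlowLowering {
    private final List<String> code = new ArrayList<>();
    private int temps = 0, labels = 0;

    private String newTemp() { return "t" + (++temps); }
    private String newLabel() { return "L" + (++labels); }

    // Template for: if (cond) thenStmt else elseStmt
    void genIf(String cond, String thenStmt, String elseStmt) {
        String t = newTemp(), elseLabel = newLabel(), endLabel = newLabel();
        code.add(t + " = " + cond);
        code.add("if !" + t + " goto " + elseLabel);  // skip the then-branch when cond is false
        code.add(thenStmt);
        code.add("goto " + endLabel);                 // skip the else-branch
        code.add(elseLabel + ": " + elseStmt);
        code.add(endLabel + ":");
    }

    // Template for: while (cond) body
    void genWhile(String cond, String body) {
        String head = newLabel(), exit = newLabel(), t = newTemp();
        code.add(head + ": " + t + " = " + cond);     // re-evaluate cond on every iteration
        code.add("if !" + t + " goto " + exit);       // leave the loop when cond is false
        code.add(body);
        code.add("goto " + head);
        code.add(exit + ":");
    }

    static List<String> ifExample() {
        ControlFlowLowering g = new ControlFlowLowering();
        g.genIf("x < y", "S1", "S2");
        return g.code;
    }

    static List<String> whileExample() {
        ControlFlowLowering g = new ControlFlowLowering();
        g.genWhile("x < 10", "S1");
        return g.code;
    }

    public static void main(String[] args) {
        ifExample().forEach(System.out::println);
        whileExample().forEach(System.out::println);
    }
}
```

Fresh labels and temporaries come from counters, so nested constructs would not clash; a full generator would recurse into the bodies instead of taking them as strings.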
3AC Representations
● Quadruples: instructions can be reordered easily.
● Triples: instructions cannot be reordered easily.
Assignment: a = b * -c + d * -e

3AC:
    t1 = minus c
    t2 = b * t1
    t3 = minus e
    t4 = d * t3
    t5 = t2 + t4
    a = t5

Quadruples:
    op      arg1    arg2    result
    minus   c               t1
    *       b       t1      t2
    minus   e               t3
    *       d       t3      t4
    +       t2      t4      t5
    =       t5              a

Triples:
            op      arg1    arg2
    0       minus   c
    1       *       b       (0)
    2       minus   e
    3       *       d       (2)
    4       +       (1)     (3)
    5       =       a       (4)
3AC Representations (Cont.)
● Quadruples
● Triples: instructions cannot be reordered easily.
● Indirect triples can be reordered easily.
Assignment: a = b * -c + d * -e

Triples:
            op      arg1    arg2        3AC
    0       minus   c                   t1 = minus c
    1       *       b       (0)         t2 = b * t1
    2       minus   e                   t3 = minus e
    3       *       d       (2)         t4 = d * t3
    4       +       (1)     (3)         t5 = t2 + t4
    5       =       a       (4)         a = t5

Instruction list (original order):   (0) (1) (2) (3) (4) (5)
Instruction list (after reordering): (2) (3) (0) (1) (4) (5)
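Why the extra level of indirection helps can be sketched as follows: the triples stay where they are, so references such as (0) remain valid, and only a separate list of indices is permuted. The class IndirectTriples and its string encoding of operands are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the representation below is hypothetical.
class IndirectTriples {
    // One triple per row: operator, arg1, arg2. An operand written "(n)"
    // refers to the result of triple n.
    static final String[][] TRIPLES = {
        {"minus", "c", ""},     // (0)
        {"*", "b", "(0)"},      // (1)
        {"minus", "e", ""},     // (2)
        {"*", "d", "(2)"},      // (3)
        {"+", "(1)", "(3)"},    // (4)
        {"=", "a", "(4)"},      // (5)
    };

    // Renders the triples in the order given by the instruction list.
    static List<String> listing(int[] order) {
        List<String> out = new ArrayList<>();
        for (int index : order) {
            String[] t = TRIPLES[index];
            out.add(("(" + index + ") " + t[0] + " " + t[1] + " " + t[2]).trim());
        }
        return out;
    }

    // An order is valid if every referenced triple is scheduled before its use.
    static boolean isValidOrder(int[] order) {
        boolean[] done = new boolean[TRIPLES.length];
        for (int index : order) {
            for (int k = 1; k <= 2; k++) {
                String arg = TRIPLES[index][k];
                if (arg.startsWith("(")) {
                    int ref = Integer.parseInt(arg.substring(1, arg.length() - 1));
                    if (!done[ref]) return false;
                }
            }
            done[index] = true;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] reordered = {2, 3, 0, 1, 4, 5};
        System.out.println(isValidOrder(reordered));
        listing(reordered).forEach(System.out::println);
    }
}
```

With plain triples, moving an instruction changes its position and therefore every reference to it; here only the int array changes.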
2 Address Code
● Where have you seen them?
– Common in Assembly
● Example: z = x – 2 * y becomes
    MOV R1, y
    MUL R1, 2
    MOV R2, x
    SUB R2, R1
    MOV z, R2
● Larger number of instructions compared to 3AC
● Good for register allocation
1 Address Code
● Stack-based computers
● Example: Java Virtual Machines!
x – 2 * y becomes
    push x
    push 2
    push y
    multiply
    subtract
● Advantages:
– Simple to generate and execute
– Compact form
● There is a reason you find Java based systems popular in:
– Embedded systems
– Mobile phones (Android)
– Systems where code is transmitted (Internet)
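"Simple to execute" can be made concrete with a tiny interpreter for the five instructions in the example. This is a sketch only: StackMachine is a hypothetical class, and the instruction names are the ones used on the slide rather than real JVM bytecodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative sketch of a stack machine; not a model of the actual JVM.
class StackMachine {
    static int run(String[] program, Map<String, Integer> vars) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : program) {
            String[] parts = instr.split(" ");
            switch (parts[0]) {
                case "push": {
                    // The operand is either a variable name or an integer literal.
                    String operand = parts[1];
                    stack.push(vars.containsKey(operand)
                            ? vars.get(operand)
                            : Integer.parseInt(operand));
                    break;
                }
                case "multiply": {
                    int right = stack.pop(), left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                case "subtract": {
                    // The right operand was pushed last, so it is popped first.
                    int right = stack.pop(), left = stack.pop();
                    stack.push(left - right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        String[] program = {"push x", "push 2", "push y", "multiply", "subtract"};
        System.out.println(run(program, Map.of("x", 10, "y", 3)));
    }
}
```

Operators name no operands at all, since they always work on the top of the stack; that is what keeps the encoding compact.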
What next?
● More IRs (while learning CGO):
– Control-Flow Graph (CFG)
– Static Single Assignment (SSA)
● Next class: IR generation
– Focus: 3AC. Why?
  ● Comfortable and still affordable!
  ● Offers a wide understanding of the involved challenges.
  ● Assignment 3 would involve 3AC generation!
– But there is time for it.
CS502: Compiler Design Intermediate Code Generation (Cont.) Manas Thakur Fall 2020
IR Generation
● High level language is complex
● Goal: Lower HLL code to a simpler form (3AC)
● Constructs that we need to translate:
– Variable declarations
– Expressions
– Array accesses
– Control structures (conditionals, loops)
– Function calls
– Function bodies
– Classes and objects!
● Approach: Syntax-directed translation from parse tree.
Variable declarations
● Use symbol tables
– Maps from names to values
● Take care of nested scopes
– What will you do at the entry to a new block?
– What to do at a function call?
– Function entry?
– Function exit?
– Need to push and pop the current environment.
● Fields of a structure/class?
– We will study in detail when we learn translating objects.
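"Push and pop the current environment" can be sketched as a stack of maps, one map per open scope. The class ScopedSymbolTable is hypothetical, and storing a type name as the value for each symbol is an assumption made only to have something to look up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; names and the String-valued entries are assumptions.
class ScopedSymbolTable {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() { enterScope(); }  // start with a global scope

    void enterScope() { scopes.push(new HashMap<>()); }  // block or function entry
    void exitScope() { scopes.pop(); }                   // block or function exit

    void declare(String name, String info) { scopes.peek().put(name, info); }

    // Searches from the innermost scope outwards; returns null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String info = scope.get(name);
            if (info != null) return info;
        }
        return null;
    }

    // An inner declaration of x shadows the outer one until the block ends.
    static String demo() {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.declare("x", "int");
        table.enterScope();
        table.declare("x", "float");
        String inner = table.lookup("x");
        table.exitScope();
        return inner + "," + table.lookup("x");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Iterating an ArrayDeque visits the most recently pushed element first, which gives the innermost-first lookup order without extra bookkeeping.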
Lowering scheme
● Code template for each AST node
– Captures key semantics of each construct
– Has blanks for the node’s children
– Implemented in a function called gen
● To fill in the template:
– Call the function gen recursively on children
  ● Did anyone say “visitors”?
– Plug code into the blanks
● How to stitch code together?
– gen stores the results into a temporary
– Emit code that combines the results for the syntactic construct represented by the current node
Translating expressions
Say E.addr is a synthesized attribute that denotes the temporary holding the value of E.

    Construct        Translation
    E -> E1 + E2     E.addr = newtemp();
                     gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)

In terms of an attribute E.code:

    Construct        Translation
    E -> E1 + E2     E.addr = newtemp();
                     E.code = E1.code || E2.code || gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)

In terms of our assignment:

    Construct        visit() method
    E -> E1 + E2     t1 = visit(E1);
                     t2 = visit(E2);
                     r = newtemp();
                     System.out.println(r + " = " + t1 + " + " + t2);
                     return r;
Translating expressions (Cont.)

    Construct        Translation
    S -> id = E      gen(symTab.get(id.lexeme) ‘=’ E.addr)
    E -> -E1         E.addr = newtemp()
                     gen(E.addr ‘=’ ‘-’ E1.addr)
    E -> (E1)        E.addr = E1.addr
    E -> id          E.addr = symTab.get(id.lexeme)

● symTab is the symbol table of the current scope.
Example
● 3AC for a = b + -c:
    t1 = - c
    t2 = b + t1
    a = t2

    Construct        Translation
    S -> id = E      gen(symTab.get(id.lexeme) ‘=’ E.addr)
    E -> E1 + E2     E.addr = newtemp();
                     gen(E.addr ‘=’ E1.addr ‘+’ E2.addr)
    E -> -E1         E.addr = newtemp()
                     gen(E.addr ‘=’ ‘-’ E1.addr)
    E -> (E1)        E.addr = E1.addr
    E -> id          E.addr = symTab.get(id.lexeme)
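The translation rules in this example can be sketched as functions that emit code and return E.addr. This is one possible encoding, not the assignment's visitor: ExprTranslator and its helpers are hypothetical, the symbol-table lookup is reduced to using the identifier's own name, and the temporary is allocated after the children are translated so that the numbering matches the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; names are hypothetical and the symbol table is elided.
class ExprTranslator {
    // gen emits code for the expression and returns its E.addr.
    interface Expr { String gen(ExprTranslator t); }

    private final List<String> code = new ArrayList<>();
    private int temps = 0;

    String newTemp() { return "t" + (++temps); }
    void emit(String instr) { code.add(instr); }

    // E -> id: E.addr is the name itself (stands in for symTab.get(id.lexeme)).
    static Expr id(String name) { return t -> name; }

    // E -> -E1: E.addr = newtemp(); gen(E.addr '=' '-' E1.addr)
    static Expr neg(Expr e) {
        return t -> {
            String a = e.gen(t);
            String r = t.newTemp();
            t.emit(r + " = - " + a);
            return r;
        };
    }

    // E -> E1 + E2: E.addr = newtemp(); gen(E.addr '=' E1.addr '+' E2.addr)
    static Expr add(Expr left, Expr right) {
        return t -> {
            String a = left.gen(t);
            String b = right.gen(t);
            String res = t.newTemp();
            t.emit(res + " = " + a + " + " + b);
            return res;
        };
    }

    // S -> id = E: gen(id '=' E.addr); returns all code emitted so far.
    List<String> assign(String target, Expr e) {
        emit(target + " = " + e.gen(this));
        return code;
    }

    public static void main(String[] args) {
        new ExprTranslator()
                .assign("a", add(id("b"), neg(id("c"))))
                .forEach(System.out::println);
    }
}
```

Parenthesized expressions need no rule of their own here: E -> (E1) just passes E1.addr through, so the inner expression object is used directly.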
Translating array references
● Each type has a width (e.g., int may have 4)
● How do you get the relative address (from base) of the i-th element of an array A, that is, A[i]?
– base + i * w
● What about A[i][j]?
– base + i1 * w1 + i2 * w2 (here i1 = i, i2 = j, w1 is the width of a row, and w2 is the width of an element)
● In general for a k-dimension array:
– base + i1 * w1 + i2 * w2 + ... + ik * wk
● Note: We are assuming row-major order.
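A worked instance of the formula, under row-major order: each width wj is the element width multiplied by the sizes of all later dimensions. ArrayAddress and its two methods are hypothetical names, and the array shape used in main is an arbitrary sample.

```java
// Illustrative sketch; names and the sample array shape are assumptions.
class ArrayAddress {
    // For dimension sizes n1..nk and element width w:
    // wk = w and wj = w(j+1) * n(j+1), as row-major layout requires.
    static int[] widths(int[] dims, int elemWidth) {
        int k = dims.length;
        int[] w = new int[k];
        w[k - 1] = elemWidth;
        for (int j = k - 2; j >= 0; j--) {
            w[j] = w[j + 1] * dims[j + 1];
        }
        return w;
    }

    // base + i1 * w1 + i2 * w2 + ... + ik * wk
    static int address(int base, int[] indices, int[] w) {
        int addr = base;
        for (int j = 0; j < indices.length; j++) {
            addr += indices[j] * w[j];
        }
        return addr;
    }

    public static void main(String[] args) {
        // int A[2][3] with 4-byte ints: a row is 3 * 4 = 12 bytes wide.
        int[] w = widths(new int[] {2, 3}, 4);
        System.out.println(w[0] + " " + w[1]);
        System.out.println(address(0, new int[] {1, 2}, w));
    }
}
```

For A[1][2] in this 2 x 3 array of 4-byte ints, the offset from base is 1 * 12 + 2 * 4 = 20.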