Some Thoughts on Grad School

Goal
– learn how to learn a subject in depth
– learn how to organize a project, execute it, and write about it

Iterate through the following:
– read the background material
– try some examples
– ask lots of questions
– repeat

You will have too much to do!
– learn to prioritize
– it is not possible to read ALL of the background material
– spend 2+ hours of dedicated time EACH day on each class/project
– what grade you get is not the point
– have fun and learn a ton!

Undergraduate Compilers Review and Intro to MJC

Announcements
– Mailing list is in full swing, go ahead and share test cases

Today
– Semantic analysis
– Visitor pattern for abstract syntax trees
– IRT Trees
– Assem

CS553 Lecture Undergraduate Compilers Review

Structure of the MiniJava Compiler (CodeGenAssem.java)

[Diagram: compiler pipeline, with the implementing class or package in parentheses]
Analysis: character stream -> lexical analysis (Lexer) -> tokens (“words”) -> syntactic analysis (Parser.parse()) -> AST (“sentences”) -> semantic analysis (BuildSymTable, CheckTypes) -> AST and symbol table (minijava.node/, SymTable/)
Synthesis: IR code generation (Translate) -> IRT Tree/ -> instruction selection (Mips/Codegen) -> Assem -> optimization (Project 4) -> Assem -> code generation (CodeGenAssem) -> MIPS

Lexing and Parsing

Lexing
– theoretical tool: regular expressions
– recognizing substrings instead of strings, so need longest match and rule priority
– implementation tools: flex, lex, SableCC, etc. generate code that implements a deterministic finite automaton that recognizes the specified tokens

Parsing
– theoretical tool: context free grammars
– recognizing a whole program of tokens
– implementation tools: bison, yacc, SableCC, etc. generate an LALR(1) or other bottom-up parser that uses shift-reduce parsing to recognize the program and uses syntax-directed translation to generate an AST
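
The longest match and rule priority requirement from the Lexing bullets can be sketched in a few lines of Java. This is only an illustration: the token names and patterns below are hypothetical, not the MiniJava token set, and it tries each regex in turn instead of building the DFA that flex or SableCC would generate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal longest-match lexer. Token names and patterns are hypothetical
// examples chosen for this sketch.
final class LongestMatchLexer {
    // Rules are listed in priority order: on equal match length, the earlier rule wins.
    private static final String[] NAMES = {"IF", "ID", "INT", "PLUS", "WS"};
    private static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("\\+"),
        Pattern.compile("\\s+"),
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            int bestRule = -1;
            for (int r = 0; r < RULES.length; r++) {
                Matcher m = RULES[r].matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                // The strict '>' keeps the earlier rule on a tie (rule priority).
                // For these simple patterns the greedy match is also the longest one.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = r;
                }
            }
            if (bestRule < 0) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (!NAMES[bestRule].equals("WS")) { // whitespace is recognized but dropped
                tokens.add(NAMES[bestRule] + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if iffy + 42"));
    }
}
```

On the input `if iffy + 42`, the keyword rule wins the tie for `if`, while `iffy` goes to the identifier rule because its match is longer.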
Syntax-directed Translation: AST Construction example

Grammar with production rules

    S: E          { $$ = $1; };
    E: E '+' T    { $$ = new node("+", $1, $3); }
     | T          { $$ = $1; } ;
    T: T_ID       { $$ = new leaf("id", $1); };

Implicit parse tree for a+b+c: S( E( E( E( T( T_ID a ) ) + T( T_ID b ) ) + T( T_ID c ) ) )

AST for a+b+c: +( +( a, b ), c )

Reference: Barbara Ryder’s 198:515 lecture notes

Using SableCC to specify grammar and generate AST

Productions

    cst_stm {-> stm} =
        cst_exp {-> New stm(cst_exp.exp) }
        ;
    cst_exp {-> exp} =
        {plus_rule} cst_exp t_plus cst_term
            {-> New exp.plus(cst_exp.exp, cst_term.exp) }
      | {term_rule} cst_term {-> cst_term.exp }
        ;
    cst_term {-> exp} =
        t_id {-> New exp.id(t_id) }
        ;

Abstract Syntax Tree

    stm = exp;
    exp = {plus} [l_exp]:exp [r_exp]:exp
        | {id} t_id;

Example Abstract Syntax Tree

    class Factorial{
        public static void main(String[] a){
            System.out.println(new Fac().ComputeFac(10));
        }
    }

    class Fac {
        public int ComputeFac(int num){
            int num_aux ;
            if (num < 1)
                num_aux = 1 ;
            else
                num_aux = num * (this.ComputeFac(num-1)) ;
            return num_aux ;
        }
    }

MJC Semantic Analysis

Determine whether source is meaningful
– Check for semantic errors
– Check for type errors
– Gather type information for subsequent stages
– Relate variable uses to their declarations

Example errors (from C)

    function1 = 3.14159;
    x = 570 + "hello, world!"
    scalar[i]
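
The semantic action on E: E '+' T in the AST construction example builds a left-leaning tree, which is why a+b+c becomes +( +( a, b ), c ). A minimal Java sketch of that behavior follows. The classes `Exp`, `Id`, `Plus`, and `AstBuilder` are hypothetical stand-ins, not the SableCC-generated `minijava.node` classes, and the input is assumed to be an already tokenized list of identifiers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical builder: folds identifiers left to right, mirroring the action
// E: E '+' T { $$ = new node("+", $1, $3); }
final class AstBuilder {
    static Exp build(List<String> ids) {
        Exp result = new Id(ids.get(0));                   // T: T_ID
        for (int i = 1; i < ids.size(); i++) {
            // The tree built so far becomes the LEFT child of the new plus node.
            result = new Plus(result, new Id(ids.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("a", "b", "c")));
    }
}

// Illustrative AST node classes (assumed names).
abstract class Exp {}

final class Id extends Exp {
    final String name;
    Id(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class Plus extends Exp {
    final Exp left;
    final Exp right;
    Plus(Exp left, Exp right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}
```

Printing the tree with explicit parentheses makes the left associativity visible: the first two operands are grouped before the third is added.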
Compiler Data Structures

Symbol Tables
– Compile-time data structure
– Holds names, type information, and scope information for variables

Scopes
– A name space
  e.g., In Pascal, each procedure creates a new scope
  e.g., In C, each set of curly braces defines a new scope
– Can create a separate symbol table for each scope
– What are the scopes in MiniJava?

Using Symbol Tables
– For each variable declaration:
  – Check for symbol table entry
  – Add new entry; add type info
– For each variable use:
  – Check symbol table entry

Using the Visitor Pattern for semantic analysis

    public final class APlusExp extends PExp
    {
        ...
        public void apply(Switch sw) {
            ((Analysis) sw).caseAPlusExp(this);
        }
        ...
    }

    public class DepthFirstAdapter extends AnalysisAdapter {
        ...
        public void inAPlusExp(APlusExp node) {
            defaultIn(node);
        }

        public void outAPlusExp(APlusExp node)
        {
            defaultOut(node);
        }

        public void caseAPlusExp(APlusExp node)
        {
            inAPlusExp(node);
            if(node.getLExp() != null) {
                node.getLExp().apply(this);
            }
            if(node.getRExp() != null) {
                node.getRExp().apply(this);
            }
            outAPlusExp(node);
        }
        ...
    }

The BuildSymTable is an example visitor that uses this visitor pattern.

Symbol Table in the MiniJava Compiler

[Figure]

Compiling Procedures

Properties of procedures
– Procedures/methods/functions define scopes
– Procedure lifetimes are nested
– Can store information related to dynamic invocation of a procedure on a call stack (activation record or AR or stack frame):
  – Space for saving registers
  – Space for passing parameters and returning values
  – Space for local variables
  – Return address of calling instruction

Stack management
– Push an AR on procedure entry (caller or callee)
– Pop an AR on procedure exit (caller or callee)
– Why do we need a stack?

[Figure: call stack, from higher addresses to lower addresses: AR: zoo, AR: goo, AR: foo, AR: foo]
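
The declaration and use checks listed under Using Symbol Tables can be sketched with one map per scope. This is a minimal illustration, assuming a hypothetical `ScopedSymbolTable` class that stores types as plain strings. It is not the SymTable package of the MiniJava compiler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped symbol table: one map per scope, innermost scope on top.
final class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        enterScope(); // outermost scope
    }

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Variable declaration: check the current scope for an existing entry,
    // then add the new entry with its type. Returns false on a redeclaration.
    boolean declare(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) {
            return false;
        }
        current.put(name, type);
        return true;
    }

    // Variable use: search from the innermost scope outward.
    // Returns null when the name was never declared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) { // iterates most recently pushed first
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.declare("num", "int");
        table.enterScope();
        table.declare("num_aux", "int");
        System.out.println(table.lookup("num"));     // found in the enclosing scope
        table.exitScope();
        System.out.println(table.lookup("num_aux")); // no longer visible
    }
}
```

A visitor such as BuildSymTable would call `enterScope` and `exitScope` from its in/out methods for the constructs that open a scope; which constructs those are in MiniJava is the question the slide leaves open.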
Stack Frame for MiniJava Compiler

Calling convention (contract between caller and callee)
– $sp must be divisible by 4
– caller should pass parameters in order on the stack
– upon callee entry, the stack pointer $sp should be pointing at the first empty slot past the last parameter
– upon callee exit, the stack pointer $sp should be pointing at the first parameter
– upon callee exit, return value should be in $v0

Rules to follow for PA6 (to standardize frame usage)
– $sp should always be pointing at next empty slot on the stack
– $ra and $fp should be stored right after the parameters on stack, you can’t use any other callee-saved registers
– $fp should be made to point at the first parameter, so that the address for the first parameter is $fp-0, the address for the second parameter is $fp-4, ...
– locals should be stored in order, right after $ra and $fp

Wisconsin C-- calling convention

    int foo(int x,int y,int *z) {
        int a;
        a = x * y - *z;
        return a;
    }
    void main() {
        int x;
        x = 2;
        cout << foo(4,5,&x);
        cout << "\n";
    }

Generated code for main:

        .text
        .globl main
    main:
        sw $ra, 0($sp)    #PUSH
        subu $sp, $sp, 4
        sw $fp, 0($sp)    #PUSH
        subu $sp, $sp, 4
        addu $fp, $sp, 8
        subu $sp, $fp, 12
        li $t0, 2
        sw $t0, -8($fp)
        li $t0, 4
        sw $t0, 0($sp)    #PUSH
        subu $sp, $sp, 4
        li $t0, 5
        sw $t0, 0($sp)    #PUSH
        subu $sp, $sp, 4
        subu $t0, $fp, 8
        sw $t0, 0($sp)    #PUSH
        subu $sp, $sp, 4
        jal _foo
        move $a0, $v0
        ...
        lw $ra, 0($fp)
        move $t0, $fp
        lw $fp, -4($fp)
        move $sp, $t0
        jr $ra

Generated code for foo:

        .text
    _foo:
        sw $ra, 0($sp)    #PUSH
        subu $sp, $sp, 4
        sw $fp, 0($sp)    #PUSH
        subu $sp, $sp, 4
        addu $fp, $sp, 20
        subu $sp, $fp, 24
        ...
        lw $t0, -20($fp)
        move $v0, $t0
        lw $ra, -12($fp)
        move $t0, $fp
        lw $fp, -16($fp)
        move $sp, $t0
        jr $ra

Compiling Procedures (cont)

Code generation for procedures
– Emit code to manage the stack
– Are we done?

Translate procedure body
– References to local variables must be translated to refer to the current activation record
– References to non-local variables must be translated to refer to the appropriate activation record or global data space

Code Generation

Conceptually easy
– IRT Tree is a generic machine language, 3-address code is another example of an intermediate representation
– Instruction selection converts the low-level IR to real machine instructions

The source of heroic effort on modern architectures
– Alias analysis
– Instruction scheduling for ILP
– Register allocation
– More later...
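
The PA6 frame rules determine every $fp-relative offset from just the number of parameters and locals. The following Java sketch works that arithmetic out. `FrameLayout` and its method names are hypothetical helpers invented for this illustration, and it assumes 4-byte slots as in the generated MIPS code for main and _foo.

```java
// Hypothetical helper that derives $fp-relative offsets from the PA6 frame rules:
// parameters first, then $ra, then the saved $fp, then locals, each in a 4-byte slot.
final class FrameLayout {
    static final int WORD = 4;

    final int numParams;
    final int numLocals;

    FrameLayout(int numParams, int numLocals) {
        this.numParams = numParams;
        this.numLocals = numLocals;
    }

    // $fp points at the first parameter: $fp-0, $fp-4, ...
    int paramOffset(int index) { return -WORD * index; }

    // $ra is stored right after the parameters.
    int raOffset() { return -WORD * numParams; }

    // The saved $fp is stored right after $ra.
    int savedFpOffset() { return -WORD * (numParams + 1); }

    // Locals follow in order after $ra and $fp.
    int localOffset(int index) { return -WORD * (numParams + 2 + index); }

    // After pushing $ra and $fp, $sp is the next empty slot, so $fp = $sp + fpAdjust().
    int fpAdjust() { return WORD * (numParams + 2); }

    // $sp = $fp - frameSize() reserves the locals and leaves $sp at the next empty slot.
    int frameSize() { return WORD * (numParams + 2 + numLocals); }

    public static void main(String[] args) {
        FrameLayout foo = new FrameLayout(3, 1); // int foo(int x, int y, int *z) { int a; ... }
        System.out.println("addu $fp, $sp, " + foo.fpAdjust());
        System.out.println("subu $sp, $fp, " + foo.frameSize());
        System.out.println("lw $ra, " + foo.raOffset() + "($fp)");
    }
}
```

For foo (three parameters, one local) this gives 20, 24, -12, -16, and -20, and for main (no parameters, one local) it gives 8, 12, 0, -4, and -8, which agree with the constants in the generated code.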