Compilation 2007
Hand-Written One-Pass Compilers

Michael I. Schwartzbach
BRICS, University of Aarhus

Compiler Technology

• A Joos compiler uses big technology:
    • scanner and parser generator
    • visitor patterns
    • aspects
    • 14 passes through the AST
• Even a Joos 0 compiler requires:
    • 4,063 lines of hand-written code
    • 21,910 lines of auto-generated code
• This is orthogonal to the conceptual complexity:
    • scanner, parser, weeder, scopes, environments, static type checking, static analysis, code templates, optimization

Light-Weight Technology

• A one-pass (or narrow) compiler:
    • reads the source file one character at a time
    • constructs no internal representation of the full program
    • outputs the generated code simultaneously
• A hand-written compiler:
    • contains no auto-generated code
• Benefits of light-weight technology:
    • simple, fast, and fun
• Downside of light-weight technology:
    • doesn't scale to complex languages like Java

Limitations of One-Pass Technology

• Limited scope rules:
    • we can't see anything that occurs later in the file
• Lack of static analysis:
    • we never get a complete picture of the program
• Lack of optimization:
    • we can't look at the generated code twice
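To make the one-pass idea concrete, here is a minimal sketch in Java. It is not part of the lecture's compiler: the class name OnePassSketch and the toy input language (sums and differences of single digits) are assumptions chosen for illustration. It reads one character at a time, builds no tree, and appends each instruction to the output as soon as the matching input has been seen. The instruction names bipush, iadd, and isub are taken from the IJVM instruction list used later in these slides.

```java
// Hypothetical toy translator: "1+2-3" becomes stack code in a single pass.
class OnePassSketch {
    // Returns the character at pos, or -1 at end of input (like Reader.read()).
    static int at(String s, int pos) {
        return pos < s.length() ? s.charAt(pos) : -1;
    }

    static boolean isDigit(int c) {
        return c >= '0' && c <= '9';
    }

    // Translates digit (('+' | '-') digit)* and emits code while reading.
    static String translate(String source) {
        StringBuilder out = new StringBuilder(); // stands in for the output file
        int pos = 0;
        int c = at(source, pos++);
        if (!isDigit(c)) throw new IllegalArgumentException("digit expected");
        out.append("bipush ").append(c - '0').append('\n');
        c = at(source, pos++);
        while (c == '+' || c == '-') {
            int op = c;
            c = at(source, pos++);
            if (!isDigit(c)) throw new IllegalArgumentException("digit expected");
            // The operand is emitted first, then the operator: no tree is ever built.
            out.append("bipush ").append(c - '0').append('\n');
            out.append(op == '+' ? "iadd\n" : "isub\n");
            c = at(source, pos++);
        }
        if (c != -1) throw new IllegalArgumentException("unexpected character");
        return out.toString();
    }
}
```

Calling OnePassSketch.translate("1+2-3") yields the five lines bipush 1, bipush 2, iadd, bipush 3, isub. Because output is appended immediately, the translator can never go back and improve code it has already emitted, which is the optimization limitation listed above.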
The Original One-Pass Language

• The Pascal language (1970):
    • a hand-written, one-pass compiler
    • implemented in 4000 lines of Pascal
    • simplicity of the compiler was a major design criterion
• The light-weight tradition continued to:
    • countless Pascal dialects (at least 15 languages)
    • the Modula-2 language (1978)
    • the Oberon language (1988)

One-Pass Scopes in Pascal

• Pascal introduced forward declarations:

    procedure foo(x: alpha; var y: integer); forward;

  to permit mutually recursive procedures
• Also, (as yet) unknown pointers were allowed:

    type List = ^Item;
    type Item = record
                  head: integer;
                  tail: List;
                end;

An Example Language: C0

• The C0 language is a simple subset of C
• Limitations:
    • only integer types
    • control structures if, else, while, return
    • operators +, *, /, -, %, !, &, |, ==, !=, <, >, <=, >=
    • I/O only with putchar and getchar
• Includes function prototypes (similar to forward)
• Every C0 program can also be compiled by gcc
• We shall hand-write a one-pass compiler

Context-Free Grammar for C0

    program  → header function* main
    header   → #include <stdio.h>
    function → int id ( formals? ) body
             | int id ( formals? ) ;
    formals  → formal | formal , formals
    formal   → int id
    body     → { decl* stm* }
    decl     → int id init? ;
    init     → = const
    stm      → id = exp ;
             | while ( exp ) { stm* }
             | putchar ( exp ) ;
             | return exp ;
             | if ( exp ) { stm* }
             | if ( exp ) { stm* } else { stm* }
    exp      → const | id | - exp | ! exp | id ( actuals? )
             | exp op exp | getchar() | ( exp )
    actuals  → exp | exp , actuals
    const    → intconst | ' char ' | '\n'
    op       → + | - | * | / | % | & | | | == | != | < | > | <= | >=
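The role of prototypes in a one-pass setting can be sketched with a small function table. This is only an illustration under stated assumptions: the class FunctionTable and its methods declare and canCall are hypothetical names, not taken from the lecture's compiler. The table is filled as the parser reads each "int id ( formals? )" header, whether it ends in ";" (a prototype) or in a body, and a call is accepted only if the callee has already been entered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical one-pass function table: only earlier declarations are visible.
class FunctionTable {
    // Function name -> number of formals, for every header read so far.
    private final Map<String, Integer> arity = new HashMap<>();

    // Called after reading "int id ( formals? )", for prototypes and definitions alike.
    void declare(String name, int formals) {
        Integer known = arity.get(name);
        if (known != null && known != formals) {
            throw new IllegalStateException(name + " redeclared with a different arity");
        }
        arity.put(name, formals);
    }

    // Called at a call site "id ( actuals? )" while the rest of the file is still unread.
    boolean canCall(String name, int actuals) {
        Integer known = arity.get(name);
        return known != null && known == actuals;
    }
}
```

With two mutually recursive functions even and odd (made-up names), a prototype for odd placed before the body of even makes canCall("odd", 1) succeed while even is being compiled; without the prototype the call would be rejected, because the compiler cannot look ahead in the file.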
Example C0 Program

    #include <stdio.h>

    int writedigits(int n) {
      int w;
      if (n!=0) {
        w = writedigits(n/10);
        putchar('0'+n%10);
      }
      return 0;
    }

    int writeint(int n) {
      int w;
      if (n==0) { putchar('0'); }
      else {
        if (n<0) { putchar('-'); n = -n; }
        w = writedigits(n);
      }
      return 0;
    }

    int gcd(int x, int y) {
      while ( x != y ) {
        if (x < y) { y = y - x; }
        else { x = x - y; }
      }
      return x;
    }

    main() {
      int w;
      w = writeint(gcd(15,35));
    }

The IJVM Target Architecture

    program   → method+
    method    → .method symbol directive* insn+
    directive → .args expr | .locals expr | .define symbol = expr
    insn      → bipush expr | dup | goto symbol | iadd | iand
              | ifeq symbol | iflt symbol | if_icmpeq symbol
              | iinc expr , expr | iload expr | invokevirtual symbol
              | ior | ireturn | istore expr | isub | ldc_w expr
              | nop | pop | swap | symbol :
    expr      → integer | symbol | expr + expr | expr - expr | ( expr )

Example IJVM Code

    .method writedigits
    .args 2
    .locals 1
    iload 1
    bipush 44
    swap
    ldc_w 0
    invokevirtual ne_
    ifeq L0
    bipush 44
    iload 1
    bipush 44
    swap
    ldc_w 10
    invokevirtual div_
    invokevirtual writedigits
    istore 2
    bipush 44
    ldc_w 48
    iload 1
    bipush 44
    swap
    ldc_w 10
    invokevirtual mod_
    iadd
    invokevirtual putchar
    goto L1
    L0:
    L1:
    ldc_w 0
    ireturn
    bipush 0
    ireturn

    .method writeint
    .args 2
    .locals 1
    iload 1
    bipush 44
    swap
    ldc_w 0
    invokevirtual eq_
    ifeq L2
    bipush 44
    ldc_w 48
    invokevirtual putchar
    goto L3
    L2:
    iload 1
    bipush 44
    swap
    ldc_w 0
    invokevirtual lt_
    ifeq L4
    bipush 44
    ldc_w 45
    invokevirtual putchar
    bipush 0
    iload 1
    isub
    istore 1
    goto L5
    L4:
    L5:
    bipush 44
    iload 1
    invokevirtual writedigits
    istore 2
    L3:
    ldc_w 0
    ireturn
    bipush 0
    ireturn

    .method gcd
    .args 3
    L6:
    iload 1
    bipush 44
    swap
    iload 2
    invokevirtual ne_
    ifeq L7
    iload 1
    bipush 44
    swap
    iload 2
    invokevirtual lt_
    ifeq L8
    iload 2
    iload 1
    isub
    istore 2
    goto L9
    L8:
    iload 1
    iload 2
    isub
    istore 1
    L9:
    goto L6
    L7:
    iload 1
    ireturn
    bipush 0
    ireturn

    .method main
    .args 1
    .locals 1
    bipush 44
    bipush 44
    ldc_w 15
    ldc_w 35
    invokevirtual gcd
    invokevirtual writeint
    istore 1

Defining Tokens

    static final int tLPAR = 0;
    static final int tRPAR = 1;
    static final int tASSIGN = 2;
    static final int tSEMI = 3;
    static final int tCOMMA = 4;
    static final int tEQ = 5;
    static final int tNE = 6;
    static final int tID = 7;
    static final int tCONST = 8;
    static final int tCHAR = 9;
    static final int tADD = 10;
    ...

    static String tFile;    // source file name
    static int tLine;       // current line
    static int tCol;        // current column
    static int tIntValue;   // value if tCONST
    static String tIdValue; // value if tID
    static int tKind;       // current token kind
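The listing above uses one recurring shape for operators that are compiled as calls (ne_, eq_, lt_, div_, mod_): the left operand, then bipush 44 and swap, then the right operand, then invokevirtual. The sketch below reproduces that shape as a string template. The class CallEmitter and the method binaryCall are hypothetical names, and reading the constant 44 as the object reference that invokevirtual expects beneath its arguments is an assumption; the slides do not explain it.

```java
// Hypothetical code template, read off from the example IJVM listing.
class CallEmitter {
    // Constant pushed before every call in the listing; assumed to be a dummy object reference.
    static final int OBJREF = 44;

    // left and right hold the already generated code for the two operands,
    // each ending in a newline; method is the runtime routine, for example "ne_".
    static String binaryCall(String left, String right, String method) {
        return left
             + "bipush " + OBJREF + "\n"
             + "swap\n"   // move the reference below the left operand
             + right
             + "invokevirtual " + method + "\n";
    }
}
```

For the test n != 0 in writedigits, binaryCall("iload 1\n", "ldc_w 0\n", "ne_") gives the same five instructions that open the body of .method writedigits. Pushing the reference after the left operand and swapping fits a one-pass compiler, since the code for the left operand has already been written out before the operator is seen.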
A Hand-Written Scanner

    static int c; // current char

    static int nextChar() { // read next char from the source file
      try {
        c = in.read();
      } catch (Exception e) {
        c = -1;
      }
      if (c=='\n') {
        tLine++;
        tCol = 1;
      } else tCol++;
      return c;
    }

    static int nextToken() { // recognize next token
      switch (c) {
        ...
      }
    }

Embedding a DFA in Java (1/3)

    case '\'':
      nextChar();
      if (c=='\\') {
        nextChar();
        if (c!='n') {
          tKind = tERROR;
          break;
        }
        nextChar();
        if (c!='\'') {
          tKind = tERROR;
          break;
        }
        nextChar();
        tKind = tCONST;
        tIntValue = 10;
        break;
      }
      if (c==-1 || c=='\'') {
        tKind = tERROR;
        break;
      }
      tIntValue = c;
      nextChar();
      if (c!='\'') {
        tKind = tERROR;
        break;
      }
      nextChar();
      tKind = tCONST;
      break;

Embedding a DFA in Java (2/3)

[Figure: the code from (1/3) shown next to the DFA it implements.]

Embedding a DFA in Java (3/3)

[Figure: DFA for character constants, starting after the opening '. On -1 or ' the result is tERROR. On \ the automaton expects n and then ', giving tCONST; any other input (¬n, ¬') gives tERROR. On any other character (¬ -1,',\) it expects ', giving tCONST; ¬' gives tERROR.]
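The character-constant fragment depends on the surrounding scanner state (c, nextChar, tKind), so it cannot be run on its own. A self-contained sketch of the same DFA follows. The class CharConstDfa, the method scan, and the use of a String in place of the source file are assumptions made for illustration, and -1 stands in for tERROR.

```java
// Hypothetical stand-alone version of the character-constant DFA.
class CharConstDfa {
    static final int ERROR = -1; // plays the role of tERROR

    // Returns the character at pos, or -1 at end of input (like Reader.read()).
    private static int next(String s, int pos) {
        return pos < s.length() ? s.charAt(pos) : -1;
    }

    // Returns the value of a constant such as 'a' or '\n' at the start of s, or ERROR.
    static int scan(String s) {
        int pos = 0;
        int c = next(s, pos++);
        if (c != '\'') return ERROR;       // must start with an opening quote
        c = next(s, pos++);
        if (c == '\\') {                   // escape branch: only \n is accepted
            c = next(s, pos++);
            if (c != 'n') return ERROR;
            c = next(s, pos++);
            if (c != '\'') return ERROR;
            return 10;                     // value of '\n'
        }
        if (c == -1 || c == '\'') return ERROR; // end of input or empty constant
        int value = c;
        c = next(s, pos++);
        if (c != '\'') return ERROR;       // closing quote is required
        return value;
    }
}
```

For example, scan applied to the three characters 'a' returns 97, and applied to the four characters '\n' it returns 10, while an empty constant or a two-character body returns ERROR. Each if statement corresponds to one state of the DFA and each early return to an edge into an error state; characters after the closing quote are left unread, just as the scanner leaves them for the next call to nextToken.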