Abstract Syntax. Aslan Askarov, aslan@cs.au.dk. Revised from slides by E. Ernst.


  1. Compilation 2014: Abstract Syntax. Aslan Askarov, aslan@cs.au.dk. Revised from slides by E. Ernst.

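  2. Abstract syntax

     [Diagram: High-level source code → Lexing/Parsing → Abstract syntax tree → later phases (labeled Elaboration, Lowering, Code generation, Optimization) → Low-level target code. Pretty printing leads from the abstract syntax tree back to source code.]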

  3. Recall ml-yacc file: semantic actions in parser

     .grm file:

       …
       exp : ID                 ( A.Id (ID) )
           | INT                ( A.Number (INT) )
           | LPAREN exp RPAREN  ( exp )
           | exp PLUS exp       ( A.Op (A.Plus, exp1, exp2) )
           | exp MINUS exp      ( A.Op (A.Minus, exp1, exp2) )
       …

     ML code:

       datatype aexp = Id of string
                     | Number of int
                     | Op of binop * aexp * aexp
       and binop = Plus | Minus | Times | Div

     What else can we put as semantic actions? Is it a good idea?
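
As a minimal sketch of what these semantic actions produce, the Standard ML below builds by hand the tree the grammar would construct for `x + 1 - y` (assuming PLUS and MINUS are declared left-associative) and evaluates it. The structure name `A` and the datatype come from the slide; `eval` and `env` are illustrative additions, not part of the slides.

```sml
structure A = struct
  datatype aexp = Id of string
                | Number of int
                | Op of binop * aexp * aexp
  and binop = Plus | Minus | Times | Div
end

(* Tree for "x + 1 - y", assuming left-associative PLUS and MINUS. *)
val tree = A.Op (A.Minus, A.Op (A.Plus, A.Id "x", A.Number 1), A.Id "y")

(* Hypothetical helper: evaluate an expression; env maps identifiers to ints. *)
fun eval (env : string -> int) (e : A.aexp) : int =
  case e of
    A.Id x => env x
  | A.Number n => n
  | A.Op (oper, l, r) =>
      let
        val a = eval env l
        val b = eval env r
      in
        case oper of
          A.Plus => a + b
        | A.Minus => a - b
        | A.Times => a * b
        | A.Div => a div b
      end

(* Illustrative environment: x = 10, y = 4, everything else 0. *)
fun env "x" = 10
  | env "y" = 4
  | env _ = 0

val result = eval env tree
```

The same arithmetic could be placed directly in the semantic actions, which is one answer to the question on the slide; building the tree first keeps the parser separate from whatever is done with the program afterwards.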

  4. Semantic actions in parser
     • In principle, it’s possible to write the entire compiler in the parser semantic actions
     • Not a good idea: error-prone, difficult to maintain
     • Limited in features: mutually recursive declarations
     • Rather have multiple passes over the program
       • need convenient representation
     • In practice, semantic actions in the parser generate an abstract syntax tree (AST)
       • tree representation of the program
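
The idea of several passes over one convenient representation can be sketched as follows. Both passes (`fold` and `idents`) are hypothetical examples chosen for illustration; the slides do not prescribe them. The datatype is the one from the previous slide.

```sml
datatype binop = Plus | Minus | Times | Div
datatype aexp = Id of string
              | Number of int
              | Op of binop * aexp * aexp

(* Pass 1: fold +, - and * when both operands are constants.
   Division is left untouched to avoid dividing by zero here. *)
fun fold (Op (oper, l, r)) =
      (case (oper, fold l, fold r) of
         (Plus, Number a, Number b) => Number (a + b)
       | (Minus, Number a, Number b) => Number (a - b)
       | (Times, Number a, Number b) => Number (a * b)
       | (_, l', r') => Op (oper, l', r'))
  | fold e = e

(* Pass 2: collect the identifiers that occur in the tree, left to right. *)
fun idents (Id x) = [x]
  | idents (Number _) = []
  | idents (Op (_, l, r)) = idents l @ idents r

(* (2 * 3) + x *)
val ast = Op (Plus, Op (Times, Number 2, Number 3), Id "x")
val folded = fold ast
val names = idents folded
```

Each pass is an ordinary function over the tree, so passes can be added, reordered, or tested on their own without touching the parser.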

  5. Concrete (surface) syntax vs abstract syntax
     • Concrete syntax corresponds to a grammar
       • requires parsability
       • may even be seriously disfigured: T’ → * F T’
       • lots of otherwise unnecessary details
       • parentheses, “then”, “else”, semicolons, associativity, precedence, etc.
       • syntactic sugar: a & b vs if a then b else 0
     • Abstract syntax
       • non-parsable, designed for later compiler phases
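
The sugar example above can be sketched as a translation between two tree types. Both datatypes and the function `desugar` are hypothetical and only serve the illustration; in practice a parser can apply the same rewriting directly in its semantic actions, so the sugared node never exists.

```sml
(* Hypothetical surface-level tree that still contains the sugared form. *)
datatype sexp = SId of string
              | SNum of int
              | SAnd of sexp * sexp
              | SIf of sexp * sexp * sexp

(* Hypothetical abstract syntax without a dedicated "&" node. *)
datatype aexp = Id of string
              | Num of int
              | If of aexp * aexp * aexp

(* a & b  ==>  if a then b else 0 *)
fun desugar (SId x) = Id x
  | desugar (SNum n) = Num n
  | desugar (SAnd (a, b)) = If (desugar a, desugar b, Num 0)
  | desugar (SIf (c, t, e)) = If (desugar c, desugar t, desugar e)

val core = desugar (SAnd (SId "a", SId "b"))
```

Later phases then only need a case for `If`, not a second case for `&`.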

  6. Decorated abstract syntax trees
     • Idea: annotate AST with useful information, e.g.:
       • position in the source – for error reporting
       • types – for semantic analysis

  7. Including aux information in AST

     Position information is passed as an extra argument:

       exp : ID                 ( A.Id (ID, IDleft) )
           | INT                ( A.Number (INT) )
           | LPAREN exp RPAREN  ( exp )
           | exp PLUS exp       ( A.Op (A.Plus, exp1, exp2, exp1left) )
           | exp MINUS exp      ( A.Op (A.Minus, exp1, exp2, exp1left) )
           | exp TIMES exp      ( A.Op (A.Times, exp1, exp2, exp1left) )
           | exp DIV exp        ( A.Op (A.Div, exp1, exp2, exp1left) )

       type pos = int

       datatype aexp = Id of id * pos
                     | Number of int
                     | Op of binop * aexp * aexp * pos
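
A minimal sketch of how a later phase might use the stored positions for error reporting. The check `undeclared` is a hypothetical example, and `type id = string` is an assumption, since the slide leaves `id` undefined.

```sml
type pos = int
type id = string   (* assumption: the slide does not define id *)
datatype binop = Plus | Minus | Times | Div
datatype aexp = Id of id * pos
              | Number of int
              | Op of binop * aexp * aexp * pos

(* Hypothetical check: report every identifier that is not in `declared`,
   together with the position stored in its Id node. *)
fun undeclared declared e =
  case e of
    Id (x, p) =>
      if List.exists (fn d => d = x) declared then []
      else ["position " ^ Int.toString p ^ ": undeclared identifier " ^ x]
  | Number _ => []
  | Op (_, l, r, _) => undeclared declared l @ undeclared declared r

(* "x + y" with x at position 0 and y at position 4; the Op node carries
   exp1left, the start position of its left operand. *)
val ast = Op (Plus, Id ("x", 0), Id ("y", 4), 0)
val errors = undeclared ["x"] ast
```

Without the `pos` component, the message could name the identifier but not say where in the source it occurs.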

  8. Pretty printing
     • Generating concrete syntax from abstract
     • Why?
       • Formatting
       • Debugging parsers
     • Design criteria for abstract syntax
       • a good AST design will contain all the relevant information to go from abstract syntax to concrete via pretty printing, but no more
     • Implemented as a straightforward tree traversal
     • Library support
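
A pretty printer written as a tree traversal might look like the sketch below, using the expression datatype from the earlier slides. The function names `pp`, `atom`, and `opString` are illustrative. Since parentheses are not stored in the AST, this simple version reinserts them around every nested operation rather than consulting precedence.

```sml
datatype binop = Plus | Minus | Times | Div
datatype aexp = Id of string
              | Number of int
              | Op of binop * aexp * aexp

fun opString Plus = "+"
  | opString Minus = "-"
  | opString Times = "*"
  | opString Div = "/"

(* Traverse the tree and emit concrete syntax. Operands that are themselves
   operations are wrapped in parentheses by `atom`. *)
fun pp (Id x) = x
  | pp (Number n) = Int.toString n
  | pp (Op (oper, l, r)) = atom l ^ " " ^ opString oper ^ " " ^ atom r
and atom (e as Op _) = "(" ^ pp e ^ ")"
  | atom e = pp e

(* (a + 1) * b *)
val text = pp (Op (Times, Op (Plus, Id "a", Number 1), Id "b"))
```

Feeding `text` back into the parser should give the same tree again, which is what makes pretty printing useful for debugging parsers.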

  9. Pretty printing libraries
     • Primitive type: block/document
     • Blocks are stacked vertically and horizontally to form bigger blocks
     • User provides functions from AST nodes to blocks
       • straightforward traversal over trees
     • Layout engine generates an indented string from the outermost block

     [Figure: the statement x := a + b decomposed into nested blocks]
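
The block idea can be sketched with a toy library. The type `block`, its constructors, and the functions `lines` and `render` are hypothetical and much simpler than a real pretty-printing library; in particular, horizontal composition here just joins pieces with single spaces and there is no line-width handling.

```sml
(* Hypothetical block type; real libraries offer richer primitives. *)
datatype block = Text of string
               | Horiz of block list      (* side by side *)
               | Vert of block list       (* stacked *)
               | Indent of int * block    (* shifted right *)

fun spaces n = String.implode (List.tabulate (n, fn _ => #" "))

(* Toy layout engine: turn a block into its list of lines. *)
fun lines (Text s) = [s]
  | lines (Horiz bs) = [String.concatWith " " (List.concat (map lines bs))]
  | lines (Vert bs) = List.concat (map lines bs)
  | lines (Indent (n, b)) = map (fn l => spaces n ^ l) (lines b)

fun render b = String.concatWith "\n" (lines b)

(* A loop whose body is the assignment from the figure. *)
val doc =
  Vert [Horiz [Text "while", Text "c", Text "do"],
        Indent (2, Horiz [Text "x", Text ":=",
                          Horiz [Text "a", Text "+", Text "b"]])]
val out = render doc
```

The user-supplied part is the function from AST nodes to `block` values; the engine (`lines` and `render`) knows nothing about the language being printed.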

  10. Pretty printing (demo)

  11. AST for Tiger
      • Just like before; nice syntax trees, though more complex
      • Representation: note the use of records
      • Semantic issues: mutual dependencies expressed using sublists of declarations
      • Overall robust design

  12. Excerpt from “absyn.sml”

      structure Absyn = struct

      datatype var = SimpleVar of S.symbol * pos
                   | FieldVar of var * S.symbol * pos
                   | SubscriptVar of var * exp * pos

      and exp = VarExp of var
              | NilExp
              | IntExp of int
              | StringExp of string * pos
              | CallExp of calldata
              ...
      and decl = FunctionDec of fundecldata list
               | VarDec of vardecldata
               | TypeDec of tydecldata list
               ...
      and fundecldata = { name: S.symbol
                        , params: fielddata list
                        , result: (S.symbol * pos) option
                        , body: exp
                        , pos: pos}
      …
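
To illustrate the use of records, here is a cut-down, hypothetical variant of the excerpt that is small enough to stand alone. Symbols are plain strings instead of `S.symbol`, only a few constructors are kept, and the fields of `fielddata` are an assumption because the excerpt does not show them.

```sml
type symbol = string   (* assumption: stands in for S.symbol *)
type pos = int

datatype var = SimpleVar of symbol * pos
datatype exp = VarExp of var | IntExp of int

(* Assumed shape; the excerpt does not show the fields of fielddata. *)
type fielddata = {name: symbol, typ: symbol, pos: pos}
type fundecldata = {name: symbol,
                    params: fielddata list,
                    result: (symbol * pos) option,
                    body: exp,
                    pos: pos}
datatype decl = FunctionDec of fundecldata list

(* function id(x: int): int = x, with illustrative character offsets *)
val idFun : fundecldata =
  {name = "id",
   params = [{name = "x", typ = "int", pos = 12}],
   result = SOME ("int", 21),
   body = VarExp (SimpleVar ("x", 27)),
   pos = 0}

(* One FunctionDec holds a whole group of functions; functions that call
   each other would be placed in the same list. *)
val dec = FunctionDec [idFun]

(* Record fields are selected by name, independent of their order. *)
fun funName ({name, ...} : fundecldata) = name
fun arity (f : fundecldata) = length (#params f)
```

Compared with a five-component tuple, the record makes each use site say which component it means, which matters once nodes carry this many fields.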

  13. AST of “test01.tig”

      /* an array type and an array variable */
      let
        type arrtype = array of int
        var arr1: arrtype := arrtype [10] of 0
      in
        arr1
      end

      LetExp
        TypeDec
          'arrtype'
            ArrayTy 'int'
        VarDec 'arr1: arrtype'
          init ArrayExp 'arrtype'
            size 10
            init 0
        body SeqExp
          VarExp
            SimpleVar 'arr1'

  14. AST of “test02.tig”

      /* arr1 is valid since
       * expression 0 is
       * int = myint
       */
      let
        type myint = int
        type arrtype = array of myint
        var arr1: arrtype := arrtype [10] of 0
      in
        arr1
      end

      LetExp
        TypeDec
          'myint'
            NameTy 'int'
          'arrtype'
            ArrayTy 'myint'
        VarDec 'arr1: arrtype'
          init ArrayExp 'arrtype'
            size 10
            init 0
        body SeqExp
          VarExp
            SimpleVar 'arr1'

  15. AST of “test07.tig”

      /* mutually recursive
       * functions */
      let
        function do_nothing1 (a: int, b: string): int =
          (do_nothing2(a+1); 0)
        function do_nothing2 (d: int): string =
          (do_nothing1(d, "str"); " ")
      in
        do_nothing1(0, "str2")
      end

      LetExp
        FunctionDec
          'do_nothing1: int'
            par 'a: int'
            par 'b: string'
            body SeqExp
              Call 'do_nothing2'
                arg +
                  VarExp
                    SimpleVar 'a'
                  1
              0
          'do_nothing2: string'
            par 'd: int'
            body SeqExp
              Call 'do_nothing1'
                arg VarExp
                  SimpleVar 'd'
                arg "str"
              " "
        body SeqExp
          Call 'do_nothing1'
            arg 0
            arg "str2"

  16. AST of “test42.tig”

      /* correct declarations */
      let
        type arrtype1 = array of int
        type rectype1 = {name: string, address: string, id: int, age: int}
        type arrtype2 = array of rectype1
        type rectype2 = {name: string, dates: arrtype1}
        type arrtype3 = array of string

        var arr1 := arrtype1 [10] of 0
        var arr2 := arrtype2 [5] of rectype1{name="aname", address="somewhere", id=0, age=0}
        var arr3: arrtype3 := arrtype3 [100] of ""

        var rec1 := rectype1{name="Kapoios", address="Kapou", id=02432, age=44}
        var rec2 := rectype2{name="Allos", dates = arrtype1 [3] of 1900}
      in
        arr1[0] := 1;
        arr1[9] := 3;
        arr2[3].name := "kati";
        arr2[1].age := 23;
        arr3[34] := "sfd";

        rec1.name := "sdf";
        rec2.dates[0] := 2323;
        rec2.dates[2] := 2323
      end

      LetExp
        TypeDec
          'arrtype1'  ArrayTy 'int'
          'rectype1'  RecordTy 'name': 'string', 'address': 'string', 'id': 'int', 'age': 'int'
          'arrtype2'  ArrayTy 'rectype1'
          'rectype2'  RecordTy 'name': 'string', 'dates': 'arrtype1'
          'arrtype3'  ArrayTy 'string'
        VarDec 'arr1'
          init ArrayExp 'arrtype1'
            size 10
            init 0
        VarDec 'arr2'
          init ArrayExp 'arrtype2'
            size 5
            init RecordExp 'rectype1'
              'name' "aname"
              'address' "somewhere"
              'id' 0
              'age' 0
        VarDec 'arr3: arrtype3'
          init ArrayExp 'arrtype3'
            size 100
            init ""
        VarDec 'rec1'
          init RecordExp 'rectype1'
            'name' "Kapoios"
            'address' "Kapou"
            'id' 2432
            'age' 44
        VarDec 'rec2'
          init RecordExp 'rectype2'
            'name' "Allos"
            'dates' ArrayExp 'arrtype1'
              size 3
              init 1900
        body SeqExp
          AssignExp
            LHS SubscriptVar (SimpleVar 'arr1', 0)
            RHS 1
          AssignExp
            LHS SubscriptVar (SimpleVar 'arr1', 9)
            RHS 3
          AssignExp
            LHS FieldVar (SubscriptVar (SimpleVar 'arr2', 3), 'name')
            RHS "kati"
          AssignExp
            LHS FieldVar (SubscriptVar (SimpleVar 'arr2', 1), 'age')
            RHS 23
          AssignExp
            LHS SubscriptVar (SimpleVar 'arr3', 34)
            RHS "sfd"
          AssignExp
            LHS FieldVar (SimpleVar 'rec1', 'name')
            RHS "sdf"
          AssignExp
            LHS SubscriptVar (FieldVar (SimpleVar 'rec2', 'dates'), 0)
            RHS 2323
          AssignExp
            LHS SubscriptVar (FieldVar (SimpleVar 'rec2', 'dates'), 2)
            RHS 2323

  17. Issues to note
      • Structures easily get rather large
      • Not hard to read, except for size
      • Connection to source: ASTs require stored position
        • available under ml-yacc: for symbols x, x1, the start positions are xleft, x1left

  18. Summary
      • Parsing purpose: frontend of the compiler
        • single-pass compilation is possible, but messy and limited in functionality
      • Abstract syntax tree:
        • tree representation of the program
        • produced by the semantic actions of the LR parser
      • Pretty-printing:
        • allows us to reformat the source
      • Traversals
        • important programming idiom
        • we will use them a lot in the compiler
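
The traversal idiom can be captured once and reused. The sketch below, over the small expression datatype from the earlier slides, defines a generic bottom-up traversal and derives two analyses from it. The names `foldExp`, `countNodes`, and `depth` are hypothetical and not taken from the course code.

```sml
datatype binop = Plus | Minus | Times | Div
datatype aexp = Id of string
              | Number of int
              | Op of binop * aexp * aexp

(* Generic bottom-up traversal: one function per constructor. *)
fun foldExp (fId, fNum, fOp) e =
  let
    fun go (Id x) = fId x
      | go (Number n) = fNum n
      | go (Op (oper, l, r)) = fOp (oper, go l, go r)
  in
    go e
  end

(* Two traversals obtained from the same fold. *)
val countNodes = foldExp (fn _ => 1, fn _ => 1, fn (_, l, r) => 1 + l + r)
val depth = foldExp (fn _ => 1, fn _ => 1, fn (_, l, r) => 1 + Int.max (l, r))

(* (2 * x) + 1 *)
val ast = Op (Plus, Op (Times, Number 2, Id "x"), Number 1)
val n = countNodes ast
val d = depth ast
```

Writing each traversal by direct pattern matching, as in the earlier sketches, is just as valid; the fold only factors out the recursion that all of them share.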
