Compiler Design Spring 2018 3.5 Limitations of context-free grammars 4.0 Semantic analysis Thomas R. Gross Computer Science Department ETH Zurich, Switzerland 1
Context-free grammars § Efficient parsers exist for context-free languages § Should we look at other language classes? § Context-sensitive § Unrestricted grammars § Grammars are about checking properties 2
Compiler structure § Parser builds parse tree § (Concrete syntax tree) § Checks input for compliance with language spec § Can be turned into abstract syntax tree (AST) § Remove unnecessary detail § Most details related to grammar symbols are not critical § From parse tree / AST to code generation § We did this (in part) in Homework 1 § (Will do more in later Homework) 3
Extra step: Error detection § Parse tree construction § Parser finds some kinds of errors but not all § Some kinds of errors can be detected only at runtime § “Syntax errors” § Efficient parsing algorithms known for Type-2 (context-free) grammars § Not always desirable: Find errors with parser § Limitations of context-free grammars 4
A useful property: Variables declared § Consider a language like Java(Li) § The spec requires that all variables used have been declared int x; x = x + 1; 5
Using parsing to check property § How could we express this property so that it can be checked by the parser? § A parse tree is constructed only for those programs that maintain this property (variables declared before use) § Otherwise error is signaled § Can we find a language L_J to model this property? § Then we can think about a grammar G_J such that L(G_J) = L_J 6
L_J § L_J = { α c α | α ∈ {a, b}* } § Terminals: a, b, c § Example words from L_J: aacaa, abcab, aabacaaba § Not in L_J: ca, acb § How does L_J relate to our problem? 7
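The membership condition for L_J can be sketched as a small check: a word belongs to the language when the same string over {a, b} appears on both sides of a single c. The function name `in_LJ` is an assumption made for illustration; it is not part of the slides.

```python
def in_LJ(word: str) -> bool:
    """Return True if word has the form x + 'c' + x with x over {a, b}."""
    # The separator c must occur exactly once.
    if word.count("c") != 1:
        return False
    left, right = word.split("c")
    # Both halves must be identical and use only the terminals a and b.
    return left == right and set(left) <= {"a", "b"}


for w in ["aacaa", "abcab", "aabacaaba", "ca", "acb"]:
    print(w, in_LJ(w))
```

The comparison `left == right` is exactly the part a context-free grammar cannot express: the second half has to repeat the first half in the same order.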
void fct1() {
    int x;
    {
        x = x + 1
    }
}

void fct2() {
    int x;
    {
        x = y + 1
    }
}
§ Could use L_J = { α c α d | α ∈ {a, b}* } 9
L_J § L_J allows us to model the following constraint: Any variable that appears in the program/function/method has been declared previously § Terminal c defines a separation between the “body” of a unit and the definition block. § Useful property to check before code generation 10
L_J § L_J allows us to model the following constraint: Any variable that appears in the program/function/method has been declared previously § Bad news: (Theorem) There exists no context-free grammar G such that L_J = L(G) § Proof: 11
Another useful property: Matching parameters § Consider a language like Java(Li) § The spec requires that for all methods/functions, the number of formal parameters (at the place of method definition) matches the number of actual parameters (at the call site) int fct (int a, float b, xref c) { … } x = fct(a, b, c); 12
Another useful property § How could we express this property so that it can be checked by the parser? § A parse tree is constructed only for those programs that maintain this property (actuals and formals match) § Otherwise error is signaled § Can we find a language L_P to model this property? § Then we can think about a grammar G_P such that L(G_P) = L_P 13
L_P § L_P = { a^n b^m c^n d^m } § a, b, c, d: terminals § Integers n, m ≥ 1 § Example words from L_P: aabccd, aaabbcccdd, abbbbcdddd § Not in L_P: aabcd § Why would we care about L_P? 14
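A membership check for L_P makes the cross-dependency visible: the number of a's must equal the number of c's, and the number of b's must equal the number of d's. This is a minimal sketch; the function name `in_LP` is an assumption made for illustration.

```python
import re


def in_LP(word: str) -> bool:
    """Return True if word = a^n b^m c^n d^m with n, m >= 1."""
    # A regular expression checks only the block order a+ b+ c+ d+.
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", word)
    if match is None:
        return False
    n_a, n_b, n_c, n_d = (len(group) for group in match.groups())
    # The counting constraints are checked separately from the block order.
    return n_a == n_c and n_b == n_d


for w in ["aabccd", "aaabbcccdd", "abbbbcdddd", "aabcd"]:
    print(w, in_LP(w))
```

The two counting constraints cross each other (a with c, b with d), which is what prevents a context-free grammar from capturing them.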
L_P § L_P allows us to model the following constraint: For all methods/functions, the number of formal parameters (at the place of method definition) matches the number of actual parameters (at the call site) § Can be extended to deal with matching types § Tricky if type conversions are an option § Useful property to check before code generation 15
L_P § L_P allows us to model the following constraint: For all methods/functions, the number of formal parameters (at the place of method definition) matches the number of actual parameters (at the call site) § Bad news: (Theorem) There exists no context-free grammar G such that L_P = L(G) § Proof: 16
Comments § Context-free grammars cannot express all desirable constraints § Switching to context-sensitive not productive § Use “unrestricted grammar” instead… § Use a program to perform additional checks § Complete flexibility § Can be (and often is) an additional step in compiler § After parsing § Before code generation § Recall: Some checks must wait till run time 17
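The idea of using a program for the additional checks can be sketched for the declared-before-use property. The representation below (a flat list of `("decl", name)` and `("use", name)` pairs) is a hypothetical simplification of what a compiler would collect from the AST; it is not the course's data structure.

```python
def undeclared_uses(statements):
    """Return the names that are used without a preceding declaration."""
    declared = set()
    errors = []
    for kind, name in statements:
        if kind == "decl":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(name)
    return errors


# int x; x = x + 1   (all uses declared)
fct1 = [("decl", "x"), ("use", "x"), ("use", "x")]
# int x; x = y + 1   (y is never declared)
fct2 = [("decl", "x"), ("use", "y"), ("use", "x")]

print(undeclared_uses(fct1))
print(undeclared_uses(fct2))
```

Such a pass runs after parsing and before code generation, so the parser itself can stay context-free.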
More comments § Note: Parsing also used in (natural) language processing § No (complete) (context-free) grammar exists for English, German, … § More expensive approaches are needed § Ambiguity part of reality § May need to obtain (multiple, all ) parse trees § “The food is here!” vs. “The food is here?” § Interesting topic but not part of this class 18
4.0 Semantic analysis § Idea: before proceeding to code generation compiler checks program properties § Early feedback (while source information still available) § Avoid subsequent complications 19
Semantic analysis § Idea: before proceeding to code generation compiler checks program properties § Also the time to transform program § Often done at the time parse tree is transformed into AST § Example transformations § Type casts § Add default parameters to method/function calls § Construct initializer 20
4.1 Syntax-directed translation § Parsing: Control table M decides which production to use § So far: Recorded production (as “action”) § General: Attach code to production § E.g., add node to syntax tree § E.g., keep track of definitions § As the parser recognizes a word § It produces an AST (or other desired data structure) § And/or computes predicate 21
Attribute grammars § Context free grammar extended with (context-sensitive) information § “Attributes” § Attached to non-terminals § Attributes have values § Value assigned during parsing § Value evaluated in a conditional statement (see later) 22
Attribute grammars § Types of attributes 1. Synthesized attributes § Value obtained from attributes of children of non-terminal 2. Inherited attributes § Value obtained from attribute of parent of non-terminal § Or from attribute(s) of sibling(s) of non-terminal 23
Example § Example (expression evaluation) § E → E + T § Production: E_0 → E_1 + T § Attribute § Integer value § E_0.Value := E_1.Value + T.Value § Note: Use E_1 vs E_0 to distinguish two occurrences of E in production 25
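The synthesized attribute Value can be sketched as a bottom-up evaluation over a small tree. The classes `Term` and `Sum` are assumptions made for illustration, and so is the extra production E → T (with E.Value := T.Value), which is needed to end the recursion but is not shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """A T node; its Value attribute is assumed to be supplied by the scanner."""
    value: int


@dataclass
class Sum:
    """An E_0 node for the production E_0 -> E_1 + T."""
    e1: object  # either another Sum or a Term (assumed production E -> T)
    t: Term


def evaluate(node) -> int:
    """Compute the synthesized attribute Value from the children's attributes."""
    if isinstance(node, Term):
        return node.value
    # E_0.Value := E_1.Value + T.Value
    return evaluate(node.e1) + evaluate(node.t)


tree = Sum(Sum(Term(1), Term(2)), Term(3))  # (1 + 2) + 3
print(evaluate(tree))
```

Every value flows from children to parent, which is what makes the attribute synthesized rather than inherited.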
Attributes § Consider L = { a^n b^n c^n }. § Terminals: a, b, c § n integer ≥ 1 § L cannot be produced by a context-free grammar § We would like to use a context-free grammar (and parser) to recognize L § Idea: Use attributes to deal with aspects parser cannot handle § Attribute domain: Integers § Result predicate: “true” if w = a^k b^k c^k for some k 27
Example (cont’d) § Consider G_19: S → A B C; A → aA | a; B → bB | b; C → cC | c § Start symbol is S § L = { a^n b^n c^n } ⊂ L(G_19) 28
Rules § Attach a rule to each production § Rules for A productions: A_0 → a A_1 with <A_0>.Na := <A_1>.Na + 1; A → a with <A>.Na := 1 § Rules for B, C productions similar § Condition for S → A B C § <A>.Na == <B>.Nb == <C>.Nc 29
Grammar: S → A B C; A → aA | a; B → bB | b; C → cC | c

Productions      Rules
S → A B C        if and only if <A>.Na == <B>.Nb == <C>.Nc
A_0 → a A_1      <A_0>.Na := <A_1>.Na + 1
A → a            <A>.Na := 1
B_0 → b B_1      <B_0>.Nb := <B_1>.Nb + 1
B → b            <B>.Nb := 1
C_0 → c C_1      <C_0>.Nc := <C_1>.Nc + 1
C → c            <C>.Nc := 1
30
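The productions and rules can be sketched in code: the context-free part recognizes a block of a's, then b's, then c's, and the attributes Na, Nb, Nc carry the counts that the final condition compares. The recursive helper below is an assumption chosen for brevity; it is not the table-driven parser used in the slides, and the names `parse_run` and `accepts` are hypothetical.

```python
def parse_run(word, pos, symbol):
    """Parse X -> x X | x starting at pos.

    Returns (count attribute, next position), or None if no x is found.
    """
    if pos >= len(word) or word[pos] != symbol:
        return None
    rest = parse_run(word, pos + 1, symbol)
    if rest is None:
        return 1, pos + 1          # X -> x:        <X>.N := 1
    count, end = rest
    return count + 1, end          # X_0 -> x X_1:  <X_0>.N := <X_1>.N + 1


def accepts(word):
    """S -> A B C, accepted if and only if Na == Nb == Nc."""
    pos = 0
    counts = []
    for symbol in "abc":
        result = parse_run(word, pos, symbol)
        if result is None:
            return False
        count, pos = result
        counts.append(count)
    na, nb, nc = counts
    return pos == len(word) and na == nb == nc


print(accepts("aabbcc"))
print(accepts("aabbc"))
```

The word aabbc is in L(G_19) but fails the attribute condition, which shows how the attributes narrow the context-free language down to L.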
aabbcc

Stack    Input      Action
$        aabbcc$
a$       abbcc$
aa$      bbcc$      A → a; <A>.Na := 1
Aa$      bbcc$      A_0 → a A_1; <A_0>.Na := <A_1>.Na + 1 = 2
A$       bbcc$
bA$      bcc$
bbA$     cc$        B → b; <B>.Nb := 1
BbA$     cc$        B_0 → b B_1; <B_0>.Nb := <B_1>.Nb + 1 = 2
BA$      cc$
32
aabbcc

Stack    Input    Action
BA$      cc$
cBA$     c$
ccBA$    $        C → c; <C>.Nc := 1
CcBA$    $        C_0 → c C_1; <C_0>.Nc := <C_1>.Nc + 1 = 2
CBA$     $        S → A B C; Na == Nb == Nc? True
S$       $        ACCEPT
34
aabbcc – tree view

S (Condition: true)
    A (Na = 2)
        a
        A (Na = 1)
            a
    B (Nb = 2)
        b
        B (Nb = 1)
            b
    C (Nc = 2)
        c
        C (Nc = 1)
            c
35
Question What type of parser (top-down or bottom-up) did we use to parse w (and to implement the checks)? Why? (Hint: Top-of-stack arbitrarily picked to be on the left, that is, position of top-of-stack does not convey any information.) 36
Syntax(-based) analysis § Powerful tool § Easy to get carried away § Once a topic of active research 37
Semantic analysis § Goal: Identify problems early on § Example: float f; int [] iarray; int j; iarray = new int [10]; iarray [f] = j; § Idea: check AST § Either report error § Or modify AST § Example: int j; float f; j = f; // replace with: j = round(f) 38
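Both outcomes of a semantic check (report an error, or modify the AST) can be sketched as follows. The tuple-based expression encoding, the type names as strings, and the function names `check_index` and `coerce_assignment` are all assumptions made for illustration; a real compiler would work on its own AST classes.

```python
def check_index(index_type):
    """Report an error if an array index expression is not of type int."""
    if index_type != "int":
        raise TypeError(f"array index must be int, got {index_type}")


def coerce_assignment(target_type, expr, expr_type):
    """Return the (possibly rewritten) right-hand side of an assignment."""
    if target_type == expr_type:
        return expr
    if target_type == "int" and expr_type == "float":
        # Modify the AST: j = f becomes j = round(f).
        return ("call", "round", expr)
    raise TypeError(f"cannot assign {expr_type} to {target_type}")


# j = f with int j and float f: the right-hand side is wrapped in round(...).
print(coerce_assignment("int", ("var", "f"), "float"))

# iarray[f] with float f: the check reports an error.
try:
    check_index("float")
except TypeError as error:
    print("error:", error)
```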
4.2 Symbol table § Symbol table: Central repository of information about program symbols § Checks must exploit structure of program 39
Symbol table § Many checks require gathering/retrieving information about symbols § Function/method names § Class names § Variable/field names § Function/method types § Class types § Variable/field types 40
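A symbol table that follows the block structure of the program can be sketched as a stack of dictionaries, one per scope. The class and method names below are assumptions made for illustration and do not reflect the course framework.

```python
class SymbolTable:
    """Maps names to information (here: a type name), one dictionary per scope."""

    def __init__(self):
        self.scopes = [{}]  # the outermost scope is always present

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} already declared in this scope")
        scope[name] = info

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None  # the name has not been declared


table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
print(table.lookup("x"))  # found in the enclosing scope
print(table.lookup("y"))  # never declared
table.exit_scope()
```

Searching the scopes from the inside out is one way in which a check can exploit the structure of the program.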