CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall
Phases of a compiler: syntactic structure (Figure 1.6, page 5 of text)
TOOLS Lexical analysis: LEX/FLEX (regex -> lexer) Syntactic analysis: YACC/BISON (grammar -> parser)
Top-down & bottom-up
A top-down parser builds a parse tree from the root to the leaves; it is easier to construct by hand.
A bottom-up parser builds a parse tree from the leaves to the root; it handles a larger class of grammars, and tools (yacc/bison) build bottom-up parsers.
Our presentation: first top-down, then bottom-up. Present top-down parsing first, introducing the necessary vocabulary and data structures, then move on to bottom-up parsing.
vocab: look-ahead
The current symbol being scanned in the input is called the lookahead symbol.
[Figure: a stream of tokens feeding the PARSER.]
Top-down parsing
Top-down parsing
Start from the grammar's start symbol.
Build the parse tree so that its yield matches the input.
Predictive parsing: a simple form of recursive descent parsing.
FIRST( 𝛽 )
If 𝛽 ∈ (N ∪ T)* then FIRST( 𝛽 ) is "the set of terminals that appear as the first symbols of one or more strings of terminals generated from 𝛽." [p. 64]
Ex: If A -> a 𝛾 then FIRST(A) = {a}
Ex: If A -> a 𝛾 | B then FIRST(A) = {a} ∪ FIRST(B)
FIRST( 𝛽 )
FIRST sets are considered when there are two (or more) productions available to expand A ∈ N:
A -> 𝛽 | 𝛾
Predictive parsing requires that FIRST( 𝛽 ) ∩ FIRST( 𝛾 ) = ∅, so that the lookahead symbol alone determines which production to use.
ε productions
If the lookahead symbol is not in the FIRST set of any other alternative, use the ε production: do not advance the lookahead symbol, but instead "discard" the non-terminal.
optexpr -> expr | ε
"While parsing optexpr, if the lookahead symbol is not in FIRST(expr), then the ε production is used" [p. 66]
Left recursion Grammars with left recursion are problematic for top-down parsers, as they lead to infinite regress.
Left recursion example
Grammar:
expr -> expr + term | term
term -> id
FIRST sets for the alternatives of the expr rule are not disjoint:
FIRST(expr) = { id }
FIRST(term) = { id }
[Figure: parse tree in which expr expands repeatedly to expr + term down the left side, ending in term.]
Left recursion example (annotated)
In expr -> expr + term | term, label the part after the recursive expr as 𝛽 = + term and the non-recursive alternative as 𝛾 = term.
The left-recursive parse tree then has yield 𝛾 𝛽 𝛽 𝛽.
Rewriting grammar to remove left recursion
The expr rule is of the form
A -> A 𝛽 | 𝛾
Rewrite as two rules:
A -> 𝛾 R
R -> 𝛽 R | ε
Back to example
Grammar is rewritten as
expr -> term R
R -> + term R | ε
[Figure: right-leaning parse tree: expr expands to term R, R expands to + term R three times and finally to ε; the yield is 𝛾 𝛽 𝛽 𝛽 ε.]
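The rewrite of A -> A 𝛽 | 𝛾 can be sketched in Python. This is a minimal illustration under stated assumptions, not a general algorithm: it handles only immediate left recursion in a single non-terminal, represents each production body as a list of symbols (the empty list stands for ε), and the names remove_left_recursion and new_name are hypothetical.

```python
def remove_left_recursion(head, alternatives, new_name=None):
    """Rewrite  A -> A β | γ  as  A -> γ R  and  R -> β R | ε."""
    rest = new_name or head + "_R"   # hypothetical default name for R
    # β parts: alternatives that start with the head itself, minus that head
    betas = [alt[1:] for alt in alternatives if alt[:1] == [head]]
    # γ parts: alternatives that do not start with the head
    gammas = [alt for alt in alternatives if alt[:1] != [head]]
    if not betas:
        return {head: alternatives}  # no immediate left recursion
    return {
        head: [gamma + [rest] for gamma in gammas],
        rest: [beta + [rest] for beta in betas] + [[]],  # [] is the ε production
    }


grammar = remove_left_recursion(
    "expr", [["expr", "+", "term"], ["term"]], new_name="R"
)
for lhs, alts in grammar.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

For the expr rule this yields expr -> term R and R -> + term R | ε, matching the rewritten grammar on the slide.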
Ambiguity
A grammar G is ambiguous if ∃ 𝛕 ∈ L(G) that has two or more distinct parse trees.
Example (dangling 'else'):
if <expr> then if <expr> then <stmt> else <stmt>
has two readings:
if <expr> then { if <expr> then <stmt> } else <stmt>
if <expr> then { if <expr> then <stmt> else <stmt> }
Dangling else resolution
Usually resolved so that each else matches the closest unmatched if-then.
We can rewrite the grammar to force this interpretation (ms = matched statement, os = open statement):
<stmt> -> <ms> | <os>
<ms> -> if <expr> then <ms> else <ms> | …
<os> -> if <expr> then <stmt> | if <expr> then <ms> else <os>
Left factoring
If two (or more) rules share a prefix, then their FIRST sets do not distinguish between the rule alternatives.
If there is a choice point later in the rule, rewrite the rule by factoring out the common prefix.
Example: rewrite
A -> 𝛽 𝛾₁ | 𝛽 𝛾₂
as
A -> 𝛽 A'
A' -> 𝛾₁ | 𝛾₂
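A small Python sketch of this factoring step follows. It makes simplifying assumptions that are not from the slides: it only factors a prefix shared by all alternatives of one non-terminal, bodies are lists of symbols with the empty list standing for ε, and the name left_factor is hypothetical. The if-then-else rule used as input is an illustrative choice.

```python
def left_factor(head, alternatives):
    """Rewrite  A -> β γ1 | β γ2  as  A -> β A'  and  A' -> γ1 | γ2."""
    prefix = []
    for column in zip(*alternatives):   # walk all alternatives in lockstep
        if len(set(column)) != 1:       # symbols differ: the shared prefix ends
            break
        prefix.append(column[0])
    if not prefix or len(alternatives) < 2:
        return {head: alternatives}     # nothing to factor
    new_head = head + "'"
    return {
        head: [prefix + [new_head]],
        # what remains of each alternative after the prefix; [] means ε
        new_head: [alt[len(prefix):] for alt in alternatives],
    }


factored = left_factor(
    "stmt",
    [
        ["if", "expr", "then", "stmt", "else", "stmt"],
        ["if", "expr", "then", "stmt"],
    ],
)
for lhs, alts in factored.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

After factoring, the choice between the two alternatives is postponed until the parser has seen the shared prefix and can look at the next symbol.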
Predictive parsing: a special case of recursive-descent parsing that does not require backtracking.
Each non-terminal A ∈ N has an associated procedure:
void A() {
    choose an A-production A -> X1 X2 … Xk
    for (i = 1 to k) {
        if (Xi ∈ N) {
            call Xi()
        }
        else if (Xi = current input symbol) {
            advance input to next symbol
        }
        else error
    }
}
Predictive parsing: note on the "choose an A-production" step
There is non-determinism in the choice of production. If the "wrong" choice is made, the parser will need to revisit its choice by backtracking.
A predictive parser can always make the correct choice here.
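As an illustration of one procedure per non-terminal, here is a hedged Python sketch of a predictive recursive-descent recognizer for the rewritten grammar expr -> term R, R -> + term R | ε, term -> id. The class name Parser, the method name rest (for R), the helper accepts, and the use of "$" as an end marker appended to the token list are assumptions made for this example.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Consume the lookahead if it is the expected terminal, else report an error.
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, saw {self.lookahead!r}")
        self.pos += 1

    def expr(self):               # expr -> term R
        self.term()
        self.rest()

    def rest(self):               # R -> + term R | ε
        if self.lookahead == "+": # "+" is the only terminal in FIRST(+ term R)
            self.match("+")
            self.term()
            self.rest()
        # otherwise take the ε production: consume no input

    def term(self):               # term -> id
        self.match("id")


def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.expr()
        parser.match("$")         # the whole input must have been consumed
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id"]))
print(accepts(["id", "+"]))
```

No backtracking is needed: in rest, the lookahead alone decides between the two alternatives of R.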
FIRST(X)
If X ∈ T then FIRST(X) = { X }.
If X ∈ N and X -> Y₁ Y₂ … Yₖ ∈ P for k ≥ 1, then add a ∈ T to FIRST(X) if ∃ i s.t. a ∈ FIRST(Yᵢ) and ε ∈ FIRST(Yⱼ) ∀ j < i (i.e. Y₁ Y₂ … Yᵢ₋₁ ⇒* ε).
If ε ∈ FIRST(Yⱼ) ∀ j ≤ k (i.e. Y₁ Y₂ … Yₖ ⇒* ε), add ε to FIRST(X).
If X -> ε ∈ P, add ε to FIRST(X).
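The FIRST rules can be applied repeatedly until no set grows. The Python sketch below does this for the expression grammar used in the example later in these slides. The representation (a dict from non-terminal to a list of bodies, with the empty list standing for ε) and the names first_of_string and compute_first are assumptions made for this illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        # a symbol without an entry is a terminal, and FIRST(a) = {a}
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # this symbol cannot vanish: stop here
    result.add(EPS)                # every symbol can vanish (or the body is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                 # iterate until a fixed point is reached
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


first = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(first[nt]))
```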
FOLLOW(X)
Place $ in FOLLOW(S), where S is the start symbol ($ is an end marker).
If A -> 𝛽 B 𝛾 ∈ P, then everything in FIRST( 𝛾 ) - { ε } is in FOLLOW(B).
If A -> 𝛽 B ∈ P, or A -> 𝛽 B 𝛾 ∈ P where ε ∈ FIRST( 𝛾 ), then everything in FOLLOW(A) is in FOLLOW(B).
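The three FOLLOW rules can likewise be iterated to a fixed point. In this Python sketch the FIRST sets are copied from the FIRST SETS slide rather than recomputed, so the block stays self-contained. The grammar encoding and the names first_of_string and compute_follow are assumptions for illustration.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {  # copied from the FIRST SETS slide
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}                    # empty or fully nullable string


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                   # rule 1: $ is in FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                for i, sym in enumerate(body):
                    if sym not in grammar:   # FOLLOW is defined for non-terminals only
                        continue
                    tail_first = first_of_string(body[i + 1:])
                    new = tail_first - {EPS}         # rule 2
                    if EPS in tail_first:            # rule 3 (also covers an empty tail)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(follow[nt]))
```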
Table-driven predictive parsing
Algorithm 4.32 (p. 224)
INPUT: Grammar G = (N,T,P,S)
OUTPUT: Parsing table M
For each production A -> 𝛽 of G:
1. For each terminal a ∈ FIRST( 𝛽 ), add A -> 𝛽 to M[A,a]
2. If ε ∈ FIRST( 𝛽 ), then for each terminal b in FOLLOW(A), add A -> 𝛽 to M[A,b]
3. If ε ∈ FIRST( 𝛽 ) and $ ∈ FOLLOW(A), add A -> 𝛽 to M[A,$]
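The table construction can be sketched in Python as follows. The FIRST and FOLLOW sets are copied from the example slides instead of being computed here. The dict-based table keyed by (non-terminal, terminal) and the names first_of_string and build_table are assumptions for illustration. Because $ is stored as an ordinary member of FOLLOW in this encoding, steps 2 and 3 collapse into a single set union.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {   # copied from the FIRST SETS slide
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {  # copied from the FOLLOW SETS slide
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table(grammar):
    table = {}
    for head, alternatives in grammar.items():
        for body in alternatives:
            body_first = first_of_string(body)
            targets = body_first - {EPS}        # step 1
            if EPS in body_first:               # steps 2 and 3
                targets |= FOLLOW[head]
            for terminal in targets:
                if (head, terminal) in table:   # two productions in one cell
                    raise ValueError(f"conflict at M[{head}, {terminal}]")
                table[head, terminal] = body
    return table


table = build_table(GRAMMAR)
for (head, terminal), body in sorted(table.items()):
    print(f"M[{head}, {terminal}] = {head} -> {' '.join(body) or EPS}")
```

A cell that would receive two productions means the lookahead cannot pick a unique production, so the sketch raises an error rather than overwriting the entry.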
Example
G given by its productions:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FIRST SETS
Grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FIRST(F) = { ( , id }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(E') = { + , ε }
FIRST(T') = { * , ε }
FOLLOW SETS
Grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FOLLOW(E) = { ) , $ }
FOLLOW(E') = FOLLOW(E) = { ) , $ }
FOLLOW(T) = { + , ) , $ }
FOLLOW(T') = FOLLOW(T) = { + , ) , $ }
FOLLOW(F) = { + , * , ) , $ }
Parse-table M
NON-TERMINAL | id        | +             | *             | (           | )       | $
-------------+-----------+---------------+---------------+-------------+---------+--------
E            | E -> T E' |               |               | E -> T E'   |         |
E'           |           | E' -> + T E'  |               |             | E' -> ε | E' -> ε
T            | T -> F T' |               |               | T -> F T'   |         |
T'           |           | T' -> ε       | T' -> * F T'  |             | T' -> ε | T' -> ε
F            | F -> id   |               |               | F -> ( E )  |         |
Algorithm 4.34 [p. 226]
INPUT: A string w and a parsing table M for a grammar G = (N,T,P,S).
OUTPUT: If w ∈ L(G), a leftmost derivation of w; an error indication otherwise.
[Figure: the parser reads the input buffer w $, keeps a stack that initially holds S above $, consults the table M, and emits the output.]
Algorithm 4.34 [p. 226]
Let a be the first symbol of w
Let X be the top stack symbol
while (X ≠ $) {
    if (X == a) { pop the stack; let a be the next symbol of w }
    else if (X is a terminal) { error }
    else if (M[X,a] is blank) { error }
    else if (M[X,a] is X -> Y1 Y2 … Yk) {
        output X -> Y1 Y2 … Yk
        pop the stack
        push Yk … Y2 Y1 onto the stack (Y1 on top)
    }
    Let X be the top stack symbol
}
Accept if a == X == $
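The driver loop can be sketched in Python. The table below is transcribed by hand from the Parse-table M slide. The names parse, TABLE and NONTERMINALS, the list-based stack, and the use of SyntaxError for the error cases are assumptions for illustration.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {  # transcribed from the parse table; [] is the ε production
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]      # input followed by the end marker
    pos = 0
    stack = ["$", start]               # the top of the stack is the end of the list
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                   # terminal on the stack matches the lookahead
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            raise SyntaxError(f"expected {top!r}, saw {a!r}")
        elif (top, a) not in TABLE:
            raise SyntaxError(f"no entry M[{top}, {a}]")
        else:
            body = TABLE[top, a]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body))   # push Yk ... Y1 so that Y1 ends up on top
    if tokens[pos] != "$":             # accept only if a == X == $
        raise SyntaxError(f"unexpected {tokens[pos]!r} after end of expression")
    return output


for head, body in parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The printed productions, read top to bottom, are the steps of a leftmost derivation of the input.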