CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall
Phases of a compiler: syntactic structure (Figure 1.6, page 5 of text)
TOOLS
Lexical analysis: LEX/FLEX (regex -> lexer)
Syntactic analysis: YACC/BISON (grammar -> parser)
Top-down & bottom-up
A top-down parser builds a parse tree from the root to the leaves; it is easier to construct by hand.
A bottom-up parser builds a parse tree from the leaves to the root; it handles a larger class of grammars, and tools (yacc/bison) build bottom-up parsers.
Our presentation: first top-down, then bottom-up
Present top-down parsing first, introducing the necessary vocabulary and data structures, then move on to bottom-up parsing.
vocab: look-ahead
The current symbol being scanned in the input is called the lookahead symbol.
[Figure: the PARSER reading a stream of tokens]
Top-down parsing
Top-down parsing
Start from the grammar's start symbol.
Build the parse tree so that its yield matches the input.
Predictive parsing: a simple form of recursive-descent parsing.
FIRST(β)
If β ∈ (N ∪ T)* then FIRST(β) is "the set of terminals that appear as the first symbols of one or more strings of terminals generated from β." [p. 64]
Ex: If A -> a γ then FIRST(A) = {a}
Ex: If A -> a γ | B then FIRST(A) = {a} ∪ FIRST(B)
FIRST(β)
FIRST sets are considered when there are two (or more) productions to expand A ∈ N: A -> β | γ
Predictive parsing requires that FIRST(β) ∩ FIRST(γ) = ∅
ε productions
If the lookahead symbol does not match the FIRST set, use the ε production: do not advance the lookahead symbol, but instead "discard" the non-terminal:
optexpr -> expr | ε
"While parsing optexpr, if the lookahead symbol is not in FIRST(expr), then the ε production is used" [p. 66]
Left recursion Grammars with left recursion are problematic for top-down parsers, as they lead to infinite regress.
Left recursion example
Grammar:
expr -> expr + term | term
term -> id
FIRST sets for the rule alternatives are not disjoint:
FIRST(expr) = id
FIRST(term) = id
[Figure: left-leaning parse tree in which expr expands repeatedly to expr + term, ending in term]
Left recursion example (annotated)
In expr -> expr + term | term, the slide labels "+ term" as β and the non-recursive alternative "term" as γ.
[Figure: the same left-leaning parse tree, with its yield labelled γ β β β]
Rewriting grammar to remove left recursion
The expr rule is of the form A -> A β | γ
Rewrite as two rules:
A -> γ R
R -> β R | ε
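The rewrite above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: productions are represented as lists of symbols, the empty list stands for ε, and the function name and the default "_R" suffix are hypothetical choices, not something from the text. It handles immediate left recursion only.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives, new_name=None):
    """Rewrite A -> A b | g as A -> g R, R -> b R | epsilon.

    alternatives: list of right-hand sides, each a list of symbols.
    An empty list [] stands for the epsilon production.
    """
    new_name = new_name or nonterminal + "_R"  # hypothetical naming scheme
    # The "beta" parts: what follows A in each left-recursive alternative.
    betas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The "gamma" parts: alternatives that do not start with A.
    gammas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not betas:
        return {nonterminal: alternatives}  # nothing to rewrite
    return {
        nonterminal: [gamma + [new_name] for gamma in gammas],
        new_name: [beta + [new_name] for beta in betas] + [[]],
    }


rewritten = eliminate_immediate_left_recursion(
    "expr", [["expr", "+", "term"], ["term"]], new_name="R"
)
print(rewritten)
```

Applied to expr -> expr + term | term, the sketch yields expr -> term R and R -> + term R | ε.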
Back to example
Grammar is rewritten as:
expr -> term R
R -> + term R | ε
[Figure: right-leaning parse tree for expr -> term R, with R expanding repeatedly to + term R and finally to ε; yield labelled γ β β β ε]
Ambiguity
A grammar G is ambiguous if ∃ ω ∈ L(G) that has two or more distinct parse trees.
Example - dangling 'else':
if <expr> then if <expr> then <stmt> else <stmt>
can be read as either of:
if <expr> then { if <expr> then <stmt> } else <stmt>
if <expr> then { if <expr> then <stmt> else <stmt> }
dangling else resolution
Usually resolved so that each else matches the closest if-then.
We can rewrite the grammar to force this interpretation (ms = matched statement, os = open statement):
<stmt> -> <ms> | <os>
<ms> -> if <expr> then <ms> else <ms> | …
<os> -> if <expr> then <stmt> | if <expr> then <ms> else <os>
Left factoring
If two (or more) rules share a prefix then their FIRST sets do not distinguish between the rule alternatives.
If there is a choice point later in the rule, rewrite the rule by factoring out the common prefix.
Example: rewrite
A -> β γ1 | β γ2
as
A -> β A'
A' -> γ1 | γ2
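A minimal Python sketch of the factoring step, assuming productions are lists of symbols and [] stands for ε. It is a simplification: it factors only a prefix shared by all alternatives of one non-terminal, whereas a full left-factoring pass would group alternatives by prefix and repeat. The function names and the "'" suffix for the new non-terminal are illustrative choices.

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared at the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nonterminal, alternatives):
    """Rewrite A -> b g1 | b g2 as A -> b A', A' -> g1 | g2."""
    prefix = common_prefix(alternatives)
    if len(alternatives) < 2 or not prefix:
        return {nonterminal: alternatives}  # nothing to factor
    new_name = nonterminal + "'"
    suffixes = [alt[len(prefix):] for alt in alternatives]
    return {nonterminal: [prefix + [new_name]], new_name: suffixes}


factored = left_factor(
    "stmt",
    [["if", "expr", "then", "stmt", "else", "stmt"],
     ["if", "expr", "then", "stmt"]],
)
print(factored)
```

In this example the second alternative is itself the common prefix, so its suffix is empty and the new non-terminal gets an ε alternative: stmt' -> else stmt | ε.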
Predictive parsing: a special case of recursive-descent parsing that does not require backtracking
Each non-terminal A ∈ N has an associated procedure:
void A() {
    choose an A-production A -> X1 X2 … Xk
    for (i = 1 to k) {
        if (Xi ∈ N) {
            call Xi()
        }
        else if (Xi = current input symbol) {
            advance input to next symbol
        }
        else error
    }
}
Predictive parsing: choosing a production
In the procedure for A, there is non-determinism in the step "choose an A-production". If the "wrong" choice is made, the parser will need to revisit its choice by backtracking. A predictive parser can always make the correct choice here.
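As an illustration of one procedure per non-terminal, here is a small Python sketch of a predictive parser for the rewritten grammar expr -> term R, R -> + term R | ε, term -> id. The class and method names, the list-of-strings token format, and the "$" end marker appended to the input are assumptions of this sketch. The choice of production in R is made by looking only at the lookahead symbol, so no backtracking is needed.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Advance the input if the lookahead is the expected terminal."""
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, saw {self.lookahead!r}")
        self.pos += 1

    def expr(self):  # expr -> term R
        self.term()
        self.R()

    def R(self):  # R -> + term R | epsilon
        if self.lookahead == "+":  # "+" is in FIRST(+ term R)
            self.match("+")
            self.term()
            self.R()
        # Otherwise use the epsilon production: consume nothing.

    def term(self):  # term -> id
        self.match("id")

    def parse(self):
        self.expr()
        self.match("$")  # the whole input must have been consumed
        return True


def accepts(tokens):
    try:
        return Parser(tokens).parse()
    except ParseError:
        return False


print(accepts(["id", "+", "id"]))
print(accepts(["id", "+"]))
```

The first call accepts the input; the second is rejected because term() finds the end marker where an id is required.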
FIRST(X)
if X ∈ T then FIRST(X) = { X }
if X ∈ N and X -> Y1 Y2 … Yk ∈ P for k ≥ 1, then
    add a ∈ T to FIRST(X) if ∃ i s.t. a ∈ FIRST(Yi) and ε ∈ FIRST(Yj) ∀ j < i (i.e. Y1 Y2 … Yi-1 ⇒* ε)
    if ε ∈ FIRST(Yj) ∀ j ≤ k, add ε to FIRST(X)
if X -> ε ∈ P, then add ε to FIRST(X)
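These rules can be applied repeatedly until no FIRST set changes. The Python sketch below does that for the expression grammar used in the example slides; the dictionary representation (lists of symbols, [] for ε), the function names, and the fixed-point loop are assumptions of this sketch rather than the text's own formulation.

```python
EPS = "ε"

# Expression grammar from the example slides; [] is the epsilon production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first, grammar):
    """FIRST of a sequence Y1 ... Yk, given the current FIRST sets."""
    result = set()
    for sym in symbols:
        # A terminal's FIRST set is the terminal itself.
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol can derive epsilon (or the sequence is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt, symbols in FIRST.items():
    print(nt, sorted(symbols))
```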
FOLLOW(X)
Place $ in FOLLOW(S), where S is the start symbol ($ is an end marker)
if A -> β B γ ∈ P, then FIRST(γ) - { ε } is in FOLLOW(B)
if A -> β B ∈ P, or A -> β B γ ∈ P where ε ∈ FIRST(γ), then everything in FOLLOW(A) is in FOLLOW(B)
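A Python sketch of the FOLLOW rules, again iterating until nothing changes. To keep it short, the FIRST sets of the non-terminals of the example expression grammar are taken as given rather than computed; the data representation and function names are assumptions of this sketch.

```python
EPS, END = "ε", "$"

# Expression grammar from the example slides; [] is the epsilon production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets of the non-terminals, taken as given for this sketch.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # $ goes into FOLLOW of the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is defined for non-terminals only
                    rest = first_of_string(alt[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:  # what follows can vanish (or is empty)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
for nt, symbols in FOLLOW.items():
    print(nt, sorted(symbols))
```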
Table-driven predictive parsing
Algorithm 4.32 (p. 224)
INPUT: Grammar G = (N,T,P,S)
OUTPUT: Parsing table M
For each production A -> β of G:
1. For each terminal a ∈ FIRST(β), add A -> β to M[A,a]
2. If ε ∈ FIRST(β), then for each terminal b in FOLLOW(A), add A -> β to M[A,b]
3. If ε ∈ FIRST(β) and $ ∈ FOLLOW(A), add A -> β to M[A,$]
Example
G given by its productions:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FIRST SETS
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FIRST(F) = { ( , id }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(E') = { + , ε }
FIRST(T') = { * , ε }
FOLLOW SETS
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FOLLOW(E) = { ) , $ }
FOLLOW(E') = FOLLOW(E) = { ) , $ }
FOLLOW(T) = { + , ) , $ }
FOLLOW(T') = FOLLOW(T) = { + , ) , $ }
FOLLOW(F) = { + , * , ) , $ }
Parse-table M
NON-TERMINAL | id        | +            | *            | (          | )        | $
E            | E -> T E' |              |              | E -> T E'  |          |
E'           |           | E' -> + T E' |              |            | E' -> ε  | E' -> ε
T            | T -> F T' |              |              | T -> F T'  |          |
T'           |           | T' -> ε      | T' -> * F T' |            | T' -> ε  | T' -> ε
F            | F -> id   |              |              | F -> ( E ) |          |
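The table can be reproduced mechanically from the FIRST and FOLLOW sets. The Python sketch below hard-codes those sets for the example grammar and fills M as a dictionary keyed by (non-terminal, terminal); that representation, the function names, and the conflict check are assumptions of this sketch. Because $ is stored in the FOLLOW sets like any other symbol, steps 2 and 3 of the algorithm collapse into one line.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] is the epsilon production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a production body (a sequence of grammar symbols)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    table = {}
    for head, alternatives in grammar.items():
        for body in alternatives:
            body_first = first_of_string(body)
            columns = body_first - {EPS}          # step 1
            if EPS in body_first:
                columns |= FOLLOW[head]           # steps 2 and 3 ($ included)
            for terminal in columns:
                if (head, terminal) in table:     # two productions in one cell
                    raise ValueError(f"conflict at M[{head}, {terminal}]")
                table[(head, terminal)] = body
    return table


M = build_table(GRAMMAR)
for (head, terminal), body in sorted(M.items()):
    print(f"M[{head}, {terminal}] = {head} -> {' '.join(body) or EPS}")
```

A cell that would receive two productions raises an error in this sketch; for the example grammar no conflict occurs and 13 cells are filled.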
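Algorithm 4.34 [p. 226]
INPUT: A string ω and a parsing table M for a grammar G = (N,T,P,S).
OUTPUT: If ω ∈ L(G), a leftmost derivation of ω; an error otherwise.
[Figure: the parser reads the input buffer ω$, uses a stack holding S on top of $, consults the table M, and produces the output]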
Algorithm 4.34 [p. 226]
Let a be the first symbol of ω
Let X be the top stack symbol
while (X ≠ $) {
    if (X == a) { pop the stack, advance a in ω }
    else if (X is a terminal) { error }
    else if (M[X,a] is blank) { error }
    else if (M[X,a] is X -> Y1 Y2 … Yk) {
        output X -> Y1 Y2 … Yk
        pop the stack
        push Yk … Y2 Y1 onto the stack
    }
    Let X be the top stack symbol
}
Accept if a == X == $
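The loop above translates almost line for line into Python. The sketch below hard-codes the parse table for the example expression grammar as a dictionary and represents the stack as a list whose last element is the top; both choices, the function name, and the use of ValueError for errors are assumptions of this sketch.

```python
# Parse table for the example grammar; [] is the epsilon production.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise ValueError."""
    tokens = list(tokens) + ["$"]   # append the end marker
    pos = 0
    stack = ["$", start]            # top of the stack is the end of the list
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                # terminal on the stack matches the input
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            raise ValueError(f"expected {top!r}, saw {a!r}")
        elif (top, a) not in M:     # blank table entry
            raise ValueError(f"no entry for M[{top}, {a}]")
        else:
            body = M[(top, a)]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body))  # push Yk ... Y1, so Y1 ends up on top
    if tokens[pos] != "$":
        raise ValueError(f"unexpected {tokens[pos]!r} after end of expression")
    return output


derivation = parse(["id", "+", "id", "*", "id"])
for head, body in derivation:
    print(head, "->", " ".join(body) or "ε")
```

The printed productions, read top to bottom, form a leftmost derivation of id + id * id.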