Defining syntax using CFGs
Roadmap
Last time
– Defined context-free grammars
This time
– CFGs for specifying a language’s syntax
  • Language membership
  • List grammars
  • Resolving ambiguity
CFG Review
• G = (N, Σ, P, S)
• ⇒⁺ means “derives in 1 or more steps”
• A CFG generates a string by applying productions until no nonterminals remain
Example: Nested parens
  N = { Q }
  Σ = { ( , ) }
  P = Q → ( Q )
        | ε
  S = Q
Formal Definition of a CFG’s Language
Let G = (N, Σ, P, S) be a CFG. Then
  L(G) = { w | S ⇒⁺ w }
where S is the start nonterminal of G, and w is a sequence that consists of (only) terminal symbols, or is the empty string ε
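As a rough illustration of the definition of L(G), the sketch below enumerates the short strings in the language of the nested-parens grammar from the review slide. The function name `language_up_to` and the length cutoff are assumptions made for this example. The search stops growing a sentential form once it holds more than `max_len` terminals, which is enough for this grammar but is not a general-purpose enumeration method.

```python
from collections import deque

# Grammar from the review slide: Q -> ( Q ) | ε
PRODUCTIONS = {"Q": [["(", "Q", ")"], []]}
START = "Q"


def language_up_to(max_len):
    """Return every terminal string of length <= max_len derivable from START."""
    results = set()
    seen = {(START,)}
    queue = deque([(START,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal in the sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in PRODUCTIONS), None)
        if idx is None:
            # No nonterminals remain, so this form is a string in L(G).
            results.add("".join(form))
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminal_count = sum(1 for sym in new_form if sym not in PRODUCTIONS)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)


print(language_up_to(4))  # ['', '()', '(())']
```

The empty string appears in the output because Q → ε is a production, matching the "or ε" clause in the definition.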
A CFG Defines a Language
CFG productions define the syntax of a language
  1. Prog  → begin Stmts end
  2. Stmts → Stmts semicolon Stmt
  3.       | Stmt
  4. Stmt  → id assign Expr
  5. Expr  → id
  6.       | Expr plus id
We call this notation “BNF” (for “Backus-Naur Form”) or “extended BNF”
HTTP grammar using BNF:
– http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html
List Grammars
• Useful to repeat a structure arbitrarily often
  Stmts → Stmts semicolon Stmt
        | Stmt
• List skews left
[Figure: parse tree in which each Stmts node expands to Stmts ; Stmt, so the tree grows down its left side]
List Grammars
• Useful to repeat a structure arbitrarily often
  Stmts → Stmt semicolon Stmts
        | Stmt
• List skews right
[Figure: parse tree in which each Stmts node expands to Stmt ; Stmts, so the tree grows down its right side]
List Grammars
• What if we allowed both “skews”?
  Stmts → Stmts semicolon Stmts
        | Stmt
[Figure: parse tree in which Stmts nodes expand to Stmts ; Stmts on both the left and the right]
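The three list grammars can be compared with a small sketch. Parse trees are modeled here as nested tuples, and the helper names (`left_skewed`, `right_skewed`, `count_both_skews`) are invented for this illustration. The count for the "both skews" grammar follows from choosing which semicolon sits at the root and multiplying the counts for the two sides.

```python
from functools import lru_cache


def left_skewed(stmts):
    """Tree shape for Stmts -> Stmts ; Stmt | Stmt (nesting grows to the left)."""
    tree = stmts[0]
    for stmt in stmts[1:]:
        tree = (tree, ";", stmt)
    return tree


def right_skewed(stmts):
    """Tree shape for Stmts -> Stmt ; Stmts | Stmt (nesting grows to the right)."""
    tree = stmts[-1]
    for stmt in reversed(stmts[:-1]):
        tree = (stmt, ";", tree)
    return tree


@lru_cache(maxsize=None)
def count_both_skews(n):
    """Number of parse trees for n statements under Stmts -> Stmts ; Stmts | Stmt."""
    if n == 1:
        return 1
    # Pick the root split: k statements on the left, n - k on the right.
    return sum(count_both_skews(k) * count_both_skews(n - k) for k in range(1, n))


print(left_skewed(["s1", "s2", "s3"]))   # (('s1', ';', 's2'), ';', 's3')
print(right_skewed(["s1", "s2", "s3"]))  # ('s1', ';', ('s2', ';', 's3'))
print(count_both_skews(3))               # 2
```

With the left-only or right-only grammar there is exactly one tree per list, while the grammar that allows both skews already admits two trees for a three-statement list.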
Derivation Order
• Leftmost Derivation: always expand the leftmost nonterminal
• Rightmost Derivation: always expand the rightmost nonterminal
  1. Prog  → begin Stmts end
  2. Stmts → Stmts semicolon Stmt
  3.       | Stmt
  4. Stmt  → id assign Expr
  5. Expr  → id
  6.       | Expr plus id
[Figure: partial parse tree with Prog → begin Stmts end and Stmts → Stmts semicolon Stmt. In the frontier Stmts semicolon Stmt, a leftmost derivation expands Stmts next and a rightmost derivation expands Stmt next]
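To make the two derivation orders concrete, the sketch below reads both derivations off one parse tree for a two-statement program in the grammar above. The tree encoding (a `(nonterminal, children)` tuple with terminal strings as leaves) and the function names are assumptions chosen for this example.

```python
def render(form):
    """Show a sentential form: nonterminal nodes by name, terminals as-is."""
    return " ".join(x[0] if isinstance(x, tuple) else x for x in form)


def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost (or rightmost) derivation of a tree."""
    form = [tree]
    steps = [render(form)]
    while any(isinstance(x, tuple) for x in form):
        positions = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        i = positions[0] if leftmost else positions[-1]
        # Apply the production recorded at this node: replace it by its children.
        form[i:i + 1] = form[i][1]
        steps.append(render(form))
    return steps


# Parse tree for: begin id assign id semicolon id assign id end
stmt = ("Stmt", ["id", "assign", ("Expr", ["id"])])
tree = ("Prog", ["begin", ("Stmts", [("Stmts", [stmt]), "semicolon", stmt]), "end"])

for step in derivation(tree, leftmost=True):
    print(step)
```

Both orders start from Prog and end at the same string; they differ only in which nonterminal is expanded at each intermediate step.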
Ambiguity
Even with a fixed derivation order, it is possible to derive the same string in multiple ways
For grammar G and string w
– G is ambiguous if some w has any of the following (the three conditions are equivalent)
  • > 1 leftmost derivation of w
  • > 1 rightmost derivation of w
  • > 1 parse tree for w
Exercise • Give a grammar G and a word w that has more than 1 left-most derivation in G
Example: Ambiguous Grammars
  Expr → intlit
       | Expr minus Expr
       | Expr times Expr
       | lparen Expr rparen
Derive the string 4 - 7 * 3 (assume tokenization)
Parse Tree 1: Expr → Expr times Expr, where the left Expr → Expr minus Expr (intlit 4, intlit 7) and the right Expr → intlit 3
Parse Tree 2: Expr → Expr minus Expr, where the left Expr → intlit 4 and the right Expr → Expr times Expr (intlit 7, intlit 3)
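One way to check the ambiguity mechanically is to count the parse trees of a token sequence. The sketch below does this for the four-production Expr grammar by trying every operator as the root of a span. The function name `count_trees` and the token spellings are assumptions for this illustration, and the counting approach is a simple memoized recursion rather than any particular parsing algorithm from the course.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees for
    Expr -> intlit | Expr minus Expr | Expr times Expr | lparen Expr rparen."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def expr(i, j):
        """Number of ways Expr derives tokens[i:j]."""
        if j - i < 1:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "intlit" else 0
        total = 0
        # Expr -> lparen Expr rparen
        if tokens[i] == "lparen" and tokens[j - 1] == "rparen":
            total += expr(i + 1, j - 1)
        # Expr -> Expr minus Expr | Expr times Expr, with the operator at position k
        for k in range(i + 1, j - 1):
            if tokens[k] in ("minus", "times"):
                total += expr(i, k) * expr(k + 1, j)
        return total

    return expr(0, len(tokens))


# 4 - 7 * 3 after tokenization
print(count_trees(["intlit", "minus", "intlit", "times", "intlit"]))  # 2
```

Adding parentheses, as in ( 4 - 7 ) * 3, brings the count down to one, since the parenthesized group can only be derived through lparen Expr rparen.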
Why is Ambiguity Bad?
Eventually, we’ll be using CFGs as the basis for our parser
– Parsing is much easier when there is no ambiguity in the grammar
– The parse tree may mismatch user understanding!
Operator precedence: 4 - 7 * 3
[Figure: the two parse trees for 4 - 7 * 3, one with times at the root and one with minus at the root]
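The mismatch with user understanding shows up as soon as the trees are evaluated. The sketch below encodes the two parse trees for 4 - 7 * 3 as nested tuples (an encoding assumed for this example) and computes the value each one implies.

```python
def evaluate(tree):
    """Evaluate a parse tree given as an int leaf or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    if op == "minus":
        return evaluate(left) - evaluate(right)
    if op == "times":
        return evaluate(left) * evaluate(right)
    raise ValueError(f"unknown operator: {op}")


tree_times_at_root = ((4, "minus", 7), "times", 3)  # groups as (4 - 7) * 3
tree_minus_at_root = (4, "minus", (7, "times", 3))  # groups as 4 - (7 * 3)

print(evaluate(tree_times_at_root))  # -9
print(evaluate(tree_minus_at_root))  # -17
```

Only the tree with minus at the root matches the usual convention that multiplication binds tighter than subtraction.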
Resolving Grammar Ambiguity: Precedence
  Expr → intlit
       | Expr minus Expr
       | Expr times Expr
       | lparen Expr rparen
Intuitive problem
– Nonterminals are the same for both operators
To fix precedence
– 1 nonterminal per precedence level
– Parse lowest level first
Resolving Grammar Ambiguity: Precedence
– 1 nonterminal per precedence level
– Lowest precedence level first
Original grammar
  Expr → intlit
       | Expr minus Expr
       | Expr times Expr
       | lparen Expr rparen
Grammar with precedence levels
  Expr   → Expr minus Expr
         | Term
  Term   → Term times Term
         | Factor
  Factor → intlit
         | lparen Expr rparen
Derive the string 4 - 7 * 3
Parse tree: Expr → Expr minus Expr, where the left Expr → Term → Factor → intlit 4 and the right Expr → Term → Term times Term, with each Term → Factor → intlit (7 and 3)
Resolving Grammar Ambiguity: Precedence
Fixed Grammar
  Expr   → Expr minus Expr
         | Term
  Term   → Term times Term
         | Factor
  Factor → intlit
         | lparen Expr rparen
Derive the string 4 - 7 * 3
Let’s try to re-build the wrong parse tree
– To put times at the top, the root must go Expr → Term → Term times Term, with the right Term → Factor → intlit 3
– The left Term would then have to derive 4 - 7, but Term only expands to Term times Term or Factor
We’ll never be able to derive minus from Term without parens
Did we fix all ambiguity?
Fixed Grammar
  Expr   → Expr minus Expr
         | Term
  Term   → Term times Term
         | Factor
  Factor → intlit
         | lparen Expr rparen
Derive the string 4 - 7 - 3
[Figure: parse tree with Expr → Expr minus Expr at the root and a second Expr → Expr minus Expr nested under one of its Expr children]
NO! These subtrees could have been swapped, so the string can group as (4 - 7) - 3 or as 4 - (7 - 3)
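The remaining ambiguity can be confirmed by counting parse trees under the precedence-only grammar. The sketch below uses one memoized counting function per nonterminal. The names and token spellings are assumptions for this illustration.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees under
    Expr -> Expr minus Expr | Term, Term -> Term times Term | Factor,
    Factor -> intlit | lparen Expr rparen."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def expr(i, j):
        total = term(i, j)  # Expr -> Term
        for k in range(i + 1, j - 1):
            if tokens[k] == "minus":  # Expr -> Expr minus Expr
                total += expr(i, k) * expr(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def term(i, j):
        total = factor(i, j)  # Term -> Factor
        for k in range(i + 1, j - 1):
            if tokens[k] == "times":  # Term -> Term times Term
                total += term(i, k) * term(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def factor(i, j):
        if j - i == 1:
            return 1 if tokens[i] == "intlit" else 0
        if j - i >= 3 and tokens[i] == "lparen" and tokens[j - 1] == "rparen":
            return expr(i + 1, j - 1)
        return 0

    return expr(0, len(tokens))


print(count_trees(["intlit", "minus", "intlit", "times", "intlit"]))  # 1
print(count_trees(["intlit", "minus", "intlit", "minus", "intlit"]))  # 2
```

Precedence is now settled (one tree for 4 - 7 * 3), but two operators at the same level still produce two trees.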
Where we are so far
Precedence
– We want correct behavior on 4 - 7 * 9
– A new nonterminal for each precedence level
Associativity
– We want correct behavior on 4 - 7 - 9
– Minus should be left associative: a - b - c = (a - b) - c
– Problem: the recursion in a rule like Expr → Expr minus Expr
Definition: Recursion in Grammars
• A grammar is recursive in (nonterminal) X if X ⇒⁺ αXγ for non-empty strings of symbols α and γ
• A grammar is left-recursive in X if X ⇒⁺ Xγ for non-empty string of symbols γ
• A grammar is right-recursive in X if X ⇒⁺ αX for non-empty string of symbols α
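A rough way to test these definitions on a concrete grammar is to follow the first (or last) symbol of each production and see whether a nonterminal can reach itself. The sketch below does that for the statement grammar from the "A CFG Defines a Language" slide. The function name is invented here, and the check assumes the grammar has no ε-productions and no cycles made only of unit productions, so it is a simplification of the ⇒⁺ definition rather than a full implementation.

```python
def recursive_nonterminals(productions, side="left"):
    """Nonterminals X with X =>+ X γ (side="left") or X =>+ α X (side="right").

    Assumes no ε-productions and no cycles consisting only of unit productions.
    """
    pos = 0 if side == "left" else -1
    # Edge X -> Y when some production for X begins (or ends) with nonterminal Y.
    edges = {
        x: {rhs[pos] for rhs in rhss if rhs and rhs[pos] in productions}
        for x, rhss in productions.items()
    }
    result = set()
    for x in productions:
        seen, stack = set(), list(edges[x])
        while stack:
            y = stack.pop()
            if y in seen:
                continue
            seen.add(y)
            stack.extend(edges[y])
        if x in seen:
            result.add(x)
    return result


GRAMMAR = {
    "Prog": [["begin", "Stmts", "end"]],
    "Stmts": [["Stmts", "semicolon", "Stmt"], ["Stmt"]],
    "Stmt": [["id", "assign", "Expr"]],
    "Expr": [["id"], ["Expr", "plus", "id"]],
}

print(sorted(recursive_nonterminals(GRAMMAR, side="left")))   # ['Expr', 'Stmts']
print(sorted(recursive_nonterminals(GRAMMAR, side="right")))  # []
```

Both list-like rules in that grammar (Stmts and Expr) recurse on the left, which is why its lists skew left.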
Resolving Grammar Ambiguity: Associativity
Recognize left-assoc operators with left-recursive productions
Recognize right-assoc operators with right-recursive productions
  Expr   → Expr minus Expr
         | Term
  Term   → Term times Term
         | Factor
  Factor → intlit
         | lparen Expr rparen
Example: 4 - 7 - 9
[Figure: left-leaning parse tree, abbreviating Expr, Term, Factor as E, T, F. The root is E → E - T with T → F → intlit 9; the inner E is E → E - T with T → F → intlit 7; the innermost E → T → F → intlit 4]
Resolving Grammar Ambiguity: Associativity
  Expr   → Expr minus Term
         | Term
  Term   → Term times Factor
         | Factor
  Factor → intlit
         | lparen Expr rparen
Example: 4 - 7 - 9
Let’s try to re-build the wrong parse tree again
– Start with E → E - T, where the left E → T → F → intlit 4
– The right T would then have to derive 7 - 9
We’ll never be able to derive minus from T without parens
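The effect of the final grammar can be sketched with a small hand-written parser that returns nested tuples. This is not a parsing method introduced in these slides: left-recursive rules cannot be coded directly as recursive functions, so the sketch swaps in a loop for each left-recursive rule, which builds the same left-leaning trees. Token spellings follow the grammar, integers stand in for intlit values, and all function names are assumptions.

```python
def parse(tokens):
    """Parse tokens under Expr -> Expr minus Term | Term,
    Term -> Term times Factor | Factor, Factor -> intlit | lparen Expr rparen."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if isinstance(tok, int):  # intlit
            return tok
        if tok == "lparen":
            tree = expr()
            if advance() != "rparen":
                raise SyntaxError("expected rparen")
            return tree
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        tree = factor()
        while peek() == "times":  # loop stands in for Term -> Term times Factor
            advance()
            tree = (tree, "times", factor())
        return tree

    def expr():
        tree = term()
        while peek() == "minus":  # loop stands in for Expr -> Expr minus Term
            advance()
            tree = (tree, "minus", term())
        return tree

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


print(parse([4, "minus", 7, "minus", 9]))  # ((4, 'minus', 7), 'minus', 9)
print(parse([4, "minus", 7, "times", 3]))  # (4, 'minus', (7, 'times', 3))
```

The right-grouped tree for subtraction only appears when the input asks for it with parentheses, as in 4 - ( 7 - 9 ).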
Example
• Language of Boolean expressions
  – bexp → TRUE
    bexp → FALSE
    bexp → bexp OR bexp
    bexp → bexp AND bexp
    bexp → NOT bexp
    bexp → LPAREN bexp RPAREN
• Add nonterminals so that OR has lowest precedence, then AND, then NOT. Then change the grammar to reflect the fact that both AND and OR are left associative.
• Draw a parse tree for the expression:
  – true AND NOT true
Another ambiguous example
  Stmt → if Cond then Stmt
       | if Cond then Stmt else Stmt
       | …
Consider this word in this grammar:
  if a then if b then s else s2
How would you derive it?
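Counting parse trees shows why this string is a problem. The sketch below covers only the two if-productions shown. Because the remaining alternatives are elided in the grammar, it assumes a hypothetical token "s" for any other statement and a single token "c" for any condition. Those two token names and the function name are inventions for this illustration.

```python
from functools import lru_cache


def count_stmt_trees(tokens):
    """Count parse trees under
    Stmt -> if Cond then Stmt | if Cond then Stmt else Stmt | s
    where "s" (a plain statement) and "c" (a condition) are assumed placeholders."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        """Number of ways Stmt derives tokens[i:j]."""
        if j - i == 1:
            return 1 if tokens[i] == "s" else 0
        total = 0
        if j - i >= 4 and tokens[i:i + 3] == ("if", "c", "then"):
            # Stmt -> if Cond then Stmt
            total += stmt(i + 3, j)
            # Stmt -> if Cond then Stmt else Stmt, with the else at position k
            for k in range(i + 4, j - 1):
                if tokens[k] == "else":
                    total += stmt(i + 3, k) * stmt(k + 1, j)
        return total

    return stmt(0, len(tokens))


# if a then if b then s else s2
print(count_stmt_trees(["if", "c", "then", "if", "c", "then", "s", "else", "s"]))  # 2
```

The two trees correspond to attaching the else to the inner if or to the outer if.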
Summary
To understand how a parser works, we start by understanding context-free grammars, which are used to define the language recognized by the parser.
– terminal and nonterminal symbols
– grammar rule (or production)
– derivation (leftmost derivation, rightmost derivation)
– parse (or derivation) tree
– the language defined by a grammar
– ambiguous grammar