Parsing
CSCI 3130 Formal Languages and Automata Theory
Siu On CHAN
Fall 2018
Chinese University of Hong Kong
Context-free versus regular

Every regular language is context-free.
(Figure: regular expression, NFA, DFA.)

Write a CFG for the language (0 + 1)*111:
    S → U111
    U → 0U | 1U | ε

Can you do so for every regular language?
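As a quick sanity check (our own, not from the slides), the Python sketch below enumerates every terminal string of length at most 7 derived by S → U111, U → 0U | 1U | ε and compares that set with the strings matched by the regular expression (0|1)*111 using Python's `re` module. The helper names `generate` and `regex_language` are hypothetical. The enumeration terminates here only because every sentential form of this grammar contains at most one variable.

```python
import itertools
import re

# Grammar from the slide: uppercase letters are variables, "" is the ε right-hand side.
GRAMMAR = {"S": ["U111"], "U": ["0U", "1U", ""]}

def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start (leftmost, breadth-first)."""
    results, frontier, seen = set(), [start], {start}
    while frontier:
        next_frontier = []
        for form in frontier:
            # Position of the leftmost variable, or None if form is all terminals.
            i = next((k for k, c in enumerate(form) if c.isupper()), None)
            if i is None:
                results.add(form)
                continue
            for rhs in grammar[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                # No rule deletes a terminal, so forms with too many terminals are dead ends.
                if sum(not c.isupper() for c in new) <= max_len and new not in seen:
                    seen.add(new)
                    next_frontier.append(new)
        frontier = next_frontier
    return results

def regex_language(max_len):
    """All binary strings of length <= max_len matching (0|1)*111."""
    return {
        "".join(bits)
        for n in range(max_len + 1)
        for bits in itertools.product("01", repeat=n)
        if re.fullmatch(r"(0|1)*111", "".join(bits))
    }
```

Both sides should agree: `generate(GRAMMAR, "S", 7) == regex_language(7)`.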
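From regular to context-free

Convert a regular expression into a CFG case by case:

    regular expression      grammar
    ∅                       grammar with no rules
    ε                       S → ε
    a (alphabet symbol)     S → a
    E1 + E2                 S → S1 | S2
    E1E2                    S → S1S2
    E1*                     S → SS1 | ε

Here S1 and S2 are the start variables of the grammars built for E1 and E2 (renamed so that they share no variables). In each case S becomes the new start variable.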
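The table translates directly into a recursive procedure. The sketch below is one possible encoding, assuming a hypothetical tuple representation of regular expressions (`("empty",)`, `("eps",)`, `("sym", a)`, `("union", E1, E2)`, `("concat", E1, E2)`, `("star", E1)`) and single-character alphabet symbols. Each subexpression gets a fresh start variable `S0`, `S1`, ..., so the sub-grammars never share variables.

```python
from itertools import count

def regex_to_cfg(regex):
    """Return (grammar, start) following the regular-expression-to-CFG table.

    grammar maps each variable to a list of right-hand sides (tuples of symbols);
    () is the ε right-hand side.
    """
    grammar = {}
    fresh = count()

    def build(node):
        s = f"S{next(fresh)}"            # new start variable for this subexpression
        grammar[s] = []                  # ∅: a variable with no rules
        kind = node[0]
        if kind == "eps":
            grammar[s].append(())                    # S -> ε
        elif kind == "sym":
            grammar[s].append((node[1],))            # S -> a
        elif kind == "union":
            s1, s2 = build(node[1]), build(node[2])
            grammar[s] += [(s1,), (s2,)]             # S -> S1 | S2
        elif kind == "concat":
            s1, s2 = build(node[1]), build(node[2])
            grammar[s].append((s1, s2))              # S -> S1 S2
        elif kind == "star":
            s1 = build(node[1])
            grammar[s] += [(s, s1), ()]              # S -> S S1 | ε
        elif kind != "empty":
            raise ValueError(f"unknown node {kind!r}")
        return s

    start = build(regex)
    return grammar, start
```

For instance, `regex_to_cfg(("star", ("sym", "a")))` produces S0 → S0S1 | ε and S1 → a, mirroring the E* row. The sketch does not simplify its output; the union case, for example, leaves unit productions behind.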
Context-free versus regular

Is every context-free language regular? No:
    L = {0^n 1^n | n ≥ 0}
is context-free but not regular. It is generated by
    S → 0S1 | ε
(Figure: the regular languages drawn inside the context-free languages, with L outside the regular ones.)
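To see the grammar at work, here is a minimal recognizer (a hypothetical helper written for illustration) that follows S → 0S1 | ε literally: strip a leading 0 and a trailing 1, then recurse on the middle. The call stack grows with n, which is the usual intuition (not a proof) for why a finite automaton cannot do the same job.

```python
def derives_from_S(w: str) -> bool:
    """True iff S =>* w under S -> 0S1 | ε, i.e. w = 0^n 1^n for some n >= 0."""
    if w == "":
        return True                               # S -> ε
    if len(w) >= 2 and w[0] == "0" and w[-1] == "1":
        return derives_from_S(w[1:-1])            # S -> 0 S 1
    return False
```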
Ambiguity
A CFG is ambiguous if some string has more than one parse tree.

    E → E + E | E * E | (E) | N
    N → 1 | 2

The string 1+2*2 has two parse trees:
    (1+2)*2, with * at the root: value 6 ✗
    1+(2*2), with + at the root: value 5
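A small dynamic program makes the ambiguity concrete. The sketch below (our own illustration; the name `all_values` is hypothetical) computes the value of every parse tree of a string under E → E + E | E * E | (E) | N, N → 1 | 2, assuming one-character tokens and no spaces. Choosing an operator to split at corresponds to choosing that operator as the root of a parse tree.

```python
from functools import lru_cache

def all_values(expr: str) -> list[int]:
    """Values of all parse trees of expr under
    E -> E + E | E * E | (E) | N,  N -> 1 | 2  (one-character tokens, no spaces)."""

    @lru_cache(maxsize=None)
    def values(i: int, j: int) -> tuple[int, ...]:
        s = expr[i:j]
        out = []
        if s in ("1", "2"):                                   # E -> N, N -> 1 | 2
            out.append(int(s))
        if len(s) >= 2 and s[0] == "(" and s[-1] == ")":      # E -> (E)
            out.extend(values(i + 1, j - 1))
        for k in range(i + 1, j - 1):                         # E -> E op E, op at k
            op = expr[k]
            if op in "+*":
                for a in values(i, k):
                    for b in values(k + 1, j):
                        out.append(a + b if op == "+" else a * b)
        return tuple(out)

    return list(values(0, len(expr)))
```

For 1+2*2 it returns both values from the slide, 5 and 6, while a string with a unique parse tree such as (1+2)*2 yields a single value.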
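Example

Is S → SS | x ambiguous?
Yes, because there are two ways to derive xxx: the root rule S → SS can split xxx as (xx)(x) or as (x)(xx), giving two different parse trees.
(Figure: the two parse trees for xxx.)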
Disambiguation

Sometimes we can rewrite the grammar to remove ambiguity:
    S → SS | x   ⇒   S → Sx | x
(Figure: the unique parse tree for xxx under S → Sx | x.)
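Counting parse trees shows the difference between the two grammars. The sketch below (hypothetical helper names) counts parse trees of x^n. Under S → SS | x the root can split x^n at any of n − 1 places, which gives the Catalan numbers 1, 1, 2, 5, 14, ...; under S → Sx | x the only possible split peels off the last x, so every string has exactly one tree.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def trees_ambiguous(n: int) -> int:
    """Number of parse trees of x^n under S -> SS | x."""
    if n < 1:
        return 0                         # no rule derives the empty string
    if n == 1:
        return 1                         # S -> x
    # S -> SS: the first S covers x^k, the second covers x^(n-k).
    return sum(trees_ambiguous(k) * trees_ambiguous(n - k) for k in range(1, n))

@lru_cache(maxsize=None)
def trees_left(n: int) -> int:
    """Number of parse trees of x^n under S -> Sx | x."""
    if n < 1:
        return 0
    if n == 1:
        return 1                         # S -> x
    return trees_left(n - 1)             # S -> Sx: the only way to split
```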
Disambiguation

    E → E + E | E * E | (E) | N
    N → 1 | 2

In this grammar, + and * have the same precedence!
Decompose an expression into terms (T) and factors (F), for example
    2 * (1 + 2 * 2)
(Figure: the terms and factors of 2 * (1 + 2 * 2) marked T and F.)
Disambiguation

Rewrite
    E → E + E | E * E | (E) | N
    N → 1 | 2
as follows:
    An expression is a sum of one or more terms.
    Each term is a product of one or more factors.
    Each factor is a parenthesized expression or a number.

    E → T | E + T
    T → F | T * F
    F → (E) | 1 | 2
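The layered grammar also suggests how to write a parser by hand. The sketch below is a recursive-descent parser in Python with one function per variable. Recursive descent cannot apply a left-recursive rule such as E → E + T directly (parse_E would call itself forever), so, as a standard substitute that is not taken from the slides, each left-recursive rule is implemented as a loop that extends the tree to the left. The tuple tree format and the helper names are assumptions made for illustration.

```python
def parse(text: str):
    """Parse text with E -> T | E+T, T -> F | T*F, F -> (E) | 1 | 2.

    Returns a parse tree as nested tuples ("E", ...), ("T", ...), ("F", ...).
    Raises SyntaxError if the grammar does not generate text.
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_E():
        tree = ("E", parse_T())                   # E -> T
        while peek() == "+":                      # E -> E + T, as a loop
            expect("+")
            tree = ("E", tree, "+", parse_T())
        return tree

    def parse_T():
        tree = ("T", parse_F())                   # T -> F
        while peek() == "*":                      # T -> T * F, as a loop
            expect("*")
            tree = ("T", tree, "*", parse_F())
        return tree

    def parse_F():
        ch = peek()
        if ch == "(":                             # F -> (E)
            expect("(")
            inner = parse_E()
            expect(")")
            return ("F", "(", inner, ")")
        if ch in ("1", "2"):                      # F -> 1 | 2
            expect(ch)
            return ("F", ch)
        raise SyntaxError(f"unexpected {ch!r} at position {pos}")

    tree = parse_E()
    if pos != len(text):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

def evaluate(tree) -> int:
    """Value of a parse tree produced by parse()."""
    kids = tree[1:]
    if len(kids) == 1:                            # E -> T, T -> F, or F -> digit
        kid = kids[0]
        return int(kid) if isinstance(kid, str) else evaluate(kid)
    if kids[0] == "(":                            # F -> (E)
        return evaluate(kids[1])
    left, op, right = kids                        # E -> E + T or T -> T * F
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)
```

For example, `evaluate(parse("1+2*2"))` gives 5, and `evaluate(parse("2+(1+1+2*2)+1"))` gives 9 for the expression in the parsing example. The loops build E + T + T as (E + T) + T, matching the left-recursive rules.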
Parsing example

Parse tree for 2+(1+1+2*2)+1 using
    E → T | E + T
    T → F | T * F
    F → (E) | 1 | 2
(Figure: the parse tree; at the top, E splits into the three terms 2, (1+1+2*2) and 1.)
Disambiguation

Disambiguation is not always possible because
    there exist inherently ambiguous languages, and
    there is no general procedure for disambiguation.

In programming languages, ambiguity comes from the precedence rules, and we can resolve it as in the example.

In English, ambiguity is sometimes a problem:
    I look at the dog with one eye
(Does "with one eye" describe how I look, or the dog?)
Parsing

    S → 0S1 | 1S0S | T
    T → S | ε

Input: 0011
Is 0011 ∈ L?
If so, how can a program build a parse tree for it?
Parsing

Try all derivations?
    S → 0S1 | 1S0S | T
    T → S | ε
Input: 0011
(Figure: part of the tree of all derivations. From S the branches are 0S1, 1S0S and T; from 0S1 they are 00S11, 01S0S1 and 0T1; following 00S11 ⇒ 00T11 ⇒ 0011 reaches the input ✓, while T branches back to S and to ε.)
This is (part of) the tree of all derivations, not the parse tree.
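The "try all derivations" idea can be sketched directly as a breadth-first search over leftmost derivations (the leftmost order is our assumption; the slide does not fix one) that stops as soon as it reaches the input. The `max_forms` cap is a hypothetical parameter added only so the demo halts: without it the search runs forever on inputs outside the language, which is exactly problem 2 listed next.

```python
from collections import deque

# Grammar from the slide: uppercase letters are variables, "" is ε.
GRAMMAR = {"S": ["0S1", "1S0S", "T"], "T": ["S", ""]}

def try_all_derivations(grammar, start, target, max_forms=10_000):
    """Breadth-first search over leftmost derivations from start.

    Returns the derivation (list of sentential forms) that reaches target,
    or None after max_forms forms have been expanded without success.
    """
    queue = deque([[start]])
    expanded = 0
    while queue and expanded < max_forms:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Leftmost variable, or None if form is already a terminal string.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            continue                    # a terminal string, but not the target
        expanded += 1
        for rhs in grammar[form[i]]:
            queue.append(path + [form[:i] + rhs + form[i + 1:]])
    return None
```

On the input 0011 it finds S ⇒ 0S1 ⇒ 00S11 ⇒ 00T11 ⇒ 0011; on 0110, which is not in the language, it only gives up because of the cap.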
Problems
1. Trying all derivations may take too long.
2. If the input is not in the language, parsing will never stop.

Let's tackle the 2nd problem.
When to stop

    S → 0S1 | 1S0S | T
    T → S | ε

Idea: stop when |derived string| > |input|.

Problems:
    The derived string may shrink because of ε-productions:
        S ⇒ 0S1 ⇒ 0T1 ⇒ 01
    The derivation may loop because of unit productions:
        S ⇒ T ⇒ S ⇒ T ⇒ ...

Remedy: remove ε- and unit productions.
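Once no ε- or unit productions remain, the cutoff is both safe and terminating: each step either lengthens the sentential form or replaces a variable by a terminal, so a derivation of a string of length n ≥ 1 has at most 2n − 1 steps. The sketch below assumes a grammar we derived by hand (not from the slides) by removing ε- and unit productions from S → 0S1 | 1S0S | T, T → S | ε: for nonempty strings it is S → 0S1 | 01 | 1S0S | 10S | 1S0 | 10, and whether ε belongs to the language is passed in as a separate flag. Here every right-hand side has length at least 2, so every step makes the form strictly longer.

```python
# Hand-derived (our own working, not from the slides): the slide's grammar with
# ε- and unit productions removed, covering the nonempty strings only.
GRAMMAR = {"S": ["0S1", "01", "1S0S", "10S", "1S0", "10"]}

def derives(grammar, start, target, accepts_empty):
    """Search leftmost derivations, pruning any form longer than target.

    The pruning is sound because no rule shrinks a sentential form, and the
    search terminates because every step lengthens the form.
    """
    if target == "":
        return accepts_empty             # ε is tracked separately
    stack = [start]
    while stack:
        form = stack.pop()
        if form == target:
            return True
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            continue                     # a terminal string, but not the target
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= len(target):  # the slide's stopping rule
                stack.append(new)
    return False
```

Unlike the unbounded search, `derives(GRAMMAR, "S", "0110", True)` returns False after finitely many steps.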
Removing ε-productions

Goal: remove all rules A → ε for every non-start variable A.

If S is the start variable and the rule S → ε exists:
    Add a new start variable T.
    Add the rule T → S.
For every rule A → ε where A is not the (new) start variable:
    1. Remove the rule A → ε.
    2. If you see B → αAβ, add a new rule B → αβ.

Example:
    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b

Here S → ε does not occur, so no new start variable is needed.
    Removing B → ε: add D → C.
    Removing C → ε: add S → AD, D → B and D → ε.
    Removing D → ε: add S → AC, S → A and C → E.

Result (B is left with no rules):
    S → ACD | AD | AC | A
    A → a
    C → ED | E
    D → BC | b | C | B
    E → b
Eliminating ε-productions

For every rule A → ε where A is not the start variable:
    1. Remove the rule A → ε.
    2. If you see B → αAβ, add a new rule B → αβ.

Do step 2 every time A appears:
    B → αAβAγ yields B → αβAγ, B → αAβγ and B → αβγ
B → A becomes B → ε.
If B → ε was removed earlier, don't add it back.
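The two steps, together with the rules about repeated occurrences and previously removed ε-rules, can be sketched in Python as follows. This is a minimal illustration under our own conventions: right-hand sides are tuples of symbols with `()` standing for ε, the dictionary keys are the variables, and the fresh start name (the old start name plus "0") is hypothetical and assumed unused.

```python
from itertools import combinations

Rhs = tuple[str, ...]          # a right-hand side; () is ε

def drop_occurrences(rhs: Rhs, var: str) -> set[Rhs]:
    """Copies of rhs with every nonempty subset of the occurrences of var deleted."""
    positions = [i for i, sym in enumerate(rhs) if sym == var]
    variants = set()
    for k in range(1, len(positions) + 1):
        for chosen in combinations(positions, k):
            variants.add(tuple(sym for i, sym in enumerate(rhs) if i not in chosen))
    return variants

def remove_epsilon(grammar: dict[str, set[Rhs]], start: str):
    """Return (new_grammar, new_start) with no A -> ε for non-start A."""
    g = {v: set(rules) for v, rules in grammar.items()}
    if () in g[start]:
        new_start = start + "0"                 # hypothetical fresh name
        g[new_start] = {(start,)}               # T -> S
        start = new_start
    removed = set()                             # variables whose A -> ε was deleted
    while True:
        pending = [a for a, rules in g.items() if () in rules and a != start]
        if not pending:
            return g, start
        a = pending[0]
        g[a].discard(())                        # step 1: remove A -> ε
        removed.add(a)
        for b in g:                             # step 2: B -> αAβ gives B -> αβ
            for rhs in list(g[b]):
                for new in drop_occurrences(rhs, a):
                    if new == () and b in removed:
                        continue                # don't add a removed B -> ε back
                    g[b].add(new)
```

Run on the example grammar S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b, it removes B → ε, C → ε and D → ε in that order and ends with S → ACD | AD | AC | A, C → ED | E, D → BC | b | C | B, with B left without rules.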
Eliminating unit productions

A unit production is a production of the form A → B, where B is a single variable.

Grammar:
    S → 0S1 | 1S0S | T
    T → S | R | ε
    R → 0SR

Unit production graph: vertices S, T, R with edges S → T, T → S and T → R.
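The unit production graph can be built and explored mechanically. The sketch below (hypothetical helper names; right-hand sides encoded as tuples of symbols, with `()` for ε) draws an edge A → B for every unit production A → B and then finds, for each variable, all variables reachable through unit productions alone. For the grammar above, S and T reach each other and R, which is the S ⇒ T ⇒ S loop from the stopping problem.

```python
def unit_graph(grammar):
    """Edge A -> B for every unit production A -> B (B a single variable)."""
    return {
        a: {rhs[0] for rhs in rules if len(rhs) == 1 and rhs[0] in grammar}
        for a, rules in grammar.items()
    }

def unit_reachable(graph):
    """For each variable A, the set of B with A =>* B using unit productions only."""
    reach = {}
    for a in graph:
        seen, stack = {a}, [a]                    # depth-first search from a
        while stack:
            for b in graph[stack.pop()]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        reach[a] = seen
    return reach

# Grammar from the slide; () is ε.
GRAMMAR = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
```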