1/18 Parsing
CSCI 3130 Formal Languages and Automata Theory
Siu On CHAN, Chinese University of Hong Kong
Fall 2015
2/18 Parsing
Grammar: S → 0S1 | 1S0S | T, T → S | ε
Input: 0011
Is 0011 ∈ L? If so, how do we build a parse tree with a program?
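3/18 Parsing: try all derivations?
Grammar: S → 0S1 | 1S0S | T, T → S | ε
Input: 0011
Part of the search tree (the children of a node are the strings obtained by rewriting its leftmost variable):
S: 0S1, 1S0S, T
0S1: 00S11, 01S0S1, 0T1
0T1: 0S1, …
1S0S: 10S10S, …
T: S, ε
00S11: …, 00T11
00T11: 00S11, 0011 ✓
This is (part of) the tree of all derivations, not the parse tree.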
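"Try all derivations" can be sketched as a breadth-first search over leftmost derivations. This is a minimal Python sketch, assuming single-character symbols with uppercase letters as variables and the empty string standing for ε; the names `GRAMMAR` and `derives` and the `max_forms` cap are hypothetical. The cap exists because the plain search has no natural stopping point for strings outside the language.

```python
from collections import deque

# Grammar from the slide. Assumption: uppercase = variable, "" = ε.
GRAMMAR = {
    "S": ["0S1", "1S0S", "T"],
    "T": ["S", ""],
}

def derives(grammar, start, target, max_forms=10_000):
    """Breadth-first search over leftmost derivations.

    Returns True if `target` is reached, False if the search space runs
    out, and None once `max_forms` forms were explored without an answer
    (the naive search need not terminate).
    """
    seen = {start}
    queue = deque([start])
    explored = 0
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        explored += 1
        if explored > max_forms:
            return None
        # Leftmost variable; a form without one is a finished string.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            continue
        for body in grammar[form[pos]]:
            nxt = form[:pos] + body + form[pos + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

`derives(GRAMMAR, "S", "0011")` returns True. For "0" it only stops at the cap: every rule body contains as many 0s as 1s, so "0" is never derived, yet the set of sentential forms is infinite.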
4/18 Problems
1. Trying all derivations may take too long.
2. If the input is not in the language, parsing will never stop.
Let's tackle the 2nd problem.
5/18 When to stop
Grammar: S → 0S1 | 1S0S | T, T → S | ε
Idea: stop when |derived string| > |input|.
Problems:
The derived string may shrink because of ε-productions: S ⇒ 0S1 ⇒ 0T1 ⇒ 01
The derivation may loop because of unit productions: S ⇒ T ⇒ S ⇒ T ⇒ …
Solution: remove ε-productions and unit productions.
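6/18 Removing ε-productions
Goal: remove all A → ε rules for every non-start variable A.
If S is the start variable and the rule S → ε exists:
Add a new start variable T
Add the rule T → S
For every rule A → ε where A is not the (new) start variable:
1. Remove the rule A → ε
2. If you see B → αAβ, add a new rule B → αβ
Example: S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b
Removing B → ε: D → BC gives D → C
Removing C → ε: S → ACD gives S → AD; D → BC gives D → B; D → C gives D → ε
Removing D → ε: S → ACD gives S → AC; S → AD gives S → A; C → ED gives C → E
Result: S → ACD | AD | AC | A, A → a, C → ED | E, D → BC | C | B | b, E → b (B has no rules left)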
7/18 Eliminating ε-productions
For every A → ε rule where A is not the start variable:
1. Remove the rule A → ε
2. If you see B → αAβ, add a new rule B → αβ
Do step 2 every time A appears: B → αAβAγ yields B → αβAγ, B → αAβγ and B → αβγ.
B → A becomes B → ε. If B → ε was removed earlier, don't add it back.
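The two steps above, including "every occurrence" and "don't add it back", can be sketched in Python. This is a minimal sketch under a hypothetical representation: a dict from single-character variables to lists of body strings, with "" for ε. It does not add a new start variable; it assumes that step was already handled when needed (the example grammar has no S → ε, so it is not needed there).

```python
from itertools import combinations

def remove_epsilon(grammar, start):
    """Remove A -> ε for every variable A other than `start`.

    Assumption (hypothetical representation): variables and symbols are
    single characters, each body is a string, "" stands for ε.
    """
    # Work on a copy, dropping duplicate bodies.
    g = {v: list(dict.fromkeys(bodies)) for v, bodies in grammar.items()}
    removed = set()  # variables whose ε-rule was already deleted
    while True:
        # Pick a non-start variable that still has an ε-rule.
        a = next((v for v, bodies in g.items() if v != start and "" in bodies), None)
        if a is None:
            return g
        g[a].remove("")  # step 1: remove A -> ε
        removed.add(a)
        for b, bodies in g.items():
            for body in list(bodies):
                spots = [i for i, c in enumerate(body) if c == a]
                # Step 2, once for every non-empty subset of A's occurrences.
                for k in range(1, len(spots) + 1):
                    for drop in combinations(spots, k):
                        new = "".join(c for i, c in enumerate(body) if i not in drop)
                        if new == "" and b in removed:
                            continue  # B -> ε was removed earlier: don't add it back
                        if new not in bodies:
                            bodies.append(new)
```

On the slide-6 grammar the variables are processed in the order B, C, D, and the result matches the hand computation: S → ACD | AD | AC | A, C → ED | E, D → BC | b | C | B, with B left without rules.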
8/18 Eliminating unit productions
A unit production is a production of the form A → B.
Grammar: S → 0S1 | 1S0S | T, T → S | R | ε, R → 0SR
Unit production graph: edges S → T, T → S and T → R
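The unit production graph is easy to compute. This is a minimal Python sketch; `GRAMMAR`, `unit_graph` and `reachable` are hypothetical names, and it assumes single-character symbols with uppercase variables and "" for ε, so a unit production is a body consisting of exactly one variable.

```python
# Grammar from the slide. Assumption: single-character symbols,
# uppercase letters are variables, "" stands for ε.
GRAMMAR = {
    "S": ["0S1", "1S0S", "T"],
    "T": ["S", "R", ""],
    "R": ["0SR"],
}

def unit_graph(grammar):
    """Edges (A, B), one for each unit production A -> B."""
    return {(a, body) for a, bodies in grammar.items()
            for body in bodies if len(body) == 1 and body in grammar}

def reachable(edges, src):
    """Variables reachable from `src` by one or more unit productions."""
    seen, stack = set(), [src]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen
```

A variable lies on a cycle of unit productions exactly when it is reachable from itself: here S and T are (S → T → S), while R is not.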
9/18 Removing unit productions
1. If there is a cycle of unit productions A → B → ··· → C → A, delete it and replace everything with A.
Example: the cycle S → T → S; replace T by S.
S → 0S1 | 1S0S | T, T → S | R | ε, R → 0SR
becomes
S → 0S1 | 1S0S | R | ε, R → 0SR
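Replacing T by S amounts to renaming one variable into another and merging their rules. This is a minimal Python sketch; `merge_variable` is a hypothetical name, and it assumes the same single-character representation as before ("" for ε). For a longer cycle A → B → ··· → C → A one would call it once per variable, merging B, …, C into A in turn.

```python
def merge_variable(grammar, old, new):
    """Rename variable `old` to `new` everywhere, merging their rules.

    Rules that become new -> new (such as S -> S) are dropped.
    Assumption: single-character symbols, "" stands for ε.
    """
    merged = {}
    for var, bodies in grammar.items():
        target = new if var == old else var
        out = merged.setdefault(target, [])
        for body in bodies:
            body = body.replace(old, new)
            if body != target and body not in out:
                out.append(body)
    return merged

GRAMMAR = {
    "S": ["0S1", "1S0S", "T"],
    "T": ["S", "R", ""],
    "R": ["0SR"],
}
```

`merge_variable(GRAMMAR, "T", "S")` gives S → 0S1 | 1S0S | R | ε, R → 0SR: the rules S → T and T → S both turn into S → S and disappear.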
10/18 Removal of unit productions
2. Replace any chain A → B → ··· → C → α by A → α, B → α, …, C → α.
Example: the chain S → R → 0SR is replaced by S → 0SR, R → 0SR.
S → 0S1 | 1S0S | R | ε, R → 0SR
becomes
S → 0S1 | 1S0S | 0SR | ε, R → 0SR
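Step 2 can be sketched as: for each variable A, follow unit productions from A and copy every non-unit body found along the way into A's rules. This is a minimal Python sketch; `remove_unit_chains` is a hypothetical name, and it assumes step 1 already collapsed the cycles (the `seen` set would also keep it from looping otherwise).

```python
def remove_unit_chains(grammar):
    """Replace every chain A -> B -> ... -> C -> α by direct rules A -> α.

    Assumption: single-character symbols, uppercase variables, "" = ε.
    """
    def is_unit(body):
        return len(body) == 1 and body in grammar

    result = {}
    for a in grammar:
        out, stack, seen = [], [a], {a}
        while stack:
            v = stack.pop()
            for body in grammar[v]:
                if is_unit(body):
                    if body not in seen:  # follow the chain one more step
                        seen.add(body)
                        stack.append(body)
                elif body not in out:     # non-unit body: keep it for A
                    out.append(body)
        result[a] = out
    return result
```

Applied to S → 0S1 | 1S0S | R | ε, R → 0SR, it yields S → 0S1 | 1S0S | ε | 0SR and R → 0SR, the same rules as on the slide (in a different order).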
11/18 Recap
Problems:
1. Trying all derivations may take too long.
2. If the input is not in the language, parsing will never stop. ✓
Solution to problem 2:
1. Eliminate ε-productions.
2. Eliminate unit productions.
Then try all possible derivations, but stop parsing when |derived string| > |input|.
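Why the cut-off makes the search finite can be spelled out with a short counting argument. This is a sketch, assuming the grammar has no ε-productions and no unit productions, and that the input w has length n ≥ 1; the potential function φ is introduced here for illustration only.

```latex
\varphi(u) = |u| + t(u), \qquad t(u) = \text{number of terminals in } u \\
A \to a:\quad \varphi \text{ increases by exactly } 1 \\
A \to \alpha,\ |\alpha| \ge 2:\quad \varphi \text{ increases by at least } |\alpha| - 1 \ge 1 \\
\varphi(S) = 1,\ \ \varphi(w) = 2n \;\Longrightarrow\; \text{every derivation } S \Rightarrow^{*} w \text{ has at most } 2n - 1 \text{ steps} \\
|u| \text{ never decreases} \;\Longrightarrow\; \text{once } |u| > n, \text{ no descendant of } u \text{ equals } w
```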
12/18 Example
Eliminating unit productions turns S → 0S1 | 0S0S | T, T → S | 0 into S → 0S1 | 0S0S | 0.
Input: 0011
S: 0S1, 0S0S, 0 ✗
0S1: 00S11 (too long), 00S0S1 (too long), 001 ✗
0S0S: 00S10S (too long), 00S0S0S (too long), 000S
000S: 0000S1 (too long), 0000S0S (too long), 0000 ✗
Conclusion: 0011 ∉ L
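The search above, with the "too long" cut-off, can be sketched in Python. This is a minimal sketch, assuming the grammar has already had its ε- and unit productions removed and uses single-character symbols with uppercase variables; `bounded_parse` is a hypothetical name. It explores leftmost derivations depth-first instead of level by level, which does not change which strings are found.

```python
def bounded_parse(grammar, start, target):
    """Exhaustive leftmost-derivation search with the length cut-off.

    Assumes no ε-productions and no unit productions, so sentential
    forms never shrink and every branch is eventually cut off.
    """
    n = len(target)
    stack, seen = [start], set()
    while stack:
        form = stack.pop()
        if form in seen or len(form) > n:
            continue  # already explored, or "too long"
        seen.add(form)
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            if form == target:
                return True
            continue  # a finished string that is not the input (✗)
        for body in grammar[form[pos]]:
            stack.append(form[:pos] + body + form[pos + 1:])
    return False

EXAMPLE = {"S": ["0S1", "0S0S", "0"]}
```

With `EXAMPLE`, the search for 0011 explores exactly the forms of length at most 4 listed on the slide and returns False, while 001 and 0000 are accepted.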
13/18 Problems
1. Trying all derivations may take too long.
2. If the input is not in the language, parsing will never stop.
14/18 Preparations
A faster way to parse: the Cocke–Younger–Kasami algorithm.
To use it, we must preprocess the CFG:
Eliminate ε-productions
Eliminate unit productions
Convert the CFG to Chomsky Normal Form
15/18 Chomsky Normal Form (Noam Chomsky)
A CFG is in Chomsky Normal Form if every production has the form A → BC or A → a, where neither B nor C is the start variable; for the start variable S we also allow S → ε.
Convert to Chomsky Normal Form:
Replace terminals with new variables: A → BcDE ⇒ A → BCDE, C → c
Break up sequences with new variables: A → BCDE ⇒ A → BX, X → CY, Y → DE, C → c
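The two conversion steps on a single rule can be sketched in Python. This is a minimal sketch: `to_cnf_rule` is a hypothetical name, bodies are lists of symbols with lowercase strings as terminals, and new variables for terminals are named by upper-casing the terminal (c becomes C, as on the slide). That naming assumes no clash with existing variables, and the fresh names for step 2 come from an iterator the caller supplies.

```python
def to_cnf_rule(head, body, fresh):
    """Convert one rule head -> body (len(body) >= 2) into CNF rules.

    body  : list of symbols; lowercase strings are terminals (assumption).
    fresh : iterator yielding unused variable names for step 2.
    """
    rules = []
    # Step 1: replace each terminal t by a variable that derives only t.
    # Naming it t.upper() mirrors the slide (c -> C) and assumes no clash.
    symbols = []
    for s in body:
        if s.islower():
            var = s.upper()
            if (var, [s]) not in rules:
                rules.append((var, [s]))
            symbols.append(var)
        else:
            symbols.append(s)
    # Step 2: peel off the first symbol until only two remain.
    while len(symbols) > 2:
        new = next(fresh)
        rules.append((head, [symbols[0], new]))
        head, symbols = new, symbols[1:]
    rules.append((head, symbols))
    return rules
```

`to_cnf_rule("A", ["B", "c", "D", "E"], iter(["X", "Y"]))` reproduces the slide: C → c, A → BX, X → CY, Y → DE.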
16/18 Cocke–Younger–Kasami algorithm
Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a
Input: x = baaba
Let $x[i,\ell] = x_i x_{i+1} \cdots x_{i+\ell-1}$.
For every substring x[i, ℓ], remember all variables R that derive x[i, ℓ]. Store them in a table T[i, ℓ].
[Figure: table T[i, ℓ] with i = 1, …, 5 across and ℓ = 1, …, 5 up; the ℓ = 1 row reads B, {A, C}, {A, C}, B, {A, C}.]
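The table can be filled bottom-up: a cell T[i, ℓ] with ℓ ≥ 2 collects every A with a rule A → BC such that B is in T[i, k] and C is in T[i+k, ℓ−k] for some split 1 ≤ k < ℓ. This is a minimal Python sketch of that standard recurrence; `cyk`, `BINARY` and `TERMINAL` are hypothetical names, the grammar is assumed to be in Chomsky Normal Form, and the table is a dict keyed by 1-indexed (i, ℓ) to match the slide's notation.

```python
def cyk(binary, terminal, x):
    """T[(i, l)] = set of variables deriving x[i, l] = x_i ... x_{i+l-1}."""
    n = len(x)
    T = {}
    # Length-1 substrings: rules A -> a.
    for i in range(1, n + 1):
        T[(i, 1)] = {a for a, t in terminal if t == x[i - 1]}
    # Longer substrings: rules A -> BC over every split point.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            cell = set()
            for k in range(1, length):  # x[i, k] followed by x[i+k, length-k]
                left, right = T[(i, k)], T[(i + k, length - k)]
                cell |= {a for a, b, c in binary if b in left and c in right}
            T[(i, length)] = cell
    return T

# Grammar from the slide: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
BINARY = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
          ("B", "C", "C"), ("C", "A", "B")]
TERMINAL = [("A", "a"), ("B", "b"), ("C", "a")]
```

For x = baaba, the top cell T[(1, 5)] comes out as {S, A, C}; it contains the start variable S, so baaba is in the language. The three nested loops make this O(n³ · |G|) work.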