Parsing (CSCI 3130 Formal Languages and Automata Theory, Siu On CHAN)

  1. Parsing. CSCI 3130 Formal Languages and Automata Theory. Siu On CHAN, Chinese University of Hong Kong, Fall 2015.

  2. Parsing. Grammar: S → 0S1 | 1S0S | T, T → S | ε. Input: 0011. Is 0011 ∈ L? If so, how to build a parse tree with a program?

  3. Parsing: try all derivations? Grammar: S → 0S1 | 1S0S | T, T → S | ε. Input: 0011. [Figure: part of the tree of all derivations rooted at S, with nodes including 0S1, 1S0S, T, 00S11, 01S0S1, 0T1, 10S10S, 00T11, ε, and 0011 ✓.] This is (part of) the tree of all derivations, not the parse tree.

  7. Problems: (1) Trying all derivations may take too long. (2) If the input is not in the language, parsing will never stop. Let's tackle the 2nd problem.

  8. When to stop. Grammar: S → 0S1 | 1S0S | T, T → S | ε. Idea: stop when |derived string| > |input|. Problems: the derived string may shrink because of "ε-productions" (S ⇒ 0S1 ⇒ 0T1 ⇒ 01), and the derivation may loop because of "unit productions" (S ⇒ T ⇒ S ⇒ T ⇒ …). Fix: remove ε-productions and unit productions.

  11. Removing ε-productions. Goal: remove all A → ε rules for every non-start variable A. If S is the start variable and the rule S → ε exists, add a new start variable T and add the rule T → S. Then, for every rule A → ε where A is not the (new) start variable: (1) remove the rule A → ε; (2) if you see B → αAβ, add a new rule B → αβ. Example grammar: S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b. Removing B → ε adds D → C. Removing C → ε adds S → AD and D → ε. Removing D → ε adds C → E and S → A.

  17. Eliminating ε-productions. For every A → ε rule where A is not the start variable: (1) remove the rule A → ε; (2) if you see B → αAβ, add a new rule B → αβ. Do step 2 every time A appears: B → αAβAγ yields B → αβAγ, B → αAβγ, and B → αβγ. B → A becomes B → ε; if B → ε was removed earlier, don't add it back.
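
The ε-removal steps above can be sketched in Python. This is a minimal sketch, not the course's code: the representation (a set of `(head, body)` pairs with bodies as tuples, `()` standing for ε), the function name `remove_epsilon`, and the fresh start name `start + "0"` are all assumptions made for illustration.

```python
from itertools import combinations

def remove_epsilon(rules, start):
    """Remove A -> ε for every non-start variable A.

    rules: set of (head, body) pairs; body is a tuple of symbols, () means ε.
    Returns (new_rules, start_variable).
    """
    rules = set(rules)
    if (start, ()) in rules:
        # Add a new start variable with a rule to the old one.
        # The name start + "0" is an assumption; it must not clash
        # with an existing symbol.
        new_start = start + "0"
        rules.add((new_start, (start,)))
        start = new_start
    removed = set()  # variables whose ε-rule was already removed
    while True:
        # Pick the next ε-rule of a non-start variable (sorted for determinism).
        target = next((h for h, b in sorted(rules) if b == () and h != start), None)
        if target is None:
            return rules, start
        rules.discard((target, ()))
        removed.add(target)
        for head, body in list(rules):
            spots = [i for i, s in enumerate(body) if s == target]
            # Drop every non-empty subset of the occurrences of `target`.
            for k in range(1, len(spots) + 1):
                for drop in combinations(spots, k):
                    new_body = tuple(s for i, s in enumerate(body) if i not in drop)
                    if new_body == () and head in removed:
                        continue  # do not add back a removed ε-rule
                    rules.add((head, new_body))

# Example grammar from the ε-removal slide.
EXAMPLE = {
    ("S", ("A", "C", "D")), ("A", ("a",)), ("B", ()),
    ("C", ("E", "D")), ("C", ()), ("D", ("B", "C")), ("D", ("b",)),
    ("E", ("b",)),
}
cleaned, new_start = remove_epsilon(EXAMPLE, "S")
```

On the example grammar this sketch ends with S → ACD | AD | AC | A, C → ED | E, D → BC | B | C | b, plus the unchanged rules for A and E. The rules S → AC and D → B follow mechanically from step 2 even though they do not appear in the extracted slide text.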

  19. Eliminating unit productions. A unit production is a production of the form A → B. Grammar: S → 0S1 | 1S0S | T, T → S | R | ε, R → 0SR. Unit production graph: S → T, T → S, T → R.

  20. Removing unit productions, step 1: if there is a cycle of unit productions A → B → ··· → C → A, delete it and replace everything with A. Example: replace T by S. The grammar S → 0S1 | 1S0S | T, T → S | R | ε, R → 0SR becomes S → 0S1 | 1S0S | R | ε, R → 0SR.

  22. Removal of unit productions, step 2: replace any chain A → B → ··· → C → α by A → α, B → α, …, C → α. Example: replace S → R → 0SR by S → 0SR, R → 0SR. The grammar S → 0S1 | 1S0S | R | ε, R → 0SR becomes S → 0S1 | 1S0S | 0SR | ε, R → 0SR.
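
Unit-production removal can also be sketched in Python. Note that this sketch swaps in a different but equivalent technique: instead of the slides' two steps (collapse cycles, then replace chains), it computes for each variable the set of variables reachable through unit productions and copies their non-unit rules. The function name `remove_unit` and the tuple-based rule representation are assumptions for illustration.

```python
def remove_unit(rules):
    """Return an equivalent rule set without unit productions A -> B.

    rules: set of (head, body) pairs; body is a tuple of symbols, () means ε.
    A symbol counts as a variable if it is the head of some rule.
    """
    variables = {h for h, _ in rules}

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = set()
    for a in variables:
        # Collect every variable reachable from `a` via unit productions.
        reach, stack = {a}, [a]
        while stack:
            v = stack.pop()
            for h, b in rules:
                if h == v and is_unit(b) and b[0] not in reach:
                    reach.add(b[0])
                    stack.append(b[0])
        # `a` inherits every non-unit body of the variables it reaches.
        for h, b in rules:
            if h in reach and not is_unit(b):
                result.add((a, b))
    return result

# Grammar from the unit-production slides.
GRAMMAR = {
    ("S", ("0", "S", "1")), ("S", ("1", "S", "0", "S")), ("S", ("T",)),
    ("T", ("S",)), ("T", ("R",)), ("T", ()),
    ("R", ("0", "S", "R")),
}
no_units = remove_unit(GRAMMAR)
s_bodies = {b for h, b in no_units if h == "S"}
```

For S this gives 0S1 | 1S0S | 0SR | ε, matching the slides. Unlike the slides, the sketch keeps T (with the same bodies as S) rather than merging it into S; T is no longer reachable from S, so the language is unchanged.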

  24. Recap. Problems: (1) Trying all derivations may take too long. (2) If the input is not in the language, parsing will never stop. Solution to problem 2: first eliminate ε-productions, then eliminate unit productions. After that, try all possible derivations but stop parsing when |derived string| > |input|.

  25. Example. Grammar: S → 0S1 | 0S0S | T, T → S | 0, which becomes S → 0S1 | 0S0S | 0 after eliminating unit productions. Input: 0011. Derivations: S ⇒ 0S1, 0S0S, or 0 ✗. From 0S1: 00S11 (too long), 00S0S1 (too long), 001 ✗. From 0S0S: 00S10S (too long), 00S0S0S (too long), 000S. From 000S: 0000S1 (too long), 0000S0S (too long), 0000 ✗. Conclusion: 0011 ∉ L.
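
The bounded search in this example can be sketched as follows. This is a minimal sketch under stated assumptions: the grammar has no ε-productions and no unit productions, terminals are single characters, and only leftmost derivations are explored. The function name `derives` is hypothetical.

```python
def derives(rules, start, target):
    """Try all leftmost derivations, pruning forms longer than the input.

    rules: set of (head, body) pairs with tuple bodies; assumed to contain
    no ε-productions and no unit productions, so forms never shrink.
    """
    variables = {h for h, _ in rules}
    seen = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        if form in seen or len(form) > len(target):
            continue  # already explored, or "too long"
        seen.add(form)
        # Position of the leftmost variable, if any.
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:
            if "".join(form) == target:
                return True
            continue  # a terminal string that is not the input
        for head, body in rules:
            if head == form[idx]:
                stack.append(form[:idx] + body + form[idx + 1:])
    return False

# Grammar from the example slide, after eliminating unit productions.
G = {("S", ("0", "S", "1")), ("S", ("0", "S", "0", "S")), ("S", ("0",))}
in_language = derives(G, "S", "0011")
```

Here `in_language` is `False`, in line with the slide's conclusion that 0011 ∉ L. The search stops because every sentential form has at most |input| symbols, so only finitely many forms are ever explored.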

  28. Problems: (1) Trying all derivations may take too long. (2) If the input is not in the language, parsing will never stop.

  29. Preparations. A faster way to parse: the Cocke–Younger–Kasami algorithm. To use it we must preprocess the CFG: eliminate ε-productions, eliminate unit productions, and convert the CFG to Chomsky Normal Form.

  30. Chomsky Normal Form (named after Noam Chomsky). A CFG is in Chomsky Normal Form if every production has the form A → BC or A → a, where neither B nor C is the start variable; we also allow S → ε for the start variable S. Converting to Chomsky Normal Form: first replace terminals with new variables, so A → BcDE becomes A → BCDE, C → c; then break up sequences with new variables, giving A → BX, X → CY, Y → DE, C → c.
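
The two conversion steps on this slide can be sketched in Python. This is a sketch under assumptions: the input grammar already has no ε-productions or unit productions, the caller supplies the set of variables, and the fresh names `T_c` (for a terminal `c`) and `X1`, `X2`, … are hypothetical and assumed not to clash with existing symbols. It does not handle the requirement that the start variable stay off right-hand sides.

```python
def to_cnf(rules, variables):
    """Apply the two steps: replace terminals, then break up long bodies.

    rules: set of (head, body) pairs with tuple bodies.
    variables: set of variable names; every other symbol is a terminal.
    """
    result = set()
    counter = 0
    for head, body in sorted(rules):
        if len(body) >= 2:
            # Step 1: replace each terminal c by a new variable T_c -> c.
            new_body = []
            for s in body:
                if s in variables:
                    new_body.append(s)
                else:
                    t = "T_" + s
                    result.add((t, (s,)))
                    new_body.append(t)
            body = tuple(new_body)
        # Step 2: break A -> B1 B2 ... Bk into a chain of binary rules.
        while len(body) > 2:
            counter += 1
            x = f"X{counter}"
            result.add((head, (body[0], x)))
            head, body = x, body[1:]
        result.add((head, body))
    return result

# The slide's example rule A -> B c D E.
cnf = to_cnf({("A", ("B", "c", "D", "E"))}, {"A", "B", "D", "E"})
```

For the slide's rule this produces A → B X1, X1 → T_c X2, X2 → D E, and T_c → c, which mirrors the slide's A → BX, X → CY, Y → DE, C → c with different fresh names.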

  31. Cocke–Younger–Kasami algorithm. Grammar: S → AB | BC, A → BA | a, B → CC | b, C → AB | a. Input: x = baaba. Let x[i, ℓ] = x_i x_{i+1} … x_{i+ℓ−1}. For every substring x[i, ℓ], remember all variables R that derive x[i, ℓ], and store them in a table T[i, ℓ]. [Figure: partially filled table with columns i = 1, …, 5 and rows ℓ = 1, …, 5.]
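
The table T[i, ℓ] can be filled bottom-up, from short substrings to long ones. The following is a minimal sketch, assuming the grammar is already in Chomsky Normal Form and the input is non-empty; the function name `cyk` and the dictionary keyed by `(i, l)` with 1-based `i` (to match the slide's indexing) are choices made here for illustration.

```python
def cyk(rules, x):
    """Fill T[i, l] = set of variables deriving x[i, l] (1-based i).

    rules: set of (head, body) pairs in Chomsky Normal Form, where body is
    either (terminal,) or (variable, variable). Assumes len(x) >= 1.
    """
    n = len(x)
    T = {}
    # Length-1 substrings: variables with a rule A -> x_i.
    for i in range(1, n + 1):
        T[i, 1] = {h for h, b in rules if b == (x[i - 1],)}
    # Longer substrings: split x[i, l] into x[i, k] and x[i + k, l - k].
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):
                for h, b in rules:
                    if len(b) == 2 and b[0] in T[i, k] and b[1] in T[i + k, l - k]:
                        cell.add(h)
            T[i, l] = cell
    return T

# Grammar and input from the slide.
RULES = {
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
}
table = cyk(RULES, "baaba")
accepted = "S" in table[1, 5]
```

For x = baaba the top cell `table[1, 5]` is {S, A, C}, so `accepted` is `True`. The three nested loops over ℓ, i, and k give a running time cubic in the input length, times the number of rules.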
