Chomsky Normal Form

Chomsky Normal Form
• Chomsky Normal Form
  – A context-free grammar is in Chomsky Normal Form (CNF) if every production is of the form:
    • A → BC
    • A → a
    • where A, B, and C are variables and a is a terminal.

Theory Hall of Fame
• Noam Chomsky
  – The Grammar Guy
  – 1928 –
  – b. Philadelphia, PA
  – PhD – UPenn (1955)
    • Linguistics
  – Prof at MIT (Linguistics) (1955 - present)
  – Probably more famous for his leftist political views.

Chomsky Normal Form
• If we can put a CFG into CNF, then we can calculate the "depth" of the longest branch of a parse tree for the derivation of a string.
[Figure: parse-tree fragment in which node A branches to B and C, with a leaf a. Caption: at most 2 branches at every node.]

Chomsky Normal Form
• 3-step process:
  1. Remove ε-productions
  2. Remove unit productions
  3. Remove useless symbols

Removing ε-Productions
• An ε-production is a production of the form
  • A → ε
  – Basic idea
    • Very similar to removing ε-transitions from an NFA-ε
    • Find the set of all variables A such that A ⇒* ε (the set of nullable variables)
    • For all productions that contain a nullable variable on the right-hand side, add a production that eliminates the nullable variable from the right-hand side
Removing ε-Productions
• We must be a bit careful here
  – If ε is in a CFL, then the production S → ε must be in the production set.
  – The algorithm to be described will generate L – {ε}
• Step 1: Find the set of nullable variables:
  – Example:
    • S → AB
    • A → aAA | ε
    • B → bBB | ε
  – All variables are nullable
    • A and B are nullable since A → ε and B → ε
    • S is nullable since S → AB and A and B are nullable
• Step 2: Remove nullable variables
  – For every production A → β where β contains nullable variables, add a new production for each way of deleting nullable variables from β (without adding any new ε-production)
  – Example (same grammar, all variables nullable):
    • Consider: S → AB
      – Add to P: S → A and S → B
    • Consider: A → aAA
      – Add to P: A → aA and A → a
    • Consider: B → bBB
      – Add to P: B → bB and B → b
  – Our grammar now looks like:
    • S → AB | A | B
    • A → aAA | aA | a | ε
    • B → bBB | bB | b | ε
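Steps 1 and 2 above can be sketched in Python. This is a minimal illustration, not part of the original slides: the grammar representation (a dict mapping each variable to a set of tuples of symbols, with the empty tuple standing for ε) and the names `EPSILON`, `nullable_variables`, and `expand_nullable` are assumptions made for the example.

```python
from itertools import product

EPSILON = ()  # assumption: the empty right-hand side stands for ε


def nullable_variables(grammar):
    """Step 1: return every variable A with A =>* ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body derives ε when all of its symbols are nullable;
            # the empty body satisfies this trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def expand_nullable(grammar):
    """Step 2: add each body variant obtained by dropping nullable symbols."""
    nullable = nullable_variables(grammar)
    expanded = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            options = [[(sym,), ()] if sym in nullable else [(sym,)] for sym in body]
            for choice in product(*options):
                candidate = tuple(sym for part in choice for sym in part)
                if candidate != EPSILON:  # never add a new ε body
                    new_bodies.add(candidate)
        if EPSILON in bodies:  # existing ε bodies stay until Step 3
            new_bodies.add(EPSILON)
        expanded[head] = new_bodies
    return expanded


# The example grammar from the slides.
grammar = {
    "S": {("A", "B")},
    "A": {("a", "A", "A"), EPSILON},
    "B": {("b", "B", "B"), EPSILON},
}
expanded = expand_nullable(grammar)
```

On the example grammar, `nullable_variables(grammar)` yields all three variables and `expanded` matches the grammar listed at the end of Step 2, with the ε-productions still present.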
Removing ε-Productions
• Step 3: Remove your ε-productions
  – Example:
    • Remove A → ε and B → ε
    • Our final grammar looks like:
      – S → AB | A | B
      – A → aAA | aA | a
      – B → bBB | bB | b
  – Questions?

Removing Unit Productions
• A unit production is a production of the form
  • A → B where A and B are variables
  – Basic idea
    • Very similar to removing ε-productions
    • For each variable A, find the set of all variables B such that A ⇒* B by just following unit productions (such a B is A-derivable)
    • For all variables B that are A-derivable and for all non-unit productions B → α, add the production A → α
• Step 0: Remove ε-productions using the previous algorithm.
• Step 1: For all variables A, find the set of A-derivable variables:
  – Recursive definition of A-derivable
    1. If A → B then B is A-derivable
    2. If C is A-derivable and C → B (and B ≠ A), then B is A-derivable
    3. No other variables are A-derivable.
  – Example:
    • S → S + T | T
    • T → T * F | F
    • F → (S) | a
  – Let's find the set of S-derivable variables:
    • T is S-derivable since S → T
    • F is S-derivable since T → F and T is S-derivable
  – For all three variables:
    • S-derivable = {T, F}
    • T-derivable = {F}
    • F-derivable = ∅
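The recursive definition of A-derivable in Step 1 amounts to a small graph search over unit productions. The sketch below is one possible way to compute it; the grammar encoding (dict of variable to set of symbol tuples) and the function name `derivable` are assumptions for illustration only.

```python
def derivable(grammar, start):
    """Return the variables reachable from `start` via unit productions only."""
    variables = set(grammar)
    found = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for body in grammar[current]:
            # A unit production has exactly one symbol, and that symbol is a variable.
            if len(body) == 1 and body[0] in variables:
                target = body[0]
                # Rule 2 excludes the starting variable itself (B != A).
                if target != start and target not in found:
                    found.add(target)
                    stack.append(target)
    return found


# The expression grammar from the slides.
grammar = {
    "S": {("S", "+", "T"), ("T",)},
    "T": {("T", "*", "F"), ("F",)},
    "F": {("(", "S", ")"), ("a",)},
}
derivable_sets = {head: derivable(grammar, head) for head in grammar}
```

For the expression grammar this reproduces the sets given above: {T, F} for S, {F} for T, and the empty set for F. Note that F → a is not a unit production because a is a terminal.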
Removing Unit Productions
• Step 2: For each variable A, if B is A-derivable, then for each non-unit production B → β, add the production A → β
  – Example:
    • S → S + T | T
    • T → T * F | F
    • F → (S) | a
    • S-derivable = {T, F}
    • T-derivable = {F}
    • Add to P: S → T * F | (S) | a and T → (S) | a
  – Our new grammar now looks like:
    • S → S + T | T * F | (S) | a | T
    • T → T * F | (S) | a | F
    • F → (S) | a
• Step 3: Remove unit productions
  – Remove S → T, T → F
  – Our final grammar looks like:
    • S → S + T | T * F | (S) | a
    • T → T * F | (S) | a
    • F → (S) | a
  – Questions?

Removing Useless Symbols
• A symbol X is useful for a grammar G = (V, T, P, S) if
  – S ⇒* αXβ ⇒* w where w ∈ L(G)
• In other words, a useful symbol will be used somewhere in the derivation of a string in the language.
• Any symbol that is not useful is useless.
• Useless symbols do not add to the language generated by a grammar, so it's okay to remove them.
• Definitions:
  – We say a symbol X is generating if:
    • X ⇒* w for some string of terminals w
  – We say a symbol X is reachable if:
    • S ⇒* αXβ for some α, β
• Symbols that are useful must be both generating and reachable.
  – Symbols that fail either test (and their associated productions) can be removed
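Returning to Steps 2 and 3 of unit-production removal above: once the A-derivable sets are known, both steps reduce to copying non-unit productions. The following is a minimal sketch under assumed conventions (dict-of-sets grammar, derivable sets passed in as a dict); the name `remove_unit_productions` is hypothetical.

```python
def remove_unit_productions(grammar, derivable_sets):
    """Copy non-unit productions along derivable sets, then drop unit productions."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for head, bodies in grammar.items():
        # Step 3: keep only the head's own non-unit productions.
        new_bodies = {body for body in bodies if not is_unit(body)}
        # Step 2: add the non-unit productions of every derivable variable.
        for other in derivable_sets[head]:
            new_bodies |= {body for body in grammar[other] if not is_unit(body)}
        result[head] = new_bodies
    return result


grammar = {
    "S": {("S", "+", "T"), ("T",)},
    "T": {("T", "*", "F"), ("F",)},
    "F": {("(", "S", ")"), ("a",)},
}
# Derivable sets as computed in Step 1 of the slides.
derivable_sets = {"S": {"T", "F"}, "T": {"F"}, "F": set()}
result = remove_unit_productions(grammar, derivable_sets)
```

The resulting `result` corresponds to the final expression grammar in Step 3, with S → T and T → F gone.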
Removing Useless Symbols
• Algorithm:
  1. Eliminate all non-generating symbols
  2. Eliminate all non-reachable symbols from the resultant grammar.
• Finding generating symbols
  1. All symbols in T are generating
  2. If A → α and all symbols in α are generating, then A is generating.
  3. No other symbols are generating.
• Finding reachable symbols
  1. S is reachable
  2. If A is reachable, and A → α, then all variables in α are reachable.
• Example:
  – S → AB | a
  – A → b
  – B is useless since it is not generating. Eliminate it:
    • S → a
    • A → b
  – Now A is not reachable, eliminate it!
    • S → a
  – Note that you must eliminate non-generating symbols before non-reachable symbols.

Recall our goal
• Chomsky Normal Form
  – A context-free grammar is in Chomsky Normal Form (CNF) if every production is of the form:
    • A → BC
    • A → a
    • where A, B, and C are variables and a is a terminal.
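The useless-symbol algorithm above (generating symbols first, then reachable symbols) can be sketched as follows. The encoding and all function names (`generating_symbols`, `reachable_symbols`, `restrict`, `remove_useless`) are assumptions for this illustration. The last lines also run the two passes in the wrong order to show why the order matters.

```python
def generating_symbols(grammar, terminals):
    """Terminals are generating; a variable is generating if some body is."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(
                all(sym in generating for sym in body) for body in bodies
            ):
                generating.add(head)
                changed = True
    return generating


def reachable_symbols(grammar, start):
    """Collect every symbol that appears in some derivation from `start`."""
    reachable = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        # Terminals and variables without productions have no bodies to follow.
        for body in grammar.get(current, ()):
            for sym in body:
                if sym not in reachable:
                    reachable.add(sym)
                    stack.append(sym)
    return reachable


def restrict(grammar, keep):
    """Drop variables outside `keep` and productions mentioning a dropped symbol."""
    return {
        head: {body for body in bodies if all(sym in keep for sym in body)}
        for head, bodies in grammar.items()
        if head in keep
    }


def remove_useless(grammar, terminals, start):
    step1 = restrict(grammar, generating_symbols(grammar, terminals))
    return restrict(step1, reachable_symbols(step1, start))


# Example from the slides: B has no productions, so it is not generating.
grammar = {"S": {("A", "B"), ("a",)}, "A": {("b",)}}
terminals = {"a", "b"}
cleaned = remove_useless(grammar, terminals, "S")

# Wrong order: reachability first leaves the unreachable A behind.
wrong_order = restrict(
    restrict(grammar, reachable_symbols(grammar, "S")),
    generating_symbols(grammar, terminals),
)
```

Here `cleaned` contains only S → a, while `wrong_order` still contains A → b, because A only becomes unreachable after S → AB has been removed.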
Chomsky Normal Form
• Given a CFG G, there is an equivalent CFG G' in Chomsky Normal Form such that
  – L(G') = L(G) – {ε}
• Step 1:
  – Remove ε-productions
• Step 2:
  – Remove unit productions
• Step 3:
  – Remove useless symbols

Chomsky Normal Form
• After steps 1 – 3:
  – Let's go back to our first example:
    • S → AB | A | B
    • A → aAA | aA | a
    • B → bBB | bB | b
  – Removing unit productions:
    • S → AB | aAA | aA | a | bBB | bB | b
    • A → aAA | aA | a
    • B → bBB | bB | b
  – Note that S, A, and B are all useful.

Chomsky Normal Form
• Step 4:
  – After steps 1 – 3, all productions are of the form:
    • A → a where A is a variable and a is a terminal
    • A → β where |β| ≥ 2 and β contains variables and/or terminals.
  – Step 4: Derive terminals from new variables:
    • For all productions of the 2nd type, A → β, and for all terminals a in β, create a new variable X_a
    • Add a new production X_a → a
    • Replace a in β with X_a
  – Example: define new productions X_a → a and X_b → b and replace each instance of a with X_a, similarly for b
    • Before:
      – S → AB | aAA | aA | a | bBB | bB | b
      – A → aAA | aA | a
      – B → bBB | bB | b
    • New:
      – S → AB | X_a AA | X_a A | a | X_b BB | X_b B | b
      – A → X_a AA | X_a A | a
      – B → X_b BB | X_b B | b
      – X_a → a
      – X_b → b

Chomsky Normal Form
• After steps 1 – 4:
  – All productions are of the form:
    • A → a where A is a variable and a is a terminal
    • A → β where |β| ≥ 2 and β contains only variables.
  – Step 5:
    • For all productions of type 2 where |β| > 2, replace the production with a series of new productions, each having exactly 2 variables on the right
    • Best illustrated with an example
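Step 4 above (deriving terminals from new variables) can be sketched like this. It is an illustration under assumptions: the grammar is a dict of variable to set of symbol tuples, the function name `lift_terminals` is hypothetical, and the generated names `X_a`, `X_b` are assumed not to clash with existing variables.

```python
def lift_terminals(grammar, terminals):
    """Replace terminals inside bodies of length >= 2 with stand-in variables."""
    result = {}
    new_rules = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            if len(body) < 2:
                # Productions of the form A -> a are already fine.
                new_bodies.add(body)
                continue
            replaced = []
            for sym in body:
                if sym in terminals:
                    stand_in = f"X_{sym}"  # assumed-fresh variable name
                    new_rules[stand_in] = {(sym,)}
                    replaced.append(stand_in)
                else:
                    replaced.append(sym)
            new_bodies.add(tuple(replaced))
        result[head] = new_bodies
    result.update(new_rules)
    return result


# The running example after steps 1 - 3.
grammar = {
    "S": {("A", "B"), ("a", "A", "A"), ("a", "A"), ("a",),
          ("b", "B", "B"), ("b", "B"), ("b",)},
    "A": {("a", "A", "A"), ("a", "A"), ("a",)},
    "B": {("b", "B", "B"), ("b", "B"), ("b",)},
}
lifted = lift_terminals(grammar, {"a", "b"})
```

After this pass every body of length 2 or more in `lifted` consists of variables only, which is the precondition Step 5 relies on.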
Chomsky Normal Form
• Step 5:
  – The production:
    • A → BCDBCE
  – would be replaced with
    • A → B Y_1
    • Y_1 → C Y_2
    • Y_2 → D Y_3
    • Y_3 → B Y_4
    • Y_4 → CE
  – Back to our example
    • S → AB | X_a AA | X_a A | a | X_b BB | X_b B | b
    • A → X_a AA | X_a A | a
    • B → X_b BB | X_b B | b
    • X_a → a
    • X_b → b
  – Add productions
    • Y_1 → AA
    • Y_2 → BB
  – Our final grammar
    • S → AB | X_a Y_1 | X_a A | a | X_b Y_2 | X_b B | b
    • A → X_a Y_1 | X_a A | a
    • B → X_b Y_2 | X_b B | b
    • Y_1 → AA
    • Y_2 → BB
    • X_a → a
    • X_b → b
  – Questions?

CNF
• Any CFG can be placed into CNF
• Why bother?
  – Remember that awful CFG we generated last week?
    • Simplification
  – Gives an upper limit on the size of the parse tree
    • The Pumping Lemma will need this.

Questions?
• Next time
  – The Return of the pumping lemma
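The splitting of long right-hand sides shown above can be sketched as a loop that peels off the first variable and hands the remaining tail to a fresh variable. This is an illustration under assumptions: the function name `binarize` is hypothetical, fresh names `Y_1`, `Y_2`, ... are assumed not to clash with existing variables, and identical tails are assumed to share one fresh variable (as Y_1 → AA is shared by S and A in the slides).

```python
def binarize(grammar):
    """Split every body longer than 2 into a chain of two-symbol bodies."""
    result = {}
    fresh = {}  # tail of symbols -> fresh variable deriving exactly that tail

    def add(head, body):
        while len(body) > 2:
            tail = body[1:]
            is_new = tail not in fresh
            if is_new:
                fresh[tail] = f"Y_{len(fresh) + 1}"
            result.setdefault(head, set()).add((body[0], fresh[tail]))
            if not is_new:
                return  # productions for this tail were created earlier
            head, body = fresh[tail], tail
        result.setdefault(head, set()).add(body)

    for head in grammar:
        # Sorting only makes the numbering of fresh variables repeatable.
        for body in sorted(grammar[head]):
            add(head, body)
    return result


# The stand-alone example from the slides.
chain = binarize({"A": {("B", "C", "D", "B", "C", "E")}})
```

For A → BCDBCE the result is the five-production chain listed above, ending in Y_4 → CE. Bodies of length 1 or 2 pass through unchanged.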