Automata Theory Context-Free Languages Normalform

Normal form

Unwanted in a CFG, with respect to successful derivations S ⇒* x with x ∈ Σ*:
– variables that are not used in any successful derivation
– Λ-productions: A → Λ, with A a variable
– unit productions [chain rules]: A → B, with A and B variables

Restricted CFGs with a 'nice' form:
– Chomsky normal form: A → BC, A → σ
– Greibach normal form (⊠): A → σ B_1 ... B_k

Useful etc.

CFG G = (V, Σ, S, P)

Definition
– variable A is live if A ⇒* x for some x ∈ Σ*.
– variable A is reachable if S ⇒* αAβ for some α, β ∈ (Σ ∪ V)*.
– variable A is useful if there is a derivation of the form S ⇒* αAβ ⇒* x for some string x ∈ Σ*.

Useful implies live and reachable, but not conversely. For S → AB | b and A → a, variable A is live and reachable but not useful: B is not live, so no successful derivation uses S → AB.

[M] Exercise 4.51, 4.52, 4.53

Recursion, and an algorithm

Live variables

Construction
– N_0 = ∅
– N_{i+1} = N_i ∪ { A ∈ V | A → α in P, with α ∈ (N_i ∪ Σ)* }

In particular N_1 = { A ∈ V | A → x in P, with x ∈ Σ* }.

N_0 ⊆ N_1 ⊆ N_2 ⊆ ··· ⊆ V and V is finite, so there exists a k such that N_k = N_{k+1}.

A is live iff A ∈ ⋃_{i≥0} N_i = N_k.

The index i corresponds to the (minimal) depth of a derivation tree for A ⇒* x.

Recursion, and an algorithm

Live variables

Construction
– N_0 = ∅
– N_{i+1} = N_i ∪ { A ∈ V | A → α in P, with α ∈ (N_i ∪ Σ)* }

Exercise 4.53(c i).
S → ABC | BaB
A → aA | BaC | aaa
B → bBb | a
C → CA | AC

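The construction of the sets N_i can be sketched as a fixed-point loop. This is a minimal sketch, not part of the slides: the encoding (a dict from variable to right-hand sides, upper-case letters as variables, everything else as terminals) and the name `live_variables` are assumptions made for illustration.

```python
def live_variables(productions):
    """Iterate N_0 = {} , N_{i+1} = N_i + {A | A -> alpha, alpha in (N_i + Sigma)*}.

    Assumed encoding: `productions` maps a variable to a list of right-hand
    sides; each right-hand side is a string, upper-case letters are
    variables, all other characters are terminals.
    """
    live = set()  # N_0
    while True:
        nxt = live | {
            a
            for a, bodies in productions.items()
            if any(all(not s.isupper() or s in live for s in body) for body in bodies)
        }
        if nxt == live:  # N_k = N_{k+1}: fixed point reached
            return live
        live = nxt


# Grammar of Exercise 4.53(c i)
EXERCISE = {
    "S": ["ABC", "BaB"],
    "A": ["aA", "BaC", "aaa"],
    "B": ["bBb", "a"],
    "C": ["CA", "AC"],
}

print(sorted(live_variables(EXERCISE)))  # ['A', 'B', 'S']
```

On the exercise grammar the loop finds A and B in the first round, S in the second, and never C, because both productions for C contain C itself.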
Algorithm, ctd.

Reachable variables

Construction
– N_0 = { S }
– N_{i+1} = N_i ∪ { A ∈ V | B → α_1 A α_2 in P, with B ∈ N_i }

N_0 ⊆ N_1 ⊆ N_2 ⊆ ··· ⊆ V, so there exists a k such that N_k = N_{k+1}.

A is reachable iff A ∈ ⋃_{i≥0} N_i = N_k.

The index i corresponds to the (minimal) length of a derivation S ⇒* αAβ.

Removing useless variables:
– first remove all non-live variables (and the productions that contain them)
– then remove all unreachable variables (and their productions)
Then all variables are useful. This does not work the other way around: removing a non-live variable deletes productions, which can make other variables unreachable.

Algorithm, ctd.

Reachable variables

Construction
– N_0 = { S }
– N_{i+1} = N_i ∪ { A ∈ V | B → α_1 A α_2 in P, with B ∈ N_i }

Exercise 4.53(c i), ctd. After removing the non-live variable C:
S → BaB
A → aA | aaa
B → bBb | a

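The reachable-variable construction can be sketched in the same way. Again this is only an illustration: the string encoding of right-hand sides and the name `reachable_variables` are assumptions, not something the slides prescribe.

```python
def reachable_variables(productions, start="S"):
    """Iterate N_0 = {S}, N_{i+1} = N_i + {A | B -> a1 A a2, B in N_i}.

    Assumed encoding: upper-case letters are variables, everything else
    is a terminal.
    """
    reached = {start}  # N_0
    while True:
        nxt = reached | {
            s
            for b in reached
            for body in productions.get(b, [])
            for s in body
            if s.isupper()
        }
        if nxt == reached:  # fixed point
            return reached
        reached = nxt


# Exercise 4.53(c i) after the non-live variable C has been removed
AFTER_LIVE = {
    "S": ["BaB"],
    "A": ["aA", "aaa"],
    "B": ["bBb", "a"],
}

print(sorted(reachable_variables(AFTER_LIVE)))  # ['B', 'S']
```

Here A is no longer reachable, so the reduced grammar keeps only S → BaB and B → bBb | a.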
Algorithm, ctd.

– remove all non-live variables (and productions that contain them)
– remove all unreachable variables (and productions)
then all variables are useful
does not work the other way around ...

Exercise 4.53(c i), revisited
S → ABC | BaB
A → aA | BaC | aaa
B → bBb | a
C → CA | AC

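A small experiment makes the order dependence concrete. The sketch below runs both orders on the exercise grammar; the helper names (`live`, `reachable`, `restrict`) and the string encoding are assumptions chosen for illustration.

```python
def live(prods):
    """Live variables by fixed-point iteration (upper-case = variable)."""
    n = set()
    while True:
        new = {a for a, bodies in prods.items()
               if any(all(not s.isupper() or s in n for s in b) for b in bodies)}
        if new <= n:
            return n
        n |= new


def reachable(prods, start="S"):
    """Variables reachable from the start variable."""
    n = {start}
    while True:
        new = {s for a in n for b in prods.get(a, []) for s in b if s.isupper()}
        if new <= n:
            return n
        n |= new


def restrict(prods, keep):
    """Drop every variable outside `keep` and every production mentioning one."""
    return {a: [b for b in bodies if all(not s.isupper() or s in keep for s in b)]
            for a, bodies in prods.items() if a in keep}


def live_then_reachable(prods):
    g = restrict(prods, live(prods))
    return restrict(g, reachable(g))


def reachable_then_live(prods):
    g = restrict(prods, reachable(prods))
    return restrict(g, live(g))


EXERCISE = {
    "S": ["ABC", "BaB"],
    "A": ["aA", "BaC", "aaa"],
    "B": ["bBb", "a"],
    "C": ["CA", "AC"],
}

print(live_then_reachable(EXERCISE))   # {'S': ['BaB'], 'B': ['bBb', 'a']}
print(reachable_then_live(EXERCISE))   # A survives although it is useless
```

In the original grammar every variable is reachable through S → ABC, so removing unreachable variables first changes nothing. Removing the non-live C afterwards deletes S → ABC, and A is left behind unreachable.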
Removing Λ-productions

Idea, by example:
A → BCDCB
B → b | Λ
C → c | Λ
D → d

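In this example B and C can derive Λ, so each occurrence of B or C in BCDCB may independently be kept or dropped. A minimal sketch that enumerates the resulting right-hand sides (the function name `variants` is an assumption for illustration):

```python
from itertools import product


def variants(body, nullable):
    """All strings obtained from `body` by deleting any subset of the
    occurrences of nullable variables (the original body included)."""
    choices = [(s, "") if s in nullable else (s,) for s in body]
    return {"".join(pick) for pick in product(*choices)}


rhs = variants("BCDCB", {"B", "C"})
print(len(rhs))  # 16
print(sorted(rhs, key=lambda w: (-len(w), w)))
```

There are four nullable occurrences, hence 2^4 = 16 choices. All of them give different strings here because the D in the middle is never removed; the shortest is A → D.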
Definition
variable A is nullable iff A ⇒* Λ

Theorem
– if A → Λ then A is nullable
– if A → B_1 B_2 ... B_k and all B_i are nullable, then A is nullable
[M] Def 4.26 / Exercise 4.48

Construction
– N_0 = ∅
– N_{i+1} = N_i ∪ { A ∈ V | A → α in P, with α ∈ N_i* }

In particular N_1 = { A ∈ V | A → Λ in P }.

N_0 ⊆ N_1 ⊆ N_2 ⊆ ··· ⊆ V, so there exists a k such that N_k = N_{k+1}.

A is nullable iff A ∈ ⋃_{i≥0} N_i = N_k.

Construction
– identify nullable variables
– for every production A → α add A → β, where β is obtained from α by removing one or more nullable variables
– remove all Λ-productions (and all productions A → A)

Grammar for { a^i b^j c^k | i = j or i = k }
S → TU | V
T → aTb | Λ
U → cU | Λ
V → aVc | W
W → bW | Λ

Example nullable

Grammar for { a^i b^j c^k | i = j or i = k }
S → TU | V
T → aTb | Λ
U → cU | Λ
V → aVc | W
W → bW | Λ

N_1 = { T, U, W }: variables with a production whose right-hand side is Λ
N_2 = { T, U, W } ∪ { S, V }: variables with a production whose right-hand side is in { T, U, W }*
N_3 = N_2 = { T, U, W, S, V }: no new variables are found, so all variables are nullable

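The stages N_1, N_2, ... of this example can be reproduced with a short loop. This is a sketch under an assumed encoding (right-hand sides as strings, the empty string standing for Λ); the name `nullable_stages` is hypothetical.

```python
def nullable_stages(productions):
    """Return [N_1, N_2, ..., N_k] for the nullable construction.

    Assumed encoding: a right-hand side is a string of symbols and the
    empty string "" stands for Lambda.
    """
    stages = []
    current = set()  # N_0
    while True:
        nxt = current | {
            a
            for a, bodies in productions.items()
            # all(...) over an empty body is True, so A -> Lambda qualifies
            if any(all(s in current for s in body) for body in bodies)
        }
        if nxt == current:
            return stages
        stages.append(nxt)
        current = nxt


G = {
    "S": ["TU", "V"],
    "T": ["aTb", ""],
    "U": ["cU", ""],
    "V": ["aVc", "W"],
    "W": ["bW", ""],
}

for i, stage in enumerate(nullable_stages(G), start=1):
    print(f"N_{i} =", sorted(stage))
# N_1 = ['T', 'U', 'W']
# N_2 = ['S', 'T', 'U', 'V', 'W']
```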
Example nullable, ctd

Add all productions in which (any number of) nullable variables are removed:

original          added
S → TU | V        S → T | U | Λ
T → aTb | Λ       T → ab
U → cU | Λ        U → c
V → aVc | W       V → ac | Λ
W → bW | Λ        W → b

Remove all Λ-productions:
S → TU | V | T | U
T → aTb | ab
U → cU | c
V → aVc | W | ac
W → bW | b

[M] Ex. 4.31

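The whole construction (find nullable variables, add the shortened productions, drop Λ-productions and productions A → A) fits in a few lines. The following is a sketch, not the textbook's formulation: names and the string encoding ("" for Λ, single upper-case letters as variables) are assumptions.

```python
from itertools import product


def nullable_variables(prods):
    """Fixed point of the nullable construction ("" encodes Lambda)."""
    n = set()
    while True:
        new = {a for a, bodies in prods.items()
               if any(all(s in n for s in b) for b in bodies)}
        if new <= n:
            return n
        n |= new


def remove_lambda_productions(prods):
    """Add every variant with nullable variables removed, then drop
    Lambda-productions and productions of the form A -> A."""
    nullable = nullable_variables(prods)
    result = {}
    for a, bodies in prods.items():
        new_bodies = []
        for body in bodies:
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for pick in product(*choices):
                beta = "".join(pick)
                if beta and beta != a and beta not in new_bodies:
                    new_bodies.append(beta)
        result[a] = new_bodies
    return result


G = {
    "S": ["TU", "V"],
    "T": ["aTb", ""],
    "U": ["cU", ""],
    "V": ["aVc", "W"],
    "W": ["bW", ""],
}

for a, bodies in remove_lambda_productions(G).items():
    print(a, "→", " | ".join(bodies))
```

The order of alternatives in the printed output may differ from the slide, but the sets of productions agree.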
Removing Λ-productions

Theorem
For every CFG G there is a CFG G_1 without Λ-productions such that L(G_1) = L(G) − {Λ}.

Proof. ...

[M] Thm 4.27

Removing unit productions

Assume Λ-productions have been removed.

Variable B is A-derivable if
– B ≠ A, and
– A ⇒* B (using only unit productions)

Construction
– N_1 = { B ∈ V | B ≠ A and A → B in P }
– N_{i+1} = N_i ∪ { C ∈ V | C ≠ A and B → C in P, with B ∈ N_i }

N_1 ⊆ N_2 ⊆ ··· ⊆ V, so there exists a k such that N_k = N_{k+1}.

B is A-derivable iff B ∈ ⋃_{i≥1} N_i = N_k.

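The sets N_i for a fixed variable A can be computed as follows. This is a minimal sketch with assumed names and encoding (a unit production is a right-hand side consisting of one upper-case letter); the grammar used is the Λ-free grammar for { a^i b^j c^k | i = j or i = k } obtained earlier.

```python
def derivable(prods, a):
    """Variables B != a with a =>* B using unit productions only."""

    def unit_targets(v):
        # right-hand sides of unit productions v -> B with B != a
        return {b for b in prods.get(v, [])
                if len(b) == 1 and b.isupper() and b != a}

    n = unit_targets(a)  # N_1
    while True:
        nxt = n | {c for b in n for c in unit_targets(b)}  # N_{i+1}
        if nxt == n:
            return n
        n = nxt


G = {
    "S": ["TU", "V", "T", "U"],
    "T": ["aTb", "ab"],
    "U": ["cU", "c"],
    "V": ["aVc", "W", "ac"],
    "W": ["bW", "b"],
}

print(sorted(derivable(G, "S")))  # ['T', 'U', 'V', 'W']
print(sorted(derivable(G, "V")))  # ['W']
print(sorted(derivable(G, "T")))  # []
```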
Removing unit productions

Construction
– for each A ∈ V, identify the A-derivable variables
– for every pair (A, B) where B is A-derivable, and every production B → α, add A → α
– remove all unit productions

Grammar for { a^i b^j c^k | i = j or i = k }
S → TU | V | T | U
T → aTb | ab
U → cU | c
V → aVc | W | ac
W → bW | b

Example unit productions

S → TU | V | T | U
T → aTb | ab
U → cU | c
V → aVc | W | ac
W → bW | b

S-derivable: N_1 = { V, T, U }, N_2 = { V, T, U, W }
V-derivable: { W }

New productions:
S → aTb | ab         (from T)
S → cU | c           (from U)
S → aVc | W | ac     (from V)
S → bW | b           (from W)
V → bW | b           (from W)

Remove unit productions:
S → TU | aTb | ab | cU | c | aVc | ac | bW | b
T → aTb | ab
U → cU | c
V → aVc | ac | bW | b
W → bW | b

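The example can be checked mechanically. The sketch below copies the non-unit productions of every A-derivable variable to A, which gives the same result as adding all productions and then deleting the unit ones. Function names and the string encoding are assumptions for illustration.

```python
def is_unit(body):
    """A unit production has a single variable (upper-case letter) as body."""
    return len(body) == 1 and body.isupper()


def derivable(prods, a):
    """Variables B != a reachable from a via unit productions only."""
    found, todo = set(), [a]
    while todo:
        v = todo.pop()
        for body in prods.get(v, []):
            if is_unit(body) and body != a and body not in found:
                found.add(body)
                todo.append(body)
    return found


def remove_unit_productions(prods):
    result = {}
    for a, bodies in prods.items():
        new_bodies = [b for b in bodies if not is_unit(b)]
        for b_var in sorted(derivable(prods, a)):
            for body in prods[b_var]:
                if not is_unit(body) and body not in new_bodies:
                    new_bodies.append(body)
        result[a] = new_bodies
    return result


G = {
    "S": ["TU", "V", "T", "U"],
    "T": ["aTb", "ab"],
    "U": ["cU", "c"],
    "V": ["aVc", "W", "ac"],
    "W": ["bW", "b"],
}

for a, bodies in remove_unit_productions(G).items():
    print(a, "→", " | ".join(bodies))
```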
Definition
A CFG is in Chomsky normal form (CNF) if all productions are of the form
– A → BC, with variables A, B, C
– A → σ, with variable A and terminal σ

Theorem
For every CFG G there is a CFG G_1 in CNF such that L(G_1) = L(G) − {Λ}.

[M] Def 4.29, Thm 4.30

Construction ChNF

Construction
1 remove Λ-productions
2 remove unit productions
3 introduce variables for terminals: X_σ → σ
4 split long productions

Step 3: A → aBabA is replaced by
X_a → a
X_b → b
A → X_a B X_a X_b A

Step 4: A → ACBA is replaced by
A → A Y_1
Y_1 → C Y_2
Y_2 → B A

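Step 4 can be sketched as a loop that peels off the first symbol and introduces a fresh variable for the rest. The representation (a production as a pair of head and tuple of symbols) and the generator of fresh names Y1, Y2, ... are assumptions made for this illustration.

```python
from itertools import count


def split_long(head, body, fresh):
    """Split head -> X1 X2 ... Xn (all variables, n >= 2) into productions
    with exactly two variables on the right-hand side.

    `fresh` is an iterator yielding unused variable names (assumed).
    """
    rules = []
    while len(body) > 2:
        y = next(fresh)                      # new variable for the tail
        rules.append((head, (body[0], y)))
        head, body = y, body[1:]
    rules.append((head, tuple(body)))
    return rules


fresh_names = (f"Y{i}" for i in count(1))
for head, body in split_long("A", ("A", "C", "B", "A"), fresh_names):
    print(head, "→", " ".join(body))
# A → A Y1
# Y1 → C Y2
# Y2 → B A
```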
ChNF, example

Grammar for { a^i b^j c^k | i = j or i = k }
S → TU | V
T → aTb | Λ
U → cU | Λ
V → aVc | W
W → bW | Λ

After removing Λ-productions and unit productions, we obtain (see before)
S → TU | aTb | ab | cU | c | aVc | ac | bW | b
T → aTb | ab
U → cU | c
V → aVc | ac | bW | b
W → bW | b

Now introduce productions for the terminals:
X_a → a
X_b → b
X_c → c
S → TU | X_a T X_b | X_a X_b | X_c U | c | X_a V X_c | X_a X_c | X_b W | b
T → X_a T X_b | X_a X_b
U → X_c U | c
V → X_a V X_c | X_a X_c | X_b W | b
W → X_b W | b

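Step 3 on this grammar can be sketched as below. Right-hand sides of length one are terminals at this point (unit productions are gone) and stay as they are; in longer right-hand sides every terminal σ is replaced by a variable written here as `Xa`, `Xb`, `Xc`. The function name, the tuple representation and that naming scheme are assumptions for illustration.

```python
def introduce_terminal_variables(prods):
    """Replace terminals in right-hand sides of length >= 2 by variables
    X<sigma> and add X<sigma> -> sigma.

    Assumed input encoding: bodies are strings, lower-case letters are
    terminals, and there are no Lambda- or unit productions.
    """
    result = {}
    used = set()
    for a, bodies in prods.items():
        result[a] = []
        for body in bodies:
            if len(body) == 1:
                result[a].append((body,))    # A -> sigma is already fine
            else:
                used.update(s for s in body if s.islower())
                result[a].append(tuple(f"X{s}" if s.islower() else s for s in body))
    for sigma in sorted(used):
        result[f"X{sigma}"] = [(sigma,)]
    return result


G = {
    "S": ["TU", "aTb", "ab", "cU", "c", "aVc", "ac", "bW", "b"],
    "T": ["aTb", "ab"],
    "U": ["cU", "c"],
    "V": ["aVc", "ac", "bW", "b"],
    "W": ["bW", "b"],
}

for a, bodies in introduce_terminal_variables(G).items():
    print(a, "→", " | ".join(" ".join(b) for b in bodies))
```

Note that W → b is kept as a terminal production, just like S → b and V → b.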
ChNF, example ctd.

Only a few productions are too long:
S → X_a T X_b | X_a V X_c
T → X_a T X_b
V → X_a V X_c

Split these long productions ...
