Cleaning Up Grammars

We can "simplify" grammars to a great extent. Some of the things we can do are:

1. Get rid of useless symbols: those that do not participate in any derivation of a terminal string.
2. Get rid of $\epsilon$-productions: those of the form variable $\to \epsilon$.
   ✦ Well, sort of; you lose the ability to generate $\epsilon$ as a string in the language.
3. Get rid of unit productions: those of the form variable $\to$ variable.
4. Chomsky normal form: the only production forms are variable $\to$ two variables and variable $\to$ terminal.

Useless Symbols

In order for a symbol $X$ to be useful, it must:

1. Derive some terminal string (possibly $X$ is a terminal).
2. Be reachable from the start symbol; i.e., $S \Rightarrow^* \alpha X \beta$.

• Note that $X$ wouldn't really be useful if $\alpha$ or $\beta$ included a symbol that didn't satisfy (1), so it is important that (1) be tested first, and symbols that don't derive terminal strings be eliminated before testing (2).

Finding Symbols That Don't Derive Any Terminal String

Recursive construction:

Basis: A terminal surely derives a terminal string.
Induction: If $A$ is the head of a production whose body is $X_1 X_2 \cdots X_k$, and each $X_i$ is known to derive a terminal string, then surely $A$ derives a terminal string.

• Keep going until no more symbols that derive terminal strings are discovered.

Example

$S \to AB \mid C$; $A \to 0B \mid C$; $B \to 1 \mid A0$; $C \to AC \mid C1$.

• Round 1: 0 and 1 are "in."
• Round 2: $B \to 1$ says $B$ is in.
• Round 3: $A \to 0B$ says $A$ is in.
• Round 4: $S \to AB$ says $S$ is in.
• Round 5: Nothing more can be added.
• Thus, $C$ can be eliminated, along with any production that mentions it, leaving $S \to AB$; $A \to 0B$; $B \to 1 \mid A0$.

Finding Symbols That Cannot Be Derived From the Start Symbol

Another recursive algorithm:

Basis: $S$ is "in."
Induction: If variable $A$ is in, then so is every symbol in the production bodies for $A$.

• Keep going until no more symbols derivable from $S$ can be found.

Example

$S \to AB$; $A \to 0B$; $B \to 1 \mid A0$.

• Round 1: $S$ is in.
• Round 2: $A$ and $B$ are in.
• Round 3: 0 and 1 are in.
• Round 4: Nothing can be added.
• In this case, all symbols are derivable from $S$, so there is no change to the grammar.
• The reader has an example where not only are there symbols not derivable from $S$, but you must eliminate first the symbols that don't derive terminal strings, or you get the wrong grammar.

Eliminating $\epsilon$-Productions

A variable $A$ is nullable if $A \Rightarrow^* \epsilon$. Find the nullable variables by a recursive algorithm:

Basis: If $A \to \epsilon$ is a production, then $A$ is nullable.
Induction: If $A$ is the head of a production whose body consists of only nullable symbols, then $A$ is nullable.

• Once we have the nullable symbols, we can add additional productions and then throw away the productions of the form $A \to \epsilon$ for any $A$.
• If $A \to X_1 X_2 \cdots X_k$ is a production, add all productions that can be formed by eliminating some or all of those $X_i$'s that are nullable.
   ✦ But, don't eliminate all $k$ if they are all nullable.

Example

If $A \to BC$ is a production, and both $B$ and $C$ are nullable, add $A \to B \mid C$.

Eliminating Unit Productions

1. Eliminate useless symbols and $\epsilon$-productions.
2. Discover those pairs of variables $(A, B)$ such that $A \Rightarrow^* B$.
   ✦ Because there are no $\epsilon$-productions, this derivation can only use unit productions.
   ✦ Thus, we can find the pairs by computing reachability in a graph where nodes = variables, and arcs = unit productions.
3. Replace each combination where $A \Rightarrow^* B \Rightarrow \alpha$ and $\alpha$ is other than a single variable by $A \to \alpha$.
   ✦ I.e., "short circuit" sequences of unit productions, which must eventually be followed by some other kind of production.

Remove all unit productions.

Chomsky Normal Form

0. Get rid of useless symbols, $\epsilon$-productions, and unit productions (already done).
1. Get rid of productions whose bodies are mixes of terminals and variables, or consist of more than one terminal.
2. Break up production bodies longer than 2.

• Result: All productions are of the form $A \to BC$ or $A \to a$.

No Mixed Bodies

1. For each terminal $a$, introduce a new variable $A_a$, with one production $A_a \to a$.
2. Replace $a$ in any body where it is not the entire body by $A_a$.
   ✦ Now, every body is either a single terminal or it consists only of variables.
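The two "No Mixed Bodies" steps can be sketched in Python. This is only an illustrative sketch, not part of the original notes: the grammar representation (a dict from head to a list of bodies, each body a list of symbol strings), the function name `remove_mixed_bodies`, and the naming scheme `"A_" + terminal` for the new variables are all assumptions made here, and the sketch assumes no existing variable already uses such a name. It also introduces $A_a$ only for terminals that actually occur in a body of length two or more, a small deviation from step 1 that avoids creating unreachable variables.

```python
def remove_mixed_bodies(productions, terminals):
    """Replace each terminal a by a stand-in variable A_a in every body of length >= 2.

    productions: dict mapping a head to a list of bodies (lists of symbols).
    terminals:   set of terminal symbols.
    Naming new variables "A_" + a is a hypothetical convention for this sketch.
    """
    result = {}
    used = set()  # terminals that needed a stand-in variable
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            if len(body) == 1:
                # The symbol is the entire body, so it is left alone.
                new_bodies.append(list(body))
                continue
            new_body = []
            for sym in body:
                if sym in terminals:
                    used.add(sym)
                    new_body.append("A_" + sym)
                else:
                    new_body.append(sym)
            new_bodies.append(new_body)
        result[head] = new_bodies
    # Step 1: one production A_a -> a for each terminal that was replaced.
    for a in sorted(used):
        result["A_" + a] = [[a]]
    return result


grammar = {"A": [["0", "B", "1"]], "B": [["1"]]}
cleaned = remove_mixed_bodies(grammar, {"0", "1"})
```

Here the body of $B \to 1$ is untouched because the terminal is the entire body, while the body `0 B 1` is rewritten to use the stand-in variables.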
Example

$A \to 0B1$ becomes $A_0 \to 0$; $A_1 \to 1$; $A \to A_0 B A_1$.

Making Bodies Short

If we have a production like $A \to BCDE$, we can introduce some new variables that allow the variables of the body to be introduced one at a time.

• A body of length $k$ requires $k - 2$ new variables.
• Example: Introduce $F$ and $G$; replace $A \to BCDE$ by $A \to BF$; $F \to CG$; $G \to DE$.

Summary Theorem

If $L$ is any CFL, there is a grammar $G$ that generates $L - \{\epsilon\}$, for which each production is of the form $A \to BC$ or $A \to a$, and there are no useless symbols.

CFL Pumping Lemma

Similar to the regular-language PL, but you have to pump two strings in the middle of the string, in tandem (i.e., the same number of copies of each). Formally:

• $\forall$ CFL $L$
• $\exists$ integer $n$
• $\forall z$ in $L$, with $|z| \ge n$
• $\exists\, uvwxy = z$ such that $|vwx| \le n$ and $|vx| > 0$
• $\forall i \ge 0$, $uv^i w x^i y$ is in $L$.

Outline of Proof of PL

• Let there be a Chomsky-normal-form CFG for $L$ with $m$ variables. Pick $n = 2^m$.
• Because CNF grammars have bodies of no more than 2 symbols, the parse tree of a string $z$ of length $\ge n$ must have some path with at least $m + 1$ variables.
• Thus, some variable must appear twice on the path.
   ✦ Compare with the DFA argument about a path longer than the number of states.
• Focus on some path that is as long as any path in the tree. In this path, we can find a duplication of some variable $A$ among the bottom $m + 1$ variables on the path.
   ✦ Let the lower $A$ derive $w$ and the upper $A$ derive $vwx$.
• CNF guarantees us that $|vwx| \le n$ and $vx \ne \epsilon$.
• By repeatedly replacing the lower $A$'s tree by the upper $A$'s tree, we see $uv^i w x^i y$ has a parse tree for all $i > 1$.
   ✦ And replacing the upper by the lower shows the case $i = 0$; i.e., $uwy$ is in $L$.

Example

$L = \{0^{k^2} \mid k \text{ is any integer}\}$ is not a CFL.

• Suppose it were. Then let $n$ be the PL constant for $L$.
• Consider $z = 0^{n^2}$. We can write $z = uvwxy$, with $|vwx| \le n$ and $|vx| > 0$.
• Then $uvvwxxy$ is in $L$. But $n^2 < |uvvwxxy| \le n^2 + n < (n+1)^2$, so there is no perfect square that $|uvvwxxy|$ could be.
• By "proof by contradiction," $L$ is not a CFL.
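The arithmetic at the heart of this example can be checked numerically. The following is a small sanity check, not a proof and not part of the original notes; the function names are made up for illustration. Pumping once ($i = 2$) adds $|vx|$ symbols, so $|uvvwxxy| = n^2 + |vx|$ with $1 \le |vx| \le |vwx| \le n$, and the check confirms that none of these lengths is a perfect square for the values of $n$ tried.

```python
import math


def is_perfect_square(m):
    """Return True if the non-negative integer m is a perfect square."""
    r = math.isqrt(m)
    return r * r == m


def pumped_lengths_avoid_squares(n):
    """Check that n^2 + d is never a perfect square for 1 <= d <= n.

    Here d plays the role of |vx|, which the pumping lemma bounds by
    0 < |vx| <= |vwx| <= n.
    """
    return all(not is_perfect_square(n * n + d) for d in range(1, n + 1))


# Try a range of candidate pumping-lemma constants.
all_ok = all(pumped_lengths_avoid_squares(n) for n in range(1, 500))
```

This matches the inequality $n^2 < n^2 + d \le n^2 + n < (n+1)^2 = n^2 + 2n + 1$, which holds for every $n \ge 1$.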