Compiler Construction Lecture 4: Lexical analysis in the real world 2020-01-17 Michael Engel Includes material by Jan Christian Meyer
Overview
• NFA to DFA conversion
  • Subset construction algorithm
• DFA state minimization:
  • Hopcroft's algorithm
  • Myhill-Nerode method
• Using a scanner generator
  • lex syntax and usage
  • lex examples
Compiler Construction 04: Lexical analysis in the real world
What have we achieved so far?
• We know a method to convert a regular expression, (all | and), into a nondeterministic finite automaton (NFA) using the McNaughton, Thompson and Yamada algorithm
[Figure: NFA for (all | and)]
Overhead of constructed NFAs
Let’s look at another example: a(b|c)*
• Construct the simple NFAs for a, b and c
[Figure: three two-state NFAs with states s0 and s1 (for a), s2 and s3 (for b), s4 and s5 (for c)]
• Construct the NFA for b|c
[Figure: NFA for b|c: new states s6 and s7 are joined to the NFAs for b and c by ε-transitions]
Overhead of constructed NFAs
• Now construct the NFA for (b|c)*
[Figure: NFA for (b|c)*: new states s8 and s9 are added around the NFA for b|c with ε-transitions]
• Looks pretty complex already? We're not even finished…
Overhead of constructed NFAs
• Finally, construct the NFA for a(b|c)*
[Figure: NFA for a(b|c)* with ten states s0 to s9; the NFA for a is joined to the NFA for (b|c)* by an ε-transition]
• This NFA has many more states than a minimal human-built DFA:
[Figure: two-state DFA: s0 moves to s1 on a; s1 loops to itself on b, c]
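To make the overhead concrete, here is a minimal Python sketch of a Thompson-style construction that only counts states and edges. The tuple encoding of the regular expression, the `build` function and the `EPS` marker are assumptions made for this illustration and are not part of the lecture material.

```python
from itertools import count

EPS = None  # label used for ε-transitions in this sketch


def build(node, fresh, edges):
    """Return (start, accept) of the NFA fragment for one AST node."""
    kind = node[0]
    if kind == "sym":                      # single character: 2 new states
        s, f = next(fresh), next(fresh)
        edges.append((s, node[1], f))
        return s, f
    if kind == "cat":                      # concatenation: 1 new ε-edge
        s1, f1 = build(node[1], fresh, edges)
        s2, f2 = build(node[2], fresh, edges)
        edges.append((f1, EPS, s2))
        return s1, f2
    if kind == "alt":                      # alternation: 2 new states, 4 ε-edges
        s1, f1 = build(node[1], fresh, edges)
        s2, f2 = build(node[2], fresh, edges)
        s, f = next(fresh), next(fresh)
        edges.extend([(s, EPS, s1), (s, EPS, s2), (f1, EPS, f), (f2, EPS, f)])
        return s, f
    if kind == "star":                     # closure: 2 new states, 4 ε-edges
        s1, f1 = build(node[1], fresh, edges)
        s, f = next(fresh), next(fresh)
        edges.extend([(s, EPS, s1), (f1, EPS, f), (s, EPS, f), (f1, EPS, s1)])
        return s, f
    raise ValueError(f"unknown node kind: {kind}")


# a(b|c)* written as a nested tuple (hypothetical AST encoding)
ast = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "b"), ("sym", "c"))))
fresh, edges = count(), []
start, accept = build(ast, fresh, edges)
num_states = next(fresh)                   # states were numbered 0 .. num_states - 1
num_eps = sum(1 for _, label, _ in edges if label is EPS)
print(num_states, "states,", len(edges), "edges,", num_eps, "of them ε-edges")
```

Under these assumptions the construction yields ten states for a(b|c)*, compared with two states in the hand-built DFA.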
From NFA to DFA
• An NFA is not really helpful, since its implementation is not obvious
• We know: every DFA is also an NFA (without ε-transitions)
• Every NFA can also be converted to an equivalent DFA (this can be proven by induction; we just show the construction)
• The method to do this is called subset construction:
  NFA: (Q_N, Σ, δ_N, n0, F_N)
  DFA: (Q_D, Σ, δ_D, d0, F_D)
  The alphabet Σ stays the same
  The set of states Q_N, transition function δ_N, start state n0 and set of accepting states F_N are modified
Subset construction algorithm
Idea of the algorithm: find sets of NFA states that are equivalent (due to ε-transitions) and join these to form the states of a DFA
ε-closure(S): contains the set of states S and any states in the NFA that can be reached from one of the states in S along paths that contain only ε-transitions (these are identical to a state in S)
    q0 ← ε-closure({n0});
    Q_D ← {q0};
    WorkList ← {q0};
    while (WorkList != ∅) do
        remove q from WorkList;
        for each character c ∈ Σ do
            t ← ε-closure(δ_N(q, c));
            δ_D[q, c] ← t;
            if t ∉ Q_D then
                add t to Q_D and to WorkList;
        end;
    end;
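The pseudocode can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the lecture's reference implementation: the NFA is assumed to be a dictionary mapping (state, label) pairs to sets of successor states, `EPS` is a made-up marker for ε, and empty target sets are skipped instead of being added as a "dead" DFA state. The three-state NFA used to try it out is hypothetical as well.

```python
EPS = None  # assumed marker for ε-transitions


def epsilon_closure(states, delta):
    """All states reachable from `states` along ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(delta, n0, alphabet):
    """Return (q0, Q_D, delta_D) for the NFA given by delta and start state n0."""
    q0 = epsilon_closure({n0}, delta)
    dfa_states = {q0}
    dfa_delta = {}
    worklist = [q0]
    while worklist:
        q = worklist.pop()
        for c in alphabet:
            # delta_N(q, c): union of the c-successors of every NFA state in q
            moved = {t for s in q for t in delta.get((s, c), ())}
            t = epsilon_closure(moved, delta)
            if not t:
                continue  # assumption: leave this transition undefined
            dfa_delta[(q, c)] = t
            if t not in dfa_states:
                dfa_states.add(t)
                worklist.append(t)
    return q0, dfa_states, dfa_delta


# Hypothetical NFA: 0 -ε-> 1, 1 -a-> 1, 1 -b-> 2
nfa = {(0, EPS): {1}, (1, "a"): {1}, (1, "b"): {2}}
q0, dfa_states, dfa_delta = subset_construction(nfa, 0, "ab")
for (q, c), t in dfa_delta.items():
    print(sorted(q), c, "->", sorted(t))
```

Using `frozenset` for DFA states makes each set of NFA states hashable, so it can serve directly as a member of Q_D and as a key of the DFA transition table.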
Subset construction example
[Figure: NFA with states n0 to n9 and transitions on a, b, c and ε]
Transition table δ_N:
          a     b     c     ε
    n0    n1    –     –     –
    n1    –     –     –     n2
    n2    –     –     –     n3, n9
    n3    –     –     –     n4, n6
    n4    –     n5    –     –
    n5    –     –     –     n8
    n6    –     –     n7    –
    n7    –     –     –     n8
    n8    –     –     –     n3, n9
    n9    –     –     –     –
Initialization:
    q0 ← {n0};
    Q_D ← {{n0}};
    WorkList ← {{n0}};
Subset construction example
while-loop iteration 1:
    WorkList = {{n0}};
    q ← {n0};
    c ← 'a':
        t ← ε-closure(δ_N(q, c))
          = ε-closure(δ_N(n0, 'a'))
          = ε-closure({n1})
          = {n1, n2, n3, n4, n6, n9}
        δ_D[{n0}, 'a'] ← {n1, n2, n3, n4, n6, n9};
        Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}};
        WorkList ← {{n1, n2, n3, n4, n6, n9}};
Subset construction example
while-loop iteration 1 (continued):
    q ← {n0};
    c ← 'b', 'c':
        t ← {}
        no change to Q_D, WorkList
We will skip the iterations of the for loop that do not change Q_D from now on
Subset construction example
while-loop iteration 2:
    WorkList = {{n1, n2, n3, n4, n6, n9}};
    q ← {n1, n2, n3, n4, n6, n9};
    c ← 'b':
        t ← ε-closure(δ_N(q, c))
          = ε-closure(δ_N(q, 'b'))
          = ε-closure({n5})
          = {n5, n8, n9, n3, n4, n6}
        δ_D[q, 'b'] ← {n5, n8, n9, n3, n4, n6};
        Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}, {n5, n8, n9, n3, n4, n6}};
        WorkList ← {{n5, n8, n9, n3, n4, n6}};
Subset construction example
while-loop iteration 2 (continued):
    q ← {n1, n2, n3, n4, n6, n9};
    c ← 'c':
        t ← ε-closure(δ_N(q, c))
          = ε-closure(δ_N(q, 'c'))
          = ε-closure({n7})
          = {n7, n8, n9, n3, n4, n6}
        δ_D[q, 'c'] ← {n7, n8, n9, n3, n4, n6};
        Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}, {n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}};
        WorkList ← {{n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}};
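The three ε-closures computed in iterations 1 and 2 can be checked mechanically. The following Python sketch encodes only the ε column of the transition table δ_N; the names `EPS_EDGES` and `epsilon_closure` are chosen for this illustration.

```python
# ε column of δ_N from the example; states without ε-transitions are omitted
EPS_EDGES = {
    "n1": ["n2"],
    "n2": ["n3", "n9"],
    "n3": ["n4", "n6"],
    "n5": ["n8"],
    "n7": ["n8"],
    "n8": ["n3", "n9"],
}


def epsilon_closure(states):
    """States reachable from `states` along ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in EPS_EDGES.get(s, []):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


c1 = epsilon_closure({"n1"})  # reached from n0 on 'a'
c5 = epsilon_closure({"n5"})  # reached on 'b'
c7 = epsilon_closure({"n7"})  # reached on 'c'
print(sorted(c1), sorted(c5), sorted(c7), sep="\n")
```

The closures of n5 and n7 differ only in their first element because both lead through n8 back to n3 and on to n9.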
Subset construction example
while-loop iteration 3:
    q ← {n7, n8, n9, n3, n4, n6} (removed from WorkList);
    c ← 'b', 'c':
        t ← ε-closure(δ_N(q, c))
          = ε-closure({n5}) = {n5, n8, n9, n3, n4, n6} for 'b'
          = ε-closure({n7}) = {n7, n8, n9, n3, n4, n6} for 'c'
        // we ran around the graph once!
No new states are added to Q_D in this and the following iteration!
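Collecting the δ_D entries from the trace gives a DFA with four states. The sketch below writes it as a Python table and runs it on a few inputs. The short names d0 to d3 are made up for this illustration (d0 = {n0}, d1 = {n1, n2, n3, n4, n6, n9}, d2 = {n5, n8, n9, n3, n4, n6}, d3 = {n7, n8, n9, n3, n4, n6}), and the set of accepting states rests on the assumption that n9 is the accepting state of the NFA, which the slides do not state explicitly.

```python
# DFA transition table assembled from the trace; missing entries mean "reject"
DFA = {
    ("d0", "a"): "d1",
    ("d1", "b"): "d2",
    ("d1", "c"): "d3",
    ("d2", "b"): "d2",
    ("d2", "c"): "d3",
    ("d3", "b"): "d2",
    ("d3", "c"): "d3",
}
# Assumption: n9 is the NFA's accepting state, so every DFA state containing n9 accepts
ACCEPTING = {"d1", "d2", "d3"}


def accepts(word, start="d0"):
    """Run the table-driven DFA on `word`; one table lookup per character."""
    state = start
    for ch in word:
        state = DFA.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


for w in ["a", "abcb", "", "ba", "aa"]:
    print(repr(w), accepts(w))
```

In contrast to the NFA, each input character costs exactly one lookup here, which is why the DFA form is the one that is straightforward to implement.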