
Compiler Construction Lecture 4: Lexical analysis in the real world



  1. Compiler Construction Lecture 4: Lexical analysis in the real world 2020-01-17 Michael Engel Includes material by Jan Christian Meyer

  2. Overview
     • NFA to DFA conversion
     • Subset construction algorithm
     • DFA state minimization:
         • Hopcroft's algorithm
         • Myhill-Nerode method
     • Using a scanner generator
         • lex syntax and usage
         • lex examples
     Compiler Construction 04: Lexical analysis in the real world

  3. What have we achieved so far?
     • We know a method to convert a regular expression such as (all | and) into a nondeterministic finite automaton (NFA), using the McNaughton, Thompson and Yamada algorithm:
       [Figure: NFA for (all | and)]

  4. Overhead of constructed NFAs
     Let's look at another example: a(b|c)*
     • Construct the simple NFAs for a, b and c
       [Figure: NFAs s0 → s1 for a, s2 → s3 for b, s4 → s5 for c]
     • Construct the NFA for b|c
       [Figure: NFA for b|c: new states s6 and s7 joined by ε-transitions to the NFAs for b (s2 → s3) and c (s4 → s5)]

  5. Overhead of constructed NFAs
     • Now construct the NFA for (b|c)*
       [Figure: NFA for (b|c)*: new states s8 and s9 added around the b|c NFA (s2 to s7) with ε-transitions]
     • Looks pretty complex already? We're not even finished…

  6. Overhead of constructed NFAs
     • Finally, construct the NFA for a(b|c)*
       [Figure: NFA for a(b|c)*: s0 –a→ s1, then an ε-transition into the (b|c)* NFA (states s2 to s9)]
     • This NFA has many more states (10) than a minimal human-built DFA (2):
       [Figure: two-state DFA: s0 –a→ s1, with s1 looping on b, c]
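To make the state count concrete, here is a minimal Python sketch of the construction steps from slides 4 to 6. The names `Frag`, `ThompsonBuilder` and the use of `None` as the ε label are illustrative assumptions, not part of the lecture. Building the fragments in the same order as the slides (a, b, c, then b|c, then (b|c)*, then the concatenation) reproduces the state numbering s0 to s9:

```python
from dataclasses import dataclass

EPS = None  # label used for ε-transitions (assumption of this sketch)


@dataclass
class Frag:
    """An NFA fragment with a single entry state and a single exit state."""
    start: int
    accept: int


class ThompsonBuilder:
    def __init__(self):
        self.n_states = 0
        self.edges = []  # (source, label, target); label EPS means ε

    def new_state(self):
        s = self.n_states
        self.n_states += 1
        return s

    def symbol(self, ch):
        # Two new states joined by one transition on ch.
        s, t = self.new_state(), self.new_state()
        self.edges.append((s, ch, t))
        return Frag(s, t)

    def alt(self, x, y):
        # x|y: new start and end states, ε-edges into and out of both fragments.
        s, t = self.new_state(), self.new_state()
        self.edges += [(s, EPS, x.start), (s, EPS, y.start),
                       (x.accept, EPS, t), (y.accept, EPS, t)]
        return Frag(s, t)

    def star(self, x):
        # x*: skip x entirely, or run it and loop back to its start.
        s, t = self.new_state(), self.new_state()
        self.edges += [(s, EPS, x.start), (s, EPS, t),
                       (x.accept, EPS, x.start), (x.accept, EPS, t)]
        return Frag(s, t)

    def concat(self, x, y):
        # xy: a single ε-edge from the end of x to the start of y.
        self.edges.append((x.accept, EPS, y.start))
        return Frag(x.start, y.accept)


builder = ThompsonBuilder()
fa, fb, fc = builder.symbol("a"), builder.symbol("b"), builder.symbol("c")  # s0..s5
nfa = builder.concat(fa, builder.star(builder.alt(fb, fc)))  # adds s6, s7, s8, s9
print(builder.n_states, nfa.start, nfa.accept)
```

The builder ends up with 10 states and 12 edges, 9 of them ε-transitions, for a language that the hand-built DFA recognizes with 2 states and 3 transitions.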

  7. From NFA to DFA
     • An NFA is not directly helpful for building a scanner, since it is not obvious how to implement it efficiently
     • We know: every DFA is also an NFA (one without ε-transitions)
     • Every NFA can also be converted to an equivalent DFA (this can be proven by induction; we just show the construction)
     • The method to do this is called subset construction:
         NFA: $(Q_N, \Sigma, \delta_N, n_0, F_N)$
         DFA: $(Q_D, \Sigma, \delta_D, d_0, F_D)$
       The alphabet $\Sigma$ stays the same; the set of states $Q_N$, the transition function $\delta_N$, the start state $n_0$ and the set of accepting states $F_N$ are modified.
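The reason a DFA is the preferred target is that simulating one is a single table lookup per input character, with no sets of states to track. A minimal Python sketch, assuming the two-state DFA for a(b|c)* from slide 6 and a dictionary as the transition table (the names `DELTA`, `START`, `ACCEPTING` and `accepts` are hypothetical):

```python
# Two-state DFA for a(b|c)*: s0 --a--> s1, s1 --b,c--> s1 (slide 6).
# Missing keys mean "no transition", i.e. the implicit error state.
DELTA = {
    ("s0", "a"): "s1",
    ("s1", "b"): "s1",
    ("s1", "c"): "s1",
}
START = "s0"
ACCEPTING = {"s1"}


def accepts(word: str) -> bool:
    """Simulate the DFA: exactly one table lookup per input character."""
    state = START
    for ch in word:
        state = DELTA.get((state, ch))
        if state is None:  # no transition: reject immediately
            return False
    return state in ACCEPTING


for w in ["a", "abcb", "", "b", "aab"]:
    print(repr(w), accepts(w))
```

Simulating an NFA directly would instead require tracking a set of current states and computing ε-closures at every step, which is exactly the work the subset construction moves to construction time.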

  8. Subset construction algorithm
     Idea of the algorithm: find sets of NFA states that are equivalent (because they are connected by ε-transitions) and join each such set into a single DFA state.
     ε-closure(S): contains the states in S plus every NFA state that can be reached from a state in S along paths consisting only of ε-transitions (such states are equivalent to a state in S, since reaching them consumes no input).
     For a set of states q, δ_N(q, c) is the union of δ_N(n, c) over all n ∈ q.

       q_0 ← ε-closure({n_0});
       Q_D ← {q_0};
       WorkList ← {q_0};
       while (WorkList ≠ ∅) do
         remove q from WorkList;
         for each character c ∈ Σ do
           t ← ε-closure(δ_N(q, c));
           δ_D[q, c] ← t;
           if t ∉ Q_D then
             add t to Q_D and to WorkList;
         end;
       end;
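The pseudocode translates fairly directly into Python. This is a sketch under a few assumptions: the NFA is a dictionary mapping (state, label) to a set of successor states, `None` stands for ε, the work list is a FIFO queue (the pseudocode leaves the removal order open), and the empty set is treated as the implicit error state rather than being added to Q_D. The function names are hypothetical. The NFA at the end is the one from the example on slides 9 to 14, with state nk written as k:

```python
from collections import deque

EPS = None  # label for ε-transitions in this sketch


def eps_closure(states, delta):
    """States in `states` plus everything reachable via ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)  # hashable, so it can be a DFA state / dict key


def move(q, c, delta):
    """δ_N applied to a set q: the union of δ_N(n, c) for all n in q."""
    result = set()
    for n in q:
        result |= delta.get((n, c), set())
    return result


def subset_construction(delta, n0, accepting_n, alphabet):
    q0 = eps_closure({n0}, delta)
    Q_D = {q0}
    delta_D = {}
    worklist = deque([q0])
    while worklist:
        q = worklist.popleft()
        for c in alphabet:
            t = eps_closure(move(q, c, delta), delta)
            if not t:  # ∅: implicit error state, not added to Q_D
                continue
            delta_D[(q, c)] = t
            if t not in Q_D:
                Q_D.add(t)
                worklist.append(t)
    # A DFA state accepts if it contains at least one accepting NFA state.
    F_D = {q for q in Q_D if q & accepting_n}
    return q0, Q_D, delta_D, F_D


# NFA for a(b|c)* from the example (state nk is written as k).
DELTA_N = {
    (0, "a"): {1},
    (1, EPS): {2},
    (2, EPS): {3, 9},
    (3, EPS): {4, 6},
    (4, "b"): {5},
    (5, EPS): {8},
    (6, "c"): {7},
    (7, EPS): {8},
    (8, EPS): {3, 9},
}

q0, Q_D, delta_D, F_D = subset_construction(DELTA_N, 0, {9}, "abc")
for q in sorted(Q_D, key=sorted):
    print(sorted(q))
```

Using `frozenset` for DFA states is the design choice that makes the test "t ∉ Q_D" a plain set-membership check.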

  9. Subset construction example
     [Figure: NFA for a(b|c)* with states n0 to n9]
     NFA transition table δ_N (– means no transition):
       state  a    b    c    ε
       n0     n1   –    –    –
       n1     –    –    –    n2
       n2     –    –    –    n3, n9
       n3     –    –    –    n4, n6
       n4     –    n5   –    –
       n5     –    –    –    n8
       n6     –    –    n7   –
       n7     –    –    –    n8
       n8     –    –    –    n3, n9
       n9     –    –    –    –
     Initialization:
       q_0 ← ε-closure({n0}) = {n0};
       Q_D ← {{n0}};
       WorkList ← {{n0}};

  10. Subset construction example
     While-loop iteration 1:
     WorkList = {{n0}};
     q ← {n0};
     c ← 'a':
       t ← ε-closure(δ_N(q, 'a'))
         = ε-closure({n1})
         = {n1, n2, n3, n4, n6, n9}
       δ_D[{n0}, 'a'] ← {n1, n2, n3, n4, n6, n9};
       Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}};
       WorkList ← {{n1, n2, n3, n4, n6, n9}};

  11. Subset construction example
     While-loop iteration 1 (continued):
     q = {n0};
     c ← 'b', 'c':
       t ← ε-closure(δ_N(q, c)) = ∅ (n0 has no transitions on b or c)
       No change to Q_D or WorkList (∅ is the error state and is not added as a DFA state)
     From now on, we skip the iterations of the for loop that do not change Q_D.

  12. Subset construction example
     While-loop iteration 2:
     WorkList = {{n1, n2, n3, n4, n6, n9}};
     q ← {n1, n2, n3, n4, n6, n9};
     c ← 'b':
       t ← ε-closure(δ_N(q, 'b'))
         = ε-closure({n5})
         = {n5, n8, n9, n3, n4, n6}
       δ_D[q, 'b'] ← {n5, n8, n9, n3, n4, n6};
       Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}, {n5, n8, n9, n3, n4, n6}};
       WorkList ← {{n5, n8, n9, n3, n4, n6}};

  13. Subset construction example
     While-loop iteration 2 (continued):
     q = {n1, n2, n3, n4, n6, n9};
     c ← 'c':
       t ← ε-closure(δ_N(q, 'c'))
         = ε-closure({n7})
         = {n7, n8, n9, n3, n4, n6}
       δ_D[q, 'c'] ← {n7, n8, n9, n3, n4, n6};
       Q_D ← {{n0}, {n1, n2, n3, n4, n6, n9}, {n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}};
       WorkList ← {{n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}};

  14. Subset construction example
     While-loop iteration 3:
     WorkList = {{n5, n8, n9, n3, n4, n6}, {n7, n8, n9, n3, n4, n6}};
     q ← {n7, n8, n9, n3, n4, n6};
     c ← 'b':
       t ← ε-closure(δ_N(q, 'b')) = ε-closure({n5}) = {n5, n8, n9, n3, n4, n6}
     c ← 'c':
       t ← ε-closure(δ_N(q, 'c')) = ε-closure({n7}) = {n7, n8, n9, n3, n4, n6}
     // we ran around the graph once!
     Both sets are already in Q_D: no new states are added to Q_D in this and the following iteration!
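Completing the trace (the remaining iteration, for {n5, n8, n9, n3, n4, n6}, works exactly like iteration 3) gives a four-state DFA, in which every state containing the NFA's final state n9 is accepting. A small Python check, assuming the hypothetical state names d0 to d3 and the dictionary encoding used here, compares it with the two-state DFA from slide 6 on all words over {a, b, c} up to a fixed length. This is a bounded sanity check, not a proof of equivalence:

```python
from itertools import product

# DFA obtained from the subset construction for a(b|c)*:
#   d0 = {n0}                      d1 = {n1, n2, n3, n4, n6, n9}
#   d2 = {n3, n4, n5, n6, n8, n9}  d3 = {n3, n4, n6, n7, n8, n9}
SUBSET_DFA = {
    ("d0", "a"): "d1",
    ("d1", "b"): "d2", ("d1", "c"): "d3",
    ("d2", "b"): "d2", ("d2", "c"): "d3",
    ("d3", "b"): "d2", ("d3", "c"): "d3",
}
SUBSET_ACCEPT = {"d1", "d2", "d3"}  # the DFA states that contain n9

# Hand-built two-state DFA from slide 6.
MIN_DFA = {("s0", "a"): "s1", ("s1", "b"): "s1", ("s1", "c"): "s1"}
MIN_ACCEPT = {"s1"}


def run(delta, start, accepting, word):
    """Table-driven DFA simulation; a missing entry means reject."""
    state = start
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting


def agree_up_to(max_len):
    """True if both DFAs give the same verdict on every word over {a, b, c}
    of length <= max_len (a bounded check, not a proof)."""
    for length in range(max_len + 1):
        for letters in product("abc", repeat=length):
            w = "".join(letters)
            if run(SUBSET_DFA, "d0", SUBSET_ACCEPT, w) != run(MIN_DFA, "s0", MIN_ACCEPT, w):
                return False
    return True


print(agree_up_to(6))
```

States d1, d2 and d3 behave identically (all are accepting, and each moves to d2 on b and to d3 on c), which is the kind of redundancy that the DFA state minimization methods listed in the overview remove.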
