cse443 compilers

CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis


  1. CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall

  2. Announcements: HW-01 posted; PR-01 posted; team formation: what is the current status?

  3. Phases of a compiler: lexical structure (Figure 1.6, page 5 of text)

  4. Bird's eye view. Language: a set of strings, e.g. { for, while, x, factorial, … }. Grammar: rules for generating a language, G = (N, Σ, P, S). Regular expression (regex): a form of grammar. Finite automaton: a machine for recognizing a language, here a C program generated by FLEX.

  5. Languages & grammars. Formally, a grammar is defined by 4 items: (1) N, a set of non-terminals; (2) Σ, a set of terminals; (3) P, a set of productions; (4) S, a start symbol. G = (N, Σ, P, S)

  6. Languages & grammars. N, a set of non-terminals. Σ, a set of terminals (alphabet), with N ∩ Σ = {}. P, a set of productions of the (right-linear) form X -> a, X -> aY, or X -> ε, where X ∈ N, Y ∈ N, a ∈ Σ, and ε denotes the empty string. S, a start symbol, S ∈ N.
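
As a small illustration of the right-linear form above, the following Python sketch enumerates the short strings generated by one such grammar. The grammar itself (non-terminals S, A, B over {a, b}, generating strings that end in abb) and the helper name `generate` are hypothetical, chosen only for this example.

```python
from collections import deque

# Hypothetical right-linear grammar for strings over {a, b} that end in "abb".
# Each production is a pair (terminal, next_nonterminal):
#   ("a", "Y")  encodes X -> aY
#   ("a", None) encodes X -> a
#   ("", None)  would encode X -> ε (not used here)
PRODUCTIONS = {
    "S": [("a", "S"), ("b", "S"), ("a", "A")],
    "A": [("b", "B")],
    "B": [("b", None)],
}


def generate(productions, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    # A sentential form of a right-linear grammar is a terminal prefix
    # followed by at most one non-terminal, so a pair is enough.
    queue = deque([("", start)])
    while queue:
        prefix, nonterminal = queue.popleft()
        for terminal, nxt in productions[nonterminal]:
            word = prefix + terminal
            if len(word) > max_len:
                continue  # prune: derivations only ever get longer
            if nxt is None:
                results.add(word)          # derivation finished
            else:
                queue.append((word, nxt))  # keep rewriting the non-terminal
    return results


print(sorted(generate(PRODUCTIONS, "S", 4)))  # ['aabb', 'abb', 'babb']
```

Every production here consumes one terminal, so the length bound is what guarantees termination.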

  7. Lexical Analysis. Lexical structure is described by a regular grammar. A deterministic finite state machine performs the analysis.

  8. LANGUAGE operations: base cases. { ε } is a regular language. ∀ a ∈ Σ, { a } is a regular language. Recall, ε is the empty string.

  9. LANGUAGE operations. If L and M are regular, so are: L ∪ M = { s | s ∈ L or s ∈ M } (union); LM = { st | s ∈ L and t ∈ M } (concatenation); L* = ∪_{i=0}^{∞} L^i (Kleene closure). L^i is L concatenated with itself i times: L^0 = { ε }, by definition; L^1 = L; L^2 = LL; L^3 = LLL; etc. L* is the union of all these sets!

  10. Example of L*. Suppose L is {a, bb}. L^0 = { ε }, by definition. L^1 = L = {a, bb}. L^2 = LL = {aa, abb, bba, bbbb}. L^3 = LLL = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}. L^4 = … and so on … L* = ∪_{i=0}^{∞} L^i = { ε, a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb, … }
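
The language operations and the {a, bb} example can be checked mechanically. The following is a minimal Python sketch that models a language as a set of strings; the helper names `concat`, `power`, and `kleene_upto` are made up for this illustration, and since L* is infinite only the finite union of L^0 through L^n is computed.

```python
def concat(L, M):
    """LM = { st | s in L and t in M }."""
    return {s + t for s in L for t in M}


def power(L, i):
    """L^i: L concatenated with itself i times, with L^0 = { ε }."""
    result = {""}  # the empty string plays the role of ε
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene_upto(L, n):
    """Finite approximation of L*: the union of L^0, L^1, ..., L^n."""
    result = set()
    for i in range(n + 1):
        result |= power(L, i)
    return result


L = {"a", "bb"}
print(sorted(power(L, 2)))     # ['aa', 'abb', 'bba', 'bbbb']
print(len(power(L, 3)))        # 8
print(len(kleene_upto(L, 3)))  # 15
```

Union needs no helper because Python's set operator `|` already implements it.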

  11. REGular EXpression (regex): inductive definition. Given an alphabet Σ: ε is a regex, with ℒ(ε) = { ε }. For each a ∈ Σ, a is a regex, with ℒ(a) = {a}.

  12. Regular expressions (regex): inductive definition. Assume r and s are regexes. r|s is a regex denoting ℒ(r) ∪ ℒ(s). rs is a regex denoting ℒ(r)ℒ(s). r* is a regex denoting (ℒ(r))*. (r) is a regex denoting ℒ(r). Precedence: Kleene closure > concatenation > union. Associativity: all left-associative. These conventions minimize the use of parentheses: (r|s)|t = r|s|t.

  13. Algebraic laws. Assume r, s, and t are regexes. Commutativity: r|s = s|r. Associativity: r|(s|t) = (r|s)|t and r(st) = (rs)t. Distributivity: r(s|t) = rs|rt and (s|t)r = sr|tr. Identity: εr = rε = r. Idempotency: r** = r*.
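
These laws state that two regexes denote the same language, which can be spot-checked by brute force. The sketch below uses Python's `re` module and compares the sets of strings of length at most 4 over {a, b} that each side fully matches. The sample regexes chosen for r, s, t and the helper names are assumptions for illustration, and agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=4):
    """Strings over alphabet, of length <= max_len, fully matched by pattern."""
    compiled = re.compile(pattern)
    words = ("".join(p)
             for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return {w for w in words if compiled.fullmatch(w)}


def g(x):
    """Wrap a sub-pattern in a non-capturing group so precedence is explicit."""
    return "(?:" + x + ")"


def alt(x, y):
    return g(x) + "|" + g(y)   # x|y


def cat(x, y):
    return g(x) + g(y)         # xy


def star(x):
    return g(x) + "*"          # x*


r, s, t = "ab", "b*", "a"      # arbitrary sample regexes (assumption)

LAWS = {
    "commutativity":        (alt(r, s), alt(s, r)),
    "associativity of |":   (alt(r, alt(s, t)), alt(alt(r, s), t)),
    "associativity of cat": (cat(r, cat(s, t)), cat(cat(r, s), t)),
    "left distributivity":  (cat(r, alt(s, t)), alt(cat(r, s), cat(r, t))),
    "right distributivity": (cat(alt(s, t), r), alt(cat(s, r), cat(t, r))),
    "left identity":        (cat("", r), r),   # the empty pattern matches only ε
    "right identity":       (cat(r, ""), r),
    "idempotency of *":     (star(star(r)), star(r)),
}

for name, (lhs, rhs) in LAWS.items():
    print(name, language(lhs) == language(rhs))
```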

  14. We can describe a regular language using a regular expression

  15. The language of a regular expression can be recognized using a finite state machine. Machines: NFA (non-deterministic finite automaton) and DFA (deterministic finite automaton).

  16. Process of building lexical analyzer: 1) spell out the language (pipeline so far: language)

  17. Process of building lexical analyzer: 2) formulate a regular expression (language → regex)

  18. Process of building lexical analyzer: 3) build an NFA (language → regex → NFA)

  19. Process of building lexical analyzer: 4) transform NFA to DFA (language → regex → NFA → DFA)

  20. Process of building lexical analyzer: 5) transform DFA to a minimal DFA (language → regex → NFA → DFA → minimal DFA)

  21. Process of building lexical analyzer: 5) The minimal DFA is our lexical analyzer (language → regex → NFA → DFA → minimal DFA; character stream → lexical analyzer → token stream)

  22. Focus for today: regex → NFA

  23. Nondeterministic Finite Automata (NFA): a finite set of states S; an alphabet Σ, with ε ∉ Σ; δ ⊆ S × (Σ ∪ { ε }) × 𝒫(S) (transition function); s_0 ∈ S (a single start state); F ⊆ S (a set of final or accepting states).

  24. Deterministic Finite Automata (DFA): a finite set of states S; an alphabet Σ, with ε ∉ Σ; δ ⊆ S × Σ × S (transition function); s_0 ∈ S (a single start state); F ⊆ S (a set of final or accepting states).
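
To make the DFA definition concrete, here is a minimal Python sketch that stores the transition function as a dictionary and runs it over an input string. The particular four-state machine (accepting strings over {a, b} that end in abb) and the name `dfa_accepts` are hypothetical examples, not taken from the slides.

```python
# Hypothetical DFA over Σ = {a, b} accepting strings that end in "abb".
# DELTA maps (state, symbol) -> next state; exactly one target per pair,
# which is what makes the machine deterministic.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0          # s_0
ACCEPTING = {3}    # F


def dfa_accepts(delta, start, accepting, word):
    """Follow one transition per input symbol; accept if we end in F."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False  # symbol outside the alphabet: reject
        state = delta[(state, symbol)]
    return state in accepting


for w in ["abb", "aabb", "babb", "", "ab", "abba"]:
    print(repr(w), dfa_accepts(DELTA, START, ACCEPTING, w))
```

An NFA differs only in that each (state, symbol) pair, including the pair with ε, maps to a set of states, so a simulation has to track a set of current states instead of a single one.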

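  25. A state is a circle with its state number written inside (figure: state 0).

  26. Initial state has an arrow from nowhere pointing in. State 0 is often the initial state (figure: state 0 with an incoming arrow).

  27. A final state is drawn with a double circle (figure: state 1).

  28. Arrows are labeled with ε, or with a symbol a ∈ Σ (figures: an arrow labeled ε, and an arrow labeled a for each a ∈ Σ, between states 0 and 1).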

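  29. Regex -> NFA. Base cases: ε, and a for each a ∈ Σ (figures: a single arrow labeled ε, or labeled a, between states 0 and 1). s|t (figure: N(s) and N(t) side by side between states 0 and 1, connected by four ε-transitions).

  30. Regex -> NFA. st (figure: N(s) followed by N(t) between states 0 and 1). s* (figure: N(s) between states 0 and 1, with ε-transitions).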
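
A minimal sketch of the regex -> NFA step in Python, assuming the standard Thompson construction (the slide figures are not recoverable here, so the exact wiring is an assumption). The tuple-based syntax tree, `build`, and `accepts` are hypothetical names. For concatenation the sketch links the two fragments with an extra ε-transition instead of merging states, which is a simplification, so state counts will not match the textbook's figures.

```python
from itertools import count

EPS = None  # label used for ε-transitions


def build(node, edges, fresh):
    """Add an NFA fragment for node to edges; return (start, accept)."""
    kind = node[0]
    if kind in ("eps", "sym"):
        # Base cases: a single arrow labeled ε or a.
        start, accept = next(fresh), next(fresh)
        edges.append((start, EPS if kind == "eps" else node[1], accept))
    elif kind == "alt":
        # s|t: new start and accept states around N(s) and N(t).
        start = next(fresh)
        s1, a1 = build(node[1], edges, fresh)
        s2, a2 = build(node[2], edges, fresh)
        accept = next(fresh)
        edges += [(start, EPS, s1), (start, EPS, s2),
                  (a1, EPS, accept), (a2, EPS, accept)]
    elif kind == "cat":
        # st: N(s) followed by N(t), linked by ε (simplification).
        start, mid = build(node[1], edges, fresh)
        s2, accept = build(node[2], edges, fresh)
        edges.append((mid, EPS, s2))
    elif kind == "star":
        # s*: allow skipping N(s) entirely or looping back through it.
        start = next(fresh)
        s1, a1 = build(node[1], edges, fresh)
        accept = next(fresh)
        edges += [(start, EPS, s1), (start, EPS, accept),
                  (a1, EPS, s1), (a1, EPS, accept)]
    else:
        raise ValueError("unknown node kind: %r" % (kind,))
    return start, accept


def accepts(edges, start, accept, word):
    """Simulate the NFA by tracking the set of currently possible states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            state = stack.pop()
            for src, label, dst in edges:
                if src == state and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return accept in current


# Syntax tree for (a|b)*abb
A, B = ("sym", "a"), ("sym", "b")
TREE = ("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B)

EDGES = []
START, ACCEPT = build(TREE, EDGES, count())
print(accepts(EDGES, START, ACCEPT, "babb"))  # True
print(accepts(EDGES, START, ACCEPT, "abba"))  # False
```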

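  31. Simple example: static

  32. Simple example: static (figure: a chain of states 0 to 6 with transitions labeled s, t, a, t, i, c)

  33. Simple example: static and struct (figure: the chain for static, states 0 to 6, and the chain for struct, states 7 to 13, joined by ε-transitions from an initial state i and to a final state F)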

  34. Process of building lexical analyzer: 5) The minimal DFA is our lexical analyzer (language → regex → NFA → DFA → minimal DFA; character stream → lexical analyzer → token stream)

  35. Focus above: build a non-deterministic recognizer (regex → NFA)

  36. Next step: make the recognizer deterministic (NFA → DFA)

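  37. (a|b)*abb: first we construct an NFA from this regular expression.

  38. (a|b)*abb (figure: NFA fragment with a transition on a)

  39. (a|b)*abb (figure: a second fragment with a transition on b is added)

  40. (a|b)*abb (figure: the a and b fragments joined by four ε-transitions, forming a|b)

  41. (a|b)*abb (figure: four more ε-transitions added, forming (a|b)*)

  42. (a|b)*abb (figure: a transition on a appended after (a|b)*)

  43. (a|b)*abb (figure: a transition on b appended)

  44. (a|b)*abb (figure: a final transition on b appended, completing the NFA)

  45. (a|b)*abb (figure: the complete NFA with its states numbered 0 to 10)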

  46. Operations. ε-closure(t) is the set of states reachable from state t using only ε-transitions. ε-closure(T) is the set of states reachable from any state t ∈ T using only ε-transitions. move(T,a) is the set of states reachable from any state t ∈ T by following a transition on symbol a ∈ Σ.
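
The two operations can be sketched in a few lines of Python. The transition table below is an assumption: it encodes the NFA for (a|b)*abb with states 0 to 10, using the edge layout of the standard textbook figure, since the slide's drawing is not reproduced here. The names `NFA`, `eps_closure`, and `move` are illustrative.

```python
EPS = None  # label used for ε-transitions

# Assumed NFA for (a|b)*abb: (state, label) -> set of target states.
NFA = {
    (0, EPS): {1, 7},
    (1, EPS): {2, 4},
    (2, "a"): {3},
    (3, EPS): {6},
    (4, "b"): {5},
    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},
    (8, "b"): {9},
    (9, "b"): {10},
}


def eps_closure(nfa, states):
    """States reachable from any state in states using only ε-transitions."""
    closure = set(states)   # every state reaches itself with zero steps
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(nfa, states, symbol):
    """States reachable from any state in states by one transition on symbol."""
    return {t for s in states for t in nfa.get((s, symbol), ())}


T = eps_closure(NFA, {0})
print(sorted(T))                                   # [0, 1, 2, 4, 7]
print(sorted(move(NFA, T, "a")))                   # [3, 8]
print(sorted(eps_closure(NFA, move(NFA, T, "a")))) # [1, 2, 3, 4, 6, 7, 8]
```

ε-closure(t) for a single state t is the special case `eps_closure(NFA, {t})`.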

  47. NFA -> DFA algorithm (set of states construction - page 153 of text)
      INPUT: An NFA N = (S, Σ, δ, s_0, F)
      OUTPUT: A DFA D = (S', Σ, δ', s_0', F') such that ℒ(D) = ℒ(N)
      ALGORITHM:
        Compute s_0' = ε-closure(s_0), an unmarked set of states
        Set S' = { s_0' }
        while there is an unmarked T ∈ S'
          mark T
          for each symbol a ∈ Σ
            let U = ε-closure(move(T,a))
            if U ∉ S', add unmarked U to S'
            add transition: δ'(T,a) = U
        F' is the set of those members of S' that contain at least one state in F.
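
The algorithm above can be sketched directly in Python. As before, the NFA table for (a|b)*abb (states 0 to 10, edges as in the standard textbook figure) is an assumption, and `subset_construction` and the other names are illustrative. DFA states are represented as frozensets of NFA states, and the list `unmarked` plays the role of the unmarked members of S'.

```python
EPS = None  # label used for ε-transitions

# Assumed NFA for (a|b)*abb: (state, label) -> set of target states.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (3, EPS): {6},
    (4, "b"): {5}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}


def eps_closure(nfa, states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(nfa, states, symbol):
    return {t for s in states for t in nfa.get((s, symbol), ())}


def subset_construction(nfa, alphabet, start, accepting):
    """Return (S', δ', s_0', F') for a DFA equivalent to the given NFA."""
    start_set = frozenset(eps_closure(nfa, {start}))
    dfa_states = {start_set}
    unmarked = [start_set]
    delta = {}
    while unmarked:
        T = unmarked.pop()                      # "mark T"
        for a in alphabet:
            U = frozenset(eps_closure(nfa, move(nfa, T, a)))
            if U not in dfa_states:
                dfa_states.add(U)               # add unmarked U to S'
                unmarked.append(U)
            delta[(T, a)] = U
    # F': DFA states containing at least one accepting NFA state.
    dfa_accepting = {T for T in dfa_states if T & accepting}
    return dfa_states, delta, start_set, dfa_accepting


states, delta, start, accepting = subset_construction(NFA, "ab", 0, {10})
print(len(states))                      # 5
print([sorted(T) for T in accepting])   # [[1, 2, 4, 5, 6, 7, 10]]
```

If move(T,a) is empty for some pair, this sketch adds the empty set as an explicit dead state; that does not happen for this particular NFA.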
