Lexical analysis



  1. Lexical analysis

  2. Lexical analysis

     Lexical analysis checks that the words of a program are well formed and transforms the program into a stream of tokens. It:
     – removes whitespace and comments;
     – identifies keywords, identifiers and literal constants;
     – constructs a symbol table;
     – records the line and column numbers of symbols;
     – reports lexical errors when necessary.

     Lexical analysis is also called scanning, and the corresponding analyser is called a scanner.
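
The tasks listed above can be sketched with Python's `re` module. This is a minimal illustration rather than a production scanner: the token classes, the keyword set and the `#` comment syntax are assumptions made for a toy language, and no symbol table is built.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str
    text: str
    line: int
    column: int

# Hypothetical token classes for a toy language. Order matters: earlier
# alternatives win, so KEYWORD is tried before IDENT.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("NEWLINE", r"\n"),
    ("SKIP",    r"[ \t]+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT",   r"[A-Za-z][A-Za-z0-9]*"),
    ("NUMBER",  r"[0-9]+"),
    ("OP",      r"[-+*/=()]"),
    ("ERROR",   r"."),          # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source: str) -> Iterator[Token]:
    """Yield tokens with 1-based line and column numbers."""
    line, line_start = 1, 0
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        column = m.start() - line_start + 1
        if kind == "NEWLINE":
            line, line_start = line + 1, m.end()
        elif kind in ("SKIP", "COMMENT"):
            continue                      # whitespace and comments are dropped
        elif kind == "ERROR":
            raise SyntaxError(f"unexpected {text!r} at line {line}, column {column}")
        else:
            yield Token(kind, text, line, column)

for token in scan("x1 = 42 # answer\nwhile y"):
    print(token)
```

Combining all token classes into one alternation keeps the scanner to a single pass over the input; the price is that the order of `TOKEN_SPEC` silently decides ties between classes.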

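  3. Regular expressions

     Regular expressions over a (finite) alphabet $\Sigma$:

     $$E ::= \emptyset \mid \varepsilon \mid a \mid (E\,E) \mid (E \mid E) \mid E^*$$

     where $a \in \Sigma$. A regular expression $E$ defines a language $L(E) \subseteq \Sigma^*$:

     $$\begin{aligned}
     L(\emptyset) &= \emptyset \\
     L(\varepsilon) &= \{\varepsilon\} \\
     L(a) &= \{a\} \\
     L(E_1 E_2) &= \{ uv \mid u \in L(E_1),\ v \in L(E_2) \} \\
     L(E_1 \mid E_2) &= L(E_1) \cup L(E_2) \\
     L(E^*) &= \bigcup_{i \ge 0} L(E)^i
     \end{aligned}$$

     where $L^0 = \{\varepsilon\}$ and $L^{n+1} = L\,L^n = \{ uv \mid u \in L,\ v \in L^n \}$.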
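
As a worked instance of the rules for choice and concatenation, the language of a small expression can be unfolded step by step. The expression $(a \mid b)\,b$ is chosen here for illustration and does not appear on the slides.

```latex
\begin{aligned}
L((a \mid b)\,b) &= \{ uv \mid u \in L(a \mid b),\ v \in L(b) \} \\
                 &= \{ uv \mid u \in L(a) \cup L(b),\ v \in \{b\} \} \\
                 &= \{ uv \mid u \in \{a, b\},\ v \in \{b\} \} \\
                 &= \{ ab,\ bb \}
\end{aligned}
```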

  4. Regular expressions

     Examples (regular expression: defined language):
     – $a \mid b$: $\{a, b\}$
     – $abba$: $\{abba\}$
     – $ab^*a$: $\{aa, aba, abba, abbba, \ldots\}$
     – $(ab)^*$: $\{\varepsilon, ab, abab, ababab, \ldots\}$

     To minimize the number of parentheses needed, operators have priorities:
     – the closure operator $(\cdot)^*$ has the highest priority;
     – the choice operator $(\cdot \mid \cdot)$ has the lowest priority.
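
The example languages can be checked mechanically by enumerating short words and testing them against each expression. The sketch below relies on Python's `re` syntax, which agrees with the notation above for concatenation, `|`, `*` and parentheses. The helper name `language_up_to` and the last example, `a|bc*`, are introduced here for illustration.

```python
import re
from itertools import product

def language_up_to(pattern: str, alphabet: str, max_len: int) -> list[str]:
    """All words over `alphabet` of length <= max_len that `pattern` matches in full."""
    regex = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return [w for w in words if regex.fullmatch(w)]

print(language_up_to(r"a|b", "ab", 4))      # ['a', 'b']
print(language_up_to(r"ab*a", "ab", 4))     # ['aa', 'aba', 'abba']
print(language_up_to(r"(ab)*", "ab", 4))    # ['', 'ab', 'abab']

# Priorities: closure binds tightest and choice loosest, so a|bc* reads a|(b(c*)).
print(language_up_to(r"a|bc*", "abc", 3))   # ['a', 'b', 'bc', 'bcc']
```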

  5. Regular expressions

     A regular description over an alphabet $\Sigma$ is a set of rules

     $$d_1 \to E_1, \quad d_2 \to E_2, \quad \ldots, \quad d_n \to E_n$$

     where $d_i$ is a (unique) name and $E_i$ is a regular expression over the alphabet $\Sigma \cup \{d_1, \ldots, d_{i-1}\}$.

     Shorthand notation for regular expressions:
     – nonempty closure: $E^+ = E E^*$;
     – option: $E? = \varepsilon \mid E$;
     – character classes: e.g. $[a, b, c] = a \mid b \mid c$ or $[a{-}z] = a \mid \ldots \mid z$.

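  6. Regular expressions

     Examples of regular descriptions.

     Identifiers:

     $$\begin{aligned}
     \mathit{Letter} &\to [a{-}z, A{-}Z] \\
     \mathit{Digit} &\to [0{-}9] \\
     \mathit{Identifier} &\to \mathit{Letter}\ (\mathit{Letter} \mid \mathit{Digit})^*
     \end{aligned}$$

     Numeric constants:

     $$\begin{aligned}
     \mathit{Sign} &\to (+ \mid -)? \\
     \mathit{Integer} &\to 0 \mid \mathit{Sign}\ [1{-}9]\ \mathit{Digit}^* \\
     \mathit{Decimal} &\to \mathit{Integer}\ .\ \mathit{Digit}^+ \\
     \mathit{Real} &\to (\mathit{Integer} \mid \mathit{Decimal})\ E\ \mathit{Integer}
     \end{aligned}$$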
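
These descriptions translate almost one to one into composed Python patterns, with each name built only from names defined before it. The following is a sketch: the upper-case constant names and the `matches` helper are introduced here, and `E` is taken to be the literal exponent letter.

```python
import re

# Identifiers
LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
IDENTIFIER = rf"{LETTER}(?:{LETTER}|{DIGIT})*"

# Numeric constants
SIGN = r"[+-]?"
INTEGER = rf"(?:0|{SIGN}[1-9]{DIGIT}*)"
DECIMAL = rf"{INTEGER}\.{DIGIT}+"
REAL = rf"(?:{INTEGER}|{DECIMAL})E{INTEGER}"

def matches(description: str, word: str) -> bool:
    """True if the whole word belongs to the language of the description."""
    return re.fullmatch(description, word) is not None

print(matches(IDENTIFIER, "x1"), matches(IDENTIFIER, "1x"))   # True False
print(matches(INTEGER, "-42"), matches(INTEGER, "007"))       # True False
print(matches(DECIMAL, "3.14"), matches(REAL, "3.14E-2"))     # True True
```

Taken literally, the rule for Integer excludes both leading zeros and a signed zero, so `007` and `-0.5` are rejected by these patterns.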

  7. Finite automata

     A finite automaton is a quintuple $A = \langle Q, \Sigma, \delta, q_0, F \rangle$, where
     – $Q$ is a finite set of states;
     – $\Sigma$ is the finite alphabet;
     – $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times Q$ is the transition relation;
     – $q_0 \in Q$ is the initial state;
     – $F \subseteq Q$ is the set of final states.

     A finite automaton is deterministic (a DFA) if the transition relation is a function $\delta : Q \times \Sigma \to Q$. Otherwise, the finite automaton is nondeterministic (an NFA).
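
The DFA condition can be tested directly on a transition relation stored as a set of triples. This sketch assumes that the empty string encodes the $\varepsilon$ label, and it reads "function" strictly, so an automaton with a missing transition also counts as nondeterministic.

```python
EPS = ""  # assumption: the empty string stands for the epsilon label

def is_deterministic(states, alphabet, delta):
    """delta is a set of (source, label, target) triples."""
    successors = {}
    for q, label, p in delta:
        if label == EPS:
            return False          # an epsilon move cannot occur in a function Q x Sigma -> Q
        successors.setdefault((q, label), set()).add(p)
    # A function assigns exactly one target to every (state, letter) pair.
    return all(len(successors.get((q, a), ())) == 1 for q in states for a in alphabet)

dfa = {("q0", "a", "q1"), ("q0", "b", "q0"), ("q1", "a", "q1"), ("q1", "b", "q0")}
nfa = dfa | {("q0", "a", "q0")}   # two a-successors of q0

print(is_deterministic({"q0", "q1"}, "ab", dfa))   # True
print(is_deterministic({"q0", "q1"}, "ab", nfa))   # False
```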

  8. Finite automata

     Finite automata can be represented by state transition diagrams.

     [Figure: state transition diagram with states $q_0$, $q_1$, $q_2$ and transitions labelled $a$, $a$ and $b$]

     The finite automaton $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ accepts the language

     $$L(A) = \{ w \in \Sigma^* \mid (q_0, w, q_f) \in \delta^*,\ q_f \in F \}$$

     where $\delta^* \subseteq Q \times \Sigma^* \times Q$ is the reflexive and transitive closure of the transition relation $\delta$.

     Theorem: the class of languages accepted by finite automata is exactly the class of regular languages.
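
Acceptance can be decided without computing $\delta^*$ explicitly, by tracking the set of states reachable after each input character. The sketch below assumes the empty string encodes $\varepsilon$. The example automaton, which accepts $ab^*a$, is hypothetical and only chosen to have three states and labels $a$, $a$, $b$.

```python
EPS = ""  # assumption: the empty string stands for the epsilon label

def move(delta, states, label):
    """States reachable from `states` by one transition with the given label."""
    return {p for (q, l, p) in delta if q in states and l == label}

def eps_closure(delta, states):
    """States reachable from `states` using epsilon transitions only."""
    closure = set(states)
    while True:
        new = move(delta, closure, EPS) - closure
        if not new:
            return closure
        closure |= new

def accepts(delta, q0, finals, word):
    current = eps_closure(delta, {q0})
    for ch in word:
        current = eps_closure(delta, move(delta, current, ch))
    return bool(current & finals)   # accepted if some reachable state is final

# Hypothetical automaton for ab*a: q0 -a-> q1, q1 -b-> q1, q1 -a-> q2.
DELTA = {("q0", "a", "q1"), ("q1", "b", "q1"), ("q1", "a", "q2")}

for word in ["aa", "abba", "ab", ""]:
    print(repr(word), accepts(DELTA, "q0", {"q2"}, word))
```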

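  9. Converting a regular expression to an automaton

     Thompson's construction for converting a regular expression to an NFA:
     – for a regular expression $E$, construct the "automaton" with a single transition $q_0 \xrightarrow{E} q_f$;
     – transform the "automaton" using the following rules until all transitions have only simple labels (i.e. $\varepsilon$ or a character):
       – concatenation: $q \xrightarrow{E_1 E_2} p$ becomes $q \xrightarrow{E_1} q_1 \xrightarrow{E_2} p$;
       – choice: $q \xrightarrow{E_1 \mid E_2} p$ becomes two parallel transitions $q \xrightarrow{E_1} p$ and $q \xrightarrow{E_2} p$;
       – closure: $q \xrightarrow{E^*} p$ becomes $q \xrightarrow{\varepsilon} q_1 \xrightarrow{E} q_2 \xrightarrow{\varepsilon} p$, together with two further $\varepsilon$-transitions that allow $E$ to be skipped or repeated.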
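
The rewriting can be sketched as a work-list over transitions that still carry compound labels. The tuple encoding of expressions, the generated state names and the empty string for $\varepsilon$ are assumptions of this sketch. So are the endpoints of the two extra $\varepsilon$-transitions in the closure rule, where a common choice is used: one from $q$ to $p$ (skip) and one from $q_2$ back to $q_1$ (repeat).

```python
from itertools import count

EPS = ""  # assumption: the empty string stands for the epsilon label

def thompson(expr):
    """Return (transitions, start, final); transitions are (source, label, target) triples.

    Expressions are nested tuples: ("sym", "a"), ("eps",), ("cat", e1, e2),
    ("alt", e1, e2) and ("star", e).
    """
    fresh = count(1)
    start, final = "q0", "qf"
    pending = [(start, expr, final)]      # transitions whose label is still an expression
    done = set()                          # transitions with simple labels
    while pending:
        q, e, p = pending.pop()
        kind = e[0]
        if kind == "sym":
            done.add((q, e[1], p))
        elif kind == "eps":
            done.add((q, EPS, p))
        elif kind == "cat":
            mid = f"q{next(fresh)}"
            pending += [(q, e[1], mid), (mid, e[2], p)]
        elif kind == "alt":
            pending += [(q, e[1], p), (q, e[2], p)]
        elif kind == "star":
            inner_start, inner_end = f"q{next(fresh)}", f"q{next(fresh)}"
            done |= {(q, EPS, inner_start), (inner_end, EPS, p),
                     (q, EPS, p),                      # skip E entirely
                     (inner_end, EPS, inner_start)}    # repeat E
            pending.append((inner_start, e[1], inner_end))
        else:
            raise ValueError(f"unknown expression kind: {kind!r}")
    return done, start, final

# a(a|b)*
expr = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "b"))))
transitions, start, final = thompson(expr)
for t in sorted(transitions):
    print(t)
```

Because every closure gets two fresh inner states, the loop it introduces cannot interfere with transitions created by other rules.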

  10. Converting a regular expression to an automaton

      Example: $a(a \mid b)^*$

      [Figure: the construction in four stages. (1) A single transition from $q_0$ to $q_f$ labelled $a(a \mid b)^*$. (2) States $q_0$, $q_1$, $q_f$ with transitions labelled $a$ and $(a \mid b)^*$. (3) States $q_0$, $q_1$, $q_2$, $q_3$, $q_f$ with transitions labelled $a$, $a \mid b$ and $\varepsilon$. (4) The same states with transitions labelled $a$, $a$, $b$ and $\varepsilon$.]

  14. Constructing DFA

      Given an NFA $A = \langle Q, \Sigma, \delta, q_0, F \rangle$, construct an equivalent DFA $A' = \langle Q', \Sigma, \delta', q_0', F' \rangle$ by subset construction.

      Auxiliary functions:
      – the $\varepsilon$-closure function $\varepsilon\text{-}\mathit{closure} : 2^Q \to 2^Q$,
        $$\varepsilon\text{-}\mathit{closure}(S) = \{ p \mid q \in S,\ (q, \varepsilon, p) \in \delta^* \}$$
      – the single step function $\mathit{move} : 2^Q \times \Sigma \to 2^Q$,
        $$\mathit{move}(S, a) = \{ p \mid q \in S,\ (q, a, p) \in \delta \}$$

  15. Constructing DFA

      Algorithm:

      $Q' := \emptyset$; $F' := \emptyset$; $\delta' := \emptyset$;
      $q_0' := \varepsilon\text{-}\mathit{closure}(\{q_0\})$;
      $U := \{q_0'\}$;
      while $\exists S \in U$ do
        $U := U \setminus \{S\}$;
        $Q' := Q' \cup \{S\}$;
        foreach $a \in \Sigma$ do
          $T := \varepsilon\text{-}\mathit{closure}(\mathit{move}(S, a))$;
          if $T \notin U \cup Q'$ then $U := U \cup \{T\}$;
          $\delta' := \delta' \cup \{(S, a) \mapsto T\}$;
        end
      end
      $F' := \{ S \in Q' \mid S \cap F \neq \emptyset \}$;
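
The algorithm can be transcribed into Python fairly directly. In this sketch the NFA is a set of `(source, label, target)` triples, the empty string encodes $\varepsilon$, and DFA states are `frozenset`s of NFA states; all of these representation choices are assumptions. The example NFA for $a(a \mid b)^*$ uses hypothetical $\varepsilon$-transitions in the usual Thompson style.

```python
EPS = ""  # assumption: the empty string stands for the epsilon label

def move(delta, states, label):
    return {p for (q, l, p) in delta if q in states and l == label}

def eps_closure(delta, states):
    closure = set(states)
    while True:
        new = move(delta, closure, EPS) - closure
        if not new:
            return frozenset(closure)
        closure |= new

def subset_construction(delta, alphabet, q0, finals):
    start = eps_closure(delta, {q0})
    dfa_states, dfa_delta = set(), {}
    unmarked = [start]                              # the set U of the algorithm
    while unmarked:
        S = unmarked.pop()
        dfa_states.add(S)
        for a in alphabet:
            T = eps_closure(delta, move(delta, S, a))
            if T not in dfa_states and T not in unmarked:
                unmarked.append(T)
            dfa_delta[(S, a)] = T
    dfa_finals = {S for S in dfa_states if S & finals}
    return dfa_states, dfa_delta, start, dfa_finals

# Hypothetical NFA for a(a|b)*.
NFA = {
    ("q0", "a", "q1"),
    ("q1", EPS, "q2"), ("q1", EPS, "qf"),
    ("q2", "a", "q3"), ("q2", "b", "q3"),
    ("q3", EPS, "q2"), ("q3", EPS, "qf"),
}
states, delta, start, finals = subset_construction(NFA, "ab", "q0", {"qf"})
for (S, a), T in sorted(delta.items(), key=lambda item: (sorted(item[0][0]), item[0][1])):
    print(sorted(S), a, "->", sorted(T))
```

Followed literally, the algorithm also creates the empty set as a DFA state whenever some state has no successor on a letter. It acts as a dead state, which transition diagrams often leave out.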

  16. Constructing DFA

      Example:

      [Figure: the NFA with states $q_0$, $q_1$, $q_2$, $q_3$, $q_f$ and transitions labelled $a$, $a$, $b$ and $\varepsilon$, shown above the DFA as it is built step by step: first the state $q_0'$, then $q_1'$, then $q_2'$, ending with three transitions labelled $a$ and two labelled $b$.]

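  22. Minimizing DFA

      DFA constructed from the regular expression $a(a \mid b)^*$:

      [Figure: DFA with states $q_0$, $q_1$, $q_2$, three transitions labelled $a$ and two labelled $b$]

      An equivalent smaller DFA:

      [Figure: DFA with states $q_0$, $q_1$, two transitions labelled $a$ and one labelled $b$]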

  23. Minimizing DFA

      A DFA is minimal if there is no DFA with fewer states accepting the same language. For every DFA $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ there exists an equivalent minimal DFA $A' = \langle Q', \Sigma, \delta', q_0', F' \rangle$, unique up to renaming of states.

      Idea: partition the set of states into equivalence classes.
      – States $p, q \in Q$ are equivalent, or indistinguishable, if the automata having them as initial states accept the same language (i.e. for any word $w \in \Sigma^*$, one succeeds exactly when the other does).
      – For every letter, the transition function maps equivalent states to equivalent states.

  24. Minimizing DFA

      Minimization algorithm:
      – Remove all states unreachable from the initial state $q_0$.
      – On the remaining set of states, find the coarsest partition $\Pi$ into equivalence classes.
      – Construct the new automaton $A' = \langle Q', \Sigma, \delta', q_0', F' \rangle$, where
        – the set of states is $Q' = \Pi$;
        – the initial state is $q_0' = P_0$, where $P_0 \in \Pi$ and $q_0 \in P_0$;
        – the set of final states is $F' = \{ P \in \Pi \mid P \cap F \neq \emptyset \}$;
        – the transition function is $\delta' = \{ (P_i, a) \mapsto P_j \mid \mathit{move}(P_i, a) \subseteq P_j \}$.

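  25. Minimizing DFA

      Naive algorithm for finding the partition:

      $P := \{F,\ Q \setminus F\}$;
      do
        $\Pi := P$; $P := \emptyset$;
        foreach $S \in \Pi$ do
          foreach $a \in \Sigma$ do
            $U := \{ T \in \Pi \mid T \cap \mathit{move}(S, a) \neq \emptyset \}$;
            $V := \{ S \cap \mathit{move}_a^{-1}(T) \mid T \in U \}$;
            $P := P \cup V$;
          end
        end
      until $\Pi = P$;

      Here $\mathit{move}_a^{-1}(T)$ denotes the set of states that have an $a$-transition into $T$.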
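
A runnable sketch of naive refinement, combined with the removal of unreachable states and the construction of the quotient automaton, is given below. It is a variation on the pseudocode rather than a transcription: each class is split by the classes reached on all letters at once, which keeps the intermediate result a partition. The DFA is assumed to be complete and stored as a dict from `(state, letter)` to state; the example automaton, including its `dead` and unreachable `u` states, is hypothetical.

```python
def reachable(alphabet, delta, q0):
    seen, stack = {q0}, [q0]
    while stack:
        q = stack.pop()
        for a in alphabet:
            p = delta[(q, a)]
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def refine(states, alphabet, delta, finals):
    """Coarsest partition of `states` into classes of equivalent states."""
    partition = {frozenset(states & finals), frozenset(states - finals)} - {frozenset()}
    while True:
        block_of = {q: block for block in partition for q in block}
        refined = set()
        for block in partition:
            groups = {}
            for q in block:
                # States stay together only if every letter leads to the same class.
                signature = tuple(block_of[delta[(q, a)]] for a in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined |= {frozenset(g) for g in groups.values()}
        if refined == partition:
            return partition
        partition = refined

def minimize(alphabet, delta, q0, finals):
    states = reachable(alphabet, delta, q0)
    partition = refine(states, alphabet, delta, finals & states)
    block_of = {q: block for block in partition for q in block}
    new_delta = {(block_of[q], a): block_of[delta[(q, a)]] for q in states for a in alphabet}
    new_finals = {block for block in partition if block & finals}
    return partition, new_delta, block_of[q0], new_finals

# Hypothetical complete DFA for a(a|b)* with a dead state and an unreachable state u.
DFA = {
    ("q0", "a"): "q1", ("q0", "b"): "dead",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
    ("u", "a"): "q0", ("u", "b"): "u",
}
partition, new_delta, new_start, new_finals = minimize("ab", DFA, "q0", {"q1", "q2"})
print(sorted(sorted(block) for block in partition))
```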

  26. Minimizing DFA

      The naive algorithm tries to split every class of the partition at every iteration.
      – In the worst case it has quadratic complexity.
      – It is enough to consider only those classes from which one can move to some class that has been split.

      Hopcroft's algorithm for finding the partition:
      – uses a work-list of split classes that have not yet been examined;
      – if a class that is not in the work-list is split, then only one (the smaller) subclass is put on the work-list.
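
The work-list idea can be sketched as follows for a complete DFA stored as a dict from `(state, letter)` to state. The function name and the example automaton are hypothetical. For brevity the predecessor set `X` is recomputed by scanning all states, so this sketch shows the control structure only and does not reach the running time of a tuned implementation, which would precompute inverse transitions.

```python
def hopcroft_partition(states, alphabet, delta, finals):
    finals = frozenset(finals)
    rest = frozenset(states) - finals
    partition = {block for block in (finals, rest) if block}
    # Start with the smaller of the two initial classes on the work-list.
    worklist = {min(partition, key=len)} if partition else set()
    while worklist:
        A = worklist.pop()
        for a in alphabet:
            # States that have an a-transition into the class A.
            X = {q for q in states if delta[(q, a)] in A}
            for Y in list(partition):
                inside, outside = Y & X, Y - X
                if not inside or not outside:
                    continue                      # A does not split Y on this letter
                partition.remove(Y)
                partition |= {inside, outside}
                if Y in worklist:
                    worklist.remove(Y)
                    worklist |= {inside, outside}
                else:
                    # Only the smaller half needs to be examined later.
                    worklist.add(min(inside, outside, key=len))
    return partition

# Hypothetical complete DFA for a(a|b)* with a dead state.
DFA = {
    ("q0", "a"): "q1", ("q0", "b"): "dead",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
classes = hopcroft_partition({"q0", "q1", "q2", "dead"}, "ab", DFA, {"q1", "q2"})
print(sorted(sorted(block) for block in classes))
```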
