Compiler Development (CMPSC 401) Lexical Analysis Janyl Jumadinova

  1. Compiler Development (CMPSC 401) Lexical Analysis Janyl Jumadinova January 29, 2019 Janyl Jumadinova Compiler Development (CMPSC 401) January 29, 2019 1 / 40

  2. Automata and Regular Expressions: Deterministic Finite Automata (DFAs), Non-deterministic Finite Automata (NFAs), and regular expressions (REs) have the same expressive power, i.e. they allow precisely the same patterns/sets of strings to be specified. For every DFA there is an equivalent RE. For every RE there is an equivalent NFA. For every NFA there is an equivalent DFA.

  4. Finite State Automaton: A finite automaton is a machine that has a finite number of states and a finite number of transitions between them. One state is marked as the initial state. One or more states are marked as final states. States are sometimes labeled or numbered. Each transition from state to state is labeled with a symbol from Σ (the alphabet), or with ε. The symbols correspond to characters in the input string.
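
The definition above maps fairly directly onto a small data structure. The following is a minimal sketch, not taken from the slides: the class name `FiniteAutomaton`, the use of the empty string as the ε label, and the example automaton (strings over {a, b} ending in "ab") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

EPSILON = ""  # assumption: the empty string stands for an epsilon label


@dataclass
class FiniteAutomaton:
    states: set
    alphabet: set
    start: str
    finals: set
    # Maps (state, symbol) to the set of states reachable on that symbol.
    transitions: dict = field(default_factory=dict)

    def add_transition(self, source, symbol, target):
        # Every label must come from the alphabet or be epsilon.
        if symbol != EPSILON and symbol not in self.alphabet:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        self.transitions.setdefault((source, symbol), set()).add(target)

    def targets(self, state, symbol):
        return self.transitions.get((state, symbol), set())


# Hypothetical example: strings over {a, b} that end in "ab".
fa = FiniteAutomaton(
    states={"q0", "q1", "q2"},
    alphabet={"a", "b"},
    start="q0",
    finals={"q2"},
)
fa.add_transition("q0", "a", "q0")
fa.add_transition("q0", "b", "q0")
fa.add_transition("q0", "a", "q1")  # two targets on "a": a non-deterministic choice
fa.add_transition("q1", "b", "q2")
```

Storing a set of targets per (state, symbol) pair lets the same structure describe both deterministic and non-deterministic automata; a deterministic one simply never has more than one target and no ε labels.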

  5-18. Example (figure-only slides; the diagrams are not preserved in this transcript)

  22. Finite State Automaton: Operates by reading input symbols (usually characters). A transition can be taken if it is labeled with the current symbol. An ε-transition can be taken at any time. Accept when a final state is reached and there is no more input. (This is slightly different in a scanner, where the FSA is used as a subroutine to find the longest input string that matches a token RE.) Reject if no transition is possible, or if there is no more input and the automaton is not in a final state.
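
The accept/reject rules and the scanner variant can be sketched for the deterministic case as follows. This is an illustrative sketch only: the function names `dfa_accepts` and `longest_match`, the dictionary encoding of the transition function, and the unsigned-integer automaton are assumptions, not material from the slides.

```python
def dfa_accepts(delta, start, finals, text):
    """Accept only if all input is consumed and we end in a final state."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:  # no transition possible: reject
            return False
    return state in finals


def longest_match(delta, start, finals, text, pos=0):
    """Scanner variant: return the end index of the longest non-empty
    prefix of text[pos:] that the DFA accepts, or None if there is none."""
    state = start
    last_accept = None
    i = pos
    while i < len(text):
        state = delta.get((state, text[i]))
        if state is None:
            break
        i += 1
        if state in finals:
            last_accept = i  # remember the most recent accepting position
    return last_accept


# Hypothetical DFA for unsigned integers: S --digit--> N --digit--> N.
DIGITS = "0123456789"
delta = {("S", d): "N" for d in DIGITS}
delta.update({("N", d): "N" for d in DIGITS})

print(dfa_accepts(delta, "S", {"N"}, "2019"))     # True
print(dfa_accepts(delta, "S", {"N"}, "20x9"))     # False
print(longest_match(delta, "S", {"N"}, "401+x"))  # 3
```

The scanner version does not fail when it gets stuck; it falls back to the last position at which it was in a final state, which is what lets it stop at the end of `401` and leave `+x` for the next token.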

  23-35. A More Complex Automaton (figure-only slides; the diagrams are not preserved in this transcript)

  38. DFA vs. NFA: Deterministic Finite Automata (DFA): no choice of which transition to make. Non-deterministic Finite Automata (NFA): a choice of transition in at least one case. ε-transitions (arcs): if the current state has any outgoing ε arcs, we can follow any of them without consuming any input. Modeling choice: option 1, guess a path and backtrack if it rejects; option 2, “clone” at each choice point and accept if any clone accepts.
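
Option 1 (guess a path, backtrack on rejection) might look like the sketch below. The function name, the transition-table encoding with `""` as the ε label, and the example NFA are assumptions for illustration. The `visited` set is an addition beyond what the slide describes; it keeps the search from looping forever on ε cycles.

```python
def nfa_accepts_backtracking(delta, start, finals, text):
    """Depth-first search over NFA paths: try one choice, back up if it fails."""
    visited = set()  # (state, position) pairs that were already explored

    def explore(state, pos):
        if (state, pos) in visited:
            return False  # already tried from here; avoids epsilon loops
        visited.add((state, pos))
        if pos == len(text) and state in finals:
            return True
        # Epsilon arcs consume no input.
        for nxt in delta.get((state, ""), ()):
            if explore(nxt, pos):
                return True
        # Arcs labeled with the current symbol consume one character.
        if pos < len(text):
            for nxt in delta.get((state, text[pos]), ()):
                if explore(nxt, pos + 1):
                    return True
        return False  # every choice from here rejects: backtrack

    return explore(start, 0)


# Hypothetical NFA for strings over {a, b} ending in "ab", with an
# epsilon arc from an extra start state s.
delta = {
    ("s", ""): {"q0"},
    ("q0", "a"): {"q0", "q1"},  # the non-deterministic choice
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}

print(nfa_accepts_backtracking(delta, "s", {"q2"}, "abab"))  # True
print(nfa_accepts_backtracking(delta, "s", {"q2"}, "aba"))   # False
```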

  41. Simulating an NFA: Start from the initial state plus every state reachable from it by ε-moves. For each character in the input: for each current state, follow all transitions labeled with the current character and add the target states to the set of next states; then add every state reachable by an ε-move to the set of next states. Accept if there is some way to reach a final state on the given input. Reject if there is no possible way to reach a final state.
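
The loop described on this slide corresponds to tracking a set of current states (the "clone" view of non-determinism). A minimal sketch follows; the helper names `epsilon_closure` and `nfa_accepts`, the use of `""` as the ε label, and the example NFA for a*b* are assumptions, not part of the slides.

```python
def epsilon_closure(delta, states):
    """Return states plus everything reachable from them by epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, ""), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_accepts(delta, start, finals, text):
    current = epsilon_closure(delta, {start})
    for ch in text:
        moved = set()
        for state in current:
            # Follow all transitions labeled with the current character.
            moved |= delta.get((state, ch), set())
        # Add every state reachable by an epsilon move.
        current = epsilon_closure(delta, moved)
        if not current:
            return False  # no clone survived: reject early
    # Accept if any surviving state is final.
    return bool(current & finals)


# Hypothetical NFA for a*b*: p loops on "a", an epsilon arc leads to q,
# and q loops on "b".
delta = {
    ("p", "a"): {"p"},
    ("p", ""): {"q"},
    ("q", "b"): {"q"},
}

print(nfa_accepts(delta, "p", {"q"}, "aabb"))  # True
print(nfa_accepts(delta, "p", {"q"}, "ba"))    # False
```

Each input character is processed once, and the set of current states never exceeds the number of NFA states, so no backtracking is needed.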

  42. FAs in Scanners: We want a DFA for speed (no backtracking or cloning), but conversion from regular expressions to an NFA is easier. Luckily, there is a well-defined procedure for converting an NFA to an equivalent DFA.
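
The slide does not name the conversion procedure; the usual textbook one is the subset construction, in which each DFA state is a set of NFA states. The sketch below assumes that technique. The function names, the `""` encoding of ε, and the example NFA are illustrative assumptions.

```python
def epsilon_closure(delta, states):
    """Return states plus everything reachable from them by epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, ""), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction: DFA states are frozensets of NFA states."""
    start_set = frozenset(epsilon_closure(delta, {start}))
    dfa_delta, dfa_finals = {}, set()
    seen, worklist = {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        if current & finals:
            dfa_finals.add(current)  # final if it contains any NFA final state
        for symbol in sorted(alphabet):
            moved = set()
            for state in current:
                moved |= delta.get((state, symbol), set())
            target = frozenset(epsilon_closure(delta, moved))
            if not target:
                continue  # no transition: the DFA rejects on this symbol
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return dfa_delta, start_set, dfa_finals


def run_dfa(dfa_delta, start, finals, text):
    state = start
    for ch in text:
        state = dfa_delta.get((state, ch))
        if state is None:
            return False
    return state in finals


# Hypothetical NFA for strings over {a, b} ending in "ab".
nfa_delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
dfa_delta, dfa_start, dfa_finals = nfa_to_dfa(nfa_delta, "q0", {"q2"}, {"a", "b"})

print(len({s for s, _ in dfa_delta}))                     # 3 DFA states
print(run_dfa(dfa_delta, dfa_start, dfa_finals, "abab"))  # True
```

For this example the DFA has three states, {q0}, {q0, q1}, and {q0, q2}. In the worst case the number of DFA states can grow exponentially in the number of NFA states.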

  43. Usefulness of RE to NFA Construction: Lexical analysis: specify language tokens (identifiers, numerical constants, symbols, etc.) as REs; tools like lex automatically generate automaton-based code to decompose source code into its constituent tokens. Pattern matching (e.g. text editors, grep): the pattern is specified as an RE; an automaton-based search locates occurrences.

  44. Lexical Analysis Generators: Generate an analyzer automatically from “descriptions” (regular expressions/NFAs) of the tokens in the programming language. Examples: lex/flex for C, JFlex for Java.

  45. Terminology: A token is a group of characters having collective meaning. A lexeme is an actual character sequence forming a specific instance of a token, such as num. A pattern is a rule, expressed as a regular expression, describing how a particular token can be formed. For example, [A-Za-z][A-Za-z0-9]* is a rule.
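
The three terms can be seen side by side in a few lines of Python. This is only a sketch: the token names `NUM` and `ID`, the digit pattern, and the `classify` helper are assumptions; the identifier pattern is the one from the slide.

```python
import re

# Each entry pairs a token name with the pattern that describes it.
PATTERNS = [
    ("NUM", r"[0-9]+"),                # assumed pattern for numbers
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),   # the identifier rule from the slide
]


def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for token, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return token
    return None


print(classify("count1"))  # ID
print(classify("401"))     # NUM
print(classify("1abc"))    # None
```

Here `count1` and `401` are lexemes, `ID` and `NUM` are the tokens they are instances of, and the regular expressions are the patterns.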

  46. (jF)Lex Input: a description of token structure (regular expressions); information on how to “process” different tokens.
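
The two kinds of input, patterns and processing instructions, can be mimicked in Python as a table of (pattern, action) pairs. This is a hand-written sketch of the idea, not (jF)Lex syntax and not generated code; the rule set, the `scan` function, and the token names are assumptions.

```python
import re

# Each rule pairs a pattern with an action; an action of None means "skip".
RULES = [
    (re.compile(r"[ \t\n]+"), None),
    (re.compile(r"[0-9]+"), lambda s: ("NUM", int(s))),
    (re.compile(r"[A-Za-z][A-Za-z0-9]*"), lambda s: ("ID", s)),
    (re.compile(r"[+\-*/=]"), lambda s: ("OP", s)),
]


def scan(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(text, pos)
            # Prefer the longest match; on ties the earlier rule wins.
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_len == 0:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if best_action is not None:
            tokens.append(best_action(text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(scan("x1 = 40 + y"))
# [('ID', 'x1'), ('OP', '='), ('NUM', 40), ('OP', '+'), ('ID', 'y')]
```

A generator tool would compile the patterns into a single automaton instead of trying each regular expression in turn, but the division of the input into patterns and actions is the same.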
