Lexical Analysis (April 3, 2013)

Previously on CSE 131b... Structure of a modern compiler: Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → IR Generation → IR Optimization → Code Generation → Machine Code


  3. Token: Lexemes
     Finite possible lexemes:
     • Keyword: for int if else while
     • Punctuation: ( ) { } ;
     • Operator: + - ++
     • Relation: < > =
     Infinite possible lexemes:
     • Identifier (variable name, function name): foo foo_2
     • Integer, floating point, string: 2345 2.0 "hello world"
     • Whitespace, comment: /* this code is awesome */

  4. • How do we describe which (potentially infinite) set of lexemes is associated with each token type?

  5. Formal Languages
     ● A formal language is a set of strings.
     ● Many infinite languages have finite descriptions:
        ● Define the language using an automaton.
        ● Define the language using a grammar.
        ● Define the language using a regular expression.
     ● We can use these compact descriptions of the language to define sets of strings.
     ● Over the course of this class, we will use all of these approaches.

  6. • What type of formal language should we use to describe tokens?

  7. Regular Expressions
     ● Regular expressions are a family of descriptions that can be used to capture certain languages (the regular languages).
     ● Often provide a compact and human-readable description of the language.
     ● Used as the basis for numerous software systems, including the flex tool we will use in this course.

  8. Atomic Regular Expressions
     ● The regular expressions we will use in this course begin with two simple building blocks.
     ● The symbol ε is a regular expression that matches the empty string.
     ● For any symbol a, the symbol a is a regular expression that matches just a.

  9. Compound Regular Expressions
     ● If R1 and R2 are regular expressions, R1R2 is a regular expression representing the concatenation of the languages of R1 and R2.
     ● If R1 and R2 are regular expressions, R1 | R2 is a regular expression representing the union of the languages of R1 and R2.
     ● If R is a regular expression, R* is a regular expression for the Kleene closure of the language of R.
     ● If R is a regular expression, (R) is a regular expression with the same meaning as R.
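
The compound operators can be read as operations on sets of strings. The following is a minimal Python sketch of that reading, not part of the course material. It assumes languages are enumerated only up to a length bound (`MAX_LEN`) so that the Kleene closure stays finite; the helper names `concat`, `union`, and `star` are illustrative.

```python
# Languages are modeled as Python sets of strings, truncated at MAX_LEN
# so that infinite languages can be enumerated partially.
MAX_LEN = 4  # assumption: an arbitrary small bound for illustration


def concat(l1, l2, max_len=MAX_LEN):
    """Concatenation: every xy with x in l1 and y in l2 (up to max_len)."""
    return {x + y for x in l1 for y in l2 if len(x + y) <= max_len}


def union(l1, l2):
    """Union: strings that are in l1 or in l2."""
    return l1 | l2


def star(lang, max_len=MAX_LEN):
    """Kleene closure: zero or more concatenations of strings from lang."""
    result = {""}        # zero repetitions gives the empty string
    frontier = {""}
    while frontier:
        # Extend only the newly found strings by one more string from lang.
        frontier = concat(frontier, lang, max_len) - result
        result |= frontier
    return result


# Build (0 | 1)*00(0 | 1)* from atomic pieces.
any_bits = star(union({"0"}, {"1"}))
contains_00 = concat(concat(any_bits, {"00"}), any_bits)

print(sorted(contains_00, key=lambda s: (len(s), s)))
```

Within the bound, `contains_00` holds exactly the bit strings that have 00 as a substring, which is the language of the example expression on the next slide.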

  12. Simple Regular Expressions
      ● Suppose the only characters are 0 and 1.
      ● Here is a regular expression for strings containing 00 as a substring:
           (0 | 1)*00(0 | 1)*
      ● Examples: 11011100101, 0000, 11111011110011111
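
As a quick check of this expression, here is a small Python sketch using the standard `re` module. It assumes that Python's regex syntax agrees with the slide's notation for this pattern once the spaces are dropped, and it uses `fullmatch` so that the whole string must belong to the language. The function name `contains_00` is illustrative.

```python
import re

# The slide's expression, written without spaces for Python's re module.
CONTAINS_00 = re.compile(r"(0|1)*00(0|1)*")


def contains_00(s):
    """Return True if the whole string s is in the language of the pattern."""
    return CONTAINS_00.fullmatch(s) is not None


# The three example strings from the slide, plus one string without 00.
for s in ["11011100101", "0000", "11111011110011111", "0101"]:
    print(s, contains_00(s))
```

Python's `re` is a backtracking matcher rather than a finite automaton, so this only illustrates which strings belong to the language, not how a scanner generator such as flex matches them.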

  16. Applied Regular Expressions
      ● Suppose that our alphabet is all ASCII characters.
      ● A regular expression for even numbers is
           (+|-)?(0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)
      ● Examples: 42, +1370, -3248, -9999912
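
The same kind of check works for the even-number expression. This Python sketch assumes the `re` module again; the only change to the slide's pattern is that `+` is escaped, because it is an operator in Python's regex syntax. The function name `is_even_number` is illustrative.

```python
import re

# The slide's expression; "+" is escaped because it is a regex operator
# in Python's syntax.
EVEN_NUMBER = re.compile(r"(\+|-)?(0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)")


def is_even_number(s):
    """Return True if the whole string s matches the even-number pattern."""
    return EVEN_NUMBER.fullmatch(s) is not None


# The four examples from the slide, followed by two strings that do not match.
for s in ["42", "+1370", "-3248", "-9999912", "7", "+"]:
    print(s, is_even_number(s))
```

As written, the pattern also accepts leading zeros, for example 002.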

  19. More examples
      • Whitespace: [ \t\n]+
      • Integers: [+\-]?[0-9]+
      • Hex numbers: 0x[0-9a-f]+
      • Identifier: [A-Za-z]([A-Za-z]|[0-9])*
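
To see how these patterns separate lexemes into token types, here is a minimal Python sketch that classifies a single lexeme by trying each pattern in turn. The patterns are copied from the slide; the token names and the `classify` helper are hypothetical labels, and this is not how flex itself is implemented.

```python
import re

# Patterns copied from the slide; the token names are hypothetical labels.
TOKEN_PATTERNS = [
    ("WHITESPACE", re.compile(r"[ \t\n]+")),
    ("HEX", re.compile(r"0x[0-9a-f]+")),
    ("INTEGER", re.compile(r"[+\-]?[0-9]+")),
    ("IDENTIFIER", re.compile(r"[A-Za-z]([A-Za-z]|[0-9])*")),
]


def classify(lexeme):
    """Return the name of the first pattern matching the whole lexeme."""
    for name, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(lexeme):
            return name
    return None  # the lexeme belongs to none of the listed token types


for lexeme in ["-2345", "0x1f", "foo2", "foo_2", " \t\n"]:
    print(repr(lexeme), classify(lexeme))
```

Note that the identifier pattern as written has no underscore in its character classes, so a name such as foo_2 is not classified as an identifier here.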

  20. • Use regular expressions to describe token types
      • How do we match regular expressions?

  22. Recognizing Regular Languages
      What kind of machine recognizes a regular language?
      • Finite automata
         • DFA (Deterministic Finite Automaton)
         • NFA (Nondeterministic Finite Automaton)

  23. A Simple Automaton
      [Figure: automaton diagram with a "start" arrow and transitions labeled A,B,C,...,Z and "]

  24. A Simple Automaton
      Each circle is a state of the automaton. The automaton's configuration is determined by what state(s) it is in.

  25. A Simple Automaton
      These arrows are called transitions. The automaton changes which state(s) it is in by following transitions.

  26. A Simple Automaton
      [Figure: the same automaton, with the input string " H E Y A "]
      Finite automaton: takes an input string and determines whether it is a valid sentence of a language (accept or reject).

  27. A Simple Automaton
      The automaton takes a string as input and decides whether to accept or reject the string.

  38. A Simple Automaton
      The double circle indicates that this state is an accepting state. The automaton accepts the string if it ends in an accepting state.
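
The walk through the input " H E Y A " can be sketched in Python as a deterministic automaton. The diagram itself is not recoverable from this transcript, so the sketch assumes three states: a start state, a state reached after the opening quote that loops on A through Z, and an accepting state reached on the closing quote. The state names and the `step` and `accepts` helpers are hypothetical.

```python
import string

# Assumed shape of the automaton (state names are hypothetical):
#   START --"--> IN_STRING --A..Z--> IN_STRING --"--> DONE (accepting)
START, IN_STRING, DONE = "start", "in_string", "done"
ACCEPTING = {DONE}


def step(state, ch):
    """Follow the single transition for ch, or return None if there is none."""
    if state == START and ch == '"':
        return IN_STRING
    if state == IN_STRING and ch in string.ascii_uppercase:
        return IN_STRING
    if state == IN_STRING and ch == '"':
        return DONE
    return None


def accepts(text):
    """Run the automaton over text and report whether it ends in an accepting state."""
    state = START
    for ch in text:
        state = step(state, ch)
        if state is None:      # no transition available: reject
            return False
    return state in ACCEPTING


print(accepts('"HEYA"'))   # the input from the slides
print(accepts('"HEYA'))    # missing closing quote
```

Under these assumptions the automaton accepts the quoted input from the slides and rejects the same input without its closing quote, because the run then ends in a non-accepting state.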

  39. An Even More Complex Automaton
      [Figure: automaton diagram with a "start" arrow, three ε-transitions, and transitions labeled "a, b", "c", "a, c", "b", "b, c", and "a"]

  40. An Even More Complex Automaton
      These are called ε-transitions. These transitions are followed automatically and without consuming any input.

  41. An Even More Complex Automaton
      [Figure: the same automaton, with the input string b c b a]
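
An automaton with ε-transitions can be in several states at once, so simulating it means tracking a set of states. The Python sketch below is based on one plausible reading of the diagram labels, which is an assumption: the start state has ε-transitions to three branches, each branch loops on two of the letters and moves to an accepting state on the third. The state names and helper functions are hypothetical.

```python
EPS = ""  # label used here for ε-transitions

# Assumed reading of the diagram (state names are hypothetical).
TRANSITIONS = {
    ("start", EPS): {"loop_ab", "loop_ac", "loop_bc"},
    ("loop_ab", "a"): {"loop_ab"}, ("loop_ab", "b"): {"loop_ab"},
    ("loop_ab", "c"): {"end_c"},
    ("loop_ac", "a"): {"loop_ac"}, ("loop_ac", "c"): {"loop_ac"},
    ("loop_ac", "b"): {"end_b"},
    ("loop_bc", "b"): {"loop_bc"}, ("loop_bc", "c"): {"loop_bc"},
    ("loop_bc", "a"): {"end_a"},
}
ACCEPTING = {"end_a", "end_b", "end_c"}


def epsilon_closure(states):
    """All states reachable from `states` by following only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in TRANSITIONS.get((state, EPS), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def move(states, ch):
    """All states reachable from `states` by consuming the character ch."""
    result = set()
    for state in states:
        result |= TRANSITIONS.get((state, ch), set())
    return result


def accepts(text):
    """Accept if any state the automaton can end in is an accepting state."""
    current = epsilon_closure({"start"})
    for ch in text:
        current = epsilon_closure(move(current, ch))
    return bool(current & ACCEPTING)


print(accepts("bcba"))  # the input from the slides
```

Under this reading, the input b c b a is accepted through the branch that loops on b and c and then consumes the final a.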
