
CS502: Compiler Design Lexical Analysis Manas Thakur Fall 2020



  1. CS502: Compiler Design Lexical Analysis Manas Thakur Fall 2020

  2. Let’s get started
     [Figure: phases of a compiler, with a Symbol Table alongside the phases.]
     ● Front end: Character stream → Lexical Analyzer → Token stream → Syntax Analyzer → Syntax tree → Semantic Analyzer → Syntax tree → Intermediate Code Generator → Intermediate representation
     ● Back end: Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation → Code Generator → Target machine code → Machine-Dependent Code Optimizer → Target machine code

  3. Lexical Analysis
     ● Also called scanning
     ● Corresponding component called lexical analyzer or scanner
     ● Roles:
       – Read input characters
       – Group into tokens (also called lexemes)
       – Return stream of tokens
         ● To whom?
           – Usually the parser
       – Sometimes
         ● Remove whitespace
         ● Remove comments
         ● Record information (such as line number) into symbol table
         ● Report errors

  4. Characters to tokens
     ● Input program: if (a>b) x = 0; else x = 1;
       – Basically a sequence of characters
     ● Actual input:
       – \tif (a>b)\n\t\tx = 0;\n\telse\n\t\tx = 1;
     ● Goal of lexical analyzer:
       – Partition input stream into substrings (tokens) and classify them according to their roles (types).

  5. Identifying and classifying tokens: Example
     ● Input:
       – \tif (a>b)\n\t\tx = 0;\n\telse\n\t\tx = 1;
     ● Say we have the following token types:
       – keywords, operators, identifiers, literals (constants), special symbols, white space
     ● How many tokens are there in this string?
     ● Example output (excluding white spaces):
       – <keyword, ‘if’>
       – <special_symbol, ‘(’>
       – <identifier, ‘a’>
       – ...
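The classification on this slide can be sketched with Python's `re` module. This is only an illustration, not the course's implementation: the token class names follow the slide, but the concrete patterns (which keywords, which operator characters) and the `tokenize` helper are assumptions chosen to cover this one input.

```python
import re

# The "actual input" from the slide, with tabs and newlines made explicit.
SOURCE = "\tif (a>b)\n\t\tx = 0;\n\telse\n\t\tx = 1;"

# Assumed patterns, one per token type named on the slide.
# Order matters: keywords are listed before identifiers.
TOKEN_SPEC = [
    ("keyword",        r"(?:if|else)(?![A-Za-z0-9])"),
    ("identifier",     r"[A-Za-z][A-Za-z0-9]*"),
    ("literal",        r"[0-9]+"),
    ("operator",       r"[<>=]"),
    ("special_symbol", r"[();]"),
    ("white_space",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text, keep_whitespace=False):
    """Split text into (type, lexeme) pairs, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if keep_whitespace or match.lastgroup != "white_space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize(SOURCE):
        print(token)
```

Under these assumptions the answer to "how many tokens?" depends on whether white space is counted, and on whether a run of white-space characters is grouped into one token (as it is here).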

  6. Patterns for lexical analysis
     ● Keywords can be represented directly
       – ‘break’, ‘int’, ‘while’
     ● And similarly punctuation symbols
     ● What about token classes with too many members to list?
       – Numbers
       – Identifiers
     ● Specified (or modelled) using
       – Regular expressions
       – The set of strings represented by a regular expression r forms a regular language L(r).

  7. Regex Primer
     ● An alphabet Σ is a set of symbols
       – Our first names are strings over the alphabet Σ = {a, ..., z}
     ● * denotes zero or more occurrences
     ● ε denotes the empty string
     ● + denotes one or more occurrences
     ● ? denotes zero or one occurrence
     ● | (or sometimes +) is used to denote choice
       – a*b | a*c
     ● Many ways to express the same language:
       – a*b | a*c can also be written as: a*(b+c), where + denotes choice
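The claim that a*b | a*c and a*(b+c) describe the same language can be spot-checked by brute force. The sketch below uses Python's `re` syntax, where choice is always written `|` (in Python, `+` only means "one or more"). The helper names and the length bound are assumptions for illustration; agreeing on all short strings is evidence, not a proof.

```python
import re
from itertools import product

LEFT = re.compile(r"a*b|a*c")
RIGHT = re.compile(r"a*(b|c)")


def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def same_language(p, q, alphabet="abc", max_len=5):
    """True if p and q accept exactly the same strings up to max_len."""
    return all(
        (p.fullmatch(s) is None) == (q.fullmatch(s) is None)
        for s in strings_up_to(alphabet, max_len)
    )


if __name__ == "__main__":
    print(same_language(LEFT, RIGHT))
```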

  8. Classwork
     ● Write a regex that represents strings over alphabet {a, b} that start and end with a.
       – (a(a+b)*a) + a
     ● Strings with third last letter as a.
       – (a*+b*)*a(a+b)(a+b)
     ● Strings with exactly three b's.
       – a*ba*ba*ba*
     ● Strings over Σ = {0,1} with odd number of 1s:
       – HW
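One way to gain confidence in the three classwork answers is to compare each regex against a plain-Python statement of the property on every short string over {a, b}. This is a sketch under assumptions: the `mismatches` helper and the length bound of 6 are made up for illustration, and the slide's choice operator + is rewritten as `|` for Python's `re` module.

```python
import re
from itertools import product

# Each entry pairs a slide answer (with + rewritten as |) with the property it should capture.
ANSWERS = {
    "starts and ends with a": (
        r"a(a|b)*a|a",
        lambda s: s.startswith("a") and s.endswith("a"),
    ),
    "third last letter is a": (
        r"(a*|b*)*a(a|b)(a|b)",
        lambda s: len(s) >= 3 and s[-3] == "a",
    ),
    "exactly three b's": (
        r"a*ba*ba*ba*",
        lambda s: s.count("b") == 3,
    ),
}


def mismatches(pattern, predicate, alphabet="ab", max_len=6):
    """Return the strings (up to max_len) where regex and predicate disagree."""
    regex = re.compile(pattern)
    bad = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if (regex.fullmatch(s) is not None) != predicate(s):
                bad.append(s)
    return bad


if __name__ == "__main__":
    for name, (pattern, predicate) in ANSWERS.items():
        print(name, "->", mismatches(pattern, predicate) or "no mismatches")
```

The same harness can be pointed at an attempt for the homework item by supplying a pattern and a predicate over the alphabet "01".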

  9. More Regex
     ● Identifiers that begin only with a letter and may have digits or letters afterwards:
       – letter: (a|b|c| ... |z|A|B|C| ... |Z)
       – digit: (0|1|2| ... |9)
       – identifier: letter(letter|digit)*
     ● HWOT: Write a regular expression for representing valid email ids. (You are free to choose your alphabet.)
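The identifier definition above translates almost directly into Python's `re` syntax, using character classes as shorthand for the long alternations. The names `LETTER`, `DIGIT`, and `is_identifier` are illustrative assumptions, not part of the slides.

```python
import re

LETTER = "[a-zA-Z]"   # shorthand for (a|b|...|z|A|B|...|Z)
DIGIT = "[0-9]"       # shorthand for (0|1|...|9)

# identifier: letter(letter|digit)*
IDENTIFIER = re.compile(f"{LETTER}(?:{LETTER}|{DIGIT})*")


def is_identifier(text):
    """True if the whole string is an identifier under the definition above."""
    return IDENTIFIER.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["x", "x1", "thenVar", "1x", ""]:
        print(repr(candidate), is_identifier(candidate))
```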

  10. Some considerations
     ● How to distinguish between patterns with common prefixes:
       – <, <=, <<
       – Need to “look ahead” before taking a decision
     ● Clashes between token types (e.g., then versus thenVar)
       – Assign priorities while checking (e.g., keywords before identifiers)
       – Start with an identifier and if the value matches a reserved word, then change its type
     ● Detecting and recovering from errors
       – Next class
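Both considerations can be sketched as small hand-written scanner routines. The first peeks one character ahead before deciding between <, <=, and <<. The second scans a whole word as an identifier and then changes its type if the lexeme is a reserved word. The function names, the return shape, and the contents of `RESERVED` are assumptions for illustration.

```python
# Hypothetical reserved-word table; a real language would list all its keywords.
RESERVED = {"if", "then", "else"}


def scan_less_than(text, pos):
    """Scan an operator starting at text[pos] == '<' using one character of lookahead.

    Returns ((type, lexeme), next_position).
    """
    lookahead = text[pos + 1] if pos + 1 < len(text) else ""
    if lookahead == "=":
        return ("operator", "<="), pos + 2
    if lookahead == "<":
        return ("operator", "<<"), pos + 2
    return ("operator", "<"), pos + 1


def scan_word(text, pos):
    """Scan letters and digits as an identifier, then reclassify reserved words."""
    end = pos
    while end < len(text) and text[end].isascii() and text[end].isalnum():
        end += 1
    lexeme = text[pos:end]
    kind = "keyword" if lexeme in RESERVED else "identifier"
    return (kind, lexeme), end


if __name__ == "__main__":
    print(scan_less_than("a<=b", 1))
    print(scan_word("thenVar = 1", 0))
    print(scan_word("then x", 0))
```

Because `scan_word` consumes the longest run of letters and digits before consulting the table, thenVar is never split into the keyword then followed by Var.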
