CSCI 2320: Lexical Analysis
Ref: Ch 3 + Handout (Nishimura)
Mohammad T. Irfan
9/25/17

Plan
◦ Chomsky Hierarchy
◦ Lexical Analysis

Chomsky Hierarchy

From the bottom of the hierarchy (faster computation) to the top (more expressive power):
1. Regular grammar
2. Context-free grammar (CFG/BNF)
3. Context-sensitive grammar
4. Unrestricted grammar

Chomsky Hierarchy: production forms

Notation: A, B ∈ N; ω ∈ T*; α, β ∈ (T ∪ N)*

Grammar                          Productions
Regular grammar                  A → ω B | ω
Context-free grammar (CFG/BNF)   A → β
Context-sensitive grammar        α → β, where |α| <= |β|
Unrestricted grammar             α → β

Regular grammar: pros and cons
A, B ∈ N; ω ∈ T*; productions A → ω B and A → ω

Pros
◦ Can do the first layer of abstraction in PL syntax
◦ Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
◦ Note: the following is not a regular grammar (why?)
    Integer → Integer Digit
    Digit → 0 | 1 | ... | 9
Cons
◦ Cannot check balanced parentheses, braces, etc.
◦ Cannot represent {a^n b^n | n >= 1}

CFG/BNF/EBNF: pros and cons
A ∈ N; β ∈ (T ∪ N)*; productions A → β

Pros
◦ Can do all layers of abstraction in PL syntax
◦ Assignment → Identifier = Expression;
Cons
◦ Can't do lots of semantic-type things
  ◦ Variable declared before use?
  ◦ Operand and operator compatible?
◦ Can't represent languages like {ww | w ∈ T+}
  ◦ Can do equality checking (a^n b^n), but can't detect repetition
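The contrast between the two cons lists can be made concrete with a small Python sketch. The function names `is_integer` and `is_anbn` are illustrative and not from the slides. The right-linear Integer grammar corresponds to a single regular expression, while {a^n b^n | n >= 1} is checked here by counting, which is exactly what a regular grammar cannot express.

```python
import re

# The right-linear Integer grammar generates one or more decimal digits,
# which is the regular expression [0-9]+.
INTEGER = re.compile(r"[0-9]+")

def is_integer(text):
    """True if the whole string is derivable from the Integer grammar."""
    return INTEGER.fullmatch(text) is not None

def is_anbn(text):
    """True if text is in {a^n b^n | n >= 1}.

    This needs a count of the a's to compare against the b's, so it is
    written as ordinary code rather than as a regular expression.
    """
    n = len(text) // 2
    return n >= 1 and text == "a" * n + "b" * n
```

For example, `is_integer("2017")` and `is_anbn("aabb")` hold, while `is_integer("12a")` and `is_anbn("aab")` do not.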

Context-sensitive: pros and cons
A, B ∈ N; ω ∈ T*; α, β ∈ (T ∪ N)*; productions α → β, where |α| <= |β|

Pros
◦ Can represent languages like {a^n b^n c^n | n >= 1}
Cons
◦ Deciding whether a given sentence ω can be derived from a given context-sensitive grammar is decidable but intractable in general (PSPACE-complete)
  ◦ No practical general-purpose parsing!
  ◦ No practical compiler for a context-sensitive grammar!

Unrestricted: pros and cons
A, B ∈ N; ω ∈ T*; α, β ∈ (T ∪ N)*; productions α → β

Pros
◦ Equivalent to Turing machine
  ◦ That is, can compute any computable function
Cons
◦ Can we do parsing?

Plan
◦ Chomsky Hierarchy
◦ Lexical Analysis

Lexical Analysis
Input: Lexemes (typed ASCII characters)
Output: Tokens (sequence of characters having a collective meaning)
Discard: whitespace, comments

Example: int count = 10;

Lexemes   int       count        =          10           ;
Tokens    keyword   identifier   operator   intLiteral   separator

Why do lexical analysis separately?
◦ Simpler, faster grammar for parsing
  ◦ Next: how?
◦ 75% of time spent in lexical analysis

Def. Regular Expressions

RegExpr   Meaning
x         a character x
\x        an escaped character, e.g., \n
{Z}       a reference to a reg expr Z
M | N     M or N, where M and N are reg exprs
M N       M followed by N
M*        zero or more occurrences of M
M+        one or more occurrences of M
M?        zero or one occurrence of M
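The operators in the table can be tried directly with Python's `re` module. This is a minimal sketch; the helper `matches` and the names `DIGIT` and `INTEGER_LIT` are illustrative. Python's `re` has no `{Z}` reference syntax, so the sketch assumes that a reference can be emulated by building the pattern string from a named sub-pattern.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# M | N : alternation
assert matches(r"a|b", "b")
# M N : concatenation
assert matches(r"ab", "ab")
# M* : zero or more occurrences, so the empty string matches
assert matches(r"a*", "")
# M+ : one or more occurrences, so the empty string does not match
assert not matches(r"a+", "")
# M? : zero or one occurrence
assert matches(r"ab?", "a") and matches(r"ab?", "ab")
# \x : an escaped character
assert matches(r"\n", "\n")

# {Z} : a reference to another regular expression, emulated here by
# string formatting (an assumption; re has no such syntax).
DIGIT = "[0-9]"
INTEGER_LIT = f"{DIGIT}+"
assert matches(INTEGER_LIT, "2320")
```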

Def. Regular Expressions (continued)

RegExpr   Meaning
[aeiou]   the set of vowels
[0-9]     the set of digits
.         any single character

Special symbols: ^ means not (e.g., [^aeiouAEIOU] is a non-vowel)

Clite regular definition

Category     Definition   Notes
AnyChar      [ -~]        From space (ASCII 32) to tilde (ASCII 126)
Letter       [a-zA-Z]
Digit        [0-9]
Whitespace   [ \t]        Space and tab
Eol          \n

Clite regular definition (continued)

Category     Definition
Keyword      bool | char | else | false | float | if | int | main | true | while
Identifier   {Letter}({Letter} | {Digit})*
IntegerLit   {Digit}+
FloatLit     {Digit}+\.{Digit}+
CharLit      '{AnyChar}'

Category     Definition
Operator     = | || | && | == | != | < | <= | > | + | - | * | / | ! | [ | ]
Separator    : | . | { | } | ( | )
Comment      // ({AnyChar} | {Whitespace})* {Eol}
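The categories above can be turned into a small scanner. The following is a sketch under stated assumptions, not the course's reference implementation. The names `TOKEN_SPEC`, `MASTER`, and `tokenize` are illustrative. The separator set adds `;` and `,` to the listed separators so that the earlier `int count = 10;` example can be scanned. A comment is allowed to end at the end of input as well as at `{Eol}`. Keywords are recognized by scanning an Identifier first and then looking the lexeme up in a keyword set.

```python
import re

KEYWORDS = {"bool", "char", "else", "false", "float",
            "if", "int", "main", "true", "while"}

# Order matters: Comment before Operator (both can start with "/"),
# FloatLit before IntegerLit, two-character operators before one-character.
TOKEN_SPEC = [
    ("Comment",    r"//[ -~\t]*"),
    ("FloatLit",   r"[0-9]+\.[0-9]+"),
    ("IntegerLit", r"[0-9]+"),
    ("CharLit",    r"'[ -~]'"),
    ("Identifier", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Operator",   r"\|\||&&|==|!=|<=|[=<>+\-*/!\[\]]"),
    ("Separator",  r"[;,:.{}()]"),   # ';' and ',' are an added assumption
    ("Skip",       r"[ \t\n]+"),     # Whitespace and Eol are discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (category, lexeme) pairs for a Clite-like source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"illegal character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("Skip", "Comment"):
            continue  # whitespace and comments are discarded
        if kind == "Identifier" and lexeme in KEYWORDS:
            kind = "Keyword"
        tokens.append((kind, lexeme))
    return tokens
```

Calling `tokenize("int count = 10;")` yields one pair per lexeme, in the order Keyword, Identifier, Operator, IntegerLit, Separator.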

Implementation Using Python

Python's re package: https://docs.python.org/3/library/re.html

    import re        # regex
    re.split(...)    # Use regex argument to split a string into parts

Common string matching regex:

Symbol   Definition
\d       [0-9]
\D       [^0-9]
\w       [a-zA-Z0-9_]
\W       [^a-zA-Z0-9_]
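A short sketch of `re.split` and the shorthand classes in use; the sample strings and variable names are illustrative. One detail worth knowing: on Python 3 `str` patterns, `\d` and `\w` also match non-ASCII digits and letters by default, so the ASCII-only sets in the table correspond to passing `re.ASCII`.

```python
import re

line = "int count = 10;"

# Split on runs of whitespace; punctuation stays attached to its neighbor.
by_space = re.split(r"\s+", line)

# A capturing group makes re.split keep the delimiters it split on.
# Dropping empty and blank pieces leaves one string per lexeme.
lexemes = [piece for piece in re.split(r"(\W)", line) if piece.strip()]

# re.ASCII restricts \d and \w to the ASCII sets listed in the table.
numbers = re.findall(r"\d+", "x1 = 42 + y27", flags=re.ASCII)
words = re.findall(r"\w+", "x1 = 42 + y27", flags=re.ASCII)

print(by_space)  # ['int', 'count', '=', '10;']
print(lexemes)   # ['int', 'count', '=', '10', ';']
print(numbers)   # ['1', '42', '27']
print(words)     # ['x1', '42', 'y27']
```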

Describe the language:
1. 0(0|1)+0
2. ((ε|0)1*)*
3. 0*10*10*10*
4. (00|11)*

Write a regular expression for:
1. All strings of lowercase letters, where letters appear in ascending order.
2. All strings of letters containing vowels in order.

Exam 1
◦ Coming Thursday, Sept 28
◦ Start of class (30 min)
◦ Up to today's class

Finite State Automata (FSA)
Behind the scenes of regular expressions

Finite State Automata (FSA)
Σ: Input alphabet + unique end symbol ($)
Set of states
◦ Represented by nodes
◦ Unique start state
◦ One or more final states
State transition function
◦ Labelled (using alphabet) arcs in graph

Deterministic F.A. (DFA)
There is at most one outgoing arc from any state for any particular input symbol
◦ Easy to parse: does x belong to L(G)?
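The "easy to parse" point can be illustrated with a minimal DFA simulator. The function `run_dfa`, the state names `q0` and `q1`, and the example machine (binary strings that end in 1) are illustrative choices, not taken from the slides. The sketch assumes the transition function is stored as a dictionary keyed by (state, symbol), and it treats reaching the end of the string as the end symbol.

```python
def run_dfa(transitions, start, finals, text):
    """Simulate a DFA: follow exactly one arc per input symbol.

    transitions maps (state, symbol) -> next state. A missing entry means
    there is no outgoing arc for that symbol, so the input is rejected.
    """
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in finals

# Example machine: q1 is reached exactly when the last symbol read was 1.
ENDS_IN_ONE = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}

print(run_dfa(ENDS_IN_ONE, "q0", {"q1"}, "101"))  # True
print(run_dfa(ENDS_IN_ONE, "q0", {"q1"}, "110"))  # False
```

Because each step is a single dictionary lookup, the running time is linear in the length of the input.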

Non-deterministic F.A. (NFA)
Allows multiple outgoing arcs from a state for the same input symbol
Allows transitions on empty string (ε)
◦ Easy to express a language
◦ But difficult to parse

Known algorithms
1. DFA → regular expression
2. Regular expression → NFA
   Language designer → implementation (parsing)
3. NFA → DFA

DFA → Regex → NFA → DFA
All 3 are equivalent!

Example: odd binary number
Regex → NFA → DFA → Regex
(0|1)*1 → ? → ? → ?

Regex → NFA
Idea:
• For |, symbols will be on the same arc
• For concatenation, create a new state
• For *, use a self-loop
(More details on next slide)

NFA → DFA
Idea:
• Start with the NFA start state and tabulate all possible sets of NFA states that you can reach on 0 and 1 transitions.
• Each set of NFA states is a DFA state.

DFA → Regex
• State elimination algorithm (more details soon)
• Nishimura handout: + means |

Regex → NFA
Scott, Programming Languages (2000)
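The NFA → DFA idea (tabulating reachable sets of NFA states) can be sketched as follows. The names `epsilon_closure` and `nfa_to_dfa`, the dictionary encoding, and the two-state NFA for (0|1)*1 are assumptions made for illustration. State A has a self-loop on 0 and 1, and an arc on 1 leads to the final state B. ε-arcs are stored under the empty-string symbol.

```python
from collections import deque

def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-arcs (symbol "")."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in nfa.get((state, ""), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def nfa_to_dfa(nfa, start, finals, alphabet):
    """Tabulate the sets of NFA states reachable on each symbol.

    nfa maps (state, symbol) -> set of next states.
    Each reachable set of NFA states becomes one DFA state.
    """
    start_set = epsilon_closure(nfa, {start})
    dfa = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved.update(nfa.get((state, symbol), ()))
            if not moved:
                continue  # no arc on this symbol: leave the entry undefined
            target = epsilon_closure(nfa, moved)
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state is final if it contains at least one final NFA state.
    dfa_finals = {state_set for state_set in seen if state_set & set(finals)}
    return dfa, start_set, dfa_finals

# NFA for (0|1)*1: loop on A for 0 and 1, then a 1-arc from A to final state B.
NFA = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"}}
dfa, dfa_start, dfa_finals = nfa_to_dfa(NFA, "A", {"B"}, "01")
```

For this NFA the table has two DFA states, {A} and {A, B}, and {A, B} is the only final state.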

NFA/DFA → Regex
State elimination algorithm
How to preserve all paths after deleting a node?
For each node to be deleted:
◦ Match each incoming arc with every outgoing arc

Class Participation 3
Do the following for binary numbers with an even number of 0s:
Regular expression → NFA → DFA → Regular expression.
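The state elimination step (match each incoming arc with every outgoing arc) can be sketched in Python for the odd-binary-number DFA from the earlier example, not for the class participation exercise. The function name `eliminate_states` and the edge encoding are assumptions. Arcs are labelled with regular expression strings, a fresh start state S and a fresh final state F are attached with ε-arcs (written as empty strings), and the output is not simplified.

```python
import re
from itertools import product

def eliminate_states(edges, to_eliminate, start, final):
    """Delete states one at a time while preserving every path.

    edges maps (source, target) -> regex string labelling that arc.
    start must have no incoming arcs and final no outgoing arcs.
    """
    edges = dict(edges)
    for k in to_eliminate:
        loop = edges.pop((k, k), None)
        star = f"({loop})*" if loop is not None else ""
        incoming = [(i, r) for (i, j), r in edges.items() if j == k]
        outgoing = [(j, r) for (i, j), r in edges.items() if i == k]
        # Drop every arc that touches k, then reconnect around it.
        edges = {(i, j): r for (i, j), r in edges.items() if k not in (i, j)}
        for (i, r_in), (j, r_out) in product(incoming, outgoing):
            path = f"({r_in}){star}({r_out})"
            if (i, j) in edges:
                edges[(i, j)] = f"{edges[(i, j)]}|{path}"
            else:
                edges[(i, j)] = path
    return edges.get((start, final))

# DFA for binary strings ending in 1, with fresh start S and final F.
EDGES = {
    ("S", "q0"): "",
    ("q0", "q0"): "0", ("q0", "q1"): "1",
    ("q1", "q1"): "1", ("q1", "q0"): "0",
    ("q1", "F"): "",
}
regex = eliminate_states(EDGES, ["q0", "q1"], "S", "F")

# Sanity check: the result agrees with "ends in 1" on all short binary strings.
for length in range(7):
    for symbols in product("01", repeat=length):
        text = "".join(symbols)
        assert (re.fullmatch(regex, text) is not None) == text.endswith("1")
```

The resulting expression is verbose because nothing is simplified. Eliminating the states in a different order produces a different but equivalent expression.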
