Formal Languages 1 Discrete Mathematical Structures Formal Languages

Strings
Alphabet: a finite set of symbols
  – Normally characters of some character set
  – E.g., ASCII, Unicode
  – $\Sigma$ is used to represent an alphabet
String: a finite sequence of symbols from some alphabet
  – If $s$ is a string, then $|s|$ is its length
  – The empty string is symbolized by $\epsilon$

String Operations
Concatenation
  – $x = hi$, $y = bye$, $xy = hibye$
  – $s\epsilon = \epsilon s = s$
Exponentiation
  – $s^i = \epsilon$ if $i = 0$
  – $s^i = s^{i-1}s$ if $i > 0$
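
The two string operations above can be sketched in Python, where a string is already a finite sequence of symbols and the empty string plays the role of $\epsilon$. The function name `power` is an illustrative choice, not something from the slides.

```python
def power(s: str, i: int) -> str:
    """Return s^i: the empty string for i = 0, otherwise s^(i-1) followed by s."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    if i == 0:
        return ""               # s^0 is the empty string
    return power(s, i - 1) + s  # s^i = s^(i-1) s


x, y = "hi", "bye"
xy = x + y                      # concatenation of x and y
```

Concatenating with `""` on either side leaves a string unchanged, which mirrors the identity $s\epsilon = \epsilon s = s$.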

Parts of a String
Prefix
Suffix
Substring
Proper prefix, suffix, or substring
Subsequence
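
The slide names these terms without defining them, so the following Python sketch uses the usual textbook definitions: a prefix drops zero or more trailing symbols, a suffix drops zero or more leading symbols, a substring drops a prefix and a suffix, and a subsequence drops zero or more symbols that need not be contiguous. The function names are illustrative, and `proper` assumes the convention that a proper part is neither empty nor the whole string; some texts exclude only the whole string.

```python
from itertools import combinations


def prefixes(s: str) -> set[str]:
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s: str) -> set[str]:
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s: str) -> set[str]:
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def subsequences(s: str) -> set[str]:
    # combinations() keeps the original order of the chosen symbols
    return {"".join(c) for r in range(len(s) + 1) for c in combinations(s, r)}


def proper(parts: set[str], s: str) -> set[str]:
    # Assumed convention: drop the empty string and s itself
    return parts - {"", s}
```

For the string `ban`, `bn` is a subsequence but not a substring, because the two symbols are not adjacent in the original.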

Language
A language is a set of strings over some alphabet: $L \subseteq \Sigma^*$
Examples:
  – $\emptyset$ is a language
  – $\{\epsilon\}$ is a language
  – The set of all legal Java programs
  – The set of all correct English sentences

Operations on Languages
Of most concern for lexical analysis:
Union
Concatenation
Closure

Union
The union of languages $L$ and $M$:
$L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$

Concatenation
The concatenation of languages $L$ and $M$:
$LM = \{ st \mid s \in L \text{ and } t \in M \}$

Kleene Closure
The Kleene closure of language $L$:
$L^* = \bigcup_{i=0}^{\infty} L^i$
Zero or more concatenations

Positive Closure
The positive closure of language $L$:
$L^+ = \bigcup_{i=1}^{\infty} L^i$
One or more concatenations

Example
Let $L = \{A, B, C, \ldots, Z, a, b, c, \ldots, z\}$
Let $D = \{0, 1, 2, \ldots, 9\}$
$L \cup D$
$LD$
$L^4$
$L^*$
$L(L \cup D)^*$
$D^+$
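
The language operations can be tried out on finite sets of strings. The Python sketch below is one way to do that; the function names are illustrative, and because a closure is an infinite union, `closure` only computes the union up to a caller-chosen bound `max_i`, so it is a finite approximation rather than the closure itself.

```python
from string import ascii_letters, digits


def concat(L: set[str], M: set[str]) -> set[str]:
    """LM: every string of L followed by every string of M."""
    return {s + t for s in L for t in M}


def power(L: set[str], i: int) -> set[str]:
    """L^i, with L^0 being the set that holds only the empty string."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def closure(L: set[str], max_i: int, positive: bool = False) -> set[str]:
    """Union of L^i for i from 0 (or 1 if positive) up to max_i."""
    out: set[str] = set()
    for i in range(1 if positive else 0, max_i + 1):
        out |= power(L, i)
    return out


L = set(ascii_letters)   # the 52 upper- and lowercase letters
D = set(digits)          # the 10 decimal digits
letters_or_digits = L | D
letter_then_digit = concat(L, D)
```

Under these definitions, `concat(L, closure(L | D, k))` holds the strings that start with a letter and continue with at most `k` letters or digits, a bounded slice of $L(L \cup D)^*$.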

Regular Expressions
A convenient way to represent languages that can be processed by lexical analyzers
Notation is slightly different from the set notation presented for languages
A regular expression is built from simpler regular expressions using a set of defining rules
A regular expression represents strings that are members of some regular set

Rules for Defining Regular Expressions
The regular expression $r$ denotes the language $L(r)$
$\epsilon$ is a regular expression that denotes $\{\epsilon\}$, the set containing the empty string
If $a$ is a symbol in the alphabet, then $a$ is a regular expression that denotes $\{a\}$, the set containing the string $a$
How to distinguish among these notations?

Combining Regular Expressions
Let $r$ and $s$ be regular expressions that denote the languages $L(r)$ and $L(s)$ respectively
(r)|(s) is a regular expression denoting $L(r) \cup L(s)$
(r)(s) is a regular expression denoting $L(r)L(s)$
(r)* is a regular expression denoting $(L(r))^*$
(r) is a regular expression denoting $L(r)$
The language denoted by a regular expression is called a regular set

More Formally
$a \in \Sigma$; $E$ and $F$ are regular expressions
$L(\epsilon) = \{\epsilon\}$
$L(a) = \{a\}$
$L(EF) = \{ ab \mid a \in L(E) \text{ and } b \in L(F) \}$
$L(E|F) = L(E) \cup L(F)$
$L(E^*) = (L(E))^*$
$L((E)) = L(E)$

Precedence Rules
Precedence rules help simplify regular expressions
  – Kleene closure has highest precedence
  – Concatenation has next highest
  – | has lowest precedence
All operators associate left-to-right
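
These precedence rules can be checked with Python's `re` module, which applies the same ordering to these three operators. Note that `re` supports a much richer syntax than the regular expressions defined in these slides, and the sample strings below are arbitrary choices for illustration.

```python
import re

# Closure binds tightest, then concatenation, then alternation,
# so "a|bc*" is read as a | (b(c*)).
implicit = re.compile(r"a|bc*")
explicit = re.compile(r"a|(b(c*))")
regrouped = re.compile(r"(a|b)c*")   # parentheses force a different reading

samples = ["a", "b", "bccc", "ac", "bcbc"]
same_reading = all(
    bool(implicit.fullmatch(s)) == bool(explicit.fullmatch(s)) for s in samples
)
```

The string `ac` separates the two readings: it is rejected by `a|bc*` but accepted by `(a|b)c*`.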

Example
Let $\Sigma = \{a, b\}$
Find the strings in the language represented by the following regular expressions:
a b a b a b a a b a a b a a b a

Algebra of Regular Expressions
Property                                              Definition
| is commutative                                      r|s = s|r
| is associative                                      r|(s|t) = (r|s)|t
Concatenation is associative                          (rs)t = r(st)
Concatenation distributes over |                      r(s|t) = rs|rt
                                                      (s|t)r = sr|tr
$\epsilon$ is the identity element for concatenation  $\epsilon r = r\epsilon = r$
Relation between * and $\epsilon$                     $r^* = (r|\epsilon)^*$
* is idempotent                                       $r^{**} = r^*$

Mathematically Describing Relational Operators
$\Sigma = \{<, >, =, !\}$
relop = < | > | <= | >= | == | !=

Identifiers and Numbers
$\Sigma$ = { a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, _ }
letter = a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
identifier = letter ( letter | digit )*
number = digit digit*
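
The definitions of identifier and number translate almost directly into Python's `re` syntax. In this sketch the character classes `[A-Za-z]` and `[0-9]` stand in for the long alternations of letters and digits, and the helper names are illustrative. As in the slide, the underscore belongs to the alphabet but not to letter, so it is not accepted inside an identifier here.

```python
import re

LETTER = "[A-Za-z]"   # shorthand for a | b | ... | z | A | ... | Z
DIGIT = "[0-9]"       # shorthand for 0 | 1 | ... | 9

IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")  # letter (letter | digit)*
NUMBER = re.compile(f"{DIGIT}{DIGIT}*")                  # digit digit*


def is_identifier(s: str) -> bool:
    # fullmatch requires the whole string to belong to the language
    return IDENTIFIER.fullmatch(s) is not None


def is_number(s: str) -> bool:
    return NUMBER.fullmatch(s) is not None
```

Using `fullmatch` rather than `match` matters: membership in a language is a statement about the entire string, not about some prefix of it.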

Finite Automata
A non-deterministic finite automaton (NFA) is a 5-tuple $(S, \Sigma, \varphi, s_0, F)$:
$S$: a set of states
$\Sigma$: a set of input symbols
$\varphi$: a transition function $\varphi : S \times (\Sigma \cup \{\epsilon\}) \to 2^S$, mapping a state and an input symbol (or $\epsilon$) to a set of next states
$s_0$: a distinguished state called the start state
$F$: a set of accepting or final states

NFA Representation
An NFA can be conveniently represented by both a directed graph and a table

Current   Next State
State     a       b    c       Output
0         0, 2    –    3       0
1         –       2    0       1
2         2       –    1, 2    0
3         1       0    0       1

[Figure: the same NFA drawn as a directed graph on states 0 to 3, with edges labeled a, b, and c]
Final states are double circled (graph) or output a 1 (table)

NFA Transition Graphs
[Figure: two transition graphs. The first has states 0 and 1 with edges labeled l and l, d. The second has states 0 to 3 with edges labeled a and b.]

Another NFA
[Figure: transition graph with states 0, 2, 3, 4, and 5. Two $\epsilon$-edges leave state 0; the remaining edges are labeled a and b.]

NFAs and Regular Sets
An NFA can be built to recognize strings represented by a regular expression (i.e., strings that are members of some regular set)
[Figure: the transition graph from the previous slide, with states 0, 2, 3, 4, and 5, two $\epsilon$-edges leaving state 0, and the remaining edges labeled a and b]

NFAs as Recognizers
Given an NFA $M$, $L(M)$ is the language recognized by that machine
If the NFA scans the complete string and ends in a final state, then the string is a member of $L(M)$
  – We say $M$ accepts the string
If the NFA scans the complete string and ends in a non-final state, then the string is not a member of $L(M)$
  – We say $M$ rejects the string
Because of non-determinism, a string is accepted if there is a path to a final state; a string is rejected if there is no path to a final state
Think about the NFA following all non-deterministic paths in parallel
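
The idea of following all non-deterministic paths in parallel can be sketched by tracking the set of states the NFA could be in after each symbol. The Python sketch below uses the transition table from the NFA Representation slide as read from the extracted text, with states 1 and 3 final because their output is 1. Two things are assumptions: state 0 is taken as the start state, and $\epsilon$-transitions are not handled because that table has none.

```python
# Transition table: (state, symbol) -> set of possible next states.
# Missing entries correspond to the "–" cells of the table.
DELTA = {
    (0, "a"): {0, 2}, (0, "c"): {3},
    (1, "b"): {2},    (1, "c"): {0},
    (2, "a"): {2},    (2, "c"): {1, 2},
    (3, "a"): {1},    (3, "b"): {0},    (3, "c"): {0},
}
START = 0        # assumption: the slide does not mark the start state in the text
FINAL = {1, 3}   # states whose Output column is 1


def accepts(string: str) -> bool:
    """Accept if at least one path through the NFA ends in a final state."""
    current = {START}
    for symbol in string:
        # Advance every live path by one symbol; dead paths simply vanish
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    return bool(current & FINAL)
```

The set `current` becomes empty when every path has died, in which case the string is rejected no matter what follows.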

Deterministic Finite Automata (DFA)
A special case of an NFA
Also called a finite state machine
No state has an $\epsilon$-transition
For each $s \in S$ and $a \in \Sigma$, there is at most one edge labeled $a$ leaving $s$

Current   Next State
State     l    d    Output
0         1    –    0
1         1    1    1

[Figure: transition graph with an edge labeled l from state 0 to state 1 and a loop labeled l, d on state 1]

DFA Simulation
DFA()
    $s \leftarrow s_0$;
    $c \leftarrow$ nextchar();
    while $c \neq$ eof
        $s \leftarrow$ move($s$, $c$);
        $c \leftarrow$ nextchar();
    if $s \in F$ return true;
    return false;
move is the transition function $\varphi : S \times \Sigma \to S$