Regular Expressions
  Another means to describe languages accepted by Finite Automata.
  In some books, regular languages, by definition, are described using regular expressions.

Specifying Languages
  Recall: how do we specify languages?
  If the language is finite, you can list all of its strings:
    L = {a, aa, aba, aca}
  Descriptive:
    L = {x | n_a(x) = n_b(x)}
  Using basic language operations:
    L = {aa, ab}* ∪ {b}{bb}*
  Regular languages are described using this last method.

Regular Languages
  A regular expression describes a language using only the set operations of:
    Union
    Concatenation
    Kleene Star

Kleene Star Operation
  The set of strings that can be obtained by concatenating any number of elements of a language L is called the Kleene Star, L*:
    $L^* = \bigcup_{i=0}^{\infty} L^i = L^0 \cup L^1 \cup L^2 \cup L^3 \cup L^4 \cup \dots$
  Note that since L* contains L^0, λ is an element of L*.

Regular Expressions
  Regular expressions are the mechanism by which regular languages are described.
  Take the "set operation" definition of the language and:
    Replace ∪ with +
    Replace {} with ()
  And you have a regular expression.
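The Kleene Star definition above can be tried out on a small finite language. The following is a minimal sketch, not part of the original slides: the helper names `concat`, `power`, and `star_up_to` are made up for illustration, and because L* is infinite the code only builds the union of L^0 through L^k for a chosen k.

```python
def concat(A, B):
    """Concatenation of two finite languages: every x in A followed by every y in B."""
    return {x + y for x in A for y in B}


def power(L, i):
    """L^i, the i-fold concatenation of L with itself; L^0 is {λ}, written "" here."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def star_up_to(L, max_power):
    """Union of L^0 .. L^max_power, a finite approximation of L*."""
    result = set()
    for i in range(max_power + 1):
        result |= power(L, i)
    return result


L = {"aa", "ab"}
approx = star_up_to(L, 2)  # L^0 ∪ L^1 ∪ L^2
```

The empty string is always in the result because the union starts at L^0, which matches the note that λ is an element of L*.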
Regular Expression
  Recursive definition of regular languages / expressions over Σ:
  1. ∅ is a regular language and its regular expression is ∅
  2. {λ} is a regular language and λ is its regular expression
  3. For each a ∈ Σ, {a} is a regular language and its regular expression is a
  4. If L1 and L2 are regular languages with regular expressions r1 and r2 then
     -- L1 ∪ L2 is a regular language with regular expression (r1 + r2)
     -- L1L2 is a regular language with regular expression (r1r2) -- book uses (r1•r2)
     -- L1* is a regular language with regular expression (r1*)
     -- Regular expressions can be parenthesized to indicate operator precedence, i.e. (r1)
  Only languages obtainable by using rules 1-4 are regular languages.

Regular expressions
  Language                         Regular expression
  {λ}                              λ
  {011}                            011
  {0, 1}                           0 + 1
  {0, 01}                          0 + 01
  {110}*{0, 1}                     (110)*(0 + 1)
  {10, 11, 01}*                    (10 + 11 + 01)*
  {0, 11}*({11}* ∪ {101, λ})       (0 + 11)*((11)* + 101 + λ)

Regular Expressions
  Some shorthand
  If we apply precedence to the operators, we can relax the fully parenthesized definition:
    Kleene star has highest precedence
    Concatenation has middle precedence
    + has lowest precedence
  Thus
    a + b*c is the same as (a + ((b*)c))
    (a + b)* is not the same as a + b*

Regular Expressions
  More shorthand
  Equating regular expressions:
  Two regular expressions are considered equal if they describe the same language.
    1*1* = 1*
    (a + b)* ≠ a + b*

Regular Expressions
  Even more shorthand
  Sometimes you might see in the book:
    r^n, where n indicates the number of concatenations of r (e.g. r^6)
    r^+, to indicate one or more concatenations of r
  Note that this is only shorthand! r^6 and r^+ are not regular expressions.
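The two equality claims above (1*1* = 1* and (a + b)* ≠ a + b*) can be spot-checked by brute force. This is a sketch under assumptions, not a proof: it uses Python's `re` module, where union is written `|` instead of the slides' `+` (in Python, `+` means "one or more", like the r^+ shorthand), and it compares languages only on strings up to a chosen length. The helper name `language` is hypothetical.

```python
import re
from itertools import product


def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that `pattern` matches in full."""
    compiled = re.compile(pattern)
    words = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if compiled.fullmatch(w):
                words.add(w)
    return words


# 1*1* and 1* agree on every string over {0,1} up to length 6.
same = language("1*1*", "01", 6) == language("1*", "01", 6)

# (a + b)* and a + b* already differ on short strings, e.g. "ab".
differ = language("(a|b)*", "ab", 3) != language("a|b*", "ab", 3)
```

Agreement up to a length bound only suggests equality. A single string on which the two sets differ, however, is enough to show the expressions are not equal.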
Regular Expressions
  Important thing to remember
  A regular expression is not a language.
  A regular expression is used to describe a language.
  It is incorrect to say that for a language L,
    L = (a + b + c)*
  But it's okay to say that L is described by
    (a + b + c)*
  Questions?

Examples of Regular Languages
  All finite languages can be described by regular expressions.
  Can anyone tell me why?
  A finite language L can be expressed as the union of languages, each containing one string of L.
  Example:
    L = {a, aa, aba, aca}
    L = {a} ∪ {aa} ∪ {aba} ∪ {aca}
    Regular expression: (a + aa + aba + aca)

Examples of Regular Languages
  L = {x ∈ {0,1}* | |x| is even}
  Any string of even length can be obtained by concatenating strings of length 2.
  Any concatenation of strings of length 2 has even length.
    L = {00, 01, 10, 11}*
  Regular expressions describing L:
    (00 + 01 + 10 + 11)*
    ((0 + 1)(0 + 1))*

Examples of Regular Languages
  L = {x ∈ {0,1}* | x does not end in 01}
  If x does not end in 01, then either
    |x| < 2, or
    x ends in 00, 10, or 11
  A regular expression that describes L is:
    λ + 0 + 1 + (0 + 1)*(00 + 10 + 11)
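The two example expressions above can be compared against their set-builder definitions on all short binary strings. This is a rough sketch assuming Python's `re` syntax, where `|` plays the role of `+` and an empty alternative stands for the empty string. The names `EVEN`, `NOT_END_01`, and `all_strings` are illustrative only.

```python
import re
from itertools import product

# ((0 + 1)(0 + 1))* from the even-length example.
EVEN = re.compile(r"((0|1)(0|1))*")

# Empty string + 0 + 1 + (0 + 1)*(00 + 10 + 11) from the "does not end in 01" example.
NOT_END_01 = re.compile(r"(|0|1|(0|1)*(00|10|11))")


def all_strings(max_len):
    """Yield every string over {0,1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


# Each check compares "the regex matches x" with the defining property of L.
even_ok = all(
    (EVEN.fullmatch(x) is not None) == (len(x) % 2 == 0)
    for x in all_strings(8)
)
end_ok = all(
    (NOT_END_01.fullmatch(x) is not None) == (not x.endswith("01"))
    for x in all_strings(8)
)
```

A bounded check like this does not prove that the expression describes L, but a mismatch would pinpoint a string the expression gets wrong.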
Examples of Regular Languages
  L = {x ∈ {0,1}* | x contains an odd number of 0s}
  Express x = yz
  y is a string of the form y = 1^i 0 1^j
  In z, there must be an even number of additional 0s, so z is of the form (0 1^k 0 1^m)*
  x can be described by (1*01*)(01*01*)*
  Questions?

Useful properties of regular expressions
  Commutative
    L + M = M + L
  Associative
    (L + M) + N = L + (M + N)
    (LM)N = L(MN)
  Identities
    ∅ + L = L + ∅ = L
    λL = Lλ = L
    ∅L = L∅ = ∅

Useful properties of regular expressions
  Distributive
    L(M + N) = LM + LN
    (M + N)L = ML + NL
  Idempotent
    L + L = L

Useful properties of regular expressions
  Closures
    (L*)* = L*
    ∅* = λ
    λ* = λ
    L^+ = LL*
    L* = L^+ + λ
  Questions?

Practical uses for regular expressions
  grep
  Global (search for) Regular Expressions and Print
  Finds patterns of characters in a text file.
    grep man foo.txt
    grep -E '[ab]*c[de]?' foo.txt

Practical uses for regular expressions
  How a compiler works
    Source file -> lexer -> Stream of tokens -> parser -> Parse Tree -> codegen -> Object code
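The expression (1*01*)(01*01*)* derived above for strings with an odd number of 0s can be tested against the definition of L. This sketch assumes Python's `re` module and checks only strings up to a chosen length. `ODD_ZEROS` and `matches_definition` are hypothetical names.

```python
import re
from itertools import product

# y = 1^i 0 1^j supplies the first 0; each (01*01*) block adds two more.
ODD_ZEROS = re.compile(r"(1*01*)(01*01*)*")


def matches_definition(max_len):
    """True if the regex and the 'odd number of 0s' test agree up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            in_language = x.count("0") % 2 == 1
            if (ODD_ZEROS.fullmatch(x) is not None) != in_language:
                return False
    return True


agrees = matches_definition(10)
```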
Practical uses for regular expressions
  How a compiler works
  The Lexical Analyzer (lexer) reads source code and generates a stream of tokens.
  What is a token?
    Identifier
    Keyword
    Number
    Operator
    Punctuation

Practical uses for regular expressions
  How a compiler works
  Tokens can be described using regular expressions!

Examples of Regular Languages
  L = set of valid C identifiers
  A valid C identifier begins with a letter or _
  A valid C identifier contains only letters, digits, and _
  If we let:
    l = {a, b, …, z, A, B, …, Z}
    d = {1, 2, …, 9, 0}
  Then a regular expression for L:
    (l + _)(l + d + _)*

Examples of Regular Languages
  L = set of valid C keywords
  This is a finite set.
  L can be described by
    if + else + while + do + goto + break + switch + …

Practical uses for regular expressions
  lex
  Program that will create a lexical analyzer.
  Input: set of valid tokens
  Tokens are given by regular expressions.
  Questions?

Summary
  Regular languages can be expressed using only the set operations of union, concatenation, and Kleene Star.
  Regular languages
    Means of describing: Regular Expression
    Machine for accepting: Finite Automata
  Practical uses
    Text search (grep)
    Compilers / Lexical Analysis (lex)
  Questions?
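The token classes and the identifier expression (l + _)(l + d + _)* above can be combined into a toy lexer. This is a minimal sketch in Python, not how lex itself is implemented: the operator and punctuation sets are arbitrary small samples, the keyword set is only a partial sample, and the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are invented for illustration.

```python
import re

# Partial sample of keywords (an assumption, not the full C keyword set).
KEYWORDS = {"if", "else", "while", "do", "goto", "break", "switch"}

# One regular expression per token class; [A-Za-z_][A-Za-z0-9_]* is
# (l + _)(l + d + _)* written in Python's character-class notation.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR", r"[+\-*/=<>]"),
    ("PUNCTUATION", r"[(){};,]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn source text into a list of (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = m.lastgroup, m.group()
        # Keywords look like identifiers, so reclassify them after matching.
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = m.end()
    return tokens


tokens = tokenize("while (x1 < 10) x1 = x1 + 1;")
```

Matching keywords as identifiers first and reclassifying them afterwards keeps the regular expressions simple. A real lexer generator may resolve this differently, for example by rule order.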
For next time
  Chicken or the egg?
  Which came first, the regular expression or the finite automaton?
    McCulloch/Pitts -- used finite automata to model neural networks (1943)
    Kleene (mid 1950s) -- Applied to regular sets
    Ken Thompson / Bell Labs folk (1970s) -- QED / ed / grep / lex / awk / …
  Recall: Princeton dudes (1937)

The bottom line
  Regular expressions and finite automata are equivalent in their ability to describe languages.
    Every regular expression has an FA that accepts the language it describes.
    The language accepted by an FA can be described by some regular expression.
  The Kleene Theorem! (1956)
  But that's next time….