
Regular Expressions = Regular Languages, Mark Greenstreet, CpSc 421



  1. Regular Expressions = Regular Languages Mark Greenstreet, CpSc 421, Term 1, 2008/09 17 September 2008 – p.1/18

  2. Lecture Outline
     - Regular Expressions
     - Equivalence of Regular Expressions and Finite Automata

  3. Regular Madlibs
     Once upon a ___ (noun), there was a ___ (noun) that ___ (past tense verb) ___ (zero or more adjectives) ___ (plural noun).
     - Let avocado denote the language {avocado}.
     - Let noun = avocado ∪ beach ∪ carrot ∪ caterpillar ∪ pencil ∪ penguin ∪ zombie.
     - Let pluralNoun = noun s.
     - Let verb = add ∪ compile ∪ eat ∪ sing ∪ swim ∪ walk.
     - Let pastVerb = verb ed.
     - Let adjective = beautiful ∪ big ∪ cold ∪ considerable ∪ furry ∪ insipid ∪ yellow.
     - Now, our Madlib™ is
       Once upon a noun, there was a noun, that pastVerb (adjective)∗ pluralNoun.
     - Filling in the blanks one at a time (pencil, carrot, walked, beautiful, considerable, penguins) gives one string described by the Madlib:
       Once upon a pencil, there was a carrot, that walked beautiful, considerable penguins.
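The Madlib can be tried out with Python's `re` module. This is only a sketch under a few assumptions that the slide does not state: words are separated by exactly one space, each adjective is followed by a space rather than a comma, and the noun list contains `penguin` so that pluralNoun contains `penguins`. The helper name `union` is made up for this illustration.

```python
import re

def union(*words):
    """Regex for the union of the singleton languages {w}, one per word."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"

noun = union("avocado", "beach", "carrot", "caterpillar",
             "pencil", "penguin", "zombie")
plural_noun = noun + "s"      # concatenation with the language {s}
verb = union("add", "compile", "eat", "sing", "swim", "walk")
past_verb = verb + "ed"       # concatenation with the language {ed}
adjective = union("beautiful", "big", "cold", "considerable",
                  "furry", "insipid", "yellow")

# Assumption: single spaces between words; "[.]" is the final period.
madlib = re.compile(
    f"Once upon a {noun}, there was a {noun}, "
    f"that {past_verb} (?:{adjective} )*{plural_noun}[.]"
)

def is_madlib(sentence):
    """True if the whole sentence is in the Madlib language."""
    return madlib.fullmatch(sentence) is not None
```

Because concatenation is purely literal, `is_madlib` accepts "compileed" and "beachs" but rejects the correctly spelled "compiled" and "beaches".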

  10. Regular Expressions
     - A regular expression R is one of the forms below; L(R) is the language that it denotes.

       R         | L(R)            | where
       ∅         | ∅               |
       ε         | {ε}             |
       c         | {c}             | c ∈ Σ
       R₁ ∪ R₂   | L(R₁) ∪ L(R₂)   | R₁ and R₂ are regular expressions
       R₁ · R₂   | L(R₁) · L(R₂)   | R₁ and R₂ are regular expressions
       R₁∗       | L(R₁)∗          | R₁ is a regular expression

     - Language union, concatenation, and asteration were defined in the Sept. 10 notes and Sipser p. 44.
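The inductive definition translates almost line for line into a small evaluator. The sketch below is illustrative only: the nested-tuple encoding of regular expressions and the names `lang`, `concat`, and `star` are assumptions of this example, and since L(R) can be infinite the evaluator keeps only strings of length at most n.

```python
def concat(A, B, n):
    """Language concatenation A·B, keeping strings of length <= n."""
    return {x + y for x in A for y in B if len(x + y) <= n}

def star(A, n):
    """Kleene star A*, truncated to strings of length <= n."""
    result = {""}        # ε is in A* for every language A
    frontier = {""}
    while frontier:      # stop once no new strings appear
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

def lang(r, n):
    """L(r) restricted to strings of length <= n.

    r is a nested tuple (hypothetical encoding):
    ("empty",), ("eps",), ("sym", c), ("union", r1, r2),
    ("concat", r1, r2), or ("star", r1).
    """
    op = r[0]
    if op == "empty":
        return set()
    if op == "eps":
        return {""}
    if op == "sym":
        return {r[1]} if n >= 1 else set()
    if op == "union":
        return lang(r[1], n) | lang(r[2], n)
    if op == "concat":
        return concat(lang(r[1], n), lang(r[2], n), n)
    if op == "star":
        return star(lang(r[1], n), n)
    raise ValueError(f"unknown operator: {op}")

# a*b* over Σ = {a, b}
a_star_b_star = ("concat", ("star", ("sym", "a")), ("star", ("sym", "b")))
```

For example, `lang(a_star_b_star, 2)` contains the six strings of length at most 2 in a∗b∗, and `lang(("star", ("empty",)), 5)` contains only the empty string.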

  11. Regular Expressions Examples
     Let Σ = {a, b}.
     - a∗b∗: the set of all strings with zero or more a's followed by zero or more b's. For example, the strings ε, a, aaab, bb, and aabbb are in this language. The strings aba and ba are not.
     - (aaa)∗(bb)∗b: the set of all strings consisting of a number of a's that is divisible by three followed by an odd number of b's. For example, the strings b, aaabbb, and aaaaaaaaaaaabbbbb are in this language, but the strings ε, baaa, and aabbb are not.
     - aΣ∗b: the set of all strings that begin with an a and end with a b. For example, the strings ab, ababab, and abbbaabaaabab are in this language, but the strings a, aba, and babbab are not.
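The three examples can be checked mechanically with Python's `re` module. This is a sketch that assumes Σ = {a, b} is written as the character class `[ab]`; the function name `in_language` and the `examples` table are made up for the illustration, and the long string is built as twelve a's followed by five b's.

```python
import re

def in_language(pattern, s):
    """True if the whole string s matches the regular expression."""
    return re.fullmatch(pattern, s) is not None

# pattern -> (strings in the language, strings not in the language)
examples = {
    r"a*b*":         (["", "a", "aaab", "bb", "aabbb"],
                      ["aba", "ba"]),
    r"(aaa)*(bb)*b": (["b", "aaabbb", "a" * 12 + "b" * 5],
                      ["", "baaa", "aabbb"]),
    r"a[ab]*b":      (["ab", "ababab", "abbbaabaaabab"],
                      ["a", "aba", "babbab"]),
}

for pattern, (members, non_members) in examples.items():
    ok = (all(in_language(pattern, s) for s in members)
          and not any(in_language(pattern, s) for s in non_members))
    print(pattern, "agrees with the slide" if ok else "disagrees")
```

`re.fullmatch` is used rather than `re.search` because membership in L(R) concerns the whole string, not a substring.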

  12. A Few More Remarks
     - We'll write Σ as a regular expression that denotes the language of all strings in Σ¹, that is, all strings of length one.
     - From the definition of L∗, we note that ε ∈ L∗ for any language L. In particular, note that ∅∗ = {ε}.
     - Regular expressions and programming languages. The following regular expressions describe various lexical pieces of Java:
       - The keyword class: class.
       - Identifiers: ([A-Z] ∪ [a-z] ∪ _ ∪ $)([A-Z] ∪ [a-z] ∪ _ ∪ $ ∪ [0-9])∗, where [A-Z] denotes all characters from A to Z, and likewise for [a-z] and [0-9].
       - Floating point numbers:
         (([0-9]⁺ . [0-9]∗) ∪ ([0-9]∗ . [0-9]⁺))(ε ∪ (e(+ ∪ - ∪ ε)[0-9]⁺)) ∪ [0-9]⁺ e(+ ∪ - ∪ ε)[0-9]⁺,
         where [0-9]⁺ = [0-9][0-9]∗.
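As a rough illustration, the identifier and floating point expressions can be transcribed into Python's `re` syntax. The transcription assumes that the first alternative of the identifier expression includes the underscore and that the two floating point forms are joined by a union. It follows the slide's simplified expressions only, not the full Java lexical grammar (no Unicode letters, uppercase `E`, type suffixes, or hexadecimal forms). The names `IDENT` and `FLOAT` are hypothetical.

```python
import re

# ([A-Z] ∪ [a-z] ∪ _ ∪ $)([A-Z] ∪ [a-z] ∪ _ ∪ $ ∪ [0-9])*
IDENT = re.compile(r"[A-Za-z_$][A-Za-z_$0-9]*")

# Digits with a decimal point and an optional exponent,
# or digits with a mandatory exponent.
FLOAT = re.compile(
    r"(?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+)(?:e[+-]?[0-9]+)?"
    r"|[0-9]+e[+-]?[0-9]+"
)

def is_identifier(s):
    """True if s matches the slide's identifier expression."""
    return IDENT.fullmatch(s) is not None

def is_float(s):
    """True if s matches the slide's floating point expression."""
    return FLOAT.fullmatch(s) is not None
```

Under this transcription `3.14`, `5.`, `.5`, `1.5e-3`, and `2e10` are floating point numbers, while `42`, a lone `.`, and `1e` are not.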

  13. RE = DFA = NFA
     [Figure: conversions among DFAs, NFAs, and regular expressions]
     - DFAs to NFAs: every DFA is an NFA.
     - NFAs to DFAs: power set construction.
     - Regular expressions to NFAs: show a construction for each case in the definition of regular expression.
     - DFAs to regular expressions: treat edge labels as regular expressions; eliminate states to get a regular expression.
     - We will show that every language described by a regular expression is recognized by an NFA.
     - We will then show that every language recognized by a DFA has a corresponding regular expression.

  14. From REs to NFAs: strategy
     - Regular expressions are defined inductively (see the definition on the Regular Expressions slide).
     - Our proof is by induction on the structure of the regular expression.
     - One case for each way to form a regular expression:
       - The empty language: ∅
       - The empty string: ε
       - A single symbol: c
       - Union of two REs: R₁ ∪ R₂
       - Concatenation of two REs: R₁ · R₂
       - Kleene star: R∗

  15. From REs to NFAs
     - R = ∅: [figure: NFA for ∅]
     - R = ε: [figure: NFA for ε]
     - R = c: [figure: NFA with an edge labeled c]
     - R = R₁ ∪ R₂: [figure: N₁, which recognizes R₁, and N₂, which recognizes R₂, combined using ε-edges]

  16. From REs to NFAs (cont.)
     - R = R₁ · R₂: [figure: N₁, which recognizes R₁, connected to N₂, which recognizes R₂, by ε-edges]
     - R = R₁∗: [figure: N₁, which recognizes R₁, with added ε-edges]

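  17. An Example: R = (b ∪ c ∪ ab)∗
     - a, b, c: [figures: one NFA per symbol, each with a single labeled edge]
     - ab: [figure: the NFAs for a and b joined by an ε-edge]
     - b ∪ c: [figure: the NFAs for b and c combined using ε-edges]
     - b ∪ c ∪ ab: [figure: the NFAs for b, c, and ab combined using ε-edges]
     - (b ∪ c ∪ ab)∗: [figure: the NFA for b ∪ c ∪ ab with added ε-edges]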
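The case-by-case construction can be sketched in code. The version below is a Thompson-style variant in which every fragment has exactly one start state and one accept state; that choice, the nested-tuple encoding of regular expressions, and the names `build`, `eps_closure`, and `accepts` are assumptions of this example and may differ in detail from the NFAs drawn on the slides.

```python
from itertools import count

_ids = count()  # source of fresh state numbers

def build(r):
    """Return (start, accept, edges) for the regex AST r.

    edges is a list of (src, label, dst); label None marks an ε-edge.
    """
    s, t = next(_ids), next(_ids)
    op = r[0]
    if op == "empty":
        return s, t, []                  # no path from s to t
    if op == "eps":
        return s, t, [(s, None, t)]
    if op == "sym":
        return s, t, [(s, r[1], t)]
    s1, t1, e1 = build(r[1])
    if op == "star":                     # skip, enter, repeat, or leave
        return s, t, e1 + [(s, None, t), (s, None, s1),
                           (t1, None, s1), (t1, None, t)]
    s2, t2, e2 = build(r[2])
    if op == "union":                    # branch into either sub-NFA
        return s, t, e1 + e2 + [(s, None, s1), (s, None, s2),
                                (t1, None, t), (t2, None, t)]
    if op == "concat":                   # run N1, then N2
        return s, t, e1 + e2 + [(s, None, s1), (t1, None, s2),
                                (t2, None, t)]
    raise ValueError(f"unknown operator: {op}")

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w):
    """Simulate the NFA built from r on the string w."""
    start, accept, edges = build(r)
    current = eps_closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return accept in current

# R = (b ∪ c ∪ ab)*
R = ("star", ("union",
              ("union", ("sym", "b"), ("sym", "c")),
              ("concat", ("sym", "a"), ("sym", "b"))))
```

With this encoding, `accepts(R, "bcab")` and `accepts(R, "")` hold, while `accepts(R, "ba")` does not, because every a must be followed immediately by a b.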

  18. From DFAs to REs
     - Given a DFA, we want to construct a regular expression for the DFA's language.
     - The "hard" part is keeping track of all of the possible paths from the start state to an accepting state, especially because there can be many possible loops.
     - The key observation is that the symbols that label edges in a DFA are simple regular expressions.
     - We'll generalize this idea and allow arbitrary regular expressions on edges.
     - We'll use the flexibility of regular expressions to allow us to eliminate one state from the DFA at a time. We'll modify the REs for the remaining edges to account for the deleted states. Thus, our new DFA will recognize the same language as the original one.
     - By successively deleting states, we'll eventually get to a DFA with a start state, an accept state, and a single edge from the start state to the accept state. The label for this edge is the RE corresponding to the original DFA.
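The state elimination idea can be sketched as follows. Two choices here are assumptions rather than something the slide states: a fresh start state and a fresh accept state are added up front (as in Sipser's generalized NFA construction), and edge labels are stored as Python `re` strings. The function names and the example DFA, which accepts strings over {a, b} with an even number of a's, are made up for the illustration.

```python
import re
from itertools import product

def dfa_to_regex(states, delta, start, accepting):
    """State elimination; delta maps (state, symbol) -> state.

    Returns a Python regex string, or None if the language is empty.
    """
    S, A = "new_start", "new_accept"   # hypothetical added states
    label = {}                         # (p, q) -> regex on the edge p -> q

    def add(p, q, r):
        # Parallel edges are merged with a union.
        label[p, q] = f"{label[p, q]}|{r}" if (p, q) in label else r

    for (p, c), q in delta.items():
        add(p, q, re.escape(c))
    add(S, start, "")                  # ε-edge into the old start state
    for f in accepting:
        add(f, A, "")                  # ε-edges out of the old accept states

    for k in states:                   # eliminate the original states
        loop = f"(?:{label[k, k]})*" if (k, k) in label else ""
        ins = [(p, r) for (p, q), r in label.items() if q == k and p != k]
        outs = [(q, r) for (p, q), r in label.items() if p == k and q != k]
        for (p, r_in), (q, r_out) in product(ins, outs):
            # A path p -> k -> q becomes a direct edge p -> q.
            add(p, q, f"(?:{r_in}){loop}(?:{r_out})")
        label = {e: r for e, r in label.items() if k not in e}
    return label.get((S, A))

def run_dfa(delta, start, accepting, w):
    """Run the DFA on w and report whether it accepts."""
    q = start
    for c in w:
        q = delta[q, c]
    return q in accepting

# Example DFA (an assumption of this sketch): even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
regex = dfa_to_regex(["even", "odd"], delta, "even", {"even"})
```

The resulting expression is not minimal, but it can be compared with the DFA by brute force on all short strings, which is a cheap sanity check on the construction.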
