
Regular Expressions: Dr. Mattox Beckman, University of Illinois at Urbana-Champaign



  1. Regular Expressions
     Dr. Mattox Beckman, University of Illinois at Urbana-Champaign, Department of Computer Science
     Outline: Objectives; Regular Expressions; Syntax of Regular Expressions; Conversion to Right Linear Grammar

  2. Objectives
     You should be able to...
     ◮ Explain the syntax of regular expressions.
     ◮ Explain the limitations of regular expressions.
     ◮ Know how to convert a regular expression into an NFA.

  3. Motivation
     ◮ Regular Languages were developed by Noam Chomsky in his quest to describe human languages.
     ◮ Computer Scientists like them because they are able to describe “words” or “tokens” very easily.
     Examples:
     ◮ Integers: a bunch of digits
     ◮ Reals: an integer, a dot, and an integer
     ◮ Past Tense English Verbs: a bunch of letters ending with “ed”
     ◮ Proper Nouns: a bunch of letters, the first of which must be capitalized

  4. A bunch of digits?!
     ◮ We need something a bit more formal if we want to communicate properly.
     ◮ We will use a pattern (or a regular expression) to represent the kinds of words we want to describe.
     ◮ As it will turn out, these expressions will correspond to NFAs.
     ◮ Kinds of patterns we will use:
       ◮ Single letters
       ◮ Repetition
       ◮ Grouping
       ◮ Choices

  5. Single Letters
     ◮ To match a single character, just write the character.
     ◮ To match the letter “a”...
       ◮ Regular expression: a
       ◮ State machine: start → q0 --a--> q1
     ◮ To match the character “8”...
       ◮ Regular expression: 8
       ◮ State machine: start → q0 --8--> q1

  6. Juxtaposition
     ◮ To match longer things, just put two regular expressions together.
     ◮ To match the character “a” followed by the character “8”...
       ◮ Regular expression: a8
       ◮ State machine: start → q0 --a--> q1 --8--> q2
     ◮ To match the string “hello”...
       ◮ Regular expression: hello
       ◮ State machine: start → q0 --h--> q1 --e--> q2 --l--> q3 --l--> q4 --o--> q5

  7. Repetition
     ◮ Zero or more copies of A, add *
       ◮ Regular expression: A*
       ◮ State machine (diagram lost in extraction): the machine for A wrapped in ε transitions that allow it to be skipped or repeated.
     ◮ One or more copies of A, add +
       ◮ Regular expression: A+
       ◮ State machine (diagram lost in extraction): the machine for A wrapped in ε transitions that allow it to be repeated but not skipped.
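The repetition machines can be simulated directly. The sketch below is an illustration only: the state numbers and the exact placement of the ε transitions are assumptions (the original diagrams did not survive), chosen so that `STAR_A` accepts `a*` and `PLUS_A` accepts `a+`.

```python
EPS = None  # label used for ε transitions (never equal to an input character)

# Assumed ε-NFAs, written as (state, label) -> set of next states.
STAR_A = {
    (0, EPS): {1, 3},   # enter the machine for "a", or skip it entirely
    (1, "a"): {2},
    (2, EPS): {1, 3},   # go around again, or leave
}
PLUS_A = {
    (0, EPS): {1},      # no skip edge: at least one "a" is required
    (1, "a"): {2},
    (2, EPS): {1, 3},
}

def eps_closure(nfa, states):
    """All states reachable from `states` using only ε transitions."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in nfa.get((state, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(nfa, start, finals, text):
    """Track the set of states the NFA could be in after each character."""
    current = eps_closure(nfa, {start})
    for ch in text:
        moved = set()
        for state in current:
            moved |= nfa.get((state, ch), set())
        current = eps_closure(nfa, moved)
    return bool(current & finals)

print(accepts(STAR_A, 0, {3}, ""))     # True: zero copies are allowed
print(accepts(PLUS_A, 0, {3}, ""))     # False: one copy is required
print(accepts(PLUS_A, 0, {3}, "aaa"))  # True
```

The only difference between the two tables is the skip edge from state 0 to state 3.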

  8. Grouping
     ◮ To group things together, use parentheses.
     ◮ To match one or more copies of the word “hi”...
       ◮ Regular expression: (hi)+
       ◮ State machine (diagram lost in extraction): states q0 through q4, with transitions on h and i and ε transitions that allow another copy of “hi”.

  9. Choice
     ◮ To make a choice, use the vertical bar (also called “pipe”).
     ◮ To match A or B...
       ◮ Regular expression: A|B
       ◮ State machine:
         start → q0 --ε--> a0 --A--> a1 --ε--> q1
                 q0 --ε--> b0 --B--> b1 --ε--> q1
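The four kinds of patterns introduced so far can be tried out with Python's standard `re` module. This is only a sketch: `matches` and `CHECKS` are names made up for the illustration, and `re.fullmatch` is used so that the whole string has to match, as it would for an NFA.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# (pattern, input, expected result)
CHECKS = [
    ("a", "a", True),            # single letter
    ("8", "9", False),
    ("a8", "a8", True),          # juxtaposition
    ("hello", "hell", False),
    ("a*", "", True),            # zero or more copies
    ("a+", "", False),           # one or more copies
    ("(hi)+", "hihihi", True),   # grouping
    ("(hi)+", "hih", False),
    ("a|b", "b", True),          # choice
    ("a|b", "ab", False),
]

for pattern, text, expected in CHECKS:
    assert matches(pattern, text) == expected
```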

  10. Examples
     ◮ ab*a: (some) matches aa, aba, abbba; (some) rejects ba, aaba, abaa
     ◮ (0|1)*: matches any binary number, ε
     ◮ (0|1)+: matches any binary number; rejects the empty string
     ◮ (0|1)*0: matches even binary numbers
     ◮ (aa)*a: matches an odd number of a's
     ◮ (aa)*a(aa)*: matches an odd number of a's
     ◮ (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*: matches an even number of a's and b's
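The claims in these examples can be checked by brute force over short strings. The sketch below uses Python's `re` module; the helper names `full` and `words` and the length bounds are arbitrary choices for the illustration.

```python
import re
from itertools import product

def full(pattern, text):
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

def words(alphabet, max_len):
    """Every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# The listed matches and rejects for ab*a.
assert all(full("ab*a", w) for w in ["aa", "aba", "abbba"])
assert not any(full("ab*a", w) for w in ["ba", "aaba", "abaa"])

# (0|1)* accepts the empty string, (0|1)+ does not.
assert full("(0|1)*", "") and not full("(0|1)+", "")

# (0|1)*0 accepts exactly the even numbers written in binary.
assert all(full("(0|1)*0", format(n, "b")) == (n % 2 == 0) for n in range(64))

# Both expressions accept exactly the odd-length runs of a's.
for pattern in ["(aa)*a", "(aa)*a(aa)*"]:
    assert all(full(pattern, "a" * k) == (k % 2 == 1) for k in range(12))

# Even number of a's and even number of b's.
EVEN = "(aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*"
assert all(
    full(EVEN, w) == (w.count("a") % 2 == 0 and w.count("b") % 2 == 0)
    for w in words("ab", 6)
)
```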

  11. Some Notational Shortcuts
     ◮ A range of characters: [Xa-z] matches X and anything between a and z (inclusive).
     ◮ Any character at all: .
     ◮ Escape: \
     Expression and (some) matches:
     ◮ [0-9]+: integers
     ◮ X.*Y: anything at all between an X and a Y
     ◮ [0-9]*\.[0-9]*: floating point numbers (positive, without exponents)
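Python's `re` module supports the same shortcuts, so the three expressions can be tried directly. The constant names below are made up for the illustration. One Python-specific caveat: `.` does not match a newline unless `re.DOTALL` is passed.

```python
import re

INTEGER = r"[0-9]+"
BETWEEN_X_AND_Y = r"X.*Y"
FLOAT = r"[0-9]*\.[0-9]*"   # the backslash makes the dot a literal dot

def full(pattern, text):
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(full(INTEGER, "2024"))                 # True
print(full(INTEGER, "20x4"))                 # False
print(full(BETWEEN_X_AND_Y, "X anything Y")) # True
print(full(FLOAT, "3.14"))                   # True
print(full(FLOAT, "314"))                    # False: the dot is required
print(full(FLOAT, "."))                      # True: both digit runs may be empty
print(full("[Xa-z]", "X"), full("[Xa-z]", "A"))  # True False
```

As the last `FLOAT` line shows, the expression as written also accepts a lone dot; replacing one of the `*` with `+` would tighten it, at the cost of rejecting either `.5` or `3.`.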

  12. Things to know...
     ◮ They are greedy. X.*Y will match XabaaYaababY entirely, not just XabaaY.
     ◮ They cannot count very well.
       ◮ They can only count as high as you have states in the machine.
       ◮ This regular expression matches some primes: aa|aaa|aaaaa|aaaaaaa
       ◮ You cannot match an infinite number of primes.
       ◮ You cannot match “nested comments”: (\*.*\*)
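Both points are easy to observe with Python's `re` module. In this sketch the variable names are invented for the illustration, and the non-greedy `*?` quantifier is a Python extension that the slides do not cover; it is shown only for contrast.

```python
import re

text = "XabaaYaababY"

# Greedy: .* grabs as much as it can, so the match runs to the last Y.
greedy = re.search(r"X.*Y", text).group()
# Non-greedy (Python extension): stop at the first Y instead.
lazy = re.search(r"X.*?Y", text).group()
print(greedy)  # XabaaYaababY
print(lazy)    # XabaaY

# Counting by enumeration: one alternative per prime, so only finitely many.
SOME_PRIMES = r"aa|aaa|aaaaa|aaaaaaa"
accepted = [n for n in range(12) if re.fullmatch(SOME_PRIMES, "a" * n)]
print(accepted)  # [2, 3, 5, 7]
```

Extending `SOME_PRIMES` to cover 11 means writing out another alternative by hand, which is the sense in which the machine can only count as high as it has states.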

  13. Right Linear Grammars
     A Right Linear Grammar is one in which every production has the form A → x or A → xB or A → B, where A and B are arbitrary (possibly identical) nonterminal symbols, and x is an arbitrary terminal symbol.
     ◮ “At most one non-terminal symbol in the right hand side.”
     ◮ It turns out these are equivalent to NFAs!
     ◮ Have one nonterminal symbol for each state and one production for each transition, with the terminal symbol as the transition's label.
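The definition can be turned into a small checker. This is a sketch under an assumed representation that is not from the slides: a grammar is a dict from each nonterminal to a list of production bodies, each body a tuple of symbols, and any symbol that is not a key of the dict counts as a terminal. The `allow_epsilon` flag is also an addition, since the definition above does not list A → ε.

```python
def is_right_linear(rules, allow_epsilon=False):
    """Check that every production is A → x, A → x B, or A → B."""
    nonterminals = set(rules)

    def body_ok(body):
        if len(body) == 0:            # A → ε
            return allow_epsilon
        if len(body) == 1:            # A → x  or  A → B
            return True
        if len(body) == 2:            # A → x B: terminal first, nonterminal last
            return body[0] not in nonterminals and body[1] in nonterminals
        return False                  # anything longer is not right linear

    return all(body_ok(body) for bodies in rules.values() for body in bodies)

RIGHT = {"S": [("a", "T"), ("T",)], "T": [("b",), ("b", "S")]}
LEFT = {"S": [("S", "a"), ("a",)]}              # nonterminal on the left
TWO_NONTERMINALS = {"S": [("a", "S", "S"), ("a",)]}

print(is_right_linear(RIGHT))             # True
print(is_right_linear(LEFT))              # False
print(is_right_linear(TWO_NONTERMINALS))  # False
```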

  14. Example 1
     ◮ Regular expression: asdf
     ◮ State machine: start → q0 --a--> q1 --s--> q2 --d--> q3 --f--> q4
     ◮ Grammar:
       S0 → a S1
       S1 → s S2
       S2 → d S3
       S3 → f S4
       S4 → ε
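The state-to-nonterminal correspondence used in this example can be written as a short function. This is a sketch, not the course's own code: the function name `nfa_to_rlg`, the transition-triple format, and the use of `None` for an ε label are assumptions made for the illustration.

```python
def nfa_to_rlg(transitions, finals):
    """One nonterminal S<q> per state q, one production per transition.

    transitions: iterable of (source, label, target); label None means ε.
    finals: accepting states, each of which gets an ε production.
    """
    rules = {}
    for source, label, target in transitions:
        body = f"S{target}" if label is None else f"{label} S{target}"
        rules.setdefault(f"S{source}", []).append(body)
    for state in finals:
        rules.setdefault(f"S{state}", []).append("ε")
    return rules

# The machine for asdf: q0 -a-> q1 -s-> q2 -d-> q3 -f-> q4, with q4 accepting.
ASDF = nfa_to_rlg([(0, "a", 1), (1, "s", 2), (2, "d", 3), (3, "f", 4)], finals=[4])

for head, bodies in ASDF.items():
    print(head, "→", " | ".join(bodies))
# S0 → a S1
# S1 → s S2
# S2 → d S3
# S3 → f S4
# S4 → ε
```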

  15. Example 2
     ◮ Regular expression: a(s|d)+f
     ◮ State machine (diagram lost in extraction): states q0 through q4, with transitions labelled a, s, d, and f that mirror the productions of the grammar.
     ◮ Grammar:
       S0 → a S1
       S1 → s S2 | d S3
       S2 → s S2 | d S3 | f S4
       S3 → s S2 | d S3 | f S4
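One way to gain confidence in this grammar is to compare it with the regular expression on every short string. The sketch below makes two assumptions that are not in the slide: the grammar is stored as a dict of `(terminal, next_nonterminal)` pairs, and `S4 → ε` is added (as in Example 1) so that derivations can terminate.

```python
import re
from itertools import product

GRAMMAR = {
    "S0": [("a", "S1")],
    "S1": [("s", "S2"), ("d", "S3")],
    "S2": [("s", "S2"), ("d", "S3"), ("f", "S4")],
    "S3": [("s", "S2"), ("d", "S3"), ("f", "S4")],
    "S4": [("", None)],   # assumed: S4 → ε, carried over from Example 1
}

def derives(grammar, symbol, text):
    """True when `symbol` can derive exactly `text`.

    Every production here consumes a character or ends the derivation,
    so the recursion terminates; grammars with cycles of A → B rules
    would need extra care.
    """
    for terminal, nxt in grammar[symbol]:
        if not text.startswith(terminal):
            continue
        rest = text[len(terminal):]
        if nxt is None:
            if rest == "":
                return True
        elif derives(grammar, nxt, rest):
            return True
    return False

# Compare grammar and regular expression on all strings up to length 5.
for n in range(6):
    for letters in product("asdf", repeat=n):
        word = "".join(letters)
        expected = re.fullmatch(r"a(s|d)+f", word) is not None
        assert derives(GRAMMAR, "S0", word) == expected

print(derives(GRAMMAR, "S0", "asdsf"))  # True
print(derives(GRAMMAR, "S0", "af"))     # False: (s|d)+ needs at least one letter
```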

  16. Going from Regular Expression to Right Linear Grammar
     ◮ One way: Regular Expression → NFA → DFA → RLG
     ◮ Another way: direct conversion. We’ll use a “bottom up” strategy.
     Characters: To convert a single character a, we make a simple production
       S → a
     where S is the start symbol.
     Concatenation: To concatenate two regular expressions, add the second start symbol to the end of any “accepting” productions (those of the form A → x) from the first grammar.
       Regexp: a     S1 → a
       Regexp: b     S2 → b
       Regexp: ab    S1 → a S2
                     S2 → b
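The character and concatenation rules can be sketched in a few lines. The representation is an assumption made for the illustration: a grammar is a pair `(start, rules)`, where `rules` maps each nonterminal to a list of bodies and each body is a tuple of symbols. The function names `char`, `concat`, and `show` are likewise hypothetical, and the two grammars passed to `concat` are assumed to use different nonterminal names.

```python
def char(name, c):
    """Grammar for a single character: name → c."""
    return name, {name: [(c,)]}

def concat(first, second):
    """Append the second start symbol to every accepting production of the first."""
    (start1, rules1), (start2, rules2) = first, second

    def extend(body):
        # "Accepting" means the body does not end in a nonterminal (A → x or A → ε).
        accepting = not body or body[-1] not in rules1
        return body + (start2,) if accepting else body

    merged = {head: [extend(body) for body in bodies] for head, bodies in rules1.items()}
    merged.update(rules2)
    return start1, merged

def show(grammar):
    """Render the productions one nonterminal per line."""
    _, rules = grammar
    return [
        f"{head} → " + " | ".join(" ".join(body) or "ε" for body in bodies)
        for head, bodies in rules.items()
    ]

AB = concat(char("S1", "a"), char("S2", "b"))
print("\n".join(show(AB)))
# S1 → a S2
# S2 → b
```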

  17. Choice and Repetition
     Choice: To choose between two regular expressions, add a new start symbol that “picks” one of the choices.
       Regexp: a      S1 → a
       Regexp: b      S2 → b
       Regexp: a|b    S → S1 | S2
                      S1 → a
                      S2 → b
     Kleene Plus: If S is the start symbol, then for every rule of the form A → x (“accepting states”) add another rule of the form A → x S. You may have to remove ε productions first.
       Regexp: a|b       S → S1 | S2
                         S1 → a
                         S2 → b
       Regexp: (a|b)+    S → S1 | S2
                         S1 → a | a S
                         S2 → b | b S

  18. Choice and Repetition (continued)
     Kleene Star: If S is the start symbol, then for every rule of the form A → x (“accepting states”) add another rule of the form A → x S. Also add an ε rule.
       Regexp: a|b       S → S1 | S2
                         S1 → a
                         S2 → b
       Regexp: (a|b)*    S → S1 | S2 | ε
                         S1 → a | a S
                         S2 → b | b S
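The choice, Kleene plus, and Kleene star rules can be sketched the same way and checked against Python's `re` module. The representation is an assumption for the illustration: a grammar is `(start, rules)`, with `rules` mapping nonterminals to lists of bodies and each body a tuple of symbols (`()` stands for ε). The function names are hypothetical. Following the caveat on the slide, `plus` refuses grammars that still contain ε productions rather than removing them.

```python
import re
from itertools import product

def choice(new_start, g1, g2):
    """Add a new start symbol that picks one of the two grammars."""
    (s1, r1), (s2, r2) = g1, g2
    return new_start, {new_start: [(s1,), (s2,)], **r1, **r2}

def plus(grammar):
    """For every rule A → x, add A → x S, where S is the start symbol."""
    start, rules = grammar
    if any(body == () for bodies in rules.values() for body in bodies):
        raise ValueError("remove ε productions first")
    looped = {}
    for head, bodies in rules.items():
        extra = [body + (start,) for body in bodies if body[-1] not in rules]
        looped[head] = bodies + [body for body in extra if body not in bodies]
    return start, looped

def star(grammar):
    """Same as plus, and also add S → ε."""
    start, rules = plus(grammar)
    return start, {**rules, start: rules[start] + [()]}

def derives(rules, symbol, text):
    """True when `symbol` derives exactly `text` (assumes no cycles of A → B rules)."""
    for body in rules[symbol]:
        if not body:                                  # A → ε
            ok = text == ""
        elif body[0] in rules:                        # A → B
            ok = derives(rules, body[0], text)
        elif len(body) == 1:                          # A → x
            ok = text == body[0]
        else:                                         # A → x B
            ok = text.startswith(body[0]) and derives(rules, body[1], text[len(body[0]):])
        if ok:
            return True
    return False

A_OR_B = choice("S", ("S1", {"S1": [("a",)]}), ("S2", {"S2": [("b",)]}))
PLUS = plus(A_OR_B)
STAR = star(A_OR_B)

# Compare each grammar with the matching regular expression on short strings.
for grammar, pattern in [(A_OR_B, "a|b"), (PLUS, "(a|b)+"), (STAR, "(a|b)*")]:
    start, rules = grammar
    for n in range(5):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            assert derives(rules, start, word) == (re.fullmatch(pattern, word) is not None)
```

With these inputs, `PLUS` and `STAR` come out with the same productions as the two right-hand grammars shown above.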

  19. Credits
     The algorithm for converting a regular expression to a right linear grammar is based partly on the discussion here: http://vasy.inria.fr/people/Gordon.Pace/Research/Software/Relic/Transformations/RE/toRG.html
