Objectives Regular Expressions Syntax of Regular Expressions

Regular Languages
Dr. Mattox Beckman
University of Illinois at Urbana-Champaign
Department of Computer Science

Objectives
You should be able to ...
◮ Use the syntax of regular expressions to model a given set of strings.
◮ Give examples of the limitations of regular expressions.

Motivation
◮ Regular languages were developed by Noam Chomsky in his quest to describe human languages.
◮ Computer scientists like them because they are able to describe “words” or “tokens” very easily.
Examples:
  Integers: a bunch of digits
  Reals: an integer, a dot, and an integer
  Past Tense English Verbs: a bunch of letters ending with “ed”
  Proper Nouns: a bunch of letters, the first of which must be capitalized

A Bunch of Digits?!
◮ We need something a bit more formal if we want to communicate properly.
◮ We will use a pattern (or a regular expression) to represent the kinds of words we want to describe.
◮ These expressions will correspond to NFAs.
◮ Kinds of patterns we will use:
  ◮ Single letters
  ◮ Repetition
  ◮ Grouping
  ◮ Choices
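As a preview, the four informal descriptions under Motivation could be written as patterns and checked with Python's `re` module. This is only a sketch: the dictionary `PATTERNS` and the function `classify` are hypothetical names, and each pattern is one possible reading of the informal description (for example, lowercase-only verbs), not a definition taken from the slides.

```python
import re

# One possible formalization of each informal description (an assumption).
PATTERNS = {
    "integer": r"[0-9]+",               # a bunch of digits
    "real": r"[0-9]+\.[0-9]+",          # an integer, a dot, and an integer
    "past_tense_verb": r"[a-z]+ed",     # a bunch of letters ending with "ed"
    "proper_noun": r"[A-Z][a-z]*",      # first letter must be capitalized
}

def classify(word):
    """Return the names of all patterns that match the whole word."""
    return [name for name, pat in PATTERNS.items() if re.fullmatch(pat, word)]

if __name__ == "__main__":
    for w in ["42", "3.14", "walked", "Urbana", "hello"]:
        print(w, classify(w))
```

`re.fullmatch` is used so that the whole word has to fit the pattern, which is the sense of "match" the slides use.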
Single Letters
◮ To match a single character, just write the character.
◮ To match the letter “a” ...
  ◮ Regular expression: a
  ◮ State machine: start --> q0 --a--> q1
◮ To match the character “8” ...
  ◮ Regular expression: 8
  ◮ State machine: start --> q0 --8--> q1

Juxtaposition
◮ To match longer things, just put two regular expressions together.
◮ To match the character “a” followed by the character “8” ...
  ◮ Regular expression: a8
  ◮ State machine: start --> q0 --a--> q1 --8--> q2
◮ To match the string “hello” ...
  ◮ Regular expression: hello
  ◮ State machine: start --> q0 --h--> q1 --e--> q2 --l--> q3 --l--> q4 --o--> q5

Repetition
◮ Zero or more copies of A, add *
  ◮ Regular expression: A*
  ◮ State machine: start --> q0 --ε--> q1 --A--> q2 --ε--> q3, plus ε edges q2 --> q1 (repeat) and q0 --> q3 (skip)
◮ One or more copies of A, add +
  ◮ Regular expression: A+
  ◮ State machine: start --> q0 --ε--> q1 --A--> q2 --ε--> q3, plus an ε edge q2 --> q1 (repeat)

Grouping
◮ To group things together, use parentheses.
◮ To match one or more copies of the word “hi” ...
  ◮ Regular expression: (hi)+
  ◮ State machine: start --> q0 --ε--> q1 --h--> q2 --i--> q3 --ε--> q4, plus an ε edge q3 --> q1 (repeat)
◮ We use Thompson’s construction to build the state machine. The extra ε transitions are important!
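The constructions for single letters, juxtaposition, and repetition can be sketched as a small Thompson-style NFA builder. Everything named here (`Frag`, `char`, `concat`, `plus`, `star`, `closure`, `accepts`) is hypothetical and chosen for illustration. One simplifying assumption: `concat` joins two fragments with an ε move, whereas the diagrams on the slides merge the joining states directly; both variants accept the same strings.

```python
from itertools import count

_ids = count()  # source of fresh state numbers

class Frag:
    """NFA fragment with one start state and one accepting state.

    edges is a list of (source, label, target); label None is an epsilon move.
    """
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def char(c):
    # Single letter: two states joined by one labeled edge.
    s, t = next(_ids), next(_ids)
    return Frag(s, t, [(s, c, t)])

def concat(a, b):
    # Juxtaposition: a's accepting state feeds b's start state.
    return Frag(a.start, b.accept, a.edges + b.edges + [(a.accept, None, b.start)])

def plus(a):
    # One or more: epsilon in, epsilon out, and an epsilon edge back to repeat.
    s, t = next(_ids), next(_ids)
    extra = [(s, None, a.start), (a.accept, None, t), (a.accept, None, a.start)]
    return Frag(s, t, a.edges + extra)

def star(a):
    # Zero or more: like plus, with one more epsilon edge that skips A entirely.
    f = plus(a)
    return Frag(f.start, f.accept, f.edges + [(f.start, None, f.accept)])

def closure(states, edges):
    # All states reachable from `states` using only epsilon moves.
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(frag, text):
    # Simulate the NFA by tracking the set of states it could be in.
    current = closure({frag.start}, frag.edges)
    for ch in text:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = closure(moved, frag.edges)
    return frag.accept in current

if __name__ == "__main__":
    hi_plus = plus(concat(char("h"), char("i")))  # (hi)+
    for s in ["", "h", "hi", "hihi"]:
        print(repr(s), accepts(hi_plus, s))
```

The only difference between `plus` and `star` is the single skip edge, which is one way to see why the extra ε transitions matter: removing the repeat edge or the skip edge changes the language the machine accepts.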
Choice
◮ To make a choice, use the vertical bar (also called “pipe”).
◮ To match A or B ...
  ◮ Regular expression: A|B
  ◮ State machine: start --> q0, which branches by ε to a0 --A--> a1 and to b0 --B--> b1; both a1 and b1 rejoin by ε at q1

Examples
Expression                                   (Some) Matches                   (Some) Rejects
ab*a                                         aa, aba, abbba                   ba, aaba, abaa
(0|1)*                                       any binary number, ε
(0|1)+                                       any binary number                empty string
(0|1)*0                                      even binary numbers
(aa)*a                                       odd number of a's
(aa)*a(aa)*                                  odd number of a's
(aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*    even number of a's and b's

Some Notational Shortcuts
◮ A range of characters: [Xa-z] matches X or any character between a and z (inclusive).
◮ Any character at all: .
◮ Escape: \

Expression         (Some) Matches
[0-9]+             integers
X.*Y               anything at all between an X and a Y
[0-9]*\.[0-9]*     floating point numbers (positive, without exponents)

Things to Know ...
◮ They are greedy. X.*Y will match XabaaYaababY entirely, not just XabaaY.
◮ They cannot count very well.
  ◮ They can only count as high as you have states in the machine.
  ◮ This regular expression matches some primes: aa|aaa|aaaaa|aaaaaaa
  ◮ You cannot match an infinite number of primes.
  ◮ You cannot match “nested comments.” (\*.*\*)
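Two of the points under Things to Know can be checked directly with Python's `re` module. This is a minimal sketch: the names `greedy`, `lazy`, `SOME_PRIMES`, and `matches_prime_length` are hypothetical, and the non-greedy operator `.*?` is a Python extension shown only for contrast; it is not part of the syntax introduced on the slides.

```python
import re

TEXT = "XabaaYaababY"

# Greedy: .* takes as much as it can, so the match runs to the last Y.
greedy = re.search(r"X.*Y", TEXT).group()

# For contrast only: Python's non-greedy .*? stops at the first Y.
lazy = re.search(r"X.*?Y", TEXT).group()

# The prime pattern from the slide, applied to strings of n copies of "a".
SOME_PRIMES = re.compile(r"aa|aaa|aaaaa|aaaaaaa")

def matches_prime_length(n):
    """True if a string of n a's matches the slide's prime pattern."""
    return SOME_PRIMES.fullmatch("a" * n) is not None

if __name__ == "__main__":
    print(greedy)
    print(lazy)
    print([n for n in range(1, 12) if matches_prime_length(n)])
```

The pattern recognizes lengths 2, 3, 5, and 7 but not 11, because every prime it accepts has to be written out as its own alternative. Covering all primes would take infinitely many alternatives, which a finite expression (and a finite state machine) cannot provide.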