Formal Languages: strings and alphabets

A formal language is a set of strings, each string composed of symbols from a finite set called an alphabet (or a vocabulary).

Examples
• Let Σ₁ = {0, 1} be an alphabet. Then all binary numbers are strings over Σ₁. For instance: 01101, 000001, 1101.
• Let Σ₂ = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face are strings over Σ₂, as are fffff and agagag.
• Let Σ₃ = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ₃ but bee, dad or cabbage are not.
• Let Σ₄ = {♠, △, ♣} be an alphabet. Then ♠♠ and ♣△♣ are strings over Σ₄.

Raquel Fernández TTTV 2014 - lecture 1a
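The examples above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names `segmentations` and `is_string_over` and the constants `SIGMA_1` to `SIGMA_3` are hypothetical, and symbols are assumed to be represented as ordinary Python strings, so that a symbol such as fa in Σ₃ may span two characters.

```python
def segmentations(s, alphabet):
    """Yield every way of splitting s into a sequence of symbols from alphabet."""
    if s == "":
        yield []  # the empty string is a string over any alphabet
        return
    for symbol in alphabet:
        if symbol and s.startswith(symbol):
            for rest in segmentations(s[len(symbol):], alphabet):
                yield [symbol] + rest


def is_string_over(s, alphabet):
    """Return True if s can be written as a sequence of symbols from alphabet."""
    return next(segmentations(s, alphabet), None) is not None


SIGMA_1 = {"0", "1"}
SIGMA_2 = set("abcdefg")
SIGMA_3 = {"ba", "ca", "fa", "ce", "fe", "ge"}

print(is_string_over("01101", SIGMA_1))    # True
print(is_string_over("cabbage", SIGMA_2))  # True
print(is_string_over("face", SIGMA_3))     # True: fa followed by ce
print(is_string_over("bee", SIGMA_3))      # False
```

The recursive search is deliberately naive; it is meant to mirror the definition, not to be efficient.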
Strings and Substrings

The length of a string is the number of symbol tokens (occurrences of alphabet symbols) it contains.

Examples
• the length of face over Σ₂ = {a, b, c, d, e, f, g} is 4
• the length of face over Σ₃ = {ba, ca, fa, ce, fe, ge} is 2

The string of length 0 is called the empty string, denoted ε.

Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the order in which they occur in s. An initial substring is called a prefix and a final substring, a suffix.

Examples
Let unthinkable be a string over Σ = {a, b, c, ..., x, y, z}.
Then ε, un, unth, unthinkable are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings include nthi, inka, bl.
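As an illustration of these definitions, here is a small sketch that is not taken from the slides. The function names `prefixes`, `suffixes` and `substrings` are hypothetical, and a string is assumed to be any Python sequence of symbols: a `str` when every symbol is a single character, or a tuple of symbols otherwise.

```python
def prefixes(symbols):
    """All initial substrings, from the empty string up to the whole string."""
    return [symbols[:i] for i in range(len(symbols) + 1)]


def suffixes(symbols):
    """All final substrings, from the whole string down to the empty string."""
    return [symbols[i:] for i in range(len(symbols) + 1)]


def substrings(symbols):
    """All contiguous substrings, as a set (the empty string included)."""
    n = len(symbols)
    return {symbols[i:j] for i in range(n + 1) for j in range(i, n + 1)}


word = "unthinkable"

print(len("face"))          # 4: four symbols over the single-letter alphabet
print(len(("fa", "ce")))    # 2: two symbols over the two-letter-symbol alphabet
print(prefixes(word)[:3])   # ['', 'u', 'un']
print(suffixes(word)[-2:])  # ['e', '']
print("inka" in substrings(word))  # True
```

Note that the empty string shows up as `''` in both the prefix and the suffix list, matching ε in the examples above.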
Some Operations on Strings

• Concatenation: two strings s₁ and s₂ over Σ can be concatenated (written one after the other) to form a new string s₁ · s₂ over Σ.
  Σ = {a, b}    a · b = ab
• Exponent: we can apply an exponent operator n to a string s. The resulting string sⁿ is obtained by concatenating n copies of s.
  a⁰ = ε, a¹ = a, a² = aa, a³ = aaa, ...
• Kleene star: a special exponent operator * which, applied to a string s, denotes any string obtained by concatenating zero or more copies of s.
  a* = ε or a or aa or aaa ...
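The three operations can be sketched in a few lines. This is an illustrative example only: the names `concat`, `power` and `star` are hypothetical, and strings are assumed to be Python `str` values over a single-character alphabet. Because the Kleene star describes infinitely many strings, `star` is written as a lazy generator.

```python
from itertools import count, islice


def concat(s1, s2):
    """Concatenation: write s2 directly after s1."""
    return s1 + s2


def power(s, n):
    """Exponent: n copies of s concatenated; power(s, 0) is the empty string."""
    if n < 0:
        raise ValueError("the exponent must be non-negative")
    return s * n


def star(s):
    """Kleene star: lazily yield s^0, s^1, s^2, ... without ever stopping."""
    for n in count():
        yield power(s, n)


print(concat("a", "b"))            # ab
print(power("a", 3))               # aaa
print(list(islice(star("a"), 4)))  # ['', 'a', 'aa', 'aaa']
```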
Formal Languages

Σ* denotes the set of all strings over an alphabet Σ.
→ note that Σ* is infinite whenever Σ contains at least one symbol, no matter how few symbols that is (for the empty alphabet, Σ* contains only ε).

We may now define a formal language over an alphabet Σ as any subset of Σ*.

Examples of formal languages
Let Σ = {a, b, c, ..., x, y, z}. Then Σ* is the set of strings over the Latin alphabet and the following subsets of Σ* are possible formal languages:
• the set of strings consisting of consonants only
• the set of strings containing at least one vowel and one consonant
• the set of strings whose length is less than 9 symbols
• the set {one, two, three, four, five, six, seven, eight, nine, ten}
• the set of all English words
• the empty set
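One way to make "a language is a subset of Σ*" concrete is to enumerate Σ* lazily and to represent each language either as a membership test or as a finite set. The sketch below is illustrative and not from the slides: all names are hypothetical, and treating a, e, i, o, u as the vowels (so that y counts as a consonant) is an assumption.

```python
import string
from itertools import count, islice, product

SIGMA = string.ascii_lowercase  # the 26 letters a..z
VOWELS = set("aeiou")           # assumption: y is treated as a consonant


def sigma_star(alphabet):
    """Lazily enumerate all strings over alphabet, shortest first."""
    for n in count():
        for symbols in product(sorted(alphabet), repeat=n):
            yield "".join(symbols)


def consonants_only(s):
    return all(ch in SIGMA and ch not in VOWELS for ch in s)


def vowel_and_consonant(s):
    letters = [ch for ch in s if ch in SIGMA]
    return any(ch in VOWELS for ch in letters) and any(ch not in VOWELS for ch in letters)


def shorter_than_nine(s):
    return len(s) < 9


NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}
EMPTY_LANGUAGE = set()

print(list(islice(sigma_star("ab"), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(consonants_only("nth"))             # True
print(vowel_and_consonant("face"))        # True
print(shorter_than_nine("unthinkable"))   # False
print("seven" in NUMBER_WORDS)            # True
print("seven" in EMPTY_LANGUAGE)          # False
```

Under this reading the empty string belongs to the consonants-only language, since it contains no vowel; whether the slide intends that is not stated.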
Formal Languages

How can we characterise the language(s) we are interested in?
• given an alphabet Σ and the infinite set Σ* of strings over it, how can we select a particular formal language, that is, a particular subset of Σ*?
For instance:
  ∗ can we distinguish the set of strings of letters that constitute proper English words?
  ∗ can we distinguish the set of strings of words that count as well-formed sentences of Dutch?

We have two formal mechanisms at our disposal:
• formalisms (formal expressions and grammars): sets of rules
• automata: computational devices for computing languages
Formalisms and Automata

• Formalisms and automata allow us to distinguish a formal language of interest (a set of strings) from other possible languages over a given alphabet.
  ∗ they capture the patterns that characterise a language
  ∗ as such, they act as a definition of the language they capture
• From an abstract point of view, a natural language, like Dutch or English, is a set of strings (of sounds/letters, of words, etc.)
• Therefore, formalisms and automata can help us to model aspects of natural languages.

This week: we'll look into one formalism to define formal languages, regular expressions, and into one type of automaton, finite state automata.
Regular Expressions

Regular expressions are a formal notation for characterising sets of strings that follow a fairly simple regular pattern.

We can construct regular expressions over an alphabet Σ as follows:

                             Regular expression   Language
  empty set:                 ∅                    {}
  empty string:              ε                    {ε}
  symbol (∀ a ∈ Σ):          a                    {a}
  If a and b are regular expressions, so are:
  concatenation:             a · b                {ab}
  disjunction (or union):    (a | b)              {a, b}
  Kleene star (or closure):  a*                   {ε, a, aa, aaa, aaaa, ...}

• we often omit the dot in concatenation (a · b) and simply write ab
• disjunction (or union) may be written as a | b, a + b or a ∪ b
• a⁺ is the set of a-strings with at least one a; it can be used to abbreviate a*a or aa*
• the notation Σ* can be seen as abbreviating (a | b | ...)*, where a, b, ... are all the symbols in Σ
• Σⁿ can be seen as abbreviating the concatenation of (a | b | ...) with itself n times
• aⁿ can be used to abbreviate the concatenation of a with itself n times
Regular Expressions: Examples

What kind of strings would the language defined by each of the following regular expressions contain?
Let Σ = {a, b, c, d, ..., x, y, z}

  Regular expression   Language
  (ab)*c               {c, abc, ababc, abababc, ...}
  (a | b)*c            {c, ac, bc, aac, abc, bac, bbc, babc, ...}
  (a*c) | (b*c)        {c, ac, aac, aaac, ..., bc, bbc, bbbc, ...}
  me(o)*w              {mew, meow, meoow, meooow, meooooooooow, ...}
  ba(a)⁺               {baa, baaaaaa, baaaaaaaaa, ...}
  sunΣ*                {sun, sunglasses, sunset, sunz, sunaaaaa, sunyxjshiksr, ...}
  (co)²Σ*              {coco, cocoa, coconut, cocoz, coconjsbfx, cocococovuyfvco, ...}
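The table above can be tried out with Python's `re` module. This is a sketch under stated assumptions, not the notation of the slides: Σ* is rendered as `[a-z]*`, the exponent in (co)² as `{2}`, and `re.fullmatch` is used so that the whole string, not just a part of it, must match. The names `PATTERNS` and `in_language` are hypothetical.

```python
import re

# Slide notation on the left, an assumed Python rendering on the right.
PATTERNS = {
    "(ab)*c": r"(ab)*c",
    "(a|b)*c": r"(a|b)*c",
    "(a*c)|(b*c)": r"(a*c)|(b*c)",
    "me(o)*w": r"me(o)*w",
    "ba(a)+": r"ba(a)+",
    "sun Sigma*": r"sun[a-z]*",
    "(co)^2 Sigma*": r"(co){2}[a-z]*",
}


def in_language(name, s):
    """Return True if the whole string s matches the named expression."""
    return re.fullmatch(PATTERNS[name], s) is not None


print(in_language("(ab)*c", "ababc"))             # True
print(in_language("(a|b)*c", "babc"))             # True
print(in_language("(a*c)|(b*c)", "abc"))          # False: a's and b's cannot mix
print(in_language("me(o)*w", "mew"))              # True: zero o's are allowed
print(in_language("ba(a)+", "ba"))                # False: at least one extra a
print(in_language("sun Sigma*", "sunset"))        # True
print(in_language("(co)^2 Sigma*", "coconut"))    # True
```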
Regular Expressions in Programming Languages

Many programming languages, such as Perl, Python, and Java, and Unix tools like grep include ways to specify regular expressions. For instance:

                 Perl notation   underlying regular expression
  ranges:        [a-z]           (a | b | c | d | ... | z)
  optionality:   colo(u)?r       colo(u | ε)r
  digits:        \d              (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)

There are many other options, such as negation, upper-/lower-case, white spaces, etc.

These operators are "syntactic sugar": all regular expressions can be constructed with the basic operations we have seen earlier: concatenation, disjunction, and Kleene star, plus the empty string.

This syntactic sugar, however, is very useful! Regular expressions are used all over the place for string search. This won't be covered in class; see the book and practise in the tutorial sessions.
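The claim that these operators are syntactic sugar can be spot-checked in Python, whose `re` syntax for the three cases coincides with the Perl notation shown. The sketch below is illustrative: `PAIRS`, `SAMPLES` and `agree_on_samples` are hypothetical names, the empty alternative in `(u|)` stands in for ε, and the `re.ASCII` flag is passed because Python's `\d` would otherwise also match non-ASCII digits. Agreement on a handful of samples is evidence, not a proof of equivalence.

```python
import re
import string

# (sugared pattern, pattern built from the basic operations only)
PAIRS = [
    (r"[a-z]", "(" + "|".join(string.ascii_lowercase) + ")"),
    (r"colo(u)?r", r"colo(u|)r"),
    (r"\d", "(" + "|".join(string.digits) + ")"),
]

SAMPLES = ["a", "m", "z", "A", "0", "7", "ab", "", "color", "colour", "colouur"]


def agree_on_samples(sugar, basic):
    """Check that both patterns accept exactly the same sample strings."""
    return all(
        (re.fullmatch(sugar, s, re.ASCII) is None)
        == (re.fullmatch(basic, s, re.ASCII) is None)
        for s in SAMPLES
    )


for sugar, basic in PAIRS:
    print(sugar, agree_on_samples(sugar, basic))
```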
Regular Expressions and Automata

• A regular expression allows us to characterise a formal language declaratively.
• We can characterise the same language procedurally by means of an automaton that specifies how the language is computed.

regular expression: meo*w
language: {mew, meow, meoow, meooow, ...}
[Figure: state diagram with states q₀, q₁, q₂, q₃ and arcs labelled m, e, o, w]

regular expression: (b | c)ar
language: {bar, car}
[Figure: state diagram with states q₀, q₁, q₂, q₃ and arcs labelled b, c, a, r]
Finite State Automata: Formal Definition

We can formally specify an FSA by the following 5 parameters:
• Q: a finite set of states
• Σ: a finite input alphabet
• q₀ ∈ Q: the start state
• F: a set of final or accepting states (F ⊆ Q)
• δ: a transition function between states that maps pairs of states and input symbols ⟨q, i⟩ to a new state q′ (Q × Σ → Q).

[Figure: state diagram with states q₀, q₁, q₂, q₃ and arcs labelled b, c, a, r]

Q = {q₀, q₁, q₂, q₃}
Σ = {a, b, c, r}
start state: q₀
F = {q₃}
δ = {(⟨q₀, c⟩, q₁), (⟨q₀, b⟩, q₁), (⟨q₁, a⟩, q₂), (⟨q₂, r⟩, q₃)}
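The five parameters translate almost directly into a data structure. The following is a minimal sketch rather than a reference implementation: the dictionary layout and the function `accepts` are hypothetical, δ is stored as a partial function (a missing entry means the input is rejected), and the transitions of `MEOW` are an assumed reading of the meo*w diagram, which is not spelled out in the text. `BAR_CAR` copies the specification given above.

```python
def accepts(fsa, s):
    """Run the automaton on s; accept if it ends in a final state."""
    state = fsa["start"]
    for symbol in s:
        key = (state, symbol)
        if key not in fsa["delta"]:
            return False  # no transition defined: reject
        state = fsa["delta"][key]
    return state in fsa["final"]


# The automaton specified above, for the language {bar, car}.
BAR_CAR = {
    "states": {"q0", "q1", "q2", "q3"},
    "alphabet": {"a", "b", "c", "r"},
    "start": "q0",
    "final": {"q3"},
    "delta": {
        ("q0", "c"): "q1",
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "r"): "q3",
    },
}

# Assumed transitions for meo*w: the o-arc loops on q2.
MEOW = {
    "states": {"q0", "q1", "q2", "q3"},
    "alphabet": {"m", "e", "o", "w"},
    "start": "q0",
    "final": {"q3"},
    "delta": {
        ("q0", "m"): "q1",
        ("q1", "e"): "q2",
        ("q2", "o"): "q2",
        ("q2", "w"): "q3",
    },
}

print(accepts(BAR_CAR, "car"))   # True
print(accepts(BAR_CAR, "barr"))  # False
print(accepts(MEOW, "meooow"))   # True
print(accepts(MEOW, "meo"))      # False: q2 is not a final state
```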
Finite State Automata with ε-transitions

We may extend the transition function δ with the empty string symbol ε, so that δ : Q × (Σ ∪ {ε}) → Q

regular expression: colo(u | ε)r
[Figure: state diagram with states q₀, q₁, q₂, q₃ and arcs labelled colo, u, ε, r]

... but note that every FSA with ε-transitions is equivalent to one without them:

regular expression: colo(u | ε)r
[Figure: state diagram with states q₀, q₁, q₂, q₃ and arcs labelled colo, u, r, r]

(more on this when we discuss non-deterministic FSAs)
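The equivalence can be illustrated by simulating both kinds of automaton. This sketch makes several assumptions that are not in the slides: the arc labelled colo is expanded into four single-letter arcs, states are numbered 0 to 6, ε is represented by the empty string `EPS`, and transitions map to sets of states so that a u-arc and an ε-arc can leave the same state. The function names are hypothetical.

```python
EPS = ""  # assumption: the empty string labels ε-transitions


def eps_closure(states, delta):
    """All states reachable from the given states via ε-transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(delta, start, final, s):
    """Track the set of states the automaton could be in after each symbol."""
    current = eps_closure({start}, delta)
    for symbol in s:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, symbol), ()))
        current = eps_closure(moved, delta)
    return bool(current & final)


# colo(u|ε)r with an ε-arc that skips the u.
WITH_EPS = {
    (0, "c"): {1}, (1, "o"): {2}, (2, "l"): {3}, (3, "o"): {4},
    (4, "u"): {5}, (4, EPS): {5}, (5, "r"): {6},
}
# The same language without ε: an extra r-arc leaves state 4 directly.
WITHOUT_EPS = {
    (0, "c"): {1}, (1, "o"): {2}, (2, "l"): {3}, (3, "o"): {4},
    (4, "u"): {5}, (4, "r"): {6}, (5, "r"): {6},
}

for word in ["color", "colour", "colouur", "colo"]:
    print(word, accepts(WITH_EPS, 0, {6}, word), accepts(WITHOUT_EPS, 0, {6}, word))
```

Both automata accept color and colour and reject the other two strings, which is what the equivalence claim predicts for this example.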
Regular Expressions and FSAs

Regular expressions and FSAs capture the same class of languages: any language that is the denotation of some regular expression can be computed by an FSA, and vice versa.

Let's see how we can build an FSA from any regular expression.

                             Reg. exp.    Language
  empty set:                 ∅            {}
  empty string:              ε            {ε}
  symbol (∀ a ∈ Σ):          a            {a}
  If a and b are regular expressions, so are:
  concatenation:             ab           {ab}
  disjunction (or union):    (ab | ba)    {ab, ba}
  Kleene star (or closure):  a*           {ε, a, aa, aaa, aaaa, ...}

Strategy:
• Base case: build an automaton for simple expressions
• Inductive step: show how to reproduce each of the operations on regular expressions with an automaton
From Reg Exp to FSA: Base Case

  Regular expression   Corresponding FSA
  a                    q₀ --a--> q₁
  ε                    q₀ --ε--> q₁