Taaltheorie en Taalverwerking (Language Theory and Language Processing)
BSc Artificial Intelligence
Raquel Fernández, Institute for Logic, Language, and Computation
Winter 2014, lecture 1a (TTTV 2014)


Formal Languages: strings and alphabets

A formal language is a set of strings, each string composed of symbols from a finite set called an alphabet (or a vocabulary).

Examples
• Let Σ₁ = {0, 1} be an alphabet. Then all binary numbers are strings over Σ₁. For instance: 01101, 000001, 1101.
• Let Σ₂ = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face are strings over Σ₂, as are fffff and agagag.
• Let Σ₃ = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ₃, but bee, dad, and cabbage are not.
• Let Σ₄ = {♠, △, ♣} be an alphabet. Then ♠♠ and ♣△♣ are strings over Σ₄.
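The membership claims in these examples can be checked mechanically. The sketch below is only an illustration: the function name is_string_over is hypothetical, and it assumes every alphabet symbol is a non-empty Python string. It tests whether a string can be split into a sequence of symbols from a given alphabet, which matters for alphabets such as Σ₃ whose symbols are longer than one letter.

```python
from functools import lru_cache


def is_string_over(s, alphabet):
    """Return True if s can be split into a sequence of symbols from alphabet.

    Assumption: every symbol in alphabet is a non-empty string.
    """
    symbols = frozenset(alphabet)

    @lru_cache(maxsize=None)
    def can_split(i):
        # Nothing left to split: the remainder is the empty string.
        if i == len(s):
            return True
        # Try every symbol that matches at position i, then split the rest.
        return any(
            s.startswith(sym, i) and can_split(i + len(sym)) for sym in symbols
        )

    return can_split(0)


SIGMA_2 = {"a", "b", "c", "d", "e", "f", "g"}
SIGMA_3 = {"ba", "ca", "fa", "ce", "fe", "ge"}

print(is_string_over("cabbage", SIGMA_2))  # True
print(is_string_over("face", SIGMA_3))     # True  (fa · ce)
print(is_string_over("bee", SIGMA_3))      # False
```

Note that the empty string counts as a string over every alphabet, which the function reflects.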

Strings and Substrings

The length of a string is the number of symbol tokens from the alphabet that it contains.

Examples
• the length of face over Σ₂ = {a, b, c, d, e, f, g} is 4
• the length of face over Σ₃ = {ba, ca, fa, ce, fe, ge} is 2

The string of length 0 is called the empty string, denoted ε.

Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the order in which they occur in s. An initial substring is called a prefix and a final substring a suffix.

Examples
Let unthinkable be a string over Σ = {a, b, c, ..., x, y, z}. Then ε, un, unth, and unthinkable are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings include nthi, inka, and bl.
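A small sketch of these definitions, assuming a string is represented either as a Python str (one character per symbol) or as a list of symbols when the alphabet has multi-letter symbols. The helper names prefixes, suffixes, and substrings are illustrative only.

```python
def prefixes(s):
    """All initial substrings of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All final substrings of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All contiguous substrings of a str s (a set, so duplicates collapse)."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


# Length depends on the alphabet: the same word is 4 symbols over Σ₂
# but only 2 symbols over Σ₃.
face_over_sigma2 = ["f", "a", "c", "e"]
face_over_sigma3 = ["fa", "ce"]
print(len(face_over_sigma2), len(face_over_sigma3))  # 4 2

word = "unthinkable"
print("unth" in prefixes(word))    # True
print("able" in suffixes(word))    # True
print("inka" in substrings(word))  # True
print("" in prefixes(word) and "" in suffixes(word))  # True
```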

Some Operations on Strings

• Concatenation: two strings s₁ and s₂ over Σ can be concatenated (written one after the other) to form a new string s₁ · s₂ over Σ.
    Σ = {a, b}    a · b = ab
• Exponent: we can apply an exponent operator n to a string s. The resulting string sⁿ is obtained by concatenating n copies of s.
    a⁰ = ε, a¹ = a, a² = aa, a³ = aaa, ...
• Kleene star: a special exponent operator * which, applied to a string s, denotes any string obtained by concatenating any number of copies of s (including zero).
    a* = ε or a or aa or aaa ...
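The three operations map directly onto Python string operations. This is a minimal sketch with hypothetical function names; since s* denotes infinitely many strings, kleene_star is written as a lazy generator and only its first few values are shown.

```python
from itertools import count, islice


def concat(s1, s2):
    """Concatenation s1 · s2."""
    return s1 + s2


def power(s, n):
    """Exponent: n copies of s written one after the other (n = 0 gives ε)."""
    return s * n


def kleene_star(s):
    """Lazily generate s^0, s^1, s^2, ... (an infinite sequence)."""
    for n in count(0):
        yield power(s, n)


print(concat("a", "b"))                     # ab
print([power("a", n) for n in range(4)])    # ['', 'a', 'aa', 'aaa']
print(list(islice(kleene_star("ab"), 4)))   # ['', 'ab', 'abab', 'ababab']
```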

Formal Languages

Σ* denotes the set of all strings over an alphabet Σ.
→ note that Σ* is infinite for any non-empty Σ, regardless of the number of symbols Σ contains.

We may now define a formal language over an alphabet Σ as any subset of Σ*.

Examples of formal languages
Let Σ = {a, b, c, ..., x, y, z}. Then Σ* is the set of strings over the Latin alphabet, and the following subsets of Σ* are possible formal languages:
• the set of strings consisting of consonants only
• the set of strings containing at least one vowel and one consonant
• the set of strings whose length is less than 9 symbols
• the set {one, two, three, four, five, six, seven, eight, nine, ten}
• the set of all English words
• the empty set
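Because Σ* is infinite, it can only be enumerated up to some length bound. The sketch below does that for a deliberately tiny alphabet {a, b, c} (an assumption made to keep the output readable) and then picks out two of the languages listed above as subsets. Treating a, e, i, o, u as the vowels is also an assumption of this example.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """Yield every string over alphabet of length <= max_len, shortest first."""
    symbols = sorted(alphabet)
    for n in range(max_len + 1):
        for combo in product(symbols, repeat=n):
            yield "".join(combo)


VOWELS = set("aeiou")  # assumption: these five letters are the vowels


def consonants_only(s):
    return all(ch not in VOWELS for ch in s)


def vowel_and_consonant(s):
    return any(ch in VOWELS for ch in s) and any(ch not in VOWELS for ch in s)


sigma = {"a", "b", "c"}
sample = list(strings_up_to(sigma, 2))
print(len(sample))  # 13  (1 + 3 + 9 strings of length 0, 1, 2)
print([s for s in sample if consonants_only(s)])
# ['', 'b', 'c', 'bb', 'bc', 'cb', 'cc']
print([s for s in sample if vowel_and_consonant(s)])
# ['ab', 'ac', 'ba', 'ca']
```

Each filter is a membership test for one language; the language itself is the (possibly infinite) set of all strings in Σ* that pass the test.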

Formal Languages

How can we characterise the language(s) we are interested in?
• given an alphabet Σ and the infinite set Σ* of strings it gives rise to, how can we select a particular formal language, that is, a particular subset of Σ*?

For instance:
  ∗ can we distinguish the set of strings of letters that constitute proper English words?
  ∗ can we distinguish the set of strings of words that count as well-formed sentences of Dutch?

We have two formal mechanisms at our disposal:
• formalisms (formal expressions and grammars): sets of rules
• automata: computational devices for computing languages

Formalisms and Automata

• Formalisms and automata allow us to distinguish a formal language of interest (a set of strings) from other possible languages over a given alphabet.
  ∗ they capture the patterns that characterise a language
  ∗ as such, they act as a definition of the language they capture
• From an abstract point of view, a natural language, like Dutch or English, is a set of strings (of sounds/letters, of words, etc.)
• Therefore, formalisms and automata can help us to model aspects of natural languages.

This week: we’ll look into one formalism to define formal languages, regular expressions, and into one type of automaton, finite state automata.

Regular Expressions

Regular expressions are a formal notation for characterising sets of strings that follow a fairly simple regular pattern.

We can construct regular expressions over an alphabet Σ as follows:

                                  Regular expression    Languages
  empty set:                      ∅                     {}
  empty string:                   ε                     {ε}
  symbol (∀a ∈ Σ):                a                     {a}
  If a and b are reg exp, so are:
  concatenation:                  a · b                 {ab}
  disjunction (or union):         (a | b)               {a, b}
  Kleene star (or closure):       a*                    {ε, a, aa, aaa, aaaa, ...}

• we often omit the dot in concatenation (a · b) and simply write ab
• disjunction (or union) may be written as a | b, a + b, or a ∪ b
• a⁺ is the set of a-strings with at least one a; it can be used to abbreviate a*a or aa*
• the notation Σ* can be seen as abbreviating (a | b | ...)* for the symbols a, b, ... in Σ
• Σⁿ can be seen as abbreviating the concatenation of n copies of (a | b | ...)
• aⁿ can be used to abbreviate the concatenation of n copies of a
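One way to make the table concrete is to compute the language each construction denotes. The sketch below is an illustration under two assumptions: symbols are single characters, and every language is truncated at a maximum string length so that the Kleene star stays finite. All function names are hypothetical.

```python
EMPTY_SET = frozenset()          # denotation of ∅
EMPTY_STRING = frozenset({""})   # denotation of ε


def symbol(a):
    """Denotation of a single symbol a."""
    return frozenset({a})


def concat(l1, l2, max_len):
    """Concatenation, keeping only strings of length <= max_len."""
    return frozenset(x + y for x in l1 for y in l2 if len(x + y) <= max_len)


def union(l1, l2):
    """Disjunction (union)."""
    return l1 | l2


def star(lang, max_len):
    """Kleene star, truncated to strings of length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of lang.
        frontier = {
            x + y for x in frontier for y in lang if len(x + y) <= max_len
        } - result
        result |= frontier
    return frozenset(result)


def show(lang):
    """Sort by length, then alphabetically, for stable printing."""
    return sorted(lang, key=lambda s: (len(s), s))


a, b, c = symbol("a"), symbol("b"), symbol("c")
print(show(concat(star(concat(a, b, 7), 7), c, 7)))  # (ab)*c
# ['c', 'abc', 'ababc', 'abababc']
print(show(concat(star(union(a, b), 2), c, 3)))      # (a|b)*c
# ['c', 'ac', 'bc', 'aac', 'abc', 'bac', 'bbc']
print(show(concat(a, star(a, 3), 3)))                # a⁺ written as aa*
# ['a', 'aa', 'aaa']
```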

Regular Expressions: Examples

What kind of strings would the language defined by each of the following regular expressions contain? Let Σ = {a, b, c, d, ..., x, y, z}.

  Regular expression    Language
  (ab)*c                {c, abc, ababc, abababc, ...}
  (a | b)*c             {c, ac, bc, aac, abc, bac, bbc, babc, ...}
  (a*c) | (b*c)         {c, ac, aac, aaac, ..., bc, bbc, bbbc, ...}
  me(o)*w               {mew, meow, meoow, meooow, meooooooooow, ...}
  ba(a)⁺                {baa, baaaaaa, baaaaaaaaa, ...}
  sunΣ*                 {sun, sunglasses, sunset, sunz, sunaaaaa, sunyxjshiksr, ...}
  (co)²Σ*               {coco, cocoa, coconut, cocoz, coconjsbfx, cocococovuyfvco, ...}
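These examples can be tried out with Python's re module. In this sketch Σ* is written as [a-z]* and (co)² as (co){2}, which is how the same patterns are usually spelled in Python; re.fullmatch is used so that the whole string, not just a part of it, has to fit the pattern. The helper name in_language is hypothetical.

```python
import re


def in_language(pattern, s):
    """True if the whole string s matches the regular expression pattern."""
    return re.fullmatch(pattern, s) is not None


EXAMPLES = {
    r"(ab)*c": ["c", "ababc", "abab"],
    r"(a|b)*c": ["babc", "cab"],
    r"(a*c)|(b*c)": ["aaac", "bbc", "abc"],
    r"meo*w": ["mew", "meooow", "meo"],
    r"baa+": ["baa", "ba"],
    r"sun[a-z]*": ["sunset", "sun", "su"],
    r"(co){2}[a-z]*": ["coconut", "cocoa", "cobalt"],
}

for pattern, candidates in EXAMPLES.items():
    for s in candidates:
        verdict = "in" if in_language(pattern, s) else "not in"
        print(f"{s!r} is {verdict} the language of {pattern}")
```

In each row the last candidate is chosen to fall outside the language, for example abab lacks the final c and cobalt does not start with coco.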

Regular Expressions in Programming Languages

Many programming languages such as Perl, Python, and Java, as well as Unix tools like grep, include ways to specify regular expressions. For instance:

                  Perl notation    underlying regular expression
  ranges:         [a-z]            (a | b | c | d | ... | z)
  optionality:    colo(u)?r        colo(u | ε)r
  digits:         \d               (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)

There are many other options, such as negation, upper-/lower-case, white space, etc.

These operators are “syntactic sugar”: all regular expressions can be constructed with the basic operations we have seen earlier: concatenation, disjunction, and Kleene star, plus the empty string.

This syntactic sugar, however, is very useful! Regular expressions are used all over the place for string search. This won’t be covered in class; see the book and practise in the practical sessions (werkcolleges).
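The "syntactic sugar" claim can be spot-checked in Python, whose re module accepts the same notation for these three cases. The sketch below pairs each sugared pattern with a version that uses only concatenation, disjunction, and an empty alternative, and checks that both give the same verdict on a handful of candidate strings. This is a spot check on sample strings, not a proof of equivalence. The re.ASCII flag is passed because in Python 3 \d would otherwise also match non-ASCII digits.

```python
import re
import string

# (sugared pattern, equivalent pattern using only | and an empty alternative)
SUGAR_VS_BASIC = [
    (r"[a-z]", "(" + "|".join(string.ascii_lowercase) + ")"),
    (r"colo(u)?r", r"colo(u|)r"),
    (r"\d", "(" + "|".join(string.digits) + ")"),
]


def agree(sugar, basic, candidates):
    """True if both patterns accept exactly the same candidate strings."""
    return all(
        bool(re.fullmatch(sugar, s, re.ASCII))
        == bool(re.fullmatch(basic, s, re.ASCII))
        for s in candidates
    )


CANDIDATES = ["a", "q", "z", "A", "7", "0", "color", "colour", "colouur", ""]

for sugar, basic in SUGAR_VS_BASIC:
    print(sugar, agree(sugar, basic, CANDIDATES))  # prints True for each pair
```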

Regular Expressions and Automata

• A regular expression allows us to characterise a formal language declaratively.
• We can characterise the same language procedurally by means of an automaton that specifies how the language is computed.

regular expression: meo*w
language: {mew, meow, meoow, meooow, ...}
[Figure: FSA with states q₀, q₁, q₂, q₃ and arcs labelled m, e, o, w]

regular expression: (b | c)ar
language: {bar, car}
[Figure: FSA with states q₀, q₁, q₂, q₃ and arcs labelled b, c, a, r]

Finite State Automata: Formal Definition

We can formally specify an FSA by the following 5 parameters:
• Q: a finite set of states
• Σ: a finite input alphabet
• q₀ ∈ Q: the start state
• F: a set of final or accepting states (F ⊆ Q)
• δ: a transition function between states that maps pairs of states and input symbols ⟨q, i⟩ to a new state q′ (Q × Σ → Q).

Example
[Figure: FSA with states q₀, q₁, q₂, q₃ and arcs labelled b, c, a, r]
Q = {q₀, q₁, q₂, q₃}
Σ = {a, b, c, r}
start state: q₀
F = {q₃}
δ = {(⟨q₀, c⟩, q₁), (⟨q₀, b⟩, q₁), (⟨q₁, a⟩, q₂), (⟨q₂, r⟩, q₃)}
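The five parameters translate almost literally into a data structure. The sketch below encodes the example automaton as a Python dictionary and adds a hypothetical accepts function that runs it on an input string. One assumption is made explicit: δ as listed is partial (it has no entry for, say, ⟨q₁, r⟩), so a missing transition is treated as rejection.

```python
BAR_CAR = {
    "states": {"q0", "q1", "q2", "q3"},          # Q
    "alphabet": {"a", "b", "c", "r"},            # Σ
    "start": "q0",                               # q0
    "final": {"q3"},                             # F
    "delta": {                                   # δ as (state, symbol) -> state
        ("q0", "c"): "q1",
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "r"): "q3",
    },
}


def accepts(fsa, string):
    """Run the FSA on string; accept if it ends in a final state."""
    state = fsa["start"]
    for symbol in string:
        key = (state, symbol)
        if key not in fsa["delta"]:
            # Assumption: an undefined transition means the string is rejected.
            return False
        state = fsa["delta"][key]
    return state in fsa["final"]


for s in ["bar", "car", "ar", "bars", ""]:
    print(repr(s), accepts(BAR_CAR, s))
# 'bar' True, 'car' True, 'ar' False, 'bars' False, '' False
```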

Finite State Automata with ε-transitions

We may extend the transition function δ with the empty string symbol ε, so that δ : Q × (Σ ∪ {ε}) → Q

regular expression: colo(u | ε)r
[Figure: FSA with states q₀, q₁, q₂, q₃ and arcs labelled colo, u, r, plus an ε arc]

... but note that every FSA with ε-transitions is equivalent to one without them:

regular expression: colo(u | ε)r
[Figure: FSA with states q₀, q₁, q₂, q₃ and arcs labelled colo, u, r, and a second r arc; no ε arc]

(more on this when we discuss non-deterministic FSAs)
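An automaton with ε-transitions can be simulated by tracking the set of states it could be in, following ε arcs for free. The sketch below is one common way to do this, not necessarily the method used later in the course. Assumptions of the example: ε is represented by None, the arc labelled colo is spelled out letter by letter through hypothetical intermediate states p1, p2, p3, and the ε arc runs from q1 to q2, parallel to the u arc.

```python
EPSILON = None  # assumption: None stands for the empty-string label ε

DELTA = {
    # "colo" spelled out through hypothetical intermediate states p1..p3
    ("q0", "c"): {"p1"},
    ("p1", "o"): {"p2"},
    ("p2", "l"): {"p3"},
    ("p3", "o"): {"q1"},
    # colo(u | ε)r: from q1, read u or skip it via ε
    ("q1", "u"): {"q2"},
    ("q1", EPSILON): {"q2"},
    ("q2", "r"): {"q3"},
}
START, FINAL = "q0", {"q3"}


def epsilon_closure(states, delta):
    """All states reachable from states using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(delta, start, final, string):
    """Simulate the automaton by tracking every state it could be in."""
    current = epsilon_closure({start}, delta)
    for symbol in string:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & final)


for s in ["color", "colour", "colouur", "colo"]:
    print(s, accepts(DELTA, START, FINAL, s))
# color True, colour True, colouur False, colo False
```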

Regular Expressions and FSAs

Regular expressions and FSAs capture the same class of languages: any language that is the denotation of some regular expression can be computed by an FSA, and vice versa.

Let’s see how we can build an FSA from any regular expression.

                                  Reg. exp.     Languages
  empty set:                      ∅             {}
  empty string:                   ε             {ε}
  symbol (∀a ∈ Σ):                a             {a}
  If a and b are reg exp, so are:
  concatenation:                  ab            {ab}
  disjunction (or union):         (ab | ba)     {ab, ba}
  Kleene star (or closure):       a*            {ε, a, aa, aaa, aaaa, ...}

Strategy:
• Base case: build an automaton for simple expressions
• Inductive step: show how to reproduce each of the operations on regular expressions with an automaton

From Reg Exp to FSA: Base Case

  Regular expression    Corresponding FSAs
  a                     [Figure: states q₀ and q₁ joined by an arc labelled a]
  ε                     [Figure: states q₀ and q₁ joined by an arc labelled ε]
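The two base cases can be written down as small constructor functions. This is only a sketch: the FSA tuple, the function names, and the use of the string "ε" as a label are all hypothetical, and it assumes that q0 is the start state and q1 the only final state, which the figures suggest but the text does not state.

```python
from collections import namedtuple

# The five parameters of an FSA; delta is a set of ((state, label), state).
FSA = namedtuple("FSA", ["states", "alphabet", "start", "final", "delta"])

EPSILON = "ε"  # assumption: this label marks an ε-transition


def fsa_for_symbol(a):
    """Two-state automaton whose single arc is labelled with the symbol a."""
    return FSA(
        states={"q0", "q1"},
        alphabet={a},
        start="q0",
        final={"q1"},
        delta={(("q0", a), "q1")},
    )


def fsa_for_epsilon():
    """Two-state automaton whose single arc is an ε-transition."""
    return FSA(
        states={"q0", "q1"},
        alphabet=set(),
        start="q0",
        final={"q1"},
        delta={(("q0", EPSILON), "q1")},
    )


print(fsa_for_symbol("a").delta)   # {(('q0', 'a'), 'q1')}
print(fsa_for_epsilon().delta)     # {(('q0', 'ε'), 'q1')}
```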
