Alphabet

An alphabet is a set of letters. Examples:

  { a, b, c, ..., z }
  { α, β, ..., ω }
  { a, b, c }
  { a, b }
  { 0 } (unary alphabet)
  { 0, 1 } (binary alphabet)
  { N, S, E, W } (compass alphabet)

We use the symbol Σ for an alphabet and σ for a letter. For example, if Σ = { a, b } and σ ∈ Σ, then we could have σ = a.

Word

A word is a sequence of letters over some alphabet. A word is also called a string. Examples:

  aab is a word over the alphabet { a, b }
  aabb is a word over the alphabet { a, b }
  b is a word over the alphabet { a, b }
  aab is also a word over { a, b, c }

0011 is a word over the binary alphabet; we call such words "binary strings".

Length of a word

A word has a length, which is the number of letters in the word. For example, suppose Σ = { a, b, c } and w = abc; then |w| = 3. If w = aa then |w| = 2.

Suppose we have the binary alphabet { 0, 1 }.

  How many words of length 1? 2 (0, 1)
  How many words of length 2? 4
  How many words of length 3? 8

Claim: the number of words of a given length is the number of letters in the alphabet raised to that length. That is, there are |Σ|^i words of length i over Σ.

Suppose our alphabet is { a, b, c }. Then the words of length 1 are: a, b, c. The words of length 2 are: aa, ab, ac, ba, bb, bc, ca, cb, cc.

Empty word

How many words of length 0 are there over the alphabet { a, b }? 2^0 = 1.
How many words of length 0 are there over the alphabet { a, b, c }? 3^0 = 1.

We use ε to represent the empty word. ε is a word over every alphabet. Thus if we have two words x, y with |x| = 0 and |y| = 0, then x = y.

ε is the empty word: it is a word that has length zero and therefore consists of no letters.

An alphabet is a set of letters, so ε is never a letter in an alphabet. That is, words and letters are different types. Another point: the one-letter word "a" is not the same as the letter "a".
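The counting claim above (there are |Σ|^i words of length i) can be checked by brute force for small cases. The following is a minimal Python sketch, not part of the original notes: it assumes letters are single-character strings and words are ordinary Python strings, and the function name `words_of_length` is made up for illustration.

```python
from itertools import product

def words_of_length(sigma, i):
    """Return every word of length i over the alphabet sigma.

    Assumption: sigma is a collection of single-character strings, and a
    word is represented as an ordinary Python string.
    """
    # product(..., repeat=i) yields every i-tuple of letters; join each
    # tuple into a string to get the corresponding word.
    return ["".join(letters) for letters in product(sorted(sigma), repeat=i)]

binary = {"0", "1"}
for i in range(4):
    words = words_of_length(binary, i)
    # The claim: there are |sigma| ** i words of length i.
    assert len(words) == len(binary) ** i
    print(i, words)
```

For i = 0 the only word produced is the empty string, which plays the role of ε and matches 2^0 = 1.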

    // build a word of length 5, one letter at a time
    ALPHABET sigma = {a, b};
    WORD x = empty;
    for (INT i = 0; i < 5; ++i) {
        pick LETTER a in sigma;
        x += a;
    }
    print x;

Concatenation

Concatenation is a fundamental operation on words. It consists of putting one word next to the other. We denote concatenation simply by juxtaposition where unambiguous, or with · when we want to be explicit.

For example, let w = abc and w′ = aaa. Then w · w′ = abcaaa, or simply ww′ = abcaaa.

Note that ww′ ≠ w′w in general: w′w = aaaabc.

When would it be the case that ww′ = w′w? What would be the exact conditions on w and w′? Hint: the obvious answer is when w = w′, but that does not cover all cases.

Suppose you have the words a and b in infinite supply. How many different ways can you concatenate them to create abab?

One way:

  cat(a, b) = ab
  cat(ab, a) = aba
  cat(aba, b) = abab

    x = cat(a, b)
    x = cat(x, a)
    x = cat(x, b)
    print x

Another way:

  cat(a, b) = ab
  cat(a, b) = ab
  cat(ab, ab) = abab

    x = cat(a, b)
    y = cat(a, b)
    x = cat(x, y)
    print x

A third way:

  cat(a, b) = ab
  cat(b, ab) = bab
  cat(a, bab) = abab

    x = cat(a, b)
    x = cat(b, x)
    x = cat(a, x)
    print x

and two more (what are they?)

[Figure: trees for the three ways shown of building abab: (ab)(ab), ((ab)a)b, and a(b(ab)).]

Empty concatenation

What happens when we concatenate with the empty word?

  cat(ε, aba) = aba

In general ε · w = w · ε = w. This also means that εε = ε.

Repeated concatenations

Instead of writing out all the words or letters through juxtaposition, we denote repeated concatenation with an exponent.

  w^2 = ww
  w^3 = www

So w^i is i copies of w juxtaposed. w^0 is zero copies of w, which is ε:

  for every word w, w^0 = ε

Think of this like an algorithm that starts with ε and concatenates w onto it i times.
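The concatenation facts above are easy to try out. The following is a small Python sketch, not part of the original notes: it assumes Python strings stand in for words, with the empty string as ε. The name `cat` follows the pseudocode in these notes; `EPSILON` is a name chosen here for illustration.

```python
EPSILON = ""  # the empty word (assumption: strings model words)

def cat(x, y):
    """Concatenate two words by putting y right after x."""
    return x + y

w, w_prime = "abc", "aaa"

# Concatenation is not commutative in general.
print(cat(w, w_prime))  # abcaaa
print(cat(w_prime, w))  # aaaabc
assert cat(w, w_prime) != cat(w_prime, w)

# The empty word is an identity on both sides.
assert cat(EPSILON, "aba") == "aba"
assert cat("aba", EPSILON) == "aba"
assert cat(EPSILON, EPSILON) == EPSILON

# Three of the ways of building abab from the words a and b.
a, b = "a", "b"
assert cat(cat(cat(a, b), a), b) == "abab"
assert cat(cat(a, b), cat(a, b)) == "abab"
assert cat(a, cat(b, cat(a, b))) == "abab"

# Python's string repetition behaves like the exponent notation.
assert "ab" * 2 == "abab"
assert "ab" * 0 == EPSILON
```

Using plain strings keeps the sketch short; a stricter model would use separate types for letters and words, since the notes stress that the two are different.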

    WORD repeated_concatenation(WORD w, INT amount) {
        WORD retval = "";  // i.e., epsilon
        for (INT i = 0; i < amount; ++i) {
            retval = cat(retval, w);
        }
        return retval;
    }

Some properties of repeated concatenation:

  w^i w^j = w^j w^i = w^(i+j)
  (w^i)^j = w^(ij)
  (xy)^i = xyxyxy...xy

In general, (xy)^i ≠ x^i y^i. In some cases, like (ww)^2 = wwww = w^2 w^2, it is true, but in others, like (ab)^2 = abab ≠ aabb, it is not.

Concatenations of alphabets

Just like we concatenate words and letters, we can concatenate sets of words and sets of letters. In this case the result is all possible combinations.

  Σ = { a, b }
  Σ^1 = { a, b }
  Σ^2 = Σ · Σ = { a, b } · { a, b } = { aa, ab, ba, bb }
  Σ^0 = { ε }
  Σ^2 · Σ = Σ^3
  Σ^i · Σ^0 = Σ^i
  Σ^2 · Σ^0 = { aa, ab, ba, bb } · { ε } = { aaε, abε, baε, bbε } = { aa, ab, ba, bb } = Σ^2
  Σ^i = { σ_1 σ_2 ... σ_i : ∀ j, 1 ≤ j ≤ i ⇒ σ_j ∈ Σ }

Σ is an alphabet and Σ^1 is the set of words of length 1 over Σ. Σ^i is the set of words of length i.

So what does Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... mean?

  Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... = ⋃_{i ≥ 0} Σ^i

which is all words of length 0 over Σ, all words of length 1, all words of length 2, etc. This is simply all words of any length, thus all possible words you can create using letters from Σ. This is a useful notion, so we use Σ⋆ as a shorthand for ⋃_{i ≥ 0} Σ^i. This is known as the Kleene star or Kleene closure.

Defining the length of a word

How do we define the length of a word? Informally, "the number of letters in it". But how do we measure that? Suppose we want an algorithm to determine the "number of letters", but we don't have a length() function. Suppose we can only do two things: remove the first letter, and check whether the word is empty.

Procedural version:

    INT length(WORD w) {
        INT i = 0;
        while (w != empty) {
            w.pop_front();  // w = w[1:]
            ++i;
        }
        return i;
    }

Recursive version:

    INT length(WORD w) {
        // base case
        if (w == empty) return 0;
        // recursive case
        w.pop_front();  // w = w[1:]
        return 1 + length(w);
    }

Suppose that w ∈ Σ⋆. If w = ε it has length zero. Otherwise w ≠ ε, so |w| ≥ 1 and thus w has at least one letter. That means that we can rewrite w as ax where a ∈ Σ and x ∈ Σ⋆. x may now be ε (what would w be if this is true?) but a is always a single letter. So we can consider |w| = |ax| = 1 + |x|. Since |x| is smaller than |w|, we can keep repeating this until we reach ε. As in:

  |abc| = 1 + |bc| = 1 + 1 + |c| = 1 + 1 + 1 + |ε| = 1 + 1 + 1 + 0 = 3

These are called recursive definitions and we use them frequently. They make it easier to prove things, and they relate to inductive proofs (base case and inductive case). Here the definition has a base case and a recursive case.

Recursive definition of the length of a word:

  |w| = 0          if w = ε
  |w| = 1 + |x|    if w = ax where a ∈ Σ and x ∈ Σ⋆

Note that for every w ∈ Σ⋆, exactly one of these two cases applies.

Counting letters

Sometimes we want to count the number of occurrences of a particular letter (e.g., a). We use n_σ : Σ⋆ → ℕ as a function to count the number of occurrences of the letter σ in the input word.

  n_a(abc) = 1
  n_b(bbc) = 2
  n_a(bbb) = 0

What is a recursive definition for n_a? Let's start with an algorithm.

    INPUT: some word w
    OUTPUT: a number n such that n_a(w) = n

    INT i = 0;
    while (w != empty) {
        if (w.front() == 'a') {  // w[0] == a
            ++i;
        }
        w.pop_front();  // w = w[1:]
    }
    return i;

The algorithm is similar to the length calculation, except that there is a condition on whether to "add one". Let's rewrite it as a recursive algorithm:

    INT n_a(WORD w) {
        if (w.empty()) return 0;
        LETTER first = w.front();
        w.pop_front();  // w = w[1:]
        if (first == 'a') return 1 + n_a(w);
        return n_a(w);
    }

Recursive definition for the number of letters. There are three cases, one base and two recursive. The base case is the empty word, which has no letters and thus no letter σ:

  n_σ(w) = 0                  if w = ε

The recursive cases have at least one letter, which may or may not be σ:

  n_σ(σx) = 1 + n_σ(x)        if w = σx where σ ∈ Σ and x ∈ Σ⋆
  n_σ(σ′x) = 0 + n_σ(x)       if w = σ′x where σ′ ∈ Σ \ { σ } and x ∈ Σ⋆

Reverse of a word

The reverse of a word is the word with its letters written backwards. We use w^r to express this. So (abc)^r = cba.

A word x is a palindrome if x = x^r, e.g., redivider, aabbaa, aba, a, ε. The empty word and any single-letter word are always palindromes.

Exercises: give a recursive definition for w^r and for the function pal(w), which returns true if the word is a palindrome (hint: there are 2 base cases and 2 recursive cases).

Substring operations

A prefix is a string that occurs at the start of a word. x is a prefix of z if ∃ y ∈ Σ⋆ such that z = xy.

How many prefixes does a have?

  ε is a prefix of a, because a = εa, meaning y = a for z = xy.
  a is a prefix of a, because a = aε, meaning y = ε for z = xy.

A strict prefix is a prefix that has at least one letter fewer than the word.

If our word is aabb then we have the prefixes: { ε, a, aa, aab, aabb }
If our word is aabb then we have the strict prefixes: { ε, a, aa, aab }
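The prefix definitions above translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original notes: it assumes words are Python strings with the empty string as ε, and the function names `prefixes`, `strict_prefixes` and `is_prefix` are made up for illustration.

```python
def prefixes(z):
    """All prefixes of z, shortest first (the empty string stands for epsilon)."""
    # z[:k] is the word made of the first k letters of z, for k = 0 .. |z|.
    return [z[:k] for k in range(len(z) + 1)]

def strict_prefixes(z):
    """Prefixes of z that are at least one letter shorter than z."""
    return [z[:k] for k in range(len(z))]

def is_prefix(x, z):
    """True if z = x·y for some word y."""
    return z.startswith(x)

print(prefixes("aabb"))         # ['', 'a', 'aa', 'aab', 'aabb']
print(strict_prefixes("aabb"))  # ['', 'a', 'aa', 'aab']
print(prefixes("a"))            # ['', 'a']
```

A word of length n has n + 1 prefixes and n strict prefixes, so under this definition ε has exactly one prefix (itself) and no strict prefixes.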

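Looking back at the recursive definitions of |w| and n_σ given earlier, each case of a definition becomes one branch of a function. The following is a minimal Python sketch under the same assumption that strings model words; `length` follows the name used in the pseudocode, and `count_letter` is a name chosen here for n_σ.

```python
def length(w):
    """|w| by the recursive definition: |ε| = 0 and |ax| = 1 + |x|."""
    if w == "":               # base case: w = ε
        return 0
    return 1 + length(w[1:])  # recursive case: w = ax, recurse on x

def count_letter(sigma, w):
    """n_sigma(w): the number of occurrences of the letter sigma in w."""
    if w == "":               # base case: w = ε
        return 0
    first, rest = w[0], w[1:]  # split w into its first letter and the rest
    if first == sigma:         # case w = σx
        return 1 + count_letter(sigma, rest)
    return 0 + count_letter(sigma, rest)  # case w = σ′x with σ′ ≠ σ

print(length("abc"))             # 3
print(count_letter("a", "abc"))  # 1
print(count_letter("b", "bbc"))  # 2
print(count_letter("a", "bbb"))  # 0
```

Exactly one branch applies to any input word, mirroring the observation that the cases of the recursive definitions never overlap.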