The Expected Number of Repetitions in Random Words. Arseny M. Shur


  1. The Expected Number of Repetitions in Random Words Arseny M. Shur Ural Federal University, Ekaterinburg, Russia Shanghai, April 25, 2015 A. M. Shur (Ural Federal U) Expected Number of Repetitions Shanghai, April 25, 2015 1 / 19

  3. Combinatorics on Words
     - A discipline that studies properties of sequences of symbols.
     - Born: 1906. A. Thue. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 7, 1–22 (1906)
     - Named: 1983. M. Lothaire. Combinatorics on Words. Vol. 17 of Encyclopedia of Mathematics and Its Applications (1983)
     - Sources:
       - Algebra (terms)
       - Symbolic dynamics (trajectories)
       - Grammars and rewriting systems
       - Algorithms for sequential data
       - Biological strings
       - . . .

  6. Palindromes and Squares
     - A palindrome is a word that is equal to its reversal, like abopoba or ihhi.
     - Palindromes are among the simplest and most common repetitions in words, along with squares, which are words consisting of two equal parts, like couscous.
     - Palindromes are in some sense counterparts of squares:
       - in a sequence of states of some finite-state machine, a square indicates repeated behaviour, while a palindrome shows that the machine reversed its behaviour, back to front;
       - among the basic data structures, palindromes correspond to stacks, while squares correspond to queues;
       - as a consequence, the language of all palindromes is context-free, while the language of all squares is not.
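The two definitions above can be checked directly. The following is a minimal Python sketch; the function names `is_palindrome` and `is_square` are illustrative and do not come from the talk.

```python
def is_palindrome(word: str) -> bool:
    """A word is a palindrome if it equals its reversal."""
    return word == word[::-1]


def is_square(word: str) -> bool:
    """A word is a square if it is nonempty and consists of two equal halves."""
    half, remainder = divmod(len(word), 2)
    return len(word) > 0 and remainder == 0 and word[:half] == word[half:]


# The example words from the slide.
print(is_palindrome("abopoba"), is_palindrome("ihhi"))
print(is_square("couscous"), is_square("abopoba"))
```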

  8. Counting Factors
     - We consider finite words over finite (k-letter) alphabets.
     - We write w = w[1..n] for a word of length n; words of the form w[i..j] are factors of w.
     - Normally, n is assumed to be large, and k is fixed.
     - There are many results on the possible number of distinct palindromic factors and square factors in a word:
       - the max number of palindromes is n (Droubay, Pirillo, 2001);
       - the max number of squares is between n − O(√n) and 2n − O(log n) (Ilie, 2007);
       - the min number of palindromes is k for k ≥ 3, and 8 for k = 2, n ≥ 9;
       - the min number of squares is 0 for k ≥ 3 (Thue, 1912), and 3 for k = 2 (Fraenkel, Simpson, 1995);
       - any number of palindromes between min and max is available;
       - for k ≥ 4, a word can contain k palindromes and 0 squares simultaneously.
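For short words, the quantities in these results can be computed by brute force. The sketch below enumerates all nonempty factors of a word and counts the distinct palindromes and squares among them. The function names are hypothetical, and the approach is only practical for small n because it builds every factor explicitly.

```python
def distinct_factors(word: str) -> set[str]:
    """All distinct nonempty factors w[i..j] of the word."""
    n = len(word)
    return {word[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def count_palindromes(word: str) -> int:
    """Number of distinct nonempty palindromic factors."""
    return sum(1 for f in distinct_factors(word) if f == f[::-1])


def count_squares(word: str) -> int:
    """Number of distinct factors of the form uu with u nonempty."""
    return sum(
        1
        for f in distinct_factors(word)
        if len(f) % 2 == 0 and f[: len(f) // 2] == f[len(f) // 2 :]
    )


word = "abaab"
# Palindromic factors: a, b, aa, aba, baab; the only square factor is aa.
print(count_palindromes(word), count_squares(word))  # 5 1
```

Here the word of length 5 has exactly 5 distinct palindromic factors, so it attains the maximum n quoted above.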

  11. Problems and Simple Answers
      - Problems: find the expected number of
        - distinct palindromes
        - distinct squares

        occurring as factors in a random k-ary word.
      - Theorem. The expected number of distinct palindromic factors in a random word of length n over a fixed nontrivial alphabet is Θ(√n).
      - As a by-product of the technique used, we also get:
      - Theorem. The expected number of distinct square factors in a random word of length n over a fixed nontrivial alphabet is Θ(√n).
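The first theorem can be probed empirically. The sketch below is a rough Monte Carlo experiment, not part of the talk: it assumes that letters are drawn independently and uniformly, uses a brute-force counter, and prints the sample mean together with its ratio to √n. The names `estimate_expectation`, `trials` and `seed` are illustrative, and small sample sizes only hint at the asymptotic behaviour.

```python
import math
import random


def count_distinct_palindromes(word: str) -> int:
    """Brute-force count of distinct nonempty palindromic factors."""
    n = len(word)
    found = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            factor = word[i:j]
            if factor == factor[::-1]:
                found.add(factor)
    return len(found)


def estimate_expectation(n: int, k: int, trials: int, seed: int = 0) -> float:
    """Average palindrome count over `trials` uniform random k-ary words (k <= 26)."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"[:k]
    total = 0
    for _ in range(trials):
        word = "".join(rng.choice(alphabet) for _ in range(n))
        total += count_distinct_palindromes(word)
    return total / trials


for n in (50, 100, 200):
    estimate = estimate_expectation(n, k=2, trials=30)
    print(n, round(estimate, 1), round(estimate / math.sqrt(n), 2))
```

If the Θ(√n) statement holds with moderate constants, the last column should stay within a bounded range as n grows.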

  13. Some Explanations
      - Let k (the alphabet size) be fixed; E(n) is the expectation studied.
      - The expected number E_m(n) of distinct palindromic factors of length m in a random word of length n is not greater than
        - ⋆ the total number of k-ary palindromes of length m (blue in the figure);
        - ⋆ the expected number of occurrences of palindromic factors of length m in a random word of length n (red in the figure).
      - [Figure: two panels plotted against m. Panel "Length 2m": curves k^m and (n − 2m + 1)/k^m, with a point labelled p_e. Panel "Length 2m + 1": curves k^(m+1) and (n − 2m)/k^m, with a point labelled p_o.]
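The two bounds can be written out explicitly. This is a sketch under the assumption (standard, but not stated on the slide) that the letters of the random word are independent and uniformly distributed. A k-ary palindrome is determined by its first half, including the middle letter when the length is odd, so there are k^m palindromes of length 2m and k^(m+1) of length 2m + 1. A fixed window is a palindrome with probability k^(−m) in both cases, and there are n − 2m + 1 windows of length 2m and n − 2m windows of length 2m + 1:

```latex
\begin{align*}
E_{2m}(n)   &\le \min\Bigl\{\, k^{m},\ (n-2m+1)\cdot\frac{k^{m}}{k^{2m}} \Bigr\}
             = \min\Bigl\{\, k^{m},\ \frac{n-2m+1}{k^{m}} \Bigr\}, \\
E_{2m+1}(n) &\le \min\Bigl\{\, k^{m+1},\ (n-2m)\cdot\frac{k^{m+1}}{k^{2m+1}} \Bigr\}
             = \min\Bigl\{\, k^{m+1},\ \frac{n-2m}{k^{m}} \Bigr\}.
\end{align*}
```

These expressions agree with the curve labels that survive from the figure.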

  14. Some Explanations (Ctd)
      - [Figure: two panels plotted against m. Panel "Length 2m": curves k^m and (n − 2m + 1)/k^m, with a point labelled p_e. Panel "Length 2m + 1": curves k^(m+1) and (n − 2m)/k^m, with a point labelled p_o.]
      - E(n) = Σ_m E_m(n) is bounded by the total area under the graphs;
      - since all graphs are those of exponential functions, the area under each pair of graphs equals the height of the highest point up to a constant multiple;
      - thus, E(n) = O(√n);
      - some additional considerations show that the upper bound is sharp up to a constant multiple, implying E(n) = Θ(√n).
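The "area under the graphs" argument can be illustrated numerically. The sketch below sums, over all lengths, the smaller of the two bounds for each length and compares the result with √n. The formulas for the bounds are reconstructed from the figure labels and assume uniform independent letters; the function name `palindrome_upper_bound` is hypothetical.

```python
import math


def palindrome_upper_bound(n: int, k: int) -> float:
    """Sum over all lengths of min(#palindromes, expected #occurrences)."""
    total = 0.0
    m, power = 0, 1  # invariant: power == k**m
    while 2 * m <= n:
        if m >= 1:
            # Even length 2m: k**m palindromes, (n - 2m + 1) / k**m expected occurrences.
            total += min(power, (n - 2 * m + 1) / power)
        if 2 * m + 1 <= n:
            # Odd length 2m + 1: k**(m+1) palindromes, (n - 2m) / k**m expected occurrences.
            total += min(power * k, (n - 2 * m) / power)
        if power > n * 10**12:
            # All later terms are below 1e-12 and decay geometrically, so stop early.
            break
        m += 1
        power *= k
    return total


for n in (100, 10_000, 1_000_000):
    bound = palindrome_upper_bound(n, k=2)
    print(n, round(bound, 1), round(bound / math.sqrt(n), 2))
```

The ratio in the last column stays bounded as n grows by factors of 100, which is what the geometric-sum argument predicts for the upper bound.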

  20. Dependence on k
      - Refinement of the obtained result: consider E(n, k) instead of E(n) and find how the constant in the Θ(√n) expression depends on k.
      - intuition: the more letters, the more luck is needed to get a palindrome;
      - broken by the picture: the peak on the right graph is ≈ √(kn).
      - [Figure: two panels plotted against m. Panel "Length 2m": curves k^m and (n − 2m + 1)/k^m, with a point labelled p_e. Panel "Length 2m + 1": curves k^(m+1) and (n − 2m)/k^m, with a point labelled p_o.]
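The value √(kn) can be recovered by equating the two curves in each panel. This is a rough sketch that treats m as a real variable and drops the lower-order −2m terms, both of which are simplifying assumptions:

```latex
\begin{align*}
\text{length } 2m:\quad   & k^{m} = \frac{n}{k^{m}}
  \;\Longrightarrow\; k^{m} = \sqrt{n}, \\
\text{length } 2m+1:\quad & k^{m+1} = \frac{n}{k^{m}}
  \;\Longrightarrow\; k^{2m+1} = n
  \;\Longrightarrow\; k^{m+1} = \sqrt{kn}.
\end{align*}
```

Under these assumptions the odd-length peak grows with k, which is why the picture contradicts the intuition that a larger alphabet should yield fewer palindromes.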
