Discovering Hidden Repetitions

Florin Manea (a)
Joint work with Paweł Gawrychowski (b), Robert Mercaş (c), Dirk Nowotka (a)

(a) Christian-Albrechts-Universität zu Kiel
(b) Max-Planck-Institute für Informatik, Saarbrücken
(c) Otto-von-Guericke-Universität Magdeburg

Toronto, April 2013

Pseudo-repetitions

A word $w$ is
– a repetition: $w = t^n$, for some proper prefix $t$ (called root);
– a primitive word: not a repetition;
– an $f$-repetition: $w \in t\{t, f(t)\}^*$, for some proper prefix $t$ (called root);
– an $f$-primitive word: not an $f$-repetition.

Example: ACGTAC is
– primitive from the classical point of view;
– $f$-primitive for the morphism $f$ with $f(A) = T$, $f(C) = G$;
– an $f$-power for the antimorphism $f$ with $f(A) = T$, $f(C) = G$: $ACGTAC = AC \cdot f(AC) \cdot AC$.
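The ACGTAC example can be checked with a brute-force sketch. It is not the algorithm from the talk: it simply tries every proper prefix as a root. The names `apply_f`, `f_root` and `WATSON_CRICK` are illustrative, the code assumes $f$ maps letters to letters (so $|f(t)| = |t|$), and it assumes $f(G) = C$, $f(T) = A$, which the slide does not state.

```python
def apply_f(t, letter_map, anti):
    """Apply a letter-to-letter morphism (anti=False) or antimorphism (anti=True)."""
    image = [letter_map[c] for c in t]
    if anti:
        image.reverse()  # an antimorphism reverses the order of the letter images
    return "".join(image)


def f_root(w, letter_map, anti):
    """Return the shortest proper prefix t with w in t{t, f(t)}*, or None."""
    n = len(w)
    for length in range(1, n):
        if n % length:
            continue  # all blocks have length |t| because f is letter-to-letter
        t = w[:length]
        ft = apply_f(t, letter_map, anti)
        if all(w[k:k + length] in (t, ft) for k in range(length, n, length)):
            return t
    return None


# Assumption: f(G) = C and f(T) = A complete the map given on the slide.
WATSON_CRICK = {"A": "T", "C": "G", "G": "C", "T": "A"}
IDENTITY = {c: c for c in "ACGT"}

print(f_root("ACGTAC", IDENTITY, anti=False))      # classical repetition test
print(f_root("ACGTAC", WATSON_CRICK, anti=False))  # f as a morphism
print(f_root("ACGTAC", WATSON_CRICK, anti=True))   # f as an antimorphism
```

Only the antimorphic case finds a root (AC); the other two calls return `None`, matching the three bullet points of the example.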

Why Pseudo-repetitions?

Repetitions: central in combinatorics on words and applications!

[Czeizler, Kari, Seki. On a special class of primitive words. TCS, 2010.]
Originated from computational biology:
– Watson-Crick complement: an antimorphic involution;
– a single-stranded DNA and its complement encode the same information.

Generally: strings with intrinsic (yet, hidden) repetitive structure.
Such structures appear also in music: ternary song form.

[Kari, Seki. An improved bound for an extension of Fine and Wilf theorem, and its optimality. Fundam. Informat., 2010.]
[Chiniforooshan, Kari, Xu. Pseudopower avoidance. Fundam. Informat., 2012.]
[Blondin Massé, Gaboury, Hallé. Pseudoperiodic words. DLT 2012.]
[M., Müller, Nowotka. The avoidability of cubes under permutations. DLT 2012.]
[M., Mercaş, Nowotka. F & W theorem and pseudo-repetitions. MFCS 2012.]
[Gawrychowski, M., Mercaş, Nowotka, Tiseanu. Finding Pseudo-Repetitions. STACS 2013.]
[Gawrychowski, M., Nowotka. Discovering Hidden Repetitions. CiE 2013.]

Finding Pseudo-repetitions

Problem
Given $w \in V^*$ and $f$, decide whether this word is an $f$-repetition.

Problem
Given $w \in V^+$, decide whether there exist an $f : V^* \to V^*$ and a prefix $t$ of $w$ such that $w \in t\{t, f(t)\}^+$.

Problem
Given a word $w \in V^*$ and $f$,
(1) enumerate all $(i, j, \ell)$, $1 \le i, j, \ell \le |w|$, such that there exists $t$ with $w[i..j] \in \{t, f(t)\}^\ell$;
(2) given $k$, enumerate all $(i, j)$, $1 \le i, j \le |w|$, such that there exists $t$ with $w[i..j] \in \{t, f(t)\}^k$.
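To make the enumeration problem concrete, here is a brute-force sketch of variant (2). It is only a reference implementation for small inputs, not the efficient method of the talk. It assumes $f$ is letter-to-letter, so every block of a factor in $\{t, f(t)\}^k$ has the same length; the helper names and the completed Watson-Crick map are illustrative assumptions.

```python
def apply_f(t, letter_map, anti):
    """Apply a letter-to-letter morphism (anti=False) or antimorphism (anti=True)."""
    image = [letter_map[c] for c in t]
    if anti:
        image.reverse()
    return "".join(image)


def enumerate_k_pseudo_powers(w, k, letter_map, anti):
    """All 1-indexed (i, j) such that w[i..j] is in {t, f(t)}^k for some word t."""
    n = len(w)
    found = []
    for m in range(1, n // k + 1):            # m = |t| = |f(t)|
        for start in range(n - k * m + 1):
            blocks = [w[start + r * m: start + (r + 1) * m] for r in range(k)]
            # If some t works, then one of the blocks itself works as t.
            for t in set(blocks):
                ft = apply_f(t, letter_map, anti)
                if all(b in (t, ft) for b in blocks):
                    found.append((start + 1, start + k * m))
                    break
    return sorted(found)


# Assumption: Watson-Crick complement on {A, C, G, T}, used as an antimorphism.
WATSON_CRICK = {"A": "T", "C": "G", "G": "C", "T": "A"}
print(enumerate_k_pseudo_powers("ACGTAC", 3, WATSON_CRICK, anti=True))
```

For $k = 3$ the only factor reported on ACGTAC is the whole word, $(1, 6)$, which is the decomposition $AC \cdot f(AC) \cdot AC$ from the earlier example.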

Basic tools

Computational model: RAM with logarithmic word size.

A word $u$, with $|u| = n$, over an alphabet $V$ with $|V| \in O(n^c)$. Build in linear time:
– suffix array data structure for $u$;
– data structures allowing us to answer in $O(1)$ queries: "How long is the longest common prefix of $u[i..n]$ and $u[j..n]$?", denoted $LCPref_u(i, j)$.

In our case: $w$ is the input word, $f$ a fixed anti-/morphism, $u = w f(w)$, $|u| \in O(|w|)$.

Constant time: does $w[i..j]$ / $f(w[i..j])$ occur at position $s$ in $w$?
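The occurrence test reduces to one longest-common-prefix query on $u = w f(w)$. The sketch below shows only the index arithmetic: its `lcp` scans letter by letter, so a query costs linear time rather than the $O(1)$ obtained from the suffix array machinery. It assumes $f$ is letter-to-letter and uses 0-indexed, inclusive positions; all names are illustrative.

```python
def lcp(u, i, j):
    """Length of the longest common prefix of u[i:] and u[j:] (naive scan)."""
    k = 0
    while i + k < len(u) and j + k < len(u) and u[i + k] == u[j + k]:
        k += 1
    return k


def build_u(w, letter_map, anti):
    """Return u = w f(w) for a letter-to-letter morphism or antimorphism f."""
    image = [letter_map[c] for c in w]
    if anti:
        image.reverse()
    return w + "".join(image)


def occurs(u, n, i, j, s, image, anti):
    """Does w[i..j] (image=False) or f(w[i..j]) (image=True) occur at position s of w?

    n = |w|; positions are 0-indexed and the range i..j is inclusive.
    """
    length = j - i + 1
    if s + length > n:
        return False                 # the occurrence must lie inside w
    if not image:
        start = i                    # w[i..j] itself
    elif anti:
        start = n + (n - 1 - j)      # f(w[i..j]) inside f(w), antimorphic case
    else:
        start = n + i                # f(w[i..j]) inside f(w), morphic case
    return lcp(u, start, s) >= length


# Assumption: Watson-Crick complement as an antimorphism.
WATSON_CRICK = {"A": "T", "C": "G", "G": "C", "T": "A"}
w = "ACGTAC"
u = build_u(w, WATSON_CRICK, anti=True)
print(u)
print(occurs(u, len(w), 0, 1, 2, image=True, anti=True))
```

Here `u` is `ACGTACGTACGT`, and the query confirms that $f(AC) = GT$ occurs at position 2 of $w$ (0-indexed). Replacing `lcp` with a constant-time structure leaves `occurs` unchanged.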

Basic tool: Fine and Wilf

[Fine, Wilf: Uniqueness theorem for periodic functions (1965).]

Theorem
If $\alpha \in u\{u, v\}^*$ and $\beta \in v\{u, v\}^*$ have a common prefix of length at least $|u| + |v| - \gcd(|u|, |v|)$, then $u$ and $v$ are powers of a common word.
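A small numeric illustration of the theorem, as a sketch with illustrative names. The words $u = abaab$ and $v = aba$ are chosen here, not taken from the talk. The test for "powers of a common word" uses the standard fact that two words are powers of a common word exactly when they commute ($uv = vu$).

```python
from math import gcd


def common_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def fine_wilf_bound(u, v):
    """The threshold |u| + |v| - gcd(|u|, |v|) from the theorem."""
    return len(u) + len(v) - gcd(len(u), len(v))


def powers_of_common_word(u, v):
    """Two words are powers of a common word exactly when they commute."""
    return u + v == v + u


u, v = "abaab", "aba"
alpha = u + u   # a word in u{u, v}*
beta = v + u    # a word in v{u, v}*
print(common_prefix_len(alpha, beta), fine_wilf_bound(u, v), powers_of_common_word(u, v))
```

The common prefix has length 6, one short of the bound 7, and $u$, $v$ are indeed not powers of a common word, so this pair shows that the bound cannot be lowered for these lengths.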

Basic tools

Basic structure of pseudo-repetitions (used for $y = f(x)$).

Lemma (Uniqueness-1)
Let $x$, $y$ be words over $V$ that are not powers of the same word, and let $w \in \{x, y\}^*$. Then there exists a unique decomposition of $w$ in factors $x$, $y$.

Lemma (Uniqueness-2)
Let $f$ be a non-erasing anti-/morphism and $x$, $y$, $z$ words over $V$ with $f(x) = f(z) = y$ and $\{x, y\}^* x \{x, y\}^* \cap \{z, y\}^* z \{z, y\}^* \neq \emptyset$. Then $x = z$.
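Uniqueness-1 can be checked on small cases by counting decompositions with a simple dynamic program. This is an illustrative sketch with a hypothetical function name; it plays no role in the algorithms of the talk.

```python
def count_decompositions(w, x, y):
    """Number of ways to write w as a concatenation of factors x and y."""
    ways = [0] * (len(w) + 1)
    ways[0] = 1                       # the empty prefix has one (empty) decomposition
    for end in range(1, len(w) + 1):
        for factor in {x, y}:         # a set, so x == y is not counted twice
            begin = end - len(factor)
            if factor and begin >= 0 and w.startswith(factor, begin):
                ways[end] += ways[begin]
    return ways[len(w)]


print(count_decompositions("ababaaba", "ab", "aba"))  # x, y not powers of a common word
print(count_decompositions("aaa", "a", "aa"))         # x, y both powers of "a"
```

The first call returns 1 (the decomposition $ab \cdot aba \cdot aba$), while the second returns 3, showing how uniqueness fails once $x$ and $y$ are powers of the same word.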

Basic tools

How to find the unique decomposition? (Take $y$ to be the longer of $x$ and $f(x)$.)

Lemma (Shifts)
Let $x, y \in V^+$ and $w \in \{x, y\}^* \setminus \{x\}^*$, with $|x| \le |y|$ and $x$, $y$ not powers of the same word. Let $M = \max\{p \mid x^p \text{ is a prefix of } w\}$ and $N = \max\{p \mid x^p \text{ is a prefix of } y\}$. Then $M \ge N$, and:
– if $M = N$, then $w \in y\{x, y\}^*$;
– if $M > N$, then exactly one of the following holds:
  – $w \in x^{M-N} y \{x, y\}^* \setminus x^{M-N-1} y x V^*$,
  – $w \in x^{M-N-1} y \{x, y\}^+ \setminus x^{M-N} y V^*$ and $N > 0$.
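One way to read the Shifts lemma is as a rule for locating the first $y$ in the unique decomposition: it comes after either $M - N$ or $M - N - 1$ copies of $x$, and a single prefix test decides which. The sketch below applies that rule repeatedly. It uses plain string comparisons instead of constant-time LCP queries, assumes the hypotheses of the lemma hold for the input, and uses illustrative function names.

```python
def max_power_prefix(x, s):
    """Largest p such that x^p is a prefix of s (x must be non-empty)."""
    p = 0
    while s.startswith(x, p * len(x)):
        p += 1
    return p


def decompose(w, x, y):
    """Factor w in {x, y}* into a list of labels 'x' / 'y' using the Shifts lemma.

    Assumes |x| <= |y| and that x, y are non-empty and not powers of a common word.
    """
    n_pow = max_power_prefix(x, y)                 # N in the lemma
    factors, pos = [], 0
    while pos < len(w):
        rest = w[pos:]
        m_pow = max_power_prefix(x, rest)          # M in the lemma
        if m_pow * len(x) == len(rest):            # rest is in {x}*: only x's remain
            factors += ["x"] * m_pow
            break
        if m_pow < n_pow:
            raise ValueError("w is not in {x, y}*")
        if m_pow == n_pow:
            lead = 0                               # rest is in y{x, y}*
        elif rest.startswith(x * (m_pow - n_pow) + y):
            lead = m_pow - n_pow                   # first case of the lemma
        else:
            lead = m_pow - n_pow - 1               # second case of the lemma
        factors += ["x"] * lead + ["y"]
        pos += lead * len(x) + len(y)
    rebuilt = "".join(x if label == "x" else y for label in factors)
    if rebuilt != w:
        raise ValueError("w is not in {x, y}*")
    return factors


print(decompose("abababaab", "ab", "aba"))   # M = 3, N = 1, first case
print(decompose("abaababa", "aba", "abaab")) # M = 2, N = 1, second case
```

In the first call the leading block is $x^{M-N} y = x^2 y$. In the second call $x^2$ is a prefix of $w$ even though the decomposition starts with $y$, which is exactly the situation covered by the second case of the lemma.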
