Efficient Data Structures for the Factor Periodicity Problem
Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń
University of Warsaw, Poland
SPIRE 2012, Cartagena, October 23, 2012


Factor Periodicity Problem

w = abaababaabaababaababaabaababaabaab

The factor w[11..22] = aababaababaa.

Periods of w[11..22] are 5, 10, 11 and 12. (An integer p with 1 ≤ p ≤ |v| is a period of v if v[i] = v[i + p] for all 1 ≤ i ≤ |v| − p.)

Notation: Per(w[11..22]) = {5, 10, 11, 12} is the set of all periods, and per(w[11..22]) = 5 is the smallest period.
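To make the definitions concrete, here is a brute-force Python check of the example. It is a sketch: the names `periods` and `factor` are illustrative, not from the slides. It tests every candidate p directly against the definition of a period, in O(m^2) time for a factor of length m, and translates the slides' 1-based, inclusive w[i..j] into Python slicing.

```python
def periods(v: str) -> list[int]:
    """All periods p (1 <= p <= |v|) of v, checked straight from the definition.

    p is a period when v[t] == v[t + p] for every valid t; the length |v|
    itself always qualifies. O(|v|^2) time, intended only as a reference.
    """
    m = len(v)
    return [p for p in range(1, m + 1)
            if all(v[t] == v[t + p] for t in range(m - p))]


def factor(w: str, i: int, j: int) -> str:
    """The factor w[i..j], using 1-based inclusive indexing as on the slides."""
    return w[i - 1:j]


w = "abaababaabaababaababaabaababaabaab"
v = factor(w, 11, 22)
print(v)                 # aababaababaa
print(periods(v))        # [5, 10, 11, 12]
print(min(periods(v)))   # 5, i.e. per(v)
```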

Arithmetic sets

A word of length m might have Θ(m) periods, e.g. a^m, whose periods are 1, 2, ..., m.

Definition. A set A = {a, a + d, a + 2d, ..., a + kd} ⊆ ℤ is called arithmetic. The integer d is called the difference of A. Observe that an arithmetic set can be represented by three integers: a, d and k.

Fact. Let v be a word of length m. Then Per(v) is a union of at most log m disjoint arithmetic sets.

For example, Per(w[11..22]) = {5} ∪ {10, 11, 12} = {5, 10} ∪ {11, 12}.
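The encoding of a set of integers by triples (a, d, k) can be sketched in Python as follows. The greedy left-to-right scan and the names `to_arithmetic_sets` and `expand` are assumptions of this sketch: the scan produces some disjoint arithmetic decomposition of a sorted set, but it is not the construction behind the log m bound in the Fact. By convention here, a singleton {a} is encoded as (a, 0, 0).

```python
def to_arithmetic_sets(values: list[int]) -> list[tuple[int, int, int]]:
    """Greedily split a sorted list of distinct integers into disjoint
    arithmetic sets, each encoded as (a, d, k) = {a, a + d, ..., a + k*d}.

    Illustrative only: this greedy scan is not the construction behind
    the log m bound.
    """
    sets = []
    i, n = 0, len(values)
    while i < n:
        a = values[i]
        if i + 1 == n:              # a lone last element: {a}, with d = 0
            sets.append((a, 0, 0))
            break
        d = values[i + 1] - a
        k = 1
        # Extend while the next element keeps the same difference d.
        while i + k + 1 < n and values[i + k + 1] - values[i + k] == d:
            k += 1
        sets.append((a, d, k))
        i += k + 1
    return sets


def expand(a: int, d: int, k: int) -> list[int]:
    """Inverse of the encoding: list the elements a, a + d, ..., a + k*d."""
    return [a + t * d for t in range(k + 1)]


per_v = [5, 10, 11, 12]            # Per(w[11..22]) from the slides
print(to_arithmetic_sets(per_v))   # [(5, 5, 1), (11, 1, 1)]
```

On the example it returns the second decomposition above, {5, 10} ∪ {11, 12}; for the periods of a^m it returns the single triple (1, 1, m − 1).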

Formal problem statement

Problem (Period Queries). Design a data structure that, for a fixed word w of length n, answers the following queries: given integers i, j (1 ≤ i ≤ j ≤ n), compute Per(w[i..j]) represented as a union of O(log |w|) arithmetic sets.

Definition. We say that a period p of v is a (1 + δ)-period of v if |v| ≥ (1 + δ)p.

Problem ((1 + δ)-Period Queries). Let us fix a real number δ > 0. Design a data structure that, for a fixed word w of length n, answers the following queries: given integers i, j (1 ≤ i ≤ j ≤ n), compute all (1 + δ)-periods of w[i..j] represented as a union of O(1) arithmetic sets.
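As a naive baseline for the second problem, the (1 + δ)-periods of a factor can be filtered directly from the definition. This is a sketch with a hypothetical function name; it returns a plain sorted list rather than O(1) arithmetic sets, and accepts δ as an int or a `fractions.Fraction` so that the comparison |v| ≥ (1 + δ)p is exact.

```python
from fractions import Fraction


def one_plus_delta_periods(w: str, i: int, j: int, delta) -> list[int]:
    """All (1 + delta)-periods of w[i..j] (1-based, inclusive), by brute force.

    p qualifies when |v| >= (1 + delta) * p and p is a period of v = w[i..j].
    """
    v = w[i - 1:j]
    m = len(v)
    return [p for p in range(1, m + 1)
            if m >= (1 + delta) * p                          # length condition
            and all(v[t] == v[t + p] for t in range(m - p))]  # p is a period


w = "abaababaabaababaababaabaababaabaab"
print(one_plus_delta_periods(w, 11, 22, 1))               # [5]
print(one_plus_delta_periods(w, 11, 22, Fraction(1, 5)))  # [5, 10]
```

With δ = 1 only periods p ≤ |v|/2 survive, which is the setting of primitivity testing.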

Related work

To the best of our knowledge, there has been no previous research on the general case of Period Queries.

Even for computing the maximal period, only straightforward solutions are known:
- memorize all answers: O(n^2) space, O(1) query time;
- compute the answer from scratch for each query: no extra space, O(n) query time.

Efficient data structures are known for primitivity testing (generalized by (1 + δ)-Period Queries with δ = 1):
- Karhumäki, Lifshits & Rytter, CPM 2007: O(n log n) space, O(1) query time;
- Crochemore et al., SPIRE 2010: O(n log^ε n) space, O(log n) query time.
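The "compute from scratch" baseline can be realized with the classical Knuth–Morris–Pratt failure function, which yields all borders of v = w[i..j] in O(|v|) time; each border of length b gives the period |v| − b. This is a sketch under our own naming (`failure_function`, `periods_query`). It keeps no data structure between queries, although each query uses O(|v|) working memory.

```python
def failure_function(v: str) -> list[int]:
    """KMP failure function: fail[t] is the length of the longest proper
    border of the prefix v[:t + 1]."""
    fail = [0] * len(v)
    b = 0
    for t in range(1, len(v)):
        while b > 0 and v[t] != v[b]:
            b = fail[b - 1]          # fall back to a shorter border
        if v[t] == v[b]:
            b += 1
        fail[t] = b
    return fail


def periods_query(w: str, i: int, j: int) -> list[int]:
    """Per(w[i..j]) in increasing order, recomputed from scratch.

    Assumes 1 <= i <= j <= len(w). The chain fail[m-1], fail[fail[m-1]-1], ...
    visits every nonempty proper border of v once, longest first, so the
    periods m - b come out in increasing order; the empty border adds m itself.
    """
    v = w[i - 1:j]
    m = len(v)
    fail = failure_function(v)
    result = []
    b = fail[m - 1]
    while b > 0:
        result.append(m - b)
        b = fail[b - 1]
    result.append(m)                 # empty border -> trivial period |v|
    return result


w = "abaababaabaababaababaabaababaabaab"
print(periods_query(w, 11, 22))      # [5, 10, 11, 12]
```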

Our results

Several results, based on a common idea but using different tools:

Space          | Query time, all periods | Query time, (1 + δ)-periods
O(n)           | O(log^(1+ε) n)          | O(log^ε n)
O(n log log n) | O(log n (log log n)^2)  | O((log log n)^2)
O(n log^ε n)   | O(log n log log n)      | O(log log n)
O(n log n)     | O(log n)                | O(1)

Standard assumptions on the model of computation:
- integer alphabet, i.e. Σ ⊆ {0, 1, ..., n^O(1)};
- word RAM model with machine words of Ω(log n) bits;
- randomization.

Our approach

Let Borders(v) = {|u| : u is a border of v}, where a border of v is a word that is both a proper prefix and a suffix of v (the empty word included).

Fact. Per(v) = |v| ⊖ Borders(v) = {|v| − b : b ∈ Borders(v)}.

The empty border (b = 0) yields the trivial period |v|; for example, Borders(w[11..22]) = {0, 1, 2, 7} gives Per(w[11..22]) = {12, 11, 10, 5}.

We compute Borders(v) ∩ {2^k, ..., 2^(k+1) − 1} separately for each k ∈ {0, ..., ⌈log |v|⌉}.

[Figure: word v with a prefix and a suffix; segments labelled 2^k and 2^k − 1; a border.]
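The per-level split can be illustrated as follows. In this sketch each level {2^k, ..., 2^(k+1) − 1} is searched by brute force, which is only a stand-in: it shows how Borders(v) splits into ⌈log |v|⌉ + 1 levels and how Per(v) is recovered from them, not how a single level is computed efficiently. The helper names are hypothetical; ⌈log2 m⌉ is computed as `(m - 1).bit_length()` to avoid floating point.

```python
def is_border(v: str, b: int) -> bool:
    """True if the prefix and the suffix of v of length b coincide."""
    return v[:b] == v[len(v) - b:]


def borders_in_level(v: str, k: int) -> list[int]:
    """Borders(v) ∩ {2^k, ..., 2^(k+1) - 1}, restricted to proper borders b < |v|.

    Brute force over the level: a stand-in for an efficient per-level step.
    """
    lo, hi = 2 ** k, min(2 ** (k + 1) - 1, len(v) - 1)
    return [b for b in range(lo, hi + 1) if is_border(v, b)]


def level_decomposition(v: str):
    """Return ({k: borders in level k}, sorted Per(v)) for a nonempty word v."""
    m = len(v)
    top = (m - 1).bit_length()                  # equals ceil(log2 m) for m >= 1
    levels = {k: borders_in_level(v, k) for k in range(top + 1)}
    # The empty border (length 0) is not in any level; add it explicitly.
    all_borders = [0] + [b for k in range(top + 1) for b in levels[k]]
    periods = sorted(m - b for b in all_borders)  # Per(v) = |v| ⊖ Borders(v)
    return levels, periods


levels, per = level_decomposition("aababaababaa")   # w[11..22]
print(levels)   # {0: [1], 1: [2], 2: [7], 3: [], 4: []}
print(per)      # [5, 10, 11, 12]
```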

Close occurrences

Let Occ(u, v) be the set of positions in v where an occurrence of u starts. Arithmetic sets arise naturally as Occ sets.

Fact. Let |v| ≤ 2|u|. Then Occ(u, v) is arithmetic. Moreover, if |Occ(u, v)| ≥ 3, then its difference is equal to per(u).

The case |Occ(u, v)| ≤ 2 is trivial, so assume |Occ(u, v)| ≥ 3.

[Figure: occurrences of u in v; periods of u.]
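The Fact can be sanity-checked by brute force. The sketch below (hypothetical helper names) computes Occ(u, v) with 1-based positions and then exhaustively tests the Fact over all binary words u with |u| ≤ 4 and all v with |u| ≤ |v| ≤ 2|u|. Such a check is evidence on small cases, not a proof.

```python
from itertools import product


def occ(u: str, v: str) -> list[int]:
    """Occ(u, v): 1-based starting positions of the occurrences of u in v."""
    return [s + 1 for s in range(len(v) - len(u) + 1) if v[s:s + len(u)] == u]


def smallest_period(u: str) -> int:
    """per(u), by checking candidate periods in increasing order."""
    m = len(u)
    return next(p for p in range(1, m + 1)
                if all(u[t] == u[t + p] for t in range(m - p)))


def is_arithmetic(xs: list[int]) -> bool:
    """Sorted xs forms an arithmetic set; sets of size <= 2 trivially do."""
    if len(xs) <= 2:
        return True
    d = xs[1] - xs[0]
    return all(xs[t + 1] - xs[t] == d for t in range(len(xs) - 1))


def check_fact(max_len: int = 4, alphabet: str = "ab") -> bool:
    """Exhaustively test the Fact for all |u| <= max_len and |u| <= |v| <= 2|u|."""
    for m in range(1, max_len + 1):
        for u in map("".join, product(alphabet, repeat=m)):
            p = smallest_period(u)
            for length in range(m, 2 * m + 1):
                for v in map("".join, product(alphabet, repeat=length)):
                    pos = occ(u, v)
                    assert is_arithmetic(pos), (u, v, pos)
                    if len(pos) >= 3:
                        assert pos[1] - pos[0] == p, (u, v, pos)
    return True


print(occ("abab", "abababab"))   # [1, 3, 5]; per("abab") = 2
print(check_fact())              # True
```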
