Efficient Seeds Computation Revisited
Michalis Christou, Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, Tomasz Waleń
King’s College London & University of Warsaw
CPM Mondello, Palermo, June 29, 2011
Why quasiperiodicity?
Periodicity: a b a a a b a a a b a a a b a a a b
Quasiperiodicity: a b a a b a a b a a a b a a a b
Types of quasiperiodicity
Cover: a b a a b a a b a a a b a a
Every letter of the string is covered by some occurrence of the cover.
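The cover definition can be checked directly by brute force. The following is a minimal sketch, assuming Python strings; the helper name `is_cover` is hypothetical and not from the talk. It takes O(|u|·|s|) time in the worst case and only makes the definition concrete; it does not reproduce the linear-time algorithms.

```python
def is_cover(s: str, u: str) -> bool:
    """Return True if every position of u lies inside some occurrence of s in u."""
    n, m = len(u), len(s)
    if m == 0 or m > n:
        return False
    covered_to = 0  # u[:covered_to] is covered by the occurrences seen so far
    for i in range(n - m + 1):
        if i > covered_to:
            return False  # position covered_to can no longer be covered
        if u.startswith(s, i):
            covered_to = i + m
    return covered_to == n
```

On the slide's example, `is_cover("abaa", "abaabaabaaabaa")` holds, while the shorter candidate `"aba"` leaves a gap.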
Types of quasiperiodicity
Seed: a a a b a a b a a b a a a b a a b a
Every letter of the string is covered by some occurrence of the seed; occurrences may be external, that is, they may overhang either end of the string.
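The seed definition differs from the cover definition only in allowing occurrences that overhang the ends of the string. Below is a minimal brute-force sketch of that check, assuming Python strings and 1 ≤ |s| ≤ |u|; the name `is_seed` is hypothetical. It tests coverage only; some formulations additionally require s to be a factor of u, which this sketch does not enforce.

```python
def is_seed(s: str, u: str) -> bool:
    """Return True if occurrences of s, possibly overhanging either end, cover u."""
    n, m = len(u), len(s)
    if m == 0 or m > n:
        return False
    covered_to = 0  # u[:covered_to] is covered so far
    # Start offsets run from -(m-1), where only the last letter of s overlaps u,
    # up to n-1, where only the first letter of s overlaps u.
    for i in range(-(m - 1), n):
        if i > covered_to:
            return False  # a gap that no later occurrence can fill
        lo, hi = max(i, 0), min(i + m, n)
        # Compare the part of s that falls inside u with the matching slice of u.
        if u[lo:hi] == s[lo - i:hi - i]:
            covered_to = max(covered_to, hi)
    return covered_to == n
```

On the slide's example, `is_seed("abaa", "aaabaabaabaaabaaba")` holds: the first two letters are covered by an occurrence overhanging the left end, the last three by one overhanging the right end.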
Main related problems
Problem: Cover computation. Find the shortest cover (or all the covers) of a string u.
Solution: Apostolico, Farach, and Iliopoulos (1991); Moore and Smyth (1994): O(n) time algorithms.
Harder problem: Cover array. Compute C[1..n], where C[i] is (the length of) the shortest cover of the string u[1..i].
Solution: Breslauer (1992): O(n) time algorithm.
Another problem: Seed computation. Find the shortest seed (or all the seeds) of a string.
Solution: Iliopoulos, Moore & Park (1996): O(n log n) time algorithm.
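To make the cover array concrete, here is a minimal linear-time sketch based on the border (prefix-function) array. It is one commonly used online formulation and is not claimed to be identical to Breslauer's algorithm. The names `prefix_function`, `cover_array` and `reach` are illustrative assumptions. The idea: the shortest cover of u[1..i] is either the shortest cover of its longest border, if that string already covers enough of the prefix, or u[1..i] itself.

```python
def prefix_function(u: str) -> list:
    """pi[i] = length of the longest proper border of u[:i+1]."""
    pi = [0] * len(u)
    k = 0
    for i in range(1, len(u)):
        while k > 0 and u[i] != u[k]:
            k = pi[k - 1]
        if u[i] == u[k]:
            k += 1
        pi[i] = k
    return pi


def cover_array(u: str) -> list:
    """Return [C[1], ..., C[n]], where C[i] is the length of the shortest cover of u[:i]."""
    n = len(u)
    pi = prefix_function(u)
    cover = [0] * (n + 1)  # cover[i] = C[i]; index 0 is unused
    reach = [0] * (n + 1)  # reach[c] = longest prefix length whose shortest cover is u[:c]
    for i in range(1, n + 1):
        border = pi[i - 1]
        c = cover[border] if border > 0 else 0
        # u[:c] is a suffix of u[:i]; it covers u[:i] if the earlier
        # occurrences already reach the start of that final occurrence.
        if c > 0 and reach[c] >= i - c:
            cover[i] = c
            reach[c] = i
        else:
            cover[i] = i
            reach[i] = i
    return cover[1:]
```

For the slide's cover example, the last entry of `cover_array("abaabaabaaabaa")` is 4, matching the cover `abaa`.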
Main contributions
1. Left seeds. We introduce a natural intermediate notion between seeds and covers and give O(n) time algorithms for computing the shortest left seed and the left seed array.
2. Seed array. We show how to compute the seed array in O(n^2) time.
3. New (simpler) seeds computation. We present a novel approach to seed computation. Our algorithm works in o(n log n) time for some cases.
Left/right seeds
Cover: a b a a b a a b a a a b a a
Seed: a a a b a a b a a b a a a b a a b a
Left seed: a b a a b a a b a a a b a a b a
A left seed is a prefix of the string; however, its occurrence may exceed the right end of the string.
Right seed: a a b a a b a a b a a a b a a
A right seed is a suffix of the string; however, its occurrence may exceed the left end of the string.
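The two one-sided notions can be checked by brute force in the same way as covers and seeds. This is a minimal sketch, assuming Python strings; the names `is_left_seed` and `is_right_seed` are hypothetical. A right seed is handled by reversing both strings, which turns it into a left seed.

```python
def is_left_seed(s: str, u: str) -> bool:
    """True if s is a prefix of u and occurrences of s cover u,
    where an occurrence may run past the right end of u."""
    n, m = len(u), len(s)
    if m == 0 or m > n or not u.startswith(s):
        return False
    covered_to = 0  # u[:covered_to] is covered so far
    for i in range(n):
        if i > covered_to:
            return False  # a gap that no later occurrence can fill
        # Near the right end u[i:i+m] is shorter than s, so this also
        # accepts occurrences that are cut off by the end of u.
        if s.startswith(u[i:i + m]):
            covered_to = max(covered_to, min(i + m, n))
    return covered_to == n


def is_right_seed(s: str, u: str) -> bool:
    """Mirror image of is_left_seed: s is a suffix of u and occurrences
    may run past the left end of u."""
    return is_left_seed(s[::-1], u[::-1])
```

On the slide's examples, `abaa` is a left seed of `abaabaabaaabaaba` and a right seed of `aabaabaabaaabaa`.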
Left seeds computation
Problem: Left seed computation. Find the shortest left seed of a string u.
Harder problem: Left seed array. Compute LSeed[1..n], where LSeed[i] is the shortest left seed of the string u[1..i].
Solution: We present O(n) time algorithms solving both problems.
Left seeds computation
The period of a string. We say that a positive integer p is the (shortest) period of a string u = u_1 ... u_n (notation: p = per(u)) if p is the smallest positive number such that u_i = u_{i+p} for i = 1, ..., n − p.
Lemma. The length of the shortest left seed of u equals min{ C[j] : per(u) ≤ j ≤ |u| }, where C[1..n] is the cover array of u.
Corollary. The shortest left seed of a string can be computed in O(n) time.
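The lemma translates almost literally into code: compute the period, compute the cover array, and take a minimum over the suffix of the array starting at per(u). The sketch below assumes the period is obtained from the prefix function as |u| minus the longest border, and uses a border-based cover array computation that is one common formulation, not necessarily the one used by the authors. All function names are illustrative.

```python
def prefix_function(u: str) -> list:
    """pi[i] = length of the longest proper border of u[:i+1]."""
    pi = [0] * len(u)
    k = 0
    for i in range(1, len(u)):
        while k > 0 and u[i] != u[k]:
            k = pi[k - 1]
        if u[i] == u[k]:
            k += 1
        pi[i] = k
    return pi


def cover_lengths(u: str, pi: list) -> list:
    """cover[i] = length of the shortest cover of u[:i] (index 0 unused)."""
    n = len(u)
    cover = [0] * (n + 1)
    reach = [0] * (n + 1)  # reach[c] = longest prefix whose shortest cover is u[:c]
    for i in range(1, n + 1):
        c = cover[pi[i - 1]] if pi[i - 1] > 0 else 0
        if c > 0 and reach[c] >= i - c:
            cover[i], reach[c] = c, i
        else:
            cover[i], reach[i] = i, i
    return cover


def shortest_left_seed_length(u: str) -> int:
    """min{ C[j] : per(u) <= j <= |u| }, as stated in the lemma."""
    n = len(u)
    if n == 0:
        return 0
    pi = prefix_function(u)
    per = n - pi[-1]  # shortest period = length minus longest border
    cover = cover_lengths(u, pi)
    return min(cover[j] for j in range(per, n + 1))
```

For the left-seed example `abaabaabaaabaaba`, the period is 10 and the minimum is attained at j = 14, giving length 4 (the left seed `abaa`). Each step is a single linear pass, which is consistent with the O(n) bound in the corollary.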
The proof
Lemma. The length of the shortest left seed of u equals min{ C[j] : per(u) ≤ j ≤ |u| }, where C[1..n] is the cover array of u.
Proof. Assume that s is a left seed of u.
[Figure: consecutive occurrences of s laid over u, with positions 1, j and n marked.]
Then s is a cover of u[1..j] for some j. The string u has a border of length ≥ n − j, hence per(u) ≤ j. (Recall that per(u) + border(u) = |u|.)
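The step from the border to the period can be spelled out as follows. This is a sketch under stated assumptions: border(u) denotes the length of the longest border of u, j < n, and k ≤ j + 1 is the start of an occurrence of s that runs past the right end of u (for j = n the bound per(u) ≤ n is immediate).

```latex
\begin{align*}
u[k..n] &= s[1..n-k+1] = u[1..n-k+1]
  && \text{($s$ is a prefix of $u$)} \\
\operatorname{border}(u) &\ge n-k+1 \ge n-j
  && \text{($u[k..n]$ is both a prefix and a suffix of $u$)} \\
\operatorname{per}(u) &= |u| - \operatorname{border}(u) \le n-(n-j) = j .
\end{align*}
```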
The proof
Proof (cont.). We have proved that the shortest left seed of u corresponds to one of the covers C[j] for j ≥ per(u). We need to show that each value C[j] for j ≥ per(u) corresponds to some left seed of u.
Assume that s is a cover of v = u[1..j] for some j ≥ per(u).
[Figure: v = u[1..j] with its cover s, repeated occurrences of v along u, and positions 1, per(u), j and n marked.]
Then v is a left seed of u. Hence, s is also a left seed of u.
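The claim that v = u[1..j] is a left seed whenever j ≥ per(u) can be illustrated by listing the occurrences that do the covering: v occurs (possibly cut off at the right end) at every multiple of the period, and consecutive starts are per(u) ≤ j apart, so no gap arises. The sketch below is only an illustration with hypothetical names; the period is computed naively from its definition, using 0-based positions.

```python
def period(u: str) -> int:
    """Smallest p >= 1 with u[i] == u[i + p] for all valid i (0 for the empty string)."""
    n = len(u)
    for p in range(1, n + 1):
        if all(u[i] == u[i + p] for i in range(n - p)):
            return p
    return 0


def periodic_occurrences(u: str, j: int) -> list:
    """Start positions (0-based) at which v = u[:j] occurs, possibly truncated
    by the right end of u, spaced exactly per(u) apart."""
    p = period(u)
    if not (1 <= p <= j <= len(u)):
        raise ValueError("need per(u) <= j <= |u| and a non-empty string")
    v = u[:j]
    starts = list(range(0, len(u), p))
    for i in starts:
        # By periodicity, the part of u starting at i agrees with a prefix of v.
        assert v.startswith(u[i:i + j])
    return starts
```

For u = `abaabaabaaabaaba` (period 10) and j = 14, the occurrences start at positions 0 and 10; the second one is cut off by the end of u. Since s covers v, the same occurrences of s inside each copy of v cover u, with the last one possibly truncated.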
Left seeds computation
Lemma. The length of the shortest left seed of u equals min{ C[j] : per(u) ≤ j ≤ |u| }, where C[1..n] is the cover array of u.