Tighter Bounds for the Sum of Irreducible LCP Values
Juha Kärkkäinen (1), Dominik Kempa (1), Marcin Piątkowski (2)
(1) University of Helsinki
(2) Nicolaus Copernicus University

Outline
1. Cyclic words
2. Irreducible LCP values
3. Upper bound for the sum of irreducible values
4. Lower bound for the sum of irreducible values

Cyclic suffixes
$W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$
[Figure: a cyclic word with letters $a_0, a_1, a_2, a_3, \ldots, a_n$]
suf(W):
⟨1,0⟩  aabaab···
⟨1,1⟩  abaaba···
⟨1,2⟩  baabaa···
⟨2,0⟩  aabaab···
⟨2,1⟩  abaaba···
⟨2,2⟩  baabaa···
⟨3,0⟩  ababab···
⟨3,1⟩  bababa···

Cyclic suffix array
$W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$
SA_W    suf(W)
⟨1,0⟩   aabaab···
⟨2,0⟩   aabaab···
⟨1,1⟩   abaaba···
⟨2,1⟩   abaaba···
⟨3,0⟩   ababab···
⟨1,2⟩   baabaa···
⟨2,2⟩   baabaa···
⟨3,1⟩   bababa···

Longest common prefix and distinguishing prefix arrays
$W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$
SA_W    suf(W)       LCP_W   DP_W
⟨1,0⟩   aabaab···    −       −
⟨2,0⟩   aabaab···    ∞       ∞
⟨1,1⟩   abaaba···    1       2
⟨2,1⟩   abaaba···    ∞       ∞
⟨3,0⟩   ababab···    3       4
⟨1,2⟩   baabaa···    0       1
⟨2,2⟩   baabaa···    ∞       ∞
⟨3,1⟩   bababa···    2       3

Burrows-Wheeler transform
$W = \{\{v_1, v_2, v_3\}\} = \{\{aab, aab, ab\}\}$
SA_W    suf(W)       BWT(W)   LCP_W   DP_W
⟨1,0⟩   aabaab···    b        −       −
⟨2,0⟩   aabaab···    b        ∞       ∞
⟨1,1⟩   abaaba···    a        1       2
⟨2,1⟩   abaaba···    a        ∞       ∞
⟨3,0⟩   ababab···    b        3       4
⟨1,2⟩   baabaa···    a        0       1
⟨2,2⟩   baabaa···    a        ∞       ∞
⟨3,1⟩   bababa···    a        2       3

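The table above can be reproduced with a short script. This is a minimal sketch, not the authors' algorithm: the function name `cyclic_arrays` is hypothetical, equal suffixes are ordered by word index (an assumption that happens to match the slide), and DP is taken to be LCP + 1, as every finite row of the table suggests.

```python
from math import inf

def cyclic_arrays(words):
    """Return (SA, BWT, LCP, DP) for the cyclic suffixes of a multiset of words."""
    # Two infinite periodic words with periods p and q are equal as soon as
    # they agree on their first p + q letters (Fine and Wilf), so prefixes of
    # length 2 * (longest word) are enough to compare any two cyclic suffixes.
    m = 2 * max(len(w) for w in words)
    rows = []
    for i, w in enumerate(words, start=1):
        for j in range(len(w)):
            prefix = "".join(w[(j + k) % len(w)] for k in range(m))
            # w[j - 1] is the letter cyclically preceding position j.
            rows.append((prefix, i, j, w[j - 1]))
    rows.sort()  # ties between equal suffixes fall back to (i, j): an assumption

    sa = [(i, j) for _, i, j, _ in rows]
    bwt = "".join(row[3] for row in rows)
    lcp, dp = [None], [None]  # the first entry is undefined
    for a, b in zip(rows, rows[1:]):
        p, q = a[0], b[0]
        if p == q:  # identical cyclic suffixes
            lcp.append(inf)
            dp.append(inf)
        else:
            k = 0
            while p[k] == q[k]:
                k += 1
            lcp.append(k)
            dp.append(k + 1)  # assumption: DP = LCP + 1, as in the table
    return sa, bwt, lcp, dp

sa, bwt, lcp, dp = cyclic_arrays(["aab", "aab", "ab"])
print(bwt)  # bbaabaaa
```

The truncation length is a convenience for small examples; it makes the sort quadratic in the worst case and is not meant as an efficient construction.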
Cyclic equivalence
Two multisets of words V and W are cyclically equivalent if suf(V) = suf(W).
Example
Multiset: $W = \{\{aab, aab, ab\}\}$
Equivalence class:
{{aab, aab, ab}}    {{aab, aab, ba}}
{{aba, aab, ab}}    {{aba, aab, ba}}
{{baa, aab, ab}}    {{baa, aab, ba}}
{{aab, aba, ab}}    {{aab, aba, ba}}
...
{{aabaab, ab}}      {{aabaab, ba}}
{{abaaba, ab}}      {{abaaba, ba}}
{{baabaa, ab}}      {{baabaa, ba}}
Common suffixes:
aabaabaabaab···
abaabaabaaba···
baabaabaabaa···
aabaabaabaab···
abaabaabaaba···
baabaabaabaa···
abababababab···
babababababa···

Cyclic equivalence
Lemma
Let $W = \{\{w_i\}\}_{i=1}^{s}$ be a multiset of cyclic words. Then:
1. There exists a set of cyclic words $V = \{v_i\}_{i=1}^{t}$ such that suf(W) = suf(V).
2. There exists a multiset of primitive cyclic words $U = \{\{u_i\}\}_{i=1}^{p}$ such that suf(W) = suf(U).
Remark
If two multisets of words V and W are cyclically equivalent, then $\mathrm{LCP}_V = \mathrm{LCP}_W$, $\mathrm{DP}_V = \mathrm{DP}_W$ and $\mathrm{BWT}_V = \mathrm{BWT}_W$.
Theorem (Mantaci, Restivo, Rosone, Sciortino, 2007)
The mapping from a word v to the cyclic equivalence class of IBWT(v) is a bijection.

Irreducible LCP and DP values
Irreducible values
A value $\mathrm{LCP}_W[i]$ (respectively $\mathrm{DP}_W[i]$) is irreducible if $\mathrm{BWT}_W[i] \ne \mathrm{BWT}_W[i-1]$.
SA_W    suf(W)       BWT(W)   LCP_W   DP_W
⟨1,0⟩   aabaab···    b        −       −
⟨2,0⟩   aabaab···    b        ∞       ∞
⟨1,1⟩   abaaba···    a        1       2
⟨2,1⟩   abaaba···    a        ∞       ∞
⟨3,0⟩   ababab···    b        3       4
⟨1,2⟩   baabaa···    a        0       1
⟨2,2⟩   baabaa···    a        ∞       ∞
⟨3,1⟩   bababa···    a        2       3

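To make the definition concrete, the following sketch picks out the irreducible entries of the table and sums them. The arrays are copied from the slide; the variable names and the 0-based indexing are choices made here for illustration.

```python
from math import inf

# Arrays copied from the table for W = {{aab, aab, ab}}, indexed from 0.
# None marks the undefined first entry; inf marks two equal cyclic suffixes.
BWT = "bbaabaaa"
LCP = [None, inf, 1, inf, 3, 0, inf, 2]
DP = [None, inf, 2, inf, 4, 1, inf, 3]

def irreducible_positions(bwt):
    """Positions i >= 1 whose BWT letter differs from the previous one."""
    return [i for i in range(1, len(bwt)) if bwt[i] != bwt[i - 1]]

pos = irreducible_positions(BWT)
runs = len(pos) + 1                   # each letter change starts a new run
sum_ilcp = sum(LCP[i] for i in pos)   # 1 + 3 + 0
sum_idp = sum(DP[i] for i in pos)     # 2 + 4 + 1
print(pos, runs, sum_ilcp, sum_idp)   # [2, 4, 5] 4 4 7
```

Note that every ∞ entry sits between two equal BWT letters in this example, so none of them is irreducible and both sums are finite.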
Sum of irreducible LCP values
Irreducible values
$W = \{\{w_i\}\}_{i=1}^{s}$ and $\sum_{i=1}^{s} |w_i| = n$
$\Sigma\mathrm{ilcp}(W)$: sum of irreducible LCP values
$\Sigma\mathrm{idp}(W)$: sum of irreducible distinguishing prefix lengths
Theorem (Kärkkäinen, Manzini, Puglisi, 2009)
$\Sigma\mathrm{ilcp}(W) = O(n \log n)$

New upper bounds for the sum of irreducible LCP values
Theorem
For any multiset W of words of total length n > 0, we have
$\Sigma\mathrm{ilcp}(W) \le \Sigma\mathrm{idp}(W) \le n\lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1$
Theorem
For any multiset W of words of total length n > 0 such that BWT(W) has r runs, we have
$\Sigma\mathrm{ilcp}(W) + r - 1 = \Sigma\mathrm{idp}(W) \le n\lceil \lg r \rceil - 2^{\lceil \lg r \rceil} + 1$

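As a sanity check, both bounds can be evaluated on the running example $W = \{\{aab, aab, ab\}\}$. The worked instance below assumes $n = 8$, $r = 4$ (the BWT is bbaabaaa) and reads the irreducible entries off the earlier table as LCP values 1, 3, 0 and DP values 2, 4, 1.

```latex
\begin{align*}
n\lceil \lg n\rceil - 2^{\lceil \lg n\rceil} + 1 &= 8\cdot 3 - 2^{3} + 1 = 17,\\
n\lceil \lg r\rceil - 2^{\lceil \lg r\rceil} + 1 &= 8\cdot 2 - 2^{2} + 1 = 13,\\
\Sigma\mathrm{ilcp}(W) &= 1 + 3 + 0 = 4,\\
\Sigma\mathrm{idp}(W) &= 2 + 4 + 1 = 7 = \Sigma\mathrm{ilcp}(W) + r - 1 \le 13 \le 17.
\end{align*}
```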
Reverse suffixes
$W = \{\{a, aab, abb, b\}\}$
(1)  aaaaaa···
(2)  baabaa···
(3)  abaaba···
(4)  bbabba···
(5)  aabaab···
(6)  babbab···
(7)  abbabb···
(8)  bbbbbb···
[Figure: STree(W), a tree with edges labelled a and b; its leaves, left to right, are (1) (5) (3) (7) (2) (6) (4) (8).]

Dispersal pair with respect to ≺
A pair of leaves (u, v) of an ordered tree is called a dispersal pair if u ≺ v and the subtree rooted at their nearest common ancestor contains no leaf w such that u ≺ w ≺ v.

Lemma
$D(\mathrm{STree}(W), \prec_W) = \Sigma\mathrm{idp}(W)$
where $D(T, \prec)$ is the number of dispersal pairs in T.

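The definition of a dispersal pair translates directly into a counting procedure: at each internal node, sort the leaves below it by the total order and count neighbouring leaves that come from different children. The sketch below assumes a hypothetical encoding of an ordered tree as nested Python lists with distinct integer leaves whose values give the order ≺. The example tree is a guess at the shape of the figure (a complete binary tree of depth 3), using the leaf order shown on the slide.

```python
def count_dispersal_pairs(tree):
    """Count dispersal pairs in a tree given as nested lists of distinct ints."""
    total = 0

    def leaves(node):
        nonlocal total
        if isinstance(node, int):  # a leaf, identified by its rank in the order
            return [node]
        tagged = []
        for child_index, child in enumerate(node):
            tagged.extend((leaf, child_index) for leaf in leaves(child))
        tagged.sort()
        # Neighbours in sorted order that lie under different children have
        # this node as nearest common ancestor and no leaf of the subtree
        # between them, so each such neighbour pair is a dispersal pair.
        total += sum(1 for (_, a), (_, b) in zip(tagged, tagged[1:]) if a != b)
        return [leaf for leaf, _ in tagged]

    leaves(tree)
    return total

# Assumed shape of STree(W) for W = {{a, aab, abb, b}}; leaf order from the slide.
stree = [[[1, 5], [3, 7]], [[2, 6], [4, 8]]]
print(count_dispersal_pairs(stree))  # 17
```

Under that assumed shape the root contributes 7 pairs, its two children 3 each, and the four lowest internal nodes 1 each.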
$n \log n$ upper bound
$d(1) = 0$
$d(n) = \max_{i \in [1..\lfloor n/2 \rfloor]} d(n, i)$ when $n > 1$
$d(n, k) = d(k) + d(n-k) + \min\{2k, n-1\}$, where $n > 0$ and $k \in [0..\lfloor n/2 \rfloor]$
Lemma
$d(n) = \max\{D(T, \prec)\}$, where the maximum is taken over all rooted trees T with n leaves and all total orders ≺ on their leaves.

$n \log n$ upper bound
Lemma
$d(n) = n\lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1$
Theorem
For any multiset W of words of total length n > 0, we have
$\Sigma\mathrm{ilcp}(W) \le \Sigma\mathrm{idp}(W) \le d(n) \le n\lceil \lg n \rceil - 2^{\lceil \lg n \rceil} + 1$

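The recurrence for d(n) and its closed form can be compared numerically. This is a small sketch for checking the lemma on small inputs, not a proof; the function names are chosen here, and ⌈lg n⌉ is computed as the bit length of n − 1.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def d(n):
    """d(1) = 0; d(n) = max over i in [1..n // 2] of d(i) + d(n - i) + min(2i, n - 1)."""
    if n == 1:
        return 0
    return max(d(i) + d(n - i) + min(2 * i, n - 1) for i in range(1, n // 2 + 1))

def closed_form(n):
    """n * ceil(lg n) - 2 ** ceil(lg n) + 1 for n >= 1."""
    c = (n - 1).bit_length()  # equals ceil(lg n) for every n >= 1
    return n * c - 2 ** c + 1

print([d(n) for n in range(1, 9)])            # [0, 1, 3, 5, 8, 11, 14, 17]
print([closed_form(n) for n in range(1, 9)])  # [0, 1, 3, 5, 8, 11, 14, 17]
```

Calling d on increasing arguments keeps the recursion shallow, since every smaller value is already cached.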
$n \log r$ upper bound
Lemma
If BWT(W) has r runs, then $|D_u(\mathrm{STree}(W), \prec_W)| < r$ for every vertex u in STree(W).