In-Place (Bijective) BWT Transforms

In-Place (Bijective) BWT Transforms. Dominik Köppl (Kyushu University), Daiki Hashimoto (Tohoku University), Diptarama, Ayumi Shinohara. Data structures: Burrows-Wheeler Transform (BWT) [Burrows, Wheeler '94]; Bijective BWT (BBWT) [Gil, Scott '12].


  1. In-Place (Bijective) BWT Transforms. Dominik Köppl (Kyushu University), Daiki Hashimoto (Tohoku University), Diptarama, Ayumi Shinohara

  2. data structures: Burrows-Wheeler Transform (BWT) [Burrows, Wheeler '94]; Bijective BWT (BBWT) [Gil, Scott '12]

  3. BWT of bacabbabb: T = bacabbabb$

  4. BWT of bacabbabb, T = bacabbabb$. All suffixes: bacabbabb$, acabbabb$, cabbabb$, abbabb$, bbabb$, babb$, abb$, bb$, b$, $

  5. BWT of bacabbabb, T = bacabbabb$. All suffixes, each with its preceding character (prev. char, suffix): ($, bacabbabb$), (b, acabbabb$), (a, cabbabb$), (c, abbabb$), (a, bbabb$), (b, babb$), (b, abb$), (a, bb$), (b, b$), (b, $)

  6. BWT of bacabbabb, T = bacabbabb$. The same (prev. char, suffix) pairs, with the suffixes aligned left

  7. BWT of bacabbabb, T = bacabbabb$. Sort the left-aligned suffixes in lexicographic order; the preceding characters, read in that order, form the BWT:

         BWT   suffix (lex. order)
         b     $
         b     abb$
         c     abbabb$
         b     acabbabb$
         b     b$
         b     babb$
         $     bacabbabb$
         a     bb$
         a     bbabb$
         a     cabbabb$
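
The construction on slides 4 to 7 can be sketched directly in Python. This is only an illustration of the definition (sort the suffixes, read off the preceding characters); it is not in-place, and the function name `bwt_naive` is a hypothetical helper, not something from the slides. It assumes the text ends with a unique `$` that sorts before all other characters.

```python
def bwt_naive(text: str) -> str:
    """Naive BWT: sort all suffixes, then take each suffix's preceding character.

    Assumption: `text` ends with a unique sentinel '$' that is smaller than
    every other character (true for ASCII '$' versus lowercase letters).
    """
    n = len(text)
    # Sort suffix start positions by comparing the suffixes themselves.
    # This materializes every suffix, so it is far from space-efficient.
    suffix_array = sorted(range(n), key=lambda i: text[i:])
    # text[i - 1] with i == 0 wraps around to the sentinel at the end.
    return "".join(text[i - 1] for i in suffix_array)


print(bwt_naive("bacabbabb$"))  # bbcbbb$aaa
```

The printed string agrees with BWT(T$) = bbcbbb$aaa as given on slide 21.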

  8. the BBWT is the BWT of the Lyndon factorization of an input text with respect to ≺_ω

  9. the BBWT is the BWT of (1.) the Lyndon factorization of an input text with respect to (2.) ≺_ω

  10. Lyndon words
      – examples: a, aabab
      a Lyndon word is smaller than
      ● any of its proper suffixes
      ● any of its other rotations

  11. Lyndon words
      – examples: a, aabab
      a Lyndon word is smaller than
      ● any of its proper suffixes
      ● any of its other rotations
      not Lyndon words:
      – abaab (the rotation aabab is smaller)
      – abab (abab is not smaller than its suffix ab)

  12. Lyndon factorization [Chen+ '58]
      ● input: text T
      ● output: factorization T = T_1 T_2 ⋯ T_t with
        – each T_x is a Lyndon word
        – T_x ≥_lex T_{x+1}
        – the factorization is uniquely defined (Chen-Fox-Lyndon theorem)
        – computable in linear time [Duval '88]

  13. example: T = bacabbabb
      Lyndon factorization: b|ac|abb|abb
      – b, ac, abb, and abb are Lyndon words
      – b >_lex ac >_lex abb ≥_lex abb
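
A minimal sketch of a linear-time Lyndon factorization in the style of Duval's algorithm, checked against the example above. The function name `lyndon_factorization` is a hypothetical helper chosen for illustration; the slides only cite the algorithm and give no code.

```python
from __future__ import annotations


def lyndon_factorization(text: str) -> list[str]:
    """Return the Lyndon factors T_1, ..., T_t of `text` (Duval-style scan)."""
    factors = []
    n = len(text)
    i = 0
    while i < n:
        # text[i:j] is always a prefix of a power of a Lyndon word of length j - k.
        j, k = i + 1, i
        while j < n and text[k] <= text[j]:
            if text[k] < text[j]:
                k = i      # the whole of text[i:j+1] is a single Lyndon word
            else:
                k += 1     # the current period keeps repeating
            j += 1
        # Emit every complete repetition of the Lyndon word found.
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


print(lyndon_factorization("bacabbabb"))  # ['b', 'ac', 'abb', 'abb']
```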

  14. ≺_ω order
      ● u ≺_ω w :⟺ uuuu⋯ <_lex wwww⋯
      ● ab <_lex aba
      ● aba ≺_ω ab

  15. ≺_ω order
      ● u ≺_ω w :⟺ uuuu⋯ <_lex wwww⋯
      ● ab <_lex aba
      ● aba ≺_ω ab, since abaabaaba⋯ <_lex abababab⋯
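
The ≺_ω comparison can be sketched without infinite strings. The sketch below compares the prefixes of length |u|·|w| of both infinite repetitions; these prefixes consist of whole copies of u and of w, so if they are equal the two infinite strings are equal as well. The name `omega_less` is a hypothetical helper, and both arguments are assumed to be non-empty.

```python
def omega_less(u: str, w: str) -> bool:
    """Return True iff u ≺_ω w, i.e. uuu... is lexicographically smaller than www...

    Assumption: u and w are non-empty.
    u * len(w) and w * len(u) both have length |u| * |w| and are prefixes of
    the respective infinite repetitions, so comparing them decides the order.
    """
    return u * len(w) < w * len(u)


print("ab" < "aba")             # True:  ab <_lex aba
print(omega_less("aba", "ab"))  # True:  aba ≺_ω ab
print(omega_less("ab", "aba"))  # False
```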

  16. BBWT of bacabbabb: b|ac|abb|abb

  17. BBWT of bacabbabb: b|ac|abb|abb. Rotations of each factor: b; ac, ca; abb, bab, bba; abb, bab, bba

  18. BBWT of bacabbabb: b|ac|abb|abb. All rotations collected in one list: b, ac, ca, abb, bab, bba, abb, bab, bba

  19. BBWT of bacabbabb: b|ac|abb|abb. The rotations sorted in ≺_ω order: abb, abb, ac, bab, bab, bba, bba, b, ca

  20. BBWT of bacabbabb: b|ac|abb|abb. The last characters of the ≺_ω-sorted rotations form the BBWT:

         rotation (≺_ω order)   BBWT
         abb                    b
         abb                    b
         ac                     c
         bab                    b
         bab                    b
         bba                    a
         bba                    a
         b                      b
         ca                     a

      BBWT(T) = bbcbbaaba

  21. BBWT of bacabbabb: b|ac|abb|abb (same table as on the previous slide). BBWT(T) = bbcbbaaba, compared with BWT(T$) = bbcbbb$aaa
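
The steps of slides 16 to 20 can be sketched naively as follows: factorize, collect all rotations of all factors, sort them in ≺_ω order, and read off the last characters. This is an illustration of the definition only (it is neither in-place nor efficient), and the names `lyndon_factorization`, `omega_cmp` and `bbwt_naive` are hypothetical helpers.

```python
from __future__ import annotations

from functools import cmp_to_key


def lyndon_factorization(text: str) -> list[str]:
    """Lyndon factors of `text`, computed with a Duval-style scan."""
    factors, n, i = [], len(text), 0
    while i < n:
        j, k = i + 1, i
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


def omega_cmp(u: str, w: str) -> int:
    """Three-way comparison of the infinite repetitions uuu... and www..."""
    a, b = u * len(w), w * len(u)  # equal-length prefixes made of whole periods
    return (a > b) - (a < b)


def bbwt_naive(text: str) -> str:
    """Naive BBWT: last characters of all factor rotations in ≺_ω order."""
    rotations = []
    for factor in lyndon_factorization(text):
        for s in range(len(factor)):
            rotations.append(factor[s:] + factor[:s])
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(rotation[-1] for rotation in rotations)


print(bbwt_naive("bacabbabb"))  # bbcbbaaba
```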

  22. motivation: properties of the BBWT
      ● no $ necessary
      ● BBWT is more compressible than BWT for various inputs [Gil, Scott '12]
      ● BBWT is indexable (full-text index)
      ● BBWT is computable in O(n) time with O(n) words [Bannai+ '19]
      however, O(n) words can be too much for large n

  23. in-place computation
      ● Σ: alphabet, σ := |Σ| alphabet size
      ● T: text, n := |T|
      ● L := workspace of n lg σ bits
      ● aim: in-place computation, i.e., transform T ↔ BWT ↔ BBWT with |L| + O(lg n) bits of workspace
      (figure: L storing T := b a c a b b a b b)

  24. known solutions

         input   output   workspace        time                  reference
         text    BWT      in-place         O(n^2)                Crochemore+ '15
         BWT     text     in-place         O(n^{2+ε})            Crochemore+ '15
         text    BBWT     O(n lg σ) bits   O(n lg n / lg lg n)   Bonomo+ '14

      σ: alphabet size, n: text length, ε is a constant with 0 < ε < 1

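  25. in-place conversions (diagram with the nodes text, BWT and BBWT): text ↔ BWT is known, with time labels O(n^2) and O(n^{2+ε}); text ↔ BBWT has time labels O(n^2) and O(n^{2+ε}); BWT ↔ BBWT has time label O(n^{2+ε}). Working space: n lg σ + O(lg n) bits (including text)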

  26. forward search, T = bacabbabb$. Columns F and L, rows as (F, L): ($, b), (a, b), (a, c), (a, b), (b, b), (b, b), (b, $), (b, a), (b, a), (c, a)

  30. forward search, T = bacabbabb$. Columns F and L, rows as (F, L): ($, b), (a, b), (a, c), (a, b), (b, b), (b, b), (b, $), (b, a), (b, a), (c, a). The steps can be calculated with rank and select on F and L

  31. forward search, T = bacabbabb$

         F.rank_{F[i]}(i)   F   L   L.rank_{L[i]}(i)
         1                  $   b   1
         1                  a   b   2
         2                  a   c   1
         3                  a   b   3
         1                  b   b   4
         2                  b   b   5
         3                  b   $   1
         4                  b   a   1
         5                  b   a   2
         1                  c   a   3

      FL mapping: FL(i) = L.select_{F[i]}(F.rank_{F[i]}(i))

  32. backward search, T = bacabbabb$ (same F/L table with ranks as for the forward search). FM index [Ferragina, Manzini '00]

  36. backward search, T = bacabbabb$ (same F/L table with ranks). FM index [Ferragina, Manzini '00]
      LF mapping: LF(i) := F.select_{L[i]}(L.rank_{L[i]}(i))

  37. backward search, T = bacabbabb$ (same F/L table with ranks). FM index [Ferragina, Manzini '00]
      LF mapping: LF(i) := F.select_{L[i]}(L.rank_{L[i]}(i))
                        = F.select_{L[i]}(1) + L.rank_{L[i]}(i) − 1

  38. backward search, T = bacabbabb$ (same F/L table with ranks). FM index [Ferragina, Manzini '00]
      LF mapping: LF(i) := F.select_{L[i]}(L.rank_{L[i]}(i))
                        = F.select_{L[i]}(1) + L.rank_{L[i]}(i) − 1
                        = |{j : L[j] < L[i]}| + L.rank_{L[i]}(i)

  39. LF: time complexity. If we store BWT(T) in L:
      – L[i] = BWT[i]: O(1) time ⇒ for any c: L.rank_c(i) in O(n) time
      – LF(i) = |{j : L[j] < L[i]}| + L.rank_{L[i]}(i): each of the two terms takes O(n) time
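
The last form of the LF mapping needs only L and two linear scans, which the following sketch makes concrete. It uses 1-based positions to match the slides; the names `lf` and `invert_bwt_backward` are hypothetical helpers. The second function is a plain illustration of backward stepping from the row of the suffix `$` and builds the text in a separate buffer, so it is not in-place; it assumes a unique `$` that sorts before all other characters.

```python
def lf(L: str, i: int) -> int:
    """LF(i) = |{j : L[j] < L[i]}| + L.rank_{L[i]}(i), with 1-based i.

    Both terms are computed by an O(n) scan over L; F is never stored.
    """
    c = L[i - 1]
    smaller = sum(1 for ch in L if ch < c)          # |{j : L[j] < L[i]}|
    rank = sum(1 for j in range(i) if L[j] == c)    # L.rank_c(i)
    return smaller + rank


def invert_bwt_backward(L: str) -> str:
    """Recover T from L = BWT(T) by repeated LF steps (O(n^2) time overall).

    Assumption: T ends with a unique '$' smaller than all other characters,
    so row 1 is the suffix "$" and L[1] is the character before it.
    """
    chars = []
    i = 1
    for _ in range(len(L) - 1):
        chars.append(L[i - 1])
        i = lf(L, i)
    return "".join(reversed(chars)) + "$"


L = "bbcbbb$aaa"
print(lf(L, 1), lf(L, 7), lf(L, 3))  # 5 1 10
print(invert_bwt_backward(L))        # bacabbabb$
```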

  40. FL: time complexity
      ● FL(i) = L.select_{F[i]}(F.rank_{F[i]}(i)) = L.select_{F[i]}(i − |{j : L[j] < F[i]}|)
      ● if we know F[i]: FL(i) in O(n) time
      ● however, the fastest in-place computation of F[i] takes O(n^{1+ε}) time [Munro, Raman '96] for any constant ε with 0 < ε < 1
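
A minimal sketch of the FL mapping, again with 1-based positions. It sidesteps the difficulty named on the slide by materializing F as a sorted copy of L, which costs extra space and is therefore an assumption made purely for illustration; an in-place variant would have to obtain F[i] by selection on L instead. The names `fl` and `invert_bwt_forward` are hypothetical helpers.

```python
def fl(L: str, F: str, i: int) -> int:
    """FL(i) = L.select_{F[i]}(i - |{j : L[j] < F[i]}|), with 1-based i."""
    c = F[i - 1]
    rank = i - sum(1 for ch in L if ch < c)   # F.rank_c(i), since F is sorted
    seen = 0
    for j, ch in enumerate(L, start=1):       # L.select_c(rank) by a linear scan
        if ch == c:
            seen += 1
            if seen == rank:
                return j
    raise ValueError("F is not a sorted copy of L")


def invert_bwt_forward(L: str) -> str:
    """Recover T from L = BWT(T) by repeated FL steps.

    Assumption: F is materialized as sorted(L), which is NOT in-place.
    The row whose L entry is '$' starts with the first character of T.
    """
    F = "".join(sorted(L))
    i = L.index("$") + 1
    chars = []
    for _ in range(len(L)):
        chars.append(F[i - 1])
        i = fl(L, F, i)
    return "".join(chars)


print(invert_bwt_forward("bbcbbb$aaa"))  # bacabbabb$
```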

  41. road map (diagram with the nodes text, BWT and BBWT): 1. text ↔ BBWT, with time labels O(n^{2+ε}) and O(n^2); 2. BWT ↔ BBWT, with time label O(n^{2+ε}). Working space: n lg σ + O(lg n) bits (including text)

  42. text → BBWT

  43. text → BBWT [Bonomo+ '14]

         for each Lyndon factor T_x with x = 1 up to t:
             prepend T_x[|T_x|] to BBWT
             p ← 1   (insert position in BBWT)
             for each i = |T_x| − 1 down to 1:
                 p ← LF(p) + 1
                 insert T_x[i] at BBWT[p]
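
The insertion procedure above can be sketched in Python as follows. The sketch keeps the BBWT in a Python list and computes LF by two linear scans, so it mirrors the quadratic running time but not the in-place memory layout of the talk. The names `lyndon_factorization` and `bbwt_by_insertion` are hypothetical helpers, and positions `p` and `i` are 1-based as in the pseudocode.

```python
from __future__ import annotations


def lyndon_factorization(text: str) -> list[str]:
    """Lyndon factors of `text`, computed with a Duval-style scan."""
    factors, n, i = [], len(text), 0
    while i < n:
        j, k = i + 1, i
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


def bbwt_by_insertion(text: str) -> str:
    """Build the BBWT factor by factor, inserting one character at a time."""
    bbwt: list[str] = []
    for factor in lyndon_factorization(text):
        bbwt.insert(0, factor[-1])           # prepend T_x[|T_x|]
        p = 1                                # insert position in BBWT
        for i in range(len(factor) - 1, 0, -1):
            c = bbwt[p - 1]
            # LF(p) on the current BBWT, via two O(n) scans.
            smaller = sum(1 for ch in bbwt if ch < c)
            rank = sum(1 for j in range(p) if bbwt[j] == c)
            p = smaller + rank + 1           # p <- LF(p) + 1
            bbwt.insert(p - 1, factor[i - 1])  # insert T_x[i] at BBWT[p]
    return "".join(bbwt)


print(bbwt_by_insertion("bacabbabb"))  # bbcbbaaba
```

For T = bacabbabb the intermediate states after each factor are b, cba, bcbaba and bbcbbaaba.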

  44. text → BBWT, T = bacabbabb
      ● Lyndon factorization: b|ac|abb|abb
      ● first: insert b

  45. text → BBWT, T = bacabbabb
      ● Lyndon factorization: b|ac|abb|abb
      ● first: insert b
      F/L table after inserting b, rows as (F, L): (b, b)

  46. text → BBWT, T = bacabbabb
      ● Lyndon factorization: b|ac|abb|abb
      ● first: insert b
      F/L table after inserting b, rows as (F, L): (b, b)
      F/L table of BBWT(T), rows as (F, L): (a, b), (a, b), (a, c), (b, b), (b, b), (b, a), (b, a), (b, b), (c, a)
      how to calculate the second table from the first?

  47. BBWT(T_1 T_2), T = b|ac|abb|abb = T_1 T_2 T_3 T_4
      ● next Lyndon factor: ac
      F/L table, rows as (F, L): (b, b)

  48. BBWT(T_1 T_2), T = b|ac|abb|abb = T_1 T_2 T_3 T_4
      ● next Lyndon factor: ac
      F/L tables, rows as (F, L): (b, b); then, with c prepended to L: (b, c), (c, b)

  49. BBWT(T_1 T_2), T = b|ac|abb|abb = T_1 T_2 T_3 T_4
      ● next Lyndon factor: ac
      F/L tables, rows as (F, L): (b, b); then, with c prepended to L: (b, c), (c, b); then, with a inserted: (a, c), (b, b), (c, a)

  50. BBWT(T_1 T_2 T_3), T = b|ac|abb|abb
      ● next Lyndon factor: abb
      F/L table, rows as (F, L): (a, c), (b, b), (c, a)

  51. BBWT(T_1 T_2 T_3), T = b|ac|abb|abb
      ● next Lyndon factor: abb
      F/L tables, rows as (F, L): (a, c), (b, b), (c, a); then, with b prepended to L: (a, b), (b, c), (b, b), (c, a)
