Inducing Suffix and LCP Arrays in External Memory



  1. Inducing Suffix and LCP Arrays in External Memory. Timo Bingmann, Johannes Fischer, and Vitaly Osipov | January 7th, 2013 @ ALENEX’13. Institute of Theoretical Informatics, Algorithmics. KIT, University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association, www.kit.edu

  2. Abstract. We consider text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory. Practical tests show that it outperforms the previous best EM suffix sorter [Dementiev et al., ALENEX 2005] by a factor of about two in time and I/O volume. Our second contribution is to augment the first algorithm to also construct the array of longest common prefixes (LCPs). This yields the first EM construction algorithm for LCP arrays. The overhead in time and I/O volume of this extended algorithm over plain suffix array construction is roughly a factor of two. The algorithms scale far beyond problem sizes previously considered in the literature (text size of 80 GiB using only 4 GiB of RAM in our experiments).
     Timo Bingmann, Johannes Fischer, and Vitaly Osipov – Inducing Suffix and LCP Arrays in External Memory. Institute of Theoretical Informatics – Algorithmics. January 7th, 2013. 2/19

  3. Overview
     1. Introduction and Motivation
        - Evolution of Suffix Array Construction Algorithms
        - History of LCP Construction Algorithms
     2. Example of the Inducing Step in the eSAIS Algorithm
        - Inducing the Suffix Array
        - Inducing the LCP Array
        - Finding Ranks of S*-Suffixes
     3. Implementation and Experimental Results
        - Implementation Highlights
        - Experiments: eSAIS vs. DC3

  4. Example: T = [cababcbababb$], positions 0 to 12. Suffixes listed in text order:

      i    LCP_i   T[i..n]
      0    0       cababcbababb$
      1    0       ababcbababb$
      2    0       babcbababb$
      3    0       abcbababb$
      4    0       bcbababb$
      5    0       cbababb$
      6    0       bababb$
      7    0       ababb$
      8    0       babb$
      9    0       abb$
      10   0       bb$
      11   0       b$
      12   -       $

  5. Example: T = [cababcbababb$], positions 0 to 12. Suffixes in sorted order:

      SA_i   LCP_i   T[SA_i..n]
      12     -       $
      7      0       ababb$
      1      4       ababcbababb$
      9      2       abb$
      3      2       abcbababb$
      11     0       b$
      6      1       bababb$
      8      3       babb$
      2      3       babcbababb$
      10     1       bb$
      4      1       bcbababb$
      0      0       cababcbababb$
      5      1       cbababb$
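The sorted table above can be reproduced with a few lines of Python. This is a minimal in-memory sketch using naive suffix sorting and character-by-character prefix comparison. It is not the external-memory algorithm of the talk, and the function names `suffix_array` and `lcp_array` are chosen here for illustration only.

```python
def suffix_array(text):
    """Naive suffix sorting: order start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """LCP[r] is the longest common prefix length of suffixes SA[r-1] and SA[r]."""
    n = len(text)
    lcp = [None] * n  # LCP[0] is undefined (shown as "-" in the table)
    for r in range(1, n):
        a, b = sa[r - 1], sa[r]
        h = 0
        while a + h < n and b + h < n and text[a + h] == text[b + h]:
            h += 1
        lcp[r] = h
    return lcp


T = "cababcbababb$"  # "$" sorts before the letters in ASCII
SA = suffix_array(T)
LCP = lcp_array(T, SA)

for s, l in zip(SA, LCP):
    print(f"{s:>4} {'-' if l is None else l:>4}  {T[s:]}")
```

The naive sort can take quadratic time or worse on repetitive inputs, so it only serves to check small examples such as this one.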


  8. [Figure: diagram of suffix array construction algorithms with year labels from 1999 to 2011, based on [PST07] (original) with later updates. Branches: Prefix Doubling, Induced Copying, Recursion. Algorithms shown: F, MM, BW, S, IT, LS, BK, KSPP, KS, DC3, KA, MaF, Na, KJP, SS, M, Mo, MP, NZ, NZC, AN, N, bpr, deep-shallow, divsufsort, SAIS/SADS, OSACA. Annotations: O(n) tree, BWT, 1/2 copy, A/B copy, runs, diffcover, mod2 split, L/S split, succinct, fixed Σ, chains, cache aware, O(n log |Σ|), SFE-coding.]

  9. Suffix Sorting 256 MiB of Gutenberg Text in RAM on Intel i7 2.67 GHz. [Figure: the algorithm diagram of slide 8, annotated with running times: 396 s, 106 s, 66 s, 50 s, 39 s, 129 / 189 s, 57 s (Mo), 98 s.]


  12. LCP Construction Algorithms

      Algorithm    Construction          Time         Space
      MM 1993      T → SA,LCP            O(n log n)   9n
      KLAAP 2001   T,SA → LCP            O(n)         13n
      KS 2003      T → SA,LCP            O(n)         O(n) EM
      M 2004       T,SA → LCP            O(n)         9n / 5n
      PT 2008      T,SA → v-LCP          O(nv)        6n + O(n/√v + v)
      Φ-KMP 2009   T,SA → PLCP           O(n log n)   5n + 3 [garbled] 8n
      GO 2011      T,SA,BWT,LF → LCP     O(n²)        11n
      F 2011       T → SA,LCP            O(n)         9n
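As an illustration of the "T,SA → LCP in O(n)" row labelled KLAAP 2001, the following is a sketch of the well-known linear-time LCP computation usually attributed to Kasai et al. It processes suffixes in text order and reuses the previous match length minus one. The function name `kasai_lcp` is an assumption made for this example, the code stores 0 in the first entry instead of leaving it undefined, and it makes no attempt to match the space figure given in the table.

```python
def kasai_lcp(text, sa):
    """Compute the LCP array from text and suffix array in O(n) time."""
    n = len(text)
    rank = [0] * n  # rank[i] = position of suffix i in the suffix array
    for r, s in enumerate(sa):
        rank[s] = r

    lcp = [0] * n  # lcp[0] has no predecessor and stays 0 in this sketch
    h = 0          # current match length, carried over between iterations
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]  # suffix preceding suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1  # suffix i+1 shares at least h-1 characters with its predecessor
    return lcp


T = "cababcbababb$"
SA = [12, 7, 1, 9, 3, 11, 6, 8, 2, 10, 4, 0, 5]  # suffix array from the example slide
LCP = kasai_lcp(T, SA)
print(LCP)
```

The random accesses to `rank` and `sa` are what make this approach awkward once the arrays no longer fit in RAM, which is one motivation for constructing the LCP array during suffix sorting instead.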

  13. Example: T = [cababcbababb$], positions 0 to 12. Sorted suffixes with their preceding character:

      SA_i   T[SA_i - 1]   T[SA_i..n]
      12     b             $
      7      b             ababb$
      1      c             ababcbababb$
      9      b             abb$
      3      b             abcbababb$
      11     b             b$
      6      c             bababb$
      8      a             babb$
      2      a             babcbababb$
      10     a             bb$
      4      a             bcbababb$
      0      -             cababcbababb$
      5      b             cbababb$
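The preceding-character column is what drives inducing: once a suffix at position i is placed, the suffix at i - 1 can be placed in the bucket of its first character. The following is a small in-memory sketch of induced sorting in the style of SA-IS, assuming the usual L/S/S* classification of suffixes. It is not the external-memory eSAIS procedure of the talk. The names `classify_types` and `induce_suffix_array` are hypothetical, and the S*-suffixes are sorted naively here, where a real implementation would determine their ranks recursively.

```python
def classify_types(text):
    """Type L: suffix i is larger than suffix i+1; type S: smaller. Sentinel is S."""
    n = len(text)
    types = ["S"] * n
    for i in range(n - 2, -1, -1):
        if text[i] > text[i + 1] or (text[i] == text[i + 1] and types[i + 1] == "L"):
            types[i] = "L"
    return types


def induce_suffix_array(text):
    n = len(text)
    types = classify_types(text)
    # S*-positions: S-type suffixes whose left neighbour is L-type.
    sstar = [i for i in range(1, n) if types[i] == "S" and types[i - 1] == "L"]
    sstar.sort(key=lambda i: text[i:])  # naive stand-in for the recursive step

    # Bucket boundaries per first character.
    head, tail, pos = {}, {}, 0
    for c in sorted(set(text)):
        head[c] = pos
        pos += text.count(c)
        tail[c] = pos - 1

    sa = [None] * n
    # Step 1: put sorted S*-suffixes at the ends of their buckets.
    t = dict(tail)
    for i in reversed(sstar):
        sa[t[text[i]]] = i
        t[text[i]] -= 1
    # Step 2: scan left to right and induce L-suffixes into bucket heads.
    h = dict(head)
    for r in range(n):
        i = sa[r]
        if i is not None and i > 0 and types[i - 1] == "L":
            sa[h[text[i - 1]]] = i - 1
            h[text[i - 1]] += 1
    # Step 3: scan right to left and induce S-suffixes into bucket tails.
    t = dict(tail)
    for r in range(n - 1, -1, -1):
        i = sa[r]
        if i is not None and i > 0 and types[i - 1] == "S":
            sa[t[text[i - 1]]] = i - 1
            t[text[i - 1]] -= 1
    return sa


T = "cababcbababb$"
TYPES = "".join(classify_types(T))
SA = induce_suffix_array(T)
print(TYPES)
print(SA)
```

For the example text the S*-positions are 1, 3, 7, 9 and 12. All other suffixes are placed by the two scans, each looking only at the character preceding an already placed suffix.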

