fully compressed pattern matching by recompression


  1. Fully compressed pattern matching by recompression Artur Jeż University of Wrocław 9 VII 2012 FCPM by recompression 9 VII 2012 1 / 18 Artur Jeż

  5. SLP
     Definition (SLP: Straight Line Programme): a CFG generating exactly one word, with rules $X_i \to X_j X_k$ or $X_i \to a$.
     Example: $X_0 = a$, $X_1 = b$, $X_n = X_{n-1} X_{n-2}$, giving a, b, ba, bab, babba, babbabab, ...
     Relations to LZ and LZW:
     - LZW: rules $X_i \to a X_j$, text is $X_1 X_2 X_3 \ldots$
     - LZ: LZ to SLP takes size n to $O(n \log(N/n))$
     - many algorithms for SLPs
     - CPM for LZ [Gawrychowski, ESA'11]
     - in theory (word equations, equations in groups, verification, ...)
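
The SLP definition can be made concrete with a small sketch. It assumes the example recurrence is read as X_i → X_{i-1} X_{i-2}, which matches the first listed words; the dictionary layout and the function names `fibonacci_slp`, `expand` and `word_lengths` are illustrative choices, not part of the talk.

```python
def fibonacci_slp(n):
    """Rules for X_0..X_n: X_0 -> a, X_1 -> b, X_i -> X_{i-1} X_{i-2} (assumed reading)."""
    rules = {0: "a", 1: "b"}
    for i in range(2, n + 1):
        rules[i] = (i - 1, i - 2)  # a pair of smaller nonterminal indices
    return rules


def expand(rules, start):
    """Decompress X_start; the result can be exponentially longer than the grammar."""
    words = {}
    for i in sorted(rules):  # every rule only refers to smaller indices
        rhs = rules[i]
        words[i] = rhs if isinstance(rhs, str) else words[rhs[0]] + words[rhs[1]]
    return words[start]


def word_lengths(rules):
    """Length of every generated word, computed without decompressing."""
    lengths = {}
    for i in sorted(rules):
        rhs = rules[i]
        lengths[i] = 1 if isinstance(rhs, str) else lengths[rhs[0]] + lengths[rhs[1]]
    return lengths


rules = fibonacci_slp(40)
print([expand(rules, i) for i in range(6)])
print(word_lengths(rules)[40])  # 41 rules describe a word of this length
```

The length computation illustrates why one wants algorithms that work on the grammar directly: the grammar has 41 rules, while the word it generates has more than 10^8 letters.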

  8. This talk
     Definition (CPM, FCPM):
     - Compressed pattern matching: the text is compressed, the pattern is not.
     - Fully compressed pattern matching: both text and pattern are compressed.
     Results: an $O((n + m) \log M)$ algorithm for FCPM for SLPs (previously: $O(nm^2)$ [Lifshits, CPM'07]).
     Different approach: a new technique, recompression, which
     - decompresses text and pattern,
     - compresses them again (in the same way),
     - in the end: the pattern is a single symbol.
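
To make the FCPM problem statement concrete, here is a naive baseline that simply decompresses both SLPs and searches. This is explicitly not the recompression algorithm from the talk, and it can take time exponential in the grammar sizes; the rule format (a letter, or a pair of smaller indices) and the names `expand` and `naive_fcpm` are assumptions made for the illustration.

```python
def expand(rules, start):
    """Decompress nonterminal X_start of an SLP given as {index: letter or (j, k)}."""
    words = {}
    for i in sorted(rules):  # rules only refer to smaller indices
        rhs = rules[i]
        words[i] = rhs if isinstance(rhs, str) else words[rhs[0]] + words[rhs[1]]
    return words[start]


def naive_fcpm(text_rules, text_start, pattern_rules, pattern_start):
    """Return all start positions of the pattern in the text (both given as SLPs)."""
    text = expand(text_rules, text_start)
    pattern = expand(pattern_rules, pattern_start)
    positions = []
    pos = text.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = text.find(pattern, pos + 1)  # allow overlapping occurrences
    return positions


# Text X_5 = babbabab and pattern X_3 = bab, both from the same small grammar.
grammar = {0: "a", 1: "b", 2: (1, 0), 3: (2, 1), 4: (3, 2), 5: (4, 3)}
print(naive_fcpm(grammar, 5, grammar, 3))
```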

  10. Technique
      Where it comes from: Mehlhorn, Gawry
      Applicable to:
      - Fully Compressed Membership Problem [$\in$ NP]
      - Word equations [alternative PSPACE algorithm]
      - Fully Compressed Pattern Matching [SLPs, LZ, $O((n + m) \log M \log(n + m))$]
      - construction of a grammar for a string [alternative $\log(N/n)$ approximation algorithm]
      - other?

  11. Example: equality of strings
      How to test equality of strings?
      a a a b a b c a b a b b a b c b a
      a a a b a b c a b a b b a b c b a

  13. Example: equality of strings (the block aaa is replaced by a_3)
      a_3 b a b c a b a b b a b c b a
      a_3 b a b c a b a b b a b c b a

  14. Example: equality of strings (the block bb is replaced by b_2)
      a_3 b a b c a b a b_2 a b c b a
      a_3 b a b c a b a b_2 a b c b a

  15. Example: equality of strings (the pair ab is replaced by d)
      a_3 b d c d a b_2 d c b a
      a_3 b d c d a b_2 d c b a

  18. Example: equality of strings (the pair ba is replaced by e)
      a_3 b d c d a b_2 d c e
      a_3 b d c d a b_2 d c e
      Iterate!
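
The steps of this example can be replayed on one of the two strings with a short sketch. The helper names `compress_blocks` and `compress_pair`, the list-of-symbols representation, and the `a_3` naming of block symbols are assumptions chosen to mirror the slides.

```python
def compress_blocks(word, letter):
    """Replace every maximal block letter^k with k >= 2 by the single symbol 'letter_k'."""
    out, i = [], 0
    while i < len(word):
        j = i
        while j < len(word) and word[j] == word[i]:
            j += 1  # j ends just past the maximal run starting at i
        if word[i] == letter and j - i >= 2:
            out.append(f"{letter}_{j - i}")
        else:
            out.extend(word[i:j])
        i = j
    return out


def compress_pair(word, first, second, fresh):
    """Replace occurrences of the pair first, second (first != second) by fresh, left to right."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == first and word[i + 1] == second:
            out.append(fresh)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


word = "a a a b a b c a b a b b a b c b a".split()
word = compress_blocks(word, "a")          # aaa -> a_3
word = compress_blocks(word, "b")          # bb  -> b_2
word = compress_pair(word, "a", "b", "d")  # ab  -> d
word = compress_pair(word, "b", "a", "e")  # ba  -> e
print(" ".join(word))
```

Applying the same sequence of replacements to the second string yields the same result exactly when the two original strings are equal.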

  20. How to generalise?
      Idea: for both strings
      - replace pairs of letters,
      - replace (maximal) blocks of the same letter.
      When every letter is compressed, the length reduces by half in an iteration.
      TODO:
      - formalise
      - for SLPs
      - for pattern matching
      - running time

  26. Formalisation
      In one phase:
        L ← list of letters, P ← list of pairs of letters
        for every letter a ∈ L do
          replace (maximal) blocks $a^\ell$ with $a_\ell$
        for every pair of letters ab ∈ P do
          replace pairs ab with c
      It will shorten the strings by a constant factor.
      Loop while nontrivial ($O(\log M)$ iterations).
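
The phase above can be sketched on explicit (uncompressed) strings as follows. This is only an illustration of the loop, not the SLP-based procedure of the talk: fresh letters are represented by nested tuples such as `("block", a, 3)` and `("pair", a, b)`, and pairs are processed in order of first appearance; both choices, and all function names, are assumptions.

```python
def compress_all_blocks(word):
    """Replace every maximal block a^l with l >= 2 by the symbol ("block", a, l)."""
    out, i = [], 0
    while i < len(word):
        j = i
        while j < len(word) and word[j] == word[i]:
            j += 1
        out.append(word[i] if j - i == 1 else ("block", word[i], j - i))
        i = j
    return out


def replace_pair(word, a, b):
    """Replace occurrences of the pair a, b (a != b) by ("pair", a, b), left to right."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
            out.append(("pair", a, b))
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def phase(words):
    """One phase, applied to all words in exactly the same way."""
    words = [compress_all_blocks(w) for w in words]
    pairs, seen = [], set()
    for w in words:  # P: pairs present after block compression
        for p in zip(w, w[1:]):
            if p not in seen:
                seen.add(p)
                pairs.append(p)
    for a, b in pairs:
        words = [replace_pair(w, a, b) for w in words]
    return words


def equal_by_recompression(u, v):
    """Compress both strings identically until each has at most one symbol."""
    words, phases = [list(u), list(v)], 0
    while max(len(w) for w in words) > 1:
        words = phase(words)
        phases += 1
    return words[0] == words[1], phases


print(equal_by_recompression("aaababcababbabcba", "aaababcababbabcba"))
print(equal_by_recompression("aaababcababbabcba", "aaababcababbabcab"))
```

Because every symbol records what it replaced, each word can be expanded back to its original, so two words compress to the same symbol only if they were equal to begin with.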

  29. SLPs
      Grammar form: more general rules $X_i \to u X_j v X_k w$, with $j, k < i$.
      Lemma: there are at most $|G| + 4n$ different maximal lengths of blocks in G.
      Proof:
      - blocks contained in explicit words: assign to explicit letters
      - blocks not contained in explicit words: at most 4 per rule
      Lemma: there are at most $|G| + 4n$ different pairs of letters in G.

  37. Blocks compression
      Compression of a:
      - $X_1 \to baaba$, $X_2 \to aa X_1 ba X_1 baa$ (no problem)
      - $X_1 \to a$, $X_2 \to a X_1 a X_1 a$ (problem)
      - $X_1 \to abaaba$, $X_2 \to a X_1 a X_1 a$ (problem)
      Definition (Crossing block): a has a crossing block if one of its maximal blocks is contained in $X_i$ but not in the explicit words of $X_i$'s rule.
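
A possible check for crossing blocks, under one reading of the definition: a block of a crosses when it extends over the boundary between a nonterminal and its neighbour in some rule. The sketch assumes every nonterminal generates a nonempty word and that rules only refer to smaller indices; the list-based rule format and the names `boundary_letters` and `has_crossing_block` are hypothetical.

```python
def boundary_letters(rules):
    """First and last letter of the word generated by every nonterminal."""
    first, last = {}, {}
    for i in sorted(rules):  # rules only refer to smaller indices
        head, tail = rules[i][0], rules[i][-1]
        first[i] = head if isinstance(head, str) else first[head]
        last[i] = tail if isinstance(tail, str) else last[tail]
    return first, last


def has_crossing_block(rules, letter):
    """True if some block of `letter` spans a nonterminal boundary in some rule."""
    first, last = boundary_letters(rules)
    for rhs in rules.values():
        for x, y in zip(rhs, rhs[1:]):
            if isinstance(x, str) and isinstance(y, str):
                continue  # both explicit: the block stays inside an explicit word
            left = x if isinstance(x, str) else last[x]
            right = y if isinstance(y, str) else first[y]
            if left == letter and right == letter:
                return True
    return False


# The three grammars from the slide; letters are strings, nonterminals are ints.
g1 = {1: list("baaba"), 2: ["a", "a", 1, "b", "a", 1, "b", "a", "a"]}
g2 = {1: ["a"], 2: ["a", 1, "a", 1, "a"]}
g3 = {1: list("abaaba"), 2: ["a", 1, "a", 1, "a"]}
print([has_crossing_block(g, "a") for g in (g1, g2, g3)])
```

On the slide's three grammars this agrees with the "no problem", "problem", "problem" annotations.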
