String Matching with Variable Length Gaps

  1. String Matching with Variable Length Gaps. By Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj and David Kofoed Wind. Presented by Hjalte Wedel Vildhøj, October 13, 2010. SPIRE 2010, Los Cabos, Mexico.

  2. The Variable Length Gap Problem. Given a string $T \in \Sigma^+$ and a variable length gap pattern $P = P_1 \cdot g\{a_1, b_1\} \cdot P_2 \cdot g\{a_2, b_2\} \cdots g\{a_{k-1}, b_{k-1}\} \cdot P_k$, find the end positions of all occurrences of $P$ in $T$.

  3. The Variable Length Gap Problem. Given a string $T \in \Sigma^+$ and a variable length gap pattern $P = P_1 \cdot g\{a_1, b_1\} \cdot P_2 \cdot g\{a_2, b_2\} \cdots g\{a_{k-1}, b_{k-1}\} \cdot P_k$, where the gap $g\{a_1, b_1\}$ stands for some $x \in \Sigma^*$ such that $a_1 \le |x| \le b_1$, find the end positions of all occurrences of $P$ in $T$.

  4. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. Solution: (none found yet).

  5. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. An occurrence with gap lengths 6 and 6 has end position 17 in $T$. Solution: 17.

  6. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. An alignment with gap lengths 8 and 6 is not a valid match, because the first gap exceeds its upper bound $b_1 = 7$. Solution: 17.

  7. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. An occurrence with gap lengths 6 and 6 has end position 28 in $T$. Solution: 17, 28.

  8. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. A second occurrence, with gap lengths 7 and 5, has the same end position 28 in $T$, so it adds nothing new. Solution: 17, 28.

  9. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. An occurrence with gap lengths 6 and 3 has end position 31 in $T$. Solution: 17, 28, 31.

  10. The Variable Length Gap Problem. Example: $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$ and $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT. Solution: 17, 28, 31.
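The example on the slides above can be checked with a small brute-force search. This is only a sketch for verifying the definition, not the algorithm presented in the talk; the function name `vlg_end_positions` and its argument layout (a list of subpatterns plus a list of `(a_i, b_i)` gap bounds) are assumptions made for illustration.

```python
def vlg_end_positions(text, parts, gaps):
    """Return the sorted 1-based end positions of all occurrences.

    parts: the subpatterns P_1..P_k (non-empty strings)
    gaps:  the k-1 pairs (a_i, b_i) of gap bounds
    """
    # 0-based indices where the next subpattern is allowed to begin
    starts = set(range(len(text)))
    ends = set()
    for i, part in enumerate(parts):
        # Exclusive 0-based end index, which equals the 1-based end position
        ends = {s + len(part) for s in starts if text.startswith(part, s)}
        if i < len(gaps):
            a, b = gaps[i]
            # Every gap length between a and b yields one allowed start
            starts = {e + g for e in ends for g in range(a, b + 1)}
    return sorted(ends)


T = "ATCGGCTCCAGACCAGTACCCGTTCCGTGGT"
print(vlg_end_positions(T, ["A", "CC", "GT"], [(6, 7), (2, 6)]))  # → [17, 28, 31]
```

The search keeps a set of allowed start positions per subpattern, so the invalid alignment with a first gap of length 8 is never generated.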

  11. A Closer Look At The Problem. $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$. [Figure: the text $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT with positions 1 to 31; markers above it show the occurrences of $P_1$ (5), $P_2$ (5) and $P_3$ (4).]

  12. A Closer Look At The Problem. Parameters: $n = |T|$, $\alpha$ = number of occurrences of $P_1, P_2, \ldots, P_k$ in $T$, $m = \sum_{i=1}^{k} |P_i|$, $A = \sum_{i=1}^{k} a_i$. [Figure: the text $T$ with positions 1 to 31 and the occurrences of $P_1$, $P_2$ and $P_3$ for $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$.]

  13. A Closer Look At The Problem. Parameters: $n = |T|$, $\alpha$ = number of occurrences of $P_1, P_2, \ldots, P_k$ in $T$, $m = \sum_{i=1}^{k} |P_i|$, $A = \sum_{i=1}^{k} a_i$. Known upper bounds:
      By                    Time                                                  Space
      Bille & Thorup [1]    $O(n(k \frac{\log w}{w} + \log k) + m \log m + A)$    $O(m + A)$
      Morgante et al. [2]   $O((n + m) \log k + \alpha)$                          $O(m + \alpha)$
      [1] P. Bille and M. Thorup. Regular expression matching with multi-strings and intervals. In Proc. 21st SODA, 2010.
      [2] M. Morgante, A. Policriti, N. Vitacolonna, and A. Zuccolo. Structured motifs search. J. Comput. Biol., 12(8):1065-1082, 2005.

  14. A Closer Look At The Problem. Known upper bounds:
      By                    Time                                                  Space
      Bille & Thorup        $O(n(k \frac{\log w}{w} + \log k) + m \log m + A)$    $O(m + A)$
      Morgante et al.       $O((n + m) \log k + \alpha)$                          $O(m + \alpha)$
      Can you get the best of both?
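To make the parameters concrete, the following sketch evaluates them for the running example. The helper `count_occurrences` and the variable names are illustrative assumptions; overlapping occurrences are counted separately, and $A$ is summed over the gaps that the example pattern actually has.

```python
def count_occurrences(text, part):
    """Count occurrences of part in text, including overlapping ones."""
    last_start = len(text) - len(part)
    return sum(text.startswith(part, s) for s in range(last_start + 1))


text = "ATCGGCTCCAGACCAGTACCCGTTCCGTGGT"
parts = ["A", "CC", "GT"]          # P_1, P_2, P_3
gaps = [(6, 7), (2, 6)]            # (a_1, b_1), (a_2, b_2)

n = len(text)                                            # length of T
m = sum(len(p) for p in parts)                           # total subpattern length
alpha = sum(count_occurrences(text, p) for p in parts)   # all subpattern occurrences
A = sum(a for a, _ in gaps)                              # sum of lower gap bounds

print(n, m, alpha, A)  # → 31 5 14 8
```

Here $\alpha = 5 + 5 + 4 = 14$, which agrees with the number of $P_1$, $P_2$ and $P_3$ markers in the figure.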

  15. Illustrating the Algorithm. $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$. [Figure: animation over the text $T$ = ATCGGCTCCAGACCAGTACCCGTTCCGTGGT with positions 1 to 31, the occurrences of $P_1$, $P_2$ and $P_3$ marked above it, and the lists $L_2$ and $L_3$ shown below it.]

  26. Illustrating the Algorithm. $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$. [Figure: same layout as the earlier animation frames, with one entry labelled "dead" next to the lists $L_2$ and $L_3$.]

  28. Illustrating the Algorithm. $P = A \cdot g\{6,7\} \cdot CC \cdot g\{2,6\} \cdot GT$. [Figure: last animation frame, with the text $T$, the occurrences of $P_1$, $P_2$ and $P_3$, and the lists $L_2$ and $L_3$; no entry is labelled "dead".]
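The animation frames did not survive extraction, so the following is only one plausible reading of the lists $L_2$, $L_3$ and the "dead" label, not a transcription of the presented algorithm. It assumes that each list $L_i$ stores ranges of positions where $P_i$ may start, that a range is dead once the scan has passed it, and it finds subpattern occurrences by naive comparison rather than with a multi-pattern matcher. The name `vlg_scan` is hypothetical.

```python
from collections import deque


def vlg_scan(text, parts, gaps):
    """Scan text left to right and report 1-based end positions of the pattern."""
    k = len(parts)
    # lists[i] holds sorted, disjoint ranges [lo, hi] of 1-based positions
    # where parts[i] is allowed to start (lists[0] is unused)
    lists = [deque() for _ in range(k)]
    result = []
    for j in range(1, len(text) + 1):            # j is a 1-based end position
        for i, part in enumerate(parts):
            start = j - len(part) + 1
            if start < 1 or text[start - 1:j] != part:
                continue                         # parts[i] does not end at j
            if i > 0:
                ranges = lists[i]
                # Dead ranges end before this start, and later starts only grow
                while ranges and ranges[0][1] < start:
                    ranges.popleft()
                if not ranges or ranges[0][0] > start:
                    continue                     # no earlier subpattern reaches here
            if i == k - 1:
                result.append(j)                 # a full occurrence ends at j
            else:
                a, b = gaps[i]
                lo, hi = j + a + 1, j + b + 1    # allowed starts of parts[i + 1]
                nxt = lists[i + 1]
                if nxt and nxt[-1][1] >= lo - 1:
                    nxt[-1][1] = max(nxt[-1][1], hi)   # merge with the last range
                else:
                    nxt.append([lo, hi])
    return result


T = "ATCGGCTCCAGACCAGTACCCGTTCCGTGGT"
print(vlg_scan(T, ["A", "CC", "GT"], [(6, 7), (2, 6)]))  # → [17, 28, 31]
```

On the running example, the range $[12, 16]$ in the list for $P_3$ is dropped when the occurrence of GT starting at position 22 is examined, which is the kind of event a "dead" label could mark. Because ranges are appended in increasing order and merged when they touch, only the front of each list has to be inspected per occurrence.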
