suffix tree and suffix array

Suffix tree and Suffix array. Karatsuba. CS214: Algorithms and Complexity, Shanghai Jiao Tong University, 2016.12.22. Q: How to find a match of S in a target DNA sequence? [Figure: the pattern S and the target DNA sequence.]


  1. Application with Suffix Tries (2): find the longest common substring of S and t. S = abaa$, t = babaa. Max common (so far) = ba. [Figure: suffix trie of S.]

  2. Application with Suffix Tries (2): find the longest common substring of S and t. S = abaa$, t = babaa. Max common = abaa. [Figure: suffix trie of S.]
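
The search on the two slides above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function names are made up here, the trie is a plain nested dictionary, and the construction is the naive quadratic one. Each suffix of t is walked down the suffix trie of S, and the deepest walk is kept.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a nested-dict trie (quadratic time and space)."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def longest_common_substring(s, t):
    """Walk each suffix of t down the suffix trie of s and keep the deepest match."""
    trie = build_suffix_trie(s)
    best = ""
    for i in range(len(t)):
        node, depth = trie, 0
        while i + depth < len(t) and t[i + depth] in node:
            node = node[t[i + depth]]
            depth += 1
        if depth > len(best):
            best = t[i:i + depth]
    return best


print(longest_common_substring("abaa$", "babaa"))  # abaa
```

Starting at t[0] the walk matches "ba" and then fails; starting at t[1] it matches "abaa", which is the same progression the two slides show.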

  3. Constructing Suffix Tries

  4. To build a suffix trie of S: first build a suffix trie of S[0], then add the characters one by one into the suffix trie.

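  5. [Figure: the trie before any character is added, root r only.]

  6. S = abaa$, prefix read: a. [Figure: root r with an edge a.]

  7. S = abaa$, prefix read: a. Suffix: a.

  8. S = abaa$, prefix read: ab. Suffix: ab.

  9. S = abaa$, prefix read: ab. Suffix: ab, b.

  10. S = abaa$, prefix read: ab. Suffix: ab, b.

  11. S = abaa$, prefix read: ab. Suffix: ab, b.

  12. S = abaa$, prefix read: aba. Suffix: aba, b.

  13. S = abaa$, prefix read: aba. Suffix: aba, ba.

  14. S = abaa$, prefix read: aba. Suffix: aba, ba.

  15. S = abaa$, prefix read: aba. Suffix: aba, ba, a.

  16. S = abaa$, prefix read: aba. Suffix: aba, ba, a.

  17. S = abaa$, prefix read: abaa$. Suffix: abaa$, baa$, aa$, a$, $. [Figure: the complete suffix trie rooted at r.]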
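
The character-by-character construction stepped through above can be sketched as follows. This is an illustrative version under assumptions: the names `build_suffix_trie_online` and `leaf_paths` are invented for this example, and the trie is a nested dictionary. The list `active` holds the node reached by each suffix of the prefix read so far, with the root standing for the empty suffix.

```python
def build_suffix_trie_online(s):
    """Grow the suffix trie one character at a time."""
    root = {}
    active = [root]  # one node per suffix of the prefix read so far (root = empty suffix)
    for ch in s:
        # Extend every current suffix by ch, creating the child only if it is missing.
        extended = [node.setdefault(ch, {}) for node in active]
        # The empty suffix is always available for the next character.
        active = extended + [root]
    return root


def leaf_paths(node, prefix=""):
    """Return the root-to-leaf strings of the trie in sorted order."""
    if not node:
        return [prefix]
    paths = []
    for ch in sorted(node):
        paths.extend(leaf_paths(node[ch], prefix + ch))
    return paths


trie = build_suffix_trie_online("abaa$")
print(leaf_paths(trie))  # ['$', 'a$', 'aa$', 'abaa$', 'baa$']
```

Because `$` occurs only at the end of S, every suffix ends at its own leaf, so the leaf paths are exactly the five suffixes of abaa$.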

  18. How many nodes can a suffix trie have?
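
One way to explore this question: a suffix trie has one node per distinct substring of S, plus the root, so the count can grow quadratically in the length of S. The sketch below (function names are assumptions of this example, and the construction is the naive one) counts nodes for abaa$ and for strings of the form a^n b^n $, a standard family that triggers quadratic growth.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a nested-dict trie."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Count this node plus all of its descendants (recursive, fine for short strings)."""
    return 1 + sum(count_nodes(child) for child in node.values())


print(count_nodes(build_suffix_trie("abaa$")))  # 14

# Node counts for a^n b^n $ grow roughly like n squared.
for n in (5, 10, 20):
    s = "a" * n + "b" * n + "$"
    print(n, len(s), count_nodes(build_suffix_trie(s)))
```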

  19. Space-Efficient Suffix Trees

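  20. A More Compact Representation: S = abaa$. [Figure: the suffix trie of S, one character per edge.]

  21. A More Compact Representation: S = abaa$, positions 1 to 5. Chains of single-child nodes are merged, so each edge carries a substring of S. Edges from the root r: $, a, baa$. Edges below a: a$, baa$, $.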

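  22. A More Compact Representation: S = abaa$, positions 1 to 5. Each edge label is stored as a start:end index range into S. Edges from the root r: 5:5 ($), 3:3 (a), 2:5 (baa$). Edges below a: 4:5 (a$), 2:5 (baa$), 5:5 ($).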
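
The compact representation can be tried out with a small sketch. It is a naive quadratic construction written for illustration, not a linear-time algorithm, and the names `build_suffix_tree` and `edge_labels` are hypothetical. It assumes S ends with a terminator that occurs nowhere else. Edges are stored as 0-indexed half-open ranges internally and printed as 1-indexed inclusive ranges.

```python
def build_suffix_tree(s):
    """Naive construction; each edge is (start, end, child) and spells s[start:end]."""
    root = {}
    for i in range(len(s)):
        node, pos = root, i
        while pos < len(s):
            ch = s[pos]
            if ch not in node:
                node[ch] = (pos, len(s), {})  # new leaf edge for the rest of the suffix
                break
            start, end, child = node[ch]
            k = 0  # number of characters matched along this edge
            while start + k < end and pos + k < len(s) and s[start + k] == s[pos + k]:
                k += 1
            if start + k == end:
                node, pos = child, pos + k  # edge fully matched, descend
            else:
                # Mismatch inside the edge: split it at offset k.
                mid = {s[start + k]: (start + k, end, child)}
                node[ch] = (start, start + k, mid)
                node, pos = mid, pos + k
    return root


def edge_labels(node, s):
    """List (range, text) for every edge, depth-first, ranges 1-indexed inclusive."""
    out = []
    for ch in sorted(node):
        start, end, child = node[ch]
        out.append((f"{start + 1}:{end}", s[start:end]))
        out.extend(edge_labels(child, s))
    return out


s = "abaa$"
for rng, text in edge_labels(build_suffix_tree(s), s):
    print(rng, text)
```

Any occurrence of a label in S gives a valid range. This sketch records the first occurrence it meets, so the edge "a" comes out as 1:1 here, which spells the same character as 3:3.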

  23. How to construct a suffix tree in linear time? Further reading: Ukkonen's algorithm.

  24. Suffix arrays

  25. Suffix Array Example: str = cattcat$. Sort the suffixes alphabetically ($ sorts before every letter).

      Suffixes by start position: 1 cattcat$, 2 attcat$, 3 ttcat$, 4 tcat$, 5 cat$, 6 at$, 7 t$, 8 $
      Sorted (the suffix array): 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$
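
The example can be reproduced with a few lines of Python, taking the string to be cattcat$, which is what the listed suffixes spell. This is a minimal sketch (the function name is an assumption) that sorts the suffixes directly; it is simple but not the fastest way to build a suffix array. Positions are 1-indexed to match the slide.

```python
def suffix_array(s):
    """Return the 1-indexed start positions of the suffixes of s in sorted order."""
    # '$' compares lower than any letter in ASCII, so the terminator sorts first.
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


s = "cattcat$"
sa = suffix_array(s)
print(sa)  # [8, 6, 2, 5, 1, 7, 4, 3]
for i in sa:
    print(i, s[i - 1:])
```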

  26. Suffix Arrays: what can we do with this? 1. Counting: how many times does 'at' occur? All the suffixes that start with 'at' will be next to each other in the array, so binary search to find 'at'. Suffix array: 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$
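
The counting idea can be sketched with two binary searches, one for each end of the block of suffixes that start with the pattern. The function name and the 0-indexed positions are choices made for this example. Each suffix is compared through its first len(pattern) characters only, which keeps the sorted order intact.

```python
def count_occurrences(s, sa, pattern):
    """Count occurrences of pattern in s; sa holds 0-indexed suffix starts in sorted order."""
    def prefix(start):
        return s[start:start + len(pattern)]

    # Lower bound: first suffix whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(sa[mid]) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(sa[mid]) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


s = "cattcat$"
sa = sorted(range(len(s)), key=lambda i: s[i:])
print(count_occurrences(s, sa, "at"))  # 2
```

The two matches are the adjacent rows at$ and attcat$ of the array.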

  27. Suffix Arrays: what can we do with this? 2. K-mer counting: find the k-length substrings that occur exactly i times. Suffix array: 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$

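  28. Suffix Arrays: K = 2. Rows, with CurrentCount in parentheses where shown: 8 $ (1), 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$

  29. Suffix Arrays: K = 2. Rows, with CurrentCount in parentheses where shown: 8 $ (1), 6 at$ (1), 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$

  30. Suffix Arrays: K = 2. Rows, with CurrentCount in parentheses where shown: 8 $ (1), 6 at$ (1), 2 attcat$ (2), 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$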
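
The scan shown on these slides can be sketched as a single pass over the suffix array that keeps a running count for the current k-mer. The function name is hypothetical. One assumption here may differ from the slides: any window that is shorter than k or that contains the terminator is skipped, so the `$` row is not counted.

```python
def kmer_counts(s, k, terminator="$"):
    """Count each k-length substring of s by scanning its sorted suffixes once."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    counts = {}
    current, current_count = None, 0
    for start in sa:
        kmer = s[start:start + k]
        if len(kmer) < k or terminator in kmer:
            continue  # assumption: windows touching the terminator are not k-mers
        if kmer == current:
            current_count += 1  # same k-mer as the previous row
        else:
            if current is not None:
                counts[current] = current_count
            current, current_count = kmer, 1
    if current is not None:
        counts[current] = current_count
    return counts


counts = kmer_counts("cattcat$", 2)
print(counts)  # {'at': 2, 'ca': 2, 'tc': 1, 'tt': 1}

# k-mers that occur exactly i times, here i = 2
i = 2
print([kmer for kmer, c in counts.items() if c == i])  # ['at', 'ca']
```

Equal k-mers always sit in adjacent rows of the suffix array, which is why a single running counter is enough.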
