Application with Suffix Tries (2)
Find the longest common substring of S = abaa$ and t = babaa.
Walk each suffix of t down the suffix trie of S as far as it matches: starting at t[0] the walk matches ba, and starting at t[1] it matches abaa.
Max common = abaa
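A minimal Python sketch of this slide's method, assuming the suffix trie is stored as nested dicts. The names build_suffix_trie and longest_common_substring are hypothetical, not from the slides. Every substring of S is a path from the root, so walking each suffix of t down the trie as far as it matches, and keeping the longest walk, finds the longest common substring. With t = babaa, the walk from t[0] stops after ba and the walk from t[1] matches all of abaa.

```python
def build_suffix_trie(s: str) -> dict:
    """Naive suffix trie: insert every suffix of s into nested dicts."""
    root: dict = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def longest_common_substring(s: str, t: str) -> str:
    """Walk each suffix of t down the suffix trie of s; the deepest walk wins."""
    trie = build_suffix_trie(s + "$")
    best = ""
    for start in range(len(t)):
        node, length = trie, 0
        # Follow t[start:] through the trie until a character has no matching edge.
        while start + length < len(t) and t[start + length] in node:
            node = node[t[start + length]]
            length += 1
        if length > len(best):
            best = t[start:start + length]
    return best


print(longest_common_substring("abaa", "babaa"))  # abaa
```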
Constructing Suffix Tries
To build the suffix trie of S, first build the suffix trie of S[0], then add the remaining characters into the suffix trie one by one.
Example: S = abaa$, root r. After each character is added, the trie contains every suffix of the prefix read so far.
Prefix a: suffix a
Prefix ab: suffixes ab, b
Prefix aba: suffixes aba, ba, a
Prefix abaa$: suffixes abaa$, baa$, aa$, a$, $
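The incremental procedure can be sketched in Python as follows, assuming the trie is stored as nested dicts (build_suffix_trie_online and leaf_paths are hypothetical names). The list ends holds the node where each suffix of the current prefix ends; appending a character extends each of those suffixes by one edge and starts a new one-character suffix at the root, reusing an edge when it already exists, as happens for the suffix a when the prefix is aba. This naive version takes O(n²) time.

```python
def build_suffix_trie_online(s: str) -> dict:
    """Build the suffix trie of s by adding one character at a time.

    Hypothetical helper: the trie is a nested dict mapping a character to a child node.
    """
    root: dict = {}
    # ends[k] is the node where the k-th longest suffix of the prefix read so far ends.
    ends: list = []
    for ch in s:
        # Extend every current suffix by ch, then start the new one-character
        # suffix at the root. setdefault reuses an edge that already exists.
        ends = [node.setdefault(ch, {}) for node in ends + [root]]
    return root


def leaf_paths(node: dict, prefix: str = "") -> list:
    """Root-to-leaf labels in sorted order (one per suffix when s ends in $)."""
    if not node:
        return [prefix]
    paths = []
    for ch in sorted(node):
        paths.extend(leaf_paths(node[ch], prefix + ch))
    return paths


print(leaf_paths(build_suffix_trie_online("abaa$")))
# ['$', 'a$', 'aa$', 'abaa$', 'baa$']
```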
How many nodes can a suffix trie have?
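One way to explore the question, as a sketch: every trie node other than the root corresponds to exactly one distinct substring of S, so the node count is (number of distinct substrings) + 1, which is at most 1 + n(n+1)/2 for |S| = n. The hypothetical helper count_trie_nodes below counts nodes for strings of the form a^k b^k $, which have k² + 4k + 1 distinct substrings, so the trie grows quadratically in the string length.

```python
def count_trie_nodes(s: str) -> int:
    """Count the nodes of the (uncompressed) suffix trie of s, root included."""
    root: dict = {}
    count = 1  # the root
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            if ch not in node:
                node[ch] = {}
                count += 1  # one new node per new distinct substring
            node = node[ch]
    return count


print(count_trie_nodes("abaa$"))  # 14
for k in (10, 20, 40):
    s = "a" * k + "b" * k + "$"
    print(len(s), count_trie_nodes(s))
# 21 142
# 41 482
# 81 1762
```

Doubling the length roughly quadruples the node count, which is the Θ(n²) worst case that motivates the compact representation.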
Space-Efficient Suffix Trees
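A More Compact Representation
S = abaa$, positions 1 2 3 4 5
Start from the suffix trie of S (one character per edge from the root r).
Merge every chain of single-child nodes into one edge labeled by a substring: r has edges $, a and baa$; the node reached by a has edges baa$, a$ and $.
Replace each substring label by the index pair i:j of S[i..j], so every edge takes constant space: $ = 5:5, a = 3:3, baa$ = 2:5, a$ = 4:5.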
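The compact tree can be sketched in Python as below. This naive builder (Node, build_suffix_tree and edge_labels are hypothetical names) inserts one suffix at a time, splits an edge where the new suffix diverges, and stores each edge as a 0-based half-open pair (start, end) that is printed in the 1-based i:j form. It runs in O(n²) time, unlike Ukkonen's linear-time algorithm, and assumes S ends with a unique $. Any occurrence of a label is a valid interval, so the sketch prints 1:1 for the edge a where the slide uses 3:3.

```python
class Node:
    def __init__(self) -> None:
        # first character of edge label -> (start, end, child); label is s[start:end]
        self.children: dict = {}


def build_suffix_tree(s: str) -> Node:
    """Naive O(n^2) compact suffix tree; assumes s ends with a unique '$'."""
    root = Node()
    for i in range(len(s)):
        node, j = root, i  # inserting suffix s[i:]; s[i:j] is already matched
        while True:
            if s[j] not in node.children:
                # No edge starts with s[j]: hang the rest of the suffix as a leaf edge.
                node.children[s[j]] = (j, len(s), Node())
                break
            start, end, child = node.children[s[j]]
            k = 0
            while k < end - start and s[start + k] == s[j + k]:
                k += 1
            if k == end - start:
                node, j = child, j + k  # whole edge matched: continue below it
                continue
            # Mismatch inside the edge: split it after k characters.
            mid = Node()
            node.children[s[j]] = (start, start + k, mid)
            mid.children[s[start + k]] = (start + k, end, child)
            mid.children[s[j + k]] = (j + k, len(s), Node())
            break
    return root


def edge_labels(node: Node, s: str) -> list:
    """(substring, 'i:j') for every edge in sorted DFS order; i:j is 1-based, inclusive."""
    out = []
    for c in sorted(node.children):
        start, end, child = node.children[c]
        out.append((s[start:end], f"{start + 1}:{end}"))
        out.extend(edge_labels(child, s))
    return out


for label, span in edge_labels(build_suffix_tree("abaa$"), "abaa$"):
    print(span, label)
```

Since every internal node except possibly the root has at least two children and there are n leaves, the tree has O(n) nodes; with two integers per edge, the whole structure takes O(n) space instead of the trie's O(n²).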
How can a suffix tree be constructed in linear time? Further reading: Ukkonen's algorithm.
Suffix Arrays
Suffix Array Example
str = cattcat$
Sort the suffixes alphabetically ($ comes before every letter):
unsorted          sorted (suffix array)
1 cattcat$        8 $
2 attcat$         6 at$
3 ttcat$          2 attcat$
4 tcat$           5 cat$
5 cat$            1 cattcat$
6 at$             7 t$
7 t$              4 tcat$
8 $               3 ttcat$
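A minimal sketch of the sorting step in Python, using the string cattcat$ whose suffixes the table lists. It simply sorts start positions by the suffix each one begins, which costs O(n² log n) in the worst case; faster constructions exist but are not shown. The name suffix_array is hypothetical, and positions are 1-based to match the table. Python compares strings by code point, and '$' sorts before every lowercase letter.

```python
def suffix_array(s: str) -> list:
    """Naive suffix array: 1-based start positions sorted by their suffixes."""
    return [i + 1 for i in sorted(range(len(s)), key=lambda i: s[i:])]


s = "cattcat$"
for pos in suffix_array(s):
    print(pos, s[pos - 1:])
# 8 $
# 6 at$
# 2 attcat$
# 5 cat$
# 1 cattcat$
# 7 t$
# 4 tcat$
# 3 ttcat$
```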
Suffix Arrays
What can we do with this?
1. Counting: how many times does 'at' occur?
All the suffixes that start with 'at' will be next to each other in the array, so a binary search can find where that run begins and ends.
Sorted suffixes: 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$. The run for 'at' is 6 at$, 2 attcat$, so 'at' occurs twice.
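The counting idea can be sketched with two binary searches, one for the first suffix whose first |p| characters are at least p and one for the first suffix where they exceed p; the count is the distance between them. This is a minimal Python sketch with hypothetical names (suffix_array, count_occurrences); each probe compares at most |p| characters, so a query costs O(|p| log n).

```python
def suffix_array(s: str) -> list:
    """0-based start positions of the suffixes of s in sorted order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def count_occurrences(s: str, sa: list, p: str) -> tuple:
    """Return (count, sorted 1-based positions) of pattern p via two binary searches."""
    m = len(p)
    # Lower bound: first suffix whose first m characters are >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose first m characters are > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return lo - first, sorted(i + 1 for i in sa[first:lo])


s = "cattcat$"
print(count_occurrences(s, suffix_array(s), "at"))  # (2, [2, 6])
```

Truncating sorted suffixes to their first m characters keeps them in non-decreasing order, which is why binary search on the truncated prefixes is valid.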
Suffix Arrays
What can we do with this?
2. K-mer counting: find the k-length substrings that occur exactly i times.
Suffix Arrays
K = 2
Scan the sorted suffixes: CurrentCount restarts at 1 when the first K characters change and increases by 1 when they repeat.
SA  suffix     CurrentCount
8   $          1
6   at$        1
2   attcat$    2
5   cat$
1   cattcat$
7   t$
4   tcat$
3   ttcat$
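The scan above can be sketched in Python as one pass over the suffix array with a single counter, mirroring CurrentCount: equal k-mers are adjacent in sorted order, so the counter only has to compare each k-mer with the previous one. The names kmer_counts and kmers_occurring are hypothetical, and, as an assumption of this sketch, k-mers that would include the terminator $ are skipped (the table above still shows a count of 1 next to $).

```python
def kmer_counts(s: str, k: int) -> dict:
    """Count every k-mer of s with one scan over its suffix array.

    Assumption: k-mers containing the terminator '$' are ignored.
    """
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    counts: dict = {}
    prev, current_count = None, 0
    for i in sa:
        kmer = s[i:i + k]
        if len(kmer) < k or "$" in kmer:
            continue  # suffix too short, or k-mer runs into '$'
        # Equal k-mers are adjacent in the suffix array, so one counter suffices.
        current_count = current_count + 1 if kmer == prev else 1
        prev = kmer
        counts[kmer] = current_count
    return counts


def kmers_occurring(s: str, k: int, times: int) -> list:
    """k-mers of s that occur exactly `times` times."""
    return sorted(m for m, c in kmer_counts(s, k).items() if c == times)


s = "cattcat$"
print(kmer_counts(s, 2))         # {'at': 2, 'ca': 2, 'tc': 1, 'tt': 1}
print(kmers_occurring(s, 2, 2))  # ['at', 'ca']
```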