algorithms in bioinformatics a practical introduction

Algorithms in Bioinformatics: A Practical Introduction Suffix tree - PowerPoint PPT Presentation

Algorithms in Bioinformatics: A Practical Introduction Suffix tree Overview What is suffix tree? Simple application of suffix tree Linear time algorithm for constructing suffix tree Suffix array FM-index 1-mismatch search


  1. Common substrings of more than 2 strings (VI) By traversing T, for each internal node v, compute C(v). [C(v) 3. is defined as the number of distinct termination symbols in the subtree rooted at v] This step takes O(Kn) time.  3 c a % # $ g 3 3 4 5 5 g c g a 3 3 2 c g # % a c a c c $ $ # $ $ # # % % 4 3 3 1 2 1 4 2 3 2 1

  2. Common substrings of more than 2 strings (VII) Traverse T and visit every internal node v. For each v, if V(C(v)) < 4. string-depth of v, set V(C(v)) = string-depth of v. [After step 4, V(k) = the length of the longest substring common to exactly k of these strings.] l(k)=V(k). For i=k-1 downto 2, l(i)=max{l(i+1), V(i)} . 5. This two steps take O(n) time.  For our example, V(2) = 3, V(3) = 2.  Thus, l(3) = 2, l(2) = 3.  In total, this algorithm takes O(Kn) time.  Actually, we can improve this algorithm to O(n) time by mean  of lcp!

  3. Linear time algorithm for constructing suffix tree

  4. Straightforward construction of suffix tree  Consider S = s 1 s 2 … s n where s n =$  Algorithm:  Initialize the tree with only a root  For i = n to 1  Includes S[i..n] into the tree  Time: O(n 2 )

  5. Example of construction  S=acca$ Init For-loop c c a c a a a $ a $ $ $ c $ $ $    c  c $ $ a c a a a $ $ a $ $ $ $ 4 4 5 5 5 3 4 3 2 4 1 3 2 5 5 I 4 I 3 I 5 I 2 I 1

  6. Construction of generalized suffix tree  S ’ = c# Init For-loop c c c a a a c c c $ # $ # $ c  c  c c a c a c a a a a # $ $ a $ $ a $ $ a $ $ $ $ $ $ 4 1 3 2 2 4 1 3 2 2 4 1 3 2 1 5 5 5 I 1 J 2 J 1

  7. Can we construct a suffix tree in o(n 2 ) time? Yes. We can construct it in O(n) time.  Weiner ’ s algorithm [1973]  Linear time for constant size alphabet, but much space  McGreight ’ s algorithm [JACM 1976]  Linear time for constant size alphabet, quadratic space  Ukkonen ’ s algorithm [Algorithmica, 1995]  Online algorithm, linear time for constant size alphabet, less space  Farach ’ s algorithm [FOCS 1997]  Linear time for general alphabet  Hon,Sadakane, and Sung ’ s algorithm [FOCS 2003]  O(n) bit space O(n log e n) time for 0<e<1  O(n) bit space O(n) time for suffix array construction  We will discuss Farach ’ s algorithm later. 

  8. Idea  Build Odd Suffix Tree and Even Suffix Tree  Then, merge odd and even suffix tree. $ a g $ a g ca ca $ $ 7 6 7 6 g g c c a a $ $ c c a a g g g g 5 5 $ $ c c $ $ a a g g g g $ $ 4 4 $ $ 2 2 1 3 1 3 Even Suffix Tree Odd Suffix Tree

  9. Idea Input: a string S of length n  Recursively compute the suffix tree T o of all 1. suffixes beginning at the odd positions. T o is of size n/2.  From T o , compute T e which is the suffix tree 2. for all suffixes beginning at the even positions. Merge T o and T e to form the suffix tree for 3. S.

  10. Stage 1: Constructing odd suffix tree  Given a string S[1..n], we generate a new string S ’ [1..n/2] as follows.  we map pairs of characters into single characters as follows:  S[1..2], S[3..4], S[5..6], … , S[n-1..n].  Remove the duplicates from the pairs of characters and sort them by radix sort.  S ’ [i] = rank of S[2i-1..2i] in the sorted list, for i=1, 2, … , n/2.  By recursion, we get the suffix tree T ’ for S ’  Convert T ’ to the odd suffix tree T o .

  11. Example (I)  S = aaabbbabbaba$  S[1..2]=aa, S[3..4]=ab, S[5..6]=bb, S[7..8]=ab, S[9..10]=ba, S[11..12]=ba.  By stable sort, aa < ab < ba < bb.  Rank(aa)=1, Rank(ab)=2, Rank(ba)=3, Rank(bb)=4.  So, S ’ =124233$.

  12. Example (II)  By recursion, construct the suffix tree T ’ for S ’ : $ 2 3 7 3 $ 6 5 1 4 2

  13. Example (III)  Convert T ’ to the odd tree: $ a 13 5 b i  2i-1 $ 11 9 7 1 This is not a suffix tree 3

  14. Example (IV)  Refine the odd tree T o : $ b a 13 b 5 a b $ 11 9 7 1 3

  15. Time complexity for building the odd tree  Let Time(n) be the time to build a suffix tree for a string of length n.  Stable sorting and refinement of the odd trees take O(n) time.  Build suffix tree for S ’ takes Time(n/2).  So, Stage 1 takes Time(n/2)+O(n) time.

  16. Stage 2: Build the even tree Generate the lex-ordering of the 1. leaves in T e . For any two adjacent leaves 2i and 2j, 2. we find lcp(2i, 2j). Construct the even tree T e from left to 3. right (according to the lex-ordering).

  17. Build the even tree (Step 1)  We get the lex-ordering of the leaves in T o .  Generate the lex-ordering of the leaves in T e .  For each leaf i in T o , get the preceding character c=S[i-1] and form a pair (c,i). Each pair represents a even suffix i-1.  Perform stable sorting on those pairs. We get the lex-ordering of the leaves in T e .

  18. Example S = aaabbbabbaba$  Lex-ordering of the leaves in T o :  13 < 1 < 7 < 3 < 11 < 9 < 5  The pairs are:  (a,13), ($,1), (b,7), (a, 3), (a, 11), (b, 9), (b, 5).  After stable sorting, we have  ($, 1), (a, 13), (a, 3), (a, 11), (b, 7), (b, 9), (b, 5).  Hence, the lex-ordering of the leaves of T e :  12 < 2 < 10 < 6 < 8 < 4

  19. Build the even tree (Step 2)  For any two adjacent leaves 2i and 2j, we first find lcp(2i, 2j).  Observation: lcp(2i, 2j) =  lcp(2i+1, 2j+1)+1 if S[2i]=S[2j]  0 otherwise  Proof:  If S[2i] ≠ S[2j], lcp(2i,2j)=0.  Otherwise, lcp(2i,2j)=1+lcp(2i+1,2j+1).

  20. Example  Recall that the lex- $ b a ordering of leaves: 13 b  12 < 2 < 10 < 6 < 8 < 4. 5 a b  By the previous observation, we have $  lcp(8,4)=lcp(9,5)+1=2  Similarly, we have 11 9  lcp(12,2)=1, lcp(2,10)=1, 7 lcp(10,6)=0, lcp(6,8)=1, 1 lcp(8,4)=2 3

  21. Build the even tree (Step 3)  Construct the even tree T e from left to right. a a $ $ a a b b 12 12 b b b b a 12 a 10 b b b b a a b b a a $ $ 2 2

  22. Build the even tree (Step 3) b b a a a b $ $ $ a a a b b b 12 12 12 b b b b b b a a a 10 10 8 10 b b b 8 b b b 6 6 a a a b b b 6 a a a 4 $ $ $ 2 2 2

  23. Time complexity for building the even tree  Step 1: O(n) time  Step 2: O(n) time  Step 3: O(n) time

  24. Stage 3: Merge odd and even trees We can merge T o and T e by DFS. However, it takes O(n 2 ) time.  $ $ b b a a babbaba$ babbaba$ 13 13 b b a,1 a,1 b,1 b,1 a a 5 5 $,1 $,1 a a b b a a b b 13 13 b,1 b,1 b b a,1 a,1 b,1 b,1 b b b b b b b b $,1 $,1 a a b b $ $ a a a a a a b b a a b b a,1 a,1 a,2 a,2 12 12 $ $ b b a a b b b b b,1 b,1 a a b b $ $ a,2 a,2 $ $ b,1 b,1 b b 11 11 9 9 a a a a $,1 $,1 a,2 a,2 a a b b 10 10 b,8 b,8 b b $ $ 8 8 a a 12 12 7 7 a,2 a,2 b b $ $ a,11 1 1 b b 11 11 9 9 b,5 b,5 a a 1 1 3 3 10 10 b,10 b,5 b,5 b b 4 4 8 8 a,4 b,9 a,4 b,9 Odd tree 5 5 b b a a 7 7 3 3 6 6 2 2 b b 6 6 a a 4 4 $ $ 2 2 Even tree

  25. Stage 3: Merge odd and even trees We merge T o and T e by DFS. We merge two edges as long as they start with the  same character. The merge is ended when one edge is longer than the other. $ b a b a 13 5 a b $ a b 12 b $ b a 10 b 8 b 11 9 a b 6 7 a 4 1 $ 2 3

  26. Merge odd and even trees We merge T o and T e by DFS. We merge two edges as long as they start with the  same character. The merge is ended when one edge is longer than the other. a,1 b,1 $,1 b,1 13 a,1 b,2 $,1 12 a,4 a,11 $,1 $,1 b,3 b,8 10 8 a,4 11 9 b,3 2 b,9 4 7 b,3 5 $,1 6 1 3

  27. Merge odd and even trees The merging may over-merged some nodes.  To correct the tree, we need to unmerge some nodes.  a,1 b,1 $,1 b,1 13 a,1 b,2 $,1 12 a,4 a,11 $,1 $,1 b,3 b,8 10 8 a,4 11 9 b,3 2 b,9 4 7 b,3 5 $,1 6 1 3

  28. Definition of L() and d()  For every node u which may be over-merged, there exist two leaves 2i and 2j-1 such that u=lca(2i, 2j-1).  Denote L(u) be the correct depth of u, that is, lcp(2i,2j-1).  Note that lcp(2i,2j-1) = 1+lcp(2i+1,2j) if S[2i]=S[2j-1]; 0 otherwise.  Let v be lca(2i+1,2j).  Denote d(u) = v.  Note that d() is equivalent to suffix link!

  29. Example of d() a,1 b,1 $,1 b,1 13 a,1 b,2 $,1 12 a,4 a,11 d() $,1 $,1 b,3 b,8 10 8 a,4 11 9 b,3 2 b,9 4 7 b,3 5 $,1 6 1 3

  30. Relationship between L() and d() Suppose u = lca(2i, 2j-1).  Note1: if u is not the root, then S[2i]=S[2j-1].  Note2: lcp(2i,2j-1) = 1+lcp(2i+1,2j) if S[2i]=S[2j-1]; 0  otherwise Note3: d(u) = lcp(2i+1,2j)  Hence, L(u) = 1 + L(d(u)) if u is not the root. Otherwise,  L(u)=0. Lemma: L(u) = the length of the purple path from u to the root. 

  31. Example of L() L( )=1 a,1 b,1 $,1 L( )=1 b,1 13 a,1 b,2 $,1 L( )=2 L( )=2 12 L( )=2 a,4 a,11 $,1 $,1 b,3 b,8 10 8 L( )=4 a,4 11 9 b,3 2 b,9 L( )=2 L( )=3 4 7 b,3 5 $,1 6 1 3

  32. Unmerge the border nodes based on L() (I) L( )=1 a,1 b,1 $,1 L( )=1 b,1 13 a,1 b,2 $,1 L( )=2 L( )=2 12 L( )=2 a,4 a,11 $,1 $,1 b,3 b,8 10 8 L( )=4 a,4 11 9 b,3 2 b,9 L( )=2 L( )=3 4 7 b,3 5 $,1 6 1 3

  33. Unmerge the border nodes based on L() (II) a,1 b,1 $,1 b,1 13 a,1 b,1 $,1 a,1 a,2 12 b,1 a,2 b,1 $,1 a,2 b,8 10 8 a,2 a,11 11 9 b,5 1 b,10 b,5 b,9 a,4 4 5 7 3 6 2

  34. Time complexity for merging  Merge the tree using DFS takes O(n) time.  Compute the links d() takes O(n) time.  Compute L() takes O(n) time.  Unmerge takes O(n) time.

  35. Total time complexity of Farach ’ s algorithm  Stage 1: Time(n/2)+O(n)  Stage 2: O(n)  Stage 3: O(n)  Thus, Time(n) = Time(n/2)+O(n).  By solving the equation,  Time(n)= O(n).

  36. Disadvantage of suffix tree  Suffix tree is space inefficient. It requires O(n| Σ |log n) bits.  Manber and Myers (SIAM J. Comp 1993) proposes a new data structure, called suffix array, which has a similar functionality as suffix tree. Moreover, it only requires O(n log n) bits.

  37. Suffix Array (I) It is just sorted suffixes.  E.g. consider S = acacag$  Suffix Position SA[i] Suffix acacag$ 1 7 $ cacag$ 2 1 acacag$ acag$ 3 => 3 acag$ 5 cag$ 4 Sort ag$ ag$ 5 2 cacag$ g$ 6 4 cag$ $ 7 6 g$ Suffix array is an array of n indices. Thus, it takes O(n log n)  bits.

  38. Observation  The leaves of a suffix tree is in suffix array order. SA[i] Suffix $ a g ca 7 $ $ 7 6 1 acacag$ g c 3 acag$ a $ c 5 ag$ a g g 2 cacag$ 5 $ c $ a g 4 cag$ g $ 2 4 $ 6 g$ 1 3

  39. Linear time construction of suffix array from suffix tree  Recall that the suffix tree T of S[1..n] can be constructed in O(n) time.  Then, by “ lexical ” depth-first traversal of T , the suffix array of S is obtained.  This takes O(n) time.  However, the space used during construction is the same as that for suffix tree! This defeats the purpose of suffix array.  Today, we can build suffix array using O(n) bit space and O(n) time.

  40. range(T,Q)  For a pattern Q, its SA[i] Suffix occurrences in T form a 1 7 $ consecutive SA range. 2 1 acacag$ 3 3 acag$  Example: For T= acacag$, ca 4 5 ag$ occurs in SA[5] and SA[6]. 5 2 cacag$  Definition: 6 4 cag$  We called range(T,Q)=[st..ed] 7 6 g$ if  Q is a prefix of every T j for j=SA[st], SA[st+1], … , SA[ed]  where T j = j suffix of T = T[j..n].  Example: range(T,ca)=[5..6]

  41. Find occurrence of query Q in a string S using suffix array Input: (1) the suffix array of a string T of  length n and (2) a query Q of length m Aim: check if Q occurs in T  Idea: binary search! 

  42. Algorithm

  43. Example  Consider T = acacag$ i SA[i] Suffix  Pattern Q = acag 1 7 $ 2 1 acacag$ 3 3 acag$  L=1 4 5 ag$ 5 2 cacag$  R=7 6 4 cag$ 7 6 g$  M=(L+R)/2=4

  44. Example  Consider T = acacag$ i SA[i] Suffix  Pattern Q = acag 1 7 $ 2 1 acacag$  L=1 3 3 acag$ 4 5  R=7 ag$ 5 2 cacag$  M=(L+R)/2=4 6 4 cag$ 7 6 g$  suffix-SA[M] > Q.  Set R=M=4.

  45. Example  Consider T = acacag$ i SA[i] Suffix  Pattern Q = acag 1 7 $ 2 1 acacag$  L=1 3 3 acag$ 4 5  R=4 ag$ 5 2 cacag$  M=(L+R)/2=2 6 4 cag$ 7 6 g$  suffix-SA[M] < Q.  Set L=M=2.

  46. Example  Consider T = acacag$ i SA[i] Suffix  Pattern Q = acag 1 7 $ 2 1 acacag$ 3 3 acag$  L=2 4 5 ag$ 5 2 cacag$  R=4 6 4 cag$  M=(L+R)/2=3 7 6 g$  The pattern Q is found at SA[M]=3.

  47. Can we do better?  During each step of binary search,  we need to compare Q with a suffix using O(m) time, which is time consuming.  Can we do better?  We have the following observation.  Suppose LCP(Q, suffix-SA[L]) is l and LCP(Q, suffix-SA[R]) is r.  Then, LCP(Q, suffix-SA[M]) > min{l,r}.  Below, we describe how to utilize this observation to speedup the computation.

  48. Algorithm

  49. Example  Consider T = acacag$ i SA[i] Suffix  Pattern Q = acag 1 7 $ 2 1 acacag$ 3 3 acag$  L=1, l=0 4 5 ag$ 5 2 cacag$  R=7, r=0 6 4 cag$ 7 6 g$  mlr = min(l,r)=0  M=(L+R)/2=4

  50. Example Consider T = acacag$  i SA[i] Suffix Pattern Q = acag  1 7 $ L=1, l=0 2 1 acacag$  R=7, r=0 3 3 acag$  mlr = min(l,r)=0  4 5 ag$ M=(L+R)/2=4, m=1  5 2 cacag$ 6 4 cag$ The (m+1) char of suffix-SA[M] is g.  The (m+1) char of Q is c. 7 6 g$  So, suffix-SA[M] > Q.  Set R=M=4 and r=m=1. 

  51. Example Consider T = acacag$  i SA[i] Suffix Pattern Q = acag  1 7 $ L=1, l=0 2 1 acacag$  R=4, r=1 3 3 acag$  mlr = min(l,r)=0  4 5 ag$ M=(L+R)/2=2, m=3  5 2 cacag$ 6 4 cag$ The (m+1) char of suffix-SA[M] is c.  The (m+1) char of Q is g. 7 6 g$  So, suffix-SA[M] < Q.  Set L=M=2 and l=m=3. 

  52. Example  Consider T = acacag$ i SA[i] Suffix  Pattern Q = acag 1 7 $ 2 1 acacag$  L=2, l=3 3 3 acag$ 4 5  R=4, r=1 ag$ 5 2 cacag$  mlr = min(l,r)=1 6 4 cag$  M=(L+R)/2=3, m=4 7 6 g$  The pattern Q is found at SA[M]=3.

  53. Time analysis  Binary search will perform log n comparisons  Each comparison takes at most O(m) time  In the worst case, O(m log n) time.  Myers and Manber report that, in practice, the time is O(m + log n).

  54. Suffix array and suffix tree  We show one example of replacing suffix tree by suffix array  Note that most applications related to suffix tree can be solved using suffix array with some time blow up!  When space is limited, replacing suffix tree by suffix array is a good choice.

  55. The size is still too big!  Why?  DNA sequences can be very long! E.g. Fly: ~100M bases, Human: ~3G bases, Tree: ~9G bases  Storage to store indexing data structure for human genome Suffix Tree: ~40G bytes Suffix Array: ~13G bytes  Can we further reduce the space?

  56. Solution  Grossi, Vitter (STOC2000)  Compressed suffix array (CSA)  Ferragine, Manzini (FOCS2000)  FM-index  Both of them can be stored in O(n) bit space  For Human Genome  Both CSA and FM-index can be stored within 2G bytes.

  57. FM-index Consider a text T=acacag$  FM-index stores:  SA[i] Suffix T[SA[i]-1] The string BW=g$ccaaa A. 7 $ g C[x] = total no. of 1 acacag$ $ B. occurrences of each symbol 3 acag$ c less than x. E.g. C[a]=1, 5 ag$ c C[c]=4, C[g]=6, C[t]=7 2 cacag$ a A data-structure occ(x, i) C. 4 cag$ a which tells us the number of 6 g$ a occurrences of x in BW[1..i] using O(1) time.

  58. Data-structure for answering the occ(x, i) query? n/log 2 n 1 2 3 4 BW ……… log 2 n log n ……… 1 2 3 log n 00x0xxx0x0 Given the text BW[1..n], we divide BW[1..n] into buckets of size log 2 n.  For each bucket i = 1, … , n/log 2 n, we store  P[i] = number of x ’ s in BW[1.. ilog 2 n].  Each bucket is further subdivided into log n sub-buckets of size log n.  For each sub-bucket j of the bucket i, we store  Q[i][j] = number of x ’ s in the first j sub-buckets. 

  59. Data-structure for answering the occ(x, i) query?  We also need a lookup table rank(b,k)  b is any string of length (log n)/2  1 ≤ k ≤ (log n)/2  rank(b,k) = number of x in the first k characters of b. 1 2 3 ……………… (log n)/2 Number of entries 00 … 0 log n log n log n = = 2 n 2 2 2 … 1 1 2 ……………………… x0x..0 Each entry take s O (log log n ) bits … Total space xx … x = = O ( n log n log log n ) o ( n ) bits

  60. Space complexity of the occ() data-structure  P[1..n/log 2 n] uses O(n/log n) bits  Q[1..n/log 2 n][1..log n] uses O(n log log n / log n) bits  rank(b,k) uses 2 logn/2 (log n/2) = o(n) bits  In total, we use O(n log log n / log n) bits.

  61. How to compute occ(x,i)? n/log 2 n 1 2 3 4 BW ……… log 2 n log n ……… 1 2 3 log n 00x0xxx0x0 Suppose log n = 10.  To compute occ(x, 327),  The result is P[3]+Q[4][2]+rank(00x0x,5)+rank(xx0x0,2)  Hence, O(1) time to compute occ(x,i). 

  62. Size of FM-index  Structure A can be store in 2n bits  Structure B can be store in O(log n) bits  Structure C can be store in O(n log log n/log n) bits  In total, the size of FM-index is O(n) bits.

  63. Observation  C[x]+occ(x,i) is the number of SA[i] Suffix T[SA[i]-1] suffixes smaller than of xT SA[i] . 7 $ g 1 acacag$ $  Example: 3 acag$ c  C[c]=4 5 ag$ c  occ(c, 6)=2 2 cacag$ a  Number of suffixes smaller than 4 cag$ a cT[SA[6]..n]=ccag$ is 6. 6 g$ a

  64. Lemma  Suppose range(T,Q) is [st..ed].  Then, range(T,xQ) = [p..q] where  p = C[x] + occ(x,st-1) + 1  q = C[x] + occ(x,ed)  Proof:  p = 1+number of suffixes strictly smaller than xQ. The latter term = number of suffixes smaller than or equal to xT SA[st-1] .  q = number of suffixes smaller than or equal to xT SA[ed] .

  65. Backward search Given the text T and the FM-index, we want to determine if Q  exists in T. Algorithm BW_exist(Q[1..m]) x=Q[m], i=m-1; 1. /* find range(T,Q[m]) */ 2. st = C[x]+1, ed = C[x+1]; while (st ≤ ed and i>1) { 3. /* find range(T, Q[i-1..m]) */ x = Q[i-1];  st = C[x] + occ(x, st-1) + 1;  ed = C[x] + occ(x, ed);  i = i – 1;  } if st > ed, then pattern not found else pattern found. 4.

  66. Example  T = acacag$ SA[i] Suffix T[SA[i]-1] 7 $ g  Q = aca 1 acacag$ $ 3 acag$ c 5 ag$ c  Q[3..3]=a 2 cacag$ a  sp=C[a] +1 4 cag$ a =1+1=2 6 g$ a  ep=C[c] =4 g $ c c a a a

  67. Example  T = acacag$ SA[i] Suffix T[SA[i]-1] 7 $ g  Q = aca 1 acacag$ $ 3 acag$ c 5 ag$ c  Q[2..3]=ca 2 cacag$ a  st=C[c]+occ(c,st old -1)+1 4 cag$ a =4+0+1=5 6 g$ a  ed=C[c]+occ(c,ed old ) =4+2=6 g $ c c a a a

  68. Example  T = acacag$ SA[i] Suffix T[SA[i]-1] 7 $ g  Q = aca 1 acacag$ $ 3 acag$ c 5 ag$ c  Q[1..3]=aca 2 cacag$ a  st=C[a]+occ(a,st old -1)+1 4 cag$ a =1+0+1=2 6 g$ a  ed=C[a]+occ(a,ed old ) =1+2=3 g $ c c a a a

  69. Example  T = acacag$ SA[i] Suffix T[SA[i]-1] 7 $ g  Q = aca 1 acacag$ $ 3 acag$ c 5 ag$ c  Q[1..3]=aca 2 cacag$ a  st=C[a]+occ(a,st old -1)+1 4 cag$ a =1+0+1=2 Q occurrences in T! 6 g$ a  ed=C[a]+occ(a,ed old ) =1+2=3 g $ c c a a a

  70. Time complexity of Backward Search  To find a pattern Q[1..m]  Step 1, 2, and 4 can be computed in O(1) time.  For step 3,  We need to iterate the loop for m-1 times.  Each iteration of the loop can be computed in O(1) time.  The loop takes O(m) time.  In total, O(m) time for backward search.

  71. Conclusion  Suffix tree is a powerful data-structure which has a lot of applications in Computational Biology.  Problems:  Suffix tree is too big!  Can be solved using CSA and FM-index  Suffix tree can only solve exact match problem! (Most of the biology problems are approximate match!)  Many works have been done on this area! But still not pratical. One of the important area to explore!

Recommend


More recommend


Explore More Topics

Stay informed with curated content and fresh updates.