Enhanced de Bruijn Graphs

Pierre MORISSE (pierre.morisse2@univ-rouen.fr)
Supervisors: Thierry LECROQ and Arnaud LEFEBVRE
Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes
September 14, 2017

de Bruijn graph

A de Bruijn graph of order $k$ is a graph structure that computes overlaps of constant length $k - 1$ between the $k$-mers of the reads of a given set.

Formal definition: for a set of reads $R = \{r_1, r_2, \ldots, r_n\}$, $DBG_k(R) = (V, E)$ such that:
$V = \{ w \mid |w| = k \text{ and } \exists i,\ w \in \mathrm{Fact}(r_i) \}$
$E = \{ (s, d) \mid s, d \in V \text{ and } \mathrm{suff}_{k-1}(s) = \mathrm{pref}_{k-1}(d) \}$

where $\mathrm{Fact}(r_i)$ is the set of factors (substrings) of $r_i$, and $\mathrm{suff}_l$ and $\mathrm{pref}_l$ denote the suffix and the prefix of length $l$.

de Bruijn graph: example

With the set of reads S = {AGCTTACA, CTTACGTA, GTATACTG}, we obtain the following de Bruijn graph of order 6:

[Figure: de Bruijn graph of order 6 on the 6-mers AGCTTA, GCTTAC, CTTACA, CTTACG, TTACGT, TACGTA, GTATAC, TATACT and ATACTG.]

Drawback: the graph struggles where coverage is locally insufficient. Here, TACGTA and GTATAC overlap by only 3 characters (GTA), so no edge joins them and the graph splits into two disconnected components.
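
A minimal Python sketch of the formal definition, applied to the example reads. The function names are illustrative and not taken from any tool; the graph is stored as a plain adjacency dictionary.

```python
def kmers_of(reads, k):
    """Return the set V of all k-mers (factors of length k) of the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def de_bruijn_graph(reads, k):
    """Build DBG_k(R): an edge (s, d) exists when suff_{k-1}(s) == pref_{k-1}(d)."""
    vertices = kmers_of(reads, k)
    # Group vertices by their (k-1)-prefix so each edge lookup is one dict access.
    by_prefix = {}
    for v in vertices:
        by_prefix.setdefault(v[:k - 1], set()).add(v)
    # s[1:] is the (k-1)-suffix of the k-mer s.
    return {s: sorted(by_prefix.get(s[1:], set())) for s in sorted(vertices)}


if __name__ == "__main__":
    reads = ["AGCTTACA", "CTTACGTA", "GTATACTG"]
    for s, succ in de_bruijn_graph(reads, 6).items():
        print(s, "->", succ)
```

On the example, TACGTA has no successor and GTATAC has no predecessor, which is the disconnection described above.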

Outline
1. Introduction
2. Classical graph structures
3. Enhanced de Bruijn graph
4. PgSA
5. HG-CoLoR
6. Conclusion

Multiple de Bruijn graphs

Usually, several de Bruijn graphs of different orders are built:
- a separate graph is required for each order;
- this consumes large amounts of memory.

Enhanced de Bruijn graph

Idea: enhance the de Bruijn graph so that it can compute overlaps of variable length between the k-mers, as an overlap graph does, and thus avoid building several de Bruijn graphs of different orders.

Formal definition: for a set of reads $R = \{r_1, r_2, \ldots, r_n\}$, $eDBG_{k,m}(R) = (V, E)$ such that:
$V = \{ w \mid |w| = k \text{ and } \exists i,\ w \in \mathrm{Fact}(r_i) \}$
$E = \{ (s, l, d) \mid s, d \in V,\ m \le l \le k - 1 \text{ and } \mathrm{suff}_l(s) = \mathrm{pref}_l(d) \}$

Each edge is labeled with its overlap length $l$, and $m$ is the minimum overlap length allowed.
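
Comparing the two definitions makes their relationship explicit: with $m = k - 1$ the only admissible overlap length is $k - 1$, so the enhanced graph has exactly the edges of the de Bruijn graph of order $k$, each labeled $k - 1$, and lowering $m$ can only add edges:

$$E\big(eDBG_{k,k-1}(R)\big) = \{ (s, k-1, d) \mid (s, d) \in E\big(DBG_k(R)\big) \}, \qquad m' \le m \implies E\big(eDBG_{k,m}(R)\big) \subseteq E\big(eDBG_{k,m'}(R)\big).$$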

Enhanced de Bruijn graph: example

With the same set of reads S = {AGCTTACA, CTTACGTA, GTATACTG}, we obtain the following enhanced de Bruijn graph of order (6, 3):

[Figure: the nine 6-mers AGCTTA, GCTTAC, CTTACA, CTTACG, TTACGT, TACGTA, GTATAC, TATACT and ATACTG, connected by 16 edges labeled with their overlap lengths (3, 4 or 5).]

The edge of length 3 from TACGTA to GTATAC links two k-mers that the de Bruijn graph of order 6 could not connect.
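
The edge set of this example can be checked with a brute-force Python sketch of the definition. It compares every pair of k-mers for every overlap length in [m, k − 1], so it is quadratic in the number of k-mers and only meant for small inputs; the function name is illustrative.

```python
def enhanced_de_bruijn_edges(reads, k, m):
    """Return the edge set of eDBG_{k,m}(R) as sorted (s, l, d) triples."""
    vertices = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    edges = []
    for s in sorted(vertices):
        for d in sorted(vertices):
            # One edge per admissible overlap length l with suff_l(s) == pref_l(d).
            for l in range(m, k):
                if s[-l:] == d[:l]:
                    edges.append((s, l, d))
    return edges


if __name__ == "__main__":
    for edge in enhanced_de_bruijn_edges(["AGCTTACA", "CTTACGTA", "GTATACTG"], 6, 3):
        print(edge)
```

With m = 5 the same function returns only the 7 edges of length k − 1, i.e. those of the de Bruijn graph of order 6.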

Construction

The enhanced de Bruijn graph does not need to be built explicitly. It can be traversed with the help of an index structure:
- all the k-mers of the reads are stored in the index;
- the index is queried to retrieve the edges;
- this also makes backward traversal easy.
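
The principle can be illustrated with a deliberately simple stand-in for the index: a hypothetical `KmerIndex` class that maps every factor of every stored k-mer to its (k-mer, offset) occurrences. This is not PgSA, only a naive structure answering the same kind of query. Successors of s are the k-mers in which a suffix of s occurs at offset 0; predecessors of d are the k-mers in which a prefix of d of length l occurs at offset k − l, which is why backward traversal costs nothing extra.

```python
from collections import defaultdict


class KmerIndex:
    """Hypothetical occurrence index over a non-empty set of k-mers (naive)."""

    def __init__(self, kmers):
        self.k = len(next(iter(kmers)))
        self.occ = defaultdict(list)  # factor -> list of (k-mer, offset)
        for w in sorted(kmers):
            for i in range(self.k):
                for j in range(i + 1, self.k + 1):
                    self.occ[w[i:j]].append((w, i))

    def successors(self, s, m):
        """Edges (s, l, d): suff_l(s) occurs at offset 0 of d, larger l first."""
        return [(s, l, d) for l in range(self.k - 1, m - 1, -1)
                for d, off in self.occ[s[-l:]] if off == 0]

    def predecessors(self, d, m):
        """Edges (s, l, d): pref_l(d) occurs at offset k - l of s, larger l first."""
        return [(s, l, d) for l in range(self.k - 1, m - 1, -1)
                for s, off in self.occ[d[:l]] if off == self.k - l]
```

On the example k-mers, `KmerIndex(kmers).predecessors("GTATAC", 3)` finds the single incoming edge from TACGTA, of length 3.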

PgSA: definition

PgSA [Kowalski et al., 2015] is a data structure that indexes a set of reads in order to answer the following queries, for a given string f:
1. In which reads does f occur?
2. In how many reads does f occur?
3. What are the occurrence positions of f?
4. What is the number of occurrences of f?
5. In which reads does f occur only once?
6. In how many reads does f occur only once?
7. What are the occurrence positions of f in the reads where it occurs only once?
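
To pin down what each query returns, here is a brute-force Python reference that scans every read with `str.find`. It illustrates the query semantics only; PgSA answers the same queries from its index rather than by scanning. As an assumption of this sketch, positions are (read number, offset) pairs with 1-based read numbers and 0-based offsets, and overlapping occurrences are counted.

```python
def occurrences(reads, f):
    """Query 3: all (read number, offset) pairs where f occurs, overlaps included."""
    result = []
    for n, r in enumerate(reads, start=1):
        pos = r.find(f)
        while pos != -1:
            result.append((n, pos))
            pos = r.find(f, pos + 1)
    return result


def pgsa_queries(reads, f):
    """Answer the seven queries for f by brute force (illustration only)."""
    occ = occurrences(reads, f)
    counts = {}
    for n, _ in occ:
        counts[n] = counts.get(n, 0) + 1
    once = sorted(n for n, c in counts.items() if c == 1)
    return {
        1: sorted(counts),                          # reads containing f
        2: len(counts),                             # number of such reads
        3: occ,                                     # occurrence positions
        4: len(occ),                                # number of occurrences
        5: once,                                    # reads where f occurs once
        6: len(once),                               # number of such reads
        7: [(n, p) for n, p in occ if n in once],   # positions in those reads
    }
```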

PgSA: index construction

- The reads are concatenated, taking their overlaps into account (e.g. ACGT + GTGG ⇒ ACGTGG).
- The sparse suffix array of the resulting pseudogenome is built.
- An auxiliary array is built.
- Queries are answered by a binary search over the suffix array, with the help of the auxiliary array.
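
A simplified Python sketch of these construction steps. To stay short it departs from PgSA in several ways, all assumptions of this sketch: reads are merged greedily in input order on their longest overlap with the pseudogenome built so far (reads contained in the middle of it are not detected), the suffix array is full rather than sparse, and a plain list of read start positions (`starts`, a hypothetical name) plays the role of the auxiliary array.

```python
def build_pseudogenome(reads):
    """Concatenate reads in order, merging each one on its longest overlap."""
    genome, starts = "", []
    for r in reads:
        # Longest l such that the current genome ends with the first l chars of r.
        best = 0
        for l in range(min(len(genome), len(r)), 0, -1):
            if genome.endswith(r[:l]):
                best = l
                break
        starts.append(len(genome) - best)
        genome += r[best:]
    return genome, starts


def suffix_array(text):
    """Full suffix array by direct sorting (fine for small examples)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(reads, genome, sa, starts, f):
    """Binary-search the suffix array, then map hits back to (read, offset)."""
    lo, hi = 0, len(sa)
    while lo < hi:  # find the first suffix >= f
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < f:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # Suffixes starting with f are contiguous in the suffix array.
    while lo < len(sa) and genome[sa[lo]:].startswith(f):
        p = sa[lo]
        for n, (s, r) in enumerate(zip(starts, reads), start=1):
            if s <= p and p + len(f) <= s + len(r):  # hit lies inside read n
                hits.append((n, p - s))
        lo += 1
    return sorted(hits)
```

With the slide's example, `build_pseudogenome(["ACGT", "GTGG"])` yields `("ACGTGG", [0, 2])`, and searching for GT reports it in both reads. A string spanning the junction between two reads, such as CGTG, occurs in the pseudogenome but in no read, so it is reported nowhere.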

Traversal of the enhanced de Bruijn graph

1. Extract the k-mers of the reads.
2. Build the index of the k-mers.
3. Query the index, looping over the third query (what are the occurrence positions of f?), to retrieve the edges.

Traversal of the enhanced de Bruijn graph: example

[Figure: the enhanced de Bruijn graph of order (6, 3) from the previous example, traversed from AGCTTA.]

The PgSA index is built over the set of k-mers, numbered as follows:
1: AGCTTA, 2: ATACTG, 3: CTTACA, 4: CTTACG, 5: GCTTAC, 6: GTATAC, 7: TACGTA, 8: TATACT, 9: TTACGT

To find the successors of AGCTTA, the index is asked for the occurrence positions, as (k-mer number, offset) pairs, of its suffixes of decreasing length:
- GCTTA (length 5): {(1,1); (5,0)}
- CTTA (length 4): {(1,2); (3,0); (4,0); (5,1)}
- TTA (length 3): {(1,3); (3,1); (4,1); (5,2); (9,0)}

An occurrence at offset 0 marks a k-mer that starts with the queried suffix, and therefore an edge: AGCTTA is linked to GCTTAC with an overlap of 5, to CTTACA and CTTACG with an overlap of 4, and to TTACGT with an overlap of 3.
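
The example can be replayed with a small Python sketch that follows the three traversal steps: number the distinct k-mers in sorted order, answer the third query by brute force (a stand-in for the PgSA index, assumed here for brevity), and keep the occurrences at offset 0. K-mer numbers are 1-based and offsets 0-based, as on the slide; the function names are illustrative.

```python
def extract_kmers(reads, k):
    """Step 1: the distinct k-mers of the reads, numbered from 1 in sorted order."""
    return sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


def occurrence_positions(kmers, f):
    """Third query on the k-mer set: (k-mer number, offset) of every occurrence of f."""
    result = []
    for n, w in enumerate(kmers, start=1):
        pos = w.find(f)
        while pos != -1:
            result.append((n, pos))
            pos = w.find(f, pos + 1)
    return result


def successors(kmers, n, m):
    """Loop the third query over the suffixes of k-mer n, from length k-1 down to m."""
    src = kmers[n - 1]
    edges = []
    for l in range(len(src) - 1, m - 1, -1):
        for d, off in occurrence_positions(kmers, src[-l:]):
            if off == 0:  # k-mer d starts with the suffix: edge (src, l, d)
                edges.append((src, l, kmers[d - 1]))
    return edges


if __name__ == "__main__":
    kmers = extract_kmers(["AGCTTACA", "CTTACGTA", "GTATACTG"], 6)
    for l in (5, 4, 3):
        suffix = kmers[0][-l:]
        print(suffix, occurrence_positions(kmers, suffix))
    print(successors(kmers, 1, 3))
```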

HG-CoLoR: context

- Due to their high error rate, long reads must undergo error correction.
- Various methods already exist for correcting short reads, but they do not apply to long reads.
- New error correction methods therefore had to be developed.
- They fall into two main categories: self-correction and hybrid correction.

HG-CoLoR: workflow

The method proceeds in 5 steps:
1. Correct the short reads.
2. Align the short reads on the long reads, to find seeds.
3. Merge the overlapping seeds.
4. Link the seeds, by traversing the enhanced de Bruijn graph.
5. Extend the resulting corrected long read to the left (resp. right) of the leftmost (resp. rightmost) seed.
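
Step 3 can be pictured with a small Python sketch. It assumes, hypothetically, that each seed is a `(start, end, sequence)` triple giving the interval of the long read covered by a corrected short read, with `end` exclusive and `len(sequence) == end - start`; overlapping seeds are then merged by appending the part of the later seed that extends past the current one. HG-CoLoR's actual seed representation and merging rules may differ, for instance when overlapping seeds disagree on their shared bases.

```python
def merge_seeds(seeds):
    """Merge seeds whose long-read intervals overlap (hypothetical representation)."""
    merged = []
    for start, end, seq in sorted(seeds):
        if merged and start < merged[-1][1]:
            m_start, m_end, m_seq = merged[-1]
            if end > m_end:
                # Keep the merged sequence and append the part of seq beyond m_end.
                m_seq += seq[m_end - start:]
                m_end = end
            merged[-1] = (m_start, m_end, m_seq)
        else:
            merged.append((start, end, seq))
    return merged
```

A seed entirely contained in a previous one leaves it unchanged, and seeds that do not overlap stay separate, to be linked in step 4.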

Step 4: seeds linking

- Seeds are used as anchor points on the enhanced de Bruijn graph.
- The graph is traversed to link the seeds together and assemble the k-mers.

[Figure: seeds 1, 2 and 3 aligned on a long read; the enhanced de Bruijn graph is traversed from a source k-mer (src) on one seed to a destination k-mer (dst) on the next, along edges labeled with overlap lengths k − 1, k − 2, k − 3, ...]
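
A minimal Python sketch of linking two seeds, assuming a depth-first search from src to dst that tries larger overlaps first, never revisits a k-mer, and gives up beyond `max_depth` k-mers. These choices, and the names `link_seeds` and `max_depth`, are assumptions for illustration, not HG-CoLoR's documented traversal. The k-mers met along the path are assembled by appending, for each edge (s, l, d), the last k − l characters of d.

```python
def link_seeds(src, dst, kmers, m, max_depth=50):
    """Return the sequence spelled by a src -> dst path in eDBG_{k,m}, or None."""
    k = len(src)

    def successors(s):
        # Edges (s, l, d) of the enhanced de Bruijn graph, larger l first;
        # d != s skips self-loops.
        return [(l, d) for l in range(k - 1, m - 1, -1)
                for d in sorted(kmers) if d != s and s[-l:] == d[:l]]

    path, visited = [(0, src)], {src}

    def dfs(node):
        if node == dst:
            return True
        if len(path) >= max_depth:
            return False
        for l, d in successors(node):
            if d not in visited:
                visited.add(d)
                path.append((l, d))
                if dfs(d):
                    return True
                path.pop()
        return False

    if not dfs(src):
        return None
    seq = src
    for l, d in path[1:]:
        seq += d[l:]  # the k - l characters not shared with the previous k-mer
    return seq


if __name__ == "__main__":
    reads = ["AGCTTACA", "CTTACGTA", "GTATACTG"]
    kmers = {r[i:i + 6] for r in reads for i in range(len(r) - 5)}
    print(link_seeds("AGCTTA", "ATACTG", kmers, 3))
    print(link_seeds("AGCTTA", "ATACTG", kmers, 5))
```

On the example reads, the path found with m = 3 crosses the overlap of length 3 between TACGTA and GTATAC. With m = 5, only overlaps of length k − 1 are allowed, as in a de Bruijn graph of order 6, so no path exists and the function returns None.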
