
Nanopore Sequencing Technology and Tools for Genome Assembly


  1. Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions
     Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan and Onur Mutlu
     Contact: dsenol@andrew.cmu.edu
     February 16, 2019

  2. Nanopore Sequencing & Tools
     Damla Senol Cali, Jeremie S. Kim, Saugata Ghose, Can Alkan, and Onur Mutlu. "Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions." Briefings in Bioinformatics (2018).
     Damla Senol Cali 2 02/16/2019

  3. Executive Summary
     - Motivation: Nanopore sequencing is an emerging and promising technology that generates long reads and provides portability.
     - Problem:
       - The technology has high error rates.
       - Tools are critically important to 1) overcome the high error rates of the technology, and 2) enable fast, real-time data analysis.
     - Goal: Analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data.
     - Key Contributions:
       - Analysis of the tools in multiple dimensions: accuracy, performance, memory usage and scalability.
       - New bottlenecks and trade-offs that different combinations of tools lead to.
       - Guidelines for both practitioners and tool developers.

  4. Outline
     - Background and Motivation
       - Nanopore Sequencing Technology
       - Comparison with Prior Technologies
       - Nanopore Genome Assembly Pipeline
       - Our Goal
     - Experimental Methodology
     - Results and Analysis
     - Conclusion

  5. Nanopore Sequencing Technology
     - Nanopore sequencing is an emerging and promising single-molecule DNA sequencing technology.
     - The first nanopore sequencing device, MinION, was made commercially available by Oxford Nanopore Technologies (ONT) in May 2014.
       - Inexpensive
       - Long read length (> 882 Kbp)
       - Produces data in real time
       - Pocket-sized and portable

  6. Nanopore Sequencing
     - A nanopore is a nano-scale hole.
     - In nanopore sequencers, an ionic current passes through the nanopores.
     - When the DNA strand passes through the nanopore, the sequencer measures the change in current.
     - This change is used to identify the bases in the strand, because the different bases have different electrochemical structures.

  7. Why Nanopore Sequencing?
     (Prior) High-Throughput Sequencing Technologies
     - Require an amplification step before the sequencing process,
     - Require labeling of the DNA or nucleotide for detection during sequencing,
     - Generate billions of short but accurate reads,
     - Provide high throughput, high speed and low cost,
     - Suffer from massive amounts of data and short reads, which pose challenges due to the repetitive sequences in the genome.
     Nanopore Sequencing Technology
     - Does not require an amplification step before the sequencing process,
     - Does not require any labeling of the DNA or nucleotide for detection during sequencing,
     - Allows sequencing of very long reads, and
     - Provides portability, low cost and high throughput.
     - One major drawback: high error rates (~10-15%)

  8. Nanopore Genome Assembly Pipeline
     - Raw signal data → Basecalling → DNA reads
     - DNA reads → Read-to-Read Overlap Finding → Overlaps
     - Overlaps → Assembly → Draft assembly
     - Draft assembly → Read Mapping (Optional) → Mappings of reads against draft assembly
     - Mappings of reads against draft assembly → Polishing (Optional) → Improved assembly

  9. Our Goal
     - Comprehensively analyze the multiple steps and the associated state-of-the-art tools in genome assembly pipelines using nanopore sequence data in terms of accuracy, performance, memory usage, and scalability.
     - Reveal bottlenecks and trade-offs that different combinations of tools lead to.
     - Provide guidelines for both practitioners, such that they can determine the appropriate tools and tool combinations that can satisfy their goals, and tool developers, such that they can make design choices to improve current and future tools.

  10. Outline
      - Background and Motivation
      - Experimental Methodology
      - Results and Analysis
      - Conclusion

  11. Experimental Methodology

  12. Experimental Methodology (cont.)
      Accuracy Metrics
      - Average Identity
        - Percentage similarity between the assembly and the reference genome
        - Higher (≃100%) is preferred
      - Coverage
        - Ratio of the number of aligned bases in the reference genome to the length of the reference genome
        - Higher (≃100%) is preferred
      - Number of mismatches
        - Total number of single-base differences between the assembly and the reference genome
        - Lower (≃0) is preferred
      - Number of indels
        - Total number of insertions and deletions between the assembly and the reference genome
        - Lower (≃0) is preferred
      Performance Metrics
      - Wall clock time
      - Peak memory usage
      - Parallel speedup
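The accuracy metrics on slide 12 can be sketched in Python as below. This is only an illustration under stated assumptions, not the evaluation code used in the study: the function names, the gapped-string alignment input (with `-` as the gap character), and the choice to compute identity over the aligned (non-gap) columns are all hypothetical. The slide does not define parallel speedup, so the usual ratio of single-thread to multi-thread wall clock time is assumed here.

```python
def accuracy_metrics(aligned_ref, aligned_asm, ref_length):
    """Compute identity, coverage, mismatches and indels from one gapped
    pairwise alignment of a reference and an assembly (hypothetical helper)."""
    if len(aligned_ref) != len(aligned_asm):
        raise ValueError("aligned sequences must have the same length")
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")

    matches = mismatches = insertions = deletions = 0
    for ref_base, asm_base in zip(aligned_ref, aligned_asm):
        if ref_base == "-" and asm_base == "-":
            continue                  # empty column, carries no information
        if ref_base == "-":
            insertions += 1           # base present only in the assembly
        elif asm_base == "-":
            deletions += 1            # reference base missing from the assembly
        elif ref_base == asm_base:
            matches += 1
        else:
            mismatches += 1           # single-base difference

    # Assumption: a reference base counts as "aligned" only if it is paired
    # with an assembly base (match or mismatch), not with a gap.
    aligned_ref_bases = matches + mismatches
    identity = 100.0 * matches / aligned_ref_bases if aligned_ref_bases else 0.0
    coverage = 100.0 * aligned_ref_bases / ref_length
    return {
        "identity": identity,         # percent, higher is preferred
        "coverage": coverage,         # percent, higher is preferred
        "mismatches": mismatches,     # lower is preferred
        "indels": insertions + deletions,
    }


def parallel_speedup(single_thread_seconds, multi_thread_seconds):
    """Assumed definition: single-thread time divided by multi-thread time."""
    return single_thread_seconds / multi_thread_seconds


if __name__ == "__main__":
    # Toy alignment: 7 matches, 1 mismatch, 1 insertion, 1 deletion,
    # against a reference of 12 bases.
    result = accuracy_metrics("ACGT-ACGTA", "ACCTTAC-TA", ref_length=12)
    print(result["identity"], result["mismatches"], result["indels"])  # 87.5 1 2
```

Real evaluations typically derive these counts from a whole-genome aligner's output rather than from hand-built gapped strings; the toy input only keeps the arithmetic visible.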

  13. Outline
      - Background and Motivation
      - Experimental Methodology
      - Results and Analysis
        - Basecalling Tools
          - Accuracy
          - Performance
        - Read-to-Read Overlap Finding Tools
        - Assembly Tools
        - Read Mapping and Polishing Tools (optional)
      - Conclusion

  14. Nanopore Genome Assembly Pipeline
      - Raw signal data → Basecalling (Tools: Metrichor, Nanonet, Scrappie, Nanocall, DeepNano) → DNA reads
      - DNA reads → Read-to-Read Overlap Finding (Tools: GraphMap, Minimap) → Overlaps
      - Overlaps → Assembly (Tools: Canu, Miniasm) → Draft assembly
      - Draft assembly → Read Mapping (Tools: BWA-MEM, Minimap, (GraphMap)) → Mappings of reads against draft assembly
      - Mappings of reads against draft assembly → Polishing (Tools: Nanopolish, Racon) → Improved assembly

  15. Basecalling Tools
      - Metrichor
        - ONT’s cloud-based basecaller
        - Uses recurrent neural networks (RNN) for basecalling
      - Nanonet
        - ONT’s offline and open-source alternative to Metrichor
        - Uses RNN for basecalling
      - Scrappie
        - ONT’s newest basecaller that explicitly addresses basecalling errors in homopolymer regions
      - Nanocall [David+, Bioinformatics 2016]
        - Uses Hidden Markov Models (HMM) for basecalling
      - DeepNano [Boža+, PloS One 2017]
        - Uses RNN for basecalling

  16. Nanopore Genome Assembly Pipeline
      - Raw signal data → Basecalling (Tools: Metrichor, Nanonet, Scrappie, Nanocall, DeepNano) → DNA reads
      - DNA reads → Read-to-Read Overlap Finding (Tools: GraphMap, Minimap) → Overlaps
      - Overlaps → Assembly (Tools: Canu, Miniasm) → Draft assembly
      - Draft assembly → Read Mapping (Tools: BWA-MEM, Minimap, (GraphMap)) → Mappings of reads against draft assembly
      - Mappings of reads against draft assembly → Polishing (Tools: Nanopolish, Racon) → Improved assembly
      Pipelines:
      - Pipeline A: [Basecalling tool] + Canu
      - Pipeline B: [Basecalling tool] + GraphMap + Miniasm
      - Pipeline C: [Basecalling tool] + Minimap + Miniasm
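To make the combinations on slide 16 concrete, the sketch below enumerates every pairing of a basecaller with Pipeline A, B or C. The tool and pipeline names come from the slide; the data layout and the `enumerate_scenarios` helper are hypothetical, and the code only lists the combinations, it does not run any of the tools.

```python
from itertools import product

# Basecallers and pipeline definitions as listed on the slide.
BASECALLERS = ["Metrichor", "Nanonet", "Scrappie", "Nanocall", "DeepNano"]
PIPELINES = {
    "A": ["Canu"],                  # [Basecalling tool] + Canu
    "B": ["GraphMap", "Miniasm"],   # [Basecalling tool] + GraphMap + Miniasm
    "C": ["Minimap", "Miniasm"],    # [Basecalling tool] + Minimap + Miniasm
}


def enumerate_scenarios():
    """Return one record per (basecaller, pipeline) combination."""
    scenarios = []
    for basecaller, (label, tools) in product(BASECALLERS, PIPELINES.items()):
        scenarios.append({
            "basecaller": basecaller,
            "pipeline": label,
            "tools": [basecaller] + tools,   # tools in execution order
        })
    return scenarios


if __name__ == "__main__":
    for scenario in enumerate_scenarios():
        print(f"Pipeline {scenario['pipeline']}: " + " + ".join(scenario["tools"]))
```

Five basecallers and three pipelines give 15 combinations in total.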

  17. Basecalling – Accuracy
      Accuracy Analysis Results for Basecalling Tools
      [Figure: bar chart for Metrichor, Nanonet, Scrappie, Nanocall and DeepNano under Pipelines A, B and C. Axes: Percentage (%) and # (Kbp). Series: Identity (%), Coverage (%), # Mismatches, # Indels.]
      Observation 1-a: Metrichor, Nanonet and Scrappie have similar identity and coverage trends among all of the evaluated scenarios.

  18. Basecalling – Accuracy
      Accuracy Analysis Results for Basecalling Tools
      [Figure: same chart as on slide 17.]
      Observation 1-b: However, Nanocall and DeepNano cannot reach these three basecallers’ accuracies: they have lower identity and lower coverage.
