
VIRANA: A Standardized Analysis of Viral Next Generation Sequencing



  1. VIRANA: A Standardized Analysis of Viral Next Generation Sequencing Data. Bastian Beggel, Max-Planck-Institute for Informatics, Saarbrücken

  2. Improvements in the rate of DNA sequencing. Source: Stratton et al., Nature 2009

  3. Reduction in the cost of DNA sequencing

  4. Result set of viral NGS data analysis. NGS datasets yield results at three levels: read level, position-wise (pileup), and haplotype level (e.g. 60% ATATC…GATCG, 20% ATATC…TATCG, 10% ATATC…TATCG).
     • HIV tropism
     • Hypermutation
     • Dual infections

  5. Standardized processing of next-generation sequencing data
     Pre-processing
     • Quality Control: withdraw/clip bad quality reads
     • Map to reference: select reference; solve alignment problem
     Standardized Analysis
     • Coverage: number of mapped reads per position
     • Pileups/Dynamics: summarize data position-wise; analysis of changes
     • Haplotypes: raw output; visualization; complexity
     Custom Analysis
     • HIV Tropism: g2p[454]
     • Hypermutation: classify reads as hypermutated
     • Statistics: correlate NGS data with clinical parameters
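The quality-control step listed on slide 5 ("withdraw/clip bad quality reads") could look roughly like the sketch below. The slides do not say how VIRANA does this, so the function names, the Phred+33 encoding, and the thresholds (`min_quality=20`, `min_length=5`) are all illustrative assumptions.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def clip_read(sequence, quality_string, min_quality=20, min_length=5):
    """Clip low-quality bases from the 3' end of a read.

    Returns the clipped (sequence, quality) pair, or None when the read
    becomes shorter than min_length and is withdrawn. Both thresholds are
    hypothetical defaults, not values taken from the slides.
    """
    scores = phred_scores(quality_string)
    end = len(sequence)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    if end < min_length:
        return None
    return sequence[:end], quality_string[:end]


# Toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
reads = [
    ("ATATCGATCG", "IIIIIIII##"),  # two poor bases at the 3' end get clipped
    ("ATATCGATCG", "##########"),  # uniformly poor, so the read is withdrawn
]
clipped = [clip_read(seq, qual) for seq, qual in reads]
```

Clipping only from the 3' end keeps the sketch short; real pipelines often also use sliding-window or two-sided trimming.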

  6. VIRANA as a Web-Service: upload sequence data; download summary statistics and plots

  7. Pileups summarize NGS data position-wise

     ID    Pos  Ref. NT  G      A      T      C     Cov.
     A_BL  1    T        0.1%   32.9%  66.8%  0.1%  8300
     A_BL  2    G        99.6%  0.2%   0.0%   0.0%  8305
     A_BL  3    T        0.1%   0.0%   99.4%  0.4%  8331
     A_BL  4    T        0.0%   0.0%   99.7%  0.1%  8334
     A_BL  5    G        98.4%  0.6%   0.1%   0.0%  8338

     Error model: LoFreq (Wilm et al., 2012)
     • Modeling biases in sequencing error rates
     • Uses base-specific quality scores (phred scores)
     • Poisson-binomial distribution
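As a rough illustration of slide 7, the sketch below builds a position-wise pileup from gap-free aligned reads and evaluates a Poisson-binomial tail probability from per-base Phred scores. It is a simplified stand-in, not LoFreq's or VIRANA's implementation. The `(start, sequence)` input format and all function names are assumptions made for this example.

```python
from collections import Counter

NUCLEOTIDES = "GATC"  # column order used in the pileup table


def pileup(aligned_reads):
    """Return rows of (position, {nt: frequency}, coverage).

    aligned_reads: list of (start, sequence) pairs with 1-based start
    positions. Alignments are assumed gap-free (a simplification).
    """
    counts = {}
    for start, sequence in aligned_reads:
        for offset, nt in enumerate(sequence):
            if nt in NUCLEOTIDES:  # ignore N and other ambiguity codes
                counts.setdefault(start + offset, Counter())[nt] += 1
    rows = []
    for pos in sorted(counts):
        coverage = sum(counts[pos].values())
        freqs = {nt: counts[pos][nt] / coverage for nt in NUCLEOTIDES}
        rows.append((pos, freqs, coverage))
    return rows


def phred_to_error_prob(q):
    """Convert a Phred score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs, k):
    """P(at least k errors) for independent bases with differing error rates."""
    dist = [1.0]  # dist[j] = probability of exactly j errors so far
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)  # this base is correct
            nxt[j + 1] += mass * p    # this base is an error
        dist = nxt
    return sum(dist[k:])


# Toy data: four short reads covering the first positions of a reference.
rows = pileup([(1, "TGTTG"), (1, "AGTTG"), (2, "GTTG"), (1, "TGT")])

# Probability of seeing at least 2 errors among 100 bases of Phred 30.
p_value = poisson_binomial_tail([phred_to_error_prob(30)] * 100, 2)
```

A small tail probability for the observed number of non-reference bases suggests a real variant rather than sequencing error. How the per-base probabilities are derived in practice is specific to the tool.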

  8. Dynamic pileups to analyze changes over time

     ID1  ID2  Pos  Ref. NT  dG     dA    dT     dC     mCov.
     A    B    1    T        0.0%   0.0%  0.1%   -0.1%  8300
     A    B    2    G        -0.1%  0.0%  -0.1%  0.2%   8305
     A    B    3    T        0.0%   0.0%  -4.5%  4.5%   8331
     A    B    4    T        0.0%   0.0%  0.0%   0.0%   8334
     A    B    5    G        -0.1%  0.0%  -0.1%  0.2%   8338

     Error model: deepSNV (Gerstung et al., 2012)
     • Compares two similar samples
     • Adapts error rates to genomic context
     • Hierarchical binomial model (overdispersed)
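The dynamic pileup on slide 8 can be read as a position-wise difference between two pileups. The sketch below assumes that reading and interprets "mCov." as the minimum coverage of the two samples; the slide does not define the column, so that is a guess. The dictionary layout and the follow-up sample values are hypothetical. The baseline row is position 3 from slide 7.

```python
NUCLEOTIDES = "GATC"


def dynamic_pileup(pileup_a, pileup_b):
    """Frequency change (B minus A) per nucleotide for shared positions.

    Each pileup maps position -> ({nt: percent}, coverage). The minimum
    coverage of both samples is reported alongside; treating "mCov." as
    a minimum is an assumption.
    """
    rows = {}
    for pos in sorted(set(pileup_a) & set(pileup_b)):
        freqs_a, cov_a = pileup_a[pos]
        freqs_b, cov_b = pileup_b[pos]
        deltas = {nt: freqs_b[nt] - freqs_a[nt] for nt in NUCLEOTIDES}
        rows[pos] = (deltas, min(cov_a, cov_b))
    return rows


# Baseline: position 3 of sample A_BL (slide 7). Follow-up: made-up values.
baseline = {3: ({"G": 0.1, "A": 0.0, "T": 99.4, "C": 0.4}, 8331)}
followup = {3: ({"G": 0.1, "A": 0.0, "T": 94.9, "C": 4.9}, 9120)}
changes = dynamic_pileup(baseline, followup)

for pos, (deltas, min_cov) in changes.items():
    formatted = "  ".join(f"d{nt} {deltas[nt]:+.1f}%" for nt in NUCLEOTIDES)
    print(pos, formatted, min_cov)
```

A raw difference like this says nothing about significance. That is the role of the error model named on the slide, which this sketch does not attempt to reproduce.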

  9. Median and sample coverage. [Figure: two panels, "Median Coverage" and "Single Sample Coverage"; x-axis: NT position, y-axis: coverage]
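The median coverage curve on slide 9 could be derived from per-sample coverage as in the sketch below. The input layout (sample ID mapped to a position-to-coverage dictionary) and the choice to count uncovered positions as zero are assumptions for illustration.

```python
from statistics import median


def median_coverage(coverage_by_sample):
    """Median coverage across samples for every position seen in any sample.

    coverage_by_sample: {sample_id: {position: coverage}}. Positions missing
    from a sample count as zero coverage (an assumption of this sketch).
    """
    samples = list(coverage_by_sample.values())
    positions = sorted(set().union(*(s.keys() for s in samples)))
    return {
        pos: median(s.get(pos, 0) for s in samples)
        for pos in positions
    }


# Toy coverage for three hypothetical samples.
coverage = {
    "A": {1: 8300, 2: 8305},
    "B": {1: 100, 2: 120},
    "C": {1: 5000},  # no reads at position 2
}
med = median_coverage(coverage)
```

Plotting `med` against position gives the cohort-level curve, while a single entry of `coverage` gives the per-sample curve.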

  10. Monitoring of resistance mutations. [Figure: two panels, "Prevalence of resistance mutations at baseline" and "Dynamics of resistance mutations"; axis and label text: Patient, NGS frequency, Sample time, BL, 12M, 181T]

  11. Monitoring genetic change. [Figure; label: AA]

  12. Quasispecies estimation using ShoRAH. Source: Zagordi et al. 2011

  13. Visualization of the viral quasispecies. [Figure: x-axis: Principal Component 1, y-axis: Principal Component 2]

  14. Conclusions
     • State-of-the-art processing of viral NGS data (HBV, HCV, HIV)
     • Summary plots and statistics: coverage; pileups, dynamics; resistance mutation; quasispecies
     • Fully automated
     • Web-based version in planning
     • Looking forward to more collaborations

  15. Thank you for your attention. Acknowledgements: Thomas Lengauer, Sven-Eric Schelhorn, Alex Thielen, Martin Däumer, Rolf Kaiser

  16. End
