
HIV tropism assessment

HIV tropism assessment using next generation sequencing. Mattia CF Prosperi


  1. HIV tropism assessment using next generation sequencing. Mattia CF Prosperi, National Institute for Infectious Diseases “Lazzaro Spallanzani” (INMI), Dept. Virology, Via Portuense, 292 – 00149 – Rome, Italy. E-mail: ahnven@yahoo.it

  2. Summary • Next-generation (aka ultra-deep) sequencing (NGS) • Technologies, features • Low-level tools to analyse NGS data • Sequence alignment • Error correction • High-level tool for clinical purposes • Ultra-deep prediction of HIV-1 coreceptor usage • Statistical learning model • Web server

  3. Next generation sequencing • Technologies – 454, Illumina, ABI SOLiD, Polonator, Helicos • Fields of application – De novo sequencing – Re-sequencing – Metagenomics

  4. Next-generation sequencing data • 454 GS FLX, Roche – A sequence read is ~400 bases long (with Titanium upgrade) – 400-600 million bases per 10-hour run – Higher error rate than Sanger sequencing • Approximately 0.1% and 0.05% for homopolymeric and non-homopolymeric regions, respectively (estimated on an HIV plasmid clone) – Possible presence of contaminants • Other technologies: Illumina, ABI SOLiD, Helicos… – shorter reads, higher base throughput

  5. Web server • Easy user interface – Parallelization of read alignment and error correction •Computational burden reduced from hours to minutes – Online tools for NGS-aided diagnostics: •HIV-1 tropism prediction – Graph generator for variability analysis

  6. CASPUR associated universities

  7. Sequence alignment • Optimized local pairwise alignment against a given consensus sequence •Smith-Waterman-Gotoh in forward and reverse – Gap open/extension parameter optimisation via grid search in [1, 30] and [0.3, 3], with step sizes of 5 and 0.5 respectively – Two possible optimisation functions, where m is the number of matches, g is the number of gaps, and N is the alignment length: » m/N (similarity maximisation) » m - m*g/N (gap minimisation and similarity maximisation, accounting for alignment length)
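
The two optimisation functions and the parameter grid on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are made up here, the aligner is passed in as a hypothetical `align(read, ref, gap_open, gap_ext)` callable returning two gapped strings (for example a Smith-Waterman-Gotoh implementation), and g is assumed to count alignment columns that contain a gap character.

```python
from itertools import product


def alignment_stats(aligned_read, aligned_ref):
    """Return (m, g, N): matches, gap columns and alignment length."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")
    pairs = list(zip(aligned_read, aligned_ref))
    m = sum(1 for a, b in pairs if a == b and a != "-")
    # Assumption: g counts columns with a gap in either sequence.
    g = sum(1 for a, b in pairs if a == "-" or b == "-")
    return m, g, len(pairs)


def similarity(m, g, n):
    """m/N: similarity maximisation."""
    return m / n


def gap_penalised_similarity(m, g, n):
    """m - m*g/N: gap minimisation and similarity maximisation."""
    return m - m * g / n


def parameter_grid():
    """Gap open in [1, 30] with step 5, gap extension in [0.3, 3] with step 0.5."""
    gap_open = [1 + 5 * i for i in range(6)]               # 1, 6, ..., 26
    gap_ext = [round(0.3 + 0.5 * i, 1) for i in range(6)]  # 0.3, 0.8, ..., 2.8
    return list(product(gap_open, gap_ext))


def grid_search(read, ref, align, objective):
    """Return (best_score, gap_open, gap_ext) over the whole grid.

    `align` is a hypothetical callable supplied by the caller.
    """
    best = None
    for gap_open, gap_ext in parameter_grid():
        aligned_read, aligned_ref = align(read, ref, gap_open, gap_ext)
        score = objective(*alignment_stats(aligned_read, aligned_ref))
        if best is None or score > best[0]:
            best = (score, gap_open, gap_ext)
    return best
```

With 6 values per parameter, the grid holds 36 parameter pairs per read, which is one reason the alignment step benefits from parallelization.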

  8. Contaminant detection • A random alignment score distribution is derived by – aligning n random sequences (at least n = 400), whose lengths are normally distributed with the mean and standard deviation of the read lengths in the actual lane – applying the given optimisation procedure to each random sequence • A z-test using Gumbel’s extreme value distribution (like the BLAST E-value) is performed for each real read alignment score, corrected for multiple testing with Benjamini-Hochberg • Sequences with an adjusted p > 0.01 are discarded
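
A minimal sketch of this filter, assuming the random alignment scores have already been computed. The slide does not say how the Gumbel distribution is fitted, so the method-of-moments fit below is an assumption, and all function names are illustrative.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(random_scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Assumption: the original tool may fit the distribution differently.
    """
    beta = statistics.stdev(random_scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(random_scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """Right-tail probability P(S >= score) = 1 - exp(-exp(-z))."""
    z = (score - mu) / beta
    if z < -700:  # avoid overflow in exp(); the tail probability is 1 here
        return 1.0
    return -math.expm1(-math.exp(-z))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def keep_reads(read_scores, random_scores, alpha=0.01):
    """True for reads to keep, i.e. adjusted p <= alpha."""
    mu, beta = fit_gumbel(random_scores)
    p_values = [gumbel_p_value(s, mu, beta) for s in read_scores]
    return [q <= alpha for q in benjamini_hochberg(p_values)]
```

A read whose alignment score is no better than what random sequences achieve gets a large adjusted p-value and is dropped as a likely contaminant.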

  9. Error detection/correction • For each position of the consensus (and the related indels) we run a statistical test for over-representation of changes within the reads – chi-square statistic • After Bonferroni correction for multiple testing, we exclude positions with an adjusted p > 0.01
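
The slide does not give the exact form of the test, so the following is only one plausible reading: a one-degree-of-freedom chi-square goodness-of-fit test comparing the number of reads carrying a change with the number expected from sequencing error alone. The default `error_rate` of 0.001 is an assumption borrowed from the approximate 454 error rate quoted earlier, and the function names and the input layout are illustrative.

```python
import math


def chi_square_p_value(observed, coverage, error_rate):
    """P-value for over-representation of a change at one position.

    Compares `observed` variant reads out of `coverage` with the count
    expected from sequencing error alone (chi-square, 1 degree of freedom).
    """
    expected = coverage * error_rate
    if observed <= expected:
        return 1.0  # not over-represented
    diff_sq = (observed - expected) ** 2
    chi2 = diff_sq / expected + diff_sq / (coverage - expected)
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def significant_changes(counts, error_rate=0.001, alpha=0.01):
    """Keep positions whose Bonferroni-adjusted p-value is <= alpha.

    `counts` maps a position to (variant_reads, coverage).
    """
    n_tests = len(counts)
    kept = {}
    for position, (observed, coverage) in counts.items():
        p_adj = min(1.0, chi_square_p_value(observed, coverage, error_rate) * n_tests)
        if p_adj <= alpha:
            kept[position] = p_adj
    return kept
```

Changes that fail the test are treated as probable sequencing errors rather than true minority variants.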

  10. Web Service Interface

  11. Variations plot

  12. Shannon entropy plot
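
As a rough illustration of the quantity behind such a plot, the per-position Shannon entropy of a set of aligned reads can be computed as below. This sketch assumes equal-length, already aligned reads and uses base-2 logarithms; the server's actual conventions are not stated on the slide.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(aligned_reads):
    """Entropy at each position of equal-length aligned reads."""
    return [shannon_entropy(column) for column in zip(*aligned_reads)]
```

A fully conserved position has entropy 0, while a position with the four nucleotides in equal proportions has entropy 2 bits.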

  13. HIV diagnostics application • Idea from Martin Daumer’s group (Institute of Immunology, Kaiserslautern) and MPI • HIV-1 coreceptor usage prediction – Uses statistical learning applied to NGS data •Existing methods are: geno2pheno, PSSM •We developed a new method based on logistic regression (Prosperi et al. AIDS Research and Human Retroviruses 2009; 25(3).) – Alternative to the Trofile method •Pro: less expensive, quicker results, NGS also gives a description of the quasispecies •Contra: results not always concordant with Trofile

  14. Statistical Learning Model • Logistic regression – Accuracy 92.76% – AUC 0.93
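
For readers unfamiliar with the model class, a logistic regression turns a weighted sum of sequence features into a probability. The sketch below is generic: the feature names, weights and intercept are placeholders supplied by the caller, not the published model, and the read-level aggregation in `x4_read_fraction` is an assumption about how per-read predictions might be summarised for ultra-deep data.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_x4_probability(features, weights, intercept):
    """Probability of CXCR4 usage for one sequence.

    `features` and `weights` are dicts keyed by hypothetical feature names.
    """
    z = intercept + sum(weights.get(name, 0.0) * value
                        for name, value in features.items())
    return logistic(z)


def x4_read_fraction(read_features, read_counts, weights, intercept, threshold=0.5):
    """Fraction of reads predicted as CXCR4-using (assumed aggregation)."""
    total = sum(read_counts)
    x4_reads = sum(
        count
        for features, count in zip(read_features, read_counts)
        if predict_x4_probability(features, weights, intercept) >= threshold
    )
    return x4_reads / total
```

Applying the model to every distinct read rather than to a single consensus sequence is what lets NGS describe minority variants in the quasispecies.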

  15. CXCR4 usage prediction

  16. CXCR4 usage prediction

  17. People at CASPUR and INMI • MR Capobianchi, G Ippolito • A Desideri, G Chillemi • I Abbate, G Rozera • A Barbato, A Bruselles
