BCOOL-Trans: Accurate and variant-preserving correction for RNA-seq

  1. BCOOL-Trans: Accurate and variant-preserving correction for RNA-seq. Camille Marchet and Antoine Limasset, Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL. SeqBio 2018, Rouen. 1 / 25

  2. Introduction. Tools to study RNA-seq: most assembly/quantification methods and some variant-calling methods are k-mer based. Correctors are mostly k-mer based too, and rely on "solidity" (k-mers seen often enough are trusted).

  3. Introduction: RNA-seq correction challenges

  4. Introduction: RNA-seq correction challenges

  5. Motivations

  6. Motivations

  7. Motivations

  8. Motivations

  9. State of the art: k-mer spectrum. Main idea: find abundant (trusted) k-mers in the dataset, then replace untrusted k-mers in reads with trusted ones.
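
A minimal Python sketch of the k-mer spectrum idea, assuming substitution errors only, one global solidity threshold and forward-strand k-mers. `count_kmers`, `covering_kmers`, `correct_read` and the threshold value are illustrative names, not the API of any existing corrector.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_kmers(seq, pos, k):
    """Return the k-mers of seq that overlap position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def correct_read(read, counts, k, threshold):
    """Greedy spectrum correction: when a base is covered by an untrusted
    k-mer, try the substitutions that make all covering k-mers trusted."""
    def trusted(kmers):
        return all(counts[km] >= threshold for km in kmers)

    seq = read
    for pos in range(len(seq)):
        if trusted(covering_kmers(seq, pos, k)):
            continue  # this base is already supported by solid k-mers
        for base in "ACGT":
            if base == seq[pos]:
                continue
            candidate = seq[:pos] + base + seq[pos + 1:]
            if trusted(covering_kmers(candidate, pos, k)):
                seq = candidate  # first substitution that restores solidity wins
                break
    return seq


reference = "ACGTTGCATGCCATAGGCTA"
noisy = reference[:10] + "A" + reference[11:]  # substitution error at position 10
counts = count_kmers([reference] * 5 + [noisy], k=5)
print(correct_read(noisy, counts, k=5, threshold=3))  # ACGTTGCATGCCATAGGCTA
```

Positions before the error cannot be "fixed" because every candidate still leaves an erroneous k-mer, so the first accepted substitution is the real error. The global threshold is exactly the weak point for RNA-seq: a rare transcript or allele can look like an error.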

  10. State of the art: RNA correction. Strategies included in assemblers, KisSplice [Sacomoto et al. 2012] and Rcorrector [Song et al. 2015].

  11. BCOOL [Limasset et al. 2018]: main concepts

  12. BCOOL: improvements in read correction. Map reads onto unitigs: better handling of close errors (fewer than k bases apart). Use a large k: handles repeated regions. Results after correction of genomic reads. *Correction ratio = factor by which the number of errors was divided.
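
Why mapping onto unitigs copes with close errors can be sketched as follows. Assumptions (not from the slides): the read lies inside a single unitig, reverse complements are ignored, and one exact k-mer anchor is enough to place the read. `index_unitigs`, `map_and_correct` and `max_mismatches` are hypothetical names; BCOOL's actual mapper follows paths in the graph and is not reproduced here.

```python
def index_unitigs(unitigs, k):
    """Map each k-mer to its first (unitig id, offset) occurrence."""
    index = {}
    for uid, seq in enumerate(unitigs):
        for off in range(len(seq) - k + 1):
            index.setdefault(seq[off:off + k], (uid, off))
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_and_correct(read, unitigs, index, k, max_mismatches):
    """Anchor the read on a unitig with one exact k-mer, then compare the whole
    read with the unitig; if it fits, the unitig sequence replaces the read."""
    for i in range(len(read) - k + 1):
        hit = index.get(read[i:i + k])
        if hit is None:
            continue  # k-mer absent from the graph: try the next anchor
        uid, off = hit
        start = off - i  # where the read would begin on the unitig
        unitig = unitigs[uid]
        if start < 0 or start + len(read) > len(unitig):
            continue  # read overhangs the unitig; a full mapper would follow graph edges
        target = unitig[start:start + len(read)]
        if hamming(read, target) <= max_mismatches:
            return target
    return read  # no acceptable placement: leave the read unchanged


unitig = "ACGTTGCATGCCATAGGCTAGCTTACG"
truth = unitig[2:22]
read = truth[:12] + "C" + truth[13] + "T" + truth[15:]  # two errors, 2 bases apart
print(map_and_correct(read, [unitig], index_unitigs([unitig], 5), 5, max_mismatches=2))
```

With k = 5, several 5-mers of this read contain both errors, which a one-substitution k-mer spectrum repair cannot fix; comparing the whole read against the unitig corrects both at once.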

  13. BCOOL-Trans enhancements 1: work with all k-mers. Graph construction scales to dozens of billions of k-mers; rare k-mers are kept, which makes overlaps easier to find.
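
To illustrate why keeping every k-mer helps preserve variants, here is a toy node-centric de Bruijn graph (forward strand only, a Python `Counter` as the k-mer table) that applies no abundance filter and compacts non-branching paths into unitigs. It is a stand-in for BCOOL-Trans's scalable graph construction, not its implementation; all function names are hypothetical.

```python
from collections import Counter


def build_graph(reads, k):
    """Every k-mer is a node and none is filtered out by abundance;
    the Counter keeps each k-mer's abundance as an annotation."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


def successors(kmer, graph):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in graph]


def predecessors(kmer, graph):
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in graph]


def unitigs(graph):
    """Compact maximal non-branching paths into unitigs."""
    def simple_link(a, b):
        # a -> b can be merged only if it is a's sole out-edge and b's sole in-edge
        return successors(a, graph) == [b] and predecessors(b, graph) == [a]

    seen, result = set(), []
    for kmer in graph:
        if kmer in seen:
            continue
        start = kmer
        while True:  # walk back to the first k-mer of the non-branching path
            preds = predecessors(start, graph)
            if len(preds) == 1 and preds[0] != kmer and simple_link(preds[0], start):
                start = preds[0]
            else:
                break
        path, current = [start], start
        seen.add(start)
        while True:  # walk forward, extending the unitig one base at a time
            succ = successors(current, graph)
            if len(succ) == 1 and succ[0] not in seen and simple_link(current, succ[0]):
                current = succ[0]
                path.append(current)
                seen.add(current)
            else:
                break
        result.append(path[0] + "".join(km[-1] for km in path[1:]))
    return result


reads = ["ACGTTGCAAC"] * 5 + ["ACGTAGCAAC"]  # one read carries a rare T -> A variant
print(sorted(unitigs(build_graph(reads, 4))))
```

This prints `['ACGT', 'CGTAGCA', 'CGTTGCA', 'GCAAC']`: the variant forms its own unitig, CGTAGCA, next to the major allele's CGTTGCA. With a solidity threshold of 2, its four k-mers (abundance 1) would have been discarded, taking the variant with them.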

  14. BCOOL-Trans enhancements 2: work with large k-mers and remove only tips

  15. BCOOL-Trans enhancements 2: work with large k-mers and remove only tips
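
The slides do not give BCOOL-Trans's exact tip definition. The sketch below assumes a common one: a tip is a dead-end, non-branching path of at most `max_len` k-mers hanging off a junction. Only tips are removed; bubbles, where two paths diverge and reconnect (as around a SNP), are left untouched. `find_tip`, `remove_tips` and the length bound are hypothetical.

```python
from collections import Counter


def successors(kmer, graph):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in graph]


def predecessors(kmer, graph):
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in graph]


def find_tip(start, forward, backward, graph, max_len):
    """Walk away from a dead-end k-mer. Return the walked k-mers if the walk
    reaches a junction within max_len k-mers, otherwise None."""
    path, current = [start], start
    while len(path) <= max_len:
        nxt = forward(current, graph)
        if len(nxt) != 1:
            return None  # isolated path, or branching before any junction: keep it
        if len(backward(nxt[0], graph)) > 1:
            return path  # nxt[0] is a junction: the walked branch is a short tip
        current = nxt[0]
        path.append(current)
    return None  # longer than max_len: not treated as a tip


def remove_tips(graph, max_len):
    """One round of tip removal; bubbles are not touched."""
    doomed = set()
    for kmer in graph:
        if not predecessors(kmer, graph):    # source: walk forward
            tip = find_tip(kmer, successors, predecessors, graph, max_len)
        elif not successors(kmer, graph):    # sink: walk backward
            tip = find_tip(kmer, predecessors, successors, graph, max_len)
        else:
            tip = None
        if tip:
            doomed.update(tip)
    return Counter({km: c for km, c in graph.items() if km not in doomed})


reads = ["ACGTTGCAACTTAG"] * 5 + ["ACGTTGCAG"]  # last read ends one base after an error
k = 4
graph = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
cleaned = remove_tips(graph, max_len=3)
print(sorted(set(graph) - set(cleaned)))  # ['GCAG']
```

The error k-mer GCAG is a one-k-mer dead end and is removed. The true transcript end is a dead end too, and it survives only because it is longer than `max_len`; this is one reason why a large k and a length bound matter.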

  16. BCOOL-Trans enhancements 3: advanced tip removal

  17. BCOOL-Trans enhancements 4: mapping strategy

  18. BCOOL-Trans enhancements 4: mapping strategy

  19. BCOOL-Trans enhancements 5: paired-end read merging
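
The slides do not describe how BCOOL-Trans merges pairs. For orientation only, this sketch uses the classical overlap-based approach (as in tools such as FLASH) instead: reverse-complement the second read and keep the longest suffix/prefix overlap with few enough mismatches. `merge_pair`, `min_overlap` and `max_mismatch_rate` are hypothetical, and fragments shorter than one read (adapter read-through) are not handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Return the merged fragment, or None if no acceptable overlap exists.

    read2 comes from the opposite strand, so it is reverse-complemented first;
    overlaps are tried from longest to shortest."""
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        suffix, prefix = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * overlap:
            return read1 + mate[overlap:]  # read1's bases are kept in the overlap
    return None


fragment = "ACGTTGCAACTTAGGCTAGCATCGATCCGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])  # 20-base mate, 10 bases of overlap
print(merge_pair(read1, read2) == fragment)  # True
```

A merged pair yields one longer read, which is the "more context" argument made in the discussion slide.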

  20. Correction quality: proof of concept. Data: mouse transcriptome, 100M reads simulated with Flux Simulator [Griebel et al. 2012], 1% error rate. Mock BCOOL-Trans: "cleaned" graph + BCOOL's mapping module. BCOOL-TransN: all true k-mers plus the erroneous k-mers with occurrence > N in the cleaned graph.

      Corrector       Recall (%)  Precision (%)  Correction ratio*  Erroneous reads (%)
      BFC                  58.76          96.16               2.34                30.18
      Rcorrector           93.37          99.80              14.68                 4.34
      BCOOL-Trans5         92.75          97.75              10.64                 5.58
      BCOOL-Trans7         98.41          94.20              13.7                  4.17

      *Correction ratio = factor by which the number of errors was divided.
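
The recall, precision and correction-ratio columns can be cross-checked with a short calculation. It assumes recall = TP/E and precision = TP/(TP + FP) (E: errors before correction, TP: errors fixed, FP: wrong corrections) and that each false-positive correction adds exactly one error, so the errors left are (E - TP) + FP. `correction_ratio` is a hypothetical helper, not part of any tool.

```python
def correction_ratio(recall, precision):
    """Errors before correction divided by errors after correction.

    Assumes recall = TP / E, precision = TP / (TP + FP), and that each
    false-positive correction introduces exactly one new error."""
    missed = 1.0 - recall                                 # (E - TP) / E
    introduced = recall * (1.0 - precision) / precision  # FP / E
    return 1.0 / (missed + introduced)


reported = {  # corrector: (recall %, precision %, reported correction ratio)
    "BFC": (58.76, 96.16, 2.34),
    "Rcorrector": (93.37, 99.80, 14.68),
    "BCOOL-Trans5": (92.75, 97.75, 10.64),
    "BCOOL-Trans7": (98.41, 94.20, 13.7),
}

for name, (recall, precision, ratio) in reported.items():
    estimate = correction_ratio(recall / 100, precision / 100)
    print(f"{name}: estimated {estimate:.2f}, reported {ratio}")
```

Under these assumptions every estimate falls within about 5% of the reported ratio (Rcorrector: about 14.67 against 14.68). BCOOL-Trans7's row is only consistent when 94.20 is read as its precision and 13.7 as its correction ratio.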

  21. Results: paired-end merging. Data: mouse transcriptome, 4,422,720 paired-end reads (2 × 150 nt) simulated with Flux Simulator. Results: 97.3454% of pairs merged; non-trivial pair-mapping rate: 76%.

  22. Discussion: expected outcomes. After BCOOL-Trans correction: speedup and scaling; fewer errors in the data; more signal to detect rare and significant ...; longer "merged reads", giving more context for assembly, variant calling, etc. Other applications: metagenomic and metatranscriptomic data.

  23. Future work: development. Graph construction: relative-abundance tipping with an adaptive threshold (cf. Rcorrector); distance-related tipping (cf. rnaSPAdes [Bushmanova et al. 2018, bioRxiv]). Graph alignment: use partial read mapping; use multiple starting anchors.
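
Relative-abundance tipping can be pictured as a local, adaptive threshold: at each branching k-mer, an out-branch is flagged when its count falls below a fixed fraction of its strongest sibling, in the spirit of Rcorrector's local thresholds. `weak_branches`, the 0.05 ratio, the forward-only scan and the toy reads are assumptions for illustration.

```python
from collections import Counter


def successors(kmer, counts):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in counts]


def weak_branches(counts, ratio=0.05):
    """Flag out-branches whose abundance is below `ratio` times that of the
    strongest sibling (forward direction only; a backward scan is symmetric)."""
    weak = set()
    for kmer in counts:
        succ = successors(kmer, counts)
        if len(succ) < 2:
            continue  # not a branching point
        strongest = max(counts[s] for s in succ)
        weak.update(s for s in succ if counts[s] < ratio * strongest)
    return weak


reads = (["ACGTTGCAACTTAG"] * 200   # highly expressed transcript
         + ["ACGTAGCAAC"] * 100     # genuine variant (T -> A at position 4)
         + ["ACGTTGCAG"] * 6)       # recurrent error (A -> G at position 8)
k = 4
counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
print(sorted(weak_branches(counts)))  # ['GCAG']
```

The error k-mer GCAG appears 6 times, so a global solidity threshold of 5 would keep it, yet it is flagged next to a sibling seen 300 times. The variant branch CGTA (100 against 206) is kept because the threshold scales with local coverage.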

  24. Future work: experiments. Benchmark against the main state-of-the-art methods (Rcorrector, BayesHammer [Nikolenko et al. 2013], BFC [Li 2015], ...). Assess the impact on assembly, variant calling, quantification and differential expression. Assess the impact (in particular of merged reads) on hybrid long-read correction.

  25. Conclusion. BCOOL-Trans is a work-in-progress RNA-seq corrector. It is scalable, and it uses new strategies to preserve variants well.
