short read quality assessment

Short read quality assessment, Martin Morgan, June 20-23, 2011



  1. Short read quality assessment. Martin Morgan (mtmorgan@fhcrc.org), June 20-23, 2011

  2. Why sequence? e.g., RNA-seq ◮ Expression in novel (un-annotated) regions ◮ Exon junction / RNA editing insights ◮ Allele-specific / transcript isoform quantification ◮ Non-model organisms ◮ Greater dynamic range and sensitivity? Lessons from microarrays ◮ Initially: variability between manufacturers, technologies, labs ◮ MAQC: quality control standards and analysis protocols

  3. Example work flow – [4] Sample ◮ Purify poly(A)+ RNA with oligo(dT) magnetic beads ◮ cDNA synthesis primed with random hexamers Microarray ◮ Dye-swap, hybridization, fluorescence, analysis RNA-seq ◮ Fragment and size-select ◮ Illumina adapter ligation


  6. Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2]

  7. Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2] [Figure: ROC simulation; replication (red vs. blue); randomization and blocking (solid vs. dot)]
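Randomization and blocking can be illustrated with a small sketch. It assumes, purely for illustration, that the blocking unit is a flow cell with one lane per treatment, and that every treatment appears once per block in random order. The function name `randomized_block_design` and the treatment labels are hypothetical and do not come from the slides.

```python
import random

def randomized_block_design(treatments, n_blocks, seed=0):
    """Assign every treatment once per block, in random lane order.

    Returns a list of (block, lane, treatment) tuples. Blocks and lanes
    are numbered from 1. The seed makes the layout reproducible.
    """
    rng = random.Random(seed)
    design = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # randomize lane order within the block
        for lane, treatment in enumerate(order, start=1):
            design.append((block, lane, treatment))
    return design

# Hypothetical two-treatment experiment spread over three flow cells.
design = randomized_block_design(["control", "knockdown"], n_blocks=3)
for block, lane, treatment in design:
    print(f"flow cell {block}, lane {lane}: {treatment}")
```

Because each block contains all treatments, a batch effect tied to a flow cell affects every treatment equally and is not confounded with the comparison of interest.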

  8. Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2] [Figure: cumulative proportion of reads occurring 0, 1, ... times; x-axis: number of occurrences of each read (log10); y-axis: cumulative proportion of reads]
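The library complexity summary on this slide, the cumulative proportion of reads whose sequence occurs a given number of times, can be sketched as follows. This is a minimal illustration that assumes reads are plain strings and that exact sequence identity defines a duplicate; the function name `cumulative_read_proportion` and the toy reads are hypothetical.

```python
from collections import Counter

def cumulative_read_proportion(reads):
    """Return [(k, proportion of reads whose sequence occurs <= k times)]."""
    per_sequence = Counter(reads)                    # sequence -> number of copies
    by_occurrence = Counter(per_sequence.values())   # k -> distinct sequences seen k times
    total = len(reads)
    result, running = [], 0
    for k in sorted(by_occurrence):
        running += k * by_occurrence[k]              # each such sequence contributes k reads
        result.append((k, running / total))
    return result

# Toy input: one sequence seen 3 times, one seen twice, one seen once.
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCC", "GGCC"]
curve = cumulative_read_proportion(reads)
for k, proportion in curve:
    print(k, round(proportion, 3))
```

A curve that rises slowly, with most reads coming from sequences seen many times, would suggest a low-complexity library.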

  9. Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2] [Figure: actual versus uniform φX174 coverage; x-axis: copies per read (log10); y-axis: cumulative proportion]
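One way to compare actual coverage with uniform coverage is to simulate read starts drawn uniformly along the genome and compare the spread of per-base depth. The sketch below is an assumption-laden illustration, not the analysis behind the slide: it treats the genome as linear with 0-based positions, uses a read length of 36 and 2000 reads as arbitrary values, and stands in a deliberately clustered set of starts for "actual" data. The φX174 genome length of 5386 bp is the only real quantity used.

```python
import random

def per_base_depth(starts, read_len, genome_len):
    """Per-position depth from 0-based read start positions (linear genome)."""
    diff = [0] * (genome_len + 1)
    for s in starts:
        diff[s] += 1                                 # read begins covering here
        diff[min(s + read_len, genome_len)] -= 1     # and stops covering here
    depth, running = [], 0
    for d in diff[:genome_len]:
        running += d
        depth.append(running)
    return depth

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance ** 0.5 / mean

GENOME_LEN = 5386   # phiX174 genome length in bp
READ_LEN = 36       # assumed read length
N_READS = 2000      # assumed number of reads

rng = random.Random(1)
uniform_starts = [rng.randrange(GENOME_LEN - READ_LEN + 1) for _ in range(N_READS)]
clustered_starts = [rng.randrange(500) for _ in range(N_READS)]  # stand-in for uneven data

cv_uniform = coefficient_of_variation(per_base_depth(uniform_starts, READ_LEN, GENOME_LEN))
cv_clustered = coefficient_of_variation(per_base_depth(clustered_starts, READ_LEN, GENOME_LEN))
print("CV of depth, uniform starts:  ", round(cv_uniform, 2))
print("CV of depth, clustered starts:", round(cv_clustered, 2))
```

A coefficient of variation well above the uniform reference points to coverage heterogeneity beyond what random sampling alone would produce.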

  10. Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2] [Figure: read count increases with gene length]
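Because read count increases with gene length, raw counts are not directly comparable between genes. The sketch below shows one common length adjustment, reads per kilobase per million mapped reads (RPKM). The slides do not name a normalization method, so RPKM is a swapped-in illustration, and the gene names, lengths, and counts are made up.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical genes expressed at the same level but differing 4-fold in length.
TOTAL_MAPPED = 1_000_000
genes = {
    "geneA": {"length": 1000, "count": 100},
    "geneB": {"length": 4000, "count": 400},
}

normalized = {
    name: rpkm(g["count"], g["length"], TOTAL_MAPPED) for name, g in genes.items()
}
for name, value in normalized.items():
    print(name, genes[name]["count"], round(value, 1))
```

The raw counts differ 4-fold while the length-adjusted values agree, which is the situation the slide's "legitimate comparison" bullet warns about.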

  11. Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2] [Figure: reads, stratified by cycle, supporting a spurious SNP call in φX174]
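Stratifying the reads that cover a candidate SNP by sequencing cycle can be sketched as below. The example assumes ungapped, forward-strand alignments given as hypothetical `(start, sequence)` pairs with 0-based reference coordinates; real alignments would also need strand and indel handling. The function name `bases_by_cycle` and the toy reads are illustrative only.

```python
from collections import Counter

def bases_by_cycle(alignments, position):
    """Count (cycle, base) pairs for reads covering a 0-based reference position.

    alignments: iterable of (start, sequence) for ungapped forward-strand reads.
    Cycles are reported 1-based, i.e. cycle 1 is the first base of the read.
    """
    counts = Counter()
    for start, seq in alignments:
        offset = position - start            # 0-based offset within the read
        if 0 <= offset < len(seq):
            counts[(offset + 1, seq[offset])] += 1
    return counts

# Toy reads: the variant base T is seen only at late cycles.
alignments = [(0, "ACGTT"), (2, "GTACC"), (4, "ACCGG"), (1, "CGTTC"), (9, "GGGGG")]
counts = bases_by_cycle(alignments, position=4)
for (cycle, base), n in sorted(counts.items()):
    print(f"cycle {cycle}: {base} x{n}")
```

If support for the alternative base is concentrated in late cycles, where base-call quality tends to be lower, the call deserves extra scrutiny.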

  12. Case study Subset of Brooks et al. [3] ◮ RNAi and mRNA-seq to identify pasilla-regulated alternative splicing ◮ Purified polyA, random hexamer primed ◮ Single- and paired-end sequences ◮ Alignment to reference genome and curated splice junctions

  13. [1] P. L. Auer and R. W. Doerge. Statistical design and analysis of RNA sequencing data. Genetics, 185:405–416, Jun 2010.
      [2] H. C. Bravo and R. A. Irizarry. Model-based quality assessment and base-calling for second-generation sequencing data. Biometrics, 66:665–674, Sep 2010.
      [3] A. N. Brooks, L. Yang, M. O. Duff, K. D. Hansen, J. W. Park, S. Dudoit, S. E. Brenner, and B. R. Graveley. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res., 21:193–202, Feb 2011.
      [4] J. H. Malone and B. Oliver. Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol., 9:34, 2011.
