Short read quality assessment. Martin Morgan (mtmorgan@fhcrc.org). June 20-23, 2011
Why sequence? e.g., RNA-seq ◮ Expression in novel (un-annotated) regions ◮ Exon junction / RNA editing insights ◮ Allele-specific / transcript isoform quantification ◮ Non-model organisms ◮ Greater dynamic range and sensitivity? Lessons from microarrays ◮ Initially: variability between manufacturers, technologies, labs ◮ MAQC: quality control standards and analysis protocols
Example work flow – [4] Sample ◮ Purify poly(A)+ RNA with oligo(dT) magnetic beads ◮ cDNA synthesis primed with random hexamers Microarray ◮ Dye-swap, hybridization, fluorescence, analysis RNA-seq ◮ Fragment and size-select ◮ Illumina adapter ligation
Key issues ◮ Experimental design [1] ◮ Replication ◮ Randomization and blocking, e.g., batch effects ◮ Depth of coverage ◮ Statistical power ◮ Library complexity ◮ Coverage heterogeneity ◮ Estimation biases ◮ Legitimate comparison ◮ Sequencing uncertainty [2]
[Figure: ROC simulation. Replication (red vs. blue); randomization and blocking (solid vs. dot).]
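To make "randomization and blocking" concrete, here is a minimal sketch of a randomized block layout. It is an illustration only, not the design used in [1]: the function name `blocked_assignment`, the treatment labels, and the sample names are all hypothetical. The assumption is that each block (for example a lane or processing batch) should receive one sample from every treatment, with samples shuffled within treatments and order shuffled within blocks.

```python
import random

def blocked_assignment(samples_by_treatment, seed=0):
    """Build blocks that each contain one randomly chosen sample per treatment.

    samples_by_treatment: dict mapping treatment label -> list of sample names.
    Returns a list of blocks; each block is a list of (treatment, sample) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    shuffled = {}
    for treatment, samples in samples_by_treatment.items():
        order = list(samples)
        rng.shuffle(order)     # randomize which replicate lands in which block
        shuffled[treatment] = order

    # The number of complete blocks is limited by the least replicated treatment.
    n_blocks = min(len(order) for order in shuffled.values())
    blocks = []
    for i in range(n_blocks):
        block = [(treatment, shuffled[treatment][i]) for treatment in shuffled]
        rng.shuffle(block)     # randomize processing order within the block
        blocks.append(block)
    return blocks

# Hypothetical sample names, for illustration only.
design = {
    "control": ["c1", "c2", "c3"],
    "RNAi": ["r1", "r2", "r3"],
}
for number, block in enumerate(blocked_assignment(design), start=1):
    print(number, block)
```

Because every block holds both treatments, a batch effect shared by a block affects the treatments equally rather than being confounded with the comparison of interest.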
[Figure: Cumulative proportion of reads occurring 0, 1, . . . times. x-axis: number of occurrences of each read (log 10); y-axis: cumulative proportion of reads.]
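The library-complexity summary in this figure can be sketched as follows. This is a minimal illustration, assuming reads are available as plain sequence strings; the function name `cumulative_read_proportions` and the toy reads are hypothetical. For each occurrence count k, it reports the proportion of all reads whose sequence appears at most k times.

```python
from collections import Counter

def cumulative_read_proportions(reads):
    """Return [(k, proportion of reads whose sequence occurs <= k times)]."""
    copies = Counter(reads)            # sequence -> number of copies
    reads_by_k = Counter()             # k -> number of reads with exactly k copies
    for n in copies.values():
        reads_by_k[n] += n             # a sequence seen n times contributes n reads
    total = len(reads)
    result, running = [], 0
    for k in sorted(reads_by_k):
        running += reads_by_k[k]
        result.append((k, running / total))
    return result

# Toy input, for illustration only.
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCC", "GGCC"]
for k, proportion in cumulative_read_proportions(reads):
    print(k, round(proportion, 3))
```

A curve that rises mostly at large k means a few sequences account for many reads, which is one sign of a low-complexity library.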
[Figure: Actual versus uniform φX174 coverage. x-axis: copies per read (log 10); y-axis: cumulative proportion.]
[Figure: Read count increases with gene length.]
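A small worked example shows why read count tracks gene length. It is a sketch under a simplifying assumption that is not from the slides: reads are sampled uniformly along transcripts, so the expected count for a gene is proportional to length times abundance. The function name, gene names, and numbers are hypothetical.

```python
def expected_read_counts(gene_lengths, abundances, total_reads):
    """Expected reads per gene, assuming uniform sampling along transcripts."""
    weights = {g: gene_lengths[g] * abundances[g] for g in gene_lengths}
    total_weight = sum(weights.values())
    return {g: total_reads * w / total_weight for g, w in weights.items()}

# Two hypothetical genes with equal abundance but different lengths (bases).
lengths = {"geneA": 1000, "geneB": 3000}
abundance = {"geneA": 1.0, "geneB": 1.0}

counts = expected_read_counts(lengths, abundance, total_reads=4000)
# Dividing by length in kilobases puts the two genes on a comparable scale.
per_kb = {g: counts[g] / (lengths[g] / 1000) for g in counts}

print(counts)
print(per_kb)
```

Under this assumption the longer gene collects three times as many reads despite equal abundance, so raw counts are not directly comparable between genes of different lengths.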
[Figure: Reads, stratified by cycle, supporting a spurious SNP call in φX174.]
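Stratifying the evidence for a candidate SNP by sequencing cycle can be sketched as below. This is an illustration, not the method of [2]: it assumes that, for one reference position, each overlapping read has already been reduced to a (cycle, base) pair, and the function name and toy observations are hypothetical.

```python
from collections import defaultdict

def alt_fraction_by_cycle(observations, ref_base):
    """Fraction of non-reference base calls at one position, per sequencing cycle.

    observations: iterable of (cycle, base) pairs, one per overlapping read,
    where cycle is the position within the read that covers the site.
    """
    totals = defaultdict(int)
    alts = defaultdict(int)
    for cycle, base in observations:
        totals[cycle] += 1
        if base != ref_base:
            alts[cycle] += 1
    return {cycle: alts[cycle] / totals[cycle] for cycle in sorted(totals)}

# Toy observations, for illustration only.
observations = [(5, "A"), (12, "A"), (20, "A"),
                (34, "G"), (35, "G"), (35, "A"), (36, "G")]
print(alt_fraction_by_cycle(observations, ref_base="A"))
```

If the non-reference calls come almost entirely from a narrow range of cycles, as in this toy input, that pattern may point to sequencing error rather than a real variant.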
Case study Subset of Brooks et al. [3] ◮ RNAi and mRNA-seq to identify pasilla-regulated alternative splicing ◮ Purified polyA, random hexamer primed ◮ Single- and paired-end sequences ◮ Alignment to reference genome and curated splice junctions
[1] P. L. Auer and R. W. Doerge. Statistical design and analysis of RNA sequencing data. Genetics, 185:405–416, Jun 2010.
[2] H. C. Bravo and R. A. Irizarry. Model-based quality assessment and base-calling for second-generation sequencing data. Biometrics, 66:665–674, Sep 2010.
[3] A. N. Brooks, L. Yang, M. O. Duff, K. D. Hansen, J. W. Park, S. Dudoit, S. E. Brenner, and B. R. Graveley. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res., 21:193–202, Feb 2011.
[4] J. H. Malone and B. Oliver. Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol., 9:34, 2011.