
Assessing Differential Gene Expression from RNA-Seq Data (Yanming Di)



  1. Assessing Differential Gene Expression from RNA-Seq Data
     Yanming Di, Department of Statistics, Oregon State University
     June 15, 2011, OSU, Corvallis
     With acknowledgement to Dan Schafer, Jason Cumbie and Jeff Chang
     Y Di (OSU), Assessing DE from RNA-Seq, June 15, 2011, OSU, Corvallis, 1 / 23

  2. RNA-Seq Data: RNA-Seq Workflow
     1. Biological question
     2. Experimental design
     3. RNA sequencing: sample preparation, sequencing
     4. Alignment and assignment to features (genes, exons, etc.)
     5. Exploratory statistical analyses:
        1. Testing for differential gene expression
        2. Regression analysis (to deal with covariates)
        3. Gene enrichment tests and gene ontology analysis
        4. Integration with proteomic and metabolomic data
     6. Experimental verification/biological confirmation

  3. RNA-Seq Data: RNA-Seq Data Structure (two-group comparison)
     Rows are genes, columns are biological replicates (three per group), entries are read counts, and Total is the total count for each sample.

     Group            A      A      A      B      B      B
     Replicate       A1     A2     A3     B1     B2     B3
     AT1G01010       46     64     60     35     77     40
     AT1G01020       43     39     49     43     45     32
     AT1G01030       27     35     20     16     24     26
     AT1G01040       66     25     90     72     43     64
     AT1G01050       67     45     60     49     78     90
     AT1G01060        0     21      8      0     15      2
     AT1G01070        9     20      1     16     34      6
     AT1G01080      127     98    184    170    191    382
     AT1G01090      171    116    453    291    346    563
     ...
     Total         2.1M   1.3M   3.5M   1.9M   1.9M   3.3M
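Because the libraries differ in depth (from about 1.3M to 3.5M counts in this table), raw counts are not directly comparable across samples. Below is a minimal sketch of one simple adjustment, assuming scaling by the Total row (counts per million). This choice and the names `counts`, `totals` and `counts_per_million` are illustrative assumptions, not the method used in the talk. Since the totals are rounded, the results are only approximate.

```python
# Read counts for three genes from the table above (samples A1, A2, A3, B1, B2, B3).
counts = {
    "AT1G01010": [46, 64, 60, 35, 77, 40],
    "AT1G01080": [127, 98, 184, 170, 191, 382],
    "AT1G01090": [171, 116, 453, 291, 346, 563],
}

# Library sizes taken from the (rounded) "Total" row.
totals = [2.1e6, 1.3e6, 3.5e6, 1.9e6, 1.9e6, 3.3e6]


def counts_per_million(row, library_sizes):
    """Hypothetical helper: scale each raw count by its sample's library size."""
    return [c / n * 1e6 for c, n in zip(row, library_sizes)]


cpm = {gene: counts_per_million(row, totals) for gene, row in counts.items()}

for gene, values in cpm.items():
    print(gene, " ".join(f"{v:7.1f}" for v in values))
```

Scaling like this removes differences in sequencing depth, but it does nothing about biological variability between replicates, which is the issue the next two slides raise.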

  4. RNA-Seq Data: Naive Statistical Tests for Differential Gene Expression
     1. Two-sample t-test:
        • Estimate the group means.
        • Estimate the variance.
        • The result is significant if the difference in means is large relative to its estimated standard error.
        Underlying assumptions: 1) the distribution is normal, or 2) the sample size is large.
     2. Test based on the Poisson distribution for count data.
        Assumption: 3) the variance equals the mean.
     None of 1), 2) or 3) is appropriate for RNA-Seq data with biological replicates.
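To make the contrast between the two naive tests concrete, the sketch below applies both to gene AT1G01090 from the count table. It is a minimal illustration under stated assumptions. The t-test uses raw counts with a pooled variance, so it ignores library size. The Poisson test is run as an exact conditional test: under equal rates, the group-A total, given the combined total, is Binomial with probability equal to group A's share of the (rounded) library sizes. All function names are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Classical two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability at k (log space avoids overflow)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson_exact_test(count_a, count_b, size_a, size_b):
    """Two-sided exact test of equal Poisson rates, conditioning on the total.

    Under H0, count_a | (count_a + count_b = n) ~ Binomial(n, size_a / (size_a + size_b)).
    The p-value sums the probabilities of all outcomes no more likely than the observed one.
    """
    n = count_a + count_b
    p0 = size_a / (size_a + size_b)
    log_obs = binom_logpmf(count_a, n, p0)
    pval = 0.0
    for k in range(n + 1):
        lp = binom_logpmf(k, n, p0)
        if lp <= log_obs + 1e-9:  # small tolerance for floating-point ties
            pval += math.exp(lp)
    return min(1.0, pval)


# Gene AT1G01090: raw counts and rounded library sizes from the table.
a, b = [171, 116, 453], [291, 346, 563]
lib_a, lib_b = [2.1e6, 1.3e6, 3.5e6], [1.9e6, 1.9e6, 3.3e6]

t, df = pooled_t_statistic(a, b)
p_poisson = poisson_exact_test(sum(a), sum(b), sum(lib_a), sum(lib_b))
print(f"t = {t:.3f} on {df} df (two-sided 5% critical value: 2.776)")
print(f"Poisson exact test p-value = {p_poisson:.2e}")
```

For this gene the t statistic (about -1.15 on 4 df) is far from significant, while the Poisson test is overwhelmingly significant. The replicate-to-replicate spread within group A (171, 116, 453 reads) is much larger than Poisson sampling alone would produce, even after allowing for the different library sizes, so a test that assumes variance equals mean overstates the evidence.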

  5. RNA-Seq Data: Challenges
     1. A very large number of genes to be tested (26,222 in the Arabidopsis example):
        • Need to control for multiple testing.
     2. Count variability that cannot be modeled with commonly used probability distributions, such as the binomial and Poisson:
        • A test based on the Poisson model will give too many false positives.
     3. A necessarily small number of biological replicates (2 or 3 in each group) due to resource constraints (cost, labor, ...):
        • The normal approximation (two-sample t-test) will not work.
        • A permutation test will not work (no power).
        • It is difficult to estimate the model parameters for each gene separately.
     4. A wide range of expression levels:
        • The mean-variance relationship changes with the mean expression level.
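The first and third challenges can be checked with a little arithmetic. The sketch below uses hypothetical helper names and standard-library Python. It computes an exact permutation p-value by enumerating every relabeling of a 3-versus-3 experiment. It then applies the Benjamini-Hochberg step-up adjustment as one common way to control the false discovery rate across many genes. The slide does not say which multiple-testing procedure the talk uses, so Benjamini-Hochberg is an assumption here, and the input values are made up for illustration.

```python
from itertools import combinations


def permutation_pvalue(a, b):
    """Exact two-sided permutation test on the absolute difference in group means."""
    pooled = list(a) + list(b)
    n, k = len(pooled), len(a)
    observed = abs(sum(a) / k - sum(b) / (n - k))
    extreme = total = 0
    for idx in combinations(range(n), k):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in idx]
        stat = abs(sum(group_a) / k - sum(group_b) / (n - k))
        total += 1
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical, perfectly separated 3-vs-3 data: the most extreme case possible.
p_best = permutation_pvalue([1, 2, 3], [10, 11, 12])
print(p_best)  # 0.1: only 2 of the 20 relabelings are as extreme

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Even for perfectly separated groups, the permutation p-value cannot fall below 0.1 with three replicates per group. With two per group there are only 6 relabelings, so the floor rises to 1/3. Adjusted p-values are never smaller than the raw ones, so no gene among 26,222 could be declared significant this way. This is the sense in which the permutation test has no power here.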
