Normalization and differential expression II
Katharina Hößel
Statistical Analysis of RNA-Seq Data
May 29th, 2012

Overview
• Differential expression analysis for sequence count data (Anders, Huber 2010)
• Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments (Bullard, Purdom, Hansen, Dudoit 2010)

Background
• RNA-sequencing: reads are mapped to a class (= gene)
• the number of reads in a class is called the ‘read count’
• the read count is linearly related to the abundance of the target transcript
• interest: comparing counts between different biological conditions
→ statistical testing

DESeq - Statistics
• read counts can be approximated by a Poisson distribution
[Figure: Poisson probabilities Pr(X = x) for lambda = 0.5, lambda = 5 and lambda = 10; x-axis: x = 1,...,n; y-axis: probability Pr(X = x)]
• the Poisson model leads to an overdispersion problem: the observed variance of the counts exceeds the variance the model predicts

→ use of negative binomial distribution
[Figure: negative binomial probabilities Pr(K = k) for r = 5 with p = 0.5, p = 1/3 and p = 1/6; x-axis: k = 1,...,n; y-axis: probability Pr(K = k)]

Comparison: Poisson vs. NB

                   Poisson distribution                                negative binomial distribution
parameters         $\lambda$                                           $r$, $p$
distr. function    $\Pr(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}$    $\Pr(K = k) = \binom{k + r - 1}{r - 1} p^r (1 - p)^k$
expectation        $E(X) = \lambda$                                    $E(K) = \frac{r(1 - p)}{p}$
variance           $\mathrm{var}(X) = \lambda$                         $\mathrm{var}(K) = \frac{r(1 - p)}{p^2}$

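As a quick numerical check of the comparison above, the following sketch evaluates both probability functions and compares the mean and variance obtained by direct summation with the closed forms. The function names and the truncation point `k_max` are illustrative choices, not part of the slides.

```python
import math

def poisson_pmf(x, lam):
    """Pr(X = x) for a Poisson distribution, computed in log space to avoid overflow."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def nb_pmf(k, r, p):
    """Pr(K = k) for a negative binomial with integer r and probability p."""
    return math.comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

def mean_and_variance(pmf, k_max=200):
    """Approximate mean and variance by summing the pmf over 0..k_max (truncated sum)."""
    probs = [pmf(k) for k in range(k_max + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var

lam = 5.0          # Poisson: mean and variance should both equal lam
r, p = 5, 1 / 3    # NB: mean r(1-p)/p, variance r(1-p)/p^2

pois_mean, pois_var = mean_and_variance(lambda x: poisson_pmf(x, lam))
nb_mean, nb_var = mean_and_variance(lambda k: nb_pmf(k, r, p))

print(pois_mean, pois_var)
print(nb_mean, nb_var)
```

For the negative binomial the variance is larger than the mean by the factor 1/p, which is the extra freedom the Poisson distribution lacks.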
DESeq - Model I
distribution
$$K_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma^2_{ij}) \qquad (1)$$
$i$ – genes, $j$ – samples, $K$ – read counts
expectation value
$$\mu_{ij} = q_{i,\rho(j)} \cdot s_j \qquad (2)$$
$q_{i,\rho(j)}$ – expected read count (per gene and condition)
$s_j$ – scaling factor across genes and groups (depends on the sampling depth, i.e. the coverage, of sample $j$)
→ normalization and adjusting for coverage

DESeq - Model II
variance
$$\sigma^2_{ij} = \underbrace{\mu_{ij}}_{\text{shot noise}} + \underbrace{s_j^2 \, v_{i,\rho(j)}}_{\text{raw variance}} \qquad (3)$$
$s_j$ – size factor, $v_{i,\rho(j)}$ – raw variance parameter
The per-gene raw variance parameter $v_{i,\rho(j)}$ is assumed to be a smooth function of $q_{i,\rho(j)}$:
$$v_{i,\rho(j)} = v_\rho(q_{i,\rho(j)}) \qquad (4)$$
→ allows pooling of data from genes with similar expression strength

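A minimal sketch of equations (2) and (3): given an expected read count q, a size factor s and a raw variance parameter v, it computes the mean and variance of a count, and converts them to the (r, p) parametrization with E(K) = r(1 - p)/p and var(K) = r(1 - p)/p^2. The function names and the example values are hypothetical.

```python
def nb_moments(q, s, v):
    """Mean and variance of a count under the model: mu = q*s, sigma^2 = mu + s^2 * v."""
    mu = q * s
    sigma2 = mu + s ** 2 * v   # shot noise + raw variance
    return mu, sigma2

def nb_r_p(mu, sigma2):
    """Convert (mean, variance) to (r, p); only defined when variance > mean."""
    if sigma2 <= mu:
        raise ValueError("negative binomial requires variance > mean")
    p = mu / sigma2
    r = mu ** 2 / (sigma2 - mu)
    return r, p

# Hypothetical example values, chosen only for illustration.
mu, sigma2 = nb_moments(q=100.0, s=1.5, v=400.0)
r, p = nb_r_p(mu, sigma2)
print(mu, sigma2, r, p)
```

With v = 0 the variance collapses to the shot-noise term, i.e. the Poisson case, and the conversion to (r, p) is no longer defined.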
DESeq - Parameter reduction
example:
• n = 10,000 genes
• m = 20 samples
• G = 2 groups of 10 samples each
The number of parameters for the model fit is reduced in two steps:
1 mean
2 variance

parameters needed for ...   mean                 variance            total
naive NB                    n · m = 200,000      n · m = 200,000     400,000
after step 1                n · G + m = 20,020   n · m = 200,000     220,020
after step 2                n · G + m = 20,020   n · G = 20,000      40,020

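The arithmetic of the parameter reduction example can be reproduced in a few lines; the variable names are illustrative.

```python
n, m, G = 10_000, 20, 2   # genes, samples, groups

# Parameter counts for the mean and the variance at each stage.
naive = {"mean": n * m, "variance": n * m}
step1 = {"mean": n * G + m, "variance": n * m}
step2 = {"mean": n * G + m, "variance": n * G}

totals = [sum(stage.values()) for stage in (naive, step1, step2)]
print(totals)
```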
DESeq - Fitting I
size factors
$$\hat{s}_j = \operatorname*{median}_i \frac{k_{ij}}{\left(\prod_{v=1}^{m} k_{iv}\right)^{1/m}} \qquad (5)$$
empirical expectation values (common scale)
$$\hat{q}_{i\rho} = \frac{1}{m_\rho} \sum_{j:\rho(j)=\rho} \frac{k_{ij}}{\hat{s}_j} \qquad (6)$$

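Equations (5) and (6) can be sketched in plain Python as follows. This is an illustration, not the package's implementation: the function names are hypothetical, and skipping genes with a zero count (whose geometric mean would be zero) is an assumption made here to keep the ratios defined.

```python
import math
import statistics

def size_factors(counts):
    """counts[i][j] is the read count of gene i in sample j; returns one factor per sample."""
    n_samples = len(counts[0])
    # Assumption: genes with any zero count are skipped (their geometric mean is zero).
    usable = [row for row in counts if all(k > 0 for k in row)]
    # Geometric mean of each usable gene across all samples, the denominator of eq. (5).
    geo_means = [math.exp(sum(math.log(k) for k in row) / n_samples) for row in usable]
    return [
        statistics.median(row[j] / g for row, g in zip(usable, geo_means))
        for j in range(n_samples)
    ]

def common_scale_mean(row, s_hat, sample_idx):
    """Eq. (6): average of the size-factor-scaled counts over the samples of one condition."""
    return sum(row[j] / s_hat[j] for j in sample_idx) / len(sample_idx)

# Toy data: sample 1 is sequenced exactly twice as deeply as sample 0.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
s_hat = size_factors(counts)
q_hat_gene0 = common_scale_mean(counts[0], s_hat, sample_idx=[0, 1])
print(s_hat, q_hat_gene0)
```

Because the median is taken over genes, a few strongly differentially expressed genes have little influence on the size factors.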
DESeq - Fitting II
sample variances (common scale)
$$w_{i\rho} = \frac{1}{m_\rho - 1} \sum_{j:\rho(j)=\rho} \left(\frac{k_{ij}}{\hat{s}_j} - \hat{q}_{i\rho}\right)^2 \qquad (7)$$
With the definition
$$z_{i\rho} = \frac{\hat{q}_{i\rho}}{m_\rho} \sum_{j:\rho(j)=\rho} \frac{1}{\hat{s}_j} \qquad (8)$$
$w_{i\rho} - z_{i\rho}$ is an unbiased estimator of $v_{i\rho}$.
local regression ⇒
$$\hat{v}_\rho(\hat{q}_{i\rho}) = w_\rho(\hat{q}_{i\rho}) - z_{i\rho} \qquad (9)$$

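For a single gene in a single condition, equations (7) and (8) reduce to a few lines. The sketch below computes the per-gene quantities only; the local regression of equation (9), which smooths w over genes of similar expression strength, is not shown. The function name is hypothetical.

```python
def raw_variance_estimate(counts, s_hat):
    """counts: replicate counts of one gene in one condition; s_hat: their size factors.

    Returns (q_hat, w, z, w - z), where w - z is the per-gene estimate of the raw variance.
    """
    m = len(counts)
    if m < 2:
        raise ValueError("at least two replicates are needed for a sample variance")
    scaled = [k / s for k, s in zip(counts, s_hat)]        # counts on the common scale
    q_hat = sum(scaled) / m                                # mean on the common scale
    w = sum((x - q_hat) ** 2 for x in scaled) / (m - 1)    # eq. (7)
    z = q_hat / m * sum(1 / s for s in s_hat)              # eq. (8), shot-noise correction
    return q_hat, w, z, w - z

q_hat, w, z, v_hat = raw_variance_estimate([10, 20], [1.0, 1.0])
print(q_hat, w, z, v_hat)
```

For individual genes w - z can be negative, which is one reason to pool information across genes by regression rather than use the per-gene value directly.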
DESeq - Testing I
We have two biological conditions, A and B.
null hypothesis: the expression strength is identical in A and B, $q_{iA} = q_{iB}$
test statistic: total read counts for each condition, $K_{iA}$ and $K_{iB}$
sum: $K_{iS} = K_{iA} + K_{iB}$
$$p(a, b) = \Pr(K_{iA} = a) \Pr(K_{iB} = b)$$
nbinomTest performs a test analogous to Fisher's exact test, adapted to negative binomial data
p value
$$p_i = \frac{\sum_{\substack{a + b = k_{iS} \\ p(a,b) \le p(k_{iA}, k_{iB})}} p(a, b)}{\sum_{a + b = k_{iS}} p(a, b)} \qquad (10)$$

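Equation (10) can be sketched directly: enumerate all splits (a, b) of the observed total and sum the probabilities of those that are no more likely than the observed pair. This is an illustration under assumptions, not the code of nbinomTest: the means and variances of the two condition totals under the null hypothesis are taken as given inputs, and the function names are hypothetical.

```python
import math

def nb_pmf(k, mu, sigma2):
    """Negative binomial probability of count k, parametrized by mean and variance."""
    p = mu / sigma2
    r = mu ** 2 / (sigma2 - mu)   # may be non-integer, hence the gamma functions
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def nb_exact_pvalue(k_a, k_b, mu_a, var_a, mu_b, var_b):
    """P value of eq. (10), conditioning on the observed total k_a + k_b."""
    k_s = k_a + k_b
    p_obs = nb_pmf(k_a, mu_a, var_a) * nb_pmf(k_b, mu_b, var_b)
    total = 0.0
    as_or_less_likely = 0.0
    for a in range(k_s + 1):
        p_ab = nb_pmf(a, mu_a, var_a) * nb_pmf(k_s - a, mu_b, var_b)
        total += p_ab
        if p_ab <= p_obs:
            as_or_less_likely += p_ab
    return as_or_less_likely / total

# Hypothetical inputs: both conditions have mean 10 and variance 20 under the null.
p_balanced = nb_exact_pvalue(10, 10, 10.0, 20.0, 10.0, 20.0)
p_extreme = nb_exact_pvalue(0, 20, 10.0, 20.0, 10.0, 20.0)
print(p_balanced, p_extreme)
```

A perfectly balanced observation is the most likely split, so every split counts as "at least as extreme" and the p value is 1; a completely one-sided split yields a small p value.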
DESeq - Applications I (Fly embryos)
[Figure legend]
• orange: variance estimate by DESeq (fit w(q))
• dotted orange: variance estimate by edgeR
• purple: variance via Poisson distribution

DESeq - Applications II
Testing for differential expression between conditions A and B:
Scatter plot of log2 ratio (fold change) versus mean.

DESeq - Conclusions
• using parametric methods (e.g., tests)
• sharing information between genes
• Poisson distribution is adequate for modelling read counts within technical replicates (small dispersion)
→ using NB for biological replicates

DESeq - R/Bioconductor package
• available via Bioconductor
• current version 1.9.7 as of 2012/05/25 (example computations in the paper were done in 1.1.12)
• huge changelog: bugfixes, addition/removal/renaming of functions, adding/removing/extending functionality, new methods etc.
• handling of variance
• variance stabilization
• testing procedure
• diagnostic plots
→ this software is evolving!

Overview
• Differential expression analysis for sequence count data (Anders, Huber 2010)
• Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments (Bullard, Purdom, Hansen, Dudoit 2010)

Evaluation of statistical methods ... - Motivation
• Microarrays vs. RNA-Seq
• different statistical tests
• different approaches of normalization
• calibration
• assess biases based on sequencing technology
  • length biases
  • flow cell effects
  • library preparation effects

Evaluation - Methods
• 2 biological samples: brain vs. universal human reference (UHR)
• performing Microarray, RNA-Seq analysis and qRT-PCR on ∼ 1000 genes
• compare expression values obtained from Microarray and RNA-Seq experiments using qRT-PCR as benchmark
• nested RNA-Seq setup

Evaluation - Normalization
global vs. quantile-based methods
1 total lane counts (RNA-Seq standard)
2 per-lane counts for “housekeeping gene” POLR2A (borrowed from qRT-PCR)
3 per-lane quantile for genes with reads in at least 1 lane (borrowed from Microarrays)

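Methods 1 and 3 above can be sketched as per-lane scaling factors. The slide does not say which quantile is used; the default of the upper quartile (0.75) below is an assumption, as are the function names and the linear interpolation between order statistics.

```python
def total_count_factors(counts):
    """counts[g][l] is the read count of gene g in lane l; returns total reads per lane."""
    n_lanes = len(counts[0])
    return [sum(row[l] for row in counts) for l in range(n_lanes)]

def quantile_factors(counts, q=0.75):
    """Per-lane quantile of counts, restricted to genes with reads in at least one lane.

    q = 0.75 (upper quartile) is an assumed default, not taken from the slide.
    """
    n_lanes = len(counts[0])
    expressed = [row for row in counts if any(c > 0 for c in row)]
    factors = []
    for l in range(n_lanes):
        col = sorted(row[l] for row in expressed)
        pos = q * (len(col) - 1)          # fractional index into the sorted counts
        lo = int(pos)
        hi = min(lo + 1, len(col) - 1)
        factors.append(col[lo] + (pos - lo) * (col[hi] - col[lo]))
    return factors

# Toy data: lane 1 has twice the depth of lane 0; the first gene has no reads at all.
counts = [[0, 0], [1, 2], [2, 4], [3, 6], [4, 8], [5, 10]]
totals = total_count_factors(counts)
quartiles = quantile_factors(counts)
print(totals, quartiles)
```

Dividing each lane's counts by its factor puts the lanes on a common scale; a quantile is less sensitive than the total to a handful of very highly expressed genes.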
Evaluation - Differential Expression
generalized linear model (GLM)
$$\log(E(X_{ij} \mid d_i)) = \underbrace{\log d_i}_{\text{offset}} + \underbrace{\lambda_{a(i),j}}_{\text{expression level}} + \underbrace{\theta_{ij}}_{\text{technical effects}}$$
tests
• Fisher's exact test
• likelihood ratio test (GLM based)
• t-test (GLM based + delta)

Evaluation results - ROC curves
a) no filtering
b) removing all genes with < 20 reads in either condition

Evaluation results - influence of gene length
ranks of DE statistics vs. gene lengths
a) no weighting
b) weighting by $1/\sqrt{\text{length}}$

Evaluation results - calibration method

Evaluation results - biological and technical effects

Evaluation results - ROC curves
RNA-Seq vs. Microarrays