RNAseq: Normalization and differential expression I (Jens Gietzelt)

  1. RNAseq: Normalization and differential expression I
     Jens Gietzelt, 22.05.2012
     - Robinson, Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology. 2010
     - Hardcastle, Kelly. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010

  2. Outline of the presentation
     1 Introduction
     2 Pairwise calibration (edgeR)
     3 Differential expression

  3. Introduction
     - normalization: expression levels of different genes within a sample are directly comparable (same scale); however, technical effects introduce a bias in the comparison between samples
       ⇒ normalization is crucial before testing for differential expression
     - the calibration method of edgeR takes advantage of within-sample comparability
     - differential expression: use an appropriate distribution for count data and incorporate the calibration parameters

  4. Framework
     - $Y_{g,k}$: observed count for gene $g$ in library $k$
     - $N_k = \sum_{g=1}^{G} Y_{g,k}$: total number of reads for library $k$
     - $\eta_{g,k}$: number of transcripts of gene $g$ in library $k$
     - $L_g$: length of gene $g$
     - $S_k = \sum_{g=1}^{G} \eta_{g,k} L_g$: total RNA output of sample $k$

     $$E(Y_{g,k}) = \frac{\eta_{g,k} L_g}{S_k} N_k$$

     - counts are a linear function of the number of transcripts
     - library size calibration ($Y_{g,k} / N_k$) is appropriate for the comparison of replicates
     - comparison of biologically different samples may be biased by varying RNA composition
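
The composition bias named on this slide can be illustrated with a minimal sketch of the expectation formula. The gene lengths, transcript numbers and the helper name `expected_counts` are hypothetical and chosen only for illustration; they are not data from the presentation.

```python
def expected_counts(eta, lengths, n_reads):
    """Expected counts E(Y_g) = eta_g * L_g * N / S, with S = sum_g eta_g * L_g."""
    s = sum(e * l for e, l in zip(eta, lengths))  # total RNA output S
    return [e * l * n_reads / s for e, l in zip(eta, lengths)]


# Hypothetical toy data: four genes of equal length, same sequencing depth.
lengths = [1000, 1000, 1000, 1000]
eta_a = [10, 10, 10, 10]  # sample A
eta_b = [10, 10, 10, 70]  # sample B: only gene 4 is truly up-regulated
n_reads = 1_000_000

y_a = expected_counts(eta_a, lengths, n_reads)
y_b = expected_counts(eta_b, lengths, n_reads)

# Library-size calibration Y / N, then the B-to-A ratio for every gene.
ratios = [(b / n_reads) / (a / n_reads) for a, b in zip(y_a, y_b)]

# Ratio of the total RNA outputs, S_A / S_B.
s_a = sum(e * l for e, l in zip(eta_a, lengths))
s_b = sum(e * l for e, l in zip(eta_b, lengths))
composition_ratio = s_a / s_b
```

In this toy example the three unchanged genes all show a ratio of $S_A / S_B = 0.4$ after library-size calibration, although their transcript numbers are identical in both samples.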

  5. kidney vs. liver dataset

  6. Trimmed mean of log-foldchange (TMM)
     - RNA production $S_k$ of one sample cannot be determined directly
     - estimation of relative differences of RNA production $f_k = S_k / S_r$ of a pair of samples $(k, r)$
     - assumption: most genes are not differentially expressed
       ⇒ compute a robust mean over the log-foldchanges:
       - double filtering over both mean and difference of log-values
       - calculate a weighted mean over the log-foldchanges of the remaining genes $G^*$
       ⇒ rescale factors $f_k = \mathrm{TMM}(k, r)$, where $r$ is the reference sample

     $$\log_2 \mathrm{TMM}(k, r) = \frac{\sum_{g \in G^*} w_{g,(k,r)} \left( \log_2 (Y_{g,k} / N_k) - \log_2 (Y_{g,r} / N_r) \right)}{\sum_{g \in G^*} w_{g,(k,r)}}$$

     $$w_{g,(k,r)} = \left( \frac{1}{Y_{g,k}} - \frac{1}{N_k} + \frac{1}{Y_{g,r}} - \frac{1}{N_r} \right)^{-1}$$
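
The TMM computation on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch, not the edgeR implementation: the function name `tmm_factor`, the trimming fractions (30% of the log-foldchanges, 5% of the mean log-values per tail), the handling of ties by sort order and the example counts are all assumptions made here. It also assumes that more than one gene has non-zero counts.

```python
import math


def tmm_factor(y_k, y_r, trim_m=0.3, trim_a=0.05):
    """Weighted trimmed mean of log-foldchanges of sample k against reference r."""
    n_k, n_r = sum(y_k), sum(y_r)
    m_vals, a_vals, weights = [], [], []
    for yk, yr in zip(y_k, y_r):
        if yk == 0 or yr == 0:
            continue  # log-ratios are undefined for zero counts
        log_k = math.log2(yk / n_k)
        log_r = math.log2(yr / n_r)
        m_vals.append(log_k - log_r)          # difference of log-values
        a_vals.append(0.5 * (log_k + log_r))  # mean of log-values
        weights.append(1.0 / (1.0 / yk - 1.0 / n_k + 1.0 / yr - 1.0 / n_r))

    n = len(m_vals)

    def central(values, trim):
        """Indices that survive trimming a fraction `trim` from each tail."""
        order = sorted(range(n), key=lambda i: values[i])
        cut = math.floor(n * trim)
        return set(order[cut:n - cut])

    # Double filtering: a gene must survive both trims to enter G*.
    keep = central(m_vals, trim_m) & central(a_vals, trim_a)
    if not keep:
        raise ValueError("no genes left after trimming")
    num = sum(weights[i] * m_vals[i] for i in keep)
    den = sum(weights[i] for i in keep)
    return 2.0 ** (num / den)


# Hypothetical counts: nine unchanged genes and one strongly up-regulated gene.
y_r = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
y_k = y_r[:-1] + [10000]
f_k = tmm_factor(y_k, y_r)
```

In this example the up-regulated gene is removed by the trimming, so the factor is determined by the unchanged genes alone and equals $N_r / N_k = 5500 / 14500$.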

  7. kidney vs. liver dataset

  8. Simulation: pair of samples (simulated data sampled from a Poisson distribution)

  9. Simulation: replicates (Cloonan: log-transformation and quantile normalization)

  10. Differential expression
      methods in use:
      - DEGseq (normal distribution)
      - edgeR (negative binomial)
      - DESeq (negative binomial, multiple groups)
      - baySeq (negative binomial, multiple groups)
      - Myrna (permutation based)

  11. edgeR
      - technical replicates: Poisson distribution
      - biologically different samples: negative binomial distribution, $Y \sim \mathrm{NB}(p, m)$
        - $Y$: number of successes in a sequence of Bernoulli trials with success probability $p$ before $m$ failures occur
      - alternative parametrization by mean and dispersion:
        - $q_{g,e}$: proportion of sequenced RNA of gene $g$ for experimental group $e$
        - $Y_{g,k,e} \sim \mathrm{NB}(q_{g,e} N_k f_k, \phi_g)$
        - $E(Y_{g,k,e}) = \mu_{g,k,e} = q_{g,e} N_k f_k$, $\mathrm{Var}(Y_{g,k,e}) = \mu_{g,k,e} + \mu_{g,k,e}^2 \phi_g$
      - test if $q_{g,1}$ is significantly different from $q_{g,2}$
      - dispersions $\phi_g$ are moderated towards a common dispersion
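
The mean-variance relation of the negative binomial in the mean/dispersion parametrization can be checked numerically. The sketch below assumes the standard conversion to a shape parameter $1/\phi$; the function name `nb_pmf` and the values of `mu` and `phi` are hypothetical.

```python
import math


def nb_pmf(y, mu, phi):
    """P(Y = y) for a negative binomial with mean mu and dispersion phi > 0."""
    size = 1.0 / phi  # shape parameter 1 / phi
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


mu, phi = 10.0, 0.5
support = range(2000)  # truncation point; the tail beyond it is negligible here

mean = sum(y * nb_pmf(y, mu, phi) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, phi) for y in support)

# Variance predicted by Var(Y) = mu + mu^2 * phi.
expected_var = mu + mu ** 2 * phi
```

For these values the summed variance agrees with $\mu + \mu^2 \phi = 60$, six times the variance of a Poisson distribution with the same mean.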

  12. baySeq I
      - empirical Bayes approach to detect differential expression
      - $D_g = \{ Y_{g,k}, N_k, f_k \}_{k = 1, \ldots, K}$
      - $M$: user specified model
      - $\theta_M$: vector of parameters of model $M$

      $$P(M \mid D_g) = \frac{P(D_g \mid M) \, P(M)}{P(D_g)}$$

      - calculate the marginal likelihood:

      $$P(D_g \mid M) = \int P(D_g \mid \theta_M, M) \, P(\theta_M \mid M) \, d\theta_M$$
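
Once the marginal likelihoods are available, the posterior model probabilities follow from Bayes' rule. The sketch below assumes a finite set of candidate models, so that $P(D_g)$ is the sum of $P(D_g \mid M) P(M)$ over these models; the function name and the numbers are hypothetical, and this is not the baySeq API.

```python
import math


def posterior_model_probs(log_marginals, priors):
    """P(M | D_g) for each model from log P(D_g | M) and the prior P(M)."""
    log_joint = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    shift = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lj - shift) for lj in log_joint]
    total = sum(unnorm)     # proportional to the evidence P(D_g)
    return [u / total for u in unnorm]


# Hypothetical log marginal likelihoods of one gene under two models:
# index 0 = no differential expression, index 1 = differential expression.
log_marginals = [-105.0, -100.0]
priors = [0.9, 0.1]
post = posterior_model_probs(log_marginals, priors)
```

Working with log-likelihoods avoids numerical underflow, since likelihoods of count data over many libraries are typically very small.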

  13. baySeq II

      $$P(D_g \mid M) = \int P(D_g \mid \theta_M, M) \, P(\theta_M \mid M) \, d\theta_M$$

      - the integral has a closed form under conjugacy (e.g. Poisson-Gamma); however, no such conjugacy exists for negative binomial data
        ⇒ define an empirical distribution on $\theta_M$ and estimate the marginal likelihood numerically
      - the prior $P(M)$ is estimated by iteration: start with $P(M) = p$, compute $p^*_g = P(M \mid D_g)$ for every gene $g$, update $p$ from the $p^*_g$ and repeat
      - baySeq:
        - applicable to complex experimental designs
        - computationally intensive
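
The idea of estimating the marginal likelihood numerically can be illustrated on the Poisson-Gamma case, where the closed form is available for comparison. This is a sketch under stated assumptions and not the baySeq procedure: baySeq works with negative binomial data and an empirical distribution derived from the data, whereas here the sample of $\theta$ values is drawn from a known Gamma prior. All function names and parameter values are hypothetical.

```python
import math
import random


def poisson_pmf(y, theta):
    """P(Y = y) for a Poisson distribution with rate theta > 0."""
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1))


def marginal_closed_form(y, shape, rate):
    """Exact P(y) for y ~ Poisson(theta), theta ~ Gamma(shape, rate)."""
    log_p = (math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
             + shape * math.log(rate / (rate + 1.0))
             - y * math.log(rate + 1.0))
    return math.exp(log_p)


def marginal_numeric(y, thetas):
    """Average the likelihood over a sample of theta values."""
    return sum(poisson_pmf(y, t) for t in thetas) / len(thetas)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
shape, rate = 2.0, 0.2
# random.gammavariate takes a scale parameter, hence 1 / rate.
thetas = [rng.gammavariate(shape, 1.0 / rate) for _ in range(20000)]

exact = marginal_closed_form(8, shape, rate)
approx = marginal_numeric(8, thetas)
```

The numerical estimate only needs the likelihood and a sample of parameter values, which is why the same averaging remains usable when no conjugate prior exists.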
