
Statistical Models for sequencing data: from Experimental Design to Generalized Linear Models



  1. Best practices in the analysis of RNA-Seq data. 28th-29th March 2018, University of Cambridge, Cambridge, UK. Statistical Models for sequencing data: from Experimental Design to Generalized Linear Models. Oscar M. Rueda, Breast Cancer Functional Genomics Group, CRUK Cambridge Research Institute (a.k.a. Li Ka Shing Centre). Oscar.Rueda@cruk.cam.ac.uk

  2. Outline
     • Experimental Design
     • Design and Contrast matrices
     • Generalized linear models
     • Models for counting data

  3. To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of. Sir Ronald Fisher (1890-1962) [evolutionary biologist, geneticist and statistician]

  4. An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem. John Tukey (1915-2000) [Statistician]

  5. An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts - for support rather than for illumination. Andrew Lang (1844-1912) [Poet, novelist and literary critic]

  6. Experimental Design

  7. Design of an experiment
     • Select biological questions of interest
     • Identify an appropriate measure to answer that question
     • Select additional variables or factors that can have an influence on the result of the experiment
     • Select a sample size and the sample units
     • Assign samples to lanes/flow cells.

  8. Principles of Statistical Design of Experiments
     • R. A. Fisher:
       – Replication
       – Blocking
       – Randomization.
     • They have been used in microarray studies from the beginning.
     • Bar coding makes it easy to adapt them to NGS studies.

  9. Unreplicated Data. Inferences at the RNA and fragment level can be obtained through Fisher's test, but they do not reflect biological variability. Auer and Doerge. Genetics 185:405-416 (2010)
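
A minimal sketch of such a test for a single gene, assuming the slide refers to Fisher's exact test on a 2x2 table (reads from the gene versus all other reads, in each of two unreplicated samples). The counts and the helper name `fisher_exact_two_sided` are made up for illustration; the p-value only reflects sampling variability within the two libraries, not biological variability.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more probable than the observed one
    # (small relative tolerance so floating-point ties are counted).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: reads mapped to the gene vs. all other reads.
gene_a, other_a = 12, 988   # sample under Treatment A
gene_b, other_b = 3, 997    # control sample
p_value = fisher_exact_two_sided(gene_a, other_a, gene_b, other_b)
print(f"two-sided p-value: {p_value:.4f}")
```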

  10. Replicated Data. Inferences for the treatment effect can be obtained using generalized linear models. Is this a good design? (More on this later.) We should randomize within block! Auer and Doerge. Genetics 185:405-416 (2010)
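
One way to randomize within block can be sketched as follows. This assumes each block (for example a flow cell or a library preparation batch) holds one sample of every treatment, and that the lane order is shuffled independently in each block. The function name, the block labels and the treatment names are hypothetical.

```python
import random

def randomize_within_blocks(treatments, n_blocks, seed=0):
    """Return {block: lane order}, with every treatment exactly once per block
    and the order shuffled independently within each block."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        layout[f"block{block}"] = order
    return layout

treatments = ["Control", "Treatment A", "Treatment B"]
layout = randomize_within_blocks(treatments, n_blocks=4)
for block, lanes in layout.items():
    print(block, lanes)
```

Because every block contains every treatment, block effects are not confounded with the treatment effect, and the shuffle guards against systematic position effects within a block.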

  11. Balanced Block Designs
      • Avoid confounding effects:
        – Lane effects (any errors from the point where the sample is input to the flow cell until the data output). Examples: systematically bad sequencing cycles, errors in base calling…
        – Batch effects (any errors after random fragmentation of the RNA until it is input to the flow cell). Examples: PCR amplification, reverse transcription artifacts…
        – Other effects not related to treatment.
      Auer and Doerge. Genetics 185:405-416 (2010)

  12. Balanced blocks by multiplexing. Auer and Doerge. Genetics 185:405-416 (2010)

  13. Benefits of a proper design
      • NGS benefits from design principles
      • Technical replicates cannot replace biological replicates
      • It is possible to avoid multiplexing with enough biological replicates and sequencing lanes
      • The advantages of multiplexing are bigger than the disadvantages (cost, loss of sequencing depth, bar-code bias…)

  14. Design and contrast matrices

  15. Statistical models
      – We want to model the expected result of an outcome (dependent variable) under given values of other variables (independent variables):
        $$E(Y) = f(X)$$
        where $E(Y)$ is the expected value of variable $Y$, $f$ is an arbitrary function (any shape) and $X$ is a set of $k$ independent variables (also called factors).
      – Individual observations follow
        $$Y = f(X) + \varepsilon$$
        where $\varepsilon$ is the variability around the expected mean of $Y$.

  16. Design matrix
      – Represents the independent variables that have an influence on the response variable, but also the way we have coded the information and the design of the experiment.
      – For now, let's restrict ourselves to models of the form
        $$Y = X\beta + \varepsilon$$
        where $Y$ is the response variable, $X$ is the design matrix, $\beta$ is the parameter vector and $\varepsilon$ is the stochastic error.
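
For a linear model of this kind, the parameters are usually estimated by ordinary least squares. The slide does not state the estimator, so the following is the standard textbook result, written with the design matrix $X$ ($n \times k$) multiplying the parameter vector $\beta$ ($k \times 1$) and assuming $X^{\top}X$ is invertible:

$$\hat{\beta} = \arg\min_{\beta}\, \lVert Y - X\beta \rVert^{2} = (X^{\top}X)^{-1}X^{\top}Y$$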

  17. Types of designs considered
      • Models with 1 factor
        – Models with two treatments
        – Models with several treatments
      • Models with 2 factors
        – Interactions
      • Paired designs
      • Models with categorical and continuous factors
      • Time course experiments
      • Multifactorial models.

  18. Strategy
      • Define our set of samples
      • Define the factors, type of factors (continuous, categorical), number of levels…
      • Define the set of parameters: the effects we want to estimate
      • Build the design matrix, which relates each sample to the parameters it carries information about.
      • Estimate the parameters of the model: testing
      • Further estimation (and testing): contrast matrices.

  19. Models with 1 factor, 2 levels

      | Sample   | Treatment   |
      |----------|-------------|
      | Sample 1 | Treatment A |
      | Sample 2 | Control     |
      | Sample 3 | Treatment A |
      | Sample 4 | Control     |
      | Sample 5 | Treatment A |
      | Sample 6 | Control     |

      Number of samples: 6. Number of factors: 1 (Treatment). Number of levels: 2.
      Possible parameters (what differences are important?):
      - Effect of Treatment A
      - Effect of Control

  20. Design matrix for models with 1 factor, 2 levels. Samples 1, 3 and 5 receive Treatment A; Samples 2, 4 and 6 are Control. The columns of the design matrix correspond to Treat. A and Control, and the parameters (coefficients) are the levels of the variable:

      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} T \\ C \end{pmatrix}$$

      $C$ is the mean expression of the control and $T$ is the mean expression of the treatment. This model is equivalent to a t-test.
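
The construction above can be sketched in a few lines. The expression values are made up, and `design_matrix` and `cell_means_fit` are hypothetical helper names. The sketch relies on the fact that, for a design with one indicator column per level, the least squares estimate of each coefficient is the mean of the samples in that group.

```python
def design_matrix(labels, levels):
    """One indicator column per level: 1 if the sample has that level, else 0."""
    return [[1 if lab == lev else 0 for lev in levels] for lab in labels]

def cell_means_fit(X, y):
    """Least squares for an indicator design. X'X is diagonal here, so each
    coefficient is (X'y)_j / (X'X)_jj, i.e. the mean of the flagged samples."""
    estimates = []
    for j in range(len(X[0])):
        total = sum(row[j] * yi for row, yi in zip(X, y))  # (X'y)_j
        count = sum(row[j] for row in X)                   # (X'X)_jj
        estimates.append(total / count)
    return estimates

treatment = ["Treatment A", "Control", "Treatment A",
             "Control", "Treatment A", "Control"]
expression = [8.1, 5.0, 7.9, 5.2, 8.3, 4.9]  # hypothetical expression values

X = design_matrix(treatment, ["Treatment A", "Control"])
T_hat, C_hat = cell_means_fit(X, expression)
for row in X:
    print(row)
print("T estimate:", T_hat, "C estimate:", C_hat)
```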

  21. Design matrix for models with 1 factor, 2 levels. Rows are Samples 1 to 6 (Treatment A, Control, Treatment A, Control, Treatment A, Control); columns are Treat. A and Control; the parameters (coefficients) are the levels of the variable:

      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} T \\ C \end{pmatrix}$$

      Equivalent to a t-test.

  22. Intercepts. Different parameterization: using intercept. Samples 1, 3 and 5 receive Treatment A; Samples 2, 4 and 6 are Control. Let's now consider this parameterization:
      - $C$ = Baseline expression
      - $T_A$ = Baseline expression + effect of treatment

      So the set of parameters are:
      - $C$ = Control (mean expression of the control)
      - $a = T_A - C$ (mean change in expression under treatment)

  23. Intercept. Different parameterization: using intercept. The columns of the design matrix correspond to the Intercept and Treatment A; the parameters (coefficients) are $\beta_0$ and $a$:

      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \beta_0 \\ a \end{pmatrix}$$

      The intercept $\beta_0$ measures the baseline expression. $a$ now measures the differential expression between Treatment A and Control.
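
A small sketch of fitting the intercept parameterization by least squares, using made-up expression values. `solve_2x2` and `least_squares_2col` are hypothetical helpers that solve the normal equations $(X^{\top}X)\beta = X^{\top}y$ for a two-column design; they are not taken from the slides.

```python
def solve_2x2(A, b):
    """Solve the 2x2 linear system A x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]

def least_squares_2col(X, y):
    """Solve the normal equations (X'X) beta = X'y for a two-column design."""
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    return solve_2x2(XtX, Xty)

treatment = ["Treatment A", "Control", "Treatment A",
             "Control", "Treatment A", "Control"]
expression = [8.1, 5.0, 7.9, 5.2, 8.3, 4.9]  # hypothetical expression values

# Column 1: intercept (all ones). Column 2: indicator for Treatment A.
X = [[1, 1 if t == "Treatment A" else 0] for t in treatment]
beta0, a = least_squares_2col(X, expression)
print("intercept (control mean):", beta0)
print("a (treatment minus control):", a)
```

With this coding the intercept estimate is the control mean and `a` is the difference between the two group means.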

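  24. Contrast matrices. Are the two parameterizations equivalent?

      $$\begin{pmatrix} 1 & -1 \end{pmatrix} \begin{pmatrix} \hat{T} \\ \hat{C} \end{pmatrix} = \hat{T} - \hat{C}$$

      Here $\begin{pmatrix} 1 & -1 \end{pmatrix}$ is the contrast matrix. Contrast matrices allow us to estimate (and test) linear combinations of our coefficients.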
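
As a sketch of how a contrast is estimated and tested, the code below applies the contrast $(1, -1)$ to the group-mean estimates and divides by its standard error, assuming a common residual variance in both groups. Under that assumption the statistic is the pooled two-sample t statistic, with $n - 2$ degrees of freedom. The data and the function name `contrast_t_statistic` are hypothetical.

```python
from math import sqrt

def contrast_t_statistic(y_treat, y_ctrl, contrast=(1, -1)):
    """Estimate c'beta for the model with one mean per group and return
    (estimate, t statistic)."""
    n_t, n_c = len(y_treat), len(y_ctrl)
    T_hat = sum(y_treat) / n_t
    C_hat = sum(y_ctrl) / n_c
    estimate = contrast[0] * T_hat + contrast[1] * C_hat
    # Residual variance pooled over both groups (n - 2 degrees of freedom).
    rss = (sum((y - T_hat) ** 2 for y in y_treat)
           + sum((y - C_hat) ** 2 for y in y_ctrl))
    sigma2 = rss / (n_t + n_c - 2)
    # Var(c'beta_hat) = sigma^2 * c'(X'X)^{-1}c, with (X'X)^{-1} = diag(1/n_t, 1/n_c).
    se = sqrt(sigma2 * (contrast[0] ** 2 / n_t + contrast[1] ** 2 / n_c))
    return estimate, estimate / se

y_treat = [8.1, 7.9, 8.3]  # hypothetical Treatment A samples
y_ctrl = [5.0, 5.2, 4.9]   # hypothetical Control samples
estimate, t_stat = contrast_t_statistic(y_treat, y_ctrl)
print("T - C estimate:", estimate)
print("t statistic:", t_stat)
```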

  25. Models with 1 factor, more than 2 levels (ANOVA models)

      | Sample   | Treatment   |
      |----------|-------------|
      | Sample 1 | Treatment A |
      | Sample 2 | Treatment B |
      | Sample 3 | Control     |
      | Sample 4 | Treatment A |
      | Sample 5 | Treatment B |
      | Sample 6 | Control     |

      Number of samples: 6. Number of factors: 1 (Treatment). Number of levels: 3.
      Possible parameters (what differences are important?):
      - Effect of Treatment A
      - Effect of Treatment B
      - Effect of Control
      - Differences between treatments?

  26. Design matrix for ANOVA models. Samples 1 and 4 receive Treatment A, Samples 2 and 5 receive Treatment B, and Samples 3 and 6 are Control.

      One parameter per level:

      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} T_A \\ T_B \\ C \end{pmatrix}$$

      Using an intercept:

      $$\begin{pmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \\ S_5 \\ S_6 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \beta_0 \\ a \\ b \end{pmatrix}$$
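
A sketch of the three-level case with an intercept, followed by a contrast that compares the two treatments directly. The expression values are made up, and `treatment_coding`, `solve` and `least_squares` are hypothetical helpers (a small Gauss-Jordan solver for the normal equations), not code from the slides.

```python
def treatment_coding(labels, other_levels):
    """Intercept column plus one indicator column per non-baseline level."""
    return [[1] + [1 if lab == lev else 0 for lev in other_levels]
            for lab in labels]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

treatment = ["Treatment A", "Treatment B", "Control",
             "Treatment A", "Treatment B", "Control"]
expression = [8.0, 6.5, 5.0, 8.4, 6.1, 5.2]  # hypothetical expression values

X = treatment_coding(treatment, ["Treatment A", "Treatment B"])
beta0, a, b = least_squares(X, expression)

# Contrast (0, 1, -1) on (beta0, a, b): Treatment A minus Treatment B.
contrast = [0, 1, -1]
a_minus_b = sum(c * est for c, est in zip(contrast, (beta0, a, b)))
print("beta0:", beta0, "a:", a, "b:", b)
print("Treatment A - Treatment B:", a_minus_b)
```

Here `beta0` estimates the control mean, `a` and `b` estimate each treatment's difference from control, and the contrast recovers the difference between the two treatments without refitting the model.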
