
Lessons from Gene Expression, Kasper Daniel Hansen



  1. Lessons from Gene Expression. Kasper Daniel Hansen <khansen@jhsph.edu | www.hansenlab.org>, McKusick-Nathans Institute of Genetic Medicine, Department of Biostatistics, Johns Hopkins University

  2. Genomic Data Science Specialization @Coursera by JHU
     Instructors: Liliana Florea, Kasper D. Hansen, Jeff Leek, Mihaela Pertea, Steven Salzberg, James Taylor
     6 classes; 4 weeks per class; continuous rollout

  3. 3

  4. RNA-seq; scRNA-seq; Microarrays

  5. Replication / Reproducibility: replicate samples; replicate the experiment; replicate the conclusion; computational replication / reproduction
     "It is difficult to get a man to understand something, when his salary depends upon his not understanding it!" - Upton Sinclair

  6. Science: “Proof”, “Crap”

  7. Science: “Proof”, “Crap”

  8. Science: Most of biology

  9. Different sub-fields have different standards
     “without knowing anything - if this was your plot, what do you think about that little guy top right”
     http://drbecca.scientopia.org/2015/08/18/whose-problem-is-the-reproducibility-crisis-anyway/

  10. Technical variation Focus

  11. Controls

  12. Sequencing technology does not remove biological variability
     [Figure: panels a and b plot sequencing s.d. against array s.d. (annotations: cor: 0.592, n: 5,003; cor: 0.492, n: 2,463); panel c plots centered expression against sample index for COX4NB and RASGRP1, sequencing vs. array.]
     Hansen (2011) Nat. Biotech
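The comparison summarized on slide 12 (per-gene standard deviation measured by sequencing versus by array) can be sketched roughly as follows. This is a minimal illustration, not the analysis from Hansen (2011): the function names `pearson` and `sd_agreement`, the gene names, and all numeric values are hypothetical. The idea is that if the per-gene spread agrees across two technologies, much of that spread is likely biological rather than technical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sd_agreement(seq_expr, array_expr):
    """Correlate per-gene standard deviations across two platforms.

    Both arguments map gene name -> list of expression values, one per
    sample. Only genes measured on both platforms are used.
    """
    genes = sorted(set(seq_expr) & set(array_expr))
    seq_sd = [statistics.stdev(seq_expr[g]) for g in genes]
    array_sd = [statistics.stdev(array_expr[g]) for g in genes]
    return pearson(seq_sd, array_sd)


# Hypothetical toy values, not data from the paper.
seq_expr = {
    "geneA": [1.0, 1.0, 1.0, 1.2],
    "geneB": [0.0, 2.0, 4.0, 6.0],
    "geneC": [0.0, 1.0, 2.0, 3.0],
}
array_expr = {
    "geneA": [5.0, 5.1, 5.0, 5.1],
    "geneB": [1.0, 3.0, 5.0, 8.0],
    "geneC": [2.0, 3.0, 4.0, 5.2],
}

r = sd_agreement(seq_expr, array_expr)
print(round(r, 3))
```

A high value of `r` here only says that genes that vary a lot on one platform also vary a lot on the other; with real data the two platforms would need matched samples and comparable expression scales.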

  13. Number of replicates: GWAS; Cell Biology

  14. Number of replicates: GWAS; Cell Biology

  15. Number of replicates
     “We applied MixupMapper to five publicly available human genetical genomics datasets. On average, 3% of all analyzed samples had been assigned incorrect expression phenotypes: in one of the datasets 23% of the samples had incorrect expression phenotypes.” Westra (2011) Bioinformatics
     Studies with a huge number of samples have challenges as well

  16. Batch effects
     Leek (2010) Nat Rev Genet, "Tackling the widespread and critical impact of batch effects in high-throughput methods"
     [Figure 3: fraction of sign-reversed correlations by batch pair (2002/2003, 2002/2004, 2002/2005, 2003/2004, 2003/2005, 2004/2005), with labels not shuffled vs. labels shuffled.]
     Figure 3 | Batch effects also change the correlations between genes. We normalized every gene in the second gene expression data set (ref. 2) to mean 0, variance 1 within each batch. (The 2006 batch was omitted owing to small sample size.) We identified all significant correlations (p < 0.05) between pairs of genes within each batch using a linear model. We looked at genes that showed a significant correlation in two batches and counted the fraction of times that the correlation changed between the two batches. A large percentage of significant correlations reversed signs across batches, suggesting that the correlation structure between genes changes substantially across batches. To confirm this phenomenon is due to batch, we repeated the process, looking for significant correlations that changed sign across batches, but with the batch labels randomly permuted. With random batches, a much smaller fraction of significant correlations change signs. This suggests that correlation patterns differ by batch, which would affect rank-based prediction methods as well as system biology approaches that rely on between-gene correlation to estimate pathways.
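The procedure in the slide 16 caption (find gene pairs significantly correlated within each of two batches, count how often the sign flips, then repeat with permuted batch labels) can be sketched as below. This is an illustration under stated assumptions, not the code behind Leek (2010): all function names are hypothetical, the data are made up, and the significance check is a crude t-statistic cutoff standing in for the caption's p < 0.05 linear-model test.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, t_crit=2.0):
    """Assumed rough cutoff: |t| > t_crit with t = r * sqrt((n-2)/(1-r^2))."""
    if n < 3:
        return False
    denom = 1.0 - r * r
    if denom <= 0:  # |r| is 1 up to rounding
        return True
    return abs(r) * math.sqrt((n - 2) / denom) > t_crit


def sign_reversal_fraction(expr, batches, batch_a, batch_b):
    """Fraction of gene pairs, significant in both batches, whose
    correlation has opposite signs in batch_a and batch_b.

    expr maps gene -> list of values (one per sample); batches holds one
    batch label per sample, in the same order.
    """
    idx_a = [i for i, b in enumerate(batches) if b == batch_a]
    idx_b = [i for i, b in enumerate(batches) if b == batch_b]
    both = flipped = 0
    for g1, g2 in combinations(sorted(expr), 2):
        # Pearson r is unchanged by centering and scaling within a batch,
        # so the caption's per-batch normalization is implicit here.
        r_a = pearson([expr[g1][i] for i in idx_a], [expr[g2][i] for i in idx_a])
        r_b = pearson([expr[g1][i] for i in idx_b], [expr[g2][i] for i in idx_b])
        if is_significant(r_a, len(idx_a)) and is_significant(r_b, len(idx_b)):
            both += 1
            if r_a * r_b < 0:
                flipped += 1
    return flipped / both if both else 0.0


def shuffled_control(expr, batches, batch_a, batch_b, seed=0):
    """Same statistic after randomly permuting the batch labels."""
    permuted = list(batches)
    random.Random(seed).shuffle(permuted)
    return sign_reversal_fraction(expr, permuted, batch_a, batch_b)


# Hypothetical toy data: the two genes move together in batch "2002"
# and in opposite directions in batch "2003".
batches = ["2002"] * 6 + ["2003"] * 6
expr = {
    "gene1": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
    "gene2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 6.0, 5.1, 3.9, 3.2, 2.0, 1.1],
}
observed = sign_reversal_fraction(expr, batches, "2002", "2003")
control = shuffled_control(expr, batches, "2002", "2003")
print(observed, control)
```

In the caption's logic, a batch effect is suggested when `observed` is clearly larger than the values obtained from many shuffled controls; a single permutation, as here, only shows the mechanics.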

  17. Combining experiments - Gene Expression Barcode. Biggest barrier is metadata. Latest: McCall (2014) NAR

  18. Speed of light measured by different groups. WJ Youden (1972) Technometrics

  19. Analysis: One-of-a-kind; As-a-utility

  20. How do we know whether something works: fake data; real data (simulations); well designed, well executed reference experiments

  21. Lessons from gene expression: huge advantage in a common software platform / common formats; designed reference experiments; technological standardization; physical models do not help (not clear this is general); all data is publicly available
