
Harnessing Crowd-Sourcing to Assess Genes based on Effect Size



  1. Harnessing Crowd-Sourcing to Assess Genes based on Effect Size Using Visual Inference Methods. Di Cook, Monash University. Joint work with Niladri Roy Chowdhury, Eric Hare, Mahbub Majumder, Michelle Graham, Tengfei Yin, and Heike Hofmann.

  2. Outline. Analysis outline, edgeR, … background. Our top genes: good, maybe, ugly. Why: video of dispersion. First experiment: is there any structure? Re-analysis of a published study. VicBioStat 2016, Melbourne, Australia.

  3. Our Data. RNA libraries sequenced by Illumina HiSeq2000. Alignment by bowtie. Rsamtools to import BAM files, rtracklayer to import GFF files. GenomicRanges to count reads. Negative binomial model using edgeR to compute differential expression. FDR yields ~2000 significantly differentially expressed genes.
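
The FDR step on this slide can be sketched as a Benjamini-Hochberg adjustment of per-gene p-values. The talk does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values and the 0.05 threshold below are hypothetical. The sketch is in plain Python rather than the R/edgeR pipeline named on the slide.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(p_values, fdr=0.05):
    """Number of genes whose adjusted p-value falls below the FDR threshold."""
    return sum(q < fdr for q in benjamini_hochberg(p_values))


# Hypothetical p-values for six genes (not from the talk's data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
toy_q = benjamini_hochberg(toy_p)
n_sig = count_significant(toy_p)
```

Applied to the full table of per-gene p-values, `count_significant` gives the kind of gene count quoted on the slide.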

  4. TOP 25 GENES. [Figure: ordered list of 25 genes, one panel per gene. y-axis: log2(normalized counts + 1); x-axis: Fe (insufficient, sufficient); legend: geno (Emptyvector, RPA). Each panel is marked The Good (✔), Maybe (?), or Ugly (✘). Panels in order: 1 Glyma13g12080, 2 Glyma13g11960, 3 Glyma13g12010, 4 Glyma06g03100, 5 Glyma10g36890, 6 Glyma16g29220, 7 Glyma18g10330, 8 Glyma03g06420, 9 Glyma09g28100, 10 Glyma16g05640, 11 Glyma09g03270, 12 Glyma09g29370, 13 Glyma09g24780, 14 Glyma14g34080, 15 Glyma02g39150, 16 Glyma02g03290, 17 Glyma08g36390, 18 Glyma20g26600, 19 Glyma01g38130, 20 Glyma18g01720, 21 Glyma05g16350, 22 Glyma18g07090, 23 Glyma12g36140, 24 Glyma12g03280, 25 Glyma02g13850.]

  5. TOP 25 GENES. [Same figure as slide 4.]

  6. TOP 25 GENES. [Same figure as slide 4.] Do you agree?

  7. Why? Dispersion

  8. Why?

  9. Why? Level N inflates dispersion

  10. Why?

  11. Why? Gene B inflates dispersion

  12. Why?

  13. Why? In reality, gene B here inflates dispersion, making gene A not significant.
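
A toy sketch of the effect described on this slide. In the negative binomial model a count with mean mu has variance mu + phi * mu^2, where phi is the dispersion. The code estimates phi per gene by the method of moments and then pools it by simple averaging, which is an assumption made for illustration only; edgeR's own dispersion estimators are more sophisticated. The counts and the names `gene_a` and `gene_b` are hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion: solve var = mu + phi * mu**2 for phi."""
    mu = mean(counts)
    return max((variance(counts) - mu) / mu ** 2, 0.0)


def nb_sd(mu, phi):
    """Standard deviation implied by the negative binomial variance."""
    return (mu + phi * mu ** 2) ** 0.5


# Hypothetical replicate counts within one treatment group.
gene_a = [98, 102, 105, 95]     # tight replicates
gene_b = [20, 180, 60, 140]     # noisy replicates

phi_a = moment_dispersion(gene_a)
phi_b = moment_dispersion(gene_b)
phi_shared = mean([phi_a, phi_b])   # crude pooling across the two genes

# Spread assumed for gene A under its own vs. the shared dispersion.
sd_own = nb_sd(mean(gene_a), phi_a)
sd_shared = nb_sd(mean(gene_a), phi_shared)
```

With the shared value, the spread assumed for gene A is several times larger than its own replicates suggest, so a real difference between groups for gene A is harder to call significant.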

  14. Why?

  15. [Figure: tagwise dispersion plotted against log(counts per million). Plot labels: cranvas, ggplot2.]

  16. [Same figure.] Each point = one gene.

  17. [Same figure.] Each point = one gene. Trended dispersion.

  18. [Same figure.] Each point = one gene. Trended dispersion. Classical interaction plot of one gene.

  19. [Same figure.] Each point = one gene. Trended dispersion. Classical interaction plot of one gene. Plots linked: clicking on a point in the left plot shows the interaction plot for that gene.

  20. [Same figure.] Each point = one gene. Classical interaction plot of one gene. Plots linked: clicking on a point in the left plot shows the interaction plot for that gene.

  21. [Same figure.] Classical interaction plot of one gene. Plots linked: clicking on a point in the left plot shows the interaction plot for that gene.

  22. [Same figure.] Plots linked: clicking on a point in the left plot shows the interaction plot for that gene.

  23. [Same figure.]

  24. [Same figure.]

  25. So we ran a little experiment. Compare the results with random results: take the experimental design, 2x2x3, and permute the labels; re-run the analysis and record the most significant gene; plot the results.
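
The permutation step above can be sketched as follows. This is a simplified stand-in, not the analysis from the talk: it uses a single two-group label instead of the full 2x2x3 design, and it scores genes by the absolute difference in group means instead of re-running edgeR. All data, gene names, and function names are hypothetical.

```python
import random
from statistics import mean


def top_gene(expr, labels):
    """Return (gene, score) for the gene with the largest gap between group means.

    The absolute difference in means is only a stand-in for the edgeR test.
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, g in zip(values, labels) if g == "A"]
        b = [v for v, g in zip(values, labels) if g == "B"]
        scores[gene] = abs(mean(a) - mean(b))
    best = max(scores, key=scores.get)
    return best, scores[best]


def permuted_top_genes(expr, labels, n_perm, seed=0):
    """Shuffle the sample labels n_perm times and record each run's top gene."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        results.append(top_gene(expr, shuffled))
    return results


# Hypothetical log-scale expression for 12 samples (6 per group).
labels = ["A"] * 6 + ["B"] * 6
expr = {
    "gene1": [5.0, 5.2, 4.9, 5.1, 5.0, 5.3, 8.0, 8.2, 7.9, 8.1, 8.3, 8.0],
    "gene2": [6.1, 5.8, 6.3, 6.0, 5.9, 6.2, 6.0, 6.1, 5.9, 6.2, 6.0, 6.1],
}
observed = top_gene(expr, labels)
nulls = permuted_top_genes(expr, labels, n_perm=19)
```

Each entry of `nulls` is the "most significant gene" from one relabelled run; plotting these next to `observed` is what turns the comparison into a lineup.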

  26. In which of these plots do the two groups have the most vertical difference? [Lineup of 20 plots, numbered 1 to 20. y-axis: log2(normalized counts + 1); x-axis: geno (Emptyvector, RPA).] geno_1_5, 5/7

  27. In which of these plots is the green line the steepest, and the spread of the green points relatively small? [Lineup of 20 plots, numbered 1 to 20. y-axis: log2(normalized counts + 1); x-axis: Fe (i, s).] interaction_2_1, 4/5

  28. Experiment. Five different sets of null plots. Five different locations of the true data plot inside the lineup. Shown to a sample of Amazon Mechanical Turk workers. Overwhelmingly, in both cases, the true data plot is picked, slightly less so for the interaction.

  29. Experiment. Five different sets of null plots. Five different locations of the true data plot inside the lineup. Shown to a sample of Amazon Mechanical Turk workers. Overwhelmingly, in both cases, the true data plot is picked, slightly less so for the interaction. Data has SOME SIGNAL!
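
One common way to turn lineup responses into a p-value is a binomial tail probability: if nothing distinguishes the true data plot, each viewer picks it with probability 1/20. The talk does not state how the responses were scored, so this is a sketch under the assumption of independent viewers guessing uniformly; the tally of 5 out of 7 is a hypothetical input.

```python
from math import comb


def lineup_p_value(n_viewers, n_correct, n_plots=20):
    """P(X >= n_correct) for X ~ Binomial(n_viewers, 1 / n_plots)."""
    p = 1.0 / n_plots
    return sum(
        comb(n_viewers, k) * p ** k * (1 - p) ** (n_viewers - k)
        for k in range(n_correct, n_viewers + 1)
    )


# Hypothetical tally: 5 of 7 viewers pick the true data plot.
p_val = lineup_p_value(n_viewers=7, n_correct=5)
```

A very small `p_val` means that so many correct picks would be unlikely if the data plot looked like the null plots.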

  30. Human vs chimp. Data from “Sex-specific and lineage-specific alternative splicing in primates”, Blekhman, Marioni, Zumbo, Stephens, Gilad, Genome Research, 2010, 20: 180-189, http://genome.cshlp.org/content/suppl/2009/12/16/gr.099226.109.DC1.html. Human, chimp (and rhesus) liver RNA. 3x2 (M/F) individuals, 2 reps for each species. Image from son’s T-shirt!

  31. Human vs chimp. Pairwise comparisons of species. Likelihoods compared, FDR<0.05.

  32. Human vs chimp. Re-analyzed using edgeR, exactTest. (Yes, not taking dependencies into account, but a quick re-do of the analysis was wanted.) Just Human-Chimp. Yields 3630 differentially expressed genes at FDR<0.01, mostly overlapping with published results.

  33. Visual testing. Create multiple sets of permutations of the labels of human, chimp. Conduct edgeR/exactTest on each of the permutations. Record the top 2500 genes based on p-value. Make lineups of the j’th ordered gene of the actual data against those of the permuted data.
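
The last step on this slide, assembling a lineup for the j'th ordered gene, might look like the sketch below. It assumes the ranked gene lists from the actual and permuted runs are already available; the function names and placeholder gene labels are hypothetical, and each "panel" is just a label standing in for a plot.

```python
import random


def make_lineup(actual_panel, null_panels, seed=None):
    """Hide the actual-data panel at a random position among the null panels.

    Returns (panels, position), where position is the 1-based slot of the
    actual data in the lineup.
    """
    rng = random.Random(seed)
    position = rng.randint(1, len(null_panels) + 1)
    panels = list(null_panels)
    panels.insert(position - 1, actual_panel)
    return panels, position


def lineup_for_rank(j, actual_ranked, permuted_ranked, seed=None):
    """Lineup of the j'th ordered gene (1-based) from the actual and permuted runs."""
    nulls = [ranked[j - 1] for ranked in permuted_ranked]
    return make_lineup(actual_ranked[j - 1], nulls, seed=seed)


# Hypothetical ranked gene lists: one actual run and 19 permuted runs.
actual_ranked = ["real_g1", "real_g2", "real_g3"]
permuted_ranked = [[f"perm{r}_g{j}" for j in (1, 2, 3)] for r in range(19)]
panels, position = lineup_for_rank(2, actual_ranked, permuted_ranked, seed=1)
```

Keeping `position` separate from `panels` lets the viewer's choice be checked afterwards without revealing where the actual data sits.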

  34. You try. Pick one plot among the 20: “Which plot has the largest vertical difference between the two groups?” Point your mobile device to this web page: goo.gl/gG60uR

  35. Human-chimp 1. [Lineup of 20 plots, numbered 1 to 20. y-axis: log10(cpm); x-axis: HS, PT.]

  36. Human-chimp 2. [Lineup of 20 plots, numbered 1 to 20. y-axis: log10(cpm); x-axis: HS, PT.]

  37. Human-chimp 3. [Lineup of 20 plots, numbered 1 to 20. y-axis: log10(cpm); x-axis: HS, PT.]
