
My P value is lower than your P value! Beyond GWAS in livestock genomics



  1. My P value is lower than your P value! Beyond GWAS in livestock genomics Joanna Szyda

  2. Motivation: P-value-based inference

  3. Motivation: “Biology emerges from pathways, not from single genes” (Eric Lander)

  4. Motivation
     • Combine various sources of biological information
     • Use computational resources (data analysis)
     • Use brain ☺ (biological conclusions)

  5. Outline
     • Data set 1 → illustration of methodology and biological conclusions
       (sample SNP genotype record: ARSBFGLBAC10172 4408169577_E B B 0.8830 9.9999)
     • Combine selected sources of information
     • Data set 2 → illustration of the available genetic variability
       (sample shown: raw sequence reads with base-quality strings)

  6. Data Set 1 → SNP (genotype records)
     ARSBFGLBAC10172  4408169577_E  B  B  0.8830
     ARSBFGLBAC1020   4408169577_E  A  B  0.8990
     ARSBFGLBAC10245  4408169577_E  B  B  0.6582
     ARSBFGLBAC10345  4408169577_E  A  B  0.9092
     ARSBFGLBAC10365  4408169577_E  B  B  0.8021
     ARSBFGLBAC10375  4408169577_E  B  B  0.8858
     ARSBFGLBAC10591  4408169577_E  A  A  0.8670
     ARSBFGLBAC10793  4408169577_E  B  B  0.8722
     ARSBFGLBAC10867  4408169577_E  A  A  0.9316
     ARSBFGLBAC10919  4408169577_E  A  B  0.7805
     ARSBFGLBAC10952  4408169577_E  A  B  0.9314
     ARSBFGLBAC10960  4408169577_E  A  B  0.5666
     ARSBFGLBAC10975  4408169577_E  A  B  0.8665
     ARSBFGLBAC10986  4408169577_E  A  B  0.8687
     ARSBFGLBAC10993  4408169577_E  B  B  0.8146
     ARSBFGLBAC11000  4408169577_E  A  A  0.9135
     ARSBFGLBAC11003  4408169577_E  A  A  0.9454
     ARSBFGLBAC11007  4408169577_E  B  B  0.9106
     ARSBFGLBAC11025  4408169577_E  B  B  0.8742
     ARSBFGLBAC11028  4408169577_E  A  A  0.8534
     ARSBFGLBAC11034  4408169577_E  B  B  0.5769
     ARSBFGLBAC11039  4408169577_E  B  B  0.8987

  7. Data Set 1 → SNP
     • 2 601 HF bulls → black-white & red-white → pedigree of 10 355 individuals → Illumina 50K chip
     • SNP → SNP positions → pairwise LD → genomic position (Ensembl)
     • Gene → Gene Ontology terms (GO) → metabolic pathways (KEGG)
     • Phenotype → deregressed national EBV → complex inheritance mode

  8. Data set 1 → SNP effect estimation
     • y: deregressed EBV for protein yield
     • µ: general mean
     • q: additive SNP effect
     • Z ∈ { -1, 0, 1 }
     • e: residual
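
The model equation itself did not survive extraction from the slide. Given the symbols listed above, a plausible reading is an additive linear model of the following form. This is an assumption, written in vector notation with a hypothetical vector of ones, and it says nothing about how q was estimated (for example as a fixed or a random effect):

```latex
\mathbf{y} = \mu\,\mathbf{1} + \mathbf{Z}\,\mathbf{q} + \mathbf{e},
\qquad Z_{ij} \in \{-1,\, 0,\, 1\}
```

Here row i of Z would hold the coded genotypes of bull i at each SNP j.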

  9. Data set 1 → gene networks: identify physiological processes underlying complex traits + corresponding genes

  10. Data set 1 → gene effect estimation
      • 46 267 SNP estimates
      • varying LD to causal variants
      • multiple testing correction
      • only the most significant SNP associations detected
      • 4 345 gene estimates
      • SNPs within / close to genes
      • better interpretation
      • 6 “major” genes for PY: LHX8, HEPHL1, DHX34, FBP2, TANC2, AP1B1
      • BTA: 3, 8, 17, 18, 19, 29
      • … find the other genes ☺
      [Figure: association plot, y-axis: -log10 P]
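
The slide mentions a multiple testing correction without naming the method. As a rough illustration of why only the strongest SNP associations survive, the sketch below assumes a Bonferroni correction and a 0.05 family-wise error rate; both are assumptions, and only the SNP count comes from the slide.

```python
import math

N_SNP_TESTS = 46_267  # number of SNP estimates quoted on the slide
ALPHA = 0.05          # assumed family-wise error rate (not stated on the slide)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold under a Bonferroni correction (assumed method)."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_SNP_TESTS)
# The same threshold on the -log10 P scale used in association plots.
neg_log10_threshold = -math.log10(threshold)

print(f"per-SNP threshold: {threshold:.3g}")
print(f"-log10 threshold:  {neg_log10_threshold:.2f}")
```

With 4 345 gene-level tests instead of 46 267 SNP-level tests, the same function gives a less stringent per-test threshold, which is one way to read the "better interpretation" bullet.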

  11. Data set 1 → network construction for PY
      • 44 genes
      • 660 GO
      • 75 KEGG

  12. Data set 1 → network validation
      • Workflow: SNP effect estimation → gene effect estimation → gene selection → network construction
      • Functional information: GO, KEGG
      • EBV permutation × 100

  13. Data set 1 → testing functional features. For each GO / KEGG: odds for the original data vs. odds for permuted data
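
The slides do not give the exact statistic, so the following is only a minimal sketch of one way to compare the odds for a GO term or KEGG pathway between the original and permuted data. The odds definition, the 0.5 continuity correction, the empirical P value, and all gene names are assumptions made for illustration.

```python
from statistics import mean


def term_odds(selected_genes, annotated_genes):
    """Odds that a selected gene carries the term.

    0.5 is added to both counts (an assumed continuity correction)
    so that the odds stay finite when a count is zero.
    """
    selected = set(selected_genes)
    hits = len(selected & set(annotated_genes))
    misses = len(selected) - hits
    return (hits + 0.5) / (misses + 0.5)


def empirical_p(original_odds, permuted_odds):
    """Share of permutations with odds at least as large as the original."""
    exceed = sum(1 for odds in permuted_odds if odds >= original_odds)
    return (exceed + 1) / (len(permuted_odds) + 1)


# Hypothetical toy data: genes annotated with one term, the gene set
# selected from the original EBVs, and gene sets from permuted EBVs.
annotated = {"g1", "g2", "g3", "g4"}
original_selection = ["g1", "g2", "g3", "g9"]
permuted_selections = [
    ["g1", "g7", "g8", "g9"],
    ["g5", "g6", "g7", "g8"],
    ["g2", "g3", "g6", "g9"],
]

original = term_odds(original_selection, annotated)
permuted = [term_odds(sel, annotated) for sel in permuted_selections]
odds_ratio = original / mean(permuted)
p_value = empirical_p(original, permuted)

print(f"odds (original): {original:.3f}")
print(f"odds ratio vs. permutations: {odds_ratio:.3f}")
print(f"empirical P: {p_value:.3f}")
```

The toy example uses three permutations to stay readable; the presentation reports 100 EBV permutations.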

  14. Data set 1 → results: significant KEGG pathways for PY (examples)
      • Lysosome (bta04142), CI: 8.8-51.7 → P < 0.00001 → protein degradation, tissue regression, inflammation
      • Cell cycle (bta04110), CI: 3.0-11.4 → P = 0.00005 → development of mammary epithelium
      • Pentose phosphate (bta00030), CI: 7.5-245 → P = 0.00588 → NADPH production in tissues engaged in biosynthesis

  15. Data set 1 → trait similarity: identify similarities between complex traits

  16. Data set 1 → trait similarity: GO / genes of one trait + GO / genes of another trait → trait similarity

  17. Data set 1 → similarity metrics: cosine metric and Jaccard metric, defined in terms of
      • N ij: number of GO / genes shared by the networks for traits i and j
      • N i: number of GO / genes in the network for trait i
      • N j: number of GO / genes in the network for trait j
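
The formulas for the two metrics were not preserved in the extracted text. The sketch below assumes the standard set-based definitions, cosine = N ij / sqrt(N i × N j) and Jaccard = N ij / (N i + N j - N ij), which match the three counts defined on the slide. The gene sets are hypothetical placeholders, not the networks from the study.

```python
import math


def cosine_similarity(set_i: set, set_j: set) -> float:
    """Set-based cosine similarity: N_ij / sqrt(N_i * N_j)."""
    if not set_i or not set_j:
        return 0.0
    n_ij = len(set_i & set_j)
    return n_ij / math.sqrt(len(set_i) * len(set_j))


def jaccard_similarity(set_i: set, set_j: set) -> float:
    """Jaccard similarity: N_ij / (N_i + N_j - N_ij)."""
    union_size = len(set_i | set_j)
    if union_size == 0:
        return 0.0
    return len(set_i & set_j) / union_size


# Hypothetical networks for two traits (placeholder gene names).
network_trait_i = {"geneA", "geneB", "geneC", "geneD"}
network_trait_j = {"geneC", "geneD", "geneE"}

cos_ij = cosine_similarity(network_trait_i, network_trait_j)
jac_ij = jaccard_similarity(network_trait_i, network_trait_j)

print(f"cosine:  {cos_ij:.3f}")
print(f"Jaccard: {jac_ij:.3f}")
```

For the same pair of sets the Jaccard value is never larger than the cosine value, so the two metrics should be compared within a metric, not across metrics.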

  18. Data set 1 → results: similarity between traits
      [Figure: bar chart, y-axis: similarity (0.0 to 0.7); series: genes cosine, genes Jaccard, GO Jaccard; trait pairs: PY-FY, PY-MY, PY-SCS, PY-STA, FY-MY, FY-SCS, FY-STA, MY-SCS, MY-STA, SCS-STA]

  19. Data Set 2 → DNA sequence: there is much more informative data available for this

  20. Data Set 2 → DNA sequence (raw reads)
      @HWI-1KL157:67:D2AGFACXX:1:2316:10694:65033 2:N:0:
      CTATTACACGCCCCCGAAGCTCTAGCGGGTGTTCTCACGCACCCAAGGCATCCTCAACCACCACCATTTCTG
      +
      CCCFFADFHHGHHJJGGIIG@HIIFEHIJ;@F@DGGGGCCEB8BCDDDDBACDDCDDDBDDBDDDBDDDEE
      @HWI-1KL157:67:D2AGFACXX:1:2316:10671:65034 2:N:0:
      AGTGTATTACTGTCTTTGCACTCTTTAATCCTAGGTGACTTTTGGGGGTTCAGTATCAGATAGAGAACATATT
      +
      ?@@ADDDDHDBFHCEHIIBHEHEEHEH>BF?EFHCHFGFGFHH@HIG:6@=CGICAGG=7@@CHG===7
      @HWI-1KL157:67:D2AGFACXX:1:2316:10609:65040 2:N:0:
      CTGGAGTGGGTATCCTTTCCCTTATCCAGGTTATCTTCCCAACCCAGGGATTGAACCCAGGTATCCTGGATT
      +
      @CCFDD2AFHDH<AFHII4CGIIJIJJGGIGIIJIIIJJJIHHIJJJIJEFGGICHHGGIIIHEHIHHGHHHFFFFFDDDDDD
      @HWI-1KL157:67:D2AGFACXX:1:2316:10717:65046 2:N:0:
      TACTCAAAAGAATCTGTGTTTAGACAGTTTAGAACATCTCCTACCTCTCACAGTTGGGAGGCTCTGAACAAT
      +
      @@@DD;DDHDBCFBEGGDHGHI<FBHIAEHE@GGEEFFHGDGIHGIGIIGBGGFGHIAFEGGHGIIIIIIEHH
      @HWI-1KL157:67:D2AGFACXX:1:2316:10507:65046 2:N:0:
      GAAGAAAAACTGTGTTTATGTCTCGAACATAATAAAGTCAACATGGATTATGTTAACTGTAATTGTACATCTA
      +
      @@@DDDDBHHHHBDBBHBHH3ACHHIIGBHIGCHGHGHIHHEGHII?4BFBDHHIGIDGDGFCCBF@FHI
      @HWI-1KL157:67:D2AGFACXX:1:2316:10653:65048 2:N:0:
      TATTGAAAACCTACCTACTAGGTAAATCTTAAGTAGGTTTAATCATGTCCACGTTTCCACTTGTTCACTCATTC
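
The reads on this slide follow the four-line FASTQ layout: header, bases, a "+" separator, and one quality character per base. A minimal sketch of reading such records is given below. It assumes Phred+33 quality encoding, which the slide does not state, and it uses a short hypothetical record instead of the reads above.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text; a trailing partial record is skipped."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for start in range(0, len(stripped) - 3, 4):
        header, sequence, separator, quality = stripped[start:start + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {start + 1}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string; offset 33 (Phred+33) is an assumption."""
    return [ord(char) - offset for char in quality]


# Hypothetical toy record, not taken from the data set.
toy_fastq = [
    "@read1 2:N:0:",
    "ACGT",
    "+",
    "CCFF",
]

records = list(parse_fastq(toy_fastq))
scores = phred_scores(records[0].quality)
mean_quality = sum(scores) / len(scores)

print(records[0].header, records[0].sequence, scores, mean_quality)
```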

  21. Data Set 2 → DNA sequence
      • 32 HF cows → paternal half-sibs
      • whole genome DNA sequence → Illumina HiSeq
      • reference genome → UMD3.1
      • alignment → BWA, Smalt
      • variant calling → FreeBayes, GATK, Samtools, CNVnator

  22. Data set 2 → genomic variability: describe genetic variability on the DNA level → basis for complex trait modelling

  23. Data set 2 → averaged coverage: genome-averaged coverage for each cow
      • min: 5
      • max: 17
      [Figure: bar chart, x-axis: cow ID (1 to 32), y-axis: coverage]

  24. Data set 2 → coverage along the genome: chromosome-wise coverage for a particular cow
      [Figure: coverage along the chromosome, panels: BTA01, BTA10, BTA20, BTX; mean coverage values (ȳ) shown: 8.56, 8.03, 8.14, 8.60]

  25. Data set 2 → SNPs: total number of identified SNPs
      • min: 2 063 811 → 0.08% of genome
      • max: 6 117 976 → 0.23% of genome
      • sd: 663 223
      • sd₋₃₂: 216 861
      • χ²: P < 10⁻⁴
      [Figure: bar chart, x-axis: cow ID (1 to 32), y-axis: # SNP (0 to 7 000 000)]
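
The percentages above can be reproduced with simple arithmetic. The sketch below assumes a genome length of roughly 2.67 Gb for the UMD3.1 assembly; that figure is an assumption and does not appear on the slide.

```python
# Assumed approximate length of the UMD3.1 assembly in base pairs
# (not stated on the slide).
GENOME_LENGTH_BP = 2.67e9


def percent_of_genome(n_snps: int, genome_length_bp: float = GENOME_LENGTH_BP) -> float:
    """Share of genome positions that carry a SNP, in percent."""
    return 100.0 * n_snps / genome_length_bp


min_pct = percent_of_genome(2_063_811)  # cow with the fewest SNPs
max_pct = percent_of_genome(6_117_976)  # cow with the most SNPs

print(f"min: {min_pct:.2f}% of genome")
print(f"max: {max_pct:.2f}% of genome")
```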

  26. Data set 2 → SNPs: total number of identified SNPs
      • 15 272 427
      • 99.16% biallelic
      [Figure: total number of SNPs per chromosome (BTA), with two smaller panels: % of SNPs with 3 alleles and % of SNPs with 4 alleles, per BTA]

  27. Data set 2 → SNPs: missense SNPs
      [Figure: two bar charts, number of missense SNPs and missense SNP density, for three gene categories: HK (Housekeeping), SS (Strong Selection), NS (Neutral to Selection)]

  28. Data Set 2 → SNPs
      • Housekeeping → beta Actin; Beta-2-microglobulin; Glyceraldehyde-3-phosphate; Hydroxymethylbilane synthase; beta Heat shock 90kDa protein 1; Ubiquitin C
      • Strong Selection → diacylglycerol O-acyltransferase 1; alpha 6 integrin; ADP-ribosylation factor-like 4A; bone morphogenetic protein 4; myeloid differentiation primary response
      • Neutral to Selection → URI1 prefoldin-like chaperone; low density lipoprotein receptor-related protein; ATP/GTP binding protein 1; ankyrin repeat domain 32; spectrin repeat containing, nuclear envelope 2

  29. Data set 2 → SNPs: missense SNPs
      [Figure: two bar charts, number of missense SNPs and missense SNP density, for three gene categories: HK (Housekeeping), SS (Strong Selection), NS (Neutral to Selection)]

  30. Data set 2 → SNPs: missense SNPs
      • ANOVA: SNP density = category + gene(category)
        F category → P = 0.230
        F gene(category) → P = 0.008
      • ANOVA: #SNP = category + gene(category)
        F category → P < 10⁻⁴
        F gene(category) → P < 10⁻⁴
      • Groups shown: Housekeeping & Neutral to Selection; Strong Selection
