My P value is lower than your P value! Beyond GWAS in livestock - PowerPoint PPT Presentation

My P value is lower than your P value! Beyond GWAS in livestock genomics Joanna Szyda

Motivation P value based inference

Motivation "Biology emerges from pathways, not from single genes" (Eric Lander)

Motivation
• Combine various sources of biological information
• Use computational resources (data analysis)
• Use brain (biological conclusions)

Outline
• Data set 1: illustration of methodology and biological conclusions
• Combine selected sources of information
• Data set 2: illustration of the available genetic variability

Data set 1: SNP
ARSBFGLBAC10172  4408169577_E  B  B  0.8830
ARSBFGLBAC1020   4408169577_E  A  B  0.8990
ARSBFGLBAC10245  4408169577_E  B  B  0.6582
ARSBFGLBAC10345  4408169577_E  A  B  0.9092
ARSBFGLBAC10365  4408169577_E  B  B  0.8021
ARSBFGLBAC10375  4408169577_E  B  B  0.8858
ARSBFGLBAC10591  4408169577_E  A  A  0.8670
ARSBFGLBAC10793  4408169577_E  B  B  0.8722
ARSBFGLBAC10867  4408169577_E  A  A  0.9316
ARSBFGLBAC10919  4408169577_E  A  B  0.7805
ARSBFGLBAC10952  4408169577_E  A  B  0.9314
ARSBFGLBAC10960  4408169577_E  A  B  0.5666
ARSBFGLBAC10975  4408169577_E  A  B  0.8665
ARSBFGLBAC10986  4408169577_E  A  B  0.8687
ARSBFGLBAC10993  4408169577_E  B  B  0.8146
ARSBFGLBAC11000  4408169577_E  A  A  0.9135
ARSBFGLBAC11003  4408169577_E  A  A  0.9454
ARSBFGLBAC11007  4408169577_E  B  B  0.9106
ARSBFGLBAC11025  4408169577_E  B  B  0.8742
ARSBFGLBAC11028  4408169577_E  A  A  0.8534
ARSBFGLBAC11034  4408169577_E  B  B  0.5769
ARSBFGLBAC11039  4408169577_E  B  B  0.8987

Data set 1
• 2 601 HF bulls: black-white & red-white; pedigree of 10 355 individuals
• SNP: Illumina 50K chip; SNP positions; pairwise LD
• Gene: genomic position (Ensembl); Gene Ontology terms (GO); metabolic pathways (KEGG)
• Phenotype: deregressed national EBV; complex inheritance mode

Data set 1: SNP effect estimation
• y: deregressed EBV for protein yield
• µ: general mean
• q: additive SNP effects
• Z ∈ {-1, 0, 1}: SNP genotype coding
• e: residual
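The model equation itself did not survive on this slide. Given the symbols listed, a plausible reading is y = µ + Zq + e, with the SNP effects q shrunk towards zero. The sketch below solves the corresponding mixed model equations in plain Python. The shrinkage parameter `lam` (a residual-to-SNP variance ratio), the function names, and the toy data are assumptions for illustration, not the author's implementation.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def snp_effects(y: List[float], Z: List[List[int]], lam: float) -> Tuple[float, List[float]]:
    """Estimate (mu, q) for the assumed model y = mu + Zq + e.

    lam is a hypothetical shrinkage parameter added to the diagonal
    of the SNP block only, so the general mean is not shrunk.
    """
    n, p = len(y), len(Z[0])
    X = [[1.0] + [float(z) for z in row] for row in Z]  # design [1 | Z]
    lhs = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p + 1)]
           for a in range(p + 1)]
    for k in range(1, p + 1):
        lhs[k][k] += lam
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p + 1)]
    sol = solve(lhs, rhs)
    return sol[0], sol[1:]


# Toy data (invented): 6 animals, 2 SNPs coded -1 / 0 / 1,
# generated without noise from mu = 10 and q = [2, -1].
Z_toy = [[-1, 0], [0, 1], [1, 1], [1, -1], [0, 0], [-1, 1]]
y_toy = [8.0, 9.0, 11.0, 13.0, 10.0, 7.0]

mu_ols, q_ols = snp_effects(y_toy, Z_toy, lam=0.0)      # no shrinkage
mu_shrunk, q_shrunk = snp_effects(y_toy, Z_toy, lam=5.0)  # effects pulled towards 0
```

With `lam=0.0` the toy effects are recovered exactly; a positive `lam` reduces the overall size of the estimated SNP effects, which is what makes joint estimation feasible when SNPs outnumber animals.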

Data set 1: gene networks
Identify physiological processes underlying complex traits + corresponding genes

Data set 1: gene effect estimation
• 46 267 SNP estimates
• varying LD to causal variants
• multiple testing correction
• only the most significant SNP associations detected
• 4 345 gene estimates
• SNPs within / close to genes
• better interpretation
• 6 "major" genes for PY: LHX8, HEPHL1, DHX34, FBP2, TANC2, AP1B1 (BTA: 3, 8, 17, 18, 19, 29)
• … find the other genes
(plot axis: -log10 P)

Data set 1: network construction for PY
• 44 genes
• 660 GO
• 75 KEGG

Data set 1: network validation
Workflow: SNP effect estimation, gene effect estimation, gene selection, network construction
• Functional information: GO, KEGG
• EBV permutation × 100

Data set 1: testing functional features
For each GO / KEGG:
• odds for the original data
• odds for permuted data
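The two odds formulas were images and are not recoverable here. As a rough sketch of how such a comparison could be made, the code below forms an odds ratio between the original and the permuted data for one term and attaches a Wald 95% confidence interval on the log scale. The 2×2 layout (annotated vs. not annotated genes, original vs. permuted network), the function names, and the counts are all assumptions, not the author's definitions.

```python
import math
from typing import Tuple


def odds(annotated: int, not_annotated: int) -> float:
    """Odds that a network gene carries the GO / KEGG term."""
    return annotated / not_annotated


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio (a/b) / (c/d) with a Wald confidence interval.

    a, b: annotated / not annotated genes in the original network
    c, d: annotated / not annotated genes in the permuted networks
    All four counts must be positive.
    """
    ratio = odds(a, b) / odds(c, d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    log_ratio = math.log(ratio)
    return ratio, math.exp(log_ratio - z * se_log), math.exp(log_ratio + z * se_log)


# Invented counts, for illustration only.
ratio, ci_low, ci_high = odds_ratio_ci(a=10, b=20, c=5, d=40)
```

An interval that excludes 1 would indicate that the term occurs in the network more (or less) often than under permutation.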

Data set 1: results
Significant KEGG pathways for PY (examples)
• Lysosome (bta04142), CI: 8.8-51.7, P < 0.00001: protein degradation, tissue regression, inflammation
• Cell cycle (bta04110), CI: 3.0-11.4, P = 0.00005: development of mammary epithelium
• Pentose phosphate (bta00030), CI: 7.5-245, P = 0.00588: NADPH production in tissues engaged in biosynthesis

Data set 1: trait similarity
Identify similarities between complex traits

Data set 1: trait similarity
(diagram: GO / genes of one trait and GO / genes of another trait are combined into a trait similarity)

Data set 1: similarity metrics
Cosine metric: N_ij / sqrt(N_i · N_j)
Jaccard metric: N_ij / (N_i + N_j − N_ij)
• N_ij: number of GO / genes shared by the networks for traits i and j
• N_i: number of GO / genes in the network for trait i
• N_j: number of GO / genes in the network for trait j
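Both metrics can be computed directly from the sets of genes (or GO terms) in two networks. The sketch below uses the standard set-based forms, cosine = N_ij / sqrt(N_i · N_j) and Jaccard = N_ij / (N_i + N_j − N_ij), where N_ij is the number of shared elements. The function names and the toy gene sets are assumptions for illustration.

```python
import math
from typing import Set


def cosine_similarity(net_i: Set[str], net_j: Set[str]) -> float:
    """N_ij / sqrt(N_i * N_j); 0.0 if either network is empty."""
    if not net_i or not net_j:
        return 0.0
    return len(net_i & net_j) / math.sqrt(len(net_i) * len(net_j))


def jaccard_similarity(net_i: Set[str], net_j: Set[str]) -> float:
    """N_ij / (N_i + N_j - N_ij), i.e. shared over union; 0.0 if both are empty."""
    union = len(net_i | net_j)
    if union == 0:
        return 0.0
    return len(net_i & net_j) / union


# Hypothetical networks for two traits (placeholder gene names).
trait_a = {"g1", "g2", "g3", "g4"}
trait_b = {"g3", "g4", "g5"}

cos_ab = cosine_similarity(trait_a, trait_b)   # 2 / sqrt(4 * 3)
jac_ab = jaccard_similarity(trait_a, trait_b)  # 2 / 5
```

For the same pair of networks the Jaccard value is never larger than the cosine value, which matters when the two series are read off a common axis.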

Data set 1: results
Similarity between traits (bar chart; y-axis: similarity, 0.0 to 0.7; series: genes cosine, genes Jaccard, GO Jaccard; trait pairs: PY-FY, PY-MY, PY-SCS, PY-STA, FY-MY, FY-SCS, FY-STA, MY-SCS, MY-STA, SCS-STA)

Data set 2: DNA sequence
Much more informative data are available for this purpose

Data set 2: DNA sequence
@HWI-1KL157:67:D2AGFACXX:1:2316:10694:65033 2:N:0:
CTATTACACGCCCCCGAAGCTCTAGCGGGTGTTCTCACGCACCCAAGGCATCCTCAACCACCACCATTTCTG
+
CCCFFADFHHGHHJJGGIIG@HIIFEHIJ;@F@DGGGGCCEB8BCDDDDBACDDCDDDBDDBDDDBDDDEE
@HWI-1KL157:67:D2AGFACXX:1:2316:10671:65034 2:N:0:
AGTGTATTACTGTCTTTGCACTCTTTAATCCTAGGTGACTTTTGGGGGTTCAGTATCAGATAGAGAACATATT
+
?@@ADDDDHDBFHCEHIIBHEHEEHEH>BF?EFHCHFGFGFHH@HIG:6@=CGICAGG=7@@CHG===7
@HWI-1KL157:67:D2AGFACXX:1:2316:10609:65040 2:N:0:
CTGGAGTGGGTATCCTTTCCCTTATCCAGGTTATCTTCCCAACCCAGGGATTGAACCCAGGTATCCTGGATT
+
@CCFDD2AFHDH<AFHII4CGIIJIJJGGIGIIJIIIJJJIHHIJJJIJEFGGICHHGGIIIHEHIHHGHHHFFFFFDDDDDD
@HWI-1KL157:67:D2AGFACXX:1:2316:10717:65046 2:N:0:
TACTCAAAAGAATCTGTGTTTAGACAGTTTAGAACATCTCCTACCTCTCACAGTTGGGAGGCTCTGAACAAT
+
@@@DD;DDHDBCFBEGGDHGHI<FBHIAEHE@GGEEFFHGDGIHGIGIIGBGGFGHIAFEGGHGIIIIIIEHH
@HWI-1KL157:67:D2AGFACXX:1:2316:10507:65046 2:N:0:
GAAGAAAAACTGTGTTTATGTCTCGAACATAATAAAGTCAACATGGATTATGTTAACTGTAATTGTACATCTA
+
@@@DDDDBHHHHBDBBHBHH3ACHHIIGBHIGCHGHGHIHHEGHII?4BFBDHHIGIDGDGFCCBF@FHI
@HWI-1KL157:67:D2AGFACXX:1:2316:10653:65048 2:N:0:
TATTGAAAACCTACCTACTAGGTAAATCTTAAGTAGGTTTAATCATGTCCACGTTTCCACTTGTTCACTCATTC
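The reads on this slide are in FASTQ format: four lines per read (header starting with "@", bases, a "+" separator, and one quality character per base). A minimal sketch of reading such records and converting quality characters to Phred scores follows. The Phred+33 offset is an assumption about how these reads were encoded, and the record used in the example is a made-up toy read, not one from the slide.

```python
from typing import Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield reads from FASTQ text; a trailing incomplete record is ignored."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield Read(header[1:], seq, qual)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (offset 33 is assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Toy input (invented): two short reads.
toy_fastq = "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!I\n"
toy_reads = list(parse_fastq(toy_fastq))
toy_means = [mean_phred(r.quality) for r in toy_reads]
```

Under the assumed offset, "I" corresponds to a Phred score of 40 and "!" to 0, so per-read means give a quick view of read quality before alignment.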

Data set 2: DNA sequence
• 32 HF cows: paternal half-sibs
• whole genome DNA sequence: Illumina HiSeq
• reference genome: UMD3.1
• alignment: BWA, Smalt
• variant calling: FreeBayes, GATK, Samtools, CNVnator

Data set 2: genomic variability
Describe genetic variability on the DNA level: the basis for complex trait modelling

Data set 2: averaged coverage
Genome averaged coverage for each cow (bar chart; x-axis: Cow ID, 1 to 32; y-axis: coverage)
• min: 5
• max: 17

Data set 2: coverage along the genome
Chromosomewise coverage for a particular cow (mean coverage ȳ per chromosome)
• BTA01: ȳ = 8.56
• BTA10, BTA20: ȳ = 8.03, 8.14 (order as extracted)
• BTX: ȳ = 8.60

Data set 2: SNPs
Total number of identified SNPs (bar chart; x-axis: Cow ID, 1 to 32; y-axis: # SNP, 0 to 7 000 000)
• min: 2 063 811 (0.08% of genome)
• max: 6 117 976 (0.23% of genome)
• sd: 663 223
• sd without cow 32: 216 861
• χ² P < 10^-4
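The "% of genome" figures can be checked with simple arithmetic: SNP count divided by assembly length. The sketch below assumes an assembly length of roughly 2.67 Gb for UMD3.1; that figure and the function name are assumptions, since the slide does not state the denominator it used.

```python
# Assumed approximate length of the UMD3.1 assembly in base pairs.
ASSUMED_GENOME_LENGTH_BP = 2.67e9


def percent_of_genome(n_snps: int, genome_length_bp: float = ASSUMED_GENOME_LENGTH_BP) -> float:
    """SNP positions as a percentage of all positions in the assembly."""
    return 100.0 * n_snps / genome_length_bp


pct_min = percent_of_genome(2_063_811)  # cow with the fewest SNPs
pct_max = percent_of_genome(6_117_976)  # cow with the most SNPs
```

Rounded to two decimals, the two values agree with the 0.08% and 0.23% reported on the slide under this assumed genome length.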

Data set 2: SNPs
Total number of identified SNPs per chromosome (bar chart; x-axis: BTA; y-axis: total number of SNPs; inset panels: % of SNPs with 3 alleles and % of SNPs with 4 alleles, by BTA)
• 15 272 427 SNPs
• 99.16% biallelic

Data set 2: SNPs
Missense SNPs (two bar charts: number of missense SNPs and missense SNP density, for three gene categories: HK = Housekeeping, SS = Strong Selection, NS = Neutral to Selection)

Data set 2: SNPs
• Housekeeping: beta Actin; Beta-2-microglobulin; Glyceraldehyde-3-phosphate; Hydroxymethylbilane synthase; beta Heat shock 90kDa protein 1; Ubiquitin C
• Strong Selection: diacylglycerol O-acyltransferase 1; alpha 6 integrin; ADP-ribosylation factor-like 4A; bone morphogenetic protein 4; myeloid differentiation primary response
• Neutral to Selection: URI1 prefoldin-like chaperone; low density lipoprotein receptor-related protein; ATP/GTP binding protein 1; ankyrin repeat domain 32; spectrin repeat containing, nuclear envelope 2

Data set 2: SNPs
Missense SNPs
• ANOVA: SNP density = category + gene(category)
  category: P = 0.230; gene(category): P = 0.008
• ANOVA: #SNP = category + gene(category)
  category: P < 10^-4; gene(category): P < 10^-4
• Groups: Housekeeping & Strong Selection; Neutral to Selection
