
Treelet Covariance Smoothers: Estimation of Genetic Parameters



  1. Treelet Covariance Smoothers: Estimation of Genetic Parameters. Benjamin Draves, Department of Mathematics, Lafayette College. Advisor: T. Gaugler. Lafayette College, 2017.

  2. Overview: (1) Motivation in Statistical Genetics; (2) Treelets; (3) Treelet Covariance Smoothers; (4) Simulation Studies; (5) Health, Aging and Body Composition Study; (6) Conclusion.

  3. Motivation in Statistical Genetics: Molecular Biology Review. Each person's genetic composition is encoded on chromosomes. Most humans have 46 chromosomes in total, occurring in 23 pairs, and the 23rd pair determines sex. We can therefore compare the genetic data encoded by the first 22 pairs (the autosomes) across all humans, and look for patterns linking these genetic data to realized traits and diseases.

  4. Motivation in Statistical Genetics: Traditional Genetic Studies. We wish to estimate the penetrance function $P(Y \mid G)$, where $Y$ is some phenotype of interest and $G$ codes the underlying genotype. This is hard to do without observing $G$. Linkage analysis studies have had considerable success understanding $G$ indirectly by analyzing $Y$ across numerous generations, but such multigenerational data are hard to obtain in human genetics. Next-generation sequencing (NGS) technology allows us to sample from $G$ directly.
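Once $G$ is observed, the penetrance function can be estimated directly. A minimal sketch, assuming a single biallelic locus coded as a minor allele count (the coding introduced on the following slides) and a binary phenotype: estimate $P(Y = 1 \mid G = g)$ by the fraction of individuals with genotype $g$ who show the phenotype. The function name `empirical_penetrance` and the toy data are hypothetical, for illustration only.

```python
from collections import Counter


def empirical_penetrance(genotypes, phenotypes):
    """Estimate P(Y = 1 | G = g) at one locus by conditional relative frequencies.

    genotypes:  genotype label per individual (here a minor allele count 0, 1 or 2)
    phenotypes: 0/1 indicator of the phenotype Y, aligned with genotypes
    """
    totals = Counter()    # number of individuals observed with each genotype
    affected = Counter()  # of those, how many show the phenotype
    for g, y in zip(genotypes, phenotypes):
        totals[g] += 1
        affected[g] += y
    return {g: affected[g] / totals[g] for g in sorted(totals)}


# Made-up toy data (hypothetical): 10 individuals at one locus
genotypes = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
phenotypes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 0]
print(empirical_penetrance(genotypes, phenotypes))
```

This plug-in estimate is only a starting point: genotype classes with few members (for example, rare minor-allele homozygotes) give very noisy estimates.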

  5. Motivation in Statistical Genetics: Single Nucleotide Polymorphisms (SNPs). So how do we encode this genetic information? Code the chromosome pairs, exploiting the complementary base pairing of DNA: A pairs with T and G pairs with C, so one strand determines the other.

  6. Motivation in Statistical Genetics: SNPs (cont.)
     SNP              Recode      Minor allele count
     (A,T) (A,T)  ⇒   α α     ⇒   2
     (G,C) (A,T)  ⇒   β α     ⇒   1
        ...             ...          ...
     (G,C) (A,T)  ⇒   β α     ⇒   1
     (G,C) (G,C)  ⇒   β β     ⇒   0
  Each row in this diagram represents a SNP. The pair, either (A,T) or (G,C), is called a polymorphism or an allele. An allele is called a minor allele if it appears less frequently in the population; here the minor allele is α = (A,T), so the count in each row is the number of α's.
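The recoding step can be sketched for a single SNP across a sample of individuals. This is a minimal sketch under two assumptions not stated in the slides: each chromosome's base pair is written as a string such as `"AT"` or `"GC"`, and the minor allele is determined from its frequency in the sample rather than in the whole population. The function name `minor_allele_counts` is hypothetical.

```python
from collections import Counter


def minor_allele_counts(genotypes):
    """Recode one biallelic SNP into minor allele counts (0, 1 or 2), one per individual.

    genotypes: one 2-tuple per individual giving the base pair carried on each
    chromosome of the pair, e.g. ("GC", "AT").
    """
    allele_freq = Counter(allele for pair in genotypes for allele in pair)
    if len(allele_freq) != 2:
        raise ValueError("expected exactly two alleles (a biallelic SNP)")
    # The minor allele is the one that appears less often in the sample
    # (ties are broken arbitrarily by min()).
    minor = min(allele_freq, key=allele_freq.get)
    return minor, [pair.count(minor) for pair in genotypes]


# Hypothetical sample of five individuals at a single SNP
sample = [("AT", "AT"), ("GC", "AT"), ("GC", "AT"), ("GC", "GC"), ("GC", "GC")]
print(minor_allele_counts(sample))  # ('AT', [2, 1, 1, 0, 0])
```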

  7. Motivation in Statistical Genetics: Minor Allele Counts as Random Variables. For each locus $k$, we can code individual $i$'s minor allele count (MAC) by $c^{(i)}_k \in \{0, 1, 2\}$. For $m$ loci, we can describe the full genotype by the minor allele count vector $c^{(i)} = (c^{(i)}_1, c^{(i)}_2, \ldots, c^{(i)}_m) \in \{0, 1, 2\}^m$. If we assume random recombination of alleles, then $c^{(i)}_k \sim \mathrm{Binom}(2, p_k)$, where $p_k$ is the minor allele frequency at locus $k$. This is a strong assumption, but this framework allows for simple model construction.
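The binomial model implies Hardy-Weinberg proportions: $P(c^{(i)}_k = 0, 1, 2) = (1-p_k)^2,\ 2p_k(1-p_k),\ p_k^2$, so $E[c^{(i)}_k] = 2p_k$ and $\mathrm{Var}(c^{(i)}_k) = 2p_k(1-p_k)$. These two moments are what the scaling on the next slide uses. A minimal sketch that computes them from the pmf; the names `mac_pmf` and `mac_moments` and the value of `p` are hypothetical.

```python
from math import comb


def mac_pmf(p):
    """[P(c=0), P(c=1), P(c=2)] for a minor allele count c ~ Binom(2, p)."""
    return [comb(2, c) * p**c * (1 - p) ** (2 - c) for c in range(3)]


def mac_moments(p):
    """Mean and variance of c computed directly from the pmf."""
    pmf = mac_pmf(p)
    mean = sum(c * q for c, q in enumerate(pmf))
    var = sum((c - mean) ** 2 * q for c, q in enumerate(pmf))
    return mean, var


p = 0.1  # hypothetical minor allele frequency
mean, var = mac_moments(p)
print(mac_pmf(p), mean, var)  # mean equals 2p, variance equals 2p(1 - p)
```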

  8. Motivation in Statistical Genetics: Scaled Minor Allele Counts. Under the assumption that alleles are independent, we can center and scale our count vector. Let $z^{(i)}_k := (c^{(i)}_k - 2p_k) / (2p_k(1-p_k))^{1/2}$ be the scaled minor allele count at locus $k$; since $c^{(i)}_k$ has mean $2p_k$ and variance $2p_k(1-p_k)$ under the binomial model, $z^{(i)}_k$ has mean 0 and variance 1. Then for each SNP $k$, we define the scaled minor allele count (SMAC) vector by $z^*_k = (z^{(1)}_k, z^{(2)}_k, \ldots, z^{(n)}_k)^t$, where $n$ is the number of individuals in the sample. For a sample of $m$ genetic markers, we organize these data as $Z = (z^*_1, z^*_2, \ldots, z^*_m) \in \mathbb{R}^{n \times m}$.
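A minimal sketch of building $Z$ from a matrix of minor allele counts. The slides treat $p_k$ as known; as an assumption, when no frequencies are supplied this sketch estimates $p_k$ as the sample mean count at locus $k$ divided by 2. Loci with $p_k \in \{0, 1\}$ (monomorphic in the sample) cannot be scaled and must be dropped first. The names `scale_counts` and `build_Z` are hypothetical.

```python
from math import sqrt


def scale_counts(counts, p):
    """z = (c - 2p) / sqrt(2p(1 - p)) for each minor allele count c at one locus."""
    if not 0 < p < 1:
        raise ValueError("monomorphic locus: p must lie strictly between 0 and 1")
    sd = sqrt(2 * p * (1 - p))
    return [(c - 2 * p) / sd for c in counts]


def build_Z(C, freqs=None):
    """Build the n x m matrix Z (list of rows) from an n x m matrix C of minor allele counts.

    freqs: known minor allele frequencies p_k. If None, each p_k is estimated as the
    sample mean count at locus k divided by 2 (an assumption, see lead-in).
    """
    n, m = len(C), len(C[0])
    columns = []  # columns[k] is the SMAC vector z*_k
    for k in range(m):
        col = [C[i][k] for i in range(n)]
        p = freqs[k] if freqs is not None else sum(col) / (2 * n)
        columns.append(scale_counts(col, p))
    # Transpose so that row i holds individual i's scaled counts
    return [list(row) for row in zip(*columns)]


# Hypothetical counts for n = 3 individuals at m = 2 SNPs
C = [[0, 2],
     [1, 1],
     [2, 0]]
Z = build_Z(C)
```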

  9. Motivation in Statistical Genetics: Did everyone get that?
  $$Z = \begin{pmatrix} z^{(1)}_1 & z^{(1)}_2 & \cdots & z^{(1)}_m \\ z^{(2)}_1 & z^{(2)}_2 & \cdots & z^{(2)}_m \\ \vdots & \vdots & \ddots & \vdots \\ z^{(n)}_1 & z^{(n)}_2 & \cdots & z^{(n)}_m \end{pmatrix} = \begin{pmatrix} z^*_1 & z^*_2 & \cdots & z^*_m \end{pmatrix}$$
  Each row corresponds to an individual (row $n$ is individual $n$) and each column to a SNP (column 2, $z^*_2$, is SNP 2).

  10. Motivation in Statistical Genetics: Genetic Parameters of Interest. Additive genetic relatedness ($A$): denoted $A_{ij}$ for the relatedness between individuals $i$ and $j$, it captures the additive covariance between their genetic markers. I'll refer to this as relatedness. Narrow-sense heritability ($h^2$): it incorporates a small, independent contribution from each of the $m$ genetic markers and does not try to model the joint distribution of the alleles. Traditional studies implicitly use this joint distribution to infer broad-sense heritability. I'll refer to this as heritability.

  11. Motivation in Statistical Genetics: Estimating Relatedness. We consider alleles identical by descent (IBD). Relatedness is the expected proportion of alleles IBD between two individuals. Under this interpretation of $A$, at SNP $k$, $A_{ij} = \mathrm{Cov}(z^{(i)}_k, z^{(j)}_k)$. Since each $z^{(i)}_k$ has mean 0, this covariance equals $E[z^{(i)}_k z^{(j)}_k]$, so averaging the products over SNPs gives the method-of-moments estimate
  $$\hat{A} = \frac{1}{m} \sum_{k=1}^{m} z^*_k (z^*_k)^t = \frac{ZZ^t}{m}.$$
  As $m$ increases, we expect $ZZ^t/m \to A$.
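A minimal sketch of the estimator $\hat{A} = ZZ^t/m$, checked on simulated unrelated individuals, for whom $A$ is the identity. The simulation assumptions are ours, not the slides': independent SNPs, minor allele frequencies drawn uniformly from $[0.05, 0.5]$, and $p_k$ treated as known. The names `relatedness_mom` and `simulate_unrelated_Z` are hypothetical.

```python
import random
from math import sqrt


def relatedness_mom(Z):
    """Method-of-moments estimate A_hat = Z Z^t / m for an n x m matrix Z (list of rows)."""
    n, m = len(Z), len(Z[0])
    return [[sum(Z[i][k] * Z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def simulate_unrelated_Z(n, m, rng):
    """Scaled counts for n unrelated individuals at m independent SNPs (known p_k)."""
    freqs = [rng.uniform(0.05, 0.5) for _ in range(m)]
    # Each count is the sum of two Bernoulli(p) draws, i.e. Binom(2, p)
    return [[((rng.random() < p) + (rng.random() < p) - 2 * p) / sqrt(2 * p * (1 - p))
             for p in freqs]
            for _ in range(n)]


rng = random.Random(1)
Z = simulate_unrelated_Z(n=5, m=5000, rng=rng)
A_hat = relatedness_mom(Z)
# For unrelated individuals we expect A_hat close to the identity: diagonal near 1,
# off-diagonal entries near 0 with spread on the order of 1/sqrt(m).
```

Pure-Python loops keep the sketch dependency-free; with real data one would use a matrix library, since forming $ZZ^t$ costs $O(n^2 m)$ operations.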

  12. Motivation in Statistical Genetics: Estimating Heritability. Phenotype Model (1): $y = X\beta + Zu + \epsilon$, with $\mathrm{Var}(y) = ZZ^t\sigma^2_u + I\sigma^2_\epsilon$, where $y$ is the vector of phenotypes, $X\beta$ the fixed effects, $u$ the vector of random effects of the causal SNPs with $\mathrm{Var}(u) = I\sigma^2_u$, and $\epsilon \sim N(0, I\sigma^2_\epsilon)$ the residual errors. But remember, we want to understand the ratio of genetic variance to total variance. Let $u = (u_1, u_2, \ldots, u_J)^t \in \mathbb{R}^J$ be the vector of effects corresponding to the $J$ causal SNPs (so here $Z$ holds the scaled counts at those $J$ SNPs), and let $\sigma^2_g = J\sigma^2_u$ be the variance explained by all the SNPs. We can then write the genetic effect of individual $i$ as $g_i = \sum_{j=1}^{J} z^{(i)}_j u_j$, where $\mathrm{Var}(g) = A\sigma^2_g$.
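The step from $\sigma^2_g = J\sigma^2_u$ to $\mathrm{Var}(g) = A\sigma^2_g$ can be spelled out. Treating $Z$ (the $n \times J$ matrix of scaled counts at the causal SNPs) as fixed and writing $g = Zu$:

```latex
\begin{aligned}
\operatorname{Var}(g) &= \operatorname{Var}(Zu) = Z\,\operatorname{Var}(u)\,Z^{t}
   = Z\,(I\sigma_u^2)\,Z^{t} = \sigma_u^2\, ZZ^{t} \\
 &= \frac{ZZ^{t}}{J}\,\bigl(J\sigma_u^2\bigr)
   = \frac{ZZ^{t}}{J}\,\sigma_g^2 \;\approx\; A\,\sigma_g^2 .
\end{aligned}
```

The last step is our reading of the slide: it identifies $ZZ^t/J$, the method-of-moments relatedness estimate computed over the causal SNPs, with $A$ itself. Because $g$ and $\epsilon$ are independent and $X\beta$ is fixed, adding $I\sigma^2_\epsilon$ then gives $\mathrm{Var}(y) = A\sigma^2_g + I\sigma^2_\epsilon$, the form used in Phenotype Model (2).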

  13. Motivation in Statistical Genetics: Estimating Heritability (cont.). Phenotype Model (2): $y = X\beta + g + \epsilon$, with $\mathrm{Var}(y) = A\sigma^2_g + I\sigma^2_\epsilon$. We can partition the variability of phenotypic expression into genetic ($\sigma^2_g$) and environmental ($\sigma^2_\epsilon$) factors. From here we define narrow-sense heritability as
  $$h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_\epsilon}.$$
  We can estimate this value via restricted maximum likelihood (REML) algorithms.
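A minimal simulation sketch of Phenotype Model (2) that illustrates the partition of phenotypic variance. Assumptions (not from the slides): $X\beta = 0$, unrelated individuals (so $A \approx I$), independent causal SNPs with known frequencies, and $u_j \sim N(0, \sigma^2_u)$ with $\sigma^2_g = J\sigma^2_u$. The variance ratio it prints is a crude check, not the REML estimator the slide refers to. The names `heritability` and `simulate_model2` are hypothetical.

```python
import random
from math import sqrt
from statistics import pvariance


def heritability(sigma2_g, sigma2_e):
    """Narrow-sense heritability h^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2)."""
    return sigma2_g / (sigma2_g + sigma2_e)


def simulate_model2(n, J, sigma2_g, sigma2_e, seed=0):
    """Simulate g and y = g + eps for n unrelated individuals, taking X beta = 0."""
    rng = random.Random(seed)
    sigma_u = sqrt(sigma2_g / J)  # so that sigma_g^2 = J * sigma_u^2
    freqs = [rng.uniform(0.05, 0.5) for _ in range(J)]
    u = [rng.gauss(0, sigma_u) for _ in range(J)]
    g, y = [], []
    for _ in range(n):
        # Scaled minor allele counts for one individual at the J causal SNPs
        z = [((rng.random() < p) + (rng.random() < p) - 2 * p) / sqrt(2 * p * (1 - p))
             for p in freqs]
        g_i = sum(z_j * u_j for z_j, u_j in zip(z, u))  # g_i = sum_j z_j^(i) u_j
        g.append(g_i)
        y.append(g_i + rng.gauss(0, sqrt(sigma2_e)))
    return g, y


g, y = simulate_model2(n=1000, J=300, sigma2_g=0.5, sigma2_e=0.5)
print(heritability(0.5, 0.5))       # 0.5, the true h^2 in this simulation
print(pvariance(g) / pvariance(y))  # crude variance ratio, NOT a REML estimate
```

In real data $g$ is not observed, so this ratio cannot be computed; REML instead fits $\sigma^2_g$ and $\sigma^2_\epsilon$ from $y$ and $A$ through the covariance structure $\mathrm{Var}(y) = A\sigma^2_g + I\sigma^2_\epsilon$.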

  14. Motivation in Statistical Genetics: Possible Problems. Assume we have three random individuals, who happen to be named Ben, Josh, and Trent. Trent and Ben, coming from small Midwest towns, are 7th-degree relatives. Josh, from the West Coast, is unrelated to Trent and Ben. Their minor allele counts:
     Ben:    0  2  1  ···  2
     Trent:  1  2  0  ···  0
     Josh:   1  2  1  ···  1
  $\hat{A}(\text{Ben}, \text{Trent}) = 1/130$ and $\hat{A}(\text{Ben}, \text{Josh}) = 1/130$.
  How do we differentiate between distantly related and unrelated individuals?
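A quick simulation makes this question concrete. Under the assumptions listed in the code (independent SNPs, known frequencies, no shared ancestry), the method-of-moments estimate for a truly unrelated pair averages $m$ independent products with mean 0 and variance 1, so its standard deviation is about $1/\sqrt{m}$. With a hypothetical $m = 1000$ that is roughly 0.03, about four times the relatedness $2^{-7} = 1/128 \approx 1/130$ expected for 7th-degree relatives (using the standard coefficient of relationship $2^{-d}$ for $d$th-degree relatives). The function name `unrelated_pair_estimates` is hypothetical.

```python
import random
from math import sqrt
from statistics import pstdev


def unrelated_pair_estimates(num_pairs, m, seed=0):
    """Method-of-moments relatedness estimates A_hat_ij for pairs of unrelated individuals.

    Assumptions: m independent SNPs, known minor allele frequencies drawn from
    [0.05, 0.5], and no shared ancestry, so the true A_ij is 0 for every pair.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.5) for _ in range(m)]
    scales = [(2 * p, sqrt(2 * p * (1 - p))) for p in freqs]

    def scaled_genotype():
        # z_k = (c_k - 2 p_k) / sqrt(2 p_k (1 - p_k)) with c_k ~ Binom(2, p_k)
        return [((rng.random() < p) + (rng.random() < p) - mu) / sd
                for p, (mu, sd) in zip(freqs, scales)]

    estimates = []
    for _ in range(num_pairs):
        zi, zj = scaled_genotype(), scaled_genotype()
        estimates.append(sum(a * b for a, b in zip(zi, zj)) / m)
    return estimates


m = 1000
est = unrelated_pair_estimates(num_pairs=200, m=m)
seventh_degree = 2 ** -7  # expected relatedness of 7th-degree relatives, about 1/130
print(pstdev(est), 1 / sqrt(m), seventh_degree)
```

Because the noise shrinks only like $1/\sqrt{m}$, separating a relatedness of about 1/130 from 0 for a single pair requires on the order of tens of thousands of independent SNPs or more.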
