
ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES



  1. GIW 2016: ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES. Taesung Park¹, Sohee Oh¹, Iksoo Huh¹, and Seung-Yeoun Lee² (¹Department of Statistics, Seoul National University, South Korea; ²Department of Applied Statistics, Sejong University, South Korea)

  2. Contents — 1. Introduction — 2. Multivariate analysis — 3. Application: Korean Association REsource (KARE) Project — 4. Simulation study — 5. Conclusion

  3. Genome-Wide Association Studies (GWAS) — Studies of genetic variation across the entire genome — Single Nucleotide Polymorphism (SNP): a DNA sequence variation that occurs when a single nucleotide is altered — Designed to identify associations between genetic markers and observable traits, or the presence/absence of a disease or condition — Rely on SNP chip (genotyping array) technologies

  4. Genome-Wide Association Studies (GWAS) — Successful for complex traits and diseases: height, body mass index, and blood pressure; asthma, cancer, diabetes, heart disease, and mental illnesses

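  5. Association test — Univariate, single-SNP analysis — Focus on one trait and a single SNP: $y_i = \beta_0 + \beta_1 \mathrm{Sex}_i + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{SNP}_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$ [Figure: one trait (Trait 1) tested against each of ~1M SNPs (SNP 1, SNP 2, …), one SNP at a time]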
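A minimal Python sketch of the single-SNP test above, assuming an additive 0/1/2 genotype coding, Sex coded 0/1, and simulated (hypothetical) data. The model is fitted by ordinary least squares and the SNP coefficient β3 is tested with a Wald statistic; the p-value uses a normal approximation to the t distribution, which is reasonable only for large n. The function name `single_snp_test` is illustrative.

```python
import math
import numpy as np

def single_snp_test(y, sex, age, snp):
    """Fit y = b0 + b1*Sex + b2*Age + b3*SNP + e by OLS; return (b3_hat, se, p)."""
    n = len(y)
    X = np.column_stack([np.ones(n), sex, age, snp])   # design matrix
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])          # estimate of sigma^2
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
    z = beta[3] / se
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal p-value
    return beta[3], se, p

# Hypothetical simulated data: additive SNP coded 0/1/2
rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n)
age = rng.normal(50, 8, n)
snp = rng.binomial(2, 0.3, n)
y = 1.0 + 0.5 * sex + 0.02 * age + 0.3 * snp + rng.normal(0, 1, n)
b3, se, p = single_snp_test(y, sex, age, snp)
```

In a genome-wide scan this fit is repeated once per SNP, each time with the same covariates.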

  6. Improving power — Common complex traits are associated with many genes — It is not easy to identify genetic variants that reach significance at α = 5 × 10^-8 — Further, these variants explain only a small fraction of disease etiology — A more powerful method for identifying genetic variants is needed — Meta-analysis, by increasing the sample size — Multiple-SNP analysis: gene-gene interaction — Joint analysis of correlated phenotypes
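The threshold α = 5 × 10^-8 is conventionally motivated as a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent common-variant tests; the slide does not give this derivation, so treat it as background:

```latex
\alpha_{\text{genome-wide}} = \frac{\alpha_{\text{FWER}}}{M_{\text{tests}}}
  = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```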

  7. Association test — Univariate, multiple-SNP analysis — Focus on one trait and multiple SNPs: accumulated additive effects of multiple SNPs, and SNP-SNP interaction [Figure: one trait (Trait 1) analyzed jointly with groups of SNPs (SNP 1, SNP 2, …) drawn from ~500K SNPs]
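A sketch of the multiple-SNP idea for two SNPs, assuming additive coding and a product term for SNP-SNP interaction (a common parameterization, not necessarily the one the authors used). The helper `wald_test` and the simulated effect sizes are hypothetical.

```python
import math
import numpy as np

def wald_test(X, y, j):
    """OLS fit of y on X; return (coefficient j, two-sided normal-approximation p-value)."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[j, j])
    return beta[j], math.erfc(abs(beta[j] / se) / math.sqrt(2))

rng = np.random.default_rng(1)
n = 4000
g1 = rng.binomial(2, 0.4, n)   # hypothetical SNP 1, additive 0/1/2 coding
g2 = rng.binomial(2, 0.3, n)   # hypothetical SNP 2
y = 0.2 * g1 + 0.2 * g2 + 0.3 * g1 * g2 + rng.normal(0, 1, n)

# Additive effects of both SNPs plus a product term for their interaction
X = np.column_stack([np.ones(n), g1, g2, g1 * g2])
b_int, p_int = wald_test(X, y, 3)
```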

  8. Multivariate approach — Multivariate analysis — Focus on multiple related traits and a single SNP [Figure: five related traits (Trait 1 to Trait 5) tested jointly against each of ~1M SNPs (SNP 1, SNP 2, …), one SNP at a time]

  9. Multivariate approach — Examples of multiple related phenotypes — Obesity: BMI, waist circumference, weight, WHR, body fat — Hyperlipidemia: total cholesterol, HDL/LDL cholesterol, triglycerides — Metabolic syndrome: waist circumference, triglycerides, HDL cholesterol, blood pressure (SBP, DBP), insulin resistance

  10. Multivariate approach — Existing methods — MultiPhen (O'Reilly et al., 2012): proportional odds model — Efficient algorithm for GWAS (Zhou and Stephens, 2014): linear mixed model

  11. Contents — 1. Introduction — 2. Multivariate analysis — 3. Application: Korean Association REsource (KARE) Project — 4. Simulation study — 5. Conclusion

  12. Multivariate approach — Identify genetic variants associated with multiple related traits — Extension of the univariate linear model to a multivariate linear model with a response vector — Univariate variances are replaced by a covariance matrix — Joint analysis — Analyze several traits simultaneously — Account for the correlation structure of the multiple traits in the model — Different-slopes (SNP effects) model: allows a different SNP effect for each trait — Different association directions => heterogeneous model — Common-slope (SNP effect) model — Same association direction with similar effect sizes => homogeneous model

  13. Multivariate general linear model (1) — Let $y_{ij}$ denote the value of trait $j$ for subject $i$, for $i = 1, \dots, n$ and $j = 1, \dots, m$ — The linear model for trait $j$: $y_{ij} = \sum_{k=1}^{p} x_{ik}\beta_{kj} + \varepsilon_{ij} = x_i^T \beta_j + \varepsilon_{ij}$ — $x_i = (x_{i1}, \dots, x_{ip})^T$ is a vector of SNPs and covariates — $\beta_j = (\beta_{1j}, \dots, \beta_{pj})^T$ is a vector of $p$ unknown parameters — $\beta_{kj}$ represents the effect of the $k$th SNP on trait $j$ — This model allows one SNP to have different effects on different traits

  14. The multivariate general linear model (2) — $y_i = (y_{i1}, \dots, y_{im})^T$ is a vector of $m$ responses from the $i$th subject — $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{im})^T$ is a vector of $m$ residuals for the $i$th subject — $\varepsilon_i \sim N_m(0, \Sigma_{m \times m})$ — The $nm \times 1$ vector $\varepsilon = (\varepsilon_1^T, \dots, \varepsilon_n^T)^T \sim N_{nm}(0, I_n \otimes \Sigma)$, where $I_n$ denotes the $n \times n$ identity matrix and $\otimes$ is the direct (Kronecker) product
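The Kronecker form says that subjects are independent while the m traits within a subject share Σ. A small numerical check with a hypothetical 3 × 3 Σ: I_n ⊗ Σ is block diagonal with copies of Σ on the diagonal, and residual rows drawn as L z (L the Cholesky factor of Σ, z standard normal) have covariance Σ.

```python
import numpy as np

m, n = 3, 4
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])   # hypothetical m x m trait covariance

# Covariance of the stacked nm x 1 vector (eps_1', ..., eps_n')'
V = np.kron(np.eye(n), Sigma)
# Independent subjects: diagonal blocks equal Sigma, off-diagonal blocks are zero
assert np.allclose(V[:m, :m], Sigma) and np.allclose(V[:m, m:2 * m], 0.0)

# Simulating residuals: each row eps_i ~ N_m(0, Sigma)
rng = np.random.default_rng(2)
L = np.linalg.cholesky(Sigma)                  # Sigma = L L'
E = rng.standard_normal((5000, m)) @ L.T       # row i is L z_i
Sigma_hat = np.cov(E, rowvar=False)            # should be close to Sigma
```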

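  15. Multivariate general linear model (3) — Covariance (correlation) structure: the $m \times m$ matrix $\Sigma$ — Specifies how the traits within a subject are related — Unstructured (UN): $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$ — Structured covariance — Compound symmetry (CS): $\Sigma = \begin{pmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{pmatrix}$ — First-order autoregressive (AR(1)): $\Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{m-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{m-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1}\sigma^2 & \rho^{m-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$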
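The three structures can be generated as follows. This is a sketch assuming the parameterizations displayed above (CS: σ² + σ1² on the diagonal and σ1² off it; AR(1): σ²ρ^|j−k|); the function names are illustrative.

```python
import numpy as np

def cs_cov(m, sigma2, sigma1_2):
    """Compound symmetry: sigma^2 + sigma_1^2 on the diagonal, sigma_1^2 elsewhere."""
    return sigma2 * np.eye(m) + sigma1_2 * np.ones((m, m))

def ar1_cov(m, sigma2, rho):
    """First-order autoregressive: entry (j, k) is sigma^2 * rho^|j - k|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def n_cov_params(m, structure):
    """Number of free covariance parameters under each structure."""
    return {"UN": m * (m + 1) // 2, "CS": 2, "AR(1)": 2}[structure]

# Example for the four obesity traits (m = 4), with made-up parameter values
Sigma_cs = cs_cov(4, 1.0, 0.5)
Sigma_ar = ar1_cov(4, 2.0, 0.5)
```

UN is the most flexible choice but needs m(m+1)/2 parameters (10 for four traits), whereas CS and AR(1) need two each; AR(1) assumes an ordering of the traits, which related phenotypes such as BMI and WHR do not naturally have.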

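  16. The multivariate general linear model (4) — Matrix formulation: $Y = XB + E$, where — $Y = \begin{pmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{pmatrix} = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}$ is the $n \times m$ data matrix — $X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}$ is the $n \times p$ known design matrix — $B = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1m} \\ \vdots & \ddots & \vdots \\ \beta_{p1} & \cdots & \beta_{pm} \end{pmatrix} = (\beta_1, \dots, \beta_m)$ is the $p \times m$ parameter matrix — $E = \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1m} \\ \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \cdots & \varepsilon_{nm} \end{pmatrix} = \begin{pmatrix} \varepsilon_1^T \\ \vdots \\ \varepsilon_n^T \end{pmatrix}$ is the $n \times m$ matrix of random errors — $E(Y) = XB$ and $\mathrm{Var}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = I_n \otimes \Sigma$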
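Because every trait shares the same design matrix X, the least-squares estimate is B̂ = (X'X)^{-1}X'Y, and Σ can be estimated from the residual matrix as Ê'Ê/(n − p). A minimal sketch on simulated data; the effect sizes, Σ, and the helper name `fit_mglm` are hypothetical.

```python
import numpy as np

def fit_mglm(X, Y):
    """Least-squares fit of Y = XB + E; returns B_hat (p x m) and Sigma_hat (m x m)."""
    n, p = X.shape
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^{-1} X'Y
    E_hat = Y - X @ B_hat                       # residual matrix
    Sigma_hat = E_hat.T @ E_hat / (n - p)       # unbiased estimate of Sigma
    return B_hat, Sigma_hat

rng = np.random.default_rng(3)
n, m = 4000, 3
X = np.column_stack([np.ones(n), rng.binomial(2, 0.3, n)])  # intercept + hypothetical SNP
B = np.array([[1.0, 2.0, 0.5],
              [0.2, 0.0, -0.2]])                # SNP effect differs by trait
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
E = rng.standard_normal((n, m)) @ np.linalg.cholesky(Sigma).T
Y = X @ B + E
B_hat, Sigma_hat = fit_mglm(X, Y)
```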

  17. The multivariate general linear model (5) — Considers related phenotypes simultaneously — Allows for correlation between phenotypes in the model — Can detect genetic variants that have only modest effects in the univariate approach — Provides a chance to capture pleiotropic genes — Model — Heterogeneous model with separate slopes (different genetic effects on each phenotype) — Homogeneous model with a common slope (same genetic effect on all phenotypes) — Unstructured variance-covariance structure — Test statistic — Wilks' Λ statistic: $\Lambda = \frac{|E|}{|H + E|} = \prod_{i=1}^{k} \frac{1}{1 + \lambda_i}$, where $E$ and $H$ are the error and hypothesis sums-of-squares-and-cross-products (SSCP) matrices and $\lambda_1, \dots, \lambda_k$ are the eigenvalues of $E^{-1}H$
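A sketch of Wilks' Λ for a single SNP, assuming H is formed as the difference between the residual SSCP matrices of the model without and with the SNP (the usual construction for a nested linear hypothesis). It also checks numerically that the determinant ratio equals the eigenvalue product. For one tested parameter, F = ((1 − Λ)/Λ)(ν_e − m + 1)/m, with ν_e = n − p error degrees of freedom, follows an F(m, ν_e − m + 1) distribution under the null; the p-value step is left out to keep the sketch dependency-free. Data and function names are hypothetical.

```python
import numpy as np

def residual_sscp(X, Y):
    """Residual sums of squares and cross-products of Y regressed on X."""
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    R = Y - X @ B_hat
    return R.T @ R

def wilks_test(X0, snp, Y):
    """Wilks' Lambda (two ways) and its exact F transform for adding one SNP column."""
    n, m = Y.shape
    X1 = np.column_stack([X0, snp])
    E = residual_sscp(X1, Y)                      # error SSCP, full model
    H = residual_sscp(X0, Y) - E                  # hypothesis SSCP for the SNP
    lam_det = np.linalg.det(E) / np.linalg.det(H + E)
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real   # eigenvalues of E^{-1} H
    lam_eig = float(np.prod(1.0 / (1.0 + eig)))
    nu_e = n - X1.shape[1]                        # error degrees of freedom
    F = (1.0 - lam_det) / lam_det * (nu_e - m + 1) / m
    return lam_det, lam_eig, F

rng = np.random.default_rng(4)
n, m = 2000, 4
X0 = np.column_stack([np.ones(n), rng.normal(50, 8, n)])   # intercept + hypothetical covariate
snp = rng.binomial(2, 0.3, n).astype(float)
Y = rng.standard_normal((n, m)) + np.outer(snp, [0.15, 0.10, 0.0, -0.10])
lam_det, lam_eig, F = wilks_test(X0, snp, Y)
```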

  18. Contents — 1. Introduction — 2. Multivariate analysis — 3. Application: Korean Association REsource (KARE) Project — 4. Simulation study — 5. Conclusion

  19. Korea Association REsource (KARE) Project • Objective: to identify genetic factors of quantitative clinical traits and lifestyle-related diseases (e.g., T2DM) from a genome-wide association study using population-based cohorts • Over 10,000 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts) • Genotyping: Affymetrix 5.0 • First high-density, large-scale GWA study performed in an East Asian population. Courtesy of KNIH

  20. KARE: Baseline characteristics (courtesy of KNIH)
      Characteristic       Ansung        Ansan
      Participants         5,018         5,020
      Sex (women/men)      2,778/2,240   2,497/2,523
      Age (mean, years)    55.5          49.1
      Age 40s (%)          31.2          62.8
      Age 50s (%)          29.1          23.0
      Age 60+ (%)          39.6          14.3

  21. KARE data — Data description — 8,842 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts) — Filtering thresholds (SNPs excluded if) — HWE p-value < 10^-6 — MAF < 0.01 — Missing genotype proportion > 0.05 — Imputation of missing genotypes: HapMap JPT/CHB reference panel — SNPs analyzed: 327,872

  22. Obesity — Obesity-related phenotypes: BMI, waist circumference, weight, and WHR — BMI = Weight (kg) / Height (m)^2 — WHR = Waist / Hip circumference — Which genes are associated with obesity-related phenotypes?
      Correlations among the phenotypes:
               BMI      Waist    Weight   WHR
      BMI      1
      Waist    0.7607   1
      Weight   0.7308   0.6862   1
      WHR      0.3819   0.7971   0.2920   1
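The derived phenotypes and their correlation matrix can be computed as below; the four subjects' measurements are made-up illustrative values, not KARE data, and `obesity_traits` is a hypothetical helper.

```python
import numpy as np

def obesity_traits(weight_kg, height_m, waist_cm, hip_cm):
    """Return columns BMI, Waist, Weight, WHR from raw measurements."""
    bmi = weight_kg / height_m ** 2     # BMI = weight / height^2
    whr = waist_cm / hip_cm             # WHR = waist / hip circumference
    return np.column_stack([bmi, waist_cm, weight_kg, whr])

# Hypothetical measurements for four subjects
T = obesity_traits(np.array([60.0, 72.0, 85.0, 95.0]),
                   np.array([1.60, 1.70, 1.75, 1.80]),
                   np.array([72.0, 84.0, 92.0, 101.0]),
                   np.array([95.0, 98.0, 100.0, 104.0]))
corr = np.corrcoef(T, rowvar=False)     # 4 x 4 correlation matrix (BMI, Waist, Weight, WHR)
```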

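  23. Obesity: Univariate analysis — Most GWAS are conducted under this framework — Focus on one phenotype and a single SNP — Obesity-related phenotypes — Separate univariate analyses — BMI: $Y_1 = \beta_{01} + \beta_{11}\mathrm{Sex} + \beta_{21}\mathrm{Age} + \beta_{31}\mathrm{Area} + \beta_{41}\mathrm{SNP} + \varepsilon_1$ — Waist: $Y_2 = \beta_{02} + \beta_{12}\mathrm{Sex} + \beta_{22}\mathrm{Age} + \beta_{32}\mathrm{Area} + \beta_{42}\mathrm{SNP} + \varepsilon_2$ — Weight: $Y_3 = \beta_{03} + \beta_{13}\mathrm{Sex} + \beta_{23}\mathrm{Age} + \beta_{33}\mathrm{Area} + \beta_{43}\mathrm{SNP} + \varepsilon_3$ — WHR: $Y_4 = \beta_{04} + \beta_{14}\mathrm{Sex} + \beta_{24}\mathrm{Age} + \beta_{34}\mathrm{Area} + \beta_{44}\mathrm{SNP} + \varepsilon_4$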
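One property worth noting: with a shared design matrix (Sex, Age, Area, SNP), fitting the four traits separately gives exactly the same coefficient estimates as one multivariate least-squares fit. What the multivariate model adds is the joint test that uses the trait covariance Σ. A sketch with simulated placeholder traits; coding Area as a 0/1 indicator is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1500
sex = rng.integers(0, 2, n)
age = rng.normal(50, 8, n)
area = rng.integers(0, 2, n)          # hypothetical 0/1 area indicator
snp = rng.binomial(2, 0.3, n)
X = np.column_stack([np.ones(n), sex, age, area, snp])
Y = rng.standard_normal((n, 4))       # placeholder columns for BMI, Waist, Weight, WHR

# Four separate univariate OLS fits, one per trait
B_sep = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(4)])
# One multivariate least-squares fit of all four traits at once
B_joint = np.linalg.lstsq(X, Y, rcond=None)[0]
```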

  24. Obesity: Univariate analysis results — Number of significant genetic variants in each p-value range
      Trait     p ≤ 10^-7   10^-7 < p ≤ 10^-6   10^-6 < p ≤ 10^-5   10^-5 < p ≤ 10^-4
      BMI       1           0                   6                   23
      Waist     0           0                   7                   39
      Weight    0           3                   5                   32
      WHR       0           4                   7                   25

  25. [Figure: univariate association results, panels for BMI, Waist, Weight, and WHR]
