GIW 2016
ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES
Taesung Park¹, Sohee Oh¹, Iksoo Huh¹, and Seung-Yeoun Lee²
¹ Department of Statistics, Seoul National University, South Korea
² Department of Applied Statistics, Sejong University, South Korea
Contents
1 Introduction
2 Multivariate analysis
3 Application: Korean Association REsource (KARE) Project
4 Simulation Study
5 Conclusion
Genome Wide Association Studies (GWAS)
- Studies of genetic variation across the entire genome
- Single Nucleotide Polymorphism (SNP): DNA sequence variations that occur when a single nucleotide is altered
- Designed to identify associations between genetic markers and observable traits, or the presence/absence of a disease or condition
- Rely on SNP chip technologies
Genome Wide Association Studies (GWAS)
Successful in complex traits and diseases
- height, body mass index, blood pressure
- asthma, cancer, diabetes, heart disease and mental illnesses
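Association test
Univariate and single SNP analysis: focus on one trait and a single SNP
$$ y_i = \beta_0 + \beta_1 \mathrm{Sex}_i + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{SNP}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) $$
[Diagram: one trait (Trait 1) linked to individual SNPs, labeled SNP 1, SNP 2, ..., SNP I, SNP J, ..., SNP K, ..., SNP 1M]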
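A minimal sketch of the single-SNP test above, assuming additive 0/1/2 genotype coding and simulated data. The function name `single_snp_test`, the effect sizes, and the normal approximation used for the p-value are illustrative assumptions, not part of the original slides.

```python
import math
import numpy as np

def single_snp_test(y, covariates, snp):
    """OLS fit of y on an intercept, covariates and one SNP.

    Returns (beta_snp, t_statistic, two_sided_p).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), covariates, snp])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the estimates
    se = math.sqrt(cov_beta[-1, -1])
    t = beta[-1] / se
    # Assumption: normal approximation to the t distribution (large n)
    p = math.erfc(abs(t) / math.sqrt(2))
    return beta[-1], t, p

# Simulated example (all values hypothetical)
rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n)
age = rng.uniform(40, 70, n)
snp = rng.binomial(2, 0.3, n)                     # additive coding 0/1/2
y = 1.0 + 0.5 * sex + 0.02 * age + 0.3 * snp + rng.normal(0, 1, n)

beta_snp, t_stat, p_value = single_snp_test(y, np.column_stack([sex, age]), snp)
```

In a GWAS this fit would be repeated once per SNP, which is why the multiple-testing threshold matters.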
Improving power
- Common complex traits are related to many genes
- Not easy to identify genetic variants with high significance at $\alpha = 5 \times 10^{-8}$
- Further, these variants explain only a small fraction of disease etiology
- Need to develop a more powerful method for identifying genetic variants
  - Meta analysis by increasing sample size
  - Multiple SNP analysis: gene-gene interaction
  - Joint analysis with the correlated phenotypes
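For context, the genome-wide threshold quoted above is usually motivated as a Bonferroni correction. The figure of roughly one million independent common-variant tests is the conventional assumption, not something stated on the slide:

$$ \alpha_{\mathrm{GWAS}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8} $$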
Association test
Univariate + multiple SNP analysis: focus on one trait and multiple SNPs
- Accumulated additive effects of multiple SNPs
- SNP-SNP interaction
[Diagram: one trait (Trait 1) linked jointly to several SNPs, labeled SNP 1, SNP 2, ..., SNP I, SNP J, ..., SNP K, ..., SNP 500K]
Multivariate approach
Multivariate analysis: focus on multiple related traits and a single SNP
[Diagram: related traits (Trait 1 to Trait 5) linked to individual SNPs, labeled SNP 1, SNP 2, ..., SNP I, SNP J, ..., SNP K, ..., SNP 1M]
Multivariate approach
Examples: multiple related phenotypes
- Obesity: BMI, waist circumference, weight, WHR, body fat
- Hyperlipidemia: total cholesterol, HDL/LDL cholesterol, triglyceride
- Metabolic syndrome: waist circumference, triglyceride, HDL cholesterol, blood pressure (SBP, DBP), insulin resistance
Multivariate approach
Existing methods
1) MultiPhen (O’Reilly et al., 2012): proportional odds model
2) Efficient algorithm for GWAS (Zhou and Stephens, 2014): linear mixed model
Multivariate approach
- Identify genetic variants associated with multiple related traits
- Extension of the univariate linear model to the multivariate linear model with a response vector
  - Univariate variances are replaced by a covariance matrix
- Joint analysis
  - Analyze several traits simultaneously
  - Account for the correlation structure of multiple traits in the model
- Different slopes (SNP effects) model: allows a different slope for each trait, including different association directions => Heterogeneous model
- Common slope (SNP effect) model: same association direction with similar effect sizes => Homogeneous model
Multivariate general linear model (1)
Let $y_{ij}$ denote the value of trait $j$ from subject $i$, for $i = 1, \dots, n$, $j = 1, \dots, m$.
The linear model for trait $j$:
$$ y_{ij} = \sum_{k=1}^{p} x_{ik} \beta_{kj} + \varepsilon_{ij} = x_i^T \beta_j + \varepsilon_{ij} $$
- $x_i = (x_{i1}, \dots, x_{ip})^T$ is a vector of SNPs and covariates
- $\beta_j = (\beta_{1j}, \dots, \beta_{pj})^T$ is a vector of $p$ unknown parameters
- $\beta_{kj}$ represents the effect of the $k$th SNP on trait $j$
This model allows one SNP to have different effects on the traits.
Multivariate general linear model (2)
- $y_i = (y_{i1}, \dots, y_{im})^T$ is a vector of $m$ responses from the $i$th subject
- $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{im})^T \sim N_m(0, \Sigma)$ is a vector of $m$ residuals for the $i$th subject
- The $nm \times 1$ vector
$$ \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} \sim N_{nm}(0, I_n \otimes \Sigma) $$
where $I_n$ denotes the $n \times n$ identity matrix and the operator $\otimes$ is the direct (Kronecker) product
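To make the Kronecker structure concrete, here is a small sketch with hypothetical numbers (3 subjects, 2 traits). It shows that the stacked covariance is block diagonal: traits are correlated within a subject and independent across subjects.

```python
import numpy as np

n, m = 3, 2                      # hypothetical: 3 subjects, 2 traits
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])   # within-subject trait covariance (illustrative values)

# Covariance of the stacked nm x 1 residual vector: I_n (Kronecker) Sigma
V = np.kron(np.eye(n), Sigma)

first_block = V[0:m, 0:m]        # covariance of subject 1's traits
cross_block = V[0:m, m:2 * m]    # covariance between subjects 1 and 2
```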
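Multivariate general linear model (3)
Covariance (correlation) structure: $m \times m$ matrix
Specifies how the traits within a subject are related
Unstructured (UN):
$$ \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix} $$
Structured covariance
Compound Symmetry (CS):
$$ \begin{pmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{pmatrix} $$
First-order autoregressive (AR(1)):
$$ \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{m-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{m-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1}\sigma^2 & \rho^{m-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix} $$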
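The two structured forms can be generated from a handful of parameters, which is the practical difference from the unstructured case. A minimal sketch follows; the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def compound_symmetry(m, sigma2, sigma1_2):
    """CS: sigma2 + sigma1_2 on the diagonal, sigma1_2 everywhere else."""
    return sigma2 * np.eye(m) + sigma1_2 * np.ones((m, m))

def ar1(m, sigma2, rho):
    """AR(1): entry (i, j) equals sigma2 * rho ** |i - j|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

cs_example = compound_symmetry(3, sigma2=1.0, sigma1_2=0.5)
ar1_example = ar1(4, sigma2=2.0, rho=0.5)
```

CS and AR(1) each need two parameters regardless of m, while the unstructured form needs m(m+1)/2.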
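Multivariate general linear model (4)
Matrix formulation
$$ Y = XB + E, $$
where $\mathrm{E}(Y) = XB$ and $\mathrm{Var}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = I_n \otimes \Sigma$
- $Y = \begin{pmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{pmatrix} = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}$: $n \times m$ data matrix
- $X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}$: $n \times p$ known design matrix
- $B = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1m} \\ \vdots & \ddots & \vdots \\ \beta_{p1} & \cdots & \beta_{pm} \end{pmatrix} = (\beta_1 \; \cdots \; \beta_m)$: $p \times m$ parameter matrix
- $E = \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1m} \\ \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \cdots & \varepsilon_{nm} \end{pmatrix} = \begin{pmatrix} \varepsilon_1^T \\ \vdots \\ \varepsilon_n^T \end{pmatrix}$: $n \times m$ matrix of random errors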
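As an illustration of the matrix form, the sketch below simulates Y = XB + E and recovers B by ordinary least squares, then estimates the unstructured covariance from the residuals. The slides do not state the estimation method, so least squares here is an assumption, and all dimensions and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 500, 3, 4                                   # subjects, predictors, traits

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # n x p design
B_true = rng.normal(size=(p, m))                      # p x m parameters
Sigma = 0.5 * np.eye(m) + 0.5 * np.ones((m, m))       # m x m trait covariance
errors = rng.multivariate_normal(np.zeros(m), Sigma, size=n)     # n x m errors
Y = X @ B_true + errors                               # n x m data matrix

# Least-squares estimate of B; each column matches a separate per-trait fit
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Unstructured covariance estimate from the residuals
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - p)
```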
Multivariate general linear model (5)
- Consider related phenotypes simultaneously
- Allow for correlation between phenotypes in the model
- Detect genetic variants which have modest effects in the univariate approach
- Provide some chance to capture pleiotropic genes
Model
- Hetero model with separate slopes (different genetic effects on each phenotype)
- Homo model with common slope (same genetic effect on all phenotypes)
- Unstructured variance-covariance structure
Test statistic: Wilks' $\Lambda$
$$ \Lambda = \frac{|E|}{|H + E|} = \prod_{i=1}^{k} \frac{1}{1 + \lambda_i} $$
where $E$ and $H$ here denote the error and hypothesis sums-of-squares-and-cross-products matrices and $\lambda_i$ are the eigenvalues of $E^{-1}H$.
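A minimal sketch of the Wilks' Lambda computation for a single SNP under the separate-slopes model, using simulated data. Obtaining H as the difference between the reduced-model and full-model residual cross-product matrices is a standard construction assumed here. The function name and all numbers are hypothetical.

```python
import numpy as np

def wilks_lambda(Y, X_full, X_reduced):
    """Return (Lambda, eigenvalues of E^{-1} H) for the terms dropped in X_reduced."""
    def residual_sscp(X):
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        R = Y - X @ B
        return R.T @ R

    E = residual_sscp(X_full)             # error SSCP under the full model
    H = residual_sscp(X_reduced) - E      # hypothesis SSCP for the dropped terms
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
    return lam, eigvals

# Simulated example (all values hypothetical)
rng = np.random.default_rng(2)
n, m = 1000, 3
snp = rng.binomial(2, 0.3, n).astype(float)
age = rng.uniform(40, 70, n)
Sigma = 0.5 * np.eye(m) + 0.5 * np.ones((m, m))
effects = np.array([0.2, 0.2, 0.0])       # separate slope for each trait
Y = np.outer(snp, effects) + rng.multivariate_normal(np.zeros(m), Sigma, size=n)

X_reduced = np.column_stack([np.ones(n), age])     # model without the SNP
X_full = np.column_stack([X_reduced, snp])         # model with the SNP
lam, eigvals = wilks_lambda(Y, X_full, X_reduced)

# Same statistic via the eigenvalue product form
product_form = np.prod(1.0 / (1.0 + eigvals))
```

Values of Lambda near 1 indicate little evidence for a SNP effect; smaller values indicate stronger evidence. Converting Lambda to a p-value requires an F or chi-square approximation, which is omitted here.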
Korea Association Resource (KARE) Project
• Objective: To identify genetic factors of quantitative clinical traits and life-style related diseases (e.g. T2DM) from a Genome-Wide Association Study using population-based cohorts
• Over 10,000 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts)
• Genotyping: Affymetrix 5.0
First high density large scale GWA Study performed in the East Asian population
Courtesy of KNIH
KARE: Characteristics (baseline study)

                     Ansung         Ansan
Participants         5,018          5,020
Sex (women/men)      2,778/2,240    2,497/2,523
Age (mean)           55.5           49.1
Age 40s (%)          31.2           62.8
Age 50s (%)          29.1           23.0
Age 60+ (%)          39.6           14.3

Courtesy of KNIH
KARE data
Data description
- 8,842 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts)
- Filtering thresholds (SNPs excluded if): HWE p-value < $10^{-6}$; MAF < 0.01; missing proportion in each genotype > 0.05
- Missing imputation: HapMap JPT/CHB reference panel
- SNPs: 327,872
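The three filters can be sketched per SNP as below. This is only an illustration of the thresholds listed above: the slides do not say which HWE test was used, so a 1-df chi-square goodness-of-fit test is assumed, and the function names and toy genotype vectors are hypothetical.

```python
import math
import numpy as np

def hwe_chisq_p(g):
    """1-df chi-square HWE p-value for genotypes coded 0/1/2 (NaN = missing)."""
    g = g[~np.isnan(g)]
    n = len(g)
    obs = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
    q = (obs[1] + 2 * obs[2]) / (2 * n)          # frequency of the counted allele
    exp = n * np.array([(1 - q) ** 2, 2 * q * (1 - q), q ** 2])
    if np.any(exp == 0):
        return 1.0                                # monomorphic: test not informative
    chi2 = ((obs - exp) ** 2 / exp).sum()
    return math.erfc(math.sqrt(chi2 / 2))         # upper tail of chi-square, 1 df

def passes_qc(g, hwe_min=1e-6, maf_min=0.01, miss_max=0.05):
    """Keep a SNP only if it clears the missingness, MAF and HWE thresholds."""
    if np.isnan(g).mean() > miss_max:
        return False
    freq = np.nanmean(g) / 2
    maf = min(freq, 1 - freq)
    return bool(maf >= maf_min and hwe_chisq_p(g) >= hwe_min)

# Toy genotype vectors (hypothetical)
good = np.repeat([0.0, 1.0, 2.0], [490, 420, 90])       # in HWE, allele freq 0.3
rare = np.repeat([0.0, 1.0], [995, 5])                  # MAF = 0.0025
gappy = good.copy()
gappy[:100] = np.nan                                    # 10% missing
no_hets = np.repeat([0.0, 2.0], [500, 500])             # strong HWE violation

qc_results = [passes_qc(g) for g in (good, rare, gappy, no_hets)]
```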
Obesity
Obesity related phenotypes: BMI, waist circumference, weight, and WHR
- $\mathrm{BMI} = \mathrm{Weight} / \mathrm{Height(m)}^2$
- $\mathrm{WHR} = \mathrm{Waist} / \mathrm{Hip\ circumference}$
Which genes are associated with obesity related phenotypes?

Correlations between phenotypes:
          BMI       Waist     Weight    WHR
BMI       1
Waist     0.7607    1
Weight    0.7308    0.6862    1
WHR       0.3819    0.7971    0.2920    1
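A short sketch of how the two derived phenotypes and a correlation matrix of this shape could be computed. The data are simulated with independent inputs, so the resulting correlations will not resemble the KARE values. Weight in kilograms is assumed, per the usual BMI definition, and all variable names are illustrative.

```python
import numpy as np

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def whr(waist, hip):
    """Waist-hip ratio; both circumferences in the same unit."""
    return waist / hip

# Simulated measurements (hypothetical values, not KARE data)
rng = np.random.default_rng(3)
n = 500
height_m = rng.normal(1.62, 0.08, n)
weight_kg = rng.normal(63, 10, n)
waist_cm = rng.normal(83, 9, n)
hip_cm = rng.normal(94, 6, n)

# Columns ordered as in the table: BMI, Waist, Weight, WHR
traits = np.column_stack([bmi(weight_kg, height_m), waist_cm,
                          weight_kg, whr(waist_cm, hip_cm)])
corr = np.corrcoef(traits, rowvar=False)   # 4 x 4 Pearson correlation matrix
```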
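Obesity: Univariate Analysis
- Most GWAS are conducted under this framework
- Focus on one phenotype and a single SNP
Separate univariate analyses of the obesity related phenotypes:
$$ \begin{aligned}
\text{BMI:} \quad & Y_1 = \beta_{01} + \beta_{11}\mathrm{Sex} + \beta_{21}\mathrm{Age} + \beta_{31}\mathrm{Area} + \beta_{41}\mathrm{SNP} + \varepsilon_1 \\
\text{Waist:} \quad & Y_2 = \beta_{02} + \beta_{12}\mathrm{Sex} + \beta_{22}\mathrm{Age} + \beta_{32}\mathrm{Area} + \beta_{42}\mathrm{SNP} + \varepsilon_2 \\
\text{Weight:} \quad & Y_3 = \beta_{03} + \beta_{13}\mathrm{Sex} + \beta_{23}\mathrm{Age} + \beta_{33}\mathrm{Area} + \beta_{43}\mathrm{SNP} + \varepsilon_3 \\
\text{WHR:} \quad & Y_4 = \beta_{04} + \beta_{14}\mathrm{Sex} + \beta_{24}\mathrm{Age} + \beta_{34}\mathrm{Area} + \beta_{44}\mathrm{SNP} + \varepsilon_4
\end{aligned} $$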
Obesity: Univariate Analysis Results
Number of significant genetic variants at a given level of α

P-value    p ≤ 10^-7    10^-7 < p ≤ 10^-6    10^-6 < p ≤ 10^-5    10^-5 < p ≤ 10^-4
BMI        1            0                    6                    23
Waist      0            0                    7                    39
Weight     0            3                    5                    32
WHR        0            4                    7                    25
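A tally of this kind can be produced from a vector of per-SNP p-values as sketched below. The function name and the example p-values are hypothetical. The bin edges follow the column headers above.

```python
import numpy as np

def count_by_pvalue_bin(pvals):
    """Count p-values in the half-open bins (lo, hi] used in the results table."""
    pvals = np.asarray(pvals, dtype=float)
    edges = [-np.inf, 1e-7, 1e-6, 1e-5, 1e-4]
    labels = ["p <= 1e-7", "1e-7 < p <= 1e-6",
              "1e-6 < p <= 1e-5", "1e-5 < p <= 1e-4"]
    counts = {}
    for lo, hi, label in zip(edges[:-1], edges[1:], labels):
        counts[label] = int(((pvals > lo) & (pvals <= hi)).sum())
    return counts

# Hypothetical p-values for one phenotype
example_counts = count_by_pvalue_bin([5e-8, 1e-7, 3e-7, 2e-6, 5e-6, 5e-5, 0.01])
```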
[Figure: four panels labeled BMI, Waist, Weight, and WHR]