ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION - PowerPoint PPT Presentation

GIW 2016
ANALYSIS OF MULTIPLE RELATED PHENOTYPES IN GENOME-WIDE ASSOCIATION STUDIES
Taesung Park (1), Sohee Oh (1), Iksoo Huh (1), and Seung-Yeoun Lee (2)
(1) Department of Statistics, Seoul National University, South Korea
(2) Department of Applied Statistics, Sejong University, South Korea

Contents
1. Introduction
2. Multivariate analysis
3. Application: Korea Association REsource (KARE) Project
4. Simulation Study
5. Conclusion

Genome Wide Association Studies (GWAS)
• Studies of genetic variation across the entire genome
• Single Nucleotide Polymorphism (SNP): DNA sequence variations that occur when a single nucleotide is altered
• Designed to identify associations between genetic markers and observable traits, or the presence/absence of a disease or condition
• Rely on SNP chip technologies

Genome Wide Association Studies (GWAS)
• Successful in complex traits and diseases
  - height, body mass index, blood pressure
  - asthma, cancer, diabetes, heart disease and mental illnesses

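Association test
• Univariate and single SNP analysis
• Focus on one trait and a single SNP
$$ y_i = \beta_0 + \beta_1 \mathrm{Sex}_i + \beta_2 \mathrm{Age}_i + \beta_3 \mathrm{SNP}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) $$
[Diagram: one trait (Trait 1) tested against each SNP in turn: SNP 1, SNP 2, ..., SNP 1M]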
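The single-SNP model on this slide can be sketched in a few lines of NumPy. This is only an illustration on simulated data, not the authors' pipeline: the function name `single_snp_test`, the additive 0/1/2 genotype coding, and all simulated effect sizes are assumptions made for the example.

```python
import numpy as np

def single_snp_test(y, sex, age, snp):
    """Ordinary least squares fit of y = b0 + b1*sex + b2*age + b3*snp + e.

    Returns the estimated SNP effect b3 and its t statistic.
    """
    X = np.column_stack([np.ones_like(y, dtype=float), sex, age, snp])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof              # estimate of the error variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    t_snp = beta[3] / np.sqrt(cov_beta[3, 3])
    return beta[3], t_snp

# Simulated data (all values hypothetical): additive SNP coding 0/1/2.
rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, n).astype(float)
age = rng.uniform(40, 70, n)
snp = rng.binomial(2, 0.3, n).astype(float)
y = 1.0 + 0.5 * sex + 0.02 * age + 0.3 * snp + rng.normal(0, 1, n)

b3, t3 = single_snp_test(y, sex, age, snp)
print(f"estimated SNP effect: {b3:.3f}, t statistic: {t3:.2f}")
```

In a genome-wide scan the same fit would be repeated once per SNP, which is why the significance threshold has to be so stringent.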

Improving power
• Common complex traits are related to many genes
• It is not easy to identify genetic variants with high significance at α = 5 × 10^-8
• Further, these variants explain only a small fraction of disease etiology
• Need to develop a more powerful method for identifying genetic variants
  - Meta analysis by increasing sample size
  - Multiple SNP analysis: gene-gene interaction
  - Joint analysis with the correlated phenotypes

Association test
• Univariate + multiple SNP analysis
• Focus on one trait and multiple SNPs
  - Accumulated additive effects of multiple SNPs
  - SNP-SNP interaction
[Diagram: one trait (Trait 1) tested against multiple SNPs jointly: SNP 1, SNP 2, ..., SNP 500K]

Multivariate approach
• Multivariate analysis
• Focus on multiple related traits and a single SNP
[Diagram: related traits (Trait 1 to Trait 5) tested jointly against each SNP in turn: SNP 1, SNP 2, ..., SNP 1M]

Multivariate approach
Examples: multiple related phenotypes
• Obesity: BMI, waist circumference, weight, WHR, body fat
• Hyperlipidemia: total cholesterol, HDL/LDL cholesterol, triglyceride
• Metabolic syndrome: waist circumference, triglyceride, HDL cholesterol, blood pressure (SBP, DBP), insulin resistance

Multivariate approach
Existing methods
1) Proportional odds model: MultiPhen (O’Reilly et al., 2012)
2) Linear mixed model: efficient algorithm for GWAS (Zhou and Stephens, 2014)

Multivariate approach
• Identify genetic variants associated with multiple related traits
• Extension of the univariate linear model to the multivariate linear model with a response vector
  - Univariate variances are replaced by a covariance matrix
• Joint analysis
  - Analyze several traits simultaneously
  - Account for the correlation structure of multiple traits in the model
• Different slopes (SNP effects) model: allows a different SNP effect for each trait
  - Different association directions => Heterogeneous model
• Common slope (SNP effect) model
  - Same association direction with similar effect sizes => Homogeneous model

Multivariate general linear model (1)
• Let $y_{ij}$ denote the value of trait j from subject i, for i = 1, ..., n and j = 1, ..., m
• The linear model for trait j:
$$ y_{ij} = \sum_{k=1}^{p} x_{ik} \beta_{kj} + \varepsilon_{ij} = x_i^T \beta_j + \varepsilon_{ij} $$
• $x_i = (x_{i1}, \ldots, x_{ip})^T$ is a vector of SNPs and covariates
• $\beta_j = (\beta_{1j}, \ldots, \beta_{pj})^T$ is a vector of p unknown parameters
• $\beta_{kj}$ represents the effect of the k-th SNP on trait j
• This model allows one SNP to have different effects on the traits

The multivariate general linear model (2)
• $y_i = (y_{i1}, \ldots, y_{im})^T$ is a vector of m responses from the i-th subject
• $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{im})^T$ is a vector of m residuals for the i-th subject, with $\varepsilon_i \sim N_m(0, \Sigma)$
• The nm × 1 vector of stacked residuals:
$$ \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} \sim N_{nm}(0, I_n \otimes \Sigma) $$
where $I_n$ denotes the n × n identity matrix and the operator $\otimes$ is the direct (Kronecker) product
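To see what the Kronecker-product covariance means in practice, the short sketch below builds $I_n \otimes \Sigma$ for a toy case. The values of n, m, and Σ are made up for illustration. The result is block diagonal: traits are correlated within a subject through Σ, while residuals from different subjects are uncorrelated.

```python
import numpy as np

n, m = 3, 2                         # toy sizes: 3 subjects, 2 traits
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])      # hypothetical within-subject covariance

# Covariance of the stacked nm x 1 residual vector.
V = np.kron(np.eye(n), Sigma)

print(V.shape)        # (6, 6)
print(V[0:2, 0:2])    # diagonal block for subject 1: equals Sigma
print(V[0:2, 2:4])    # off-diagonal block (subjects 1 and 2): all zeros
```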

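Multivariate general linear model (3)
• Covariance (correlation) structure: the m × m matrix $\Sigma$
• Specifies how the traits within a subject are related
• Unstructured (UN):
$$ \Sigma_{UN} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix} $$
• Structured covariance
  - Compound Symmetry (CS):
$$ \Sigma_{CS} = \begin{pmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \cdots & \sigma_1^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1^2 & \sigma_1^2 & \cdots & \sigma^2 + \sigma_1^2 \end{pmatrix} $$
  - First-order autoregressive (AR(1)):
$$ \Sigma_{AR(1)} = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho^{m-1}\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho^{m-2}\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{m-1}\sigma^2 & \rho^{m-2}\sigma^2 & \cdots & \sigma^2 \end{pmatrix} $$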
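The structured covariance matrices named on this slide are easy to construct directly, which makes the difference in parameter counts concrete. The helper names (`compound_symmetry`, `ar1`, `n_params_unstructured`) and the numeric values are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def compound_symmetry(m, sigma2, sigma1_2):
    """CS: sigma2 + sigma1_2 on the diagonal, sigma1_2 everywhere else."""
    return sigma1_2 * np.ones((m, m)) + sigma2 * np.eye(m)

def ar1(m, sigma2, rho):
    """AR(1): entry (i, j) is sigma2 * rho**|i - j|."""
    idx = np.arange(m)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def n_params_unstructured(m):
    """An unstructured m x m covariance has m variances + m(m-1)/2 covariances."""
    return m * (m + 1) // 2

m = 4
print(compound_symmetry(m, sigma2=1.0, sigma1_2=0.5))
print(ar1(m, sigma2=2.0, rho=0.5))
print(n_params_unstructured(m))   # 10 free parameters, versus 2 for CS or AR(1)
```

CS and AR(1) each need only two parameters regardless of m, whereas the unstructured form grows quadratically with the number of traits.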

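The multivariate general linear model (4)
Matrix formulation
$$ Y = XB + E, $$
where $E(Y) = XB$ and $\mathrm{Var}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = I_n \otimes \Sigma$
• n × m data matrix:
$$ Y = \begin{pmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nm} \end{pmatrix} = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix} $$
• n × p known design matrix:
$$ X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} $$
• p × m parameter matrix:
$$ B = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1m} \\ \vdots & \ddots & \vdots \\ \beta_{p1} & \cdots & \beta_{pm} \end{pmatrix} = (\beta_1 \; \cdots \; \beta_m) $$
• n × m matrix of random errors:
$$ E = \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1m} \\ \vdots & \ddots & \vdots \\ \varepsilon_{n1} & \cdots & \varepsilon_{nm} \end{pmatrix} = \begin{pmatrix} \varepsilon_1^T \\ \vdots \\ \varepsilon_n^T \end{pmatrix} $$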

The multivariate general linear model (5)
• Considers related phenotypes simultaneously
• Allows for correlation between phenotypes in the model
• Can detect genetic variants that have only modest effects in the univariate approach
• Provides a chance to capture pleiotropic genes
• Model
  - Hetero model with separate slopes (different genetic effects on each phenotype)
  - Homo model with common slope (same genetic effect on all phenotypes)
  - Unstructured variance-covariance structure
• Test statistic: Wilks' Λ
$$ \Lambda = \frac{|E|}{|H + E|} = \prod_{i=1}^{k} \frac{1}{1 + \lambda_i} $$
where, in this formula, E and H denote the error and hypothesis sums-of-squares-and-cross-products matrices, and $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $E^{-1}H$
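As a rough illustration of the Wilks' Λ test for the separate-slopes (hetero) model, the sketch below fits the multivariate model with and without the SNP column and forms Λ = |E| / |H + E|, where E is the residual sums-of-squares-and-cross-products (SSCP) matrix of the full model and H is the extra SSCP explained by the SNP. The data are simulated, and the function names, effect sizes, and covariance values are assumptions for the example. It is not the authors' implementation, and converting Λ to a p-value is left out.

```python
import numpy as np

def residual_sscp(X, Y):
    """Residual sums-of-squares-and-cross-products matrix after regressing Y on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ B
    return R.T @ R

def wilks_lambda(X_full, X_reduced, Y):
    """Wilks' Lambda for the terms in X_full that are absent from X_reduced."""
    E = residual_sscp(X_full, Y)               # error SSCP (full model)
    H = residual_sscp(X_reduced, Y) - E        # hypothesis SSCP
    lam = np.linalg.det(E) / np.linalg.det(H + E)
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real   # eigenvalues of E^{-1} H
    return lam, eig

# Simulated example (all values hypothetical): 4 correlated traits, one SNP.
rng = np.random.default_rng(1)
n, m = 400, 4
snp = rng.binomial(2, 0.3, n).astype(float)
covars = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.uniform(40, 70, n)])
Sigma = 0.5 * np.ones((m, m)) + 0.5 * np.eye(m)
effects = np.array([0.2, 0.2, 0.1, 0.0])       # a different SNP effect per trait
Y = np.outer(snp, effects) + rng.multivariate_normal(np.zeros(m), Sigma, size=n)

X_full = np.column_stack([covars, snp])
lam, eig = wilks_lambda(X_full, covars, Y)
print(lam, np.prod(1.0 / (1.0 + eig)))         # the two forms of Lambda agree
```

Values of Λ close to 0 indicate strong evidence against the null hypothesis of no SNP effect on any trait, and values close to 1 indicate little evidence.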

Korea Association Resource (KARE) Project
• Objective: To identify genetic factors of quantitative clinical traits and life-style related diseases (e.g. T2DM) from a Genome-Wide Association Study using population-based cohorts
• Over 10,000 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts)
• Genotyping: Affymetrix 5.0
• First high density large scale GWA Study performed in the East Asian population
Courtesy of KNIH

KARE: Characteristics (baseline study)

                     Ansung         Ansan
Participants         5,018          5,020
Sex (women/men)      2,778/2,240    2,497/2,523
Age (mean)           55.5           49.1
Age 40s (%)          31.2           62.8
Age 50s (%)          29.1           23.0
Age 60 and over (%)  39.6           14.3

Courtesy of KNIH

KARE data
Data description
• 8,842 subjects from two community-based cohorts in Korea (Ansung & Ansan cohorts)
• Filtering thresholds (exclusion criteria)
  - HWE p-value < 10^-6
  - MAF < 0.01
  - Missing proportion in each genotype > 0.05
• Missing imputation: HapMap JPT/CHB reference panel
• SNPs: 327,872
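Two of the filters listed above (minor allele frequency and missing proportion) can be sketched as a per-SNP check. This is a simplified illustration: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the toy genotype vectors are assumptions. The Hardy-Weinberg equilibrium (HWE) test is deliberately left out of this sketch.

```python
def maf_and_missing(genotypes):
    """Return (minor allele frequency, missing proportion) for one SNP.

    genotypes: list of 0/1/2 minor-allele counts, with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing = 1.0 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing
    freq = sum(called) / (2 * len(called))     # frequency of the counted allele
    return min(freq, 1.0 - freq), missing

def passes_qc(genotypes, maf_min=0.01, missing_max=0.05):
    """Keep a SNP only if MAF >= maf_min and missing proportion <= missing_max."""
    maf, missing = maf_and_missing(genotypes)
    return maf >= maf_min and missing <= missing_max

common_snp = [0, 1, 2, 1, 0] * 20        # MAF 0.4, no missing calls
monomorphic = [0] * 100                  # MAF 0
poorly_called = [0, 1, None, None] * 25  # half of the calls missing

print(passes_qc(common_snp), passes_qc(monomorphic), passes_qc(poorly_called))
```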

Obesity
• Obesity related phenotypes: BMI, waist circumference, weight, and WHR
  - $\mathrm{BMI} = \mathrm{Weight} / \mathrm{Height(m)}^2$
  - $\mathrm{WHR} = \mathrm{Waist} / \mathrm{Hip\ circumference}$
• Which genes are associated with obesity related phenotypes?

Correlations between the phenotypes:

          BMI      Waist    Weight   WHR
BMI       1
Waist     0.7607   1
Weight    0.7308   0.6862   1
WHR       0.3819   0.7971   0.2920   1
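The two derived phenotypes are simple ratios, shown here as a small sketch. The function names and the example measurements are hypothetical, and weight is assumed to be in kilograms, following the usual definition of BMI.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height (in metres) squared."""
    return weight_kg / height_m ** 2

def whr(waist, hip):
    """Waist-hip ratio; waist and hip circumference must share the same unit."""
    return waist / hip

print(round(bmi(70.0, 1.75), 2))   # 22.86
print(whr(80.0, 100.0))            # 0.8
```

Because waist circumference enters WHR directly and weight enters BMI directly, correlations among these phenotypes are expected, which motivates analysing them jointly.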

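Obesity: Univariate Analysis
• Most GWAS are conducted under this framework
• Focus on one phenotype and a single SNP
• Obesity related phenotypes: separate univariate analyses
  - BMI: $Y_1 = \beta_{01} + \beta_{11}\mathrm{Sex} + \beta_{21}\mathrm{Age} + \beta_{31}\mathrm{Area} + \beta_{41}\mathrm{SNP} + \varepsilon_1$
  - Waist: $Y_2 = \beta_{02} + \beta_{12}\mathrm{Sex} + \beta_{22}\mathrm{Age} + \beta_{32}\mathrm{Area} + \beta_{42}\mathrm{SNP} + \varepsilon_2$
  - Weight: $Y_3 = \beta_{03} + \beta_{13}\mathrm{Sex} + \beta_{23}\mathrm{Age} + \beta_{33}\mathrm{Area} + \beta_{43}\mathrm{SNP} + \varepsilon_3$
  - WHR: $Y_4 = \beta_{04} + \beta_{14}\mathrm{Sex} + \beta_{24}\mathrm{Age} + \beta_{34}\mathrm{Area} + \beta_{44}\mathrm{SNP} + \varepsilon_4$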

Obesity: Univariate Analysis Results
Number of significant genetic variants at a given level of α

P-value   p ≤ 10^-7   10^-7 < p ≤ 10^-6   10^-6 < p ≤ 10^-5   10^-5 < p ≤ 10^-4
BMI       1           0                   6                   23
Waist     0           0                   7                   39
Weight    0           3                   5                   32
WHR       0           4                   7                   25

[Figure panels: BMI, Waist, Weight, WHR]
