Learning Hierarchical Bayesian Networks for Genome-Wide Association Studies
Raphaël Mourad (1), Christine Sinoquet (2) and Philippe Leray (1)
KOD team (KnOwledge and Decision)
(1) LINA, UMR CNRS 6241, Ecole Polytechnique de l'Université de Nantes, France
(2) LINA, UMR CNRS 6241, Université de Nantes, France
Presented by Raphaël Mourad, PhD student in Bioinformatics, raphael.mourad@univ-nantes.fr
Mourad R. et al.: Learning Hierarchical Bayesian Networks for GWAS (COMPSTAT 2010)
Outline
1/ Introduction
2/ Fundamental concept of association genetics
3/ Presentation of genetic data
4/ Our approach
5/ Results and discussion
6/ Conclusion and outlook
Introduction
● Context: complex genetic diseases are multifactorial diseases caused by a combination of genetic factors (e.g. genes) and environmental factors (e.g. sex, age...).
Examples: diabetes, asthma, hypertension, some cancers...
● Dissecting the genetic basis of these diseases: genome-wide association studies (GWAS)
→ identification of genetic markers associated with common, complex diseases.
(Figure: markers along a chromosome.)
The variability of the human genome is covered by hundreds of thousands of markers.
Fundamental concept of association genetics
● Linkage disequilibrium (LD):
→ dependences generally observed between SNPs (single nucleotide polymorphisms) that lie close to one another on the chromosome,
→ the basis of GWAS: a marker in LD with a causal mutation can signal that mutation even when the mutation itself is not genotyped.
(Figure: a chromosome carrying a causal mutation flanked by markers; LD links the markers to their surrounding area on the chromosome.)
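As a rough illustration of how dependence between two SNPs can be quantified, the sketch below computes the squared Pearson correlation between two genotype vectors coded 0/1/2. This is a commonly used proxy for LD on unphased genotypes; the slides do not state which dependence measure the authors use, so the function name `genotype_r2`, the simulated allele frequency and the 5% noise rate are assumptions made for the example only.

```python
import numpy as np

def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Assumption: used here only as a simple proxy for LD on unphased data.
    """
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    if a.std() == 0.0 or b.std() == 0.0:
        return 0.0  # a monomorphic SNP carries no dependence information
    r = np.corrcoef(a, b)[0, 1]
    return float(r * r)

# Hypothetical simulated data: 2000 individuals, as in the slides.
rng = np.random.default_rng(0)
n = 2000
snp1 = rng.binomial(2, 0.3, size=n)

# snp2 mimics a neighbouring SNP: a copy of snp1 with 5% of genotypes redrawn.
snp2 = snp1.copy()
redrawn = rng.random(n) < 0.05
snp2[redrawn] = rng.binomial(2, 0.3, size=redrawn.sum())

# snp3 mimics a distant SNP: drawn independently of snp1.
snp3 = rng.binomial(2, 0.3, size=n)

r2_close = genotype_r2(snp1, snp2)
r2_far = genotype_r2(snp1, snp3)
print(f"r2 (close SNPs): {r2_close:.3f}")
print(f"r2 (distant SNPs): {r2_far:.3f}")
```

The pair built to mimic neighbouring SNPs should score far higher than the independent pair, which is the pattern GWAS relies on.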
Presentation of genetic data
● Data:
→ Phenotype: 1 binary variable (1000 non-affected individuals, 1000 affected individuals),
→ DNA: > 100k SNPs, ternary variables.
● Characteristics:
→ large number of genetic variables (SNPs): combinatorial explosion,
→ strong dependences among genetic variables.
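A minimal sketch of this data layout, assuming the usual 0/1/2 coding for the ternary SNP variables and 0/1 for the phenotype (the slides do not give the coding). The number of SNPs is scaled down to 500 so that the example runs instantly, and the genotypes are random placeholders rather than realistic data. The helper `joint_configurations` is hypothetical and only illustrates the combinatorial explosion.

```python
import numpy as np

rng = np.random.default_rng(1)

n_controls, n_cases = 1000, 1000   # as in the slides
n_snps = 500                       # scaled down; the slides mention > 100k SNPs

# Genotype matrix: one row per individual, one column per SNP, values in {0, 1, 2}.
genotypes = rng.integers(0, 3, size=(n_controls + n_cases, n_snps), dtype=np.int8)

# Phenotype vector: 0 = non-affected, 1 = affected (assumed coding).
phenotype = np.concatenate([
    np.zeros(n_controls, dtype=np.int8),
    np.ones(n_cases, dtype=np.int8),
])

def joint_configurations(k):
    """Number of possible joint genotypes for k ternary SNPs."""
    return 3 ** k

print(genotypes.shape, phenotype.shape)
for k in (2, 5, 10):
    print(k, "SNPs ->", joint_configurations(k), "joint configurations")
```

With only 10 SNPs there are already 59049 possible joint genotypes, far more than the 2000 individuals available, which is why the joint distribution of many SNPs cannot be estimated directly.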
Our approach
● Reduce the data dimension by synthesizing the information carried by SNPs that are highly dependent because of LD.
→ Cliques of highly dependent SNPs are identified,
→ latent variables (LV) synthesize the information of the SNP cliques,
→ the data dimension is reduced.
(Figure: cliques of SNPs, each connected to a latent variable.)
● Provide a flexible probabilistic model, adapted to genetic data, to reduce the dimension.
→ Characteristics of the data: dependences organized in blocks of SNPs along the genome sequence.
→ Proposed modelling: Forest of Hierarchical Latent Class models (FHLCMs), with latent variables placed above the observed variables (SNPs).
● Advantages of this modelling:
→ hierarchical, thus:
- various degrees of dimension reduction,
- various degrees of LD strength,
→ each latent variable can reveal multiple-SNP patterns that are potentially relevant to explaining the disease,
→ unlike in a Hierarchical Latent Class model, SNPs are not constrained to be dependent upon one another,
→ high-order interactions between SNPs can be taken into account.
● Proposed algorithm to learn both the parameters and the structure of FHLCMs from data: CFHLC (Construction of Forests of Hierarchical Latent Class models).
→ based on an agglomerative hierarchical procedure to ensure scalability,
→ uses clique partitioning methods for an efficient discovery of non-overlapping cliques of dependent SNPs,
→ unlike the algorithm of Hwang et al., not restricted to binary variables and binary trees.
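The clique partitioning step can be pictured with the sketch below. It is not the CFHLC implementation: the slides do not specify the dependence measure, the threshold, or the partitioning method, so the squared-correlation graph, the 0.5 threshold and the greedy heuristic are all assumptions chosen for illustration, and the function names `dependence_graph` and `greedy_clique_partition` are hypothetical.

```python
import numpy as np

def dependence_graph(genotypes, threshold):
    """Boolean adjacency matrix linking SNP pairs whose squared correlation
    exceeds `threshold` (assumed dependence criterion)."""
    r = np.corrcoef(genotypes, rowvar=False)
    adj = (r ** 2) > threshold
    np.fill_diagonal(adj, False)
    return adj

def greedy_clique_partition(adj):
    """Partition the nodes into non-overlapping cliques with a greedy heuristic.

    Nodes are visited by decreasing degree; each unassigned node seeds a clique
    that absorbs every unassigned node adjacent to all current members.
    """
    n = adj.shape[0]
    order = sorted(range(n), key=lambda v: -int(adj[v].sum()))
    unassigned = set(range(n))
    cliques = []
    for seed in order:
        if seed not in unassigned:
            continue
        clique = [seed]
        unassigned.remove(seed)
        for v in order:
            if v in unassigned and all(adj[v, u] for u in clique):
                clique.append(v)
                unassigned.remove(v)
        cliques.append(sorted(clique))
    return cliques

# Hypothetical toy data: two blocks of three dependent SNPs plus one isolated SNP.
rng = np.random.default_rng(2)
n = 2000

def noisy_copy(base, rate=0.05):
    """Copy of `base` with a fraction `rate` of genotypes redrawn at random."""
    out = base.copy()
    mask = rng.random(n) < rate
    out[mask] = rng.binomial(2, 0.4, size=mask.sum())
    return out

block_a = rng.binomial(2, 0.4, size=n)
block_b = rng.binomial(2, 0.4, size=n)
columns = [noisy_copy(block_a) for _ in range(3)]
columns += [noisy_copy(block_b) for _ in range(3)]
columns.append(rng.binomial(2, 0.4, size=n))
toy_genotypes = np.column_stack(columns)

cliques = greedy_clique_partition(dependence_graph(toy_genotypes, threshold=0.5))
print(cliques)
```

In an agglomerative scheme of the kind described above, each clique containing several SNPs would then be summarized by one latent variable, and the same step would be repeated on the resulting layer of variables. A greedy heuristic is used here only because it is short; it does not guarantee a minimum number of cliques.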
Schematic of the algorithm (figure).
Results and discussion
● Test protocol:
→ C++ implementation,
→ run on a standard PC (3.8 GHz, 3.3 GB RAM),
→ tested on simulated unphased genotypic data consisting of 2000 individuals and 1k, 10k or 100k SNPs, generated with the software Hapsimu.
Scalability
Visual display of an FHLCM learned on a 100-SNP sequence (figure). Labels: high-dependence regions, low-dependence regions, high-order dependences, latent variables, observed variables (SNPs).
Conclusion and outlook
Conclusion:
● The CFHLC algorithm has been shown to be efficient on genome-scale data.
● It can provide a data dimension reduction of 80%.
Outlook:
● Application to the detection of genetic associations, using the FHLCM's latent variables.
● Visualization of the LD structure through the FHLCM's graph.
Thank you for your attention
Questions
Impact of window size on running time
Impact of window size on dimension reduction
Bibliography
General references on GWAS:
- Balding D. (2006): A tutorial on statistical methods for population association studies.
Specific to probabilistic graphical models:
- Verzilli (2007): Bayesian graphical models for genome-wide association studies.
- Hwang (2006): Learning hierarchical Bayesian networks for large-scale data analysis.