Genome Wide Haplotype analyses of human complex diseases with the EGEE grid

Tregouet David (david.tregouet@upmc.fr), INSERM UMRS937


  1. Enabling Grids for E-sciencE
     Genome Wide Haplotype analyses of human complex diseases with the EGEE grid
     Tregouet David, david.tregouet@upmc.fr
     INSERM UMRS937, UPMC, Paris, France
     www.eu-egee.org
     EGEE and gLite are registered trademarks
     EGEE-III INFSO-RI-222667

  2. Genome Wide Association Studies (GWAS)
     • Principle: Testing the association between a large number (~500K) of single nucleotide polymorphisms (SNPs) and a variable of interest (e.g. a disease) in a large cohort of individuals
     • How? Estimate the SNP allele frequencies in cases and controls and calculate the corresponding statistical test, yielding a p-value
     • SNP definition: Genetic variation in a DNA sequence that occurs when a single nucleotide (base: A, C, G, T) in a genome is altered. Often considered as a binary 0/1 variable
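The "How?" step above can be sketched in a few lines. The slide does not say which statistical test is used; this sketch assumes a Pearson chi-square test with 1 degree of freedom on a 2x2 table of allele counts. The function name `allelic_chi2` and the counts in the usage lines are hypothetical.

```python
import math


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square test (1 df) comparing allele counts in cases and controls.

    case_alt / ctrl_alt: number of copies of the coded allele (SNP value 1).
    case_total / ctrl_total: total number of alleles (2 per individual).
    Assumes both alleles are observed at least once, otherwise an expected
    count is zero and the statistic is undefined.
    """
    table = [
        [case_alt, case_total - case_alt],
        [ctrl_alt, ctrl_total - ctrl_alt],
    ]
    row_sums = [case_total, ctrl_total]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = case_total + ctrl_total

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts for one SNP (not taken from the study).
chi2, p = allelic_chi2(case_alt=1200, case_total=3852, ctrl_alt=1650, ctrl_total=5876)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

In a GWAS this test would be repeated for every SNP, which is why the resulting p-values then need correction for multiple testing.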

  3. GWAS' main limits
     • Only single SNP associations are tested
     • May miss 'haplotypic' interaction between SNPs located in the same gene (or region)
       – Haplotype: Combination of alleles on a given chromosome
       – For example, with 2 SNPs (C/T & G/A) → 4 haplotypes: C-G, C-A, T-G, T-A
     One may want to test for difference in haplotype frequencies between cases and controls
     It may happen that only one haplotype is at risk

  4. Genome Wide Haplotype Analysis (GWHAS)
     • Is it possible?
       2 SNPs: up to 4 haplotypes (i.e. 00|01|10|11)
       3 SNPs: up to 8 haplotypes (i.e. 000|001|010|011|100|101|110|111)
       In a window (e.g. a gene or a region) of n SNPs, up to 2^n haplotypes
     • Yes... but a large number of tests / comparisons have to be carried out to identify which combination of SNPs is the best predictor for the disease
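As a small illustration of the 2^n growth mentioned above, the possible haplotypes over n binary SNPs can be enumerated directly. This is only a sketch; the function name `enumerate_haplotypes` is hypothetical and not part of the original tooling.

```python
from itertools import product


def enumerate_haplotypes(n_snps):
    """Return every possible haplotype over n_snps binary (0/1) SNPs as a string."""
    return ["".join(bits) for bits in product("01", repeat=n_snps)]


print(enumerate_haplotypes(2))
for n in (2, 3, 10):
    # The number of possible haplotypes doubles with each added SNP.
    print(n, "SNPs:", len(enumerate_haplotypes(n)), "possible haplotypes")
```

In real data far fewer haplotypes are usually observed than the theoretical maximum, but the number of SNP combinations to test still grows quickly.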

  5. Genome Wide Haplotype Analysis (GWHAS)
     • Is it possible?
       Example: In a window of 10 adjacent SNPs, restricting haplotypes to at most 4 SNPs leads to 375 combinations to be tested:
       2 SNPs: [SNP1 + SNP2], [SNP1 + SNP3], ..., [SNP1 + SNP10], [SNP2 + SNP3], ..., [SNP2 + SNP10], ..., [SNP9 + SNP10]
       3 SNPs: [SNP1 + SNP2 + SNP3], [SNP1 + SNP2 + SNP4], ..., [SNP1 + SNP9 + SNP10], [SNP2 + SNP3 + SNP4], ..., [SNP3 + SNP6 + SNP8], ..., [SNP8 + SNP9 + SNP10]
       4 SNPs: [SNP1 + SNP2 + SNP3 + SNP4], ..., [SNP1 + SNP6 + SNP7 + SNP10], ..., [SNP7 + SNP8 + SNP9 + SNP10]
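The figure of 375 can be checked by counting the subsets of 2, 3 and 4 SNPs among 10: C(10,2) + C(10,3) + C(10,4) = 45 + 120 + 210 = 375. The sketch below enumerates them; the function name `snp_combinations` and its parameters are hypothetical, and single SNPs are assumed to be excluded because they are already covered by the standard GWAS.

```python
from itertools import combinations
from math import comb


def snp_combinations(window, min_size=2, max_size=4):
    """Yield every combination of min_size..max_size SNPs taken from one window."""
    for k in range(min_size, max_size + 1):
        yield from combinations(window, k)


window = [f"SNP{i}" for i in range(1, 11)]  # SNP1 .. SNP10
combos = list(snp_combinations(window))

print(len(combos))                              # enumerated count
print(sum(comb(10, k) for k in (2, 3, 4)))      # closed-form count
print(combos[0], combos[-1])                    # first and last combination
```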

  6. Genome Wide Haplotype Analysis (GWHAS)
     • GWHAS are possible but are extremely computationally demanding!
     • Distribution of the haplotypic calculations on EGEE
       – Development of an easy gLite interface
       – Python & Perl scripts for results visualization

  7. GWHAS on Coronary Artery Disease (CAD)
     • WTCCC data: 1926 CAD patients & 2938 healthy controls
     • 378,000 SNPs
     • Sliding windows approach on each chromosome
       Windows of size 10: SNPs 1 to 10, 2 to 11, 3 to 12, ..., (n-9) to n
       Haplotypes composed of up to 4 SNPs
     • Search for regions where haplotypes are stronger predictors of CAD risk than SNPs alone
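The sliding-window scan over one chromosome can be sketched as follows, assuming each window covers 10 consecutive SNPs and advances by one SNP at a time. The function name `sliding_windows` is hypothetical; positions are 1-based and inclusive, as on the slide.

```python
def sliding_windows(n_snps, size=10):
    """Yield (first, last) 1-based inclusive SNP positions of each window.

    Windows advance by one SNP; a chromosome with fewer than `size` SNPs
    yields no window.
    """
    for first in range(1, n_snps - size + 2):
        yield first, first + size - 1


# Toy chromosome with 12 SNPs: three windows of 10 SNPs each.
for first, last in sliding_windows(12):
    print(f"SNPs {first} to {last}")
```

Each window is an independent unit of work, which is what makes the analysis straightforward to distribute as separate grid jobs.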

  8. GWHAS on Coronary Artery Disease
     • 8.1 million combinations tested in less than 45 days (instead of more than 10 years on a single Pentium 4)
     • 29 regions where haplotypes could be better predictors than SNPs alone were identified
     • To control for false positives, replication was investigated in about 7000 CAD patients and 7000 controls
     • One region on chromosome 6 was confirmed

  9. Nature Genetics doi:10.1038/ng.314

  10. Conclusions
     • Genome Wide Haplotype Association Studies are now a reality thanks to the use of Grid technology
     • Using EGEE, we were able to identify a cluster of 3 genes where haplotypes are strongly associated with CAD risk (Tregouet et al. Nature Genetics March 2009)
     • Possibility to apply such a tool to other human diseases (Diabetes, Cancer...)
     • Possibility to use EGEE to investigate interactions between SNPs that are not necessarily in the same gene/region

  11. Credits
     François Cambien, Alexandru Munteanu, Laurence Tiret, UMRS 937, Claire Perret, Nilesh Samani, Heribert Schunkert, Inke König, Jeannette Erdmann, Andreas Ziegler, ..., UMR 8623, Cécile Germain, LRI
