Polymorphic variation in the human genome and susceptibility to disease
Samuel Deutsch, PhD, Department of Genetic Medicine and Development, University of Geneva
Human genome sequence: only the first phase! A consensus sequence for the species; annotation possible.
Human genome sequence: diversity. Very large amount of sequence variation in human populations, the key to human genetics. Types of variation: large-scale, microsatellites, indels, SNPs.
Why is sequence diversity important?
• Phenotype (normal variation, disease)
• Evolution
• Risk prediction, lifestyle
• Pharmacogenomics, personal medicine
• Forensics
Genes and disease: is a trait genetically determined?
Clear genetic effect:
• Autosomal dominant
• Fully penetrant
Sequence variation: most traits are not monogenic! Gene x environment interactions.
Association studies: is the trait genetically determined?
Compare the disease frequency in siblings of affected individuals with the frequency in the general population: $\lambda_s = f_{\text{sibs of affecteds}} / f_{\text{gen pop}}$
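The ratio on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the two frequencies in the usage line are hypothetical and are not taken from the slides.

```python
def sibling_relative_risk(f_sibs_of_affecteds: float, f_gen_pop: float) -> float:
    """Return lambda_s = f(sibs of affecteds) / f(general population).

    Both arguments are disease frequencies expressed as fractions in (0, 1].
    """
    if not 0 < f_gen_pop <= 1:
        raise ValueError("general-population frequency must be in (0, 1]")
    if not 0 <= f_sibs_of_affecteds <= 1:
        raise ValueError("sibling frequency must be in [0, 1]")
    return f_sibs_of_affecteds / f_gen_pop


# Hypothetical frequencies: 6% of siblings of affecteds, 0.5% of the population.
lambda_s = sibling_relative_risk(0.06, 0.005)
print(f"lambda_s = {lambda_s:.1f}")
```

A value well above 1 suggests familial clustering, although shared environment can contribute to it as well as shared alleles.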
Association studies: is the trait genetically determined?
Disease frequency due to genome sharing:
Disease                      λs
Schizophrenia                12
Asthma                        8
Type I diabetes              12
Crohn's disease              25
Multiple sclerosis           24
Aortic stenosis              59
Ventricular septal defect    25
Cleft lip                    40
λs can be broken down further into multiple loci!
Association studies: is the trait genetically determined?
Disease frequency in monozygotic (MZ) versus dizygotic (DZ) twins. Monozygotic twins share 100% of alleles; dizygotic twins share 50% of alleles.
% concordance           MZ   DZ
Epilepsy                70    6
Multiple sclerosis      18    2
Type 1 diabetes         40    5
Schizophrenia           53   15
Osteoarthritis          32   16
Rheumatoid arthritis    12    3
Psoriasis               72   15
Is a quantitative trait genetically controlled?
Total variance of a trait: $V_P = V_E + V_G$
What fraction is genetic? $h^2 = V_G / V_P$
Heritability can be calculated using variance-component (VC) methods.
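As a minimal sketch of the two formulas on this slide, assuming the genetic and environmental variance components have already been estimated. The function name and the numbers are illustrative only; this is not a variance-component estimation method.

```python
def heritability(v_g: float, v_e: float) -> float:
    """Return h^2 = V_G / V_P, where V_P = V_E + V_G."""
    if v_g < 0 or v_e < 0:
        raise ValueError("variance components must be non-negative")
    v_p = v_e + v_g  # total phenotypic variance
    if v_p == 0:
        raise ValueError("total variance is zero; h^2 is undefined")
    return v_g / v_p


# Hypothetical variance components, in squared trait units.
print(f"h2 = {heritability(v_g=30.0, v_e=20.0):.2f}")
```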
Heritability [Figure: covariance for phenotype plotted against kinship (calculated as average IBD); slope = h2r]
Sequence variation: types and uses. Microsatellites
• Variation in number of repeats
• Multi-allelic in population
• Highly informative
• Mostly non-functional
• Most useful for family studies
Can be used for LINKAGE ANALYSIS.
[Figure: pedigree labelled with microsatellite genotypes 2,5 and 3,8, then 2,3; 3,5; 2,8; 2,3]
Sequence variation: linkage analysis. Panel of microsatellites evenly spaced throughout the genome. Look at co-segregation patterns of disease with alleles of specific markers. [Figure: chromosome map]
Sequence variation: linkage analysis. Co-segregation of alleles with disease depends on:
1. Chromosomal localisation.
2. Physical/genetic distance between marker and disease locus.
[Figure: alignment and chiasma (crossing-over)]
Sequence variation: linkage analysis. Minimal recombination region.
Sequence variation: linkage analysis. LOD score calculated by maximum likelihood: LOD = log10 (likelihood of the observation under linkage / likelihood of the observation by chance). LOD > 3 is usually considered to be significant on a genome-wide basis.
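A minimal sketch of a two-point LOD score, under simplifying assumptions that are not stated on the slide: meioses are fully informative and phase-known, so the data reduce to a count of recombinants, and "by chance" means free recombination (theta = 0.5). The function names and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for k recombinants among n informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_chance = n * math.log10(0.5)  # free recombination, i.e. no linkage
    return log_l_theta - log_l_chance


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Maximise the LOD over a grid of theta values from 0.01 to 0.50."""
    thetas = [i / 100 for i in range(1, 51)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


# Hypothetical family data: 1 recombinant observed in 20 informative meioses.
theta_hat, lod = max_lod(1, 20)
print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

Real pedigrees usually involve unknown phase, missing genotypes and incomplete penetrance, which dedicated linkage software handles; the grid search here only illustrates the likelihood-ratio idea.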
Mapping monogenic disorders: great success story! Examples include:
• Cystic fibrosis (7q31)
• Muscular dystrophy (X)
• Parkinson's disease (4q21)
• Deafness (about 45 different loci!)
[Figure: genes with mutations causing human disorders: 1794 (DEC-05); total ~25,000 genes]
Linkage analysis: limits
Sequence variation: SNPs
• Variation in single position
• Bi-allelic in population
• Less informative
• Can be functional
• Most useful in population studies
Most common type of variation: any two chromosomes differ every 600 bp (about 10 million genome-wide).
Functional consequences of variation. Sequence variation (SNPs, deletions/duplications, repeats, transposable elements) can be:
• Coding variation leading to protein changes
• Non-coding variation affecting transcription of genes
• Non-coding variation affecting chromatin structure
Sequence variation: SNPs
Population-based association studies
• If an allele i in gene x is involved in disease pathogenesis, one expects a significant increase in its frequency in affected groups vs. controls.
[Figure: genotypes of controls (N=60) and patients (N=60); allele frequency f(i) compared by χ² test or logistic regression]
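The χ² comparison of allele frequencies can be sketched as follows. The allele counts are hypothetical (the slide gives only N=60 per group, i.e. 120 chromosomes each), and the test shown is a plain Pearson χ² on a 2x2 table of allele counts without continuity correction, which may differ from the analysis behind the slide.

```python
import math


def allele_chi_square(case_i: int, case_other: int,
                      control_i: int, control_other: int) -> tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_i, case_other], [control_i, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele i on 48 of 120 patient chromosomes
# and on 30 of 120 control chromosomes.
chi2, p = allele_chi_square(48, 72, 30, 90)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Logistic regression, the other method named on the slide, would additionally allow adjustment for covariates.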
Population-based association studies. Two main approaches:
• Candidate gene: limited set of SNPs in a set of candidate genes. In general gives an incomplete picture of phenotype determination.
• Indirect association: genome-wide set of SNPs, no prior hypothesis; could potentially give a complete view of phenotype determination. Depends on LD. Only possible with important technology advances.
Association studies: linkage disequilibrium
LD can be measured in several ways. For association studies, $r^2$ (coefficient of determination) is most common:
$r^2 = \frac{[f(AB) - f(A)\,f(B)]^2}{f(A)\,f(a)\,f(B)\,f(b)}$
8 'tag' SNPs for 50 SNPs in region.
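The r² formula above translates directly into code. This sketch assumes two bi-allelic loci, so that f(a) = 1 - f(A) and f(b) = 1 - f(B); the function name and the example frequencies are illustrative.

```python
def ld_r_squared(f_ab: float, f_a: float, f_b: float) -> float:
    """r^2 = [f(AB) - f(A) f(B)]^2 / (f(A) f(a) f(B) f(b)).

    f_ab is the frequency of the A-B haplotype; f_a and f_b are the
    frequencies of alleles A and B at the two loci.
    """
    denominator = f_a * (1 - f_a) * f_b * (1 - f_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    d = f_ab - f_a * f_b  # deviation from independence between the loci
    return d * d / denominator


# Hypothetical haplotype and allele frequencies.
print(f"r2 = {ld_r_squared(f_ab=0.4, f_a=0.5, f_b=0.5):.2f}")
```

An r² close to 1 means one SNP predicts the other almost perfectly, which is what allows a small set of 'tag' SNPs to stand in for many SNPs in a region.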
Association studies: HapMap project. Ultimate goal: find the minimal set of SNPs that capture most of the sequence variation information to perform association studies. www.hapmap.org