deCODE genetics
Rare and common variants in complex genetics: the deCODE experience
Unnur Styrkarsdottir, PhD
deCODE Genetics/Amgen, Reykjavik, Iceland
Rotterdam, November 16th, 2017
1. deCODE genetics (Íslensk erfðagreining) 1996 – Owned by Amgen
2. An example - rare variant and an indel associated with osteoarthritis
20 years of genotyping
• Microsatellite panel 1998
• Single nucleotide polymorphism array 2005
• Whole genome sequencing 2010
• RNA sequencing 2013
• Methylome, Metabolomics, Lipidomics ...
The Icelandic genetics project at deCODE
Iceland = 340,000 inhabitants
• A founder population
• Genealogy of Icelanders, the "Book of Icelanders"
  – Church + Census (750,000 individuals)
• Biological samples from 160,000 Icelanders
• Large body of phenotypic information
• Phasing of the genomes and assignment of parent of origin
Human Phenotype Data
• Over 400 diseases, including sub-phenotypes - close to full spectrum of common medical conditions
  • Hospital records (discharge diagnosis, medical records, pathology)
  • Registries (e.g. cancer from 1955, death registry, RAI, drug database)
  • Questionnaire based
• Over 600 quantitative traits
  • Anthropometry (height, weight, BMI, body metrics)
  • Clinical biological traits routinely measured in blood and urine, e.g. lipids
  • Computerized electrocardiogram (ECG) and Holter data, echocardiogram
  • DXA, MRI, cognition
Human sequence variation data
• 160,000 chip typed
• 50,000 WGS at ~30X (28,075 in current association freeze)
• Sequence variations identified by WGS are imputed into the chip-typed Icelanders (and their relatives without genotype information = familial imputation), assisted by long-range phasing
• mRNA sequence data from blood (n=4,000), adipose (n=1,000) and atrial tissue (n=200)
[Figure: pedigree diagram with father, mother and children]
• Dataset on diversity in the sequence: whole-genome sequence (20-30x)
• Dataset on diversity in the phenotype: >1500 phenotypes
• Phenotype-genotype correlation
Information for a whole nation
deCODE’s analysis pipeline
[Flow diagram. Recoverable labels: deCODE Lab; samples; sequencing; alignment; BAM files; sequence genotyping; sequence genotypes; chip genotyping; chip genotypes; marker annotation; annotated sequence markers; genealogy; long-range phasing (LRP); imputation; imputed genotypes; sample annotation; phenotype processing; phenotype lists; association; association results; functional mutations; automatic processing; manual processing (data freeze)]
deCODE GWAS: Complex traits association with common variants
Type 2 diabetes, Melanoma, Myocardial infarction/CAD, Squamous cell carcinoma, Abdominal aortic aneurysm, Schizophrenia, Intracranial aneurysm, Urinary bladder cancer, Atrial fibrillation, Asthma, Dementia, Stroke, Basal cell carcinoma, Nicotine addiction, BMI, Lung cancer, Menarche, Peripheral arterial disease, Thyroid cancer, Prostate cancer, Essential tremor, Breast cancer, Exfoliation glaucoma, Chronic renal failure, Restless leg syndrome, Heart block, Osteoporosis/BMD, Primary open angle glaucoma, Open angle glaucoma, Coffee consumption, Height, Pigmentation
deCODE GWAS: Complex traits association with rare variants
Ovarian cancer, Stomach cancer, Glioma, Waldenström’s macroglobulinemia, Basal cell carcinoma of the skin, Height, Prostate cancer, Cholesterol and other biological traits, Cancer of the biliary tract, Kidney stones, Chronic lymphocytic lymphoma, Myocardial infarction, Alzheimer’s disease, Hip replacement, Osteoporosis, ADHD, Type 2 diabetes, Sudden cardiac death, Atrial fibrillation, Osteoarthritis, Gout, Age-related macular degeneration, Dyslexia, Schizophrenia, Autism
Other deCODE data analyses
• Recombination rate
• Gene conversions
• Mutation rate
• De novo mutations
• Parental origin
• Reproductive success
• Selection
• Variant landscape in individuals (LoF, missense, etc.)
• Sequences not found in reference genome (non-repetitive)
De novo mutation rate and parents' sex and age
WGS of 1,548 individuals and their parents and, for a subset of 225, at least one child
70 de novo mutations per individual, on average
The number of de novo mutations increases with the age of both fathers (1.5 per year) and mothers (0.37 per year)
Jonsson et al., Nature 549, 519–522 (2017)
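The two per-year slopes can be turned into a quick back-of-the-envelope comparison. The following is a minimal sketch that uses only the slopes quoted on the slide; the function name and the purely linear extrapolation are illustrative assumptions, and no intercept is assumed, so it only compares two parental-age scenarios rather than predicting an absolute count.

```python
# Slopes quoted on the slide: extra de novo mutations per year of parental age.
PATERNAL_SLOPE = 1.5
MATERNAL_SLOPE = 0.37


def extra_de_novo_mutations(father_years_older: float, mother_years_older: float) -> float:
    """Expected additional de novo mutations when the parents are older by the
    given number of years, assuming a linear age effect (an assumption)."""
    return PATERNAL_SLOPE * father_years_older + MATERNAL_SLOPE * mother_years_older


# Both parents 10 years older at conception.
print(f"{extra_de_novo_mutations(10, 10):.1f}")  # prints 18.7
```

Under these assumptions most of the difference comes from the father: 15 of the 18.7 additional mutations.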
Rare SNP and recessive indel - Osteoarthritis
[Workflow figure. Recoverable labels: WGS; chip array; imputation; association using different models; rare variant; indel; RNAseq; public data]
Nature Genetics 49, 801–805 (2017)
GWAS approach
• WGS from 8,453 individuals → 31.6 million variants under the multiplicative model and 19.2 million variants under the recessive model
• Impute into 150,656 chip-typed individuals and close relatives (294,212 untyped)
• Select 4,657 total hip replacement (THR) cases and 207,514 controls
• Association analysis: multiplicative (additive) and recessive
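The difference between the two association models comes down to how each genotype is coded before testing. The sketch below is a simplified stand-in and not the deCODE pipeline: it uses hard-called toy genotypes and a 2x2 odds ratio with a Haldane-Anscombe 0.5 correction instead of regression on imputed genotype probabilities. All function names and the toy data are hypothetical.

```python
from typing import Sequence


def code_genotype(alt_allele_count: int, model: str) -> int:
    """Code a genotype (0, 1 or 2 copies of the tested allele) for a given model."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1 or 2")
    if model == "multiplicative":
        # Each extra copy multiplies the risk: use the allele count itself.
        return alt_allele_count
    if model == "recessive":
        # Only homozygous carriers count as exposed.
        return 1 if alt_allele_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


def recessive_odds_ratio(cases: Sequence[int], controls: Sequence[int]) -> float:
    """Odds ratio of homozygous carriers versus all others, with 0.5 added to
    every cell (Haldane-Anscombe correction) so empty cells do not divide by zero."""
    a = sum(code_genotype(g, "recessive") for g in cases)     # homozygous cases
    b = len(cases) - a                                        # other cases
    c = sum(code_genotype(g, "recessive") for g in controls)  # homozygous controls
    d = len(controls) - c                                     # other controls
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Toy data only: 2 of 4 cases and 1 of 8 controls are homozygous.
toy_cases = [2, 2, 1, 0]
toy_controls = [2, 1, 0, 0, 0, 0, 0, 0]
print(recessive_odds_ratio(toy_cases, toy_controls))
```

A heterozygote contributes 1 under the multiplicative coding but 0 under the recessive coding, which is why a variant can give a strong signal under one model and none under the other.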
Rare SNP and recessive indel - Osteoarthritis
• Rare variant only found in Iceland
• An insertion (8 bp) not present on chips or in other sequencing datasets (at the time)
A 0.026% variant in COMP - p.Asp369His
• Initial imputation information was 0.952 with OR = 10.3 and P = 3.1 × 10^-9
• Validate imputation by directly genotyping 82 likely and possible carriers and 253 predicted non-carriers. Add the directly assessed genotypes to the training set for re-imputing.
• Final info 0.996 with OR = 16.7 and P = 4.0 × 10^-12
• COMP gene encodes cartilage oligomeric matrix protein = high prior evidence
[Figure: COMP pedigree, generations I-VII]
Recessive model – very strong signal
• 8 bp insertion in CHADL gene, p.Val330GlyfsTer106 (frameshift)
• 3.9% allele frequency / 0.15% homozygous (recessive)
• Not present in ExAC
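The 0.15% figure is what a 3.9% allele frequency implies for homozygotes under Hardy-Weinberg equilibrium. A minimal sketch of that arithmetic follows; the function name is illustrative, and Hardy-Weinberg proportions are an assumption here, not something stated on the slide.

```python
def homozygote_frequency(allele_frequency: float) -> float:
    """Expected homozygote frequency under Hardy-Weinberg equilibrium: q squared."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return allele_frequency ** 2


# 3.9% allele frequency of the CHADL insertion in Iceland.
print(f"{homozygote_frequency(0.039):.2%}")  # prints 0.15%
```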
Sequencing coverage matters
[Figure, panel a: black line = total coverage; red line = GC content]
[Figure, panel b: black line = standard TruSeq; blue line = TruSeq Nano; red line = PCR-free]
Association driven by rs532464664[insGGCGCGCG]
None of the other variants associated significantly after accounting for the effect of the homozygous state of the rs532464664[insGGCGCGCG] allele (P_adj)
Full length CHADL transcript in cartilage eQTL
CHADL transcript degraded by nonsense mediated decay
Present today in gnomAD browser
• We genotyped 10,000 non-Icelandic samples in order to estimate the frequency in other populations
• 2% in other European populations compared to 4% in Iceland = founder effect
• No power to assess recessive association in other populations
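The loss of power follows from how quickly the number of homozygotes drops when the allele frequency halves. The sketch below gives a rough count, assuming Hardy-Weinberg proportions and, purely for illustration, a sample of 10,000 individuals in each population. The function name is hypothetical, and the 10,000 genotyped samples on the slide were used for frequency estimation, not as a case-control set.

```python
def expected_homozygotes(n_individuals: int, allele_frequency: float) -> float:
    """Expected number of homozygous carriers under Hardy-Weinberg equilibrium."""
    return n_individuals * allele_frequency ** 2


for label, freq in (("other European populations", 0.02), ("Iceland", 0.04)):
    count = expected_homozygotes(10_000, freq)
    print(f"{label}: about {round(count, 1)} homozygotes per 10,000")
```

Halving the allele frequency cuts the expected number of homozygotes to a quarter, so a recessive test needs a much larger sample outside Iceland.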
Summary COMP - CHADL
• A rare missense SNP in COMP gene associates with THR in Iceland
  • Population specific and in one extended pedigree
  • Gene with high prior evidence
  • Re-genotyping and re-imputation beneficial
• An 8 bp indel in CHADL associates with THR in a recessive manner
  • Sequencing coverage crucial
  • RNA seq shows expression of full length transcript in cartilage
  • Transcript degraded through nonsense mediated decay
• Variants identified through whole genome sequencing used for association analyses
• Population specific imputation based on genealogy and long range haplotypes
Summary deCODE genetics
• Long standing experience of human genetics
• Rich phenotype and genotype data
• Association of common and rare variants with human diseases and traits
• Replications and meta-analyses
• Other basic human genetics analyses
Thank you