
Searching for the genetic basis of complex traits in humans and - PowerPoint PPT Presentation

Searching for the genetic basis of complex traits in humans and primates Vasily Ramensky UCLA Center for Neurobehavioral Genetics 22/03/16 University of California Los Angeles Center for Neurobehavioral Genetics -- 35-year project aimed to


  1. Searching for the genetic basis of complex traits in humans and primates Vasily Ramensky UCLA Center for Neurobehavioral Genetics 22/03/16

  2. University of California Los Angeles Center for Neurobehavioral Genetics

  3. -- 35-year project aimed to decrease the global economic and health impact of depression by 50% by 2050 -- 100,000 individuals to be enrolled -- The largest UCLA research initiative thus far, with an anticipated budget of $525 million for the first 10 years

  4. Projects
     Finnish Metabolic Sequencing: genetic basis of quantitative metabolic traits in the Finnish population
     -- Targeted gene sequencing in >6,000 NFBC1966 members
     -- Whole exome sequencing in 20,000 individuals
     Vervet monkeys: non-human primates in biomedical research
     -- Whole genome sequencing of >700 members of the Vervet Research Colony
     Tourette Syndrome: genetic basis of Tourette Syndrome
     -- Exome and targeted sequencing of >100 members of large TS pedigrees
     -- GWAS of large TS cohorts
     Bipolar disorder: genetic factors that contribute to risk for bipolar disorder
     -- Whole genome sequencing of 450 members of large pedigrees from Colombia and Costa Rica with a severe form of bipolar disorder

  5. Finnish Metabolic Sequencing

  6. Finnish Metabolic Sequencing Northern Finland Birth Cohort 1966 -- Founder population that settled Northern Finland in the 1600s -- Genetic isolate, homogeneous in genetic and environmental background, enriched in potentially damaging variants -- Birth cohort: age is not a confounder; longitudinal data -- Quantitative heritable traits: * body mass index, * fasting serum concentrations of lipids, * glucose and insulin, * inflammation (CRP), * blood pressure

  7. Finnish Metabolic Sequencing GWAS in NFBC66: Sabatti et al., 2009 31 associations to 6 traits, 9 associations previously unreported

  8. Finnish Metabolic Sequencing GWAS in NFBC66: Sabatti et al., 2009

  9. Finnish Metabolic Sequencing GWAS in NFBC66: Sabatti et al., 2009 Identified loci explained little of trait variability => contribution of rare variants?

  10. Genetic architecture of complex traits

  11. Finnish Metabolic Sequencing Targeted sequencing in NFBC66 and FUSION -- 78 genes in 6,121 samples, 17 loci on 10 chr -- 2,234 variants, 76% with MAF<=0.5% -- Single variant tests: variants with MAF>0.1% in additive genetic model -- Gene-level tests: missense variants with MAF<1% -- Goal: new single variant signals independent from GWAS or associations at the gene level
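The two frequency filters on this slide (single-variant tests for MAF > 0.1%, gene-level tests for missense variants with MAF < 1%) can be illustrated with a minimal sketch. It assumes genotypes are coded as alternative-allele counts (0, 1, 2); the function names, the record layout, and the toy variants are hypothetical and are not taken from the study's pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alt-allele counts; None means missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def assign_tests(variants, single_min_maf=0.001, gene_max_maf=0.01):
    """Split variants into single-variant and gene-level test sets by MAF."""
    single, by_gene = [], {}
    for v in variants:
        maf = minor_allele_frequency(v["genotypes"])
        # Single-variant tests: MAF above 0.1%.
        if maf > single_min_maf:
            single.append(v["id"])
        # Gene-level tests: polymorphic missense variants with MAF below 1%.
        if v["consequence"] == "missense" and 0 < maf < gene_max_maf:
            by_gene.setdefault(v["gene"], []).append(v["id"])
    return single, by_gene


# Toy data: 1,000 samples per variant (hypothetical gene and variant names).
variants = [
    {"id": "v1", "gene": "GENE_A", "consequence": "missense",
     "genotypes": [1] * 1 + [0] * 999},
    {"id": "v2", "gene": "GENE_A", "consequence": "missense",
     "genotypes": [1] * 10 + [0] * 990},
    {"id": "v3", "gene": "GENE_B", "consequence": "synonymous",
     "genotypes": [1] * 400 + [0] * 600},
]

single, by_gene = assign_tests(variants)
print("single-variant tests:", single)
print("gene-level sets:", by_gene)
```

A variant can fall into both sets: one with MAF between 0.1% and 1% is common enough for a single-variant test and rare enough to be aggregated at the gene level.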

  12. Finnish Metabolic Sequencing Targeted sequencing in NFBC66 and FUSION Why? -- Insertions and deletions -- Epistatic interactions -- Compound heterozygotes -- Testing all rare missense variants -- Non-coding regulatory variants

  13. Tourette Syndrome

  14. Tourette Syndrome -- An inherited disorder, childhood onset (prevalence 0.4-3.8%) -- Multiple physical (motor) and vocal tics -- Linkage studies of large families: genetic signal on chr2p -- No significant associations for coding exome variants -- Exome + targeted non-coding regions on chr2p in 109 individuals from 15 large TS families (65 affected, 35 not affected, 9 unknown) -- Genotyping of candidate variants in >700 individuals from sib-pair families (UCLA) -- GWAS studies in multiple cohorts

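  15. Tourette Syndrome Candidate variants in the chr2p region

      Pos, Mbp  Region    dbSNP AAF  Idx  Segregation, Aff (Fam)  Chi2   Epigenomic info
      ---------------------------------------------------------------------------------------------------------------
      59.1      FLJ30838  0.91%      5    9 (4)                   0.30   Enh H9 Neuronal Progen Cells (REMC); FunSeq enhancer
      60.5      AC007381  0.78%      2    8 (3)                   0.04   Fetal Brain Intron (REMC)
      60.8      N/A       9.4%       0    30 (10)                 0.001  LBL enh
      ---------------------------------------------------------------------------------------------------------------
      // Idx: conserv. mammals, primates, CADD, DANN, fatHMM-mkl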

  16. Jeremiah Scharf, Dongmei Yu

  17. Tourette Syndrome BrainSpan : RNA-seq in 524 prenatal and postnatal samples LINC01122 Brain regions Time points

  18. Tourette Syndrome BrainSpan : RNA-seq in 524 prenatal and postnatal samples BCL11A Brain regions Time points

  19. [Scatter plot] Normalized expression, (X - Xmean)/Xstdev: BCL11A vs. LINC01122, Rcorr = 0.723
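The normalization on this slide, (X - Xmean)/Xstdev, and the correlation coefficient it feeds into can be written out as a short sketch. It assumes the sample standard deviation (n - 1 in the denominator), which the slide does not specify, and the expression values below are made up for illustration; they are not BrainSpan data.

```python
import math


def zscores(values):
    """Normalize values as (X - Xmean) / Xstdev using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    stdev = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / stdev for x in values]


def pearson(x, y):
    """Pearson correlation as the scaled sum of products of z-scores."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical expression levels of two genes across five samples.
gene_a = [1.0, 2.5, 3.1, 4.8, 6.0]
gene_b = [0.9, 2.0, 3.5, 4.1, 6.3]

print("Rcorr =", round(pearson(gene_a, gene_b), 3))
```

Because both genes are put on the same unitless scale, their normalized profiles can be plotted against each other directly, as in the scatter plot described on the slide.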

  20. Tourette Syndrome Annotation of “anonymous” lincRNA
      1) Search for genes coexpressed with query Q:
      -- Threshold: genes with Rcorr > R0
      -- Forward: genes in Q's top x% // contaminated by “promiscuous” genes
      -- Reverse: genes for which Q is in top x%
      -- Reverse-back-reverse (Gene's best friends by Sasha Favorov)
      2) Check enriched GO terms for top ranked genes
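The threshold, forward, and reverse selections listed on this slide can be sketched as follows. The sketch assumes that pairwise correlations are already computed and uses a fixed top-k cutoff in place of the slide's top x%; the gene names and correlation values are hypothetical, and the reverse-back-reverse procedure is not attempted here.

```python
def build_corr(pairs):
    """Build a symmetric gene -> {partner: correlation} lookup from unordered pairs."""
    corr = {}
    for (g1, g2), r in pairs.items():
        corr.setdefault(g1, {})[g2] = r
        corr.setdefault(g2, {})[g1] = r
    return corr


def top_partners(corr, gene, k):
    """The k genes most strongly correlated with the given gene."""
    ranked = sorted(corr[gene], key=corr[gene].get, reverse=True)
    return set(ranked[:k])


def threshold_hits(corr, query, r0):
    """Threshold: genes whose correlation with the query exceeds r0."""
    return {g for g, r in corr[query].items() if r > r0}


def forward_hits(corr, query, k):
    """Forward: genes within the query's own top k."""
    return top_partners(corr, query, k)


def reverse_hits(corr, query, k):
    """Reverse: genes that have the query within their own top k."""
    return {g for g in corr if g != query and query in top_partners(corr, g, k)}


# Toy correlations: H is a "promiscuous" hub that correlates well with everything.
corr = build_corr({
    ("Q", "A"): 0.90, ("Q", "H"): 0.80, ("Q", "B"): 0.20, ("Q", "C"): 0.10,
    ("A", "H"): 0.85, ("A", "B"): 0.10, ("A", "C"): 0.00,
    ("H", "B"): 0.95, ("H", "C"): 0.97, ("B", "C"): 0.30,
})

print("threshold:", sorted(threshold_hits(corr, "Q", 0.5)))
print("forward:  ", sorted(forward_hits(corr, "Q", 2)))
print("reverse:  ", sorted(reverse_hits(corr, "Q", 2)))
```

In this toy example the hub H passes the threshold and forward selections for Q, but Q is not among H's own strongest partners, so the reverse selection drops it. That is the contamination by promiscuous genes that the slide notes for the forward ranking.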

  21. Tourette Syndrome GO annotations for reverse and forward ranks

  22. Sequencing in the VRC

  23. Vervet Research Colony N ~ 2 x 10^4

  24. Sequencing in the VRC Non-human primates vs. humans and rodents -- Low sequence divergence, syntenic blocks -- Phenotypic similarity (brain/behavior, infectious diseases, metabolism) -- Invasive studies are possible -- Controlled environment -- Longitudinal approaches are possible

  25. Sequencing in the VRC Examples of available phenotypes: -- Brain and behavior: MRI, CSF monoamines, novelty seeking, intruder challenge, anxiety, mother-infant interaction, sleep/circadian rhythms, cortisol, oxytocin -- Metabolism and growth: lipids, glycemic measures, adipokines/leptin, vitamin D, morphometry (BMI) -- Microbiome at multiple body sites -- Life history traits and disease history -- RNA-seq: eQTLs from multiple tissues

  26. Sequencing in the VRC Non-human primates vs. humans and rodents -- Low sequence divergence, syntenic blocks -- Phenotypic similarity (brain/behavior, infectious diseases, metabolism) -- Invasive studies are possible -- Controlled environment -- Longitudinal approaches are possible -- No reference datasets (dbSNP, Encode, etc.) -- Not all tools work for highly inbred populations

  27. Sequencing in the VRC Blue : Founders. Orange : sequenced monkeys, size ~ coverage

  28. Sequencing in the VRC
      -- WGS of >700 samples with varying coverage (1..30x)
      -- Reference genome C. sabaeus 1.1: 29 + 2 chr
      Workflow:
      -- Raw variant calling with GATK, genotype refinement in trios
      -- Postprocessing: genotype conflicts, Mendelian errors, low quality
      -- Phasing in 99 = 82 HC + 17 LC samples with Beagle
      -- Phasing and imputation in 620 LC samples, with the 99 as reference haplotypes
      -- Postprocessing: Mendelian errors, QC, quality flags
      -- Two independent call sets: 16.7 million SNVs genome-wide; 1.3 million extended exome SNVs and indels
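The Mendelian error checks mentioned in the postprocessing steps can be illustrated with a small sketch. It is not the GATK or Beagle logic used in the project; it only shows the underlying rule for a biallelic autosomal site, assuming genotypes are coded as alternative-allele counts (0, 1, 2). The function names and the toy trio are hypothetical.

```python
def transmittable_alleles(genotype):
    """Alt-allele counts (0 or 1) that a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(
        m + f == child
        for m in transmittable_alleles(mother)
        for f in transmittable_alleles(father)
    )


def count_mendelian_errors(trio_sites):
    """Count sites where the (child, mother, father) genotypes conflict."""
    return sum(1 for c, m, f in trio_sites if not is_mendelian_consistent(c, m, f))


# Toy trio: (child, mother, father) genotypes at four sites.
sites = [(1, 0, 2), (1, 0, 0), (2, 1, 1), (0, 2, 0)]
print("Mendelian errors:", count_mendelian_errors(sites))
```

Sites flagged this way are candidates for filtering or genotype correction; a real pipeline would also have to handle missing genotypes, sex chromosomes, and multiallelic sites.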

  29. Sequencing in the VRC Variant annotation

      Type                 Variants      %
      -------------------------------------
      COMPLEX                50,502    3.7
      DEL                   133,861    9.8
      INS                    69,993    5.1
      SNV                 1,114,235   81.4

      NR annotation        Variants      %
      -------------------------------------
      Upstream-1000         325,968   23.8
      Downstream-1000       284,953   20.8
      Intron                174,171   12.7
      3-UTR                 167,523   12.2
      Non-coding            144,099   10.5
      5-UTR                 102,395    7.5
      Synon                  79,477    5.8
      Missense               75,436    5.5
      Coding-exon-indel      10,325    0.8
      Stop-gain               1,514    0.1
      Donor                   1,352    0.1
      Acceptor                1,191    0.1
      Stop-loss                 187    0.0
      -------------------------------------
      Total               1,368,591

  30. Sequencing in the VRC Alternative allele count distributions by type

  31. Sequencing in the VRC Constrained human genes in vervets -- ExAC: exomes of 60,706 humans -- 3,230 genes depleted of PTVs (protein-truncating variants: indels, splice site, stop gain) -- 3,118 constrained genes (96.5%) have vervet orthologs -- Of them, 1,256 vervet genes harbor 2,212 PTVs (total 13,665) -- Genes with multiple PTVs: not constrained in vervets? Genes with few PTVs: check respective phenotypes
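The proportions behind this slide follow directly from the counts it reports, and the per-gene tally that separates genes with multiple PTVs from genes with few can be sketched in a few lines. The counts are taken from the slide; the gene identifiers, the toy PTV calls, and the function name are hypothetical.

```python
from collections import Counter

# Counts reported on the slide.
constrained_human_genes = 3230
with_vervet_ortholog = 3118
orthologs_with_ptv = 1256
ptvs_in_constrained = 2212
ptvs_total = 13665

share_with_ortholog = 100 * with_vervet_ortholog / constrained_human_genes
share_orthologs_with_ptv = 100 * orthologs_with_ptv / with_vervet_ortholog
share_ptvs_in_constrained = 100 * ptvs_in_constrained / ptvs_total

print(f"constrained genes with a vervet ortholog: {share_with_ortholog:.1f}%")
print(f"of those, genes harboring a PTV: {share_orthologs_with_ptv:.1f}%")
print(f"share of all PTVs in constrained genes: {share_ptvs_in_constrained:.1f}%")


def ptv_counts(ptv_genes, constrained):
    """Number of PTVs per constrained gene, given one gene label per PTV call."""
    return Counter(g for g in ptv_genes if g in constrained)


# Toy example: one gene label per observed PTV (hypothetical names).
constrained = {"GENE_A", "GENE_B", "GENE_C"}
ptv_genes = ["GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_X"]

counts = ptv_counts(ptv_genes, constrained)
print(counts.most_common())
```

Sorting the tally from most to fewest PTVs gives the two groups the slide distinguishes: genes with multiple PTVs, which may not be constrained in vervets, and genes with few PTVs, whose carriers are worth checking for phenotypes.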

  32. Sequencing in the VRC Unconstrained genes with many PTVs

  33. Sequencing in the VRC Constrained genes with many PTVs

  34. Sequencing in the VRC Alt allele counts for PTVs

  35. New methods to interpret genome variation

  36. New methods to interpret variation Protein-truncating variants: why are they tolerated? Data: -- ExAC: ~60,000 human exomes -- Vervets: ~15,000 PTVs in 719 exomes -- Available microexon data Approach -- Protein structure: models and features

  37. New methods to interpret variation Good old missense variants Motivation? -- Prediction targeted at specific protein families -- Need to explain the mechanism -- Account for intragenic compensation -- Traditional training sets need revision

  38. New methods to interpret variation Good old missense variants Motivation? -- Prediction targeted at specific protein families -- Need to explain the mechanism -- Account for intragenic compensation -- Traditional training sets need revision Data: -- New and emerging: NGS-based (ExAC) -- Old and forgotten: functional experiments // How does an impact on biochemical function translate to the clinical and population levels?

  39. New methods to interpret variation Compensated pathogenic deviation -- A source of prediction errors for existing methods -- Fundamental mechanism of protein evolution and resistance development for pathogens Data: -- Protein mutation databases: functional effect of M1, M1+M2 … -- Literature-based

  40. New methods to interpret variation Non-coding variation Data -- Genome sequence markup: genes and their elements -- Population-based variant frequencies (dbSNP, WGS) -- Genotype-phenotype associations (ClinVar, eQTLs, GWAS) -- Comparative genomics: conservation -- TF binding sites: experimental (ChIP-seq) and predicted -- Epigenomics data (REMC, ENCODE) Problems -- Training sets -- Tissue specificity
