
Cindy G. Boer Genetic Laboratory Internal Medicine Erasmus MC - PowerPoint PPT Presentation



  1. Gene Regulation, Epigenetics & Databases Cindy G. Boer Genetic Laboratory Internal Medicine Erasmus MC

  2. Congratulations! A genome-wide significant GWAS hit! (and what to do now?)

  3. GWAS identifies SNPs, not genes! We want to know: causal gene & disease mechanism. This question presents us with 2 problems: 1. What is the causal variant? 2. What is the causal cell type(s)? Causal variant → Causal cell type → Causal gene

  4. Identification of causal variant? LocusZoom plot • LD structure plotted • SNPs in high LD (r² > 0.8 or r² > 0.6) Linkage Disequilibrium (LD) • GWAS: association between disease trait and (tag) SNP – Array designed on LD structure, not on functional SNPs (imputation) • None, a few, tens or even hundreds of SNPs in LD with the top GWAS SNP! Castaño Betancourt et al. (2016), PLOS Genetics
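The LD thresholds on this slide can be made concrete with a small sketch. It assumes genotypes coded as allele dosages (0/1/2) and uses the squared Pearson correlation between dosages as an approximation of r²; the SNP names and genotype values are made up for illustration and are not from the lecture.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two dosage vectors (0/1/2).

    This genotype-based value approximates the haplotype-based r2.
    """
    n = len(geno_a)
    if n == 0 or n != len(geno_b):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: r2 is undefined, treated as 0 here
    return cov * cov / (var_a * var_b)


def snps_in_ld(top_snp, genotypes, threshold=0.8):
    """Names of all SNPs whose r2 with the top SNP exceeds the threshold."""
    top = genotypes[top_snp]
    return sorted(
        name
        for name, geno in genotypes.items()
        if name != top_snp and ld_r2(top, geno) > threshold
    )


# Hypothetical genotypes for 8 individuals (illustration only).
genotypes = {
    "rs_top": [0, 1, 2, 1, 0, 2, 1, 0],
    "rs_a": [0, 1, 2, 1, 0, 2, 1, 0],  # identical to the top SNP
    "rs_b": [0, 1, 2, 1, 0, 2, 0, 0],  # differs in one individual
    "rs_c": [1, 0, 1, 2, 1, 0, 0, 1],  # largely unrelated
}
print(snps_in_ld("rs_top", genotypes))
```

Every SNP returned by `snps_in_ld` is statistically almost indistinguishable from the top SNP, which is why the association signal alone cannot say which one is causal.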

  5. Genome-wide association signal. Step 1: Annotation! Top SNP (+ SNPs in LD, r² > 0.8) → one or multiple SNPs located in the coding sequence of a gene • Synonymous or non-synonymous? • Gene? What is known, what does it do? – Damaging effect of the hit? (first part of the practical)
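The synonymous versus non-synonymous question can be sketched in a few lines. The example below assumes the standard genetic code and a SNP given as a reference codon, a 0-based position within that codon and an alternative base; the function name and output labels are illustrative choices, not part of any annotation tool used in the practical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(ref_codon, pos_in_codon, alt_base):
    """Classify a single-base change inside a codon (coding strand, DNA letters)."""
    ref_codon = ref_codon.upper()
    if len(ref_codon) != 3 or pos_in_codon not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position of 0, 1 or 2")
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (stop gained)"
    if ref_aa == "*":
        return "non-synonymous (stop lost)"
    return f"non-synonymous (missense {ref_aa}>{alt_aa})"


print(classify_coding_snp("GAA", 2, "G"))  # third-base change, Glu stays Glu
print(classify_coding_snp("GAG", 1, "T"))  # second-base change, Glu becomes Val
```

A synonymous call does not prove the variant is harmless, and a missense call does not prove it is damaging; the classification is only the first annotation step.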

  6. Genome-wide association signal. Step 1: annotation [realistic scenario] Most GWAS findings are located in non-coding regions of the genome [M.T. Maurano et al., Science, 337, 1190 (2012)] – Introns or intergenic – ~98.5% of the human genome is non-coding. Difficult to link SNP → Gene → Phenotype

  7. Regulatory elements: GWAS SNPs are enriched in regulatory elements. Regulatory regions: promoters, enhancers, inhibitors, insulators, transcription factor binding sites, etc. 1. What is a regulatory region / how is a regulatory region defined? 2. How will you know if your hit is located in a regulatory region? [M.T. Maurano et al., Science, 337, 1190 (2012)]
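Question 2 on this slide comes down to an interval lookup once regulatory regions have been downloaded from a database. The sketch below assumes BED-style intervals (0-based start, exclusive end) and a 1-based SNP position; the regions and labels are invented for illustration.

```python
def annotate_snp(chrom, pos, regions):
    """Labels of all regions that contain a 1-based SNP position.

    regions: iterable of (chrom, start, end, label) with BED-style
    coordinates, so a 1-based position p is inside when start < p <= end.
    """
    return [
        label
        for r_chrom, start, end, label in regions
        if r_chrom == chrom and start < pos <= end
    ]


# Hypothetical regulatory annotation (illustration only).
regions = [
    ("chr1", 1000, 2000, "promoter"),
    ("chr1", 1500, 3000, "enhancer"),
    ("chr2", 1000, 2000, "insulator"),
]
print(annotate_snp("chr1", 1600, regions))
```

A linear scan is enough for a handful of hits; for genome-wide annotation an interval index or a dedicated tool would be the more usual choice.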

  8. Bioinformatics: “Mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information” Databases & Tools: Online collection of (molecular) biological data and tools that are: • Structured & Searchable • Publicly available • Updated periodically & Cross-referenced • Literature • Data from research

  9. Gene Regulation databases

  10. The Central Dogma (of molecular biology) Epigenomics: All epigenetic modifications on the genetic material of a cell

  11. Epigenetics “Epigenetic mechanisms can control the functions of noncoding sequences of DNA”. The regulation and control of gene expression is essential for cell function, survival, differentiation

  12. Histones & Chromatin

  13. DNA structure & Regulation: DNase hypersensitive regions → open chromatin configuration

  14. DNA structure & Regulation

  15. The Histone Code. Histone code: multiple histone post-translational modifications (PTMs) → specific, unique downstream functions. Specific proteins involved in gene control recognize and interrogate the patterns of histone modifications, e.g. RNA polymerase II, transcription factors & DNA-binding proteins • Transcription factor recruitment • Chromatin shape and function

  16. Epigenetics: Histone Code. Inactive promoter: H3K27me3, DNA methylation. Active promoter: H3K4me3 [promoter specific], H2A.Z [histone variant]. Inactive enhancer: H3K9me2, DNA methylation. Active enhancer: H3K4me1 [enhancer specific], H2A.Z [histone variant]. Hundreds of histone PTMs known!
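The marks on this slide can be read as a small lookup table. The sketch below assumes the following reading of the slide: H3K27me3 and DNA methylation mark inactive promoters, H3K4me3 and H2A.Z mark active promoters, H3K9me2 and DNA methylation mark inactive enhancers, and H3K4me1 and H2A.Z mark active enhancers. The function name is hypothetical, and the real chromatin-state models are far richer than this.

```python
# Marks and states as read from the slide (a simplification).
MARK_TO_STATES = {
    "H3K27me3": {"inactive promoter"},
    "H3K4me3": {"active promoter"},
    "H3K9me2": {"inactive enhancer"},
    "H3K4me1": {"active enhancer"},
    "DNA methylation": {"inactive promoter", "inactive enhancer"},
    "H2A.Z": {"active promoter", "active enhancer"},
}


def candidate_states(observed_marks):
    """States compatible with every observed mark that appears on the slide."""
    states = None
    for mark in observed_marks:
        compatible = MARK_TO_STATES.get(mark)
        if compatible is None:
            continue  # mark not listed on the slide: ignored in this sketch
        states = compatible if states is None else states & compatible
    return sorted(states) if states else []


print(candidate_states(["H2A.Z"]))
print(candidate_states(["H2A.Z", "H3K4me1"]))
```

A single shared mark such as H2A.Z leaves two candidate states, and only the combination with a promoter-specific or enhancer-specific mark narrows it down, which is the idea behind combining marks into chromatin states.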

  17. Regulatory regions: Chromatin States ENCODE/ROADMAP • “15-state model” • Histone modifications • DNase sites • TF-binding sites. Roadmap Epigenomics Consortium et al., Nature 2015

  18. Epigenetics: symphony No. 9

  19. DNA binding proteins. DNA-binding proteins: transcription factors, nucleases, other DNA-binding proteins. Non-specific binding: polymerases, histones. Specific binding: transcription factors, nucleases. Specific binding → recognition of a consensus sequence → change in consensus sequence → change in DNA binding affinity? → change in gene regulation/expression?

  20. Consensus sequences • DNA binding motif: “recognition sequence” • Found in databases: – JASPAR database – Integrated in HaploReg (practical) → Can also be affected by methylation! (EWAS)
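How a SNP can change a binding motif is easiest to see with a position frequency matrix of the kind that motif databases provide. The sketch below uses a made-up 4-base matrix (not a real JASPAR entry), a uniform background of 0.25 and a pseudocount of 1; all of these are assumptions chosen for illustration.

```python
from math import log2

# Hypothetical position frequency matrix: counts per base at each motif position.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def log_odds_score(seq, pfm, pseudocount=1.0, background=0.25):
    """Sum of per-position log2 odds of the sequence under the motif vs. background."""
    seq = seq.upper()
    if len(seq) != len(pfm["A"]):
        raise ValueError("sequence length must equal motif length")
    score = 0.0
    for i, base in enumerate(seq):
        total = sum(pfm[b][i] for b in "ACGT") + 4 * pseudocount
        freq = (pfm[base][i] + pseudocount) / total
        score += log2(freq / background)
    return score


def motif_score_change(ref_seq, alt_seq, pfm):
    """Alternative-allele score minus reference-allele score (negative = weaker match)."""
    return log_odds_score(alt_seq, pfm) - log_odds_score(ref_seq, pfm)


print(motif_score_change("ACGT", "ATGT", PFM))
```

A negative change suggests the alternative allele matches the motif less well; whether binding and expression actually change still has to be tested experimentally.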

  21. CTCF methylation: CTCF binding is affected by methylation in its core sequence → Proper CTCF functioning is essential! “severe dysregulation of CTCF in cancer cells”. CTCF mouse mutants: embryonic lethal

  22. So far we have: Annotation: • Location (Chr/bp) • Coding/non-coding • DNA regulatory elements – (and open chromatin sites) • Transcription factor binding sites. GWAS & EWAS goal: identify novel targets/genes involved in phenotype X → So far only annotation, no (potential) causal gene

  23. Gene Regulation: Typical eukaryotic gene regulation • Complex 3D looping (CTCF) • Multiple regulatory regions • Involvement of multiple transcription factors • Can be cell type specific. Gene regulation is highly complex! Adapted from: Alberts, Molecular Biology of the Cell, 5th Edition, figure 7-44

  24. Gene Regulation • ~1 Mb (1,000,000 base pairs) long-range regulation – Sonic Hedgehog, essential developmental gene
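Because regulation can act over roughly 1 Mb, the nearest gene is not the only candidate for a non-coding hit. A minimal sketch, assuming genes are represented by a single transcription start site on the same chromosome as the SNP; the gene names and coordinates are hypothetical.

```python
WINDOW_BP = 1_000_000  # ~1 Mb, the long-range distance mentioned on the slide


def genes_within_window(snp_pos, genes, window=WINDOW_BP):
    """Candidate genes whose TSS lies within the window around the SNP.

    genes: iterable of (name, tss_position) on the SNP's chromosome.
    Returns (name, distance) pairs sorted by distance.
    """
    hits = [
        (name, abs(tss - snp_pos))
        for name, tss in genes
        if abs(tss - snp_pos) <= window
    ]
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical genes around a SNP at position 5,000,000 (illustration only).
genes = [("GENE_A", 5_050_000), ("GENE_B", 5_900_000), ("GENE_C", 7_500_000)]
print(genes_within_window(5_000_000, genes))
```

The result is a list of candidates to follow up, not a ranking by likelihood: distance alone cannot tell whether the closest or the most distant gene in the window is the one being regulated.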

  25. Circadian rhythm: Epigenetics • Mammalian circadian clock • Oscillation of ~24 h – Light-dark cycle (melatonin secretion), feeding cycle • A conserved transcriptional–translational auto-regulatory loop generates molecular oscillations of ‘clock genes’ at the cellular level. Zhao et al., 2015, Molecular Cell: “PARP1- and CTCF-Mediated Interactions between Active and Repressed Chromatin at the Lamina Promote Oscillating Transcription”

  26. Complex 3D structure [Movie Time]

  27. SNP to gene: even more complicated than you thought. Even if authors did everything they could to determine the causal gene, they might be wrong! Cannon, ME et al., 2018, American Journal of Human Genetics

  28. Finding [causal] Genes. Cell type specificity is useful & important: • Gene expression levels (RNA-seq) – Predicted promoter activity in cell type – Predicted gene activity (e.g. active gene transcription mark: H3K36me3) • Gene expression – Genotype – eQTLs! (Thursday lecture/practical) – Also cell type specific!
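The link between genotype and gene expression can be illustrated with the simplest possible eQTL model: a least-squares slope of expression on allele dosage, computed separately per tissue. This is only a sketch under that assumption; the expression values are invented, and the eQTL lecture and practical cover the actual methods.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0/1/2)."""
    n = len(dosages)
    if n == 0 or n != len(expression):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("no genotype variation: slope is undefined")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    return sxy / sxx


# Hypothetical data for 6 individuals in two tissues (illustration only).
dosages = [0, 0, 1, 1, 2, 2]
expression_by_tissue = {
    "tissue_A": [1.0, 1.2, 2.0, 2.2, 3.0, 3.2],  # expression rises with dosage
    "tissue_B": [2.0, 2.2, 2.2, 2.0, 2.0, 2.2],  # no relation to dosage
}
for tissue, expression in expression_by_tissue.items():
    print(tissue, round(eqtl_slope(dosages, expression), 3))
```

The same SNP shows an effect in one tissue and none in the other, which is the point of the "also cell type specific" bullet: an eQTL looked up in the wrong tissue can be missed entirely.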

  29. Phenotype: Alzheimer. Enhancer marks in brain? Enhancer marks in heart?

  30. Causal Genes: Example → Enhancer site (likely) to regulate gene 1 or gene 2 (or both)?

  31. Cell type selection: • The selection of target tissue will not be easy in all cases: – Cell fate – Cell state and cell type – Complex diseases & phenotypes → Availability of material & data → Proxy tissues: • Same lineage, similar functioning tissue • (gene of interest) expression vs no expression • Tools & databases to select target tissue • GWAS SNPs are enriched in gene regulatory regions… in the target cell type!

  32. Genome-wide association signal. Cannon, ME et al., 2018, American Journal of Human Genetics

  33. ..How to Find? • Where is your hit (SNP) located? – Chromosome & position – Near or in which genes • Coding variant – Synonymous/non-synonymous • Regulatory regions • 3D structure of the genome • Candidate gene – gene function • Cell type?

  34. Bioinformatics: “Mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information” Databases & Tools: Online collection of (molecular) biological data and tools that are: • Structured & Searchable • Publicly available • Updated periodically & Cross-referenced • Literature • Data from research

  35. Bioinformatic databases & Tools • Cross-referenced! • Also do your own cross-referencing! • Regularly updated!

  36. Biological databases • PubMed – Literature database • Categorized databases: too many to name – Genomic variation: dbSNP, HapMap ... – Sequence: NCBI RefSeq database, Entrez Nucleotide, miRBase ... – Proteins: RCSB Protein Data Bank, UniProt, SMART ... – Pathways: KEGG, Reactome, STRING ... – DNA annotation: ENCODE, Roadmap Epigenomics • Genome Browsers: genomic databases, integrating all data associated with genome annotation & function. • Mining Tools: FUMA & HaploReg

  37. Genome Browser • Displaying, viewing and accessing genome annotation data • Genome annotation: – DNA-variation information, epigenetic regulation, transcription, translation, disease information... • Links to other specialized Databases

  38. Difference? • NCBI, UCSC and Ensembl use the same human genome assembly generated by NCBI – Release timing and data availability can differ between sites • NOTE the version of the genome assembly – Annotation location and availability will differ between assemblies • Which one to use is your own preference • Practical: mainly UCSC and some forays into other databases, including NCBI, Ensembl & ENCODE

  39. ..How to Find? • Where is your hit (SNP) located? – Chromosome & position – Near or in which genes • Coding variant – Synonymous/non-synonymous • Regulatory regions • 3D structure of the genome • Candidate gene – gene function • Cell type?

  40. Mining Tools: FUMA, Functional Mapping and Annotation of Genome-Wide Association Studies – Monday practical & today's practical – Novel tool!
