
Computational Systems Biology: Deep Learning in the Life Sciences



  1. Computational Systems Biology: Deep Learning in the Life Sciences (6.802, 6.874, 20.390, 20.490, HST.506). David Gifford, Lecture 10, March 12, 2019: Histone Marks and Chromatin 3D Structure. http://mit6874.github.io

  2. Goals for today
     • Chromatin marks and their models
       • Hidden Markov model (HMM)
       • Deep learning model (DeepSEA)
     • Three-dimensional chromatin structure
       • Inferring it
       • Predicting it

  3. 1. Chromatin marks and biological state

  4. Chromatin and nucleosome organization (Khorasanizadeh, 2004). In the figure: green, H3; yellow, H4; red, H2A; pink, H2B; dark and light blue, DNA. A nucleosome consists of 146 bp of DNA wrapped 1.7 times in a left-handed superhelix around a protein core containing two copies each of histones H2A, H2B, H3, and H4. Higher organisms also have the linker histone H1. Histone variants: H3 variants include H3.3 (transcribed regions) and CENP-A (centromeres); H2A variants include H2A.X (DNA damage), macroH2A (X chromosome), and H2A.Z (transcribed regions).

  5. Chromatin organization has multiple structural layers and organizes chromatin into “domains”. Both DNA methylation and chromatin marks contain important functional information.

  6. Histone Tail Modifications (Sims III et al., 2003)

  7. We can observe chromatin marks and other genome-associated proteins using ChIP-seq (tracks shown: H3K4me3, RNA Pol II)

  8. Detection of Class I (active) and Class II (poised) enhancers by hESC ChIP-seq. (a, b) Read-density profiles were generated for the indicated histone modifications, centered on p300-bound regions in the top 1000 Class I and Class II enhancers, respectively. (c) hESC Nanog ChIP-seq shows that Nanog binds at the three predicted Class II enhancer positions near the CDX2 gene.

  9. 2. Learning chromatin states

  10. Can we find latent states to explain observed marks? (Roadmap Epigenomics Consortium et al. Nature 518, 317–330 (2015) doi:10.1038/nature14248)

  11. Hidden Markov models. The hidden state x takes values in [1..m]; for example, m can be 15. The emitted symbol y can be multidimensional, for example histone and accessibility data at genomic locus t. There is one node every 200 bp along the genome. The parameters are P(x_{t+1} | x_t) and P(y_t | x_t).
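The transition and emission parameters above fully specify the model once an initial-state distribution is added. As a minimal sketch, assuming discrete emissions and an initial distribution `pi` (neither is stated on the slide), the forward algorithm computes the likelihood of an observed sequence by summing over every hidden-state path; the function name and toy numbers are illustrative only.

```python
import math

def forward_log_likelihood(obs, pi, trans, emit):
    """Return log P(y_1..y_T) for a discrete HMM via the scaled forward algorithm.

    obs   -- list of observed symbol indices y_t
    pi    -- initial state probabilities P(x_1) (assumed, not on the slide)
    trans -- trans[i][j] = P(x_{t+1} = j | x_t = i)
    emit  -- emit[i][s]  = P(y_t = s | x_t = i)
    """
    m = len(pi)
    # alpha[i] is proportional to P(y_1..y_t, x_t = i); rescaled each step to avoid underflow
    alpha = [pi[i] * emit[i][obs[0]] for i in range(m)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for y in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(m)) * emit[j][y]
            for j in range(m)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)  # the product of scales equals the sequence likelihood
        alpha = [a / scale for a in alpha]
    return log_lik
```

Rescaling at every step keeps the numbers in floating-point range on genome-length sequences, where the raw probabilities would underflow.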

  12. Hidden Markov models can be used to create latent states that generate chromatin marks. Hidden Markov model (ChromHMM): divide the genome into 200 bp windows; the hidden state for a 200 bp window models which histone marks are present in that window. The model is unsupervised, so the resulting states must be interpreted with independent data. The number of states is fixed and is a modeling decision.
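ChromHMM binarizes each mark in each 200 bp window and treats the marks as conditionally independent Bernoulli variables given the state. A minimal Viterbi sketch under that assumption assigns every window its most likely state along the single best path; the function name, state count, and probabilities are hypothetical toy values, not fitted ChromHMM parameters.

```python
import math

def viterbi_binary(marks, pi, trans, p_mark):
    """Most likely hidden-state path for binarized chromatin-mark windows.

    marks  -- list of windows; each window is a list of 0/1 calls, one per mark
    pi     -- initial state probabilities
    trans  -- trans[i][j] = P(x_{t+1} = j | x_t = i)
    p_mark -- p_mark[i][k] = P(mark k present | state i); marks are treated as
              independent given the state (product of Bernoulli emissions)
    """
    m = len(pi)

    def log_emit(i, window):
        return sum(
            math.log(p_mark[i][k] if present else 1.0 - p_mark[i][k])
            for k, present in enumerate(window)
        )

    score = [math.log(pi[i]) + log_emit(i, marks[0]) for i in range(m)]
    back = []  # back[t][j] = best previous state when in state j at step t + 1
    for window in marks[1:]:
        pointers, new_score = [], []
        for j in range(m):
            best_i = max(range(m), key=lambda i: score[i] + math.log(trans[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + math.log(trans[best_i][j]) + log_emit(j, window))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state
    state = max(range(m), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With two toy states, one favoring an active mark and one a repressive mark, and "sticky" transitions, windows `[[1, 0], [1, 0], [0, 1], [0, 1]]` decode to `[0, 0, 1, 1]`. Interpreting state 0 as "active" would still require independent data, as the slide notes.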

  13. ChromHMM model parameter visualization: emission probabilities P(y_t | x_t) and transition probabilities P(x_{t+1} | x_t). (Hoffman M M et al. Nucl. Acids Res. 2013;41:827–841)

  14. ChromHMM segment-based chromatin states

  15. Tissues and cell types profiled in the Roadmap Epigenomics Consortium. Roadmap Epigenomics Consortium et al. Nature 518, 317–330 (2015) doi:10.1038/nature14248

  16. Roadmap Epigenomics Consortium et al. Nature 518, 317–330 (2015) doi:10.1038/nature14248

  17. 3. Predicting chromatin state from sequence

  18. DeepSEA learns TF binding, accessibility, and chromatin marks. Training targets: 690 TF binding profiles for 160 different TFs, 125 DNase I hypersensitive site (DHS) profiles, and 104 histone-mark profiles (125 DNase features, 690 TF features, and 104 histone features). Architecture: three convolution layers with 320, 480, and 960 kernels, applied to a 1000 bp sequence window. Training regions cover 17% of the genome; chromosomes 8 and 9 were excluded from training.
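DeepSEA takes a 1000 bp sequence as input, conventionally one-hot encoded with one channel per base. A minimal pure-Python sketch of that encoding, of how a single convolution kernel scans it (acting much like a position weight matrix), and of how the sequence length shrinks through the layers. The kernel width of 8 and non-overlapping max-pooling of width 4 after the first two convolutions are assumptions we believe match the published model, not values stated on the slide; all function names are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode DNA as rows of 4 values (A, C, G, T); any other base such as N becomes all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_valid(x, kernel):
    """Slide one kernel (w rows x 4 channels) along x without padding; one score per position."""
    w = len(kernel)
    return [
        sum(kernel[k][c] * x[pos + k][c] for k in range(w) for c in range(4))
        for pos in range(len(x) - w + 1)
    ]

def conv_pool_lengths(length, widths=(8, 8, 8), pool=4, n_pools=2):
    """Track sequence length through 'valid' convolutions and max-pooling.

    Assumes pooling follows only the first n_pools convolutions.
    """
    lengths = []
    for i, w in enumerate(widths):
        length = length - w + 1
        lengths.append(("conv", length))
        if i < n_pools:
            length //= pool
            lengths.append(("pool", length))
    return lengths
```

A kernel equal to `one_hot("AC")` scores 2.0 exactly where "AC" occurs, which is why first-layer kernels are often read as learned motifs. Under the assumed widths, a 1000 bp input leaves 53 positions per kernel after the third convolution, which are then passed to fully connected layers that output the 919 chromatin features.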

  19. DeepSEA can predict differentially accessible regions based on which SNP allele is present
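A variant's predicted effect can be scored by running the model on the reference sequence and on the same sequence carrying the alternative allele, then comparing the two outputs. This sketch reports the absolute change in probability and the change in log odds. `predict_prob` stands in for a trained network, and `toy_predictor` is a hypothetical motif counter used only so the example runs.

```python
import math

def logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # clip so the log stays finite
    return math.log(p / (1.0 - p))

def variant_effect(predict_prob, seq, pos, alt):
    """Compare model predictions for the reference and alternative alleles.

    predict_prob -- any function mapping a DNA string to P(feature present)
    seq          -- reference sequence window around the variant
    pos          -- 0-based index of the variant within seq
    alt          -- alternative base
    Returns (absolute probability difference, log odds of alt minus log odds of ref).
    """
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    p_ref, p_alt = predict_prob(seq), predict_prob(alt_seq)
    return abs(p_alt - p_ref), logit(p_alt) - logit(p_ref)

def toy_predictor(s):
    """Hypothetical stand-in for a trained model: more GATA motifs, higher probability."""
    return 1.0 / (1.0 + math.exp(-(s.count("GATA") - 1.0)))
```

Mutating the C in `AAGATAAAGCTAAA` to A creates a second GATA site, so the toy model's log odds rise by exactly 1.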

  20. An ensemble logistic regression classifier built on DeepSEA output can identify regulatory variants
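One simple way to build an ensemble of logistic regressions is bagging: fit each model on a bootstrap resample and average their predicted probabilities. This sketch uses plain gradient descent and treats each row of `X` as a vector of DeepSEA-derived scores for one variant. Bagging is our illustrative choice; the exact ensembling scheme used with DeepSEA is not reproduced here, and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic log-loss; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def fit_bagged_ensemble(X, y, n_models=10, seed=0):
    """Train each model on a bootstrap resample (drawn with replacement) of the variants."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_ensemble(models, x):
    """Average the models' predicted probabilities that variant x is regulatory."""
    return sum(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) for w in models) / len(models)
```

Averaging over bootstrap fits reduces the variance of any single model, which matters when labeled regulatory variants are scarce.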

  21. 4. Three-dimensional interactions

  22. Hi-C, HiChIP, and ChIA-PET data reveal distal genome interactions

  23. Enhancers regulate distal target genes by genome looping (figure labels: enhancer, master regulators, Mediator, cohesin, Pol II, gene)

  24. In situ Hi-C identifies proximal genomic contacts (Cell. 2014 Dec 18; 159(7): 1665–1680)

  25. In situ Hi-C reveals interactions at 1–5 kb resolution

  26. Observed intrachromosomal interaction frequencies fall off rapidly with genomic distance (approximately as a power law)
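How fast contact frequency decays with genomic distance d can be checked by fitting a straight line to log counts: against log d for a power law, or against d itself for an exponential. A minimal least-squares sketch that reports both fits so their residuals can be compared; the function names are hypothetical, and real Hi-C analyses usually fit binned, normalized counts rather than raw values.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (b, a, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return b, a, rss

def compare_decay_models(distances, counts):
    """Fit log(count) vs log(distance) (power law) and vs distance (exponential).

    All distances and counts must be positive.
    """
    log_c = [math.log(c) for c in counts]
    power = linear_fit([math.log(d) for d in distances], log_c)
    expo = linear_fit(list(distances), log_c)
    return {
        "power_law_exponent": power[0],
        "power_law_rss": power[2],
        "exponential_rate": expo[0],
        "exponential_rss": expo[2],
    }
```

On synthetic counts proportional to 1/d, the power-law fit recovers an exponent of -1 with near-zero residual, while the exponential fit leaves a clearly larger residual.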

  27. ChIA-PET identifies protein-mediated interactions and improves resolution for those events

  28. ChIA-PET data are consistent with HiC data

  29. ChIA-PET discovered enhancer linkages

  30. Issues with ChIA-PET: (1) High false-negative rate: the libraries produced are not complex enough to permit further discovery by additional sequencing. (2) Specific to a single protein (RNA Polymerase II in our example). (3) Hi-C and its derivatives may eventually solve these problems.

  31. HiChIP identifies protein-mediated interactions

  32. HiChIP is more sensitive than ChIA-PET

  33. HiChIP and ChIA-PET interactions compared (Smc1a antibody; Smc1a is part of the cohesin complex)

  34. XIST promoter interactions show more support from HiChIP than Hi-C

  35. HiChIP (Smc1a) is more sensitive than HiC

  36. 5a. Discovering interactions: Anchor-based

  37. Method 1: Discover anchors using ChIP-seq methods. Given anchors, what is the chance of observing an interaction by chance? Notation: N, total ends; I_{a,b}, interactions observed between anchors a and b; c_a, ends at anchor a; c_b, ends at anchor b.

  38. What is the chance of observing an interaction by chance? Notation: N, total ends; I_{a,b}, interactions observed between anchors a and b; c_a, ends at anchor a; c_b, ends at anchor b.
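One common way to answer this question uses a hypergeometric null: treat the N ends as a population in which c_a belong to anchor a, draw c_b ends for anchor b, and ask how often at least I_{a,b} of the draws land in a by chance. The slides do not give the formula, so this model and the function name are assumptions; the sketch uses exact integer binomial coefficients.

```python
from math import comb

def interaction_pvalue(i_ab, c_a, c_b, n_total):
    """P(X >= i_ab) for X ~ Hypergeometric(population n_total, c_a marked, c_b drawn).

    Under the null of random pairing, the number of PETs linking anchors a and b
    is modeled as the overlap between the c_a ends at a and the c_b ends at b.
    """
    denom = comb(n_total, c_b)
    upper = min(c_a, c_b)
    tail = sum(comb(c_a, k) * comb(n_total - c_a, c_b - k) for k in range(i_ab, upper + 1))
    return tail / denom  # exact big-integer ratio, rounded once to float
```

For example, with 10 ends, 3 at each anchor, observing all 3 linking PETs has probability 1/120. Because many anchor pairs are tested, these p-values would normally be corrected for multiple testing before calling interactions.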

  39. Estimating total events from overlap. Imagine we perform two biological replicates of an experiment and obtain 1000 events in each, of which 900 are identical. We can use a hypergeometric model to infer how many possible events exist (N) given two sample sizes (m and n) and an overlap (k). Using this model, we predict ~1100 total events.
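The maximum-likelihood N can also be found numerically by scanning candidate values of N and keeping the one that makes the observed overlap k most probable under the hypergeometric model. In this minimal sketch, the upper search bound of 10 times the union is an arbitrary assumption and the function names are hypothetical. It assumes 0 < k ≤ min(m, n); with k = 0 the likelihood keeps increasing and the scan simply returns its upper bound.

```python
import math

def log_comb(a, b):
    """log of the binomial coefficient C(a, b), via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def log_hypergeom_pmf(k, N, m, n):
    """log P(overlap = k) when replicates of sizes m and n are drawn from N possible events."""
    return log_comb(m, k) + log_comb(N - m, n - k) - log_comb(N, n)

def mle_total_events(m, n, k, max_n=None):
    """Return the N that maximizes the hypergeometric likelihood of the overlap k."""
    lo = m + n - k  # N can be no smaller than the union of the two replicates
    hi = max_n or 10 * lo
    return max(range(lo, hi + 1), key=lambda N: log_hypergeom_pmf(k, N, m, n))
```

For the replicate example above (m = n = 1000, k = 900), this returns 1111, consistent with the ~1100 quoted.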

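  40. Approximate closed-form solution for the total number of events. The ML estimate of N is approximately N̂ ≈ mn/k. One way to see this is to use the binomial approximation to the hypergeometric distribution and then the normal approximation to that binomial: the overlap k is then approximately normal with mean nm/N, and setting the observed k equal to this mean gives N̂ ≈ mn/k (here 1000 × 1000 / 900 ≈ 1111).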

  41. 5b. Discovering interactions: Density-based

  42. Method 2: CID uses density-based clustering to discover chromatin interactions (Nucleic Acids Research, 14 February 2019, gkz051, https://doi.org/10.1093/nar/gkz051). Figure 1. CID uses density-based clustering to discover chromatin interactions. (A) ChIA-PET interactions can be discovered as groups of dense arcs connecting two genomic regions. Each arc is a PET. (B) The PETs plotted on a two-dimensional map using the genomic coordinates of the two reads. Each point is a PET. The colors represent the density values, defined as the number of PETs in the neighborhood. The red dashed square represents the size of the neighborhood. (C) The clustering decision graph. Each point is a PET. The points with high density and high delta values are selected as cluster centers. For simplicity, only large clusters are labeled. (D) The read pairs are assigned to the nearest cluster centers. The clusters are labeled as in (C). (E) The clusters are visualized as arcs. The clusters are labeled as in (C) and (D).
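Panels B through D describe a density-peak style procedure: compute each PET's local density, compute delta (its distance to the nearest PET of higher density), pick points that are high in both as centers, and assign the remaining PETs. This is a minimal O(n²) sketch, not the CID implementation. It assumes a square (Chebyshev) neighborhood as drawn in panel B, index-based tie-breaking, and fixed thresholds; it also assigns each point to its nearest center, following the caption.

```python
def density_peak_clusters(points, radius, min_density, min_delta):
    """Cluster 2-D PET points given as (left-read position, right-read position).

    density -- number of other points inside a square of half-width `radius`
    delta   -- Chebyshev distance to the nearest point of higher rank
    Points with density >= min_density and delta >= min_delta become centers;
    every point is then assigned to its nearest center.
    Returns (center indices, cluster label per point).
    """
    def dist(p, q):
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    n = len(points)
    density = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]

    def outranks(j, i):
        # Higher density wins; ties go to the earlier index, so only one point is on top
        return (density[j], -j) > (density[i], -i)

    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if outranks(j, i)]
        # The top-ranked point has no higher neighbor; use its largest distance to any point
        delta.append(min(higher) if higher else max(dist(points[i], q) for q in points))
    centers = [i for i in range(n) if density[i] >= min_density and delta[i] >= min_delta]
    if not centers:
        return [], [-1] * n
    labels = [
        min(range(len(centers)), key=lambda c: dist(points[i], points[centers[c]]))
        for i in range(n)
    ]
    return centers, labels
```

Two tight groups of PETs far apart on the 2-D map each yield one center with a large delta, and every PET inherits its group's label. The thresholds play the role of the cutoffs drawn on the decision graph in panel C.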

  43. Method 2: Density cluster interaction origins (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/g). We use a three-component mixture model to describe the conditional distribution of PET counts from all the PET clusters. One component represents true interaction PET clusters (TiPC), and the other two represent random collision PET clusters (RcPC) and random ligation PET clusters (RlPC), respectively. The TiPC and RcPC models include the distance between clusters a and b. (https://academic.oup.com/bioinformatics/article/31/23/3)
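This is a heavily simplified sketch of fitting such a mixture by expectation-maximization. It models per-cluster PET counts as a three-component Poisson mixture and ignores the distance covariate that the TiPC and RcPC components use. The Poisson form, the initial values, and the function names are assumptions, not the published model.

```python
import math

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def em_poisson_mixture(counts, rates, weights, n_iter=200):
    """EM for a Poisson mixture over PET counts per cluster.

    counts  -- PET count of each cluster
    rates   -- initial Poisson means, one per component (e.g. RlPC, RcPC, TiPC)
    weights -- initial mixing proportions
    Returns (rates, weights, responsibilities) after n_iter iterations.
    """
    rates, weights = list(rates), list(weights)
    K = len(rates)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each cluster came from each component
        resp = []
        for k in counts:
            logs = [math.log(weights[j]) + poisson_logpmf(k, rates[j]) for j in range(K)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]  # subtract max for stability
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: responsibility-weighted means and mixing proportions
        for j in range(K):
            r_j = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(r_j / len(counts), 1e-12)
            rates[j] = max(sum(r[j] * k for r, k in zip(resp, counts)) / r_j, 1e-9)
    return rates, weights, resp
```

The responsibility of the highest-rate component for a cluster can then be read as the posterior probability that the cluster reflects a true interaction. The published model additionally lets that probability depend on the distance between the two anchors.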

  44. Cluster interaction origins

  45. Jaccard coefficient – measure of set similarity
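The Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|, ranging from 0 (no shared items) to 1 (identical sets). A minimal sketch for comparing interaction calls between replicates, assuming each interaction is represented as an exact anchor-pair tuple; real comparisons usually allow some positional tolerance, which this does not handle. Returning 1.0 for two empty sets is our convention.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of hashable items; 1.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For example, replicates that share one of two distinct anchor pairs, such as `{("chr1", 100, 200), ("chr1", 300, 400)}` and `{("chr1", 100, 200)}`, score 0.5.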

  46. CID is more reproducible and sensitive

  47. 6. Predicting enhancer-promoter interactions

  48. TargetFinder uses multiple data types to predict Hi-C interactions (https://www.nature.com/articles/ng.3539)

  49. TargetFinder Training Data

  50. TargetFinder – Ratio of the CTCF and RAD21 ChIP-seq signals occurring within interacting enhancers and non-interacting enhancers

  51. TargetFinder – Enrichment of signals at transcription start sites (TSS). Dark: interacting; light: non-interacting

  52. TargetFinder – Performance. Features for enhancers and promoters only (E/P), extended enhancers and promoters (EE/P), and enhancers and promoters plus the windows between them
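The E/P versus E/P-plus-window comparison depends on where the features are measured. This sketch computes, for one enhancer-promoter pair, the mean of each signal track over the enhancer, over the promoter, and over the window between them. Mean signal as the summary statistic, per-base tracks, and all names here are assumptions for illustration rather than TargetFinder's exact feature definitions.

```python
def mean_signal(track, start, end):
    """Average of a per-base signal track over the half-open interval [start, end)."""
    if end <= start:
        return 0.0  # empty window, e.g. when the two elements overlap
    return sum(track[start:end]) / (end - start)

def epw_features(tracks, enhancer, promoter):
    """Build enhancer (E), promoter (P), and between-window (W) features.

    tracks   -- dict mapping a dataset name (e.g. a ChIP-seq target) to per-base values
    enhancer -- (start, end), half-open
    promoter -- (start, end), half-open
    """
    left, right = sorted([enhancer, promoter])  # works whichever element is upstream
    window = (left[1], right[0])
    features = {}
    for name, track in sorted(tracks.items()):
        features[f"{name}_E"] = mean_signal(track, *enhancer)
        features[f"{name}_P"] = mean_signal(track, *promoter)
        features[f"{name}_W"] = mean_signal(track, *window)
    return features
```

Each dataset thus contributes three features per pair. Dropping the `_W` entries gives the smaller enhancer/promoter-only feature set, which is the contrast the performance slide draws.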

  53. Deep learning network for predicting enhancer-promoter interactions
