Computational Systems Biology: Deep Learning in the Life Sciences

Computational Systems Biology: Deep Learning in the Life Sciences. 6.802 / 6.874 / 20.390 / 20.490 / HST.506. David Gifford. Lecture 10, March 12, 2019. Histone Marks, Chromatin 3D Structure. http://mit6874.github.io

Goals for today
• Chromatin marks and their models
  • Hidden Markov Model (HMM)
  • Deep learning model (DeepSEA)
• Three-dimensional chromatin structure
  • Inferring it
  • Predicting it

1. Chromatin marks and biological state

Chromatin and Nucleosome Organization. Khorasanizadeh (2004). Figure colors: green, H3; yellow, H4; red, H2A; pink, H2B; dark and light blue, DNA.
• Nucleosome DNA: 146 base pairs, wrapped 1.7 times in a left-handed superhelix
• Proteins: two copies each of histones H2A, H2B, H3, and H4. Higher organisms have the linker histone H1
• Histone variants
  • H3 variants: H3.3 (transcribed), CENP-A (centromeres)
  • H2A variants: H2A.X (DNA damage), macroH2A (X chromosome), H2A.Z (transcribed regions)

Chromatin organization has multiple structural layers and organizes chromatin into “domains”. Both DNA methylation and chromatin marks contain important functional information.

Histone Tail Modifications. Sims III et al., 2003

We can observe chromatin marks and other genome-associated proteins using ChIP-seq. Tracks shown: H3K4me3, RNA Pol II

Detection of Class I (active) and Class II (poised) enhancers. a), b) hESC ChIP-seq read density profiles were generated for the indicated histone modifications centered on p300-bound regions in the top 1000 Class I and Class II enhancers, respectively. c) hESC Nanog ChIP-seq shows that Nanog binds at the three predicted Class II enhancer positions near the CDX2 gene.

2. Learning chromatin states

Can we find latent state to explain observed marks? Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248

Hidden Markov Models
• Hidden state x in [1 .. m]; for example, m can be 15
• Emitted symbol y can be multi-dimensional; for example, histone and accessibility data at genomic locus t
• One node every 200 bp down the genome
• Parameters are P(x_{t+1} | x_t) and P(y_t | x_t)
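A minimal sketch of how the two parameter sets, P(x_{t+1} | x_t) and P(y_t | x_t), can be used to decode the most likely hidden-state path with the Viterbi algorithm. The two-state model, all probabilities, the independent-Bernoulli emission form, and the function names are illustrative assumptions, not values from the lecture.

```python
import math

def emission_logprob(y, p_on):
    """log P(y_t | x_t) for a binary mark vector y, assuming independent Bernoulli marks."""
    return sum(math.log(p if bit else 1.0 - p) for bit, p in zip(y, p_on))

def viterbi(obs, log_init, log_trans, emit_p):
    """Return the most likely hidden-state path for a sequence of binary mark vectors."""
    m = len(log_init)
    score = [log_init[s] + emission_logprob(obs[0], emit_p[s]) for s in range(m)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for y in obs[1:]:
        prev, ptr, score = score, [], []
        for s in range(m):
            best = max(range(m), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + emission_logprob(y, emit_p[s]))
        back.append(ptr)
    state = max(range(m), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical 2-state model: state 0 = quiescent, state 1 = active; two marks per locus.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
emit_p = [[0.05, 0.05],   # P(mark present | state 0)
          [0.90, 0.80]]   # P(mark present | state 1)
obs = [(0, 0), (0, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 0)]
print(viterbi(obs, log_init, log_trans, emit_p))  # [0, 0, 1, 1, 1, 0, 0]
```

Working in log space avoids numerical underflow when the chain is as long as a chromosome tiled at one node per 200 bp.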

Hidden Markov Models can be used to create latent states that generate chromatin marks
• Hidden Markov Model (ChromHMM)
• Divide genome into 200 bp windows
• Hidden state for a 200 bp window models what histone marks are present in the window
• Unsupervised: resulting states must be interpreted with independent data
• The number of states is fixed and is a modeling decision
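A rough sketch of the windowing step: read positions for each mark are counted in 200 bp windows and turned into a present/absent call per window, giving one binary vector per window as the HMM's observation. The fixed `min_reads` threshold is a simplifying assumption made here for illustration (ChromHMM itself calls presence against a background model), and the mark names and read positions are made up.

```python
from collections import Counter

BIN_SIZE = 200  # window size from the slide

def binarize_mark(read_positions, chrom_length, min_reads=2, bin_size=BIN_SIZE):
    """Return one 0/1 call per window: 1 if the window holds at least min_reads reads."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = Counter(pos // bin_size for pos in read_positions if 0 <= pos < chrom_length)
    return [1 if counts[b] >= min_reads else 0 for b in range(n_bins)]

def observation_vectors(tracks):
    """Combine per-mark binary tracks into one tuple of marks per window."""
    return list(zip(*tracks))

# Hypothetical read start positions on a 1000 bp toy chromosome.
h3k4me3 = binarize_mark([10, 50, 150, 420, 430, 990], chrom_length=1000)
h3k27ac = binarize_mark([210, 220, 399, 450], chrom_length=1000)
print(h3k4me3)                                   # [1, 0, 1, 0, 0]
print(h3k27ac)                                   # [0, 1, 0, 0, 0]
print(observation_vectors([h3k4me3, h3k27ac]))   # [(1, 0), (0, 1), (1, 0), (0, 0), (0, 0)]
```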

ChromHMM Model Parameter Visualization. Hoffman M M et al. Nucl. Acids Res. 2013;41:827-841. Panels: P(y_t | x_t) and P(x_{t+1} | x_t)

ChromHMM segment-based chromatin states

Tissues and cell types profiled in the Roadmap Epigenomics Consortium. Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248

Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248

3. Predicting chromatin state from sequence

DeepSEA learns TF binding, accessibility, and chromatin marks
• Output features: 125 DNase features, 690 TF features, 104 histone features
• Training data: 690 TF binding profiles for 160 different TFs, 125 DHS profiles, and 104 histone-mark profiles (17% of genome)
• Input: 1000 bp window
• Three convolution layers with 320, 480, and 960 kernels
• Chr 8 and 9 excluded
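To make the input and output shapes concrete, here is a small sketch of one-hot encoding a DNA window and of what a single convolution kernel computes as it slides along the sequence. Only the counts (1000 bp window; 125, 690, and 104 features) come from the slide; the grouping order of the output features, the toy 3 bp kernel, and all function names are assumptions for illustration, and this is not the DeepSEA implementation.

```python
BASES = "ACGT"
WINDOW = 1000                            # input window length from the slide
N_DNASE, N_TF, N_HISTONE = 125, 690, 104
N_OUTPUTS = N_DNASE + N_TF + N_HISTONE   # 919 predicted chromatin features

def one_hot(seq):
    """Encode DNA as rows of 4 numbers (A, C, G, T); unknown bases such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_relu(x, kernel):
    """Slide one kernel (width x 4 weights) along a one-hot sequence, then apply ReLU."""
    width = len(kernel)
    out = []
    for i in range(len(x) - width + 1):
        s = sum(x[i + j][c] * kernel[j][c] for j in range(width) for c in range(4))
        out.append(max(0.0, s))
    return out

# Toy kernel that responds to the 3-mer "CGT" (a real model learns its weights).
cgt_kernel = [[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]]
print(conv1d_relu(one_hot("ACGTACGT"), cgt_kernel))  # [0.0, 3.0, 0.0, 0.0, 0.0, 3.0]
print(N_OUTPUTS)                                      # 919
```

Each first-layer kernel behaves like a learned motif scanner; stacking layers lets later kernels respond to combinations of motifs across the window.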

DeepSEA can predict differentially accessible regions based upon SNP value
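One way to picture variant scoring with a sequence model: predict once with the reference allele, once with the alternate allele, and compare the two outputs per feature. The sketch below assumes a `predict` callable that maps a sequence to a list of probabilities; `toy_predict` is a hypothetical stand-in, not a trained model, and the log-odds difference is one reasonable choice of effect score rather than the lecture's stated formula.

```python
import math

def mutate(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]

def log_odds(p, eps=1e-7):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def variant_effect(predict, ref_seq, pos, alt):
    """Per-feature change in predicted log-odds when the SNP allele is swapped in."""
    p_ref = predict(ref_seq)
    p_alt = predict(mutate(ref_seq, pos, alt))
    return [log_odds(a) - log_odds(r) for r, a in zip(p_ref, p_alt)]

def toy_predict(seq):
    """Hypothetical one-feature model: 'accessible' if a GATA motif is present."""
    return [0.9 if "GATA" in seq else 0.1]

effect = variant_effect(toy_predict, "TTGATATT", pos=3, alt="C")
print(effect)  # one negative value: the alternate allele destroys the motif
```

Per-feature scores of this kind are the sort of model output that a downstream classifier can take as input.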

An ensemble logistic regression classifier based on DeepSEA output can identify regulatory variants

4. Three-dimensional interactions

HiC, HiChIP, and ChIA-PET data reveal distal genome interactions

Enhancers regulate distal target genes by genome looping. Figure labels: Enhancer, Master Regulators, Mediator, Cohesin, Pol II, Gene

in situ HiC identifies proximal genomic contacts. Cell. 2014 Dec 18; 159(7): 1665–1680.

in situ HiC reveals interactions at 1–5 kb resolution

Observed interchromosomal interaction distances fall off exponentially

ChIA-PET identifies protein mediated interactions and improves resolution for those events

ChIA-PET data are consistent with HiC data

ChIA-PET discovered enhancer linkages

Issues with ChIA-PET
1. High false negative rate. Libraries produced are not complex enough to permit further discovery by additional sequencing.
2. Specific to a protein (RNA Polymerase II in our example)
3. Hi-C and derivatives may solve these problems eventually

HiChIP identifies protein mediated interactions

HiChIP is more sensitive than ChIA-PET

HiChIP and ChIA-PET interactions compared. Smc1a antibody (part of the cohesin complex)

XIST promoter interactions show more support from HiChIP than Hi-C

HiChIP (Smc1a) is more sensitive than HiC

5a. Discovering interactions: Anchor-based

Method 1: Discover anchors using ChIP-seq methods. Given anchors, what is the chance of observing an interaction by chance? N: total ends; I_{a,b}: interactions observed; c_a: ends at anchor a; c_b: ends at anchor b

What is the chance of observing an interaction by chance? N: total ends; I_{a,b}: interactions observed; c_a: ends at anchor a; c_b: ends at anchor b
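One common way to answer this question is a hypergeometric tail probability: if ends were paired at random, how likely is it to see at least I_{a,b} links between anchors a and b? The sketch below assumes that setup (N ends in total, c_a of them at anchor a, and the c_b ends at anchor b each drawing a partner without replacement). This exact parameterization and the function names are assumptions, since the slide's formula is not reproduced in the text.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)

def interaction_pvalue(i_ab, c_a, c_b, n_ends):
    """P(at least i_ab a-b links by chance) given c_a and c_b ends among n_ends total ends."""
    upper = min(c_a, c_b)
    return sum(hypergeom_pmf(k, n_ends, c_a, c_b) for k in range(i_ab, upper + 1))

# Toy numbers: 10 ends in total, 3 at anchor a, 4 at anchor b, 3 observed links.
print(interaction_pvalue(3, c_a=3, c_b=4, n_ends=10))  # 7/210, about 0.033
```

For genome-scale counts the binomial coefficients become very large integers; a log-space implementation would be the usual choice there.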

Estimating total events from overlap. Imagine we perform two biological replicates of an experiment and obtain 1000 events in each, of which 900 are identical. We can use a hypergeometric model to infer how many possible events exist (N) given two sample sizes (m and n) and an overlap (k). Using this model, we predict ~1100 total events.
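The worked example can be checked numerically. This sketch assumes the overlap k follows a hypergeometric distribution with population size N, m events found in the first replicate, and n in the second, and searches for the N that maximizes the likelihood. The search range (`search_factor`) and function names are arbitrary choices made here.

```python
from math import lgamma

def log_comb(a, b):
    """log of the binomial coefficient C(a, b), computed with log-gamma."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def overlap_loglik(N, m, n, k):
    """log P(overlap = k | N, m, n) under the hypergeometric model."""
    return log_comb(m, k) + log_comb(N - m, n - k) - log_comb(N, n)

def ml_total_events(m, n, k, search_factor=5):
    """Grid-search the N >= m + n - k that maximizes the likelihood of the overlap."""
    lo = m + n - k                      # at least this many distinct events were seen
    return max(range(lo, search_factor * lo + 1),
               key=lambda N: overlap_loglik(N, m, n, k))

print(ml_total_events(1000, 1000, 900))  # 1111
print(1000 * 1000 // 900)                # 1111, the closed-form value floor(m * n / k)
```

The result, 1111, agrees with the slide's figure of roughly 1100 total events.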

Approximate closed form solution for total number of events. The ML estimate of N is approximately: One way to see this is by using the normal approximation of the binomial approximation to the hypergeometric distribution:
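A sketch of the calculation this slide refers to, using the symbols N, m, n, and k from the previous slide. This is a reconstruction under the standard capture-recapture setup, not the slide's own equations: the first part is the exact likelihood-ratio argument, and the last two lines show the binomial and normal approximations, whose mean matches the observed overlap at the same value of N.

```latex
\begin{align}
P(k \mid N, m, n) &= \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}} \\[4pt]
\frac{P(k \mid N, m, n)}{P(k \mid N-1, m, n)}
  &= \frac{(N-m)(N-n)}{N\,(N-m-n+k)} \;\ge\; 1
  \iff N \le \frac{mn}{k} \\[4pt]
\hat{N} &= \left\lfloor \frac{mn}{k} \right\rfloor \approx \frac{mn}{k} \\[4pt]
k &\sim \mathrm{Binomial}\!\left(n, \tfrac{m}{N}\right)
  \approx \mathcal{N}\!\left(\frac{nm}{N},\; n\,\frac{m}{N}\Bigl(1-\frac{m}{N}\Bigr)\right) \\[4pt]
\frac{nm}{N} = k &\;\Longrightarrow\; N = \frac{mn}{k}
\end{align}
```

With m = n = 1000 and k = 900 this gives N ≈ 1111, in line with the ~1100 quoted in the example.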

5b. Discovering interactions: Density-based

Method 2: CID uses density-based clustering to discover chromatin interactions. Nucleic Acids Research, 14 February 2019, gkz051, https://doi.org/10.1093/nar/gkz051
• Figure 1. CID uses density-based clustering to discover chromatin interactions. (A) ChIA-PET interactions can be discovered as groups of dense arcs connecting two genomic regions. Each arc is a PET. (B) The PETs plotted on a two-dimensional map using the genomic coordinates of the two reads. Each point is a PET. The colors represent the density values, defined as the number of PETs in the neighborhood. The red dashed square represents the size of the neighborhood. (C) The clustering decision graph. Each point is a PET. The points with high density and high delta values are selected as cluster centers. For simplicity, only large clusters are labelled. (D) The read pairs are assigned to the nearest cluster centers. The clusters are labeled as in (C). (E) The clusters are visualized as arcs. The clusters are labeled as in (C) and (D).
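The steps in the figure caption (density per PET, a delta value per PET, centers picked where both are high, every PET assigned to its nearest center) can be sketched in a few lines. This is a simplified density-peak clustering written for illustration, not the CID implementation; the square neighborhood `radius`, the `min_rho` and `min_delta` thresholds, and the toy coordinates are all assumptions.

```python
def chebyshev(p, q):
    """Distance under which a 'neighborhood' is a square, as in panel (B)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def density_peak_clusters(pets, radius, min_rho, min_delta):
    """Cluster PETs given as (left_coordinate, right_coordinate) points.

    rho[i]   = number of PETs within the square neighborhood of PET i (itself included)
    delta[i] = distance from PET i to the nearest PET of higher density
    """
    n = len(pets)
    rho = [sum(1 for q in pets if chebyshev(p, q) <= radius) for p in pets]
    delta = []
    for i, p in enumerate(pets):
        # Ties in density are broken by index so exactly one point has no "higher" point.
        higher = [chebyshev(p, pets[j]) for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(higher) if higher else max(chebyshev(p, q) for q in pets))
    centers = [i for i in range(n) if rho[i] >= min_rho and delta[i] >= min_delta]
    if not centers:
        return centers, [None] * n
    labels = [min(centers, key=lambda c: chebyshev(p, pets[c])) for p in pets]
    return centers, labels

# Toy PETs forming two groups of arcs.
pets = [(1000, 50000), (1100, 50100), (900, 49900), (1050, 50050),
        (20000, 90000), (20100, 90100), (19900, 89950)]
centers, labels = density_peak_clusters(pets, radius=500, min_rho=3, min_delta=1000)
print(centers)  # [0, 4]
print(labels)   # [0, 0, 0, 0, 4, 4, 4]
```

The pairwise distance computation is quadratic in the number of PETs, so a real data set would need a spatial index or per-chromosome partitioning.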

Method 2: Density cluster interaction origins. https://academic.oup.com/nar/advance-article/doi/10.1093/nar/g
We use a three-component mixture model to describe the conditional distribution of PET-count from all the PET clusters. One component represents true interaction PET cluster (TiPC), and the other two represent random collision PET cluster (RcPC) and random ligation PET cluster (RlPC), respectively. TiPC and RcPC models included the a,b distance between clusters. https://academic.oup.com/bioinformatics/article/31/23/3

Cluster interaction origins

Jaccard coefficient – measure of set similarity
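For two sets A and B the Jaccard coefficient is |A ∩ B| / |A ∪ B|. A small sketch, with made-up anchor names, of how it could be used to compare the interaction calls from two replicates; returning 1.0 for two empty sets is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical interaction calls, each an (anchor, anchor) pair.
rep1 = {("a1", "a2"), ("a1", "a3"), ("a4", "a5")}
rep2 = {("a1", "a2"), ("a4", "a5"), ("a6", "a7")}
print(jaccard(rep1, rep2))  # 0.5

# Two replicates with 1000 events each and 900 shared: 900 / 1100.
print(jaccard(range(0, 1000), range(100, 1100)))  # about 0.818
```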

CID is more reproducible and sensitive

6. Predicting enhancer-promoter interactions

TargetFinder uses multiple data types to predict HiC interactions. https://www.nature.com/articles/ng.3539

TargetFinder Training Data

TargetFinder – Ratio of the CTCF and RAD21 ChIP-seq signals occurring within interacting enhancers and non-interacting enhancers

TargetFinder – Enrichment of signals at transcription start sites (TSS). Dark: interacting; light: non-interacting

TargetFinder – Performance. Features for enhancers and promoters only (E/P), extended enhancers and promoters (EE/P), and enhancers and promoters plus the windows bet

Deep learning network for predicting enhancer-promoter interactions
