Not Just a Black Box: Interpretable Deep Learning for Genomics
Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
With great power comes really poor interpretability… (Figure: power vs. interpretability, from classical statistics through traditional machine learning to deep learning; "interpretable deep learning" aims for both.)
Example biological problem: understanding stem cell differentiation. (Figure: a fertilized egg differentiates into liver cells, cardiac cells, and blood cells.) Cell types are different because different genes are turned on. How is cell-type-specific gene expression controlled? Answer: "control elements" that show cell-type-specific openness.
"Control elements" show tissue-specific openness. Most of the genome exists in a closed state: "histone" proteins act like spools that the DNA winds around, and most "controller" proteins can't bind closed DNA. The exceptions are cell-type-specific open control elements: "controller" proteins bind to DNA patterns present in these control elements, which then activate nearby genes. (Figures from Shlyueva et al., Nature Reviews Genetics, 2014.)
89%* of disease-associated mutations occur outside of genes, and many mutations have no effect! Which positions in controller sites are important? The approach: experimentally measure cell-type-specific openness, predict openness from sequence using deep learning, then interpret the model to learn the important positions. (Figures from Shlyueva et al., Nature Reviews Genetics, 2014. *Stranger et al., Genetics, 2011.)
Overview of the deep learning model. Input: a DNA sequence (e.g. GATAACCGATATC) represented as ones and zeros, with one row per base (A, C, G, T). As in computer vision, early layers act as learned pattern detectors and later layers build on the patterns of previous layers. Output: open (+1) vs. not open (0) in a given cell type (e.g. open in cell type X, such as HSCs, vs. open in cell type Y).
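A minimal sketch of the one-hot input encoding described above (the function name and base ordering A/C/G/T are illustrative assumptions, not the authors' code):

```python
import numpy as np

def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) array of ones and zeros,
    one column per base in the order A, C, G, T."""
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        arr[i, mapping[base]] = 1.0  # exactly one 1 per position
    return arr

encoded = one_hot_encode("GATAA")
```

Each row of `encoded` is the indicator vector for one position, which is the "ones and zeros" representation fed to the network.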
Questions for the model:
• Which positions in the DNA sequence are the important ones?
• What are the recurring patterns in the DNA?
How can we identify important nucleotides? One approach is in-silico mutagenesis: mutate each position of the sequence (e.g. GATAACCGATATC) in turn and observe how the predicted openness changes (Alipanahi et al., 2015; Zhou & Troyanskaya, 2015).
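In-silico mutagenesis can be sketched as follows; `predict` is a hypothetical stand-in for the trained model (here a toy function counting GATA matches), not the actual network from the talk:

```python
def in_silico_mutagenesis(predict, seq):
    """Score each position by the largest absolute change in model
    output over the three possible single-base substitutions."""
    bases = "ACGT"
    ref = predict(seq)
    scores = []
    for i in range(len(seq)):
        deltas = [predict(seq[:i] + b + seq[i + 1:]) - ref
                  for b in bases if b != seq[i]]
        scores.append(max(abs(d) for d in deltas))
    return scores

# Toy "model": output = number of GATA matches in the sequence.
toy = lambda s: float(s.count("GATA"))
scores = in_silico_mutagenesis(toy, "TTGATATT")
# Only the four positions inside the GATA match get nonzero scores.
```

This illustrates both the appeal (direct causal probing) and the cost: one forward pass per position per substitution.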
The saturation problem illustrated. Consider y = i1 + i2 when i1 + i2 < 1, and y = 1 when i1 + i2 >= 1. This can be written as y = 1 - h, with h = max(0, 1 - i1 - i2). (Plot: y vs. i1 + i2, rising linearly and then flat at 1 beyond i1 + i2 = 1.)
"Backpropagation"-based approaches assign an importance score to every position in a single pass. Examples: gradients (Simonyan et al.) and DeepLIFT (github.com/kundajelab/deeplift). (Figure: per-position importance tracks over the one-hot encoded input sequence, for openness in cell type X and cell type Y.)
Saturation revisited. For y = 1 - h with h = max(0, 1 - i1 - i2): when i1 + i2 >= 1, the gradient is 0 with respect to both inputs, so gradient-based importance misses inputs that clearly matter.
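A quick numeric check of the saturation problem on the toy network from the slide (finite differences stand in for backprop here, purely for illustration):

```python
def y(i1, i2):
    # y = 1 - max(0, 1 - i1 - i2): saturates at 1 once i1 + i2 >= 1
    return 1.0 - max(0.0, 1.0 - i1 - i2)

def grad_i1(i1, i2, eps=1e-6):
    """Finite-difference gradient of y with respect to i1."""
    return (y(i1 + eps, i2) - y(i1, i2)) / eps

g_unsat = grad_i1(0.2, 0.2)  # i1 + i2 < 1: gradient is 1
g_sat = grad_i1(1.0, 1.0)    # i1 + i2 >= 1: gradient is exactly 0
```

Once the unit saturates, the gradient assigns zero importance to inputs that are in fact responsible for the output being 1.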
The DeepLIFT solution: difference from reference. Pick a reference input, here i1 = 0 and i2 = 0, at which h = max(0, 1 - i1 - i2) = 1. At i1 + i2 = 2, h = 0, so the "difference from reference" of h is -1, NOT 0, even though the gradient is 0.
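The difference-from-reference idea on the same toy network can be checked directly (a simplified sketch of the principle, not the DeepLIFT library's rule-based implementation):

```python
def h(i1, i2):
    return max(0.0, 1.0 - i1 - i2)

def y(i1, i2):
    return 1.0 - h(i1, i2)

ref_h = h(0.0, 0.0)              # h = 1 at the reference (i1 = i2 = 0)
delta_h = h(1.0, 1.0) - ref_h    # -1, NOT 0: the saturated unit still
                                 # differs from its reference state
delta_y = y(1.0, 1.0) - y(0.0, 0.0)  # the output changed by +1
```

Comparing activations to a reference recovers a nonzero signal exactly where the gradient vanishes.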
DeepLIFT generalizes to other function types. For a sigmoid (assuming a reference input of 0, where the sigmoid is 0.5), the "difference from reference" is +0.5 when the input is >> 0.
The reference matters! Suggestions on how to pick a DeepLIFT reference:
- MNIST: the background (all zeros)
- Genomics: the average frequency of A/C/G/T in a background set, or multiple references generated by shuffling the original sequence
(Figure: CIFAR10 model, class = "ship": original image, reference, and DeepLIFT scores.)
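The shuffled-reference suggestion can be sketched as below (a plain uniform shuffle for illustration; in practice a dinucleotide-preserving shuffle is often preferred, and the function name here is an assumption):

```python
import random

def shuffled_references(seq, n, seed=0):
    """Generate n references by shuffling the bases of the original
    sequence, preserving its base composition."""
    rng = random.Random(seed)  # seeded for reproducibility
    refs = []
    for _ in range(n):
        chars = list(seq)
        rng.shuffle(chars)
        refs.append("".join(chars))
    return refs

refs = shuffled_references("GATAAC", 3)
```

DeepLIFT scores would then be averaged over the scores computed against each reference.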
Example failure mode 2: a "min" (AND) relation. Let h1 = i1 - i2, h2 = max(0, h1), and y = i1 - h2. Then y = i1 - max(0, i1 - i2) = min(i1, i2), yet the gradient is 0 for either i1 or i2 (whichever is larger).
DeepLIFT idea 2: consider different orderings for the positive and negative terms. Take y = max(0, i1 - i2) with i1 = 10, i2 = 6, so y = max(0, 10 - 6) = 4. The standard breakdown (gradient*input) gives: 4 = (10 from i1) + (-6 from i2). An equally valid alternative breakdown gives: 4 = (4 from i1) + (0 from i2). Averaging the two: 4 = (7 from i1) + (-3 from i2). Gradient*input would produce the same breakdown even for y = i1 - i2: it doesn't leverage the nonlinearity, because gradients are *local*.
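The two orderings and their average can be reproduced numerically (a sketch of the idea on this single ReLU, with a reference input of 0 assumed; the full DeepLIFT rules handle general networks):

```python
def relu(x):
    return max(0.0, x)

i1, i2 = 10.0, 6.0

# Order 1 (matches gradient*input here): i1 enters first, then -i2.
c1_i1 = relu(i1) - relu(0.0)        # +10 from i1
c1_i2 = relu(i1 - i2) - relu(i1)    # -6 from i2

# Order 2: -i2 enters first, then i1.
c2_i2 = relu(-i2) - relu(0.0)       # 0 from i2
c2_i1 = relu(i1 - i2) - relu(-i2)   # +4 from i1

avg_i1 = (c1_i1 + c2_i1) / 2        # +7
avg_i2 = (c1_i2 + c2_i2) / 2        # -3
```

Both breakdowns, and their average, sum exactly to the output of 4, but only the average reflects how the nonlinearity clips the negative term.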
Back to the "min" (AND) relation: y = i1 - max(0, i1 - i2) = min(i1, i2), where the gradient is 0 for either i1 or i2. With the ordering idea above, DeepLIFT gives 50% importance to each of i1 and i2.
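The gradient blind spot in the "min" network can be confirmed numerically (finite differences again stand in for backprop, for illustration only):

```python
def y(i1, i2):
    # y = i1 - max(0, i1 - i2) = min(i1, i2)
    return i1 - max(0.0, i1 - i2)

def grads(i1, i2, eps=1e-6):
    """Finite-difference gradients of y with respect to i1 and i2."""
    base = y(i1, i2)
    return ((y(i1 + eps, i2) - base) / eps,
            (y(i1, i2 + eps) - base) / eps)

val = y(3.0, 5.0)        # equals min(3, 5) = 3
g1, g2 = grads(3.0, 5.0) # i1 < i2: all gradient on i1, none on i2
```

Even though the output genuinely depends on both inputs (it is their minimum), the gradient credits only the smaller one at any given point.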
E.g., morphing an MNIST 8 into a 3 or a 6. (Figure: importance maps for the 8->3 and 8->6 morphs, comparing guided backprop, integrated gradients, and DeepLIFT.)
Case study: understanding the "control elements" of blood cell types, using publicly available "openness" data (Corces & Buenrostro et al., 2016) for hematopoietic stem cells (HSCs), white blood cells, and red blood cells. (Peyton Greenside)
Cell-type-specific use of "controller" sequence in HSCs, B-cells, and erythroid cells. (Figure: for a region containing three Gata patterns and one SPI1 pattern, tracks show the openness signal and DeepLIFT importance in HSCs, B-cells, and erythroid cells, alongside SPI1 and GATA1 protein-binding signals; importance is absent where a cell type shows no peak or where the protein is not present in that cell.)
Back to the second question for the model: what are the recurring patterns in the DNA?
Naive idea: look at individual pattern detectors, as in computer vision; for example, the individual GATA pattern detectors/motifs found by DeepBind (Alipanahi et al.). Problem: high levels of redundancy, because multiple neurons cooperate with each other.
How do we combine the contributions of multiple pattern detectors to find consolidated patterns? Insight: input-level importance scores reveal the combined contributions. (Figure: per-position score tracks for sequences 1, 2, and 3, each showing the same recurring pattern.) MoDISco: Motif Discovery from Importance Scores.
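A highly simplified sketch of the MoDISco intuition (not the actual algorithm, which involves seqlet clustering and alignment): slice out high-importance windows from per-position importance scores, then average them to consolidate a recurring pattern across sequences.

```python
import numpy as np

def extract_seqlets(score_tracks, window=4):
    """Return the highest-total-importance window ("seqlet") from each
    per-position importance track."""
    seqlets = []
    for s in score_tracks:
        sums = [s[i:i + window].sum() for i in range(len(s) - window + 1)]
        best = int(np.argmax(sums))
        seqlets.append(s[best:best + window])
    return np.stack(seqlets)

# Three toy importance tracks, each containing the same 4-long motif
# at a different offset.
motif = np.array([1.0, 2.0, 2.0, 1.0])
tracks = []
for offset in (0, 3, 5):
    t = np.zeros(10)
    t[offset:offset + 4] = motif
    tracks.append(t)

consolidated = extract_seqlets(tracks).mean(axis=0)  # recovers the motif
```

Averaging aligned seqlets pools the fragmented evidence from many sequences into one consolidated pattern, which is the core idea behind discovering motifs from importance scores.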
Case study: predicting Nanog binding in embryonic stem cells. Foreground: thousands of sequences bound by the Nanog protein in embryonic stem cells; background: open regions in embryonic stem cells. The model achieves 94% auROC on a held-out test set. (Figure: Nanog DNA-binding signal.)
Learning recurring patterns: a single MoDISco feature better predicts Nanog binding than all 4 publicly available patterns (Kheradpour et al.) combined. (Figure: the publicly available motifs and the corresponding MoDISco result.)
In development: discover dependencies with "Delta DeepLIFT". Simulation: random background sequences containing 0-2 "Gata" patterns and 0-2 "Tal" patterns; positive set: at least one of each. Identify sequences with one "Gata" and one "Tal" pattern, mutate one pattern, and measure the change in the importance of the other.
Summary
• DeepLIFT can reveal the cell-type-specific importance of positions at "control elements", with advantages over gradients/in-silico mutagenesis: https://github.com/kundajelab/deeplift
• MoDISco (Motif Discovery from Importance Scores) finds broader and more consolidated motifs compared to other approaches
• Delta DeepLIFT identifies dependencies between patterns
Acknowledgements: Anshul Kundaje, Peyton Greenside, Nasa Sinnott-Armstrong, Anna Shcherbina, Irene Kaplow, Johnny Israeli, Chuan Sheng Foo, Nathan Boley, Maryna Taranova, Oana Ursu, Daniel Kim, Chris Probert, Jin-Wook Lee, Michael Wainberg, Rahul Mohan.
Funding: HHMI International Student Research Fellowship, Bio-X fellowship, Microsoft Women's Fellowship, NIH R01ES02500902.
(Backup figure: an example where gradient*input misses a "GATA" pattern that DeepLIFT highlights; motifs from Kheradpour et al. Peyton Greenside.)
Consolidated MoDISco patterns don't lose information relative to fragmented patterns. (Figure: auROC of logistic regression on top hits to each pattern set, comparing 5 known patterns from the ENCODE db, 4 known patterns from the HOMER db, the top 4 de-novo patterns from a traditional method (HOMER), all 32 de-novo patterns from the traditional method (HOMER), and just 4 MoDISco motifs.)
Motif discovery works on continuous signals too: a model trained to predict binding of the CTCF protein from sequence plus accessibility signal (DNase) yields both a DNA sequence pattern and a "footprint" pattern in the experimental accessibility signal. (Nasa Sinnott-Armstrong, Chuan Sheng Foo)
Example failure mode: thresholding. Let output = max(0, input - 10), with a reference input of 0 (so the "reference output" is 0). The DeepLIFT contribution is the output minus the reference output, whereas grad*Δinput (Taylor) gives the gradient times the input, which for input > 10 equals the input itself and overstates the contribution by 10. (Plots: output, DeepLIFT contribution, and grad*Δinput vs. input, with the kink at input = 10.)
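The thresholding comparison is easy to verify numerically (a direct transcription of the slide's example, with reference input 0):

```python
def output(x):
    # Thresholded unit from the slide: fires only above 10
    return max(0.0, x - 10.0)

x, ref = 12.0, 0.0

grad = 1.0 if x > 10.0 else 0.0       # gradient of output at x
grad_times_dinput = grad * (x - ref)  # 12: overstates the contribution
deeplift_contrib = output(x) - output(ref)  # 2: matches the actual change
```

The difference-from-reference score tracks the actual change in the output, while grad*Δinput charges the input for the full distance from the reference, ignoring the threshold.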