Not Just a Black Box: Interpretable Deep Learning for Genomics
Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
With great power comes really poor interpretability… (Figure: power vs. interpretability, from classical statistics through traditional machine learning to deep learning; "interpretable deep learning" aims for both.)
Example biological problem: understanding stem cell differentiation. (Figure: a fertilized egg differentiates into liver cells, cardiac cells, and blood cells.) Cell types are different because different genes are turned on. How is cell-type-specific gene expression controlled? Answer: "control elements" that show cell-type-specific openness.
"Control elements" show tissue-specific openness. Most of the genome exists in a closed state: "histone" proteins act like spools that the DNA winds around, and most "controller" proteins can't bind closed DNA. The exceptions are cell-type-specific open control elements: "controller" proteins bind to DNA patterns present in these control elements, which then activate nearby genes. (Figures from Shlyueva et al., Nature Reviews Genetics, 2014.)
89%* of disease-associated mutations occur outside of genes, and many mutations have no effect! Which positions in controller sites are important? The approach: experimentally measure cell-type-specific openness, predict openness from sequence using deep learning, then interpret the model to learn the important positions. (Figures from Shlyueva et al., Nature Reviews Genetics, 2014. *Stranger et al., Genetics, 2011.)
Overview of the deep learning model. Input: a DNA sequence (e.g. GATAACCGATATC) represented as ones and zeros, with one row per base (A, C, G, T). As in computer vision, early layers act as learned pattern detectors and later layers build on the patterns of previous layers. Output: open (+1) vs. not open (0) in a given cell type (e.g. open in cell type X, such as HSCs, vs. open in cell type Y).
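A minimal sketch of the one-hot input encoding described above (the function name and base ordering A/C/G/T are illustrative assumptions, not the authors' code):

```python
import numpy as np

def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) array of ones and zeros,
    one column per base in the order A, C, G, T."""
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        arr[i, mapping[base]] = 1.0  # exactly one 1 per position
    return arr

encoded = one_hot_encode("GATAA")
```

Each row of `encoded` is the indicator vector for one position, which is the "ones and zeros" representation fed to the network.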
Questions for the model:
• Which positions in the DNA sequence are the important ones?
• What are the recurring patterns in the DNA?
How can we identify important nucleotides? One approach is in-silico mutagenesis: mutate each position of the sequence (e.g. GATAACCGATATC) in turn and observe how the predicted openness changes (Alipanahi et al., 2015; Zhou & Troyanskaya, 2015).
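In-silico mutagenesis can be sketched as follows; `predict` is a hypothetical stand-in for the trained model (here a toy function counting GATA matches), not the actual network from the talk:

```python
def in_silico_mutagenesis(predict, seq):
    """Score each position by the largest absolute change in model
    output over the three possible single-base substitutions."""
    bases = "ACGT"
    ref = predict(seq)
    scores = []
    for i in range(len(seq)):
        deltas = [predict(seq[:i] + b + seq[i + 1:]) - ref
                  for b in bases if b != seq[i]]
        scores.append(max(abs(d) for d in deltas))
    return scores

# Toy "model": output = number of GATA matches in the sequence.
toy = lambda s: float(s.count("GATA"))
scores = in_silico_mutagenesis(toy, "TTGATATT")
# Only the four positions inside the GATA match get nonzero scores.
```

This illustrates both the appeal (direct causal probing) and the cost: one forward pass per position per substitution.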
The saturation problem illustrated. Consider y = i1 + i2 when i1 + i2 < 1, and y = 1 when i1 + i2 >= 1. This can be written as y = 1 - h, with h = max(0, 1 - i1 - i2). (Plot: y vs. i1 + i2, rising linearly and then flat at 1 beyond i1 + i2 = 1.)
"Backpropagation"-based approaches assign an importance score to every position in a single pass. Examples: gradients (Simonyan et al.) and DeepLIFT (github.com/kundajelab/deeplift). (Figure: per-position importance tracks over the one-hot encoded input sequence, for openness in cell type X and cell type Y.)
Saturation revisited. For y = 1 - h with h = max(0, 1 - i1 - i2): when i1 + i2 >= 1, the gradient is 0 with respect to both inputs, so gradient-based importance misses inputs that clearly matter.
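A quick numeric check of the saturation problem on the toy network from the slide (finite differences stand in for backprop here, purely for illustration):

```python
def y(i1, i2):
    # y = 1 - max(0, 1 - i1 - i2): saturates at 1 once i1 + i2 >= 1
    return 1.0 - max(0.0, 1.0 - i1 - i2)

def grad_i1(i1, i2, eps=1e-6):
    """Finite-difference gradient of y with respect to i1."""
    return (y(i1 + eps, i2) - y(i1, i2)) / eps

g_unsat = grad_i1(0.2, 0.2)  # i1 + i2 < 1: gradient is 1
g_sat = grad_i1(1.0, 1.0)    # i1 + i2 >= 1: gradient is exactly 0
```

Once the unit saturates, the gradient assigns zero importance to inputs that are in fact responsible for the output being 1.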
The DeepLIFT solution: difference from reference. Pick a reference input, here i1 = 0 and i2 = 0, at which h = max(0, 1 - i1 - i2) = 1. At i1 + i2 = 2, h = 0, so the "difference from reference" of h is -1, NOT 0, even though the gradient is 0.
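The difference-from-reference idea on the same toy network can be checked directly (a simplified sketch of the principle, not the DeepLIFT library's rule-based implementation):

```python
def h(i1, i2):
    return max(0.0, 1.0 - i1 - i2)

def y(i1, i2):
    return 1.0 - h(i1, i2)

ref_h = h(0.0, 0.0)              # h = 1 at the reference (i1 = i2 = 0)
delta_h = h(1.0, 1.0) - ref_h    # -1, NOT 0: the saturated unit still
                                 # differs from its reference state
delta_y = y(1.0, 1.0) - y(0.0, 0.0)  # the output changed by +1
```

Comparing activations to a reference recovers a nonzero signal exactly where the gradient vanishes.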
DeepLIFT generalizes to other function types. For a sigmoid (assuming a reference input of 0, where the sigmoid is 0.5), the "difference from reference" is +0.5 when the input is >> 0.
The reference matters! Suggestions on how to pick a DeepLIFT reference:
- MNIST: the background (all zeros)
- Genomics: the average frequency of A/C/G/T in a background set, or multiple references generated by shuffling the original sequence
(Figure: CIFAR10 model, class = "ship": original image, reference, and DeepLIFT scores.)
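The shuffled-reference suggestion can be sketched as below (a plain uniform shuffle for illustration; in practice a dinucleotide-preserving shuffle is often preferred, and the function name here is an assumption):

```python
import random

def shuffled_references(seq, n, seed=0):
    """Generate n references by shuffling the bases of the original
    sequence, preserving its base composition."""
    rng = random.Random(seed)  # seeded for reproducibility
    refs = []
    for _ in range(n):
        chars = list(seq)
        rng.shuffle(chars)
        refs.append("".join(chars))
    return refs

refs = shuffled_references("GATAAC", 3)
```

DeepLIFT scores would then be averaged over the scores computed against each reference.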
Example failure mode 2: a "min" (AND) relation. Let h1 = i1 - i2, h2 = max(0, h1), and y = i1 - h2. Then y = i1 - max(0, i1 - i2) = min(i1, i2), yet the gradient is 0 for either i1 or i2 (whichever is larger).
DeepLIFT idea 2: consider different orderings for the positive and negative terms. Take y = max(0, i1 - i2) with i1 = 10, i2 = 6, so y = max(0, 10 - 6) = 4. The standard breakdown (gradient*input) gives: 4 = (10 from i1) + (-6 from i2). An equally valid alternative breakdown gives: 4 = (4 from i1) + (0 from i2). Averaging the two: 4 = (7 from i1) + (-3 from i2). Gradient*input would produce the same breakdown even for y = i1 - i2: it doesn't leverage the nonlinearity, because gradients are *local*.
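The two orderings and their average can be reproduced numerically (a sketch of the idea on this single ReLU, with a reference input of 0 assumed; the full DeepLIFT rules handle general networks):

```python
def relu(x):
    return max(0.0, x)

i1, i2 = 10.0, 6.0

# Order 1 (matches gradient*input here): i1 enters first, then -i2.
c1_i1 = relu(i1) - relu(0.0)        # +10 from i1
c1_i2 = relu(i1 - i2) - relu(i1)    # -6 from i2

# Order 2: -i2 enters first, then i1.
c2_i2 = relu(-i2) - relu(0.0)       # 0 from i2
c2_i1 = relu(i1 - i2) - relu(-i2)   # +4 from i1

avg_i1 = (c1_i1 + c2_i1) / 2        # +7
avg_i2 = (c1_i2 + c2_i2) / 2        # -3
```

Both breakdowns, and their average, sum exactly to the output of 4, but only the average reflects how the nonlinearity clips the negative term.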
Back to the "min" (AND) relation: y = i1 - max(0, i1 - i2) = min(i1, i2), where the gradient is 0 for either i1 or i2. With the ordering idea above, DeepLIFT gives 50% importance to each of i1 and i2.
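The gradient blind spot in the "min" network can be confirmed numerically (finite differences again stand in for backprop, for illustration only):

```python
def y(i1, i2):
    # y = i1 - max(0, i1 - i2) = min(i1, i2)
    return i1 - max(0.0, i1 - i2)

def grads(i1, i2, eps=1e-6):
    """Finite-difference gradients of y with respect to i1 and i2."""
    base = y(i1, i2)
    return ((y(i1 + eps, i2) - base) / eps,
            (y(i1, i2 + eps) - base) / eps)

val = y(3.0, 5.0)        # equals min(3, 5) = 3
g1, g2 = grads(3.0, 5.0) # i1 < i2: all gradient on i1, none on i2
```

Even though the output genuinely depends on both inputs (it is their minimum), the gradient credits only the smaller one at any given point.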
E.g., morphing an MNIST 8 into a 3 or a 6. (Figure: importance maps for the 8->3 and 8->6 morphs, comparing guided backprop, integrated gradients, and DeepLIFT.)
Case study: understanding the "control elements" of blood cell types, using publicly available "openness" data (Corces & Buenrostro et al., 2016) for hematopoietic stem cells (HSCs), white blood cells, and red blood cells. (Peyton Greenside)
Cell-type-specific use of "controller" sequence in HSCs, B-cells, and erythroid cells. (Figure: for a region containing three Gata patterns and one SPI1 pattern, tracks show the openness signal and DeepLIFT importance in HSCs, B-cells, and erythroid cells, alongside SPI1 and GATA1 protein-binding signals; importance is absent where a cell type shows no peak or where the protein is not present in that cell.)
Back to the second question for the model: what are the recurring patterns in the DNA?
Naive idea: look at individual pattern detectors, as in computer vision; for example, the individual GATA pattern detectors/motifs found by DeepBind (Alipanahi et al.). Problem: high levels of redundancy, because multiple neurons cooperate with each other.
How do we combine the contributions of multiple pattern detectors to find consolidated patterns? Insight: input-level importance scores reveal the combined contributions. (Figure: per-position score tracks for sequences 1, 2, and 3, each showing the same recurring pattern.) MoDISco: Motif Discovery from Importance Scores.
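A highly simplified sketch of the MoDISco intuition (not the actual algorithm, which involves seqlet clustering and alignment): slice out high-importance windows from per-position importance scores, then average them to consolidate a recurring pattern across sequences.

```python
import numpy as np

def extract_seqlets(score_tracks, window=4):
    """Return the highest-total-importance window ("seqlet") from each
    per-position importance track."""
    seqlets = []
    for s in score_tracks:
        sums = [s[i:i + window].sum() for i in range(len(s) - window + 1)]
        best = int(np.argmax(sums))
        seqlets.append(s[best:best + window])
    return np.stack(seqlets)

# Three toy importance tracks, each containing the same 4-long motif
# at a different offset.
motif = np.array([1.0, 2.0, 2.0, 1.0])
tracks = []
for offset in (0, 3, 5):
    t = np.zeros(10)
    t[offset:offset + 4] = motif
    tracks.append(t)

consolidated = extract_seqlets(tracks).mean(axis=0)  # recovers the motif
```

Averaging aligned seqlets pools the fragmented evidence from many sequences into one consolidated pattern, which is the core idea behind discovering motifs from importance scores.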
Case study: predicting Nanog binding in embryonic stem cells. Foreground: thousands of sequences bound by the Nanog protein in embryonic stem cells; background: open regions in embryonic stem cells. The model achieves 94% auROC on a held-out test set. (Figure: Nanog DNA-binding signal.)
Learning recurring patterns: a single MoDISco feature better predicts Nanog binding than all 4 publicly available patterns (Kheradpour et al.) combined. (Figure: the publicly available motifs and the corresponding MoDISco result.)
In development: discover dependencies with "Delta DeepLIFT". Simulation: random background sequences containing 0-2 "Gata" patterns and 0-2 "Tal" patterns; positive set: at least one of each. Identify sequences with one "Gata" and one "Tal" pattern, mutate one pattern, and measure the change in the importance of the other.
Summary
• DeepLIFT can reveal the cell-type-specific importance of positions at "control elements", with advantages over gradients/in-silico mutagenesis: https://github.com/kundajelab/deeplift
• MoDISco (Motif Discovery from Importance Scores) finds broader and more consolidated motifs compared to other approaches
• Delta DeepLIFT identifies dependencies between patterns
Acknowledgements: Anshul Kundaje, Peyton Greenside, Nasa Sinnott-Armstrong, Anna Shcherbina, Irene Kaplow, Johnny Israeli, Chuan Sheng Foo, Nathan Boley, Maryna Taranova, Oana Ursu, Daniel Kim, Chris Probert, Jin-Wook Lee, Michael Wainberg, Rahul Mohan.
Funding: HHMI International Student Research Fellowship, Bio-X fellowship, Microsoft Women's Fellowship, NIH R01ES02500902.
(Backup figure: an example where gradient*input misses a "GATA" pattern that DeepLIFT highlights; motifs from Kheradpour et al. Peyton Greenside.)
Consolidated MoDISco patterns don't lose information relative to fragmented patterns. (Figure: auROC of logistic regression on top hits to each pattern set, comparing 5 known patterns from the ENCODE db, 4 known patterns from the HOMER db, the top 4 de-novo patterns from a traditional method (HOMER), all 32 de-novo patterns from the traditional method (HOMER), and just 4 MoDISco motifs.)
Motif discovery works on continuous signals too: a model trained to predict binding of the CTCF protein from sequence plus accessibility signal (DNase) yields both a DNA sequence pattern and a "footprint" pattern in the experimental accessibility signal. (Nasa Sinnott-Armstrong, Chuan Sheng Foo)
Example failure mode: thresholding. Let output = max(0, input - 10), with a reference input of 0 (so the "reference output" is 0). The DeepLIFT contribution is the output minus the reference output, whereas grad*Δinput (Taylor) gives the gradient times the input, which for input > 10 equals the input itself and overstates the contribution by 10. (Plots: output, DeepLIFT contribution, and grad*Δinput vs. input, with the kink at input = 10.)
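The thresholding comparison is easy to verify numerically (a direct transcription of the slide's example, with reference input 0):

```python
def output(x):
    # Thresholded unit from the slide: fires only above 10
    return max(0.0, x - 10.0)

x, ref = 12.0, 0.0

grad = 1.0 if x > 10.0 else 0.0       # gradient of output at x
grad_times_dinput = grad * (x - ref)  # 12: overstates the contribution
deeplift_contrib = output(x) - output(ref)  # 2: matches the actual change
```

The difference-from-reference score tracks the actual change in the output, while grad*Δinput charges the input for the full distance from the reference, ignoring the threshold.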