

  1. Understanding Genome Regulation with Interpretable Deep Learning. Presented by: Avanti Shrikumar, Kundaje Lab, Stanford University

  2. Example biological problem: understanding stem cell differentiation. A fertilized egg gives rise to liver cells, lung cells, kidney cells, and so on. Cell types are different because different genes are turned on. How is cell-type-specific gene expression controlled? Answer: "regulatory elements" act like switches to turn genes on.

  3. "Regulatory elements" are switches that turn genes on. A regulatory element's sequence contains "DNA patterns" that proteins called transcription factors bind to; the regulatory element, together with its bound transcription factors, loops over to the DNA sequence of a gene (e.g. ACGTGTAACTGATAATGCCGATATT) and activates it. In short: transcription factors bind to DNA "words" in regulatory elements, and the bound element activates nearby genes.

  4. Over 90%* of disease-associated mutations are outside genes! A regulatory element has "DNA patterns" that transcription factors bind to, yet many positions in a regulatory element are not essential for its function. → Which positions in regulatory elements matter? (*Stranger et al., Genetics, 2011)

  5. Q: Which positions in regulatory elements matter? Approach: (1) Experimentally measure activity of regulatory elements in different tissues; (2) Predict tissue-specific activity of regulatory elements from sequence using deep learning; (3) Interpret the model to learn important patterns in the input!

  6. Questions for the model - Which parts of the input are the most important for making a given prediction? - What are the recurring patterns in the input?


  8. Overview of deep learning model. Outputs: "Active in Erythroid", "Active in Liver", "Accessible in Lung", "Accessible in HSCs", each predicted as active (+1) vs. not active (0). Later layers build on patterns of previous layers (learned pattern detectors). Input: DNA sequence (e.g. GATAACCGATATC) represented as ones and zeros, i.e. a one-hot encoding with one row per base A/C/G/T.
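To make the input encoding concrete, here is a minimal sketch (not from the talk) of one-hot encoding a DNA string into a 4 x L matrix of ones and zeros, one row per base:

```python
import numpy as np

def one_hot_encode(seq, alphabet="ACGT"):
    """Encode a DNA string as a 4 x len(seq) matrix of 0s and 1s."""
    base_to_row = {base: i for i, base in enumerate(alphabet)}
    encoding = np.zeros((len(alphabet), len(seq)), dtype=np.float32)
    for col, base in enumerate(seq):
        encoding[base_to_row[base], col] = 1.0
    return encoding

print(one_hot_encode("GATAACCGATATC"))
```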

  9. How can we identify important nucleotides? One option: in-silico mutagenesis, i.e. mutate each position of the sequence (e.g. GATAACCGATATC) to every other base and measure the change in the predicted outputs (e.g. "Active in Liver", "Active in Lung"). (Alipanahi et al., 2015; Zhou & Troyanskaya, 2015)
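A minimal sketch of per-position in-silico mutagenesis, assuming a predict function that maps a one-hot matrix to a scalar prediction (e.g. probability of "Active in Liver"); predict is a placeholder, not the talk's model:

```python
import numpy as np

def in_silico_mutagenesis(onehot, predict):
    """Score each (base, position): change in prediction when that base is substituted in."""
    baseline = predict(onehot)
    n_bases, seq_len = onehot.shape
    scores = np.zeros((n_bases, seq_len))
    for pos in range(seq_len):
        for base in range(n_bases):
            if onehot[base, pos] == 1:
                continue  # original base: no change
            mutated = onehot.copy()
            mutated[:, pos] = 0
            mutated[base, pos] = 1
            scores[base, pos] = predict(mutated) - baseline
    return scores
```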

  10. Saturation problem illustrated. Toy network: y_in = i1 + i2, and the output y_out saturates at 1 once y_in >= 1 (the plot of y_out vs. y_in rises from 0 to 1 and then stays flat). With i1 = 1 and i2 = 1, y_in = 2 and y_out = 1, so perturbing either input alone does not change the output. Avoiding saturation means perturbing combinations of inputs → increased computational cost.

  11. "Backpropagation"-based approaches. Examples: Gradients (Simonyan et al.), Integrated Gradients (ICML 2017), DeepLIFT (ICML 2017; https://github.com/kundajelab/deeplift). A backward pass through the network assigns an importance score to every position of the one-hot encoded input DNA sequence for a chosen output (e.g. "Active in Liver").
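As a sketch of the simplest backpropagation-based score, gradient*input, assuming a trained PyTorch model that takes a one-hot tensor and returns a scalar prediction (the model here is a placeholder, not the talk's architecture):

```python
import torch

def grad_times_input(model, onehot):
    """One backward pass gives a per-position importance score."""
    x = onehot.clone().requires_grad_(True)
    output = model(x)             # assumed scalar prediction, e.g. "Active in Liver"
    output.backward()
    return (x.grad * x).detach()  # gradient * input attribution
```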

  12. Saturation revisited. In the toy network (y_in = i1 + i2, output saturates at 1), when (i1 + i2) >= 1 the gradient is 0, so gradient-based scores assign zero importance to i1 and i2 even though they drive the output. Affects: - Gradients - Deconvolutional Networks - Guided Backpropagation - Layerwise Relevance Propagation
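A numeric sketch of the saturation problem, using min(i1 + i2, 1) as a stand-in for the slide's toy network: the gradient at the actual input is zero even though the inputs clearly matter.

```python
def toy_net(i1, i2):
    """Stand-in for the slide's network: y_in = i1 + i2, output saturates at 1."""
    return min(i1 + i2, 1.0)

def numerical_gradient(f, i1, i2, eps=1e-4):
    """Central-difference gradient of f at (i1, i2)."""
    d1 = (f(i1 + eps, i2) - f(i1 - eps, i2)) / (2 * eps)
    d2 = (f(i1, i2 + eps) - f(i1, i2 - eps)) / (2 * eps)
    return d1, d2

print(toy_net(1.0, 1.0))                      # 1.0
print(numerical_gradient(toy_net, 1.0, 1.0))  # (0.0, 0.0): saturated, gradient assigns no importance
print(toy_net(0.0, 0.0))                      # 0.0: yet zeroing both inputs changes the output
```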

  13. The DeepLIFT solution: difference from reference. Reference: i1_0 = 0 and i2_0 = 0, so the reference output y_out_0 = 0. With (i1 + i2) = 2, the "difference from reference" Δy_out is +1, NOT 0, even though the gradient is 0. With Δi1 = 1 and Δi2 = 1, the contributions are C(Δi1 → Δy) = 0.5 = C(Δi2 → Δy). Detailed backpropagation rules in the paper.
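A sketch of the difference-from-reference idea on the same toy stand-in network; it reproduces the 0.5 / 0.5 split from the slide but is not the full set of backpropagation rules from the paper:

```python
def toy_net(i1, i2):
    return min(i1 + i2, 1.0)

# Reference activations
ref_i1, ref_i2 = 0.0, 0.0
ref_out = toy_net(ref_i1, ref_i2)   # 0.0

# Actual input
i1, i2 = 1.0, 1.0
out = toy_net(i1, i2)               # 1.0
delta_out = out - ref_out           # +1, NOT 0

# Split delta_out across inputs in proportion to their difference-from-reference
delta_i1, delta_i2 = i1 - ref_i1, i2 - ref_i2
c_i1 = delta_out * delta_i1 / (delta_i1 + delta_i2)  # 0.5
c_i2 = delta_out * delta_i2 / (delta_i1 + delta_i2)  # 0.5
print(c_i1, c_i2)
```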

  14. DeepLIFT scores at an active regulatory element near the HNF4A gene (tracks for Liver, Lung, and Kidney; figure by Anna Shcherbina).

  15. Choice of reference matters! (Illustrated with original images vs. DeepLIFT scores for a CIFAR10 model, class = "ship".) Suggestions on how to pick a reference: - MNIST: all zeros (the background) - Consider using a distribution of references - E.g. multiple references generated by dinucleotide-shuffling a genomic sequence
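A sketch of using a distribution of references: average attributions over several shuffled versions of the sequence. The attribution_fn and the naive column shuffle here are placeholders; a proper dinucleotide-preserving shuffle (as the slide suggests) needs a dedicated algorithm such as Altschul-Erickson.

```python
import numpy as np

def shuffled_reference(onehot, rng):
    """Naive column shuffle as a stand-in; a real dinucleotide shuffle preserves dinucleotide counts."""
    perm = rng.permutation(onehot.shape[1])
    return onehot[:, perm]

def average_over_references(onehot, attribution_fn, n_refs=10, seed=0):
    """attribution_fn(input, reference) -> per-position scores; hypothetical signature."""
    rng = np.random.default_rng(seed)
    scores = [attribution_fn(onehot, shuffled_reference(onehot, rng)) for _ in range(n_refs)]
    return np.mean(scores, axis=0)
```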

  16-21. Integrated Gradients: another reference-based approach. For the toy saturating network (y_in = i1 + i2, output saturates at 1), interpolate from the reference (i1, i2) = (0.0, 0.0) to the actual input (1.0, 1.0) and record the gradient dy/di_x at each step:

  i1    i2    dy/di_x
  0.0   0.0   1
  0.2   0.2   1
  0.4   0.4   1
  0.6   0.6   0
  0.8   0.8   0
  1.0   1.0   0

  Average dy/di_x = 0.5, so (average dy/di1)*Δi1 = 0.5 and (average dy/di2)*Δi2 = 0.5: each input receives a non-zero attribution even though the gradient at the actual input is 0.
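A sketch of Integrated Gradients on the same toy stand-in network, using finite-difference gradients and the reference (0, 0):

```python
import numpy as np

def toy_net(i1, i2):
    return min(i1 + i2, 1.0)

def grad(f, i1, i2, eps=1e-4):
    """Central-difference gradient of f at (i1, i2)."""
    return ((f(i1 + eps, i2) - f(i1 - eps, i2)) / (2 * eps),
            (f(i1, i2 + eps) - f(i1, i2 - eps)) / (2 * eps))

ref = np.array([0.0, 0.0])
inp = np.array([1.0, 1.0])
alphas = np.linspace(0.0, 1.0, 6)   # the 6 interpolation steps from the slide
grads = np.array([grad(toy_net, *(ref + a * (inp - ref))) for a in alphas])
attributions = grads.mean(axis=0) * (inp - ref)
print(attributions)                  # [0.5, 0.5]: non-zero despite the zero gradient at the input
```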

  22. Integrated Gradients: another reference-based approach (Sundararajan et al.). Pros: - completely black-box except for gradient computation - functionally equivalent networks are guaranteed to give the same result. Cons: - repeated gradient calculations add computational overhead - the linear interpolation path between the baseline and the actual input can result in chaotic behavior from the network, especially for things like one-hot encoded DNA sequence

  23. - "Original": original one-hot encoded DNA sequences - "Shuffled": shuffled sequences used as the "baseline" - Interpolation between them parameterized by "alpha" from 0 to 1
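A sketch of why this interpolation path is off-distribution for DNA: the interpolated inputs have fractional values and are no longer valid one-hot sequences. The "shuffled" baseline below is just a stand-in (a column reversal), not a real dinucleotide shuffle.

```python
import numpy as np

original = np.array([[0, 1, 0, 0],   # toy one-hot columns for a 4-bp sequence (rows = A, C, G, T)
                     [0, 0, 1, 0],
                     [1, 0, 0, 0],
                     [0, 0, 0, 1]], dtype=float)
shuffled_baseline = original[:, ::-1]  # stand-in for a shuffled-sequence baseline

alpha = 0.4
interpolated = shuffled_baseline + alpha * (original - shuffled_baseline)
print(interpolated)
# Columns now contain values like 0.4 and 0.6 rather than exact 0/1:
# the network is queried on inputs unlike anything in the training set.
```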


  30. Neural nets can behave unexpectedly when supplied inputs outside the training set distribution.

  31. Might be why Integrated Gradients sometimes performs worse than grad*input on DNA… (Comparison at a region active in cell type "A549": per-position perturbation ("in-silico mutagenesis"), DeepLIFT, Grad*Input, and Integrated Gradients tracks.)

  32. Integrated Gradients: another reference-based approach (Sundararajan et al.). Pros: - completely black-box except for gradient computation - functionally equivalent networks are guaranteed to give the same result. Cons: - repeated gradient calculations add computational overhead - the linear interpolation path between the baseline and the actual input can result in chaotic behavior from the network, especially for things like one-hot encoded DNA sequence - still relies on gradients, which are local by nature and can give misleading interpretations

  33. Failure case: "min" (AND) relation. Let h = ReLU(i1 - i2) = max(0, i1 - i2) and y = i1 - h = i1 - max(0, i1 - i2) = min(i1, i2).
  - If i2 < i1: y = i1 - (i1 - i2) = i2
  - If i2 > i1: y = i1 - 0 = i1
  The gradient is 0 for whichever of i1 or i2 is larger, and this is true even when interpolating from (0, 0) to (i1, i2)!
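A numeric sketch of the failure case: for y = i1 - ReLU(i1 - i2) = min(i1, i2), the gradient with respect to the larger input is zero at every point on the straight path from (0, 0) to (i1, i2), so even Integrated Gradients assigns it no credit.

```python
import numpy as np

def and_net(i1, i2):
    """y = i1 - ReLU(i1 - i2) = min(i1, i2)."""
    return i1 - max(0.0, i1 - i2)

def grad(f, i1, i2, eps=1e-4):
    return ((f(i1 + eps, i2) - f(i1 - eps, i2)) / (2 * eps),
            (f(i1, i2 + eps) - f(i1, i2 - eps)) / (2 * eps))

target = np.array([10.0, 6.0])
for alpha in np.linspace(0.05, 1.0, 5):
    point = alpha * target               # straight path from (0, 0) to (10, 6)
    print(alpha, grad(and_net, *point))  # d/di1 is 0 everywhere, since i1 > i2 along the whole path
```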

  34-37. The DeepLIFT solution: consider different orders for adding positive and negative terms. Take i1 = 10, i2 = 6, so y = i1 - ReLU(i1 - i2) = 10 - ReLU(4) = 6 = min(i1 = 10, i2 = 6).
  Standard breakdown of the ReLU's input: 4 = (+10 from i1) + (-6 from i2). Propagating this gives y = 6 = (10 from i1) - [(10 from i1) - (6 from i2)] = (6 from i2), i.e. all credit goes to i2.
  Another possible breakdown (adding the negative term before the positive one): 4 = (4 from i1) + (0 from i2).
  DeepLIFT considers both orders of adding the positive and negative terms rather than committing to a single one.
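A numeric sketch of the two orderings for the ReLU's input and of averaging them; this mirrors the spirit of DeepLIFT's approach on this example rather than the general rule from the paper:

```python
def relu(x):
    return max(0.0, x)

i1, i2 = 10.0, 6.0
pos, neg = i1, -i2                            # positive and negative terms feeding ReLU(i1 - i2)

# Order 1: add the positive term first, then the negative term
from_i1_a = relu(pos) - relu(0.0)             # +10
from_i2_a = relu(pos + neg) - relu(pos)       # -6
# Order 2: add the negative term first, then the positive term
from_i2_b = relu(neg) - relu(0.0)             # 0
from_i1_b = relu(pos + neg) - relu(neg)       # +4

# Average the two orderings of the ReLU's output (which is 4)
relu_from_i1 = 0.5 * (from_i1_a + from_i1_b)  # 7
relu_from_i2 = 0.5 * (from_i2_a + from_i2_b)  # -3

# y = i1 - ReLU(i1 - i2): subtract the ReLU contributions from i1's direct contribution
y_from_i1 = i1 - relu_from_i1                 # 3
y_from_i2 = 0.0 - relu_from_i2                # 3
print(y_from_i1, y_from_i2)                   # both inputs share credit for y = 6, as a min/AND should
```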
