Incremental and Approximate Inference for Faster Occlusion-Based Deep CNN Explanations
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou
University of California, San Diego
Introduction
Deep Convolutional Neural Networks (CNNs) are revolutionizing many image analytics tasks (e.g., surveillance, autonomous vehicles).
Background: What is a CNN?
A CNN maps an input image (a 3-D array) to class probabilities, e.g., P(pneumonia), through a series of convolution layer transformations; each convolution layer transforms an input 3-D array into an output 3-D array.
*Simplified representation of a CNN
Explainability of CNN predictions is important in many critical applications, such as healthcare!
How to Explain CNN Predictions?
An active research area. Occlusion-based explanation (OBE) is widely used by practitioners.
Occlusion-based Explanations (OBE)
Occlude a patch of the original image, re-run the CNN on the occluded image, and record the change in P(pneumonia); the resulting occlusion heatmap localizes the region of interest.
Source: http://blog.qure.ai/notes/visualizing_deep_learning
Problem: OBE is Highly Time Consuming
CNN inference is time consuming, e.g., Inception3: 35 MFLOPS, ResNet152: 65 MFLOPS. OBE issues possibly … thousands of occluded images, so it can take anywhere from several seconds to several minutes!
Our Idea: Cast OBE as a query optimization task and apply database-inspired optimization techniques.
*MFLOPS: Mega Floating Point Operations
Outline
1. Background on CNN Internals
2. Incremental CNN Inference (inspired by MQO + IVM)
3. Approximate CNN Inference (inspired by AQP + Vision Science)
4. Experimental Results
*IVM: Incremental View Maintenance, MQO: Multi-Query Optimization, AQP: Approximate Query Processing
Background: What is a CNN? (recap)
A CNN maps an input 3-D image array to class probabilities, e.g., P(pneumonia), through a series of convolution layer transformations.
Background: Convolution Layer
A convolution layer applies n learned 3-D filter kernels K1 … Kn to the input 3-D array X. Each entry of the 2-D output slice for kernel Ki is SUM(Ki ∘ X'), where X' is the local input patch under the filter and ∘ is the Hadamard (elementwise) product; the n slices stack into the output 3-D array.
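To make the SUM(Ki ∘ X') formulation concrete, here is a minimal NumPy sketch of a stride-1, zero-padded convolution layer computed directly as sums of Hadamard products; the shapes, the square-odd-filter assumption, and the name conv_layer are illustrative, not the library implementation.

    import numpy as np

    def conv_layer(X, kernels):
        # X: input 3-D array (H, W, C); kernels: (N, FH, FW, C) learned 3-D filters
        N, FH, FW, C = kernels.shape
        p = FH // 2                                   # zero padding for a "same"-sized output
        Xp = np.pad(X, ((p, p), (p, p), (0, 0)))
        H, W = X.shape[0], X.shape[1]
        out = np.zeros((H, W, N))                     # one 2-D output slice per filter
        for n in range(N):
            for y in range(H):
                for x in range(W):
                    patch = Xp[y:y+FH, x:x+FW, :]     # local input patch X'
                    out[y, x, n] = np.sum(kernels[n] * patch)  # SUM(Kn o X')
        return out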
Reimagining Convolution as a Query
Input: A(X, Y, Z, V); Filter Kernels: K(X, Y, Z, N, V)

    SELECT A.X AS X, A.Y AS Y, K.N AS Z, SUM(A.V * K.V) AS V
    FROM K,
         (SELECT A.X, A.Y, A.Z, A.V,
                 A.X - T.X + FW/2 AS A_K.X,
                 A.Y - T.Y + FH/2 AS A_K.Y
          FROM A, A AS T
          WHERE ABS(A.X - T.X) <= FW/2
            AND ABS(A.Y - T.Y) <= FH/2)
    WHERE A_K.X = K.X AND A_K.Y = K.Y
    GROUP BY A.X, A.Y, K.N

A CNN performs a series of joins and aggregates.
Takeaway: the linear algebra data model improves hardware utilization.
*FW: Filter Width, FH: Filter Height
Outline
1. Background on CNN Internals
2. Incremental CNN Inference (inspired by MQO + IVM)
3. Approximate CNN Inference (inspired by AQP + Vision Science)
4. Experimental Results
Observation: Redundant Computations
Across the many occluded images, most of each layer's output stays the same; only the region affected by the occlusion patch differs. This is a new instance of the Incremental View Maintenance task in databases: the geometric properties of the CNN determine how to propagate the changes through Layer 1, Layer 2, …, Layer N.
*Only a cross section is shown; the changed region spans the depth dimension.
Our Solution: Incremental Inference
Cast OBE as a sequence of "queries": run the original image once and materialize each layer's output as a materialized view, then, for every occluded image, use our algebraic framework for incremental propagation (IVM) to update only the affected regions, with no redundant computations.
But sequential execution of the many occluded-image queries throttles performance, especially on GPUs!
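A minimal sketch of the incremental-inference idea for a stack of stride-1, zero-padded convolution layers, using PyTorch's F.conv2d; the helper names (materialize, incremental_infer), the (1, C, H, W) tensor layout, and the region bookkeeping are illustrative assumptions, not Krypton's actual API.

    import torch
    import torch.nn.functional as F

    def materialize(x, weights):
        # Run the original image once; keep every layer output (the materialized views).
        views = []
        for w in weights:
            x = F.conv2d(x, w, padding=w.shape[-1] // 2)
            views.append(x)
        return views

    def incremental_infer(x_occluded, weights, views, region):
        # Recompute only the patch affected by the occlusion, layer by layer.
        y0, y1, x0, x1 = region                          # updated region in the current input
        inp = x_occluded                                 # shape (1, C, H, W)
        for w, view in zip(weights, views):
            p = w.shape[-1] // 2
            H, W = inp.shape[-2], inp.shape[-1]
            oy0, oy1 = max(0, y0 - p), min(H, y1 + p)    # updated output region (it grows!)
            ox0, ox1 = max(0, x0 - p), min(W, x1 + p)
            iy0, iy1 = max(0, oy0 - p), min(H, oy1 + p)  # read context around the patch
            ix0, ix1 = max(0, ox0 - p), min(W, ox1 + p)
            patch_out = F.conv2d(inp[..., iy0:iy1, ix0:ix1], w, padding=p)
            out = view.clone()                           # start from the materialized view
            out[..., oy0:oy1, ox0:ox1] = patch_out[..., oy0 - iy0:oy1 - iy0,
                                                   ox0 - ix0:ox1 - ix0]
            inp = out                                    # patched view feeds the next layer
            y0, y1, x0, x1 = oy0, oy1, ox0, ox1
        return inp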
Our Solution: Batched Incremental Inference
Share and reuse the materialized views across all occluded images, running multiple IVM queries in one go (a form of MQO). We create a custom GPU kernel for the parallel memory copies, which improves hardware utilization. Read additional context … (see the read-context backup slide).
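A minimal sketch of the batched patch-copy step: in the real system a custom CUDA kernel performs these copies in parallel on the GPU, and here plain PyTorch indexing stands in for it; the function name and tensor shapes are assumptions.

    import torch

    def make_occluded_batch(view, positions, occ_patch):
        # view: materialized layer input of shape (1, C, H, W), shared by all queries
        # positions: list of (y, x) top-left corners, one per occluded image
        # occ_patch: occluding patch of shape (C, ph, pw)
        B = len(positions)
        batch = view.expand(B, -1, -1, -1).clone()   # replicate the shared view
        ph, pw = occ_patch.shape[-2], occ_patch.shape[-1]
        for i, (y, x) in enumerate(positions):       # the custom GPU kernel performs
            batch[i, :, y:y+ph, x:x+pw] = occ_patch  # these copies in parallel
        return batch                                 # one batched IVM "query" per row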
What speedups can we expect?
(Chart: theoretical speedups for popular deep CNNs with our IVM.)
Issue: the "Avalanche Effect" causes low speedups in some CNNs.
Outline
1. Background on CNN Internals
2. Incremental CNN Inference
3. Approximate CNN Inference (inspired by AQP + Vision Science)
4. Experimental Results
Approximate CNN Inference
Basic Idea: trade off the visual quality of the heatmap to reduce runtime.
How do we quantify the quality of the new heatmap? The structural similarity index (SSIM) between the exact and approximate heatmaps, ranging from 1.0 (identical) down to -1.0; SSIM values close to 0.9 are widely used.
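A minimal sketch of the quality check, assuming heatmaps are 2-D float arrays; scikit-image's structural_similarity is used here as one common SSIM implementation (the paper's exact SSIM settings are not shown on this slide).

    import numpy as np
    from skimage.metrics import structural_similarity

    def heatmap_quality(exact, approx):
        # SSIM in [-1.0, 1.0]; 1.0 means the approximate heatmap is identical to the exact one.
        rng = float(exact.max() - exact.min())
        return structural_similarity(exact, approx, data_range=rng if rng > 0 else 1.0)

    # e.g., accept the approximation if heatmap_quality(exact_hm, approx_hm) >= 0.90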
Overview of our Approximations
a. Projective Field Thresholding: combats the Avalanche Effect by pruning computations; makes every query in OBE faster.
b. Adaptive Drill-down: issues lower-granularity queries for less sensitive regions; reduces the total number of queries in OBE (a sketch of the drill-down idea follows).
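As referenced above, a minimal sketch of the adaptive drill-down idea: a coarse first pass probes the image sparsely, and only the most sensitive coarse cells get fine-grained occlusion queries. The callback predict_occluded, the patch/stride values, and the top_frac cutoff are illustrative assumptions, not the paper's exact policy.

    import numpy as np

    def adaptive_drilldown(predict_occluded, H, W, patch=16,
                           coarse_stride=16, fine_stride=4, top_frac=0.25):
        # predict_occluded(y, x) -> P(class) with a patch occluded at (y, x);
        # a hypothetical stand-in for one incremental CNN inference.
        gh = (H - patch) // fine_stride + 1
        gw = (W - patch) // fine_stride + 1
        heat = np.zeros((gh, gw))
        # Stage 1: coarse grid; paint each coarse cell with its single probe value.
        coarse = {(y, x): predict_occluded(y, x)
                  for y in range(0, H - patch + 1, coarse_stride)
                  for x in range(0, W - patch + 1, coarse_stride)}
        for (y, x), p in coarse.items():
            heat[y // fine_stride:(y + coarse_stride) // fine_stride,
                 x // fine_stride:(x + coarse_stride) // fine_stride] = p
        # Stage 2: drill down only where occlusion hurt the prediction the most.
        cutoff = np.quantile(list(coarse.values()), top_frac)   # low prob = sensitive
        for (cy, cx), p in coarse.items():
            if p <= cutoff:
                for y in range(cy, min(cy + coarse_stride, H - patch + 1), fine_stride):
                    for x in range(cx, min(cx + coarse_stride, W - patch + 1), fine_stride):
                        heat[y // fine_stride, x // fine_stride] = predict_occluded(y, x)
        return heat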
Avalanche Effect
Example: 1-D convolution. The updated region (the projective field of the changed input) widens by the filter kernel size at every layer, so the gains from incremental inference diminish at later layers!
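A tiny sketch of why the gains shrink: with stride-1 filters of width f, an updated region of width u in a layer's input touches a region of width u + (f - 1) in its output, so the changed ("dirty") patch keeps growing with depth. The function name is illustrative.

    def updated_widths(initial_width, filter_widths):
        # Track how wide the changed region becomes after each stride-1 conv layer
        # (ignoring clipping at the image boundary).
        widths, u = [], initial_width
        for f in filter_widths:
            u = u + (f - 1)
            widths.append(u)
        return widths

    # e.g., a 16-pixel occlusion patch through ten 3-wide layers:
    # updated_widths(16, [3] * 10) == [18, 20, 22, 24, 26, 28, 30, 32, 34, 36]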
Our Solution: Projective Field Thresholding
Threshold the projective field to its central fraction τ (e.g., τ = 5/9). The number of different paths from the changed input to the outermost output positions is small (e.g., 1, 3, 6, 7, 6, 3, 1 across the projective field), so pruning the fringe loses little.
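A minimal, illustrative sketch of the thresholding idea under the same 1-D, stride-1 assumptions as the previous sketch: the changed region is allowed to grow only up to a fraction τ of the full projective field, pruning the outermost (few-path, low-impact) positions. The exact truncation rule here is an assumption, not the paper's formulation.

    def thresholded_widths(initial_width, filter_widths, tau):
        widths, u = [], initial_width
        for f in filter_widths:
            full = u + (f - 1)                      # full projective field after this layer
            u = max(u, int(round(tau * full)))      # keep only the central tau fraction
            widths.append(u)
        return widths

    # thresholded_widths(16, [3] * 10, tau=5/9) stays far smaller than the exact
    # widths from the previous sketch, which is where the extra speedup comes from.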
How do we pick τ?
Runtime and visual heatmap quality depend on image and CNN properties (shown for τ = 1.0, 0.8, 0.6, 0.4). We auto-tune τ for an SSIM target using a sample image set; this is done once upfront during system configuration.
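A minimal sketch of the one-time auto-tuning step; exact_heatmap, approx_heatmap, and the candidate τ grid are hypothetical stand-ins for full and thresholded OBE runs, and ssim is any SSIM implementation such as the one sketched earlier.

    def tune_tau(sample_images, exact_heatmap, approx_heatmap, ssim,
                 ssim_target=0.90, candidates=(0.4, 0.6, 0.8, 1.0)):
        # Pick the smallest (fastest) tau whose heatmaps still meet the SSIM target
        # on every image in the sample set; fall back to exact inference otherwise.
        for tau in sorted(candidates):
            if all(ssim(exact_heatmap(img), approx_heatmap(img, tau)) >= ssim_target
                   for img in sample_images):
                return tau
        return 1.0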
Outline
1. Background on CNN Internals
2. Incremental CNN Inference
3. Approximate CNN Inference
   a. Projective Field Thresholding
   b. Adaptive Drill-down
4. Experimental Results
Workload
Images: Chest X-ray images
Task: Predicting pneumonia
CNNs: VGG16, ResNet18, Inception3
Occluding patch color: Black
Occluding patch size: 16 x 16
Occluding patch stride: 4
SSIM target: 0.90
More datasets in the paper.
Experimental Setup
CPU: Intel i7 @ 3.4 GHz
GPU: One Nvidia Titan Xp
Memory: 32 GB
Deep learning toolkit: PyTorch version 0.4.0
Runtime Results
(Bar charts comparing Naive, Incremental (Exact), and Incremental + Approximate inference for VGG16, ResNet18, and Inception3: GPU runtimes in seconds, with annotated speedups between 0.7x and 8.6x; CPU runtimes in minutes, with annotated speedups between 1.5x and 13.8x.)
Summary
Explaining CNN predictions is important, and OBE is widely used. We presented DB-inspired incremental and approximate inference optimizations to accelerate OBE; they make OBE more amenable to interactive diagnosis of CNN predictions.
Project Web Page: https://adalabucsd.github.io/krypton.html
Video: https://tinyurl.com/y2oy9hqq
Contact: snakanda@eng.ucsd.edu
System Architecture (Flow of Data and Invocations)
Components: Krypton (Python), PyTorch (Python), Custom Kernel Interface (C), Custom Kernel Implementation (CUDA), cuDNN Library, GPU Memory.
0: Invoke incremental inference.
1: Initialize the input tensors, kernel weights, and output buffer in the GPU memory.
2: Invoke the Custom Kernel Interface (written in C) using Python's foreign function interface (FFI) support; pass memory references of the input tensors, kernel weights, and output buffer.
3: Forward the call to the Custom Kernel Implementation (written in CUDA).
4: Copy the memory regions from the input tensor to an intermediate memory buffer in parallel.
5: Invoke the CNN transformation using cuDNN.
6: cuDNN reads the input from the intermediate buffer and writes the transformed output to the output buffer.
7: Read the output to the main memory, or pass its reference as the input to the next transformation.
Read Context
(Backup figure: for a zero-padded convolution, the updated patch in the input determines an updated patch in the output; recomputing that output patch requires reading a slightly larger input patch, the read context, that also covers the filter kernel's overlap with the surrounding, unchanged values.)
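A small sketch of the read-context arithmetic the figure describes, generalized to stride s and padding p in one dimension; the function name and the clipping convention are assumptions.

    def read_context_1d(out_lo, out_hi, f, s, p, input_size):
        # Given the updated output index range [out_lo, out_hi), return the input
        # index range that must be read to recompute it (filter size f, stride s,
        # padding p), clipped to the valid input range.
        in_lo = out_lo * s - p               # left edge of the first filter window
        in_hi = (out_hi - 1) * s - p + f     # exclusive right edge of the last window
        return max(0, in_lo), min(input_size, in_hi)

    # e.g., a 3-wide filter, stride 1, padding 1: recomputing output columns [10, 20)
    # needs input columns read_context_1d(10, 20, 3, 1, 1, 224) == (9, 21)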
Incremental and Approximate Inference for Faster Occlusion-Based Deep CNN Explanations
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou, University of California, San Diego
Observation: Explainability of CNN predictions is important; occlusion-based explainability (OBE) is widely used.
Problem: OBE is highly compute intensive.
This Work: Cast OBE as an instance of the view-materialization problem; perform incremental and approximate inference; ~5x and ~35x speedups for exact and approximate heatmaps.