Introduction | Methodology | Experiments | Conclusion

On Out-of-Distribution Detection Algorithms with Deep Neural Skin Cancer Classifiers

Andre G. C. Pacheco (1), Chandramouli S. Sastry (2, 3), Thomas Trappenberg (2), Sageev Oore (2, 3), Renato A. Krohling (1)

(1) Federal University of Espirito Santo - Vitória, Brazil
(2) Dalhousie University - Halifax, Canada
(3) Vector Institute - Toronto, Canada

{agcpacheco, rkrohling}@inf.ufes.br, cssastry@dal.ca, {tt, sageev}@cs.dal.ca

INTRODUCTION
What are out-of-distribution (OOD) samples?
◮ Samples that do not contain any of the labels modeled during the training phase

INTRODUCTION
Problem:
◮ Deep neural softmax classifiers make over-confident predictions for OOD samples
◮ Detecting OOD samples is challenging
Objective:
◮ Detecting such OOD samples, in particular for skin cancer classification

INTRODUCTION
We examine the performance of OOD detection algorithms with skin cancer classifiers
◮ State-of-the-art OOD algorithms:
  ◮ ODIN (Liang et al., 2017)
  ◮ Mahalanobis (Lee et al., 2018)
  ◮ Gram-OOD (Sastry and Oore, 2019)
◮ Gram-OOD*:
  ◮ An extension of the Gram-OOD algorithm that generally performs better for this particular task

SUMMARY OF OOD ALGORITHMS
ODIN:
◮ Uses the temperature-scaled softmax output on perturbed inputs as the confidence score.
◮ Needs to fine-tune the temperature and the perturbation magnitude.
Mahalanobis:
◮ Computes layerwise Mahalanobis distances from class-conditional feature distributions.
◮ The Mahalanobis distances are used to train a logistic regression detector.
◮ Needs OOD samples to train the logistic regression detector.
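The temperature-scaled confidence that ODIN relies on can be illustrated with a minimal sketch. It covers only the softmax-with-temperature part; the input perturbation step is left out because it needs gradients from a trained model. The function names and the example logits are hypothetical and are not taken from the slides.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax probabilities of `logits` after dividing them by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits, temperature=1.0):
    """Maximum softmax probability, used here as the confidence score."""
    return max(softmax_with_temperature(logits, temperature))


# Hypothetical logits for a three-class classifier.
logits = [4.0, 1.0, 0.5]
conf_t1 = confidence(logits, temperature=1.0)
conf_t1000 = confidence(logits, temperature=1000.0)
```

A large temperature flattens the distribution, so `conf_t1000` is much closer to the uniform value 1/3 than `conf_t1`. The temperature is one of the two hyperparameters that ODIN has to tune.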

GRAM MATRIX OOD DETECTION
◮ Takes into account intermediate feature activations.
◮ Computes Gram matrices at every layer and checks for anomalously high or low values.
◮ Does not require any knowledge of OOD samples.
◮ Can work with any pre-trained model.

GRAM MATRIX
◮ Let $F_l$ refer to the activations at layer $l$, of shape $[C_l, H_l * W_l]$.
◮ The Gram matrix is computed from $F_l$ as:
  $$G_l = F_l F_l^{\top} \tag{1}$$
◮ The Gram matrix of order $p$ is computed as:
  $$G_l^p = F_l^p \, {F_l^p}^{\top} \tag{2}$$
  where $F_l^p$ denotes the element-wise $p$-th power of $F_l$.
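Equations (1) and (2) can be sketched in a few lines of plain Python, assuming the activations of one layer are given as a list of $C_l$ rows with $H_l * W_l$ values each. The function name `gram_matrix` and the toy activations are hypothetical; in practice an array library would be used instead of nested lists.

```python
def gram_matrix(features, p=1):
    """Gram matrix of order p for `features` of shape [C, H*W] (list of rows).

    Each activation is raised to the p-th power element-wise, then every
    pair of rows (feature maps) is combined with a dot product.
    """
    powered = [[value ** p for value in row] for row in features]
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in powered]
        for row_i in powered
    ]


# Toy activations: 2 channels, 2 spatial positions.
F = [[1.0, 2.0],
     [0.0, 3.0]]

G1 = gram_matrix(F, p=1)  # [[5.0, 6.0], [6.0, 9.0]]
G2 = gram_matrix(F, p=2)  # [[17.0, 36.0], [36.0, 81.0]]
```

The result is a symmetric $C_l \times C_l$ matrix, so its size depends on the number of channels and not on the spatial resolution of the layer.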

GRAM MATRIX AS PAIRWISE CORRELATIONS
◮ Pairwise correlations between feature maps are computed using $G_l^p$ of various orders.

LAYERWISE DEVIATION
◮ Layerwise deviations $\delta(D)$ are computed from the min and max of $G_l^p$ w.r.t. the class:
  $$\delta_l(\lambda_l, \Lambda_l, g_l) =
  \begin{cases}
  0 & \text{if } \lambda_l \le g_l \le \Lambda_l \\
  \dfrac{\lambda_l - g_l}{|\lambda_l|} & \text{if } g_l < \lambda_l \\
  \dfrac{g_l - \Lambda_l}{|\Lambda_l|} & \text{if } g_l > \Lambda_l
  \end{cases}$$
  where $\lambda_l = \min\left[G_l^p\right]$ and $\Lambda_l = \max\left[G_l^p\right]$.
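The piecewise deviation above can be sketched for a single Gram matrix value as follows. The helper names `bounds` and `deviation` and the sample values are hypothetical, and the sketch assumes that both bounds are non-zero so the divisions are defined.

```python
def bounds(training_values):
    """Class-conditional bounds (lambda, Lambda): min and max seen in training."""
    return min(training_values), max(training_values)


def deviation(lower, upper, value):
    """Deviation of `value` from the interval [lower, upper].

    Zero inside the interval; otherwise the distance to the violated bound,
    relative to the magnitude of that bound. Assumes non-zero bounds.
    """
    if lower <= value <= upper:
        return 0.0
    if value < lower:
        return (lower - value) / abs(lower)
    return (value - upper) / abs(upper)


# Hypothetical values of one Gram matrix entry over the training samples of a class.
lam, Lam = bounds([3.0, 2.0, 10.0, 7.0])

inside = deviation(lam, Lam, 5.0)   # within [2, 10]
below = deviation(lam, Lam, 1.0)    # (2 - 1) / |2|
above = deviation(lam, Lam, 15.0)   # (15 - 10) / |10|
```

Dividing by the magnitude of the violated bound makes the deviation relative, so entries with large typical values do not dominate only because of their scale.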

TOTAL DEVIATION
◮ The total deviation ($\Delta$) is computed by summing the deviations across all layers.
◮ Each layer's deviation is normalized by $E_{Va}[\delta_l]$.
◮ Whether a sample is OOD is determined as follows:
  $$\mathrm{isOOD}(D) =
  \begin{cases}
  \text{True} & \text{if } \Delta(D) > \tau \\
  \text{False} & \text{if } \Delta(D) \le \tau
  \end{cases}$$
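A minimal sketch of the total deviation and the thresholding rule is given below. It reads "normalized by $E_{Va}[\delta_l]$" as dividing each layer's deviation by the expected deviation of that layer on validation data before summing, which is an interpretation of the slide. The function names and all numbers are hypothetical.

```python
def total_deviation(layer_deviations, expected_deviations):
    """Sum of per-layer deviations, each divided by its expected validation value."""
    return sum(
        dev / expected
        for dev, expected in zip(layer_deviations, expected_deviations)
    )


def is_ood(layer_deviations, expected_deviations, tau):
    """Flag a sample as OOD when its total deviation exceeds the threshold tau."""
    return total_deviation(layer_deviations, expected_deviations) > tau


# Hypothetical expected deviations of two layers, estimated on validation data.
expected = [0.25, 1.0]
tau = 3.0

delta_far = total_deviation([0.5, 2.0], expected)   # 0.5/0.25 + 2.0/1.0
delta_near = total_deviation([0.0, 0.5], expected)  # 0.0/0.25 + 0.5/1.0

flag_far = is_ood([0.5, 2.0], expected, tau)
flag_near = is_ood([0.0, 0.5], expected, tau)
```

With these numbers the first sample has a total deviation of 4.0 and is flagged, while the second has 0.5 and is not.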

GRAM-OOD*
◮ Normalization of the Gram matrix values:
  $$\tilde{G}_l^p = \frac{\hat{G}_l^p - \min(\hat{G}_l^p)}{\max(\hat{G}_l^p) - \min(\hat{G}_l^p)} \tag{3}$$
◮ Ensures that the class-conditional bound values are computed from the same interval regardless of the layer.
◮ It is possible to consider only the activation layers.
◮ It does not require higher-order Gram matrices for skin cancer detection.
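The min-max normalization of Equation (3) can be sketched as below for a Gram matrix stored as a list of rows. The function name is hypothetical, and returning zeros for a constant matrix is an assumption made here only to avoid a division by zero; the slides do not say how that case is handled.

```python
def min_max_normalize(gram):
    """Rescale all entries of `gram` to the interval [0, 1]."""
    values = [value for row in gram for value in row]
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Assumption: a constant matrix is mapped to all zeros.
        return [[0.0 for _ in row] for row in gram]
    return [[(value - lowest) / span for value in row] for row in gram]


G_hat = [[5.0, 6.0],
         [6.0, 9.0]]
G_tilde = min_max_normalize(G_hat)  # [[0.0, 0.25], [0.25, 1.0]]
```

After this step the entries of every layer lie in the same interval, so the class-conditional bounds of different layers are directly comparable.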

GRAM-OOD*
Overview: [figure not recovered]

EXPERIMENTS
◮ In-distribution: ISIC 2019 dataset
◮ Out-of-distribution: a collection of different datasets
◮ Deep models: DenseNet-121, MobileNet-v2, ResNet-50, and VGGNet-16

EXPERIMENTS
ISIC × all: DenseNet-121 and MobileNet-v2
TNR @ TPR 95%

Model         OOD        Mahalanobis  OOD-Gram  OOD-Gram* (Unbiased)
DenseNet-121  Derm-Skin  45.7         78.0      76.1
DenseNet-121  Clin-Skin  68.6         82.8      83.1
DenseNet-121  ImageNet   92.0         80.7      88.4
DenseNet-121  B-box      92.0         88.0      88.1
DenseNet-121  B-box-70   100.         99.9      100.
DenseNet-121  NCT        91.6         98.9      99.9
MobileNet-v2  Derm-Skin  32.4         66.7      72.8
MobileNet-v2  Clin-Skin  79.8         77.9      83.8
MobileNet-v2  ImageNet   85.8         84.3      92.4
MobileNet-v2  B-box      88.4         86.9      98.7
MobileNet-v2  B-box-70   98.4         100.      100.
MobileNet-v2  NCT        84.7         99.3      100.

EXPERIMENTS
ISIC × all: ResNet-50 and VGGNet-16
TNR @ TPR 95%

Model      OOD        Mahalanobis  OOD-Gram  OOD-Gram* (Unbiased)
ResNet-50  Derm-Skin  36.9         74.8      73.2
ResNet-50  Clin-Skin  65.9         84.7      86.3
ResNet-50  ImageNet   95.7         86.6      85.8
ResNet-50  B-box      97.6         88.4      99.3
ResNet-50  B-box-70   100.         100.      100.
ResNet-50  NCT        96.9         99.9      100.
VGGNet-16  Derm-Skin  31.7         79.8      77.5
VGGNet-16  Clin-Skin  66.3         80.7      80.6
VGGNet-16  ImageNet   72.8         77.6      81.7
VGGNet-16  B-box      85.9         86.5      94.6
VGGNet-16  B-box-70   93.1         100.      100.
VGGNet-16  NCT        85.2         99.7      100.
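The TNR @ TPR 95% metric reported in these tables can be sketched as follows, assuming the common convention in which in-distribution samples are the positive class and a larger score (for example the total deviation) means more OOD-like. That convention, the function name, and the scores are assumptions for illustration; they are not stated on the slides.

```python
import math


def tnr_at_tpr(in_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples rejected when `tpr` of in-distribution samples are accepted.

    A sample is accepted as in-distribution when its score is <= tau, where tau
    is the smallest in-distribution score that accepts at least `tpr` of them.
    """
    ordered = sorted(in_scores)
    keep = math.ceil(tpr * len(ordered))  # in-distribution samples to accept
    tau = ordered[keep - 1]
    rejected = sum(1 for score in ood_scores if score > tau)
    return rejected / len(ood_scores)


# Hypothetical scores: 20 in-distribution samples and 4 OOD samples.
in_scores = [float(i) for i in range(1, 21)]
ood_scores = [5.0, 19.0, 25.0, 30.0]

tnr = tnr_at_tpr(in_scores, ood_scores)
```

In this toy case the threshold sits near the top of the in-distribution scores, so only the two OOD samples with scores 25.0 and 30.0 are rejected and the metric is 0.5.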

EXPERIMENTS
ISIC 2019 unknown label detection
Each cell lists Mahalanobis / Gram-OOD / Gram-OOD*.

Model         AUC                 Average Precision
DenseNet-121  52.3 / 67.3 / 69.3  20.1 / 28.9 / 31.1
MobileNet-v2  52.9 / 68.7 / 69.5  20.2 / 31.4 / 32.6
ResNet-50     56.1 / 70.4 / 70.2  21.6 / 33.2 / 33.7
VGGNet-16     54.1 / 66.9 / 69.5  20.9 / 30.2 / 32.6

CONCLUSION
◮ Gram-OOD based methods work better than Mahalanobis for the realistic experiment.
◮ Gram-OOD* performs better than the original approach for most of the OOD datasets.
◮ The normalization plays a key role in combining deviations across layers.
◮ A good normalization scheme can yield significant improvements in detection rates and should be explored.
◮ Future research: train models that can implicitly detect out-of-distribution samples by taking into account the information contained in the various orders of Gram matrices.

ACKNOWLEDGMENTS
We thank the following institutions for their financial support:
◮ Coordination for the Improvement of Higher Education Personnel (CAPES)
◮ National Council for Scientific and Technological Development (CNPq)
◮ Foundation for Supporting Research and Innovation in Espírito Santo (FAPES)
◮ Canadian Institute for Advanced Research (CIFAR)

Thank you for your time!
https://github.com/paaatcha/gram-ood
