Uncertainty Estimation Using a Single Deep Deterministic Neural Network
Paper ID: 4538
Joost van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal
Email: hi@joo.st
DUQ - 4 min Overview
Why do we want uncertainty?
Many applications need uncertainty:
• Self-driving cars
• Active Learning
• Exploration in RL
Deterministic Uncertainty Quantification (DUQ)
• A robust and powerful method to obtain uncertainty in deep learning
• Matches or outperforms the uncertainty of Deep Ensembles at the runtime cost of a single network
• Does not extrapolate arbitrarily and is able to detect out-of-distribution (OoD) data
(Figure: uncertainty on the Two Moons dataset, from certain to uncertain, for Deep Ensembles (1) and DUQ.)
(1) Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems, 2017.
The Model
• Uncertainty = distance between the feature representation and the closest centroid: K_c = exp( − (1/n) ||W_c f_θ(x) − e_c||²_2 / (2σ²) )
• Deterministic, cheap to calculate
• Old idea based on RBF networks
(Figure: f_θ maps the input to feature space; one centroid per class, e.g. e_1 Cat (prediction), e_2 Dog, e_3 Bird.)
Overview
• Use a "One vs Rest" loss function to update the model f_θ(x)
• Update centroids with an exponential moving average
• Regularise centroids to stay close to the origin
• f_θ needs to be well behaved → penalty on the Jacobian
(Figure: standard RBF network vs DUQ.)
Results
• Training is easy and stable
• Accuracy on par with standard softmax networks
• Matches or outperforms the uncertainty of Deep Ensembles at the runtime cost of a single network
(Figure: train on FashionMNIST, evaluate on FashionMNIST + MNIST.)
DUQ - Deep(er) Dive
Uncertainty Estimation
• Uncertainty estimation for classification
• Use a deep neural network f_θ for feature extraction
• Single centroid e_c per class (e.g. e_1 Cat, e_2 Dog, e_3 Bird)
• Define uncertainty as the distance to the closest centroid in feature space: K_c = exp( − (1/n) ||W_c f_θ(x) − e_c||²_2 / (2σ²) )
• Deterministic and single forward pass!
(Figure: the prediction is the class of the closest centroid; a sketch of the kernel computation follows below.)
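Below is a minimal sketch of this kernel computation, assuming a PyTorch implementation; the function name duq_kernel, the tensor shapes, and the value of sigma are illustrative choices, not the authors' code.

```python
# A sketch of the per-class kernel computation, assuming PyTorch.
import torch

def duq_kernel(features, W, centroids, sigma=0.1):
    """Per-class kernel values K_c for a batch of features.

    features : (batch, d)        f_theta(x), the feature extractor output
    W        : (classes, n, d)   per-class weight matrices W_c
    centroids: (classes, n)      class centroids e_c
    sigma    : kernel length scale (hyperparameter)
    """
    # Per-class embedding W_c f_theta(x): shape (batch, classes, n)
    embedded = torch.einsum("cnd,bd->bcn", W, features)
    # (1/n) ||W_c f_theta(x) - e_c||^2: squared distance averaged over the n dims
    sq_dist = ((embedded - centroids.unsqueeze(0)) ** 2).mean(dim=-1)
    # K_c = exp(-(1/n)||W_c f_theta(x) - e_c||^2 / (2 sigma^2))
    return torch.exp(-sq_dist / (2 * sigma ** 2))

# Prediction: argmax_c K_c; uncertainty: distance to the closest centroid,
# e.g. read off as 1 - max_c K_c.
```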
DUQ - Overview
• Use a "One vs Rest" loss function to update the model f_θ(x)
• Update centroids with an exponential moving average
• Regularise centroids to stay close to the origin
• f_θ needs to be well behaved → penalty on the Jacobian
(Figure: standard RBF network vs DUQ.)
Learning the Model
• "One vs Rest" loss function (a sketch follows below)
• Decrease the distance to the correct centroid, while increasing it relative to all others
• Avoids centroids collapsing on top of each other
• Regularisation avoids centroids exploding
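A minimal sketch of this "One vs Rest" objective, assuming the kernel values K_c from the sketch above and treating the loss as a per-class binary cross-entropy against the one-hot label; the batch reduction and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def duq_loss(kernels, labels, num_classes):
    """'One vs Rest' loss: per-class binary cross-entropy between K_c and the
    one-hot label, summed over classes and averaged over the batch.

    kernels: (batch, classes) kernel values K_c, each in (0, 1]
    labels : (batch,)         integer class labels
    """
    targets = F.one_hot(labels, num_classes).float()
    # Pulls K_y towards 1 (decrease distance to the correct centroid) and all
    # other K_c towards 0 (increase distance relative to the rest).
    return F.binary_cross_entropy(kernels, targets, reduction="sum") / kernels.shape[0]
```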
Learning the Centroids
• The exponentiated distance to the centroid is bad for gradient-based learning
• When far away from the correct centroid, the gradient goes to zero
• No learning signal for the model
(Figure: far from the centroid, the gradient with respect to the data vanishes.)
Learning the Centroids
• Move each centroid to the mean of the feature vectors of that class
• Use an exponential moving average with heavy momentum, set to 0.99(9), to make this work with mini-batches, as sketched below
(Figure: the centroid moves towards the data.)
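A minimal sketch of this centroid update, assuming PyTorch; the buffer names N and m, the einsum layout, and the update-then-divide bookkeeping are illustrative. The momentum of 0.999 corresponds to the 0.99(9) mentioned above.

```python
import torch

@torch.no_grad()
def update_centroids(embedded, targets, N, m, momentum=0.999):
    """EMA update of the class centroids.

    embedded: (batch, classes, n)  W_c f_theta(x), as in the kernel sketch
    targets : (batch, classes)     one-hot labels (float)
    N       : (classes,)           running (EMA) per-class counts
    m       : (classes, n)         running (EMA) per-class embedding sums
    """
    # EMA of how many examples of each class appear per mini-batch
    N.mul_(momentum).add_(targets.sum(dim=0), alpha=1 - momentum)
    # EMA of the summed embeddings per class
    class_sum = torch.einsum("bcn,bc->cn", embedded, targets)
    m.mul_(momentum).add_(class_sum, alpha=1 - momentum)
    # Centroid e_c = (EMA of sum) / (EMA of count): moves towards the class mean
    return m / N.unsqueeze(-1)
```

Note that when a class is absent from a mini-batch, its running count and its running sum decay by the same factor, so the ratio, and hence the centroid, stays put.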
DUQ - Overview
• Use a "One vs Rest" loss function to update the model f_θ(x)
• Update centroids with an exponential moving average
• Regularise centroids to stay close to the origin
• f_θ needs to be well behaved → penalty on the Jacobian
(Figure: standard RBF network vs DUQ.)
Why do we need to regularise ƒ?
• Classification is at odds with being able to detect OoD input
• Classification means we ignore features that don't affect the class
• (Figure: is the black star OoD?)
Stability & Sensitivity
• Two-sided gradient penalty: λ · [ ||∇_x Σ_c K_c||²_2 − L ]² (a sketch follows below)
• From above: enforces a low Lipschitz constant, commonly used
• From below: keeps ƒ sensitive to changes in the input, ||ƒ(x) − ƒ(x + δ)|| > L
(Figure: DUQ trained with only the penalty from above vs DUQ with the two-sided penalty.)
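A minimal sketch of this two-sided penalty, assuming PyTorch autograd; the default values of L and lam and the per-example squared-norm reduction are illustrative assumptions. The input batch x must have requires_grad enabled before the kernels are computed.

```python
import torch

def gradient_penalty(x, kernels, L=1.0, lam=0.5):
    """Two-sided penalty: lam * (||grad_x sum_c K_c||_2^2 - L)^2 per example.

    x       : input batch with requires_grad=True (before computing kernels)
    kernels : (batch, classes) kernel values K_c computed from x
    """
    summed = kernels.sum(dim=1)  # sum_c K_c for each example
    grads = torch.autograd.grad(
        outputs=summed,
        inputs=x,
        grad_outputs=torch.ones_like(summed),
        create_graph=True,  # keep the graph so the penalty itself can be trained on
    )[0]
    grad_norm_sq = grads.flatten(start_dim=1).pow(2).sum(dim=1)
    # Penalised from above *and* below: the squared gradient norm is pushed towards L
    return lam * ((grad_norm_sq - L) ** 2).mean()
```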
Results
• Out-of-distribution detection: FashionMNIST vs MNIST
• Rejection classification: CIFAR-10 (training set) and SVHN (out-of-distribution set)
Summary
• A robust and powerful method to obtain uncertainty in deep learning
• Matches or outperforms the uncertainty of Deep Ensembles (1) at the runtime cost of a single network
• No arbitrary extrapolation, and able to detect OoD data
Limitations and Future Work
• Aleatoric uncertainty. DUQ is not able to estimate this: the one-centroid-per-class setup makes training stable, but does not allow assigning a data point to multiple classes.
• Probabilistic framework. DUQ is not placed in a probabilistic framework; however, there are interesting similarities to inducing point GPs with parametrised ("deep") kernels.
Come chat with us
Time slots available on the ICML website. Missed us? Email us at hi@joo.st
Joost van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal