Uncertainty in Bayesian Neural Nets (August 4, 2017)
Overview
• BNN review
• Visualization experiments
• BNN results
BNN
• Prior: p(W)
• Likelihood: p(Y|X,W)
• Approximate Posterior: q(W)
• Posterior Predictive: E_{q(W)}[p(y|x,W)]
BNN: Variational Inference
• Graphical model: X → Y ← W
• Maximize a lower bound on the marginal log-likelihood:
  log p(Y|X) ≥ E_{q(W)}[ log p(Y|X,W) + log p(W) − log q(W) ]
  (likelihood, prior, posterior approximation)
• Only the likelihood term depends on the number of data points:
  E_{q(W)}[ Σ_{i=1}^N log p(y_i|x_i, W) + log p(W) − log q(W) ]
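The bound above can be estimated by Monte Carlo: draw reparameterized samples from q(W) and average the bracketed term. A minimal sketch for a one-weight linear model, assuming a fully factorized Gaussian q(W), an N(0,1) prior, and toy data invented here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression data (hypothetical, only to make the sketch runnable)
X = rng.normal(size=(20, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(20, 1))

# Fully factorized Gaussian posterior q(W) over [weight, bias];
# prior p(W) = N(0, 1)
mu = np.zeros(2)
log_sigma = np.full(2, -1.0)

def log_gauss(x, m, s):
    """Elementwise log N(x | m, s^2)."""
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)

def elbo_estimate(mu, log_sigma, n_samples=100, noise=0.1):
    """Monte Carlo estimate of E_q[log p(Y|X,W) + log p(W) - log q(W)]."""
    sigma = np.exp(log_sigma)
    total = 0.0
    for _ in range(n_samples):
        W = mu + sigma * rng.normal(size=mu.shape)   # reparameterized sample
        pred = X[:, 0] * W[0] + W[1]
        log_lik = log_gauss(Y[:, 0], pred, noise).sum()  # sum over N points
        log_prior = log_gauss(W, 0.0, 1.0).sum()
        log_q = log_gauss(W, mu, sigma).sum()
        total += log_lik + log_prior - log_q
    return total / n_samples

print(elbo_estimate(mu, log_sigma))
```

Training would ascend this estimate in (mu, log_sigma); the reparameterization W = mu + sigma·ε is what makes the gradient flow through the sample.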
Different priors and posterior approximations
• Priors p(W):
  • N(0, σ²)
  • Scale-mixtures of Normals
  • Sparsity inducing
• Posterior approximations q(W):
  • Delta peak: q(W) = δ(W − W̄)
  • Fully factorized Gaussians: q(W) = ∏_i N(w_i | μ_i, σ_i²)
  • Bernoulli Dropout
  • Gaussian Dropout
  • MNF
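Several of these approximations differ only in how a weight sample W ~ q(W) is drawn around the variational mean. A sketch of three of them; the shapes, dropout rate, and noise scales here are illustrative, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

M = rng.normal(size=(100, 10))        # variational mean weights (illustrative)

# Fully factorized Gaussian: w_ij ~ N(mu_ij, sigma_ij^2)
sigma = 0.1 * np.ones_like(M)
W_ffg = M + sigma * rng.normal(size=M.shape)

# Bernoulli dropout: each input unit's row is kept with probability 1 - p
p_drop = 0.5
mask = (rng.random(M.shape[0]) > p_drop).astype(float)
W_drop = M * mask[:, None]

# Gaussian dropout: multiplicative N(1, alpha) noise on each weight
alpha = 0.25
W_gauss = M * (1.0 + np.sqrt(alpha) * rng.normal(size=M.shape))
```

The delta-peak posterior is the degenerate case sigma → 0 (a plain deterministic network); MNF replaces the simple noise with a learned normalizing flow.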
Multiplicative Normalizing Flows (MNF), Christos Louizos and Max Welling, ICML 2017
• Augment the model with an auxiliary variable z
• Generative model: z ~ q(z), W ~ q(W|z), so q(W) = ∫ q(W|z) q(z) dz
• Inference model: q(W|z) = ∏_i ∏_j N(w_ij | z_i μ_ij, σ_ij²), with q(z) built from normalizing flows
• New lower bound:
  log p(Y|X) ≥ E_{q(W,z)}[ log p(Y|X,W) + log p(W) − log q(W|z) + log r(z|W) − log q(z) ]
Predictive Distributions
Uncertainties
• Model uncertainty (epistemic uncertainty)
  • Captures ignorance about which model best explains the data
  • Decreases as the amount of observed data increases
  • Summarized by drawing function realizations from our distribution
• Measurement noise (aleatoric uncertainty)
  • Noise inherent in the environment, captured in the likelihood function
• Predictive uncertainty
  • Entropy of the prediction: H[p(y|x)]
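These quantities can be computed from posterior samples. A common way to separate them (one convention; the slides use a slightly different split later) is: total predictive entropy H[E_q[p]] minus the expected per-sample entropy E_q[H[p]] leaves the epistemic part. A sketch with hypothetical per-sample class probabilities:

```python
import numpy as np

def entropy(p, axis=-1):
    """Entropy H[p] of a categorical distribution, in nats."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

# Hypothetical predictions from S = 3 posterior samples:
# probs[s, c] = p(y = c | x, W_s)
probs = np.array([[0.9, 0.1],
                  [0.1, 0.9],
                  [0.5, 0.5]])

predictive = probs.mean(axis=0)              # E_{q(W)}[p(y|x,W)]
predictive_H = entropy(predictive)           # total predictive uncertainty
expected_H = entropy(probs, axis=-1).mean()  # aleatoric part
mutual_info = predictive_H - expected_H      # epistemic part
```

Here the samples disagree strongly, so the epistemic term is large even though each individual sample is fairly confident.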
Visualization Experiments
• 1D regression
• Classification of MNIST (visualize in 2D)
• Questions:
  • Activations
  • Number of samples
  • Held out classes
  • Type of uncertainties
BNNs with Different Activation Functions
• Sigmoid: (1 + e^{−x})^{−1}
• Tanh
• ReLU: max(0, x)
• Softplus: ln(1 + e^x)
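For reference, the four activations written out in code (softplus uses log1p for numerical stability; tanh is just np.tanh):

```python
import numpy as np

sigmoid  = lambda x: 1.0 / (1.0 + np.exp(-x))
relu     = lambda x: np.maximum(0.0, x)
softplus = lambda x: np.log1p(np.exp(x))
# tanh: np.tanh

# Softplus is a smooth version of ReLU; for large x the two coincide
x = 10.0
print(relu(x), softplus(x))   # both close to 10
```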
Uncertainty of Decision Boundaries
• Setup:
  • Classification of MNIST; Train: 50,000, Test: 10,000
  • Architecture: 784-100-2-100-10 (NN and BNN)
  • BNN: fully factorized Gaussian posterior, N(0,1) prior
  • Activations: Softplus
Decision Boundaries (3 samples): plot of argmax_y p(y|x) at each point
Uncertainty of Decision Boundaries: Held Out Classes
• Setup:
  • Classification of digits 0 to 4 (5 to 9 held out)
  • Architecture: 784-100-100-2-100-100-10 (NN and BNN)
  • BNN: fully factorized Gaussian posterior, N(0,1) prior
  • Activations: Softplus
Where do you think the held out classes will go? Inside or Outside the Circle?
Where do you think the held out classes will go?
Held Out Classes: unseen classes are not encoded as something far away; instead they are encoded near the mean
Confidence of Predictions? Perhaps the large regions have high entropy. Argmax vs Max
Class Boundaries: Confidences. Sharp transitions; there isn't much uncertain space: mostly uniform, high confidence
Entropy [figure panels: Argmax, Max, Entropy]
Effect of Choice of Activation Function
• Softplus
• ReLU
• Tanh
Softplus: E_{q(W)}[p(y|x,W)] [panels: Mean of q(W), Sample 1, Sample 2, Sample 3]
ReLU: E_{q(W)}[p(y|x,W)] [panels: Mean of q(W), Sample 1, Sample 2, Sample 3]
Tanh: E_{q(W)}[p(y|x,W)] [panels: Mean of q(W), Sample 1, Sample 2, Sample 3]
Mix (Softplus, ReLU, Tanh): E_{q(W)}[p(y|x,W)] [panels: Mean of q(W), Sample 1, Sample 2, Sample 3]
Number of Datapoints: E_{q(W)}[p(y|x,W)] [panels: 25000, 10000, 1000, 100 datapoints; rows: Argmax, Max, Entropy]
Model vs Output Uncertainty
• Predictive uncertainty = H[p(y|x)]
• Output uncertainty: H[p(y|x, W̄)], where W̄ = mean of q(W); high output entropy (on the decision boundary)
• Model uncertainty: H[E_{q(W)}[p(y|x,W)]]; high-variance predictions
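The two entropies above differ only in where the averaging happens: at the posterior mean weights versus over the averaged predictions. A sketch for a hypothetical linear classifier with a factorized Gaussian posterior (all sizes and scales invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical linear "network": 2 inputs, 10 classes
mu = rng.normal(size=(2, 10))          # mean of q(W)
sigma = 0.5 * np.ones_like(mu)
x = rng.normal(size=2)

# Output uncertainty: entropy at the posterior mean, H[p(y|x, W_bar)]
output_H = entropy(softmax(x @ mu))

# Model uncertainty: entropy of the averaged prediction,
# H[E_{q(W)}[p(y|x,W)]], estimated with S posterior samples
S = 200
Ws = mu + sigma * rng.normal(size=(S, *mu.shape))
avg_pred = softmax(np.einsum('d,sdc->sc', x, Ws)).mean(axis=0)
model_H = entropy(avg_pred)
```

Both values lie in [0, log 10] for a 10-class problem; the table on the next slide reports exactly this pair of numbers averaged over train, test, and held-out inputs.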
Model vs Output Uncertainty
25,000 training datapoints:
                     Train   Test   Held Out
Model Uncertainty     .06     .06     .43
Output Uncertainty    .05     .05     .36
Large data: output uncertainty dominates

100 training datapoints:
                     Train   Test   Held Out
Model Uncertainty     .07     .26     .43
Output Uncertainty    .03     .15     .25
Small data: model uncertainty dominates
Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks (July 2017) [panels: BNN, GP+NN, NN]
Visualize the landscape of the likelihood p(y_train|x_train, W) over (w_1, w_2). The dimension of W is large, so use a 2D auxiliary variable.
Visualize landscape of likelihood: Auxiliary Variable Model
• Generative model: z ~ q(z) (2D), W ~ q(W|z)
• Inference model: q(W|z) = δ(W|z) via a hyper-network, so q(W) = ∫ δ(W|z) q(z) dz
• Architecture: 784-100-100-2-10-10-10 (NN and BNN)
• log p(Y|X) ≥ E_{q(W,z)}[ log p(Y|X,W) + log p(W) − log q(W|z) + log r(z|W) − log q(z) ]
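The point of the 2D auxiliary variable is that the high-dimensional weight vector becomes a deterministic function W = h(z) of a 2D latent, so any scalar of W (such as the log-likelihood) can be plotted on a grid over the z-plane. A sketch with a one-layer hyper-network and a placeholder objective; every size and function here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

D_w = 50                               # dimension of W (tiny here)
A = rng.normal(size=(D_w, 2))          # hyper-network: one linear layer
h = lambda z: np.tanh(A @ z)           # deterministic map W = h(z)

def log_lik(W):
    """Placeholder scalar objective standing in for log p(y|x, W)."""
    return -np.sum(W**2)

# Evaluate the objective on a grid over (z1, z2) to get a 2D landscape
zs = np.linspace(-2, 2, 5)
landscape = np.array([[log_lik(h(np.array([z1, z2]))) for z1 in zs]
                      for z2 in zs])
```

Plotting `landscape` as a heatmap over the z-grid gives the kind of figure shown on the following "Likelihood Landscape" slides.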
Decision Boundaries: E_{q(z)}[p(y|x,z)] [panels: z_1, z_2, z_3]
Likelihood Landscape [panels over (z_1, z_2): log p(y_train|x_train, W, z), log p(y_test|x_test, W, z)]
Likelihood Landscape [panels over (z_1, z_2): log p(y_train|x_train, W, z) + log r(z|W), log p(y_train|x_train, W, z), log p(y_test|x_test, W, z), − log q(z)]
Likelihood Landscape [panels over (z_1, z_2): log p(y_train|x_train, W, z) + log r(z|W) − log q(z), log p(y_train|x_train, W, z), log p(y_test|x_test, W, z)]
Recent BNN Papers
• Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
• Variational Dropout Sparsifies Deep Neural Networks (2017)
• Bayesian Compression for Deep Learning (2017)
• Adversarial perturbations
• Compression
Adversarial perturbations [figures: MNIST, CIFAR-10]
Compression vs Uncertainty [axis: H[p]]
Conclusion
• Used visualizations to help understand uncertainty in BNNs
• Goal: improve uncertainty estimates and generalization
Applications
• Active learning
• Bayesian optimization
• RL
• Safety
• Efficiency
References
• Weight Uncertainty in Neural Networks (2015)
• Variational Dropout and the Local Reparameterization Trick (2015)
• Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016)
• Variational Dropout Sparsifies Deep Neural Networks (2017)
• On Calibration of Modern Neural Networks (2017)
• Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)
Thank You