Uncertainty in Bayesian Neural Nets
  1. Uncertainty in Bayesian Neural Nets (August 4, 2017)

  2. Overview • BNN review • Visualization experiments • BNN results

  3. BNN
  • Prior: p(W)
  • Likelihood: p(Y|X,W)
  • Approximate posterior: q(W)
  • Posterior predictive: p(y|x) ≈ E_{q(W)}[p(y|x,W)]
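The posterior predictive above is typically approximated by Monte Carlo over weight samples. A minimal NumPy sketch, assuming a fully factorized Gaussian q(W) and a small two-layer classifier (the architecture and parameter names here are hypothetical, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(a):
    """Softplus activation ln(1 + e^a), computed stably."""
    return np.logaddexp(0.0, a)

def softmax(a):
    """Row-wise softmax."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def sample_weights(mu, sigma):
    """Draw one weight sample from a fully factorized Gaussian q(W)."""
    return {k: mu[k] + sigma[k] * rng.standard_normal(mu[k].shape) for k in mu}

def forward(x, W):
    """Hypothetical two-layer classifier p(y|x, W)."""
    h = softplus(x @ W["W1"] + W["b1"])
    return softmax(h @ W["W2"] + W["b2"])

def posterior_predictive(x, mu, sigma, n_samples=100):
    """p(y|x) ≈ (1/S) Σ_s p(y|x, W_s),  W_s ~ q(W)."""
    return np.mean([forward(x, sample_weights(mu, sigma)) for _ in range(n_samples)], axis=0)
```

Averaging the sampled class probabilities (rather than the logits) is what yields the soft, spread-out decision boundaries shown later in the talk.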

  4. BNN: Variational Inference (graphical model: X → Y ← W)
  • Maximize a lower bound on the marginal log-likelihood:
    log p(Y|X) ≥ E_{q(W)}[log p(Y|X,W) + log p(W) − log q(W)]
    (likelihood, prior, posterior approximation)
  • Only the likelihood term depends on the number of data points N:
    E_{q(W)}[log p(Y|X,W)] = Σ_{i=1}^{N} E_{q(W)}[log p(y_i|x_i,W)]
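The bound can be estimated by sampling W from q via the reparameterization trick. A generic sketch, assuming a fully factorized Gaussian q(W) over a flat weight vector, a standard Normal prior, and a hypothetical `log_lik` callable that evaluates log p(Y|X,W):

```python
import numpy as np

def log_normal(w, mu, sigma):
    """Sum of elementwise log N(w_i | mu_i, sigma_i^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (w - mu) ** 2 / (2 * sigma**2))

def elbo_estimate(log_lik, mu, sigma, n_samples=32, rng=None):
    """Monte Carlo estimate of
        log p(Y|X) >= E_q[ log p(Y|X,W) + log p(W) - log q(W) ]
    with q(W) = N(mu, diag(sigma^2)) and prior p(W) = N(0, I).
    `log_lik(w)` is assumed to return log p(Y|X, w)."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for _ in range(n_samples):
        w = mu + sigma * rng.standard_normal(mu.shape)  # reparameterization trick
        total += (log_lik(w)
                  + log_normal(w, np.zeros_like(w), np.ones_like(w))   # log p(W)
                  - log_normal(w, mu, sigma))                          # log q(W)
    return total / n_samples
```

In practice the likelihood term is evaluated on minibatches and rescaled by N/|batch|, which is why the data-dependence of the bound matters.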

  5. Different priors and posterior approximations
  • Priors p(W):
    • N(0, τ²)
    • Scale-mixtures of Normals
    • Sparsity-inducing
  • Posterior approximations q(W):
    • Delta peak: q(W) = δ(W − θ)
    • Fully factorized Gaussians: q(W) = ∏_i N(w_i | μ_i, σ_i²)
    • Bernoulli Dropout
    • Gaussian Dropout
    • Multiplicative Normalizing Flows (MNF)
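For the fully factorized Gaussian posterior with a N(0, τ²) prior, the KL term of the bound is available in closed form, so only the likelihood term needs sampling. A small sketch of this standard result (not specific to the talk):

```python
import numpy as np

def kl_ffg_to_prior(mu, sigma, tau=1.0):
    """KL( N(mu, diag(sigma^2)) || N(0, tau^2 I) ), summed over weights:
       Σ_i [ log(tau/σ_i) + (σ_i² + μ_i²)/(2 τ²) − 1/2 ]."""
    return np.sum(np.log(tau / sigma) + (sigma**2 + mu**2) / (2 * tau**2) - 0.5)
```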

  6. Multiplicative Normalizing Flows (MNF), Christos Louizos & Max Welling, ICML 2017
  • Augment the model with an auxiliary variable z
  • Generative model: z ~ q(z), W ~ q(W|z), so q(W) = ∫ q(W|z) q(z) dz
    q(W|z) = ∏_{i=1}^{D_in} ∏_{j=1}^{D_out} N(w_ij | z_i μ_ij, σ²_ij)
  • Inference model: r(z|W), with q(z) built from normalizing flows
  • New lower bound:
    log p(Y|X) ≥ E_{q(W,z)}[log p(Y|X,W) + log p(W) − log q(W|z) + log r(z|W) − log q(z)]
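MNF makes q(z) flexible by pushing a simple base sample through a normalizing flow while tracking the log-determinant of the Jacobian. A minimal sketch of one planar-flow step (the generic flow construction, not the exact flow used in the paper; parameters are illustrative and untrained):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar-flow step f(z) = z + u * tanh(w·z + b).

    Returns (f(z), log|det ∂f/∂z|) for a batch of z's. For true
    invertibility one needs w·u >= -1; the parameters here are
    assumed to satisfy that."""
    a = np.tanh(z @ w + b)                   # scalar per sample, shape (N,)
    f = z + np.outer(a, u)                   # transformed sample, shape (N, D)
    psi = (1.0 - a**2)[:, None] * w          # derivative of tanh term, (N, D)
    logdet = np.log(np.abs(1.0 + psi @ u))   # log |det Jacobian|, (N,)
    return f, logdet
```

Stacking several such steps and subtracting the accumulated log-determinants from the base log-density gives log q(z) for the bound above.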

  7. Predictive Distributions

  8. Uncertainties • Model uncertainty (Epistemic uncertainty) • Captures ignorance about the model that is most suitable to explain the data • Reduces as the amount of observed data increases • Summarized by generating function realizations from our distribution • Measurement Noise (Aleatoric uncertainty) • Noise inherent in the environment, captured in likelihood function • Predictive uncertainty • Entropy of prediction = H[p(y|x)]
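Predictive uncertainty as defined above is the entropy of the posterior predictive, which is easy to compute from Monte Carlo probability samples. A small sketch (the array shapes are assumptions for illustration):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H[p] in nats along the last axis."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def predictive_entropy(prob_samples):
    """prob_samples: (S, N, C) array of p(y|x_n, W_s) for S posterior
    samples. Returns H[E_s p(y|x, W_s)], the entropy of the
    posterior predictive (predictive uncertainty)."""
    return entropy(prob_samples.mean(axis=0))
```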

  9. Visualization Experiments • 1D regression • Classification of MNIST (visualize in 2D) • Questions: • Activations • Number of samples • Held out classes • Type of uncertainties

  10. BNNs with Different Activation Functions
  • Sigmoid: (1 + e^(−x))^(−1)
  • Tanh
  • ReLU: max(0, x)
  • Softplus: ln(1 + e^x)
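For reference, the four activations can be written in a few lines of NumPy (softplus uses `logaddexp` for numerical stability at large |x|):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # (1 + e^-x)^-1

def relu(x):
    return np.maximum(0.0, x)         # max(0, x)

def softplus(x):
    return np.logaddexp(0.0, x)       # ln(1 + e^x), stable form

# tanh is available directly as np.tanh
```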

  11. Uncertainty of Decision Boundaries
  • Setup:
    • Classification of MNIST (Train: 50000, Test: 10000)
    • Architecture: 784-100-2-100-10
    • BNN: fully factorized Gaussian posterior, N(0,1) prior
    • Activations: Softplus
  (Plots: NN vs BNN)

  12. Decision Boundaries - 3 Samples (plot of argmax p(y|x) at each point)

  13. Uncertainty of Decision Boundaries: Held Out Classes
  • Setup:
    • Classification of digits 0 to 4 (5 to 9 held out)
    • Architecture: 784-100-100-2-100-100-10
    • BNN: fully factorized Gaussian posterior, N(0,1) prior
    • Activations: Softplus
  (Plots: NN vs BNN)

  14. Where do you think the held out classes will go? Inside or Outside the Circle?

  15. Where do you think the held out classes will go?

  16. Held Out Classes: unseen classes are not encoded as something far away; instead they are encoded near the mean.

  17. Confidence of Predictions? Perhaps large areas have high entropy. (Plots: argmax vs max of p(y|x))

  18. Class Boundaries - Confidences: sharp transitions. There isn't much uncertain space: mostly uniform, high-confidence predictions.

  19. Entropy (Plots: argmax, max, entropy)

  20. Effect of the Choice of Activation Function • Softplus • ReLU • Tanh

  21. Softplus: E_{q(W)}[p(y|x,W)] (Plots: mean of q(W), Sample 1, Sample 2, Sample 3)

  22. ReLU: E_{q(W)}[p(y|x,W)] (Plots: mean of q(W), Sample 1, Sample 2, Sample 3)

  23. Tanh: E_{q(W)}[p(y|x,W)] (Plots: mean of q(W), Sample 1, Sample 2, Sample 3)

  24. Mix (Softplus, ReLU, Tanh): E_{q(W)}[p(y|x,W)] (Plots: mean of q(W), Sample 1, Sample 2, Sample 3)

  25. Number of Datapoints: E_{q(W)}[p(y|x,W)] for N = 25000, 10000, 1000, 100 (Plots: argmax, max, entropy)

  26. Model vs Output Uncertainty
  • Predictive uncertainty = H[p(y|x)]
  • Output uncertainty: H[p(y|x, W̄)], where W̄ = mean of q(W); high when the output itself has high entropy (on a decision boundary)
  • Model uncertainty: H[E_{q(W)}[p(y|x,W)]]; high for high-variance predictions
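The two quantities on slide 26 can be computed from one set of posterior samples. A sketch, assuming a hypothetical `forward(x, W)` that returns class probabilities and a fully factorized Gaussian q(W) parameterized by dicts `mu`, `sigma`:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy in nats along the last axis."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def model_and_output_uncertainty(x, mu, sigma, forward, n_samples=100, rng=None):
    """Slide 26's two quantities:
      output uncertainty = H[p(y|x, W̄)],  W̄ = mean of q(W)
      model  uncertainty = H[E_{q(W)} p(y|x, W)]"""
    rng = rng or np.random.default_rng(0)
    probs = np.stack([
        forward(x, {k: mu[k] + sigma[k] * rng.standard_normal(mu[k].shape) for k in mu})
        for _ in range(n_samples)])
    output_u = entropy(forward(x, mu))     # predict at the mean weights
    model_u = entropy(probs.mean(axis=0))  # entropy of the averaged prediction
    return model_u, output_u
```

Computed this way, the table on the next slide compares the averages of these two entropies on train, test, and held-out inputs.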

  27. Model vs Output Uncertainty

  25000 training datapoints:
                       Train  Test  Held Out
  Model Uncertainty    .06    .06   .43
  Output Uncertainty   .05    .05   .36
  Large data: output uncertainty.

  100 training datapoints:
                       Train  Test  Held Out
  Model Uncertainty    .07    .26   .43
  Output Uncertainty   .03    .15   .25
  Small data: model uncertainty.

  28. Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks (July 2017). (Plots: BNN, GP+NN, NN)

  29. Visualize the landscape of the likelihood p(y_train|x_train, W) over (w1, w2). The dimension of W is large, so use a 2D auxiliary variable.

  30. Visualize the landscape of the likelihood: auxiliary-variable model
  • Generative model: z ~ q(z) (2D), W ~ q(W|z), with q(W|z) = δ(W − h(z)) for a hyper-network h, so q(W) = ∫ δ(W − h(z)) q(z) dz
  • Inference model: r(z|W)
  • Hypo-network / hyper-network architecture: 784-100-100-2-10-10-10 (NN vs BNN)
  • Lower bound:
    log p(Y|X) ≥ E_{q(W,z)}[log p(Y|X,W) + log p(W) − log q(W|z) + log r(z|W) − log q(z)]
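The hyper-network idea above can be sketched as a tiny map from the 2D auxiliary z to a flat weight vector; evaluating log p(y|x, h(z)) on a grid of z values then yields the likelihood-landscape plots on the following slides. Sizes and parameters below are hypothetical and untrained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the hyper-network h maps the 2D auxiliary z to a
# flat weight vector for the hypo-network, giving the delta posterior
# q(W|z) = δ(W − h(z)). Parameters are illustrative and untrained.
N_WEIGHTS = 50
H1 = 0.1 * rng.standard_normal((2, 32))
H2 = 0.1 * rng.standard_normal((32, N_WEIGHTS))

def hyper_network(z):
    """W = h(z): deterministic map from z ∈ R^2 to a weight vector."""
    return np.tanh(z @ H1) @ H2

# One weight vector per point on a 2D grid of z values; plugging each
# into the likelihood gives a plottable 2D landscape.
grid = np.stack(np.meshgrid(np.linspace(-3, 3, 5),
                            np.linspace(-3, 3, 5)), axis=-1).reshape(-1, 2)
W_grid = hyper_network(grid)
```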

  31. Decision Boundaries (Plots: z1, z2, z3; E_{q(z)}[p(y|x, z)])

  32. Likelihood Landscape (Plots over z1, z2: log p(y_train|x_train, W, z) and log p(y_test|x_test, W, z))

  33. Likelihood Landscape (Panels over z1, z2: log p(y_train|x_train, W, z); log p(y_train|x_train, W, z) + log r(z|W); log p(y_test|x_test, W, z); − log q(z))

  34. Likelihood Landscape (Panels over z1, z2: log p(y_train|x_train, W, z); log p(y_train|x_train, W, z) + log r(z|W); log p(y_test|x_test, W, z); − log q(z))

  35. Likelihood Landscape (Panels over z1, z2: log p(y_train|x_train, W, z); log p(y_train|x_train, W, z) + log r(z|W) − log q(z); log p(y_test|x_test, W, z))

  36. Recent BNN Papers • Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017) • Variational Dropout Sparsifies Deep Neural Networks (2017) • Bayesian Compression for Deep Learning (2017) • Adversarial Perturbations • Compression

  37. Adversarial perturbations MNIST CIFAR 10

  38. Compression vs Uncertainty (Plot: entropy H[p])

  39. Conclusion • Used visualizations to help understand uncertainty in BNNs • Goal: improve uncertainty estimates and generalization Applications • Active learning • Bayes Opt • RL • Safety • Efficiency

  40. References • Weight Uncertainty in Neural Networks (2015) • Variational Dropout and the Local Reparameterization Trick (2015) • Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016) • Variational Dropout Sparsifies Deep Neural Networks (2017) • On Calibration of Modern Neural Networks (2017) • Multiplicative Normalizing Flows for Variational Bayesian Neural Networks (2017)

  41. Thank You
