  1. Bayesian Deep Learning - Prof. Leal-Taixé and Prof. Niessner

  2. Going full Bayesian • Bayes = probabilities • Bayes' theorem, where the hypothesis is the model and the evidence is the data

  3. Going full Bayesian • Start with a prior on the model parameters • Choose a statistical model of the data • Use the data to refine the prior, i.e., compute the posterior (the evidence term has no dependence on the parameters)

  4. Going full Bayesian • Start with a prior on the model parameters • Choose a statistical model of the data • Use the data to refine the prior, i.e., compute the posterior, where posterior ∝ likelihood × prior
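The labelled equation on slides 2-4 is not reproduced in this transcript; a standard rendering of Bayes' theorem with those labels (prior, likelihood, evidence, posterior) would be:

    % Bayes' theorem for model parameters \theta given data D:
    % posterior = likelihood * prior / evidence
    p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
    \qquad
    p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta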

  5. Going full Bayesian • 1. Learning: computing the posterior – Finding a point estimate (MAP) → what we have been doing so far! – Finding a full probability distribution over the parameters → this lecture

  6. What have we learned so far? • Advantages of deep learning models: – Very expressive models – Good for tasks such as classification, regression, sequence prediction – Modular structure, efficient training, many tools – Scale well with large amounts of data • But we also have disadvantages: – A "black-box" feeling – We cannot judge how "confident" the model is about a decision

  7. Modeling uncertainty • Wish list: – We want to know what our models know and what they do not know

  8. Modeling uncertainty • Example: I have built a dog breed classifier (Bulldog, German shepherd, Chihuahua). What answer will my NN give?

  9. Modeling uncertainty • Example: I have built a dog breed classifier (Bulldog, German shepherd, Chihuahua). I would rather get as an answer that my model is not certain about the type of dog breed.

  10. Modeling uncertainty • Wish list: – We want to know what our models know and what they do not know • Why do we care? – Decision making – Learning from limited, noisy, and missing data – Insights into why a model failed

  11. Modeling uncertainty • Finding the posterior: – Finding a point estimate (MAP) → what we have been doing so far! – Finding a full probability distribution over the parameters. Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537

  12. Modeling uncertainty • We can sample many times from the distribution and see how this affects our model's predictions • If the predictions are consistent, the model is confident. Image: https://medium.com/@joeDiHare/deep-bayesian-neural-networks-952763a9537

  13. Modeling uncertainty • Figure: "I am not really sure". Kendall & Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", NIPS 2017

  14. How do we get the posterior? • Compute the posterior over the weights • The evidence is the probability of observing our data under all possible model parameters. How do we compute this?
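The formulas on this slide are missing from the transcript; assuming the usual notation (weights W, inputs X, labels Y), the posterior over the weights and the predictive distribution it enables would be written as:

    % Posterior over the network weights
    p(W \mid X, Y) = \frac{p(Y \mid X, W)\, p(W)}{p(Y \mid X)},
    \qquad
    p(Y \mid X) = \int p(Y \mid X, W)\, p(W)\, dW
    % Prediction for a new input x^* marginalizes over all possible weights
    p(y^* \mid x^*, X, Y) = \int p(y^* \mid x^*, W)\, p(W \mid X, Y)\, dW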

  15. How do we get the posterior? • How do we compute this? • Denominator: we cannot compute it over all possible parameter combinations • Two ways to approximate the posterior: Markov Chain Monte Carlo and Variational Inference

  16. How do we get the posterior? • Markov Chain Monte Carlo (MCMC) – A chain of samples that converges to the posterior (SLOW) • Variational Inference – Find an approximation that is close to the true posterior
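The "approximation" used by variational inference is usually a parametric distribution q_phi(W) fitted by minimizing a KL divergence to the true posterior; a sketch of that objective (the notation is assumed, not taken from the slide):

    % Variational inference: pick the member of a tractable family q_\phi
    % that is closest to the intractable posterior
    q_{\phi^*}(W) = \arg\min_{\phi} \; \mathrm{KL}\!\left( q_\phi(W) \,\|\, p(W \mid X, Y) \right)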

  17. Dropout for Bayesian Inference

  18. Recall: Dropout • Disable a random set of neurons (typically 50%) in the forward pass. Srivastava et al. 2014
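As a concrete illustration of "disable a random set of neurons", here is a minimal NumPy sketch of an (inverted) dropout forward pass; the 50% rate and the toy activation shape are assumptions for the example:

    import numpy as np

    def dropout_forward(x, p_drop=0.5, train=True):
        """Zero out activations with probability p_drop; rescale the survivors."""
        if not train:
            return x  # standard test time: dropout switched off
        mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
        return x * mask

    activations = np.random.randn(4, 8)   # a batch of 4 hidden vectors
    out = dropout_forward(activations, p_drop=0.5)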

  19. Recall: Dropout • Using half the network = half capacity – Redundant representations (e.g., furry, has two eyes, has a tail, has paws, has two ears)

  20. Recall: Dropout • Using half the network = half capacity – Redundant representations – Base your scores on more features • Consider it as a model ensemble

  21. Recall: Dropout • Two models in one (Model 1, Model 2)

  22. MC dropout • Variational Inference – Find an approximation that is close to the true posterior • Dropout training – The variational distribution is built from a Bernoulli distribution (where the states are "on" and "off"). Y. Gal, Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", ICML 2016
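Gal and Ghahramani express this variational distribution as the learned weight matrices multiplied by Bernoulli "on/off" variables; roughly (notation simplified from the paper, so treat it as a sketch):

    % Dropout-induced variational distribution for layer i
    W_i = M_i \cdot \mathrm{diag}\big([z_{i,j}]_{j=1}^{K_i}\big),
    \qquad z_{i,j} \sim \mathrm{Bernoulli}(p_i)
    % M_i: learned weight matrix, p_i: probability of keeping unit j of layer i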

  23. MC dropout • 1. Train a model with dropout before every weight layer • 2. Apply dropout at test time – Sampling is done in a Monte Carlo fashion, hence the name Monte Carlo dropout. Y. Gal, Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", ICML 2016

  24. MC dropout – Sampling is done in a Monte Carlo fashion, e.g., for classification: average the network's output over several parameter samples, where the parameters are drawn from the dropout distribution. Y. Gal, Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", ICML 2016
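In formulas, the predictive distribution is approximated as p(y | x) ≈ (1/T) Σ_t softmax(f^{W_t}(x)) with W_t drawn from the dropout distribution q(W). A minimal PyTorch sketch of test-time MC dropout; the toy model and T = 20 samples are assumptions:

    import torch
    import torch.nn as nn

    model = nn.Sequential(                 # any net with dropout before weight layers
        nn.Linear(128, 256), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(256, 10),
    )

    def mc_dropout_predict(model, x, num_samples=20):
        """Keep dropout active at test time and average T stochastic forward passes."""
        model.train()                      # leaves the Dropout layers stochastic
        with torch.no_grad():
            probs = torch.stack([
                torch.softmax(model(x), dim=-1) for _ in range(num_samples)
            ])                             # shape: (T, batch, classes)
        return probs.mean(dim=0), probs.std(dim=0)   # predictive mean and spread

    x = torch.randn(4, 128)
    mean, std = mc_dropout_predict(model, x)   # large std = the model is not confident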

  25. Measure your model's uncertainty. Kendall & Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", NIPS 2017

  26. Another look

  27. Let us take another look • We know the posterior is intractable, so we approximate it • The denominator expresses how my data is generated

  28. Let us take another look • We assume that the data is generated by some random process involving an unobserved continuous (latent) random variable • Generation process and posterior (written out after slide 29 below)

  29. Let us take another look • Variational Inference – Find an approximation that is close to the true posterior • My approximation is parameterized by a model
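The equations referred to on slides 28-29 are not in the transcript; the standard forms they correspond to (with an assumed notation: decoder p_theta, encoder q_phi) are:

    % Generation process: latent z from a prior, data x from the decoder
    z \sim p(z), \qquad x \sim p_\theta(x \mid z)
    % Posterior over the latent variable; intractable because of the marginal p_\theta(x)
    p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(x)},
    \qquad p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz
    % Variational inference: approximate the posterior with a model (the encoder)
    q_\phi(z \mid x) \approx p_\theta(z \mid x)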

  30. Variational Autoencoders

  31. Recall: Autoencoders • Encode the input into a bottleneck representation and reconstruct it with the decoder • Encoder: conv layers, Decoder: transpose conv layers

  32. Variational Autoencoder • Encoder: conv layers, Decoder: transpose conv layers

  33. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian

  34. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian with a mean and a diagonal covariance (both predicted by the encoder)

  35. Variational Autoencoder • The latent space is now a distribution • Specifically, it is a Gaussian with a mean and a diagonal covariance (both predicted by the encoder)
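A minimal PyTorch sketch of an encoder that outputs the mean and diagonal (log-)covariance of the Gaussian latent, plus the usual reparameterized sample; the fully connected layers and sizes are assumptions standing in for the conv encoder on the slides:

    import torch
    import torch.nn as nn

    class GaussianEncoder(nn.Module):
        """Maps an input to the mean and diagonal log-variance of q(z | x)."""
        def __init__(self, in_dim=784, hidden=400, latent_dim=20):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.fc_mu = nn.Linear(hidden, latent_dim)       # mean
            self.fc_logvar = nn.Linear(hidden, latent_dim)   # log of diagonal covariance

        def forward(self, x):
            h = self.backbone(x)
            return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(mu, logvar):
        """Differentiable sample z ~ N(mu, diag(exp(logvar)))."""
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    enc = GaussianEncoder()
    mu, logvar = enc(torch.randn(8, 784))
    z = reparameterize(mu, logvar)   # latent samples, shape (8, 20)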

  36. Variational Autoencoder • Back to our Bayesian view: our generation process was the one above • Its marginal (the denominator of the posterior) is what I want to optimize

  37. Variational Autoencoder • Loss function for a data point: I draw samples of the latent variable z from my encoder

  38. Variational Autoencoder • Loss function for a data point: rewrite the posterior using Bayes' rule

  39. Variational Autoencoder • Loss function for a data point: one of the terms is just a constant (it does not depend on z)

  40. Variational Autoencoder • Loss function for a data point

  41. Variational Autoencoder • Loss function for a data point: Kullback-Leibler divergences

  42. Variational Autoencoder • Loss function for a data point: a reconstruction loss, a term that measures how good my latent distribution is with respect to my prior, and a term whose shape I still cannot express, but about which I know something (see the decomposition below)
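The derivation on slides 37-43 is shown as equations in the original deck; the transcript keeps only the annotations. The decomposition those annotations describe is the standard one (notation assumed):

    % For one data point x, with encoder q_\phi(z|x), decoder p_\theta(x|z), prior p(z):
    \log p_\theta(x)
      = \underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
      - \underbrace{\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)}_{\text{latent vs. prior}}
      + \underbrace{\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)}_{\ge 0,\ \text{cannot be expressed}}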

  43. Variational Autoencoder • Loss function for a data point: the tractable terms form the loss function (a lower bound)
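Dropping the intractable, non-negative KL term leaves the lower bound (the ELBO), which is what is optimized in practice. A minimal PyTorch sketch of the resulting per-batch loss; the Bernoulli decoder (binary cross-entropy reconstruction) and standard normal prior are assumptions, not read off the slide:

    import torch
    import torch.nn.functional as F

    def vae_loss(x, x_recon, mu, logvar):
        """Negative ELBO: reconstruction loss + KL(q(z|x) || N(0, I))."""
        # x and x_recon are assumed to lie in [0, 1] (e.g. sigmoid decoder outputs)
        recon = F.binary_cross_entropy(x_recon, x, reduction="sum")   # -E_q[log p(x|z)]
        # Closed-form KL between a diagonal Gaussian and the standard normal prior
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl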
