Edward: Deep Probabilistic Programming


  1. Edward: Deep Probabilistic Programming Extended Seminar – Systems and Machine Learning Steven Lang 13.02.2020 1

  2. Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion 2

  3. Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Introduction 3

  4. Motivation ◮ The nature of deep neural networks is compositional ◮ Connect layers in creative ways ◮ No need to worry about – testing (forward propagation) – inference (gradient-based optimization, with backpropagation and auto-differentiation) ◮ Leads to easy development of new, successful architectures Introduction 4

  5. Motivation Examples of successful architectures: LeNet-5 (LeCun et al. 1998), ResNet-50 (He et al. 2015), VGG16 (Simonyan and Zisserman 2014), Inception-v4 (Szegedy et al. 2016) Introduction 5

  6. Motivation Goal : Achieve the composability of deep learning for 1. Probabilistic models 2. Probabilistic inference Introduction 6

  7. Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Refresher on Probabilistic Modeling 7

  8. What is a Random Variable (RV)? ◮ Random number determined by chance, e.g. outcome of a single dice roll ◮ Drawn according to a probability distribution ◮ Typical random variables in statistical machine learning: – input data – output data – noise Refresher on Probabilistic Modeling 8
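
To make the idea concrete, here is a minimal sketch (plain NumPy rather than Edward; the seed and parameters are arbitrary choices for illustration) that draws one discrete and one continuous random variable:

      import numpy as np

      rng = np.random.default_rng(0)           # reproducible source of randomness
      die_roll = rng.integers(1, 7)            # discrete RV: uniform over {1, ..., 6}
      noise = rng.normal(loc=0.0, scale=1.0)   # continuous RV: standard normal "noise"
      print(die_roll, noise)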

  9. What is a Probability Distribution? ◮ Discrete : describes the probability that the RV takes a certain value ◮ Continuous : describes the probability density at a certain value [Figure: densities p(X) of normal distributions with (µ = 0, σ² = 1), (µ = 2, σ² = 2), (µ = 4, σ² = 3)] Refresher on Probabilistic Modeling 9

  10. What is a Probability Distribution? ◮ Discrete : describes the probability that the RV takes a certain value ◮ Continuous : describes the probability density at a certain value ◮ Example : normal distribution, $\mathcal{N}(\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right)$ [Figure: densities p(X) of normal distributions with (µ = 0, σ² = 1), (µ = 2, σ² = 2), (µ = 4, σ² = 3)] Refresher on Probabilistic Modeling 9
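
As a quick check of the density formula above, the following sketch evaluates it directly in NumPy and compares the result against scipy.stats.norm (the input value 2.0 is an arbitrary choice for illustration):

      import numpy as np
      from scipy.stats import norm

      def normal_pdf(x, mu, sigma):
          # Density of N(mu, sigma^2), written exactly as in the formula above
          return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

      print(normal_pdf(2.0, mu=0.0, sigma=1.0))   # ~0.05399
      print(norm.pdf(2.0, loc=0.0, scale=1.0))    # same value from SciPy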

  11. Common Probability Distributions Discrete ◮ Bernoulli ◮ Binomial ◮ Hypergeometric ◮ Poisson ◮ Boltzmann Refresher on Probabilistic Modeling 10

  12. Common Probability Distributions Discrete : ◮ Bernoulli ◮ Binomial ◮ Hypergeometric ◮ Poisson ◮ Boltzmann Continuous : ◮ Uniform ◮ Beta ◮ Normal ◮ Laplace ◮ Student-t Refresher on Probabilistic Modeling 10

  13. What is Inference? ◮ Answer the query P ( Q | E ) – Q : Query, set of RVs we are interested in – E : Evidence, set of RVs that we know the state of Refresher on Probabilistic Modeling 11

  14. What is Inference? ◮ Answer the query P ( Q | E ) – Q : Query, set of RVs we are interested in – E : Evidence, set of RVs that we know the state of ◮ Example: What is the probability that – it has rained ( Q ) – given that we know the grass is wet ( E ) P ( Has Rained = true | Grass = wet ) Refresher on Probabilistic Modeling 11
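
A worked toy version of this query shows how Bayes' rule answers it; the probabilities below are made-up numbers assumed purely for illustration, not taken from the slides:

      # Hypothetical probabilities (assumptions for this example only)
      p_rain = 0.3               # P(Has Rained = true)
      p_wet_given_rain = 0.9     # P(Grass = wet | Has Rained = true)
      p_wet_given_dry = 0.2      # P(Grass = wet | Has Rained = false), e.g. a sprinkler

      # Marginal probability of the evidence
      p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

      # Bayes' rule: P(Has Rained = true | Grass = wet)
      p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
      print(p_rain_given_wet)    # ~0.66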

  15. Probabilistic Models Bayesian Networks Variational Autoencoder Deep Belief Networks Markov Networks Refresher on Probabilistic Modeling 12

  16. Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Deep Probabilistic Programming 13

  17. Key Ideas Probabilistic programming lets users ◮ specify probabilistic models as programs ◮ compile those models down into inference procedures Deep Probabilistic Programming 14

  18. Key Ideas Probabilistic programming lets users ◮ specify probabilistic models as programs ◮ compile those models down into inference procedures Two compositional representations as first class citizens ◮ Random variables ◮ Inference Deep Probabilistic Programming 14

  19. Key Ideas Probabilistic programming lets users ◮ specify probabilistic models as programs ◮ compile those models down into inference procedures Two compositional representations as first class citizens ◮ Random variables ◮ Inference Goal Make probabilistic programming as flexible and efficient as deep learning! Deep Probabilistic Programming 14

  20. Typical PPL Trade-offs Probabilistic programming languages typically have the following trade-off: Deep Probabilistic Programming 15

  21. Typical PPL Trade-offs Probabilistic programming languages typically have the following trade-off: ◮ Expressiveness – allows a rich class of models beyond graphical models – scales poorly w.r.t. data and model size Deep Probabilistic Programming 15

  22. Typical PPL Trade-offs Probabilistic programming languages typically have the following trade-off: ◮ Expressiveness – allows a rich class of models beyond graphical models – scales poorly w.r.t. data and model size ◮ Efficiency – the PPL is restricted to a specific class of models – inference algorithms are optimized for this specific class Deep Probabilistic Programming 15

  23. Edward Edward (Tran et al. 2017) builds on two compositional representations ◮ Random variables ◮ Inference Deep Probabilistic Programming 16

  24. Edward Edward (Tran et al. 2017) builds on two compositional representations ◮ Random variables ◮ Inference Edward allows fitting the same model using a variety of composable inference methods ◮ Point estimation ◮ Variational inference ◮ Markov Chain Monte Carlo Deep Probabilistic Programming 16

  25. Edward Key concept : no distinct model or inference block ◮ Model : composition/collection of random variables ◮ Inference : way of modifying parameters in that collection, subject to another collection of random variables (e.g. observed data) Deep Probabilistic Programming 17

  26. Edward Inherits computational benefits from TensorFlow “for free”, such as ◮ distributed training ◮ parallelism ◮ vectorization ◮ GPU support Deep Probabilistic Programming 18

  27. Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Compositional Representations in Edward 19

  28. Criteria for Probabilistic Models Edward poses the following criteria on compositional representations for probabilistic models : 1. Integration with computational graphs – nodes represent operations on data – edges represent data communicated between nodes Compositional Representations in Edward 20

  29. Criteria for Probabilistic Models Edward poses the following criteria on compositional representations for probabilistic models : 1. Integration with computational graphs – nodes represent operations on data – edges represent data communicated between nodes 2. Invariance of the representation under the graph – graph can be reused during inference Compositional Representations in Edward 20

  30. Graph Example Computational graph: [Figure: graph over the nodes x, y, z, the constant 2, and the operations +, · and pow; legend distinguishes variables, constants, and operations] Compositional Representations in Edward 21

  31. Graph Example Computational graph: [Figure as above] Evaluation : 1. x + y Compositional Representations in Edward 21

  32. Graph Example Computational graph: [Figure as above] Evaluation : 1. x + y 2. ( x + y ) · y · z Compositional Representations in Edward 21

  33. Graph Example Computational graph: [Figure as above] Evaluation : 1. x + y 2. ( x + y ) · y · z 3. (( x + y ) · y · z )² Compositional Representations in Edward 21
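
The same toy graph can be written down in TensorFlow, which Edward builds on. This is a minimal sketch in the TensorFlow 1.x graph-and-session style; the concrete input values are arbitrary:

      import tensorflow as tf    # TensorFlow 1.x, as assumed by Edward

      x = tf.constant(1.0)
      y = tf.constant(2.0)
      z = tf.constant(3.0)

      s = x + y                  # step 1: x + y
      p = s * y * z              # step 2: (x + y) * y * z
      out = tf.pow(p, 2)         # step 3: ((x + y) * y * z)^2

      with tf.Session() as sess:
          print(sess.run(out))   # evaluates the graph: ((1 + 2) * 2 * 3)^2 = 324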

  34. Example: Beta-Bernoulli Program Beta-Bernoulli model: $p(\mathbf{x}, \theta) = \text{Beta}(\theta \mid 1, 1) \prod_{n=1}^{50} \text{Bernoulli}(x_n \mid \theta)$ Compositional Representations in Edward 22

  35. Example: Beta-Bernoulli Program Beta-Bernoulli model: $p(\mathbf{x}, \theta) = \text{Beta}(\theta \mid 1, 1) \prod_{n=1}^{50} \text{Bernoulli}(x_n \mid \theta)$ Computation graph: [Figure: graph in which the sample of θ is multiplied with ones(50) to parameterize the Bernoulli variable x] Compositional Representations in Edward 22

  36. Example: Beta-Bernoulli Program Beta-Bernoulli model: $p(\mathbf{x}, \theta) = \text{Beta}(\theta \mid 1, 1) \prod_{n=1}^{50} \text{Bernoulli}(x_n \mid \theta)$ Computation graph: [Figure: graph in which the sample of θ is multiplied with ones(50) to parameterize the Bernoulli variable x] Edward code:
      theta = Beta(a=1.0, b=1.0)             # Sample from Beta dist.
      x = Bernoulli(p=tf.ones(50) * theta)   # Sample from Bernoulli dist.
  Compositional Representations in Edward 22
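
To complete the example, here is a hedged sketch of fitting this Beta-Bernoulli program, assuming the older Edward API used in the paper (argument names such as a/b and p were renamed in later Edward releases); the synthetic coin-flip data are an assumption for illustration:

      import edward as ed
      import numpy as np
      import tensorflow as tf
      from edward.models import Bernoulli, Beta, Empirical

      # Model: the same Beta-Bernoulli program as on the slide
      theta = Beta(a=1.0, b=1.0)
      x = Bernoulli(p=tf.ones(50) * theta)

      # Synthetic observations (assumption): 50 coin flips with ~70% heads
      x_train = np.random.binomial(1, 0.7, size=50).astype(np.int32)

      # MCMC inference: approximate the posterior over theta with samples
      qtheta = Empirical(params=tf.Variable(tf.ones(1000) * 0.5))
      inference = ed.HMC({theta: qtheta}, data={x: x_train})
      inference.run()

Because inference is itself compositional, swapping in a different method (e.g. ed.KLqp for variational inference or ed.MAP for point estimation) only changes the last three lines; the model program stays untouched.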

  37. Criteria for Probabilistic Inference Edward poses the following criteria on compositional representations for probabilistic inference : 1. Support for many classes of inference Compositional Representations in Edward 23

  38. Criteria for Probabilistic Inference Edward poses the following criteria on compositional representations for probabilistic inference : 1. Support for many classes of inference 2. Invariance of inference under the computational graph – posterior can be further composed as part of another model Compositional Representations in Edward 23

  39. Inference in Edward Goal : calculate the posterior $p(\mathbf{z}, \beta \mid \mathbf{x}_{\text{train}}; \theta)$, given ◮ data x_train ◮ model parameters θ ◮ local variables z ◮ global variables β Compositional Representations in Edward 24
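
A minimal, self-contained sketch of this setup, again assuming the older Edward API (mu/sigma argument names changed to loc/scale in later releases): a global latent variable β, local latent variables z, and observed data x_train, fit with variational inference. The model, its priors, and the synthetic data are illustrative assumptions, not taken from the slides:

      import edward as ed
      import numpy as np
      import tensorflow as tf
      from edward.models import Normal

      N = 100
      # Synthetic data (assumption): noisy observations around an unknown global mean
      x_train = (3.0 + np.random.randn(N)).astype(np.float32)

      # Model: global variable beta, local variables z, likelihood for x
      beta = Normal(mu=0.0, sigma=10.0)                  # global latent variable
      z = Normal(mu=tf.zeros(N), sigma=tf.ones(N))       # local latent variables
      x = Normal(mu=beta + z, sigma=tf.ones(N))          # observed data

      # Variational approximations for the global and local variables
      qbeta = Normal(mu=tf.Variable(0.0),
                     sigma=tf.nn.softplus(tf.Variable(1.0)))
      qz = Normal(mu=tf.Variable(tf.zeros(N)),
                  sigma=tf.nn.softplus(tf.Variable(tf.ones(N))))

      # Approximate the posterior p(z, beta | x_train) by variational inference
      inference = ed.KLqp({beta: qbeta, z: qz}, data={x: x_train})
      inference.run(n_iter=1000)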
