Edward: Deep Probabilistic Programming Extended Seminar – Systems and Machine Learning Steven Lang 13.02.2020 1
Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion 2
Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Introduction 3
Motivation ◮ The nature of deep neural networks is compositional ◮ Layers can be connected in creative ways ◮ No worries about – testing (forward propagation) – inference (gradient-based optimization, with backpropagation and automatic differentiation) ◮ Leads to easy development of new, successful architectures Introduction 4
Motivation LeNet-5 (LeCun et al. 1998) ResNet-50 (He et al. 2015) VGG16 (Simonyan and Zisserman 2014) Inception-v4 (Szegedy et al. 2016) Introduction 5
Motivation Goal: Achieve the composability of deep learning for 1. Probabilistic models 2. Probabilistic inference Introduction 6
Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Refresher on Probabilistic Modeling 7
What is a Random Variable (RV)? ◮ A random number determined by chance, e.g. the outcome of a single die roll ◮ Drawn according to a probability distribution ◮ Typical random variables in statistical machine learning: – input data – output data – noise Refresher on Probabilistic Modeling 8
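As a tiny illustration (a sketch, not from the slides), such random variables can be drawn with NumPy:

import numpy as np

rng = np.random.default_rng(0)
die_roll = rng.integers(1, 7)            # discrete RV: outcome of a single die roll (1-6)
noise = rng.normal(loc=0.0, scale=1.0)   # continuous RV: Gaussian noise
print(die_roll, noise)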
What is a Probability Distribution? ◮ Discrete: describes the probability that the RV takes a certain value ◮ Continuous: describes the probability density of the RV at a certain value
Example: Normal distribution
N(µ, σ) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)
[Plot: Normal densities for (µ = 0, σ² = 1), (µ = 2, σ² = 2), (µ = 4, σ² = 3)]
Refresher on Probabilistic Modeling 9
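A quick numerical check of this density (a sketch; the evaluation point and parameters below are illustrative assumptions):

import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, np.sqrt(2.0)   # one of the plotted configurations: mu = 2, sigma^2 = 2
x = 1.5

pdf_manual = 1.0 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)
print(np.isclose(pdf_manual, pdf_scipy))  # True: the formula matches scipy's implementation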
Common Probability Distributions
Discrete: Bernoulli, Binomial, Hypergeometric, Poisson, Boltzmann
Continuous: Uniform, Beta, Normal, Laplace, Student-t
Refresher on Probabilistic Modeling 10
What is Inference? ◮ Answer the query P(Q | E) – Q: query, the set of RVs we are interested in – E: evidence, the set of RVs whose state we know ◮ Example: What is the probability that – it has rained (Q) – given that we know the grass is wet (E)? P(Has Rained = true | Grass = wet) Refresher on Probabilistic Modeling 11
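A short worked version of this query via Bayes' rule (the numbers are illustrative assumptions, not from the slides):

P(Rain = true | Grass = wet) = P(Grass = wet | Rain = true) · P(Rain = true) / P(Grass = wet)

Assuming P(Grass = wet | Rain = true) = 0.9, P(Rain = true) = 0.3, and P(Grass = wet | Rain = false) = 0.2:

P(Grass = wet) = 0.9 · 0.3 + 0.2 · 0.7 = 0.41
P(Rain = true | Grass = wet) = 0.27 / 0.41 ≈ 0.66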
Probabilistic Models Bayesian Networks Variational Autoencoder Deep Belief Networks Markov Networks Refresher on Probabilistic Modeling 12
Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Deep Probabilistic Programming 13
Key Ideas Probabilistic programming lets users ◮ specify probabilistic models as programs ◮ compile those models down into inference procedures Two compositional representations as first-class citizens ◮ Random variables ◮ Inference Goal: Make probabilistic programming as flexible and efficient as deep learning! Deep Probabilistic Programming 14
Typical PPL Trade-offs Probabilistic programming languages typically face the following trade-off: ◮ Expressiveness – allow a rich class of models beyond graphical models – but scale poorly w.r.t. data and model size ◮ Efficiency – the PPL is restricted to a specific class of models – inference algorithms are optimized for this specific class Deep Probabilistic Programming 15
Edward Edward (Tran et al. 2017) builds on two compositional representations ◮ Random variables ◮ Inference Edward allows fitting the same model with a variety of composable inference methods ◮ Point estimation ◮ Variational inference ◮ Markov chain Monte Carlo Deep Probabilistic Programming 16
Edward Key concept: there is no distinct model or inference block ◮ Model: a composition/collection of random variables ◮ Inference: a way of modifying the parameters of one collection of random variables subject to another (e.g. the observed data) Deep Probabilistic Programming 17
Edward Inherits computational benefits from TensorFlow, such as ◮ distributed training ◮ parallelism ◮ vectorization ◮ GPU support “for free” Deep Probabilistic Programming 18
Outline Introduction Refresher on Probabilistic Modeling Deep Probabilistic Programming Compositional Representations in Edward Experiments Alternatives Conclusion Compositional Representations in Edward 19
Criteria for Probabilistic Models Edward poses the following criteria on compositional representations for probabilistic models: 1. Integration with computational graphs – nodes represent operations on data – edges represent data communicated between nodes 2. Invariance of the representation under the graph – the graph can be reused during inference Compositional Representations in Edward 20
Graph Example
Computational graph: variable nodes x, y; constant nodes z, 2; operation nodes +, ·, pow
Evaluation: 1. x + y   2. (x + y) · y · z   3. 2^{(x + y) · y · z}
Compositional Representations in Edward 21
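A minimal sketch of this example graph in TensorFlow 1.x (the backend Edward builds on); the concrete values assigned to x, y, z below are illustrative assumptions:

import tensorflow as tf

# Variable nodes
x = tf.Variable(1.0, name="x")
y = tf.Variable(2.0, name="y")
# Constant node
z = tf.constant(3.0, name="z")

s = x + y                 # step 1: x + y
prod = s * y * z          # step 2: (x + y) * y * z
out = tf.pow(2.0, prod)   # step 3: 2^((x + y) * y * z)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out))  # evaluating the output node runs all three steps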
Example: Beta-Bernoulli Program
Beta-Bernoulli Model
p(x, θ) = Beta(θ | 1, 1) \prod_{n=1}^{50} Bernoulli(x_n | θ)
[Computation graph: θ ~ Beta feeds a multiply node together with ones(50), which parameterizes x ~ Bernoulli]
Edward code
import tensorflow as tf
from edward.models import Bernoulli, Beta

theta = Beta(a=1.0, b=1.0)             # Sample from Beta dist.
x = Bernoulli(p=tf.ones(50) * theta)   # Sample from Bernoulli dist.
Compositional Representations in Edward 22
Criteria for Probabilistic Inference Edward poses the following criteria on compositional representations for probabilistic inference: 1. Support for many classes of inference 2. Invariance of inference under the computational graph – the posterior can be further composed as part of another model Compositional Representations in Edward 23
Inference in Edward Goal: calculate the posterior p(z, β | x_train; θ), given ◮ data x_train ◮ model parameters θ ◮ local latent variables z ◮ global latent variables β Compositional Representations in Edward 24
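As an illustration of composable inference (a sketch using Edward's 1.x API; the variational family and the toy data below are assumptions, not taken from the slides), variational inference for the Beta-Bernoulli model from the previous slide can be set up as:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Bernoulli, Beta

# Model (as on the Beta-Bernoulli slide)
theta = Beta(a=1.0, b=1.0)
x = Bernoulli(p=tf.ones(50) * theta)

# Toy observations (assumed)
x_train = np.random.binomial(1, 0.7, size=50).astype(np.int32)

# Variational approximation of the posterior over theta
qtheta = Beta(a=tf.nn.softplus(tf.Variable(1.0)),
              b=tf.nn.softplus(tf.Variable(1.0)))

# Compose inference: bind latent variables to their approximations
# and observed random variables to data
inference = ed.KLqp({theta: qtheta}, data={x: x_train})
inference.run(n_iter=1000)

Swapping ed.KLqp for e.g. ed.MAP (point estimation) or an MCMC method leaves the model code unchanged; only the inference object differs.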