
Probabilistic Graphical Models: Inference & Learning in DL


  1. Probabilistic Graphical Models: Inference & Learning in DL
     Zhiting Hu
     Lecture 19, March 29, 2017
     Reading:

  2. Deep Generative Models
     - Explicit probabilistic models: provide an explicit parametric specification of the distribution of x
     - Tractable likelihood function p_θ(x)
     - E.g., p(x, z | θ) = p(x | z) p(z | θ)

  3. Deep Generative Models
     - Explicit probabilistic models: provide an explicit parametric specification of the distribution of x
     - Tractable likelihood function p_θ(x)
     - E.g., Sigmoid Belief Nets: layers of binary units h^(3), h^(2), h^(1) ∈ {0, 1}^{K_l}, where each unit turns on with a sigmoid probability given the layer above:
       p(h_i^(2) = 1 | h^(3)) = σ(w_i^(3)ᵀ h^(3) + c_i^(2)),   p(h_j^(1) = 1 | h^(2)) = σ(w_j^(2)ᵀ h^(2) + c_j^(1))
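Ancestral sampling in such a network simply follows the sigmoid conditionals layer by layer. A minimal sketch of this, assuming NumPy and arbitrary randomly initialized weights (the layer sizes and parameters below are hypothetical, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
n_h2, n_h1, n_v = 8, 16, 32                      # hypothetical layer sizes
W2, c = 0.1 * rng.normal(size=(n_h1, n_h2)), np.zeros(n_h1)
W1, b = 0.1 * rng.normal(size=(n_v, n_h1)), np.zeros(n_v)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

h2 = rng.binomial(1, 0.5, size=n_h2)             # top layer: independent Bernoulli(0.5) units
h1 = rng.binomial(1, sigmoid(W2 @ h2 + c))       # p(h1_i = 1 | h2) = sigmoid(w_i . h2 + c_i)
v = rng.binomial(1, sigmoid(W1 @ h1 + b))        # p(v_j = 1 | h1) = sigmoid(w_j . h1 + b_j)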

  4. Deep Generative Models
     - Explicit probabilistic models: provide an explicit parametric specification of the distribution of x
     - Tractable likelihood function p_θ(x)
     - E.g., deep generative models parameterized with neural networks (e.g., VAEs):
       p_θ(x | z) = N(x; μ_θ(z), σ² I),   p(z) = N(z; 0, I)
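To make the parameterization concrete, here is a minimal sketch of such a decoder, assuming PyTorch (which the slides do not use) and made-up layer sizes: μ_θ(z) is a small MLP and the output standard deviation is a learned per-dimension parameter.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, z_dim=2, x_dim=784, hidden=256):      # hypothetical sizes
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, x_dim))
        self.log_sigma = nn.Parameter(torch.zeros(x_dim))     # learned per-dimension std (log scale)

    def forward(self, z):
        return self.net(z), self.log_sigma.exp()              # mu_theta(z), sigma

decoder = Decoder()
z = torch.randn(5, 2)                                         # z ~ p(z) = N(0, I)
mu, sigma = decoder(z)
x = mu + sigma * torch.randn_like(mu)                         # sample x ~ p_theta(x | z)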

  5. Deep Generative Models
     - Implicit probabilistic models: define a stochastic process that simulates the data x
     - Do not require a tractable likelihood function
     - Data simulator: a natural approach for problems in population genetics, weather, ecology, etc.
     - E.g., generate data by passing random noise through a deterministic equation with parameters θ (e.g., GANs):
       x_n = g(z_n; θ),   z_n ∼ N(0, I)
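The defining property is that we can only sample from the model; the likelihood is never written down. A minimal NumPy sketch of such a simulator, with a made-up transformation g and parameters θ (purely illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
theta = {"W": rng.normal(size=(2, 2)), "b": np.array([1.0, -1.0])}   # hypothetical parameters

def g(z, theta):
    return np.tanh(z @ theta["W"].T) + theta["b"]    # deterministic transform of the noise

z = rng.standard_normal((1000, 2))                   # z_n ~ N(0, I)
x = g(z, theta)                                      # simulated data; p(x) is never evaluated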

  6. Recap: Variational Inference
     - Consider a probabilistic model p_θ(x, z)
     - Assume a variational distribution q_φ(z | x)
     - Lower bound on the log-likelihood:
       log p(x) = ∫ q_φ(z | x) log [ p_θ(x, z) / q_φ(z | x) ] dz + KL(q_φ(z | x) || p_θ(z | x))
                ≥ ∫ q_φ(z | x) log [ p_θ(x, z) / q_φ(z | x) ] dz =: ℒ(θ, φ; x)
     - Free energy: F(θ, φ; x) = -log p(x) + KL(q_φ(z | x) || p_θ(z | x))
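For reference, the bound follows from the standard decomposition of the log-likelihood (a reconstruction of the usual derivation, stated here in LaTeX; it is not copied verbatim from the slides):

\begin{aligned}
\log p_\theta(x)
  &= \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x)\big]
   = \mathbb{E}_{q_\phi(z|x)}\!\left[\log \frac{p_\theta(x,z)}{p_\theta(z|x)}\right] \\
  &= \underbrace{\mathbb{E}_{q_\phi(z|x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z|x)}\right]}_{\mathcal{L}(\theta,\phi;x)}
   + \underbrace{\mathbb{E}_{q_\phi(z|x)}\!\left[\log \frac{q_\phi(z|x)}{p_\theta(z|x)}\right]}_{\mathrm{KL}(q_\phi(z|x)\,\|\,p_\theta(z|x))\;\ge\;0}
\end{aligned}

Since the KL term is nonnegative, log p(x) ≥ ℒ(θ, φ; x), and the free energy is just the negative bound: F(θ, φ; x) = -ℒ(θ, φ; x).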

  7. Wake-Sleep Algorithm
     - Consider a generative model p_θ(x | z), e.g., sigmoid belief nets
     - Variational bound: log p(x) ≥ ∫ q_φ(z | x) log [ p_θ(x, z) / q_φ(z | x) ] dz =: ℒ(θ, φ; x)
     - Use an inference network q_φ(z | x)
     - Maximize the bound w.r.t. p_θ → Wake phase: max_θ E_{q_φ(z|x)}[ log p_θ(x | z) ]
        - Get samples from q_φ(z | x) through a bottom-up pass
        - Use the samples as targets for updating the generator

  8. Wake-Sleep Algorithm
     - [Hinton et al., Science 1995]
     - Generally applicable to a wide range of generative models by training a separate inference network
     - Consider a generative model p_θ(x | z) with prior p(z), e.g., multi-layer belief nets
     - Free energy: F(θ, φ; x) = -log p(x) + KL(q_φ(z | x) || p_θ(z | x))
     - Inference network q_φ(z | x), a.k.a. recognition network

  9. Wake-Sleep Algorithm
     [Figure: recognition weights R1, R2 performing a bottom-up pass from data x; courtesy of Maei's slides]
     - Free energy: F(θ, φ; x) = -log p(x) + KL(q_φ(z | x) || p_θ(z | x))
     - Minimize the free energy w.r.t. p_θ → Wake phase: max_θ E_{q_φ(z|x)}[ log p_θ(x | z) ]
        - Get samples from q_φ(z | x) through a bottom-up pass on training data
        - Use the samples as targets for updating the generator

  10. Wake-Sleep Algorithm
     [Figure: generative weights G1, G2 (top-down) and recognition weights R1, R2 (bottom-up) over data x]
     - Free energy: F(θ, φ; x) = -log p(x) + KL(q_φ(z | x) || p_θ(z | x))
     - Minimizing the free energy w.r.t. q_φ(z | x) is computationally expensive / high variance
     - Instead, minimize w.r.t. φ the reversed-KL free energy → Sleep phase:
       F'(θ, φ; x) = -log p(x) + KL(p_θ(z | x) || q_φ(z | x)),   max_φ E_{p_θ(x, z)}[ log q_φ(z | x) ]
        - "Dream" up samples from p_θ through a top-down pass
        - Use the samples as targets for updating the recognition network

  11. Wake-Sleep Algorithm
     - Wake phase:
        - Use the recognition network to perform a bottom-up pass, creating samples for the layers above (from data)
        - Train the generative network using samples obtained from the recognition model
     - Sleep phase:
        - Use the generative weights to reconstruct data by performing a top-down pass
        - Train the recognition weights using samples obtained from the generative model
     - KL is not symmetric
        - Doesn't optimize a well-defined objective function
        - Not guaranteed to converge
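Putting the two phases together, here is a minimal sketch of one wake-sleep iteration for a one-layer Bernoulli model, assuming PyTorch and made-up sizes (none of this code comes from the slides or the original paper):

import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim, z_dim = 784, 64                                   # hypothetical sizes
gen = nn.Linear(z_dim, x_dim)                            # logits of p_theta(x|z), Bernoulli x
rec = nn.Linear(x_dim, z_dim)                            # logits of q_phi(z|x), Bernoulli z
opt_theta = torch.optim.SGD(gen.parameters(), lr=1e-3)
opt_phi = torch.optim.SGD(rec.parameters(), lr=1e-3)

x = (torch.rand(32, x_dim) > 0.5).float()                # stand-in for a data minibatch

# Wake phase: sample z ~ q_phi(z|x) bottom-up, update theta so p_theta reconstructs x.
with torch.no_grad():
    z = torch.bernoulli(torch.sigmoid(rec(x)))
loss_theta = F.binary_cross_entropy_with_logits(gen(z), x)          # -E_q[log p_theta(x|z)]
opt_theta.zero_grad(); loss_theta.backward(); opt_theta.step()

# Sleep phase: "dream" (z, x) ~ p_theta top-down, update phi so q_phi recovers z from x.
with torch.no_grad():
    z_dream = torch.bernoulli(torch.full((32, z_dim), 0.5))          # z ~ p(z), uniform Bernoulli prior
    x_dream = torch.bernoulli(torch.sigmoid(gen(z_dream)))
loss_phi = F.binary_cross_entropy_with_logits(rec(x_dream), z_dream)  # -E_p[log q_phi(z|x)]
opt_phi.zero_grad(); loss_phi.backward(); opt_phi.step()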

  12. Variational Auto-encoders (VAEs)
     - [Kingma & Welling, 2014]
     - Enjoy similar applicability to the wake-sleep algorithm
        - Not applicable to discrete latent variables
     - Optimize a variational lower bound on the log-likelihood
     - Reduce variance through reparameterization of the recognition distribution
        - Alternative: use control variates as in reinforcement learning [Mnih & Gregor, 2014]

  13. Variational Auto-encoders (VAEs)
     - Generative model p_θ(x | z) with prior p(z), a.k.a. decoder
     - Inference network q_φ(z | x), a.k.a. encoder or recognition network
     - Variational lower bound:
       log p(x) ≥ E_{q_φ(z|x)}[ log p_θ(x | z) ] - KL(q_φ(z | x) || p(z)) =: ℒ(θ, φ; x)

  14. Variational Auto-encoders (VAEs)
     - Variational lower bound: ℒ(θ, φ; x) = E_{q_φ(z|x)}[ log p_θ(x | z) ] - KL(q_φ(z | x) || p(z))
     - Optimizing ℒ(θ, φ; x) w.r.t. p_θ(x | z) is the same as the wake phase
     - Optimizing ℒ(θ, φ; x) w.r.t. q_φ(z | x):
        - Directly computing the gradient with MC estimation gives a REINFORCE-like update rule, which suffers from high variance [Mnih & Gregor, 2014] (more on REINFORCE next lecture)
        - VAEs use a reparameterization trick to reduce variance (contrasted in the sketch below)
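To see the contrast, here is a minimal single-sample sketch of the two gradient estimators of ∇_φ E_{q_φ(z)}[f(z)] for a one-dimensional Gaussian q_φ, assuming PyTorch; the quadratic f below merely stands in for log p_θ(x, z) and is not from the slides:

import torch
from torch.distributions import Normal

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)
f = lambda z: (z - 2.0) ** 2                                   # stand-in for log p_theta(x, z)

# (a) Score-function / REINFORCE-like estimator: also works for discrete z, but high variance.
z = (mu + log_sigma.exp() * torch.randn(())).detach()          # sample treated as a constant
log_q = Normal(mu, log_sigma.exp()).log_prob(z)
grad_sf = torch.autograd.grad(f(z) * log_q, [mu, log_sigma])   # f(z) * grad of log q_phi(z)

# (b) Reparameterized estimator: z = mu + sigma * eps, gradients flow through z itself.
eps = torch.randn(())
z = mu + log_sigma.exp() * eps
grad_rp = torch.autograd.grad(f(z), [mu, log_sigma])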

  15. VAEs: Reparameterization Trick

  16. VAEs: Reparameterization Trick
     - q_φ(z^(i) | x^(i)) = N(z^(i); μ^(i), σ²(i) I)
     - z = z_φ(ε) is a deterministic mapping of the noise ε ∼ N(0, I)
     [Figure courtesy: Chang's slides]
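In code the mapping is a single line; a minimal sketch assuming PyTorch and made-up shapes (not from the slides):

import torch

mu, log_var = torch.zeros(32, 20), torch.zeros(32, 20)   # hypothetical encoder outputs per example
eps = torch.randn_like(mu)                               # eps ~ N(0, I)
z = mu + (0.5 * log_var).exp() * eps                     # z = z_phi(eps) = mu + sigma * eps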

  17. VAEs: Reparameterization Trick
     - Variational lower bound: ℒ(θ, φ; x) = E_{q_φ(z|x)}[ log p_θ(x | z) ] - KL(q_φ(z | x) || p(z)),
       where E_{q_φ(z|x)}[ log p_θ(x | z) ] = E_{ε ∼ N(0,I)}[ log p_θ(x | z_φ(ε)) ]
     - Optimizing ℒ(θ, φ; x) w.r.t. q_φ(z | x):
       ∇_φ E_{q_φ(z|x)}[ log p_θ(x | z) ] = E_{ε ∼ N(0,I)}[ ∇_φ log p_θ(x | z_φ(ε)) ]
        - Uses the gradients w.r.t. the latent variables
     - For Gaussian distributions, KL(q_φ(z | x) || p(z)) can be computed and differentiated analytically
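For completeness, the analytic KL term referred to here, for q_φ(z | x) = N(μ, diag(σ²)) and prior p(z) = N(0, I), takes the standard closed form (in LaTeX):

\mathrm{KL}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
  = \frac{1}{2} \sum_{d=1}^{D} \left( \mu_d^2 + \sigma_d^2 - \log \sigma_d^2 - 1 \right)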

  18. VAEs: Training
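As a stand-in for the training figure, here is a minimal sketch of one VAE training step, assuming PyTorch, a Bernoulli decoder on MNIST-sized inputs, and made-up layer sizes (none of which come from the slides):

import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim, z_dim, h = 784, 20, 400                              # hypothetical sizes

enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, 2 * z_dim))  # q_phi(z|x)
dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))      # p_theta(x|z) logits
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.rand(32, x_dim)                                   # stand-in for a data minibatch in [0, 1]

mu, log_var = enc(x).chunk(2, dim=-1)                       # parameters of q_phi(z|x)
z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)       # reparameterized sample
recon = F.binary_cross_entropy_with_logits(dec(z), x, reduction="none").sum(-1)  # -log p_theta(x|z)
kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)                     # KL(q_phi || N(0, I))
loss = (recon + kl).mean()                                  # negative ELBO, averaged over the batch
opt.zero_grad(); loss.backward(); opt.step()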

  19. VAEs: Results

  20. VAEs: Results
     - Generated MNIST images [Gregor et al., 2015]

  21. VAEs: Limitations and Variants
     - Element-wise reconstruction error
        - For image generation, every pixel must be reconstructed
        - Sensitive to irrelevant variation, e.g., translations
     - Variant: feature-wise (perceptual-level) reconstruction [Dosovitskiy et al., 2016]
        - Use a pre-trained neural network to extract features of the data
        - Generated images are required to have feature vectors similar to those of the data
     - Variant: combining VAEs with GANs [Larsen et al., 2016] (more later)
     [Figure: reconstruction results with different losses]

  22. VAEs: Limitations and Variants
     - Not applicable to discrete latent variables
        - Differentiable reparameterization does not apply to discrete variables
        - The wake-sleep algorithm and GANs allow discrete latents
     - Variant: marginalize out discrete latents [Kingma et al., 2014]
        - Expensive when the discrete space is large
     - Variant: use continuous approximations
        - Gumbel-softmax [Jang et al., 2017] for approximating multinomial variables (see the sketch below)
     - Variant: combine VAEs with the wake-sleep algorithm [Hu et al., 2017]
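To illustrate the continuous-approximation idea, a minimal Gumbel-softmax sampling sketch, assuming PyTorch and made-up logits (not from the slides; recent PyTorch versions also provide torch.nn.functional.gumbel_softmax, which implements the same relaxation):

import torch
import torch.nn.functional as F

logits = torch.zeros(32, 10, requires_grad=True)           # hypothetical categorical logits (K = 10)
tau = 0.5                                                   # temperature; smaller -> closer to one-hot

u = torch.rand_like(logits).clamp_min(1e-9)
gumbel = -torch.log(-torch.log(u))                          # Gumbel(0, 1) noise
y = F.softmax((logits + gumbel) / tau, dim=-1)              # relaxed, differentiable "one-hot" sample
y.sum().backward()                                          # gradients reach the logits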

  23. VAEs: Limitations and Variants
     - Usually use a fixed standard normal distribution as the prior: p(z) = N(z; 0, I)
        - For ease of inference and learning
        - Limited flexibility: the data distribution has to be mapped onto a fixed, single-mode prior
     - Variant: use hierarchical nonparametric priors [Goyal et al., 2017]
        - E.g., Dirichlet process, nested Chinese restaurant process (more later)
        - Learn the structure of the priors jointly with the model


  25. Deep Generative Models
     - Implicit probabilistic models: define a stochastic process that simulates the data x
     - Do not require a tractable likelihood function
     - Data simulator: a natural approach for problems in population genetics, weather, ecology, etc.
     - E.g., generate data by passing random noise through a deterministic equation with parameters θ (e.g., GANs):
       x_n = g(z_n; θ),   z_n ∼ N(0, I)

  26. Generative Adversarial Nets (GANs)
     - [Goodfellow et al., 2014]
     - Assume an implicit generative model
     - Learn the cost function jointly
     - Interpreted as a mini-max game between a generator and a discriminator
     - Generate sharp, high-fidelity samples
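A minimal sketch of that mini-max game, assuming PyTorch and made-up network sizes (not from the slides); the generator step below uses the common non-saturating heuristic rather than the literal minimax loss:

import torch
import torch.nn as nn

x_dim, z_dim = 784, 64                                        # hypothetical sizes
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

x_real = torch.rand(32, x_dim)                                # stand-in for a data minibatch
z = torch.randn(32, z_dim)                                    # z ~ N(0, I)

# Discriminator step: push D(x_real) -> 1 and D(G(z)) -> 0.
d_loss = -(torch.log(D(x_real) + 1e-8).mean() + torch.log(1 - D(G(z).detach()) + 1e-8).mean())
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step (non-saturating heuristic): push D(G(z)) -> 1.
g_loss = -torch.log(D(G(z)) + 1e-8).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()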
