Approximate Inference: Sampling Methods
CMSC 678, UMBC
Outline
- Recap
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs sampler for topic models
Recap from last time…
Exponential Family Forms: Capture Common Distributions
- Discrete (finite distributions)
- Dirichlet (distributions over (finite) distributions)
- Gaussian
- Gamma, Exponential, Poisson, Negative-Binomial, Laplace, log-Normal, …
Exponential Family Forms: "Easy" Posterior Inference

The posterior p has the same form as the prior p: p is the conjugate prior for the likelihood q.

  Posterior        | Likelihood           | Prior
  Dirichlet (Beta) | Discrete (Bernoulli) | Dirichlet (Beta)
  Normal           | Normal (fixed var.)  | Normal
  Gamma            | Exponential          | Gamma
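To make the conjugacy concrete, here is a minimal sketch of the Beta–Bernoulli case; the toy data and hyperparameters are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Conjugacy in action: Beta prior + Bernoulli likelihood -> Beta posterior.
alpha, beta = 2.0, 2.0                  # Beta(alpha, beta) prior (assumed toy values)
data = np.array([1, 0, 1, 1, 0, 1])     # Bernoulli observations (assumed toy data)

# The posterior has the same (Beta) form as the prior:
alpha_post = alpha + data.sum()              # add the count of 1s
beta_post = beta + len(data) - data.sum()    # add the count of 0s
print(alpha_post, beta_post)                 # -> Beta(6, 4) posterior
```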
Variational Inference: A Gradient-Based Optimization Technique

Goal: make $q_\lambda(\theta)$ (easy(ier) to compute) close to the posterior $p(\theta \mid x)$ (difficult to compute) by minimizing the "difference" between them, changing the variational parameters λ:

Set t = 0; pick a starting value $\lambda_t$.
Until converged:
  1. Get value $y_t = F(q(\cdot; \lambda_t))$
  2. Get gradient $g_t = F'(q(\cdot; \lambda_t))$
  3. Get scaling factor $\rho_t$
  4. Set $\lambda_{t+1} = \lambda_t + \rho_t \, g_t$
  5. Set t += 1
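A minimal sketch of this generic gradient loop, with a stand-in objective F (an assumption for illustration; in variational inference, F would be the objective built from $q(\cdot; \lambda)$):

```python
# Stand-in objective F(lam) = -(lam - 3)^2, maximized at lam = 3.
def F_grad(lam):
    return -2.0 * (lam - 3.0)   # g_t = F'(lambda_t)

lam = 0.0      # starting value lambda_0
rho = 0.1      # scaling factor rho_t (held fixed here for simplicity)
for t in range(100):
    g = F_grad(lam)
    lam = lam + rho * g         # lambda_{t+1} = lambda_t + rho_t * g_t
print(lam)                      # converges to ~3.0
```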
Variational Inference: The Function to Optimize

Find the best distribution: choose the variational parameters λ for the desired model parameters θ by minimizing the KL-divergence (an expectation):

$D_{KL}\big(q(\theta) \,\|\, p(\theta \mid x)\big) = \mathbb{E}_{q(\theta)}\left[\log \frac{q(\theta)}{p(\theta \mid x)}\right]$
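For discrete distributions this expectation is a finite sum, which makes the definition easy to check by hand; the toy probabilities below are illustrative assumptions:

```python
import numpy as np

# KL(q || p): the expectation under q of log(q / p).
q = np.array([0.7, 0.2, 0.1])   # assumed toy distribution q
p = np.array([0.5, 0.3, 0.2])   # assumed toy distribution p

kl = np.sum(q * np.log(q / p))
print(kl)   # always >= 0, and 0 iff q == p
```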
Goal: Posterior Inference

Hyperparameters: α
Unknown parameters: Θ
Data: $\mathcal{D}$

Posterior: $p_\alpha(\Theta \mid \mathcal{D})$
Likelihood model: $p(\mathcal{D} \mid \Theta)$
(Some) Learning Techniques
- MAP/MLE: point estimation, basic EM
- Variational inference: functional optimization
- Sampling / Monte Carlo: today
Outline
- Recap
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs sampler for topic models
Two Problems for Sampling Methods to Solve

Problem 1: Generate samples from p
  $p(x) = \tilde{p}(x)/Z$, $x \in \mathbb{R}^D$
  samples: $x^{(1)}, x^{(2)}, \ldots, x^{(R)}$

Problem 2: Estimate the expectation of a function $\phi$
  $\Phi = \langle \phi(x) \rangle_p = \mathbb{E}_{x \sim p}[\phi(x)] = \int p(x)\, \phi(x)\, dx$

If we could sample from p, the estimator
  $\hat{\Phi} = \frac{1}{R} \sum_r \phi(x^{(r)})$
would be consistent: $\mathbb{E}[\hat{\Phi}] = \Phi$.

Q: Why is sampling from p(x) hard?
A1: Can we evaluate Z?
A2: Can we sample without enumerating? (Correct samples should come from where p is big.)

Running example (ITILA, Fig 29.1): $\tilde{p}(x) = \exp\big(0.4 (x - 0.4)^2 - 0.08 x^4\big)$
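When we can sample from p directly, the estimator above is just a sample mean. A minimal sketch, using a standard normal as a stand-in for p and $\phi(x) = x^2$ as the test function (both assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    return x ** 2   # assumed test function; E[x^2] = 1 under N(0, 1)

R = 100_000
x = rng.standard_normal(R)     # x^(1), ..., x^(R) ~ p (here p = N(0, 1))
Phi_hat = np.mean(phi(x))      # (1/R) * sum_r phi(x^(r))
print(Phi_hat)                 # ~1.0
```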
Outline
- Recap
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs sampler for topic models
Goal: Uniform Sampling

$\Phi = \langle \phi(x) \rangle_p = \mathbb{E}_{x \sim p}[\phi(x)]$

Sample uniformly, $x^{(1)}, x^{(2)}, \ldots, x^{(R)}$, then weight each sample by the normalized value of $\tilde{p}$:
  $p^*(x) = \tilde{p}(x)/Z^*$, with $Z^* = \sum_r \tilde{p}(x^{(r)})$
  $\hat{\Phi} = \sum_r \phi(x^{(r)})\, p^*(x^{(r)})$

This might work if R (the number of samples) sufficiently hits the high-probability regions (see the sketch below).

Ising model example:
- $2^H$ states of high probability
- $2^N$ states total
- Chance of a uniform sample landing in the high-probability region: $2^H / 2^N$
- Minimum samples needed: $\sim 2^{N-H}$
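A minimal sketch of the uniform-sampling estimator on the running one-dimensional density; the sampling interval [-4, 4] and the test function $\phi(x) = x$ are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized density from ITILA Fig 29.1: we can evaluate p~(x),
# but not the normalizer Z.
def p_tilde(x):
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def phi(x):
    return x   # assumed test function: estimate E[x]

R = 100_000
x = rng.uniform(-4.0, 4.0, size=R)   # uniform samples over an interval
                                     # assumed to cover the bulk of p

w = p_tilde(x)
w /= w.sum()                   # p*(x^(r)) = p~(x^(r)) / Z*,  Z* = sum_r p~(x^(r))
Phi_hat = np.sum(phi(x) * w)   # sum_r phi(x^(r)) p*(x^(r))
print(Phi_hat)
```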
Outline
- Recap
- Monte Carlo methods
- Sampling techniques
  - Uniform sampling
  - Importance sampling
  - Rejection sampling
  - Metropolis-Hastings
  - Gibbs sampling
- Example: Collapsed Gibbs sampler for topic models
Goal: Importance Sampling

$\Phi = \langle \phi(x) \rangle_p = \mathbb{E}_{x \sim p}[\phi(x)]$

Choose an approximating distribution $Q(x) \propto \tilde{q}(x)$ that is easy to sample from, and sample from Q: $x^{(1)}, x^{(2)}, \ldots, x^{(R)}$ (ITILA, Fig 29.5).

Weight each sample to correct for the mismatch between Q and p:
  $w_r = \tilde{p}(x^{(r)}) / \tilde{q}(x^{(r)})$
  $\hat{\Phi} = \dfrac{\sum_r \phi(x^{(r)})\, w_r}{\sum_r w_r}$

- x where Q(x) > p(x): over-represented
- x where Q(x) < p(x): under-represented

Q: How reliable will this estimator be?
A: In practice, difficult to say; the weights $w_r$ may not be a good indicator.

Q: How do you choose a good approximating distribution?
A: Task/domain specific.
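A minimal sketch of this self-normalized importance sampling estimator on the same running density; the Gaussian proposal and the test function $\phi(x) = x$ are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: unnormalized p~(x) from ITILA Fig 29.1.
def p_tilde(x):
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def phi(x):
    return x   # assumed test function

# Proposal Q: a broad Gaussian (an assumption; any easy-to-sample
# density that covers p's support would do).
def q_pdf(x, mu=0.0, sigma=2.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

R = 100_000
x = rng.normal(loc=0.0, scale=2.0, size=R)   # x^(r) ~ Q

w = p_tilde(x) / q_pdf(x)                    # importance weights w_r
Phi_hat = np.sum(phi(x) * w) / np.sum(w)     # self-normalized estimator
print(Phi_hat)
```

Because the estimator is self-normalized, neither p nor Q needs to be normalized; any constant factors cancel in the ratio.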