Graphical Models: Monte Carlo Inference
Siamak Ravanbakhsh
Winter 2018
Learning objectives
- the relationship between sampling and inference
- sampling from univariate distributions
- Monte Carlo sampling in graphical models
Monte Carlo inference
calculating marginals:
$p(x_1 = \bar{x}_1) = \sum_{x_2, \dots, x_n} p(\bar{x}_1, x_2, \dots, x_n)$
approximate it by sampling $X^{(l)} \sim p(x)$:
$p(x_1 = \bar{x}_1) \approx \frac{1}{L} \sum_l \mathbb{I}(X_1^{(l)} = \bar{x}_1)$
inference in an exponential family $p_\theta(x) = \exp(\langle \theta, \psi(x) \rangle - A(\theta))$ is about finding the mean parameters $\mu = \mathbb{E}_{p_\theta}[\psi(x)]$
using L samples (particles): $\mu \approx \frac{1}{L} \sum_l \psi(X^{(l)})$
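A minimal sketch of the estimator above: draw samples from a joint, then estimate a marginal as the fraction of samples hitting the event. The two-variable table here is a made-up toy distribution, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint over two binary variables (hypothetical numbers):
# rows index x1, columns index x2.
joint = np.array([[0.3, 0.2],
                  [0.1, 0.4]])

# Draw L joint samples by sampling flat table indices with the
# table entries as probabilities.
L = 100_000
flat = rng.choice(4, size=L, p=joint.ravel())
x1 = flat // 2  # recover x1 from the flat index

# Monte Carlo estimate of p(x1 = 1) vs. the exact sum over x2.
estimate = np.mean(x1 == 1)
exact = joint[1].sum()  # 0.5
```

With 100,000 particles the Monte Carlo estimate lands within a couple of standard errors of the exact marginal.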
Sampling from a categorical dist.
assume access to a pseudo-random number generator for $X \sim U(0,1)$
given $p(X = d) = p_d$ for $1 \le d \le D$: partition $[0,1]$ into intervals of lengths $p_1, p_2, \dots, p_D$
generate $X \sim U(0,1)$ and see which interval it falls into
use binary search on the cumulative sums: $O(\log D)$
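The interval-lookup idea above can be sketched with the standard library's `bisect`; the probability vector is an arbitrary example. (In real code the CDF would be built once, not per draw.)

```python
import bisect
import itertools
import random

def sample_categorical(probs, rng):
    """Draw index d with probability probs[d]: inverse CDF + binary search."""
    cdf = list(itertools.accumulate(probs))  # cumulative sums p_1, p_1+p_2, ...
    u = rng.random()                         # U(0,1)
    return bisect.bisect_left(cdf, u)        # O(log D) interval lookup

probs = [0.1, 0.2, 0.3, 0.4]
rng = random.Random(0)
counts = [0, 0, 0, 0]
for _ in range(100_000):
    counts[sample_categorical(probs, rng)] += 1
# empirical frequencies should track probs: roughly 10k, 20k, 30k, 40k
```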
Transforming probability densities
given a random variable $X \sim p_X$, what is the prob. density of $Y = \phi(X)$?
$Y \sim p_Y(y) = p_X(\phi^{-1}(y)) \left| \frac{d\phi^{-1}(y)}{dy} \right|$
$\phi^{-1}(y)$ is the corresponding x; the derivative accounts for how $\phi$ changes the volume around each point y
(bonus) in the multivariate case: determinant of the Jacobian matrix
image: wikipedia
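A small worked instance of the change-of-variables formula, with an assumed transform (not from the slides): take $X \sim U(0,1)$ and $\phi(x) = x^2$.

```latex
% Assumed example: X ~ U(0,1), Y = phi(X) = X^2, so phi^{-1}(y) = sqrt(y).
p_Y(y) = p_X(\sqrt{y}) \left| \frac{d\sqrt{y}}{dy} \right|
       = 1 \cdot \frac{1}{2\sqrt{y}}, \qquad 0 < y \le 1
% sanity check: \int_0^1 \frac{1}{2\sqrt{y}}\, dy = \left[\sqrt{y}\right]_0^1 = 1
```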
Inverse transform sampling
let $p_X$ be uniform: $p_X = U(0,1)$
given a density $p_Y$, let $F_Y(y) = P(Y < y)$ be its CDF
transform X using $\phi(X) = F_Y^{-1}(X)$
what is the density of $Y = \phi(X)$?
$Y \sim p_X(\phi^{-1}(y)) \left| \frac{d\phi^{-1}(y)}{dy} \right| = p_X(F_Y(y)) \left| \frac{dF_Y(y)}{dy} \right| = p_Y(y)$
since $p_X = U(0,1)$ is constant and $\frac{dF_Y(y)}{dy} = p_Y(y)$
images: work.thaslwanter.at, Murphy's book
Inverse transform sampling: example
exponential distribution: $p_Y(y) = \lambda e^{-\lambda y}$
CDF: $F_Y(y) = 1 - e^{-\lambda y}$
calculate the inverse CDF: $F_Y^{-1}(x) = -\frac{1}{\lambda} \ln(1 - x)$
image: wikipedia
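The exponential example above in a few lines of code; the rate $\lambda = 2$ is an arbitrary choice for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0  # rate parameter (arbitrary choice for the demo)

# Inverse transform: push U(0,1) samples through the inverse CDF
# F^{-1}(x) = -ln(1 - x) / lambda to get Exponential(lambda) samples.
u = rng.uniform(size=200_000)
y = -np.log(1.0 - u) / lam

mean = y.mean()  # should be close to E[Y] = 1/lambda = 0.5
```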
Sampling in graphical models
for Bayes-nets: ancestral sampling
find a topological ordering (how?) e.g., D,I,G,S,L or I,S,D,G,L
sample each variable by conditioning on its parents, e.g., $G \sim P(g \mid I, D)$
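A sketch of ancestral sampling on a fragment of the student network (D, I, G); the CPT numbers below are made up for illustration, not the ones from the course example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CPTs: D (difficulty) and I (intelligence) are binary roots;
# G (grade) takes 3 values and depends on both parents.
p_D = np.array([0.6, 0.4])
p_I = np.array([0.7, 0.3])
p_G_given_DI = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.7, 0.25, 0.05],
    (1, 1): [0.5, 0.3, 0.2],
}

def ancestral_sample():
    """Sample (D, I, G) in topological order, each conditioned on its parents."""
    d = rng.choice(2, p=p_D)
    i = rng.choice(2, p=p_I)
    g = rng.choice(3, p=p_G_given_DI[(d, i)])
    return d, i, g

samples = [ancestral_sample() for _ in range(50_000)]
# sanity check: the Monte Carlo marginal p(D = 1) should match the prior 0.4
p_d1 = np.mean([s[0] == 1 for s in samples])
```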
Introducing evidence
what if we have evidence? E.g., how to sample from the posterior $p(D, I, S, L \mid G = g^0)$?
rejection sampling:
find a topological ordering
sample by conditioning on parents
only keep samples compatible with the evidence $(G = g^0)$
wasteful if the evidence has a low probability
Rejection sampling: general form
to sample from $p(x) = \frac{1}{Z} \tilde{p}(x)$
use a proposal distribution $q(x)$ such that $Mq(x) > \tilde{p}(x)$ everywhere
sample $X \sim q(x)$ and accept the sample with probability $\frac{\tilde{p}(X)}{Mq(X)}$
what is the probability of acceptance? $\int_x \frac{\tilde{p}(x)}{Mq(x)} q(x)\, dx = \frac{Z}{M}$
for high-dimensional dists. $\frac{Z}{M}$ becomes small: rejection sampling becomes wasteful
image: Murphy's book
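A concrete instance of the general scheme, with target and proposal chosen for the demo: unnormalized target $\tilde{p}(x) = x(1-x)$ on $[0,1]$ (a Beta(2,2) with $Z = 1/6$ dropped) and a uniform proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target (demo choice): p~(x) = x(1-x) on [0,1],
# i.e. Beta(2,2) with its normalizer Z = 1/6 dropped.
def p_tilde(x):
    return x * (1.0 - x)

# Proposal q = U(0,1), so q(x) = 1; M = 1/4 works since max p~ = 1/4.
M = 0.25
L = 100_000
x = rng.uniform(size=L)               # X ~ q
u = rng.uniform(size=L)
accepted = x[u < p_tilde(x) / M]      # accept w.p. p~(x) / (M q(x))

rate = accepted.size / L              # should approach Z/M = (1/6)/(1/4) = 2/3
mean = accepted.mean()                # Beta(2,2) has mean 1/2
```

Note how the acceptance rate empirically matches the $Z/M$ formula from the slide.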
Likelihood weighting
what if we have evidence? E.g., how to sample from the posterior $p(D, I, S, L \mid G = g^0)$?
find a topological ordering
assign a weight to each particle: $w^{(l)} \leftarrow 1$
sample by conditioning on parents
when sampling an observed variable, set it to its observed value $G = g^0$ and update the sample's weight:
$w^{(l)} \leftarrow w^{(l)} \times p(G = g^0 \mid D = d^{(l)}, I = i^{(l)})$ (conditioning on the current assignments to its parents)
using weighted particles for inference:
$p(S = s^0 \mid G = g^0) \approx \frac{\sum_l w^{(l)} \mathbb{I}(S^{(l)} = s^0)}{\sum_l w^{(l)}}$
this is a special case of importance sampling
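The procedure above, sketched on a tiny D, I, G fragment with made-up CPTs (the numbers are illustrative assumptions): clamp the observed G, weight each particle by the likelihood of the evidence given the sampled parents, and compare against the exact posterior from enumeration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CPTs: D, I binary roots; G in {0, 1, 2} depends on both.
p_D = np.array([0.6, 0.4])
p_I = np.array([0.7, 0.3])
p_G = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.7, 0.25, 0.05], (1, 1): [0.5, 0.3, 0.2]}

g_obs = 2  # evidence: G is observed to be 2

# Likelihood weighting: sample the unobserved variables in topological
# order, clamp G, and weight each particle by p(G = g_obs | d, i).
L = 50_000
num = den = 0.0
for _ in range(L):
    d = rng.choice(2, p=p_D)
    i = rng.choice(2, p=p_I)
    w = p_G[(d, i)][g_obs]     # weight: likelihood of the evidence
    num += w * (i == 1)
    den += w
estimate = num / den           # weighted estimate of p(I = 1 | G = 2)

# Exact posterior by enumeration, for comparison.
joint = {(d, i): p_D[d] * p_I[i] * p_G[(d, i)][g_obs]
         for d in (0, 1) for i in (0, 1)}
exact = (joint[(0, 1)] + joint[(1, 1)]) / sum(joint.values())
```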
Unnormalized importance sampling
Objective: Monte Carlo estimate of $\mathbb{E}_p[f(x)]$
p is difficult to sample from (yet easy to evaluate)
use a proposal distribution q such that $p(x) > 0 \Rightarrow q(x) > 0$
since $\mathbb{E}_p[f(x)] = \int_x p(x) f(x)\, dx = \int_x q(x) \frac{p(x)}{q(x)} f(x)\, dx = \mathbb{E}_q\!\left[\frac{p(x)}{q(x)} f(x)\right]$
sample $X^{(l)} \sim q(x)$ and assign an importance sampling weight $w(X^{(l)}) = \frac{p(X^{(l)})}{q(X^{(l)})}$
$\mathbb{E}_p[f(x)] \approx \frac{1}{L} \sum_l w(X^{(l)}) f(X^{(l)})$ is an unbiased estimator
it can be more efficient than sampling from p itself! (why?)
image: Bishop's book
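A numerical sketch of the unbiased estimator, with target and proposal chosen for the demo: estimate $\mathbb{E}_p[x^2] = 1$ under $p = N(0,1)$ using samples from the wider proposal $q = N(0, 2^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p = N(0, 1) and proposal q = N(0, 2^2); both are demo choices.
def p_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q_pdf(x):
    return np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))

L = 200_000
x = rng.normal(0.0, 2.0, size=L)   # X ~ q
w = p_pdf(x) / q_pdf(x)            # importance weights p(x)/q(x)

# Unbiased importance sampling estimate of E_p[x^2] = 1.
estimate = np.mean(w * x**2)
```

A heavier-tailed proposal keeps the weights bounded; the reverse choice (narrow q, wide p) can blow up the weight variance.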
Normalized importance sampling
what if we can evaluate $\tilde{p}$, i.e. p up to a constant? $p(x) = \frac{1}{Z} \tilde{p}(x)$
Examples:
posterior in directed models: $p(x \mid E = e) = \frac{1}{p(e)} p(x, e)$
prior in undirected models: $p(x) = \frac{1}{Z} \prod_I \phi_I(x_I)$
define $w(x) = \frac{\tilde{p}(x)}{q(x)}$; then $\mathbb{E}_q[w(x)] = \int_x q(x) \frac{\tilde{p}(x)}{q(x)}\, dx = Z$
since $\mathbb{E}_p[f(x)] = \int_x p(x) f(x)\, dx = \frac{1}{Z} \int_x q(x) w(x) f(x)\, dx = \frac{\mathbb{E}_q[w(x) f(x)]}{\mathbb{E}_q[w(x)]}$
sample $X^{(l)} \sim q(x)$ and assign an importance sampling weight $w(X^{(l)}) = \frac{\tilde{p}(X^{(l)})}{q(X^{(l)})}$
$\mathbb{E}_p[f(x)] \approx \frac{\sum_l w(X^{(l)}) f(X^{(l)})}{\sum_l w(X^{(l)})}$ is a biased estimator (e.g., consider L = 1)
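The self-normalized estimator in code, with demo choices: an unnormalized target $\tilde{p}(x) = e^{-(x-1)^2/2}$ (a $N(1,1)$ with its normalizer $Z = \sqrt{2\pi}$ deliberately dropped) and proposal $q = N(0, 2^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target: N(1, 1) with Z = sqrt(2*pi) dropped (demo choice).
def p_tilde(x):
    return np.exp(-0.5 * (x - 1.0) ** 2)

# Proposal q = N(0, 2^2) (demo choice).
def q_pdf(x):
    return np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))

L = 200_000
x = rng.normal(0.0, 2.0, size=L)   # X ~ q
w = p_tilde(x) / q_pdf(x)          # unnormalized weights p~(x)/q(x)

# Self-normalized estimate of E_p[x] = 1: dividing by sum(w) cancels
# the unknown Z, at the cost of a (vanishing) bias.
estimate = np.sum(w * x) / np.sum(w)
```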
Revisiting likelihood weighting
likelihood weighting:
$p(S = s^0 \mid G = g^1, I = i^2) \approx \frac{\sum_l w^{(l)} \mathbb{I}(S^{(l)} = s^0)}{\sum_l w^{(l)}}$
equivalent to normalized importance sampling, with the proposal q given by the Bayes-net with its evidence nodes clamped