

  1. Graphical Models: Monte-Carlo Inference. Siamak Ravanbakhsh, Winter 2018.

  2. Learning objectives: the relationship between sampling and inference; sampling from univariate distributions; Monte Carlo sampling in graphical models.

  3. Monte Carlo inference: calculating marginals

     p(x_1 = \bar{x}_1) = \sum_{x_2, \ldots, x_n} p(\bar{x}_1, x_2, \ldots, x_n)

  4. Monte Carlo inference (cont.): approximate the marginal by sampling X^{(l)} \sim p(x):

     p(x_1 = \bar{x}_1) \approx \frac{1}{L} \sum_l I(X_1^{(l)} = \bar{x}_1)

  5. Monte Carlo inference (cont.): inference in the exponential family p_\theta(x) = \exp(\langle \theta, \psi(x) \rangle - A(\theta)) is about finding the mean parameters \mu = E_{p_\theta}[\psi(x)]; using L samples (particles), \mu \approx \frac{1}{L} \sum_l \psi(X^{(l)}).
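The indicator-average estimate of a marginal can be sketched as follows; the two-variable joint distribution below is an illustrative assumption, not from the slides:

```python
import random

def estimate_marginal(draw_joint, value, L, rng):
    """Monte Carlo estimate of p(x_1 = value): the average of the
    indicator I(X_1^(l) = value) over L joint samples."""
    hits = sum(draw_joint(rng)[0] == value for _ in range(L))
    return hits / L

# Illustrative joint (an assumption): x1 ~ Bernoulli(0.3),
# x2 ~ Bernoulli(0.5 + 0.2 * x1); the true marginal p(x1 = 1) is 0.3.
def draw_joint(rng):
    x1 = int(rng.random() < 0.3)
    x2 = int(rng.random() < 0.5 + 0.2 * x1)
    return (x1, x2)

rng = random.Random(0)
est = estimate_marginal(draw_joint, 1, 20000, rng)
```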

  6. Sampling from a categorical distribution: we have access to a pseudo-random number generator for X \sim U(0, 1), and are given p(X = d) = p_d for 1 \le d \le D. Partition [0, 1] into consecutive intervals of lengths p_1, \ldots, p_D, generate X \sim U(0, 1), and see which interval it falls in; using binary search over the cumulative sums this costs O(\log D).
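The interval-partition scheme above can be sketched as follows (`sample_categorical` is a hypothetical helper name; the cumulative sums play the role of the interval endpoints):

```python
import bisect
import random
from itertools import accumulate

def sample_categorical(probs, rand=random.random):
    """Draw an index d with probability probs[d]: build the cumulative
    sums (the interval endpoints) and binary-search them in O(log D)."""
    cdf = list(accumulate(probs))          # p_1, p_1+p_2, ..., 1
    u = rand() * cdf[-1]                   # scaling also tolerates unnormalized probs
    return bisect.bisect_right(cdf, u)     # index of first endpoint exceeding u

# Illustrative check with p = (0.2, 0.3, 0.5):
rng = random.Random(0)
draws = [sample_categorical([0.2, 0.3, 0.5], rng.random) for _ in range(30000)]
```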

  7. Transforming probability densities: given a random variable X \sim p_X, what is the probability density of Y = \phi(X)?

     p_Y(y) = p_X(\phi^{-1}(y)) \left| \frac{d\phi^{-1}(y)}{dy} \right|

     where \phi^{-1}(y) is the x corresponding to y, and the derivative term measures how \phi changes the volume around each point y. (bonus) In the multivariate case this becomes the absolute determinant of the Jacobian matrix. image: wikipedia

  8. Inverse transform sampling: let X be uniform, p_X = U(0, 1), and let p_Y be a given target density. images: work.thaslwanter.at, Murphy's book

  9. Inverse transform sampling (cont.): let F_Y be the CDF of the target, F_Y(y) = P(Y < y).

  10. Inverse transform sampling (cont.): transform X using \phi(X) = F_Y^{-1}(X). What is the density of Y = \phi(X)?

  11. Inverse transform sampling (cont.): applying the change-of-variables formula with \phi^{-1} = F_Y,

     Y \sim p_X(\phi^{-1}(y)) \left| \frac{d\phi^{-1}(y)}{dy} \right| = p_X(F_Y(y)) \left| \frac{dF_Y(y)}{dy} \right| = p_Y(y)

     since p_X = U(0, 1) is constant and the derivative of the CDF is exactly the target density p_Y.

  12. Inverse transform sampling, example: the exponential distribution p_Y(y) = \lambda e^{-\lambda y} has CDF F_Y(y) = 1 - e^{-\lambda y}; calculate the inverse CDF: F_Y^{-1}(x) = -\frac{1}{\lambda} \ln(1 - x). image: wikipedia
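The exponential example can be sketched directly from the inverse CDF; `sample_exponential` is a hypothetical helper name:

```python
import math
import random

def sample_exponential(lam, rand=random.random):
    """Inverse-transform sampling: push a uniform draw through F_Y^{-1}."""
    x = rand()                          # X ~ U(0, 1)
    return -math.log(1.0 - x) / lam     # F_Y^{-1}(x) = -(1/lambda) ln(1 - x)

# Illustrative check: with rate lam = 2 the mean should be near 1/lam = 0.5.
rng = random.Random(0)
samples = [sample_exponential(2.0, rng.random) for _ in range(50000)]
```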

  13. Sampling in graphical models: for Bayes-nets, ancestral sampling.

  14. Sampling in graphical models (cont.): for ancestral sampling, find a topological ordering (how?), e.g., D, I, G, S, L or I, S, D, G, L, then sample each variable by conditioning on its parents, e.g., G \sim P(g \mid I, D).
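A minimal ancestral-sampling sketch: walk a topological order and sample each variable given its already-sampled parents. The D, I, G network fragment and its probabilities below are made-up placeholders, not the CPTs from the lecture:

```python
import random

def ancestral_sample(order, parents, samplers, rng):
    """Sample every variable in a topological order, conditioning on its
    (already sampled) parents."""
    x = {}
    for v in order:
        pa = {u: x[u] for u in parents[v]}   # parents precede v in the order
        x[v] = samplers[v](pa, rng)
    return x

# Hypothetical fragment of the D, I, G network with made-up probabilities:
parents = {"D": [], "I": [], "G": ["D", "I"]}
samplers = {
    "D": lambda pa, rng: int(rng.random() < 0.4),
    "I": lambda pa, rng: int(rng.random() < 0.3),
    # G ~ P(g | I, D), here collapsed to a binary "good grade" indicator
    "G": lambda pa, rng: int(rng.random() < (0.8 if pa["I"] and not pa["D"] else 0.3)),
}
sample = ancestral_sample(["D", "I", "G"], parents, samplers, random.Random(0))
```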

  15. Introducing evidence: what if we have evidence? E.g., how do we sample from the posterior p(D, I, S, L \mid G = g^0)?

  16. Introducing evidence (cont.): rejection sampling: find a topological ordering, sample by conditioning on parents, and only keep samples compatible with the evidence (G = g^0). This is wasteful if the evidence has low probability.

  17. Rejection sampling, general form: to sample from p(x) = \frac{1}{Z} \tilde{p}(x), use a proposal distribution q(x) such that M q(x) > \tilde{p}(x) everywhere; sample X \sim q(x) and accept the sample with probability \frac{\tilde{p}(x)}{M q(x)}. image: Murphy's book

  18. Rejection sampling (cont.): what is the probability of acceptance?

     \int_x \frac{\tilde{p}(x)}{M q(x)} q(x) \, dx = \frac{Z}{M}

     For high-dimensional distributions \frac{Z}{M} becomes small, and rejection sampling becomes wasteful. image: Murphy's book
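The accept/reject loop can be sketched as follows; the Beta(2,2)-shaped target \tilde{p}(x) = x(1-x) on [0, 1] with a uniform proposal and envelope M = 0.25 is an illustrative assumption:

```python
import random

def rejection_sample(p_tilde, q_sample, q_pdf, M, rng):
    """Draw one exact sample from p(x) = p_tilde(x)/Z, assuming the
    envelope condition M * q(x) >= p_tilde(x) holds everywhere."""
    while True:
        x = q_sample(rng)                          # propose X ~ q
        if rng.random() * M * q_pdf(x) < p_tilde(x):
            return x                               # accept w.p. p_tilde/(M q)

# Illustrative target: p_tilde(x) = x(1-x) on [0,1] (Beta(2,2) up to Z = 1/6),
# proposal q = U(0,1) with pdf 1, envelope M = max p_tilde = 0.25.
rng = random.Random(0)
samples = [rejection_sample(lambda x: x * (1 - x),
                            lambda r: r.random(),
                            lambda x: 1.0,
                            0.25, rng) for _ in range(5000)]
```

Here the acceptance probability is Z/M = (1/6)/0.25 = 2/3, so roughly a third of the proposals are wasted even in one dimension.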

  19. Likelihood weighting: what if we have evidence, e.g., how do we sample from the posterior p(D, I, S, L \mid G = g^0)? Find a topological ordering; assign a weight w^{(l)} \leftarrow 1 to each particle; sample by conditioning on parents; when sampling an observed variable, set it to its observed value (G = g^0) and update the sample's weight using the current assignments to its parents:

     w^{(l)} \leftarrow w^{(l)} \times p(G = g^0 \mid D = d^{(l)}, I = i^{(l)})

  20. Likelihood weighting (cont.): using the weighted particles for inference,

     p(S = s^1 \mid G = g^0) \approx \frac{\sum_l w^{(l)} I(S^{(l)} = s^1)}{\sum_l w^{(l)}}

  21. Likelihood weighting (cont.): this is a special case of importance sampling.
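The particle-and-weight procedure of slide 19 together with the weighted estimator of slide 20 can be sketched on a toy network; the two-node net D \to G and its CPT values below are hypothetical:

```python
import random

def likelihood_weighted_particle(order, parents, cpds, evidence, rng):
    """One particle: sample unobserved variables given their parents; clamp
    observed ones and multiply the weight by p(observed value | parents)."""
    x, w = {}, 1.0
    for v in order:
        pa = {u: x[u] for u in parents[v]}
        probs = cpds[v](pa)                  # dict: value -> probability
        if v in evidence:
            x[v] = evidence[v]               # clamp to the observed value
            w *= probs[x[v]]                 # w <- w * p(v = e_v | parents)
        else:
            u, c = rng.random(), 0.0
            for val, p in probs.items():     # inverse-CDF draw over values
                c += p
                if u < c:
                    break
            x[v] = val
    return x, w

# Hypothetical two-node net D -> G with made-up CPTs, evidence G = 1.
parents = {"D": [], "G": ["D"]}
cpds = {"D": lambda pa: {0: 0.6, 1: 0.4},
        "G": lambda pa: ({0: 0.8, 1: 0.2} if pa["D"] else {0: 0.1, 1: 0.9})}
rng = random.Random(0)
particles = [likelihood_weighted_particle(["D", "G"], parents, cpds, {"G": 1}, rng)
             for _ in range(20000)]
# Weighted estimate of p(D = 1 | G = 1): sum_l w_l I(D_l = 1) / sum_l w_l
num = sum(w for x, w in particles if x["D"] == 1)
den = sum(w for x, w in particles)
posterior = num / den
```

For these made-up CPTs the exact posterior is 0.08 / 0.62, about 0.129, which the weighted estimate approaches as the number of particles grows.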

  22. Unnormalized importance sampling. Objective: a Monte Carlo estimate of E_p[f(x)], where p is difficult to sample from (yet easy to evaluate). Use a proposal distribution q such that p(x) > 0 \Rightarrow q(x) > 0. image: Bishop's book

  23. Unnormalized importance sampling (cont.): since

     E_p[f(x)] = \int_x p(x) f(x) \, dx = \int_x q(x) \frac{p(x)}{q(x)} f(x) \, dx = E_q\!\left[\frac{p(x)}{q(x)} f(x)\right]

  24. Unnormalized importance sampling (cont.): sample X^{(l)} \sim q(x) and assign each sample an importance sampling weight w(X^{(l)}) = \frac{p(X^{(l)})}{q(X^{(l)})}.

  25. Unnormalized importance sampling (cont.):

     E_p[f(x)] \approx \frac{1}{L} \sum_l w(X^{(l)}) f(X^{(l)})

     is an unbiased estimator, and it can be more efficient than sampling from p itself! (why?) image: Bishop's book
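The weighted-average estimator can be sketched end to end; the choice of target p = N(0, 1), proposal q = N(0, 2), and test function f(x) = x^2 (so E_p[f] = 1) is an illustrative assumption:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_estimate(f, p_pdf, q_pdf, q_sample, L, rng):
    """Unbiased estimate of E_p[f(x)] using L samples from the proposal q."""
    total = 0.0
    for _ in range(L):
        x = q_sample(rng)                  # X ~ q
        w = p_pdf(x) / q_pdf(x)            # importance weight p(x)/q(x)
        total += w * f(x)
    return total / L

# Illustrative setup: p = N(0,1), q = N(0,2); E_p[x^2] should be near 1.
rng = random.Random(0)
est = importance_estimate(f=lambda x: x * x,
                          p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
                          q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
                          q_sample=lambda r: r.gauss(0.0, 2.0),
                          L=20000, rng=rng)
```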

  26. Normalized importance sampling: what if we can only evaluate \tilde{p}, where p(x) = \frac{1}{Z} \tilde{p}(x)? Examples: the posterior in directed models, p(x \mid E = e) = \frac{1}{p(e)} p(x, e), and the prior in undirected models, p(x) = \frac{1}{Z} \prod_I \phi_I(x_I).

  27. Normalized importance sampling (cont.): define w(x) = \frac{\tilde{p}(x)}{q(x)}; then

     E_q[w(x)] = \int_x q(x) \frac{\tilde{p}(x)}{q(x)} \, dx = \int_x \tilde{p}(x) \, dx = Z

     and since

     E_p[f(x)] = \int_x p(x) f(x) \, dx = \frac{1}{Z} \int_x q(x) \frac{\tilde{p}(x)}{q(x)} f(x) \, dx = \frac{1}{Z} E_q[w(x) f(x)] = \frac{E_q[w(x) f(x)]}{E_q[w(x)]}

  28. Normalized importance sampling (cont.): sample X^{(l)} \sim q(x) and assign an importance sampling weight w(X^{(l)}) = \frac{\tilde{p}(X^{(l)})}{q(X^{(l)})}; then

     E_p[f(x)] \approx \frac{\sum_l w(X^{(l)}) f(X^{(l)})}{\sum_l w(X^{(l)})}

     is a biased estimator (e.g., consider L = 1).
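The self-normalized estimator can be sketched as follows; taking \tilde{p}(x) = e^{-x^2/2} (a standard normal without its normalizer, so Z = \sqrt{2\pi}), proposal q = N(0, 2), and f(x) = x^2 is an illustrative assumption:

```python
import math
import random

def self_normalized_is(f, p_tilde, q_pdf, q_sample, L, rng):
    """Estimate E_p[f] when p is known only up to a constant, p = p_tilde/Z;
    the weight sum in the denominator implicitly estimates L * Z."""
    num = den = 0.0
    for _ in range(L):
        x = q_sample(rng)
        w = p_tilde(x) / q_pdf(x)          # unnormalized weight
        num += w * f(x)
        den += w
    return num / den

def q_pdf(x):
    # proposal density N(0, 2)
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2 * math.pi))

# Illustrative check: for the standard normal target, E_p[x^2] is near 1.
rng = random.Random(0)
est = self_normalized_is(lambda x: x * x,
                         lambda x: math.exp(-0.5 * x * x),
                         q_pdf,
                         lambda r: r.gauss(0.0, 2.0),
                         20000, rng)
```

Note that the unknown Z never has to be computed: it cancels between the numerator and the denominator.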

  29. Revisiting likelihood weighting: the likelihood-weighting estimate

     p(S = s^0 \mid G = g^2, I = i^1) \approx \frac{\sum_l w^{(l)} I(S^{(l)} = s^0)}{\sum_l w^{(l)}}

     is equivalent to normalized importance sampling, with the network with evidence clamped playing the role of the proposal distribution.
