Informatics 2D – Reasoning and Agents
Semester 2, 2019–2020
Lecture 25 – Approximate Inference in Bayesian Networks
Alex Lascarides (alex@inf.ed.ac.uk)
17th March 2020

Where are we?

Last time . . .
◮ Inference in Bayesian Networks
◮ Exact methods: enumeration, the variable elimination algorithm
◮ Computationally intractable in the worst case

Today . . .
◮ Approximate inference in Bayesian Networks

Approximate inference in BNs

◮ Exact inference is computationally very hard
◮ Approximate methods are therefore important; here, randomised sampling algorithms (Monte Carlo algorithms)
◮ We will talk about two types of Monte Carlo algorithm:
  1. Direct sampling methods
  2. Markov chain sampling

Direct sampling methods

◮ Basic idea: generate samples from a known probability distribution
◮ Consider an unbiased coin as a random variable – sampling from its distribution is like flipping the coin
◮ It is possible to sample any distribution on a single variable given a set of random numbers from [0,1] (see the sketch below)
◮ Simplest method: generate events from the network without evidence
◮ Sample each variable in topological order
◮ The probability distribution for each sampled value is conditioned on the values already assigned to the variable's parents
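The lecture gives no code, but the single-variable claim is easy to make concrete. Below is a minimal Python sketch; the function name sample_bernoulli is ours, not from the lecture. One uniform random number from [0, 1) suffices to sample a Boolean variable with any desired probability of being true.

```python
import random

def sample_bernoulli(p_true):
    """Sample a Boolean variable that is true with probability p_true.

    Uses a single uniform draw from [0, 1): the draw lands below
    p_true exactly with probability p_true.
    """
    return random.random() < p_true

# Flipping the unbiased coin from the slide:
coin = sample_bernoulli(0.5)
```

The same trick generalises to any discrete distribution: partition [0, 1] into intervals whose lengths are the outcome probabilities, and return the outcome whose interval the draw falls into.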
Example

◮ Consider the following BN and the ordering [Cloudy, Sprinkler, Rain, WetGrass]
  (Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass):

  P(C) = .5

  C  P(S)        C  P(R)
  t  .10         t  .80
  f  .50         f  .20

  S  R  P(W)
  t  t  .99
  t  f  .90
  f  t  .90
  f  f  .00

◮ Direct sampling process:
  ◮ Sample from P(Cloudy) = ⟨0.5, 0.5⟩; suppose this returns true
  ◮ Sample from P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩; suppose this returns false
  ◮ Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; suppose this returns true
  ◮ Sample from P(WetGrass | Sprinkler = false, Rain = true) = ⟨0.9, 0.1⟩; suppose this returns true
◮ Event returned = [true, false, true, true]

Direct sampling methods

◮ This process generates samples with probability

  S(x1, . . . , xn) = P(x1, . . . , xn) = ∏_{i=1}^{n} P(xi | parents(Xi))

  i.e. in accordance with the distribution specified by the BN
◮ Answers are computed by counting the number N(x1, . . . , xn) of times the event x1, . . . , xn was generated, and dividing by the total number N of all samples
◮ In the limit, we should get

  lim_{N→∞} N(x1, . . . , xn) / N = S(x1, . . . , xn) = P(x1, . . . , xn)

◮ If the estimated probability P̂ becomes exact in the limit, we call the estimate consistent, and we write “≈” in this sense, e.g. P(x1, . . . , xn) ≈ N(x1, . . . , xn) / N

Rejection sampling

◮ Purpose: to produce samples for a hard-to-sample distribution from an easy-to-sample distribution
◮ To determine P(X | e), first generate samples from the prior distribution, i.e. as above
◮ Then reject those that do not match the evidence
◮ The estimate P̂(X = x | e) is obtained by counting how often X = x occurs in the remaining samples (a runnable sketch follows below)
◮ Rejection sampling is consistent because, by definition:

  P̂(X | e) = N(X, e) / N(e) ≈ P(X, e) / P(e) = P(X | e)
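Neither algorithm appears as code in the lecture; the following is a minimal Python sketch of both, hardwired to the sprinkler network's CPTs above. The function names (bernoulli, prior_sample, rejection_sample) are ours.

```python
import random

def bernoulli(p):
    """True with probability p, from one uniform draw on [0, 1)."""
    return random.random() < p

def prior_sample():
    """Generate one event from the sprinkler network, sampling each
    variable in topological order, conditioned on its parents."""
    c = bernoulli(0.5)                    # P(C) = .5
    s = bernoulli(0.1 if c else 0.5)      # P(S | C)
    r = bernoulli(0.8 if c else 0.2)      # P(R | C)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.00}[(s, r)]
    w = bernoulli(p_w)                    # P(W | S, R)
    return {'Cloudy': c, 'Sprinkler': s, 'Rain': r, 'WetGrass': w}

def rejection_sample(query_var, evidence, n):
    """Estimate P(query_var | evidence) from n prior samples,
    rejecting every sample inconsistent with the evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        if all(sample[v] == val for v, val in evidence.items()):
            counts[sample[query_var]] += 1
    kept = counts[True] + counts[False]
    if kept == 0:                         # every sample was rejected
        return None
    return {v: c / kept for v, c in counts.items()}

# The query from the next slide: P(Rain | Sprinkler = true)
# print(rejection_sample('Rain', {'Sprinkler': True}, 100_000))
```

Note how the rejection step simply discards work: as more evidence variables are added, the kept fraction shrinks, which is exactly the weakness the next slides discuss.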
Back to our example

◮ Assume we want to estimate P(Rain | Sprinkler = true), using 100 samples
◮ 73 have Sprinkler = false (rejected); 27 have Sprinkler = true
◮ Of these 27, 8 have Rain = true and 19 have Rain = false
◮ P̂(Rain | Sprinkler = true) ≈ α⟨8, 19⟩ = ⟨0.296, 0.704⟩
◮ The true answer would be ⟨0.3, 0.7⟩
◮ But the procedure rejects too many samples: the fraction consistent with the evidence e shrinks exponentially in the number of evidence variables
◮ So it is not really usable in practice (similar to naively estimating conditional probabilities from observation)

Likelihood weighting

◮ Avoids the inefficiency of rejection sampling by generating only samples consistent with the evidence
◮ Fixes the values of the evidence variables E and samples only the remaining variables X and Y
◮ Since not all events are equally probable, each event has to be weighted by the likelihood it accords to the evidence
◮ This likelihood is measured by the product of the conditional probabilities for each evidence variable, given its parents (a runnable sketch follows at the end of this section)

Likelihood weighting

◮ Consider the query P(Rain | Sprinkler = true, WetGrass = true) in our example; initially set the weight w = 1, then an event is generated:
  ◮ Sample from P(Cloudy) = ⟨0.5, 0.5⟩; suppose this returns true
  ◮ Sprinkler is an evidence variable with value true, so we set
    w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
  ◮ Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; suppose this returns true
  ◮ WetGrass is an evidence variable with value true, so we set
    w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.1 × 0.99 = 0.099
◮ Sample returned = [true, true, true, true], with weight 0.099 tallied under Rain = true

Likelihood weighting – why it works

◮ The sampling distribution is S(z, e) = ∏_{i=1}^{l} P(zi | parents(Zi))
◮ S’s sample value for each Zi is influenced by the evidence among Zi’s ancestors
◮ But S pays no attention, when sampling Zi’s value, to evidence from Zi’s non-ancestors; so it is not sampling from the true posterior probability distribution!
◮ The likelihood weight w makes up for the difference between the actual and desired sampling distributions:

  w(z, e) = ∏_{i=1}^{m} P(ei | parents(Ei))
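As with the earlier algorithms, here is a minimal Python sketch of likelihood weighting, hardwired to the same sprinkler network; the function names (weighted_sample, likelihood_weighting) are ours. It implements the weight updates walked through above: evidence variables are fixed and contribute a factor P(ei | parents(Ei)) to w, while all other variables are sampled as before.

```python
import random

def bernoulli(p):
    """True with probability p, from one uniform draw on [0, 1)."""
    return random.random() < p

def weighted_sample(evidence):
    """Generate one event consistent with the evidence, plus its
    likelihood weight w = product of P(e_i | parents(E_i))."""
    w = 1.0
    event = dict(evidence)

    # Cloudy has no parents; for simplicity we assume it is never
    # an evidence variable in the queries we run here.
    event['Cloudy'] = bernoulli(0.5)

    # Sprinkler: P(S | C)
    p_s = 0.1 if event['Cloudy'] else 0.5
    if 'Sprinkler' in evidence:
        w *= p_s if evidence['Sprinkler'] else 1 - p_s
    else:
        event['Sprinkler'] = bernoulli(p_s)

    # Rain: P(R | C)
    p_r = 0.8 if event['Cloudy'] else 0.2
    if 'Rain' in evidence:
        w *= p_r if evidence['Rain'] else 1 - p_r
    else:
        event['Rain'] = bernoulli(p_r)

    # WetGrass: P(W | S, R)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.00}[
        (event['Sprinkler'], event['Rain'])]
    if 'WetGrass' in evidence:
        w *= p_w if evidence['WetGrass'] else 1 - p_w
    else:
        event['WetGrass'] = bernoulli(p_w)

    return event, w

def likelihood_weighting(query_var, evidence, n):
    """Estimate P(query_var | evidence) from n weighted samples."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        event, w = weighted_sample(evidence)
        totals[event[query_var]] += w
    z = totals[True] + totals[False]
    return {v: t / z for v, t in totals.items()} if z else None

# The query from the slide:
# print(likelihood_weighting('Rain', {'Sprinkler': True, 'WetGrass': True}, 100_000))
```

Unlike rejection sampling, every generated sample is used; the price is that samples with small weights contribute little, so accuracy still degrades when the evidence itself is unlikely.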