Informatics 2D – Reasoning and Agents
Semester 2, 2019–2020
Alex Lascarides
alex@inf.ed.ac.uk
Lecture 25 – Approximate Inference in Bayesian Networks
17th March 2020

Where are we?
Last time . . .
◮ Inference in Bayesian Networks
◮ Exact methods: enumeration, variable elimination algorithm
◮ Computationally intractable in the worst case
Today . . .
◮ Approximate Inference in Bayesian Networks

Approximate inference in BNs
◮ Exact inference computationally very hard
◮ Approximate methods important, here randomised sampling algorithms
◮ Monte Carlo algorithms
◮ We will talk about two types of MC algorithms:
    1. Direct sampling methods
    2. Markov chain sampling

Direct sampling methods
◮ Basic idea: generate samples from a known probability distribution
◮ Consider an unbiased coin as a random variable – sampling from the distribution is like flipping the coin
◮ It is possible to sample any distribution on a single variable given a set of random numbers from [0,1]
◮ Simplest method: generate events from network without evidence
    ◮ Sample each variable in 'topological order'
    ◮ Probability distribution for sampled value is conditioned on values assigned to parents
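The idea that a single variable can be sampled from random numbers in [0,1] can be sketched as follows. This is a minimal illustration, not code from the lecture: the helper names `sample_discrete` and `sample_chain` are hypothetical, and the parent/child probabilities (0.5 for the parent, 0.8 / 0.2 for the child) are borrowed from the Cloudy/Rain part of the lecture's example network.

```python
import random


def sample_discrete(distribution, u):
    """Return the value whose cumulative-probability interval contains u.

    distribution: list of (value, probability) pairs whose probabilities sum to 1.
    u: a number drawn uniformly from [0, 1).
    """
    cumulative = 0.0
    for value, probability in distribution:
        cumulative += probability
        if u < cumulative:
            return value
    return distribution[-1][0]  # guard against floating-point round-off


def sample_chain(rng):
    """Sample a parent first, then its child conditioned on the parent's value."""
    cloudy = sample_discrete([(True, 0.5), (False, 0.5)], rng.random())
    p_rain = 0.8 if cloudy else 0.2  # the child's distribution depends on the parent
    rain = sample_discrete([(True, p_rain), (False, 1 - p_rain)], rng.random())
    return cloudy, rain


rng = random.Random(0)  # fixed seed (an arbitrary choice) so the sketch is repeatable
coin = [("heads", 0.5), ("tails", 0.5)]
flips = [sample_discrete(coin, rng.random()) for _ in range(10_000)]
heads_fraction = flips.count("heads") / len(flips)
print(heads_fraction)     # should be close to 0.5 for an unbiased coin
print(sample_chain(rng))  # one (Cloudy, Rain) event sampled in topological order
```

Sampling the parent before the child is what 'topological order' buys us: by the time a variable is sampled, the values it is conditioned on are already fixed.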

Example
◮ Consider the following BN and ordering [Cloudy, Sprinkler, Rain, WetGrass]:

[Figure: Bayesian network with edges Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass, annotated with the CPTs below]

P(C) = .5

C    P(S)
t    .10
f    .50

C    P(R)
t    .80
f    .20

S    R    P(W)
t    t    .99
t    f    .90
f    t    .90
f    f    .00

◮ Direct sampling process:
    ◮ Sample from P(Cloudy) = ⟨0.5, 0.5⟩, suppose this returns true
    ◮ Sample from P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩, suppose this returns false
    ◮ Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩, suppose this returns true
    ◮ Sample from P(WetGrass | Sprinkler = false, Rain = true) = ⟨0.9, 0.1⟩, suppose this returns true
◮ Event returned = [true, false, true, true]

Direct sampling methods
◮ Generates samples with probability S(x_1, ..., x_n):
    $S(x_1, \ldots, x_n) = P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(X_i))$
  i.e. in accordance with the distribution specified by the BN
◮ Answers are computed by counting the number N(x_1, ..., x_n) of times the event x_1, ..., x_n was generated and dividing by the total number N of all samples
◮ In the limit, we should get
    $\lim_{N \to \infty} \frac{N(x_1, \ldots, x_n)}{N} = S(x_1, \ldots, x_n) = P(x_1, \ldots, x_n)$
◮ If the estimated probability $\hat{P}$ becomes exact in the limit we call the estimate consistent and we write "≈" in this sense, e.g.
    $P(x_1, \ldots, x_n) \approx N(x_1, \ldots, x_n) / N$

Rejection sampling
◮ Purpose: to produce samples for a hard-to-sample distribution from an easy-to-sample distribution
◮ To determine P(X | e), first generate samples from the prior distribution specified by the BN
◮ Then reject those that do not match the evidence
◮ The estimate $\hat{P}(X = x \mid e)$ is obtained by counting how often X = x occurs in the remaining samples
◮ Rejection sampling is consistent because, by definition:
    $\hat{P}(X \mid e) = \frac{N(X, e)}{N(e)} \approx \frac{P(X, e)}{P(e)} = P(X \mid e)$
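Prior sampling and rejection sampling on the Cloudy/Sprinkler/Rain/WetGrass network can be sketched as below. This is an illustrative sketch under stated assumptions rather than the lecture's own code: the `NETWORK` encoding and the function names `prior_sample` and `rejection_sampling` are hypothetical, while the CPT entries are the ones given in the example. For the query P(Rain | Sprinkler = true), enumerating over the CPTs gives ⟨0.3, 0.7⟩, so the estimate should land near that.

```python
import random

# Network in topological order:
# (variable, parents, P(variable = true | parent values)).
NETWORK = [
    ("Cloudy", (), {(): 0.5}),
    ("Sprinkler", ("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    ("Rain", ("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    ("WetGrass", ("Sprinkler", "Rain"),
     {(True, True): 0.99, (True, False): 0.90,
      (False, True): 0.90, (False, False): 0.00}),
]


def prior_sample(rng):
    """Sample every variable in topological order, given its sampled parents."""
    event = {}
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(event[p] for p in parents)]
        event[name] = rng.random() < p_true  # true with probability p_true
    return event


def rejection_sampling(query, evidence, n_samples, rng):
    """Return (estimate for query=true, estimate for query=false, samples kept)."""
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        event = prior_sample(rng)
        # Reject any sample that disagrees with the evidence e.
        if all(event[var] == value for var, value in evidence.items()):
            counts[event[query]] += 1
    kept = counts[True] + counts[False]
    if kept == 0:
        raise ValueError("every sample was rejected; draw more samples")
    # Normalising the counts plays the role of alpha in the slides.
    return counts[True] / kept, counts[False] / kept, kept


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
p_true, p_false, kept = rejection_sampling("Rain", {"Sprinkler": True}, 100_000, rng)
print(p_true, p_false, kept)
```

Since P(Sprinkler = true) = 0.5 × 0.1 + 0.5 × 0.5 = 0.3, only about 30% of the samples survive even with a single evidence variable, which is the inefficiency that motivates the methods that follow.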

Back to our example
◮ Assume we want to estimate P(Rain | Sprinkler = true), using 100 samples
◮ 73 have Sprinkler = false (rejected), 27 have Sprinkler = true
◮ Of these 27, 8 have Rain = true and 19 have Rain = false
◮ P(Rain | Sprinkler = true) ≈ α⟨8, 19⟩ = ⟨0.296, 0.704⟩
◮ True answer would be ⟨0.3, 0.7⟩
◮ But the procedure rejects too many samples: the fraction consistent with e drops exponentially as the number of evidence variables grows
◮ Not really usable (similar to naively estimating conditional probabilities from observation)

Likelihood weighting
◮ Avoids inefficiency of rejection sampling by generating only samples consistent with evidence
◮ Fixes the values for evidence variables E and samples only the remaining variables X and Y
◮ Since not all events are equally probable, each event has to be weighted by the likelihood that it accords to the evidence
◮ Likelihood is measured by product of conditional probabilities for each evidence variable, given its parents

Likelihood weighting
◮ Consider query P(Rain | Sprinkler = true, WetGrass = true) in our example; initially set weight w = 1, then event is generated:
    ◮ Sample from P(Cloudy) = ⟨0.5, 0.5⟩, suppose this returns true
    ◮ Sprinkler is evidence variable with value true, we set
        w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
    ◮ Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩, suppose this returns true
    ◮ WetGrass is evidence variable with value true, we set
        w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099
◮ Sample returned = [true, true, true, true] with weight 0.099, tallied under Rain = true

Likelihood weighting – why it works
◮ The sampling distribution over the non-evidence variables Z = {X} ∪ Y is
    $S(z, e) = \prod_{i=1}^{l} P(z_i \mid \mathit{parents}(Z_i))$
◮ The values S samples for each Z_i are influenced by the evidence among Z_i's ancestors
◮ But S pays no attention when sampling Z_i's value to evidence from Z_i's non-ancestors; so it's not sampling from the true posterior probability distribution!
◮ But the likelihood weight w makes up for the difference between the actual and desired sampling distributions:
    $w(z, e) = \prod_{i=1}^{m} P(e_i \mid \mathit{parents}(E_i))$
◮ Multiplying the two gives the joint, $S(z, e)\, w(z, e) = P(z, e)$, so the normalised weighted counts are consistent estimates of P(X | e)
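The weighting procedure walked through above can be sketched as follows for the query P(Rain | Sprinkler = true, WetGrass = true). This is a minimal sketch under stated assumptions, not the lecture's own code: the `NETWORK` encoding and the names `weighted_sample` and `likelihood_weighting` are hypothetical, and the CPT values are those of the example network. Exact enumeration over these CPTs gives roughly 0.32 for Rain = true, so the estimate should come out near that.

```python
import random

# Network in topological order:
# (variable, parents, P(variable = true | parent values)).
NETWORK = [
    ("Cloudy", (), {(): 0.5}),
    ("Sprinkler", ("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    ("Rain", ("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    ("WetGrass", ("Sprinkler", "Rain"),
     {(True, True): 0.99, (True, False): 0.90,
      (False, True): 0.90, (False, False): 0.00}),
]


def weighted_sample(evidence, rng):
    """Return one event consistent with the evidence, plus its likelihood weight."""
    event, weight = {}, 1.0
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(event[p] for p in parents)]
        if name in evidence:
            # Evidence variables are fixed, not sampled; the weight picks up
            # P(e_i | parents(E_i)).
            event[name] = evidence[name]
            weight *= p_true if evidence[name] else 1.0 - p_true
        else:
            event[name] = rng.random() < p_true
    return event, weight


def likelihood_weighting(query, evidence, n_samples, rng):
    """Return normalised weighted tallies for query=true and query=false."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        event, weight = weighted_sample(evidence, rng)
        totals[event[query]] += weight  # tally the weight, not a count of 1
    norm = totals[True] + totals[False]
    if norm == 0.0:
        raise ValueError("all weights were zero; the evidence may be impossible")
    return totals[True] / norm, totals[False] / norm


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
evidence = {"Sprinkler": True, "WetGrass": True}
p_true, p_false = likelihood_weighting("Rain", evidence, 50_000, rng)
print(p_true, p_false)
```

Unlike rejection sampling, every generated event contributes to the estimate; the price is that events carrying very small weights contribute very little.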

