ECE 4524 Artificial Intelligence and Engineering Applications
Lecture 20: Approximate Inference
Reading: AIAMA 14.5; see also MacKay Chapters 27, 29, and 33
Today's Schedule:
◮ Inference by simulation
  ◮ sampling random variables
  ◮ direct sampling of BN, rejection and weighting
  ◮ Gibbs sampling
◮ Inference by optimization (if time)
  ◮ KL Divergence
Generating Random Variates
An essential part of approximate inference by simulation is sampling random variates from arbitrary probability distributions. Some essential questions:
◮ How is it possible for a deterministic computer to generate random numbers?
◮ What distributions are generally available in programming languages?
The most common approach is to use pseudo-random number generators (PRNGs).
PRNGs for uniform variates
◮ PRNGs are iterated functions, recurrence relations x_{k+1} = f(x_k) that display chaotic behavior (sensitive dependence on initial conditions). The initial condition x_0 is called the seed.
◮ A simple example is the logistic map x_{k+1} = r x_k (1 - x_k) for x ∈ [0, 1] and r > 0, which displays chaotic behavior for r greater than about 3.57. Its output is not uniformly distributed, however, and uniformity is a goal of a good PRNG.
◮ The most common PRNG is the Mersenne twister.
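As a concrete sketch (not from the lecture), the snippet below iterates the logistic map with the illustrative choices r = 3.99 and seed 0.2, and contrasts its output with Python's built-in Mersenne twister in the random module.

import random

def logistic_prng(seed=0.2, r=3.99):
    """Yield pseudo-random numbers in (0, 1) by iterating x <- r*x*(1-x)."""
    x = seed
    while True:
        x = r * x * (1.0 - x)
        yield x

gen = logistic_prng()
print([next(gen) for _ in range(5)])          # chaotic, but not uniformly distributed

random.seed(42)                               # the built-in Mersenne twister
print([random.random() for _ in range(5)])    # approximately uniform on [0, 1)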
Generating arbitrary variates
So using a PRNG we can generate a random variate from a uniform distribution, U(0, N). How do we generate variates from other, arbitrary distributions?
◮ Transformation approach
◮ Rejection sampling
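A minimal sketch of both ideas, assuming U(0, 1) variates from Python's random module; the exponential rate and the triangular target density are illustrative choices, not from the lecture.

import math
import random

def exponential_variate(lam=1.0):
    """Transformation approach: invert the exponential CDF F(x) = 1 - exp(-lam*x)."""
    u = random.random()                      # u ~ U(0, 1)
    return -math.log(1.0 - u) / lam          # x = F^{-1}(u)

def rejection_variate(target_pdf, x_max, pdf_max):
    """Rejection sampling: propose x ~ U(0, x_max), accept with probability target_pdf(x)/pdf_max."""
    while True:
        x = random.uniform(0.0, x_max)
        if random.random() * pdf_max <= target_pdf(x):
            return x

print(exponential_variate(lam=2.0))
print(rejection_variate(lambda x: 2.0 * x, x_max=1.0, pdf_max=2.0))   # target p(x) = 2x on [0, 1]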
Sampling a Bayesian Network
Recall the BN defines a factorization of the joint density
P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i | parents(X_i))
◮ To sample from the joint density we sample from the conditional probabilities in order from causes to effects.
◮ We then build a histogram and normalize it to a density.
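As an illustration (not from the lecture), here is a direct-sampling sketch on a hypothetical two-variable network Rain → WetGrass with made-up CPT numbers: sample causes before effects, then histogram and normalize.

import random
from collections import Counter

def sample_joint():
    """Sample (Rain, WetGrass) in cause-to-effect order."""
    rain = random.random() < 0.2                     # P(Rain = true) = 0.2 (made up)
    wet = random.random() < (0.9 if rain else 0.1)   # P(WetGrass = true | Rain) (made up)
    return rain, wet

counts = Counter(sample_joint() for _ in range(100_000))
total = sum(counts.values())
print({event: n / total for event, n in counts.items()})   # normalized histogram approximates the joint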
BN with evidence nodes
When sampling from a BN with evidence nodes we can simply discard (reject) any sample that conflicts with the known values of the evidence variables. This is known as rejection sampling.
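A sketch of rejection sampling for P(Rain | WetGrass = true) on the same hypothetical Rain → WetGrass network (made-up CPT numbers as before): samples whose WetGrass value conflicts with the evidence are thrown away.

import random
from collections import Counter

def rejection_query(evidence_wet=True, n=100_000):
    kept = Counter()
    for _ in range(n):
        rain = random.random() < 0.2                     # sample P(Rain)
        wet = random.random() < (0.9 if rain else 0.1)   # sample P(WetGrass | Rain)
        if wet == evidence_wet:                          # keep only samples consistent with the evidence
            kept[rain] += 1
    total = sum(kept.values())
    return {r: c / total for r, c in kept.items()}

print(rejection_query())   # estimate of P(Rain | WetGrass = true)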
Likelihood Weighting
Rejecting samples is wasteful. Instead, we fix the evidence variables to their observed values and weight each sample by the likelihood of the evidence given its parents, so every sample contributes to the estimate.
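A sketch of likelihood weighting for the same hypothetical query: the evidence variable WetGrass is clamped to true and each sample carries a weight equal to the likelihood of the evidence (CPT numbers made up, as above).

import random
from collections import defaultdict

def likelihood_weighting(evidence_wet=True, n=100_000):
    weights = defaultdict(float)
    for _ in range(n):
        rain = random.random() < 0.2                    # sample the non-evidence variable
        p_wet = 0.9 if rain else 0.1                    # P(WetGrass = true | Rain)
        w = p_wet if evidence_wet else (1.0 - p_wet)    # weight = likelihood of the evidence
        weights[rain] += w
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

print(likelihood_weighting())   # weighted estimate of P(Rain | WetGrass = true)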
Gibbs Sampling
A different approach is to randomly perturb nodes.
◮ Given an initial variate of the BN X_1^0, X_2^0, \ldots, X_n^0,
◮ randomly choose (or cycle in order through) a node X_i and generate a new variate conditioned on the current values of its Markov blanket:
X_i^{k+1} ∼ P(X_i | mb(X_i^k))
◮ We accumulate a histogram as before and normalize.
Computing P(Z_i | mb(Z_i))
To compute the conditional of a non-evidence variable given its Markov blanket, we use the values of the blanket (the parents, children, and children's parents) at the previous iteration:
P(Z_i | mb(Z_i)) ∝ P(Z_i | parents(Z_i)) \prod_{Y_j ∈ children(Z_i)} P(Y_j | parents(Y_j))
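As an illustrative sketch (not the lecture's code), the snippet below runs Gibbs sampling on the familiar cloudy/sprinkler/rain/wet-grass network, using the usual textbook CPT values, with evidence Sprinkler = true and WetGrass = true. Each non-evidence variable is resampled from its Markov-blanket conditional, built exactly as the product above.

import random
from collections import Counter

P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}                       # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,     # P(WetGrass = true | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.0}

def bernoulli(p):
    return random.random() < p

def gibbs_rain_given_evidence(n_steps=50_000, burn_in=1_000):
    cloudy, rain = bernoulli(0.5), bernoulli(0.5)    # arbitrary initial state
    sprinkler, wet = True, True                      # evidence stays fixed
    counts = Counter()
    for step in range(n_steps):
        # Resample Cloudy from P(Cloudy | mb) ∝ P(C) P(Rain | C) P(Sprinkler | C)
        score = {c: (P_C if c else 1 - P_C)
                    * (P_R[c] if rain else 1 - P_R[c])
                    * (P_S[c] if sprinkler else 1 - P_S[c])
                 for c in (True, False)}
        cloudy = bernoulli(score[True] / (score[True] + score[False]))
        # Resample Rain from P(Rain | mb) ∝ P(Rain | C) P(WetGrass | S, Rain)
        score = {r: (P_R[cloudy] if r else 1 - P_R[cloudy])
                    * (P_W[(sprinkler, r)] if wet else 1 - P_W[(sprinkler, r)])
                 for r in (True, False)}
        rain = bernoulli(score[True] / (score[True] + score[False]))
        if step >= burn_in:
            counts[rain] += 1
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

print(gibbs_rain_given_evidence())   # estimate of P(Rain | Sprinkler = true, WetGrass = true)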
Variational Bayes
Sampling can be computationally costly, especially for continuous R.V.s. An alternative to sampling is to approximate the posterior f(x | e) by a parameterized function q(x; λ).
◮ The quality of the approximation is measured using the Kullback-Leibler (KL) divergence
D_{KL}[q || f] = \int q(x; λ) \ln \frac{q(x; λ)}{f(x | e)} dx
which is zero when q = f.
◮ The goal is then to find the parameters λ that minimize the divergence, converting the inference problem into an optimization problem. A common choice for q is a Gaussian.
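As a tiny illustration (not from the lecture), when both q and the target are 1-D Gaussians the KL divergence has a closed form, so the optimization over λ = (μ, σ) can be done by brute-force grid search; the target parameters below are made up.

import math

def kl_gaussians(mu_q, sig_q, mu_f, sig_f):
    """D_KL[ N(mu_q, sig_q^2) || N(mu_f, sig_f^2) ] in closed form."""
    return (math.log(sig_f / sig_q)
            + (sig_q**2 + (mu_q - mu_f)**2) / (2.0 * sig_f**2)
            - 0.5)

mu_f, sig_f = 1.5, 0.7                      # the "posterior" we are approximating (made up)
candidates = [(mu, sig)
              for mu in [i * 0.1 for i in range(0, 31)]
              for sig in [0.1 + i * 0.05 for i in range(1, 40)]]
best = min(candidates, key=lambda p: kl_gaussians(p[0], p[1], mu_f, sig_f))
print(best)   # grid search recovers (mu, sigma) close to (1.5, 0.7), where D_KL = 0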
Next Actions
◮ Reading on Decision Theory and Utility (AIAMA 16.1-16.3)
◮ Complete the (really simple) warmup before noon on 4/3.
Note: You now have all you need to complete PS 3. It is due on 4/5.