Bayesian networks: approximate inference - Machine Intelligence - PowerPoint PPT presentation


  1. Bayesian networks: approximate inference. Machine Intelligence. Thomas D. Nielsen, September 2008.

  2. Approximate Inference: Motivation. Because of the (worst-case) intractability of exact inference in Bayesian networks, try to find more efficient approximate inference techniques: instead of computing the exact posterior $P(A \mid E = e)$, compute an approximation $\hat{P}(A \mid E = e)$ with $\hat{P}(A \mid E = e) \approx P(A \mid E = e)$.

  3. Approximate Inference: Absolute/Relative Error. For $p, \hat{p} \in [0, 1]$: $\hat{p}$ is an approximation for $p$ with absolute error $\leq \epsilon$ if $|p - \hat{p}| \leq \epsilon$, i.e. $\hat{p} \in [p - \epsilon, p + \epsilon]$. $\hat{p}$ is an approximation for $p$ with relative error $\leq \epsilon$ if $|1 - \hat{p}/p| \leq \epsilon$, i.e. $\hat{p} \in [p(1 - \epsilon), p(1 + \epsilon)]$. The latter definition is not always fully satisfactory, because it is neither symmetric in $p$ and $\hat{p}$ nor invariant under the transition $p \to (1 - p)$, $\hat{p} \to (1 - \hat{p})$. Use with care! When $\hat{p}_1, \hat{p}_2$ are approximations for $p_1, p_2$ with absolute error $\leq \epsilon$, then no error bounds follow for $\hat{p}_1/\hat{p}_2$ as an approximation for $p_1/p_2$. When $\hat{p}_1, \hat{p}_2$ are approximations for $p_1, p_2$ with relative error $\leq \epsilon$, then $\hat{p}_1/\hat{p}_2$ approximates $p_1/p_2$ with relative error $\leq 2\epsilon/(1 + \epsilon)$ (see the sketch below).
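
As a sketch of where the factor $2\epsilon/(1 + \epsilon)$ comes from (my derivation, not on the slide), consider the lower side: from $\hat{p}_j \in [p_j(1 - \epsilon), p_j(1 + \epsilon)]$ we get
\[
\frac{\hat{p}_1}{\hat{p}_2} \;\geq\; \frac{p_1(1 - \epsilon)}{p_2(1 + \epsilon)}
\;=\; \frac{p_1}{p_2} \cdot \frac{1 - \epsilon}{1 + \epsilon}
\;=\; \frac{p_1}{p_2}\left(1 - \frac{2\epsilon}{1 + \epsilon}\right),
\]
so the ratio undershoots by at most a relative $2\epsilon/(1 + \epsilon)$; the upper side behaves analogously, subject to the asymmetry caveat above.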

  4. Approximate Inference: Randomized Methods. Most methods for approximate inference are randomized algorithms that compute approximations $\hat{P}$ from random samples of instantiations. We shall consider: forward sampling, likelihood weighting, Gibbs sampling, and the Metropolis-Hastings algorithm.

  5. Approximate Inference: Forward Sampling. Observation: a Bayesian network can be used as a random generator that produces full instantiations $V = v$ according to the distribution $P(V)$. Example: a network $A \to B$ with $P(A = t) = 0.2$, $P(A = f) = 0.8$, and $P(B = t \mid A = t) = 0.7$, $P(B = f \mid A = t) = 0.3$, $P(B = t \mid A = f) = 0.4$, $P(B = f \mid A = f) = 0.6$. Generate random numbers $r_1, r_2$ uniformly from $[0, 1]$; set $A = t$ if $r_1 \leq 0.2$ and $A = f$ otherwise; depending on the value of $A$ and $r_2$, set $B$ to $t$ or $f$. Generating one random instantiation is linear in the size of the network.
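
A minimal sketch of this generator in Python (all function names are mine, not from the slides); the CPT tables are the ones from the example above, and sampling visits the variables in topological order, parents before children:

    import random

    # CPTs of the example network A -> B, copied from the slide.
    P_A = {"t": 0.2, "f": 0.8}                    # P(A)
    P_B_given_A = {"t": {"t": 0.7, "f": 0.3},     # P(B | A = t)
                   "f": {"t": 0.4, "f": 0.6}}     # P(B | A = f)

    def sample_categorical(dist):
        # Draw one value from a {value: probability} table with one uniform number.
        r, acc = random.random(), 0.0
        for value, prob in dist.items():
            acc += prob
            if r <= acc:
                return value
        return value  # guard against floating-point rounding at the upper end

    def sample_instantiation():
        # Forward sampling: sample A from its prior, then B from P(B | A).
        a = sample_categorical(P_A)
        b = sample_categorical(P_B_given_A[a])
        return {"A": a, "B": b}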

  6. Approximate Inference: Sampling Algorithm. Thus, we have a randomized algorithm $S$ that produces possible outputs from $sp(V)$ according to the distribution $P(V)$. With samples $S_1, \ldots, S_N$, define
\[
\hat{P}(A = a \mid E = e) := \frac{|\{i \in 1, \ldots, N \mid E = e, A = a \text{ in } S_i\}|}{|\{i \in 1, \ldots, N \mid E = e \text{ in } S_i\}|}
\]
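
Continuing the sketch above, this estimator amounts to rejection sampling: keep only samples consistent with the evidence (the helper name `estimate_posterior` is hypothetical):

    def estimate_posterior(query_var, query_val, evidence, n_samples=10000):
        # Estimate P(query_var = query_val | evidence) by forward sampling.
        consistent = accepted = 0
        for _ in range(n_samples):
            s = sample_instantiation()
            if all(s[var] == val for var, val in evidence.items()):
                consistent += 1                  # sample agrees with E = e
                if s[query_var] == query_val:
                    accepted += 1                # ... and with A = a
        return accepted / consistent if consistent else float("nan")

    # e.g. estimate_posterior("A", "t", {"B": "t"})  ~  P(A = t | B = t)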

  7. Approximate Inference: Forward Sampling, Illustration. [Figure: the sample space partitioned into samples with $E \neq e$, samples with $E = e, A \neq a$, and samples with $E = e, A = a$; the approximation for $P(A = a \mid E = e)$ is the count of the region $E = e, A = a$ divided by the count of the union of the two regions with $E = e$.]

  8. Approximate Inference: Sampling from the Conditional Distribution. Problem of forward sampling: samples with $E \neq e$ are useless! Idea: find a sampling algorithm $S_c$ that produces outputs from $sp(V)$ according to the distribution $P(V \mid E = e)$. A tempting approach: fix the variables in $E$ to $e$ and sample the nonevidence variables only! Problem: only evidence from the ancestors is taken into account!

  9. Approximate Inference: Likelihood Weighting. We would like to sample from (where $pa(X)''$ are the parents in $E$ and $pa(X)'$ the remaining parents)
\[
P(U, e) = \prod_{X \in U \setminus E} P(X \mid pa(X)', pa(X)'' = e) \;\times\; \prod_{X \in E} P(X = e \mid pa(X)', pa(X)'' = e),
\]
but by applying forward sampling with fixed $E$ we actually sample from
\[
\text{Sampling distribution} = \prod_{X \in U \setminus E} P(X \mid pa(X)', pa(X)'' = e).
\]
Solution: instead of letting each sample count as 1, use the weight
\[
w(x, e) = \prod_{X \in E} P(X = e \mid pa(X)', pa(X)'' = e).
\]

  10. Approximate Inference: Likelihood Weighting, Example. Same network $A \to B$ as before; assume evidence $B = t$. Generate a random number $r$ uniformly from $[0, 1]$; set $A = t$ if $r \leq 0.2$ and $A = f$ otherwise. If $A = t$ then let the sample count as $w(t, t) = 0.7$; otherwise as $w(f, t) = 0.4$. With $N$ samples $(a_1, \ldots, a_N)$ we get
\[
\hat{P}(A = t \mid B = t) = \frac{\sum_{i : a_i = t} w(a_i, e)}{\sum_{i=1}^{N} w(a_i, e)}.
\]
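
A sketch of this procedure in Python, reusing `P_A`, `P_B_given_A`, and `sample_categorical` from the forward-sampling sketch (the function name is mine):

    def likelihood_weighting(n_samples=10000):
        # Estimate P(A = t | B = t) by weighting samples instead of rejecting.
        weights = {"t": 0.0, "f": 0.0}
        for _ in range(n_samples):
            a = sample_categorical(P_A)          # sample the nonevidence variable
            w = P_B_given_A[a]["t"]              # weight w(a, t) = P(B = t | A = a)
            weights[a] += w
        return weights["t"] / (weights["t"] + weights["f"])

Both this estimate and the rejection estimate above should approach $P(A = t \mid B = t) = 0.14/0.46 \approx 0.30$, but likelihood weighting wastes no samples on instantiations that contradict the evidence.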

  11. Approximate Inference: Gibbs Sampling. For notational convenience assume from now on that for some $l$: $E = V_{l+1}, V_{l+2}, \ldots, V_n$. Write $W$ for $V_1, \ldots, V_l$. Principle: obtain a new sample from the previous sample by randomly changing the value of only one selected variable.
  Procedure Gibbs sampling:
    $v_0 = (v_{0,1}, \ldots, v_{0,l})$ := arbitrary instantiation of $W$
    $i := 1$
    repeat forever:
      choose $V_k \in W$  # deterministic or randomized
      generate $v_{i,k}$ randomly according to the distribution
        $P(V_k \mid V_1 = v_{i-1,1}, \ldots, V_{k-1} = v_{i-1,k-1}, V_{k+1} = v_{i-1,k+1}, \ldots, V_l = v_{i-1,l}, E = e)$
      set $v_i = (v_{i-1,1}, \ldots, v_{i-1,k-1}, v_{i,k}, v_{i-1,k+1}, \ldots, v_{i-1,l})$
      $i := i + 1$

  12. Approximate Inference: Illustration. The process of Gibbs sampling can be understood as a random walk in the space of all instantiations with $E = e$. Reachable in one step: instantiations that differ from the current one by the value assignment of at most one variable (assuming randomized choice of the variable $V_k$).

  13. Approximate Inference: Implementation of the Sampling Step. The sampling step (generate $v_{i,k}$ randomly according to the distribution $P(V_k \mid V_1 = v_{i-1,1}, \ldots, V_{k-1} = v_{i-1,k-1}, V_{k+1} = v_{i-1,k+1}, \ldots, V_l = v_{i-1,l}, E = e)$) requires sampling from a conditional distribution. In this special case (all but one variable are instantiated) this is easy: just compute for each $v \in sp(V_k)$ the probability
\[
P(V_1 = v_{i-1,1}, \ldots, V_{k-1} = v_{i-1,k-1}, V_k = v, V_{k+1} = v_{i-1,k+1}, \ldots, V_l = v_{i-1,l}, E = e)
\]
(linear in network size), and choose $v_{i,k}$ according to these probabilities (normalized). This can be further simplified by computing the distribution on $sp(V_k)$ only in the Markov blanket of $V_k$, i.e. the subnetwork consisting of $V_k$, its parents, its children, and the parents of its children.
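
A generic sketch of this resampling step in Python (all names are mine; `joint_prob` is an assumed helper that multiplies out all CPT entries of a full assignment, which is linear in the network size):

    import random

    def gibbs_step(state, var, values, joint_prob):
        # Resample `var` from P(var | all other variables), leaving the rest fixed.
        # state      : dict mapping every variable (incl. evidence) to its value
        # values     : the candidate values sp(var)
        # joint_prob : product of all CPT entries for a full assignment
        scores = []
        for v in values:
            candidate = dict(state)
            candidate[var] = v
            scores.append(joint_prob(candidate))   # unnormalized P(var = v | rest)
        total = sum(scores)
        r, acc = random.random() * total, 0.0      # draw from the normalized scores
        for v, s in zip(values, scores):
            acc += s
            if r <= acc:
                state[var] = v
                break
        else:
            state[var] = values[-1]                # floating-point guard
        return state

Restricting `joint_prob` to the CPT of $V_k$ and the CPTs of its children would give the same normalized distribution, since factors that do not mention $V_k$ cancel in the normalization; that is exactly the Markov blanket simplification from the slide.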

  14. Approximate Inference: Convergence of Gibbs Sampling. Under certain conditions the distribution of samples converges to the posterior distribution $P(W \mid E = e)$:
\[
\lim_{i \to \infty} P(v_i = v) = P(W = v \mid E = e) \quad (v \in sp(W)).
\]
Sufficient conditions are: in the repeat loop of the Gibbs sampler, the variable $V_k$ is randomly selected (with non-zero selection probability for all $V_k \in W$), and the Bayesian network has no zero entries in its CPTs.
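
To tie the pieces together, a hedged usage sketch built on `gibbs_step` above; the burn-in and sample counts are arbitrary illustrations, and `init_state` must already have the evidence variables fixed to $e$:

    def gibbs_estimate(init_state, free_vars, sp, joint_prob,
                       burn_in=1000, n_samples=10000):
        # Run the chain, discard a burn-in prefix, then estimate P(W = v | E = e)
        # from the empirical frequencies of the visited states.
        state, counts = dict(init_state), {}
        for i in range(burn_in + n_samples):
            var = random.choice(free_vars)       # randomized choice of V_k, as in
            gibbs_step(state, var, sp[var], joint_prob)  # the sufficient conditions
            if i >= burn_in:
                key = tuple(state[v] for v in free_vars)
                counts[key] = counts.get(key, 0) + 1
        return {k: c / n_samples for k, c in counts.items()}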
