

Overview
1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling


  1. Likelihood Weighting: Sampling
Sample in topological order over X: clamp the evidence variables to e, then sample x_i ~ P(X_i | pa_i). Each P(X_i | pa_i) is a look-up in the CPT!

  2. Likelihood Weighting: Proposal Distribution
Q(X \ E) = ∏_{X_i ∈ X\E} P(X_i | pa_i, e)
Notice: Q is another Bayesian network.
Example: given a Bayesian network P(X_1, X_2, X_3) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) and evidence X_2 = x_2:
Q(X_1, X_3) = P(X_1) P(X_3 | X_1, X_2 = x_2)
Weights: given a sample x = (x_1, ..., x_n),
w = P(x, e) / Q(x)
  = [ ∏_{X_i ∈ X\E} P(x_i | pa_i, e) · ∏_{E_j ∈ E} P(e_j | pa_j) ] / ∏_{X_i ∈ X\E} P(x_i | pa_i, e)
  = ∏_{E_j ∈ E} P(e_j | pa_j)

  3. Likelihood Weighting: Estimates
Estimate P(e):  P̂(e) = (1/T) Σ_{t=1}^T w^(t)
Estimate posterior marginals:
P̂(x_i | e) = P̂(x_i, e) / P̂(e) = [ Σ_{t=1}^T w^(t) g_{x_i}(x^(t)) ] / [ Σ_{t=1}^T w^(t) ]
where g_{x_i}(x^(t)) = 1 if x_i^(t) = x_i and equals zero otherwise.
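The estimates above can be exercised on a toy problem. A minimal sketch, assuming a hypothetical two-node chain X1 → X2 with made-up CPT entries and evidence X2 = 1 (exact P(e) = 0.3·0.8 + 0.7·0.4 = 0.52):

```python
import random

random.seed(0)

# Hypothetical two-node network X1 -> X2 with evidence X2 = 1.
P_X1 = 0.3                      # P(X1 = 1)
P_X2_given = {1: 0.8, 0: 0.4}   # P(X2 = 1 | X1)

T = 100_000
weights, weighted_hits = [], 0.0
for _ in range(T):
    x1 = 1 if random.random() < P_X1 else 0   # sample non-evidence variable from its CPT
    w = P_X2_given[x1]                        # weight = P(e_j | pa_j) for the evidence node
    weights.append(w)
    if x1 == 1:
        weighted_hits += w

p_e = sum(weights) / T                        # estimate of P(e); exact value is 0.52
p_x1_given_e = weighted_hits / sum(weights)   # estimate of P(X1=1 | e); exact 0.24/0.52
print(p_e, p_x1_given_e)
```

Both estimators are the T-sample averages from the slide, with g picking out the samples where X1 = 1.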

  4. Likelihood Weighting
• Converges to the exact posterior marginals
• Generates samples fast
• The sampling distribution is close to the prior (especially if E ⊆ leaf nodes)
• Increasing sampling variance ⇒ convergence may be slow
• Many samples with P(x^(t)) = 0 are rejected

  5. Outline
• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• Error estimation
• State-of-the-art importance sampling techniques


  7. Outline
• Definitions and Background on Statistics
• Theory of importance sampling
• Likelihood weighting
• State-of-the-art importance sampling techniques

  8. Proposal selection
One should try to select a proposal that is as close as possible to the posterior distribution.
Var_Q[ P̂(e) ] = Var_Q[ w(Z) ] / N = (1/N) [ Σ_{z ∈ Z} Q(z) ( P(z, e) / Q(z) )² − P(e)² ]
To have a zero-variance estimator:
P(z, e) / Q(z) − P(e) = 0  ⇒  Q(z) = P(z, e) / P(e) = P(z | e)
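The zero-variance condition can be verified by direct enumeration. A sketch over a single binary variable z with a made-up unnormalized target P(z, e):

```python
# Hypothetical unnormalized target over a single binary variable z.
P_ze = {0: 0.12, 1: 0.28}          # P(z, e)
P_e = sum(P_ze.values())           # P(e) = 0.4

def var_w(Q):
    """Variance of the importance weight w(z) = P(z,e)/Q(z) under Q."""
    second_moment = sum(Q[z] * (P_ze[z] / Q[z]) ** 2 for z in Q)
    return second_moment - P_e ** 2

Q_uniform = {0: 0.5, 1: 0.5}
Q_posterior = {z: p / P_e for z, p in P_ze.items()}   # Q(z) = P(z | e)

print(var_w(Q_uniform))    # strictly positive
print(var_w(Q_posterior))  # zero: the posterior is the zero-variance proposal
```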

  9. Perfect Sampling using Bucket Elimination
• Algorithm:
– Run bucket elimination on the problem along an ordering o = (X_N, ..., X_1).
– Sample along the reverse ordering (X_1, ..., X_N).
– At each variable X_i, recover the probability P(X_i | x_1, ..., x_{i-1}) by referring to its bucket.

  10. Bucket Elimination
Query: P(a | e = 0) ∝ P(a, e = 0).  Elimination order: d, e, b, c.
P(a, e = 0) = Σ_{c, b, e=0, d} P(a) P(b | a) P(c | a) P(d | a, b) P(e | b, c)
            = P(a) Σ_c P(c | a) Σ_b P(b | a) P(e = 0 | b, c) Σ_d P(d | a, b)
Processing the buckets bottom-up (original functions in place, messages passed up the bucket tree):
D:  P(d | a, b)                        →  f_D(a, b) = Σ_d P(d | a, b)
E:  P(e | b, c)                        →  f_E(b, c) = P(e = 0 | b, c)
B:  P(b | a), f_D(a, b), f_E(b, c)     →  f_B(a, c) = Σ_b P(b | a) f_D(a, b) f_E(b, c)
C:  P(c | a), f_B(a, c)                →  f_C(a) = Σ_c P(c | a) f_B(a, c)
A:  P(a), f_C(a)                       →  P(a, e = 0) = P(a) f_C(a)
Time and space are exp(w*).

  11. Bucket Elimination (BE) Algorithm elim-bel (Dechter 1996)
Elimination operator: sum out the bucket's variable.
bucket B:  P(B | A), P(D | B, A), P(e | B, C)
bucket C:  P(C | A), h^B(A, D, C, e)
bucket D:  h^C(A, D, e)
bucket E:  h^D(A, e)
bucket A:  P(A), h^E(A)  →  P(e)
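The bucket computation can be checked against brute-force summation over the joint. A sketch using the same network structure (A → B, A → C, {A,B} → D, {B,C} → E) with randomly generated, purely illustrative CPTs:

```python
import itertools
import random

random.seed(1)

def cpt(n_parents):
    """Random binary CPT: maps a parent assignment to P(child = 1 | parents)."""
    return {pa: random.random() for pa in itertools.product([0, 1], repeat=n_parents)}

# Arbitrary CPTs for the example network A -> B, A -> C, {A,B} -> D, {B,C} -> E.
pB, pC, pD, pE = cpt(1), cpt(1), cpt(2), cpt(2)
pA1 = 0.6

def p(tab, val, pa):
    """P(child = val | parents = pa) from a binary CPT."""
    return tab[pa] if val == 1 else 1.0 - tab[pa]

def joint(a, b, c, d, e):
    pa = pA1 if a == 1 else 1.0 - pA1
    return pa * p(pB, b, (a,)) * p(pC, c, (a,)) * p(pD, d, (a, b)) * p(pE, e, (b, c))

e0 = 0   # evidence E = 0

# Brute force: P(a, e = 0) by summing the full joint.
brute = {a: sum(joint(a, b, c, d, e0)
                for b, c, d in itertools.product([0, 1], repeat=3))
         for a in [0, 1]}

# Bucket elimination along order d, e, b, c (innermost sum first):
# f_D(a,b) = sum_d P(d|a,b) = 1;  f_E(b,c) = P(e0|b,c);
# f_B(a,c) = sum_b P(b|a) f_D f_E;  f_C(a) = sum_c P(c|a) f_B(a,c).
be = {}
for a in [0, 1]:
    fC = 0.0
    for c in [0, 1]:
        fB = sum(p(pB, b, (a,)) * 1.0 * p(pE, e0, (b, c)) for b in [0, 1])
        fC += p(pC, c, (a,)) * fB
    be[a] = (pA1 if a == 1 else 1.0 - pA1) * fC

print(brute, be)   # the two computations agree
```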

  12. Sampling from the Output of BE (Dechter 2002)
bucket B:  P(B | A), P(D | B, A), P(e | B, C)
bucket C:  P(C | A), h^B(A, D, C, e)
bucket D:  h^C(A, D, e)
bucket E:  h^D(A, e)   (evidence bucket: ignore)
bucket A:  P(A), h^E(A)
Sample A = a from Q(A) ∝ P(A) h^E(A).
Set A = a in the buckets; sample D = d from Q(D | a, e) ∝ h^C(a, D, e).
Set A = a, D = d; sample C = c from Q(C | a, d, e) ∝ P(C | a) h^B(a, d, C, e).
Set A = a, D = d, C = c; sample B = b from Q(B | a, d, c, e) ∝ P(B | a) P(d | B, a) P(e | B, c).

  13. Mini-buckets: “local inference”
• Computation in a bucket is time and space exponential in the number of variables involved
• Therefore, partition the functions in a bucket into “mini-buckets” over smaller numbers of variables
• The size of each “mini-bucket” can be controlled, yielding polynomial complexity

  14. Mini-Bucket Elimination
Space and time constraints: the maximum scope size of any newly generated function is bounded (here by 2). BE would generate a function of scope size 3 in bucket B, so it cannot be used; instead bucket B is partitioned into mini-buckets, each summed over B separately:
bucket B:  { P(e | B, C) }  { P(B | A), P(D | B, A) }  →  h^B(C, e) and h^B(A, D)
bucket C:  P(C | A), h^B(C, e)  →  h^C(A, e)
bucket D:  h^B(A, D)  →  h^D(A)
bucket E:  h^C(A, e)  →  h^E(A)
bucket A:  P(A), h^D(A), h^E(A)  →  approximation of P(e)

  15. Sampling from the Output of MBE
bucket B:  { P(e | B, C) }  { P(B | A), P(D | B, A) }  →  h^B(C, e), h^B(A, D)
bucket C:  P(C | A), h^B(C, e)
bucket D:  h^B(A, D)
bucket E:  h^C(A, e)
bucket A:  P(A), h^D(A), h^E(A)
Sampling is the same as in BE-sampling, except that now we construct Q from a randomly selected “mini-bucket”.

  16. IJGP-Sampling (Gogate and Dechter, 2005)
• Iterative Join Graph Propagation (IJGP) – a generalized belief propagation scheme (Yedidia et al., 2002)
• IJGP yields better approximations of P(X|E) than MBE (Dechter, Kask and Mateescu, 2002)
• The output of IJGP has the same form as the mini-bucket “clusters”
• Currently the best-performing IS scheme!

  17. Current Research Question
• Given a Bayesian network with evidence, or a Markov network, representing a function P, generate another Bayesian network representing a function Q (from a family of distributions restricted by structure) such that Q is closest to P.
• Current approaches:
– Mini-buckets
– IJGP
– Both
• Validated experimentally, but still in need of theoretical justification.

  18. Algorithm: Approximate Sampling
1) Run IJGP or MBE
2) At each branch point, compute the edge probabilities by consulting the output of IJGP or MBE
• Rejection problem: some of the generated assignments are non-solutions

  19. Adaptive Importance Sampling
Initial proposal: Q^1(Z) = Q(Z_1) Q(Z_2 | pa(Z_2)) ... Q(Z_n | pa(Z_n))
P̂(E = e) = 0
For i = 1 to k do
    Generate samples z^1, ..., z^N from Q^i
    P̂(E = e) = P̂(E = e) + (1/N) Σ_{j=1}^N w(z^j)
    Update Q^{i+1} = Q^i + γ(i) (Q' − Q^i)
End
Return P̂(E = e) / k
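One simple instantiation of this loop, assuming a made-up target over a single binary variable; the particular choice of Q' here (the weighted empirical distribution of the current batch) is only one possibility, used for illustration:

```python
import random

random.seed(2)

# Hypothetical unnormalized target P(z, e); the exact P(e) is 0.4
# and the zero-variance proposal is P(z | e) = (0.25, 0.75).
P_ze = {0: 0.1, 1: 0.3}

Q = {0: 0.5, 1: 0.5}            # initial proposal Q^1
K, N, gamma = 20, 2000, 0.5
estimate = 0.0
for k in range(K):
    zs = [0 if random.random() < Q[0] else 1 for _ in range(N)]
    ws = [P_ze[z] / Q[z] for z in zs]
    estimate += sum(ws) / N                  # accumulate per-batch estimates of P(e)
    # Update step: move Q toward Q', the weighted empirical distribution.
    total = sum(ws)
    emp = {z: sum(w for zi, w in zip(zs, ws) if zi == z) / total for z in Q}
    Q = {z: (1 - gamma) * Q[z] + gamma * emp[z] for z in Q}

p_e = estimate / K
print(p_e, Q)   # Q drifts toward the posterior P(z | e)
```

Each batch estimate is unbiased for P(e) regardless of the current Q (as long as Q has full support), which is why averaging across iterations is sound.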

  20. Adaptive Importance Sampling
• General case: given k proposal distributions
• Take N samples from each distribution
• Approximate P(e):
P̂(e) = (1/k) Σ_{j=1}^k (average weight under the j-th proposal)

  21. Estimating Q'(z)
Q'(Z) = Q'(Z_1) Q'(Z_2 | pa(Z_2)) ... Q'(Z_n | pa(Z_n))
where each Q'(Z_i | Z_1, ..., Z_{i-1}) is estimated by importance sampling.

  22. Overview
1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

  23. Markov Chain
• A Markov chain is a discrete random process with the property that the next state depends only on the current state (Markov property):
P(x^{t+1} | x^1, x^2, ..., x^t) = P(x^{t+1} | x^t)
• If P(X^t | x^{t-1}) does not depend on t (time homogeneous) and the state space is finite, then it is often expressed as a transition function (aka transition matrix) whose rows sum to one:
Σ_x P(X^{t+1} = x | x^t) = 1

  24. Example: Drunkard’s Walk
A random walk on the number line where, at each step, the position may change by +1 or −1 with equal probability:
D(X) = {0, 1, 2, ...},   P(n → n+1) = P(n → n−1) = 0.5
transition matrix P(X)

  25. Example: Weather Model
D(X) = {rainy, sunny}
Transition matrix P(X):
           P(rainy)  P(sunny)
rainy        0.9       0.1
sunny        0.5       0.5
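For this transition matrix the stationary distribution can be found by power iteration (propagating any starting distribution through the chain until it stops changing); it works out to rainy 5/6, sunny 1/6:

```python
# Weather chain from the slide: keys are (current state, next state).
P = {('rainy', 'rainy'): 0.9, ('rainy', 'sunny'): 0.1,
     ('sunny', 'rainy'): 0.5, ('sunny', 'sunny'): 0.5}
states = ['rainy', 'sunny']

# Power iteration: pi_{n+1}(s) = sum_r pi_n(r) P(r -> s).
pi = {'rainy': 0.5, 'sunny': 0.5}
for _ in range(200):
    pi = {s: sum(pi[r] * P[(r, s)] for r in states) for s in states}

print(pi)   # stationary distribution: rainy = 5/6, sunny = 1/6
```

Solving pi = pi·P directly gives the same answer: 0.1·pi(rainy) = 0.5·pi(sunny), so pi(rainy) = 5·pi(sunny).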

  26. Multi-Variable System
X = {X_1, X_2, X_3},  D(X_i) discrete, finite
• A state is an assignment of values to all the variables:
x^t = {x_1^t, x_2^t, ..., x_n^t}

  27. Bayesian Network System
• A Bayesian network is a representation of the joint probability distribution over two or more variables:
X = {X_1, X_2, X_3},  x^t = {x_1^t, x_2^t, x_3^t}

  28. Stationary Distribution Existence
• If the Markov chain is time-homogeneous, then the vector π(X) is a stationary distribution (aka invariant or equilibrium distribution, aka “fixed point”) if its entries sum up to 1 and satisfy:
π(x_i) = Σ_{x_j ∈ D(X)} π(x_j) P(x_i | x_j)
• A finite-state-space Markov chain has a unique stationary distribution if and only if:
– The chain is irreducible
– All of its states are positive recurrent

  29. Irreducible
• A state x is irreducible if, under the transition rule, the chain has nonzero probability of moving from x to any other state and then coming back in a finite number of steps
• If one state is irreducible, then all the states must be irreducible (Liu, Ch. 12, p. 249, Def. 12.1.1)

  30. Recurrent
• A state x is recurrent if the chain returns to x with probability 1
• Let M(x) be the expected number of steps to return to state x
• State x is positive recurrent if M(x) is finite
• The recurrent states in a finite-state chain are positive recurrent.

  31. Stationary Distribution Convergence
• Consider an infinite Markov chain: P^(n) = P(x^n | x^0), i.e., P^(n) = P^n
• If the chain is both irreducible and aperiodic, then:
lim_{n→∞} P^(n) = π
• The initial state is not important in the limit: “The most useful feature of a ‘good’ Markov chain is its fast forgetfulness of its past…” (Liu, Ch. 12.1)

  32. Aperiodic
• Define d(i) = g.c.d.{n > 0 | it is possible to go from i to i in n steps}, where g.c.d. is the greatest common divisor of the integers in the set. If d(i) = 1 for all i, then the chain is aperiodic
• Positive recurrent, aperiodic states are ergodic

  33. Markov Chain Monte Carlo
• How do we estimate P(X), e.g., P(X|e)?
• Generate samples that form a Markov chain with stationary distribution π = P(X|e)
• Estimate π from the samples (observed states): the visited states x^0, ..., x^n can be viewed as “samples” from distribution π:
π̂(x) = (1/T) Σ_{t=1}^T δ(x, x^t),   lim_{T→∞} π̂(x) = π(x)

  34. MCMC Summary
• Convergence is guaranteed in the limit
• The initial state is not important, but… typically we throw away the first K samples (“burn-in”)
• Samples are dependent, not i.i.d.
• Convergence (mixing rate) may be slow
• The stronger the correlation between states, the slower the convergence!

  35. Gibbs Sampling (Geman & Geman, 1984)
• The Gibbs sampler is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables
• Sample a new value one variable at a time from that variable’s conditional distribution:
P(X_i | x_1^t, ..., x_{i-1}^t, x_{i+1}^t, ..., x_n^t) = P(X_i | x^t \ x_i)
• The samples form a Markov chain with stationary distribution P(X|e)

  36. Gibbs Sampling: Illustration
The process of Gibbs sampling can be understood as a random walk in the space of all instantiations of X = x (remember the drunkard’s walk): in one step we can reach instantiations that differ from the current one in the value of at most one variable (assume a randomized choice of variables X_i).

  37. Ordered Gibbs Sampler
Generate sample x^{t+1} from x^t, processing all variables in some order:
X_1 = x_1^{t+1}, sampled from P(X_1 | x_2^t, x_3^t, ..., x_N^t, e)
X_2 = x_2^{t+1}, sampled from P(X_2 | x_1^{t+1}, x_3^t, ..., x_N^t, e)
...
X_N = x_N^{t+1}, sampled from P(X_N | x_1^{t+1}, x_2^{t+1}, ..., x_{N-1}^{t+1}, e)
In short, for i = 1 to N:
x_i^{t+1} sampled from P(X_i | x^t \ x_i, e)

  38. Transition Probabilities in BN
Given its Markov blanket (parents, children, and the children’s parents), X_i is independent of all other nodes:
markov_i = pa_i ∪ ch_i ∪ ( ∪_{X_j ∈ ch_i} pa_j )
P(X_i | x^t \ x_i) = P(X_i | markov_i^t):
P(x_i | x^t \ x_i) ∝ P(x_i | pa_i) ∏_{X_j ∈ ch_i} P(x_j | pa_j)
Computation is linear in the size of the Markov blanket!

  39. Ordered Gibbs Sampling Algorithm (Pearl, 1988)
Input: X, E = e
Output: T samples {x^t}
Fix evidence E = e, initialize x^0 at random
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     x_i^{t+1} ← sampled from P(X_i | markov_i^t)
4.   End For
5. End For
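A minimal sketch of this loop, assuming a hypothetical three-node network A → B, A → C with made-up CPTs and evidence C = 1 (exact posterior P(A=1 | C=1) = 2/3). A's Markov-blanket conditional is computed in closed form, and both the histogram and the mixture estimator of slide 42 are accumulated:

```python
import random

random.seed(3)

# Hypothetical network A -> B, A -> C with evidence C = 1.
pA = 0.4                   # P(A = 1)
pB = {1: 0.7, 0: 0.2}      # P(B = 1 | A)
pC = {1: 0.9, 0: 0.3}      # P(C = 1 | A)

def p_a_given_blanket(b):
    """P(A=1 | b, C=1) ∝ P(A) P(b|A) P(C=1|A): the Markov-blanket formula."""
    num = {a: (pA if a else 1 - pA)
              * (pB[a] if b else 1 - pB[a]) * pC[a] for a in (0, 1)}
    return num[1] / (num[0] + num[1])

T, a, b = 50_000, 0, 0          # arbitrary initialization x^0
hist = mix = 0.0
for _ in range(T):
    pa1 = p_a_given_blanket(b)
    a = 1 if random.random() < pa1 else 0
    b = 1 if random.random() < pB[a] else 0   # B's blanket is just its parent A
    hist += a          # histogram estimator for P(A=1 | e)
    mix += pa1         # mixture estimator  for P(A=1 | e)

print(hist / T, mix / T)   # both approach the exact posterior 2/3
```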

  40. Gibbs Sampling Example – BN
X = {X_1, X_2, ..., X_9},  E = {X_9}
[Figure: nine-node network; the non-evidence variables are initialized X_1 = x_1^0, ..., X_8 = x_8^0]

  41. Gibbs Sampling Example – BN
X = {X_1, X_2, ..., X_9},  E = {X_9}
x_1^1 ← P(X_1 | x_2^0, ..., x_8^0, x_9)
x_2^1 ← P(X_2 | x_1^1, x_3^0, ..., x_8^0, x_9)
...

  42. Answering Queries P(x_i | e) = ?
• Method 1: count the number of samples where X_i = x_i (histogram estimator), with δ the Dirac delta function:
P̂(X_i = x_i) = (1/T) Σ_{t=1}^T δ(x_i, x_i^t)
• Method 2: average the conditional probability (mixture estimator):
P̂(X_i = x_i) = (1/T) Σ_{t=1}^T P(X_i = x_i | markov_i^t)
• The mixture estimator converges faster (consider the estimates for the unobserved values of X_i; proven via the Rao-Blackwell theorem)

  43. Rao-Blackwell Theorem
Rao-Blackwell Theorem: let the random variable set X be composed of two groups of variables, R and L. Then, for the joint distribution π(R, L) and a function of interest g (e.g., the mean or covariance), the following result applies:
Var[ E{g(R) | L} ] ≤ Var[ g(R) ]
(Casella & Robert, 1996; Liu et al., 1995)
• The theorem makes a weak promise, but works well in practice!
• The improvement depends on the choice of R and L

  44. Importance vs. Gibbs
Gibbs:
x^t ~ P̂(X | e),  where P̂(X | e) → P(X | e) as T → ∞
ĝ = (1/T) Σ_{t=1}^T g(x^t)
Importance:
x^t ~ Q(X | e)
ĝ = (1/T) Σ_{t=1}^T g(x^t) P(x^t) / Q(x^t)

  45. Gibbs Sampling: Convergence
• We sample from P̂(X | e), which converges to P(X | e)
• Converges iff the chain is irreducible and ergodic
• Intuition – we must be able to explore all states: if X_i and X_j are strongly correlated, X_i = 0 ⇔ X_j = 0, then we cannot explore states with X_i = 1 and X_j = 1
• All conditions are satisfied when all probabilities are positive
• The convergence rate can be characterized by the second eigenvalue of the transition matrix

  46. Gibbs: Speeding Convergence
Reduce dependence between samples (autocorrelation):
• Skip samples
• Randomize the variable sampling order
• Employ blocking (grouping)
• Use multiple chains
Reduce variance (covered in the next section)

  47. Blocking Gibbs Sampler
• Sample several variables together, as a block
• Example: given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:
x^{t+1} ← P(X | y^t, z^t) = P(X | w^t)
(y^{t+1}, z^{t+1}) = w^{t+1} ← P(Y, Z | x^{t+1})
+ Can improve convergence greatly when two variables are strongly correlated!
− The domain of the block variable grows exponentially with the number of variables in a block!

  48. Gibbs: Multiple Chains
• Generate M chains of size K
• Each chain produces an independent estimate P_m:
P_m(x_i | e) = (1/K) Σ_{t=1}^K P(x_i | x^t \ x_i)
• Estimate P(x_i | e) as the average of the P_m(x_i | e):
P̂ = (1/M) Σ_{m=1}^M P_m
Treat the P_m as independent random variables.

  49. Gibbs Sampling Summary
• Markov Chain Monte Carlo method (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
• Samples are dependent, form a Markov chain
• We sample from P̂(X | e), which converges to P(X | e)
• Guaranteed to converge when all P > 0
• Methods to improve convergence:
– Blocking
– Rao-Blackwellisation

  50. Overview
1. Probabilistic Reasoning/Graphical models
2. Importance Sampling
3. Markov Chain Monte Carlo: Gibbs Sampling
4. Sampling in presence of Determinism
5. Rao-Blackwellisation
6. AND/OR importance sampling

  51. Sampling: Performance
• Gibbs sampling – reduce dependence between samples
• Importance sampling – reduce variance
• Achieve both by sampling a subset of the variables and integrating out the rest (reduce dimensionality), aka Rao-Blackwellisation
• Exploit graph structure to manage the extra cost

  52. Smaller Subset State-Space
• A smaller state-space is easier to cover:
X = {X_1, X_2, X_3, X_4},  |D(X)| = 64
X = {X_1, X_2},  |D(X)| = 16

  53. Smoother Distribution
[Figure: bar plots of the joint P(X_1, X_2, X_3, X_4) vs. the marginal P(X_1, X_2), probability ranges 0–0.1, 0.1–0.2, 0.2–0.26; the lower-dimensional distribution is smoother]

  54. Speeding Up Convergence
• Mean squared error of the estimator:
MSE_Q[P̂] = BIAS² + Var_Q[P̂]
• In the case of an unbiased estimator, BIAS = 0:
MSE_Q[P̂] = Var_Q[P̂] = E_Q[ (P̂ − E_Q[P̂])² ]
• Reduce variance ⇒ speed up convergence!

  55. Rao-Blackwellisation
X = R ∪ L
ĝ_T(x) = (1/T) { h(x^1) + ... + h(x^T) }
g̃_T(x) = (1/T) { E[h(x) | l^1] + ... + E[h(x) | l^T] }
Var{g(x)} = Var{E[g(x) | l]} + E{Var[g(x) | l]}
⇒ Var{g(x)} ≥ Var{E[g(x) | l]},  in particular  Var{h(x)} ≥ Var{E[h(x) | l]}
⇒ Var{ĝ_T(x)} ≥ Var{g̃_T(x)}
(Liu, Ch. 2.3)
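The variance ordering above can be demonstrated empirically. A sketch with a made-up two-variable model x = (r, l), where h(x) = r and the conditional expectation E[h | l] is available analytically; the two estimators are compared over many repeated trials:

```python
import random

random.seed(4)

# Hypothetical model: l ~ Uniform{0,1}, r | l ~ Bernoulli(p_given_l[l]).
# With h(x) = r, E[h] = 0.55 and E[h | l] = p_given_l[l].
p_given_l = {0: 0.2, 1: 0.9}

def trial(T=200):
    raw = rb = 0.0
    for _ in range(T):
        l = random.randint(0, 1)
        r = 1 if random.random() < p_given_l[l] else 0
        raw += r                 # plain Monte Carlo estimator: h(x^t)
        rb += p_given_l[l]       # Rao-Blackwellised: E[h(x) | l^t], computed analytically
    return raw / T, rb / T

M = 2000
ests = [trial() for _ in range(M)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

v_raw = var([e[0] for e in ests])
v_rb = var([e[1] for e in ests])
print(v_raw, v_rb)   # the Rao-Blackwellised estimator has the lower variance
```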

  56. Rao-Blackwellisation
“Carry out analytical computation as much as possible” – Liu
• X = R ∪ L
• Importance sampling:
Var_Q{ P(R, L) / Q(R, L) } ≥ Var_Q{ P(R) / Q(R) }   (Liu, Ch. 2.5.5)
• Gibbs sampling:
– autocovariances are lower (less correlation between samples)
– if X_i and X_j are strongly correlated, X_i = 0 ⇔ X_j = 0, include only one of them in the sampling set

  57. Blocking Gibbs Sampler vs. Collapsed
Three schemes over variables X, Y, Z, ordered from slower to faster convergence:
(1) Standard Gibbs: P(x | y, z), P(y | x, z), P(z | x, y)
(2) Blocking: P(x | y, z), P(y, z | x)
(3) Collapsed: P(x | y), P(y | x)
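The three schemes differ only in what is summed out analytically. A sketch deriving the standard and collapsed conditionals from a randomly generated joint table over three binary variables (the table entries are arbitrary):

```python
import itertools
import random

random.seed(5)

# Hypothetical joint P(x, y, z) over three binary variables, normalized at random.
raw = {xyz: random.random() for xyz in itertools.product([0, 1], repeat=3)}
Z = sum(raw.values())
P = {k: v / Z for k, v in raw.items()}

def cond_x_given_yz(y, z):
    """Standard Gibbs conditional P(x=1 | y, z)."""
    return P[(1, y, z)] / (P[(0, y, z)] + P[(1, y, z)])

def cond_x_given_y(y):
    """Collapsed conditional P(x=1 | y): Z is summed out analytically."""
    num = sum(P[(1, y, z)] for z in (0, 1))
    den = sum(P[(x, y, z)] for x in (0, 1) for z in (0, 1))
    return num / den

print(cond_x_given_yz(0, 0), cond_x_given_y(0))
```

Since P(x=1 | y) = Σ_z P(x=1 | y, z) P(z | y), the collapsed conditional is a convex combination of the standard ones, which is why it always lies between them.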

  58. Collapsed Gibbs Sampling: Generating Samples
Generate sample c^{t+1} from c^t:
C_1 = c_1^{t+1}, sampled from P(C_1 | c_2^t, c_3^t, ..., c_K^t, e)
C_2 = c_2^{t+1}, sampled from P(C_2 | c_1^{t+1}, c_3^t, ..., c_K^t, e)
...
C_K = c_K^{t+1}, sampled from P(C_K | c_1^{t+1}, c_2^{t+1}, ..., c_{K-1}^{t+1}, e)
In short, for i = 1 to K:
c_i^{t+1} sampled from P(C_i | c^t \ c_i, e)

  59. Collapsed Gibbs Sampler
Input: C ⊆ X, E = e
Output: T samples {c^t}
Fix evidence E = e, initialize c^0 at random
1. For t = 1 to T (compute samples)
2.   For i = 1 to K (loop through the cutset variables)
3.     c_i^{t+1} ← sampled from P(C_i | c^t \ c_i, e)
4.   End For
5. End For

  60. Calculation Time
• Computing P(c_i | c^t \ c_i, e) is more expensive (it requires inference)
• Trading the number of samples for smaller variance:
– generate more samples with higher covariance, or
– generate fewer samples with lower covariance
• Must control the time spent computing sampling probabilities in order to be time-effective!

  61. Exploiting Graph Properties
Recall… computation time is exponential in the adjusted induced width of the graph:
• a w-cutset is a subset of variables such that, when they are observed, the induced width of the graph is w
• when the sampled variables form a w-cutset, inference is exp(w) (e.g., using Bucket Tree Elimination)
• a cycle-cutset is a special case of a w-cutset
Sampling a w-cutset ⇒ w-cutset sampling!

  62. What If C = Cycle-Cutset?
c^0 = {x_2^0, x_5^0},  E = {X_9}
[Figure: the nine-node network; observing the cutset {X_2, X_5} together with the evidence X_9 breaks all cycles, leaving a poly-tree]
P(x_2, x_5, x_9) can be computed using Bucket Elimination; the computation complexity is O(N).

  63. Computing Transition Probabilities
Compute joint probabilities with BE:
BE: P(x_2 = 0, x_3, x_9)
BE: P(x_2 = 1, x_3, x_9)
Normalize, with α = [ P(x_2 = 0, x_3, x_9) + P(x_2 = 1, x_3, x_9) ]^{-1}:
P(x_2 = 0 | x_3) = α P(x_2 = 0, x_3, x_9)
P(x_2 = 1 | x_3) = α P(x_2 = 1, x_3, x_9)

  64. Cutset Sampling – Answering Queries
• Query: for c_i ∈ C, P(c_i | e) = ?  Same as Gibbs:
P̂(c_i | e) = (1/T) Σ_{t=1}^T P(c_i | c^t \ c_i, e)
computed while generating sample t, using bucket tree elimination.
• Query: for x_i ∈ X \ C, P(x_i | e) = ?
P̂(x_i | e) = (1/T) Σ_{t=1}^T P(x_i | c^t, e)
computed after generating sample t, using bucket tree elimination.

  65. Cutset Sampling vs. Cutset Conditioning
• Cutset conditioning:
P(x_i | e) = Σ_{c ∈ D(C)} P(x_i | c, e) P(c | e)
• Cutset sampling:
P̂(x_i | e) = (1/T) Σ_{t=1}^T P(x_i | c^t, e)
           = Σ_{c ∈ D(C)} P(x_i | c, e) · count(c)/T
           → Σ_{c ∈ D(C)} P(x_i | c, e) P(c | e)
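The identity underlying both schemes can be checked by enumeration on a made-up joint over a cutset variable C, a query variable X, and an evidence variable E:

```python
import itertools
import random

random.seed(6)

# Hypothetical joint P(c, x, e_val) with C ternary, X and E binary,
# normalized at random.
raw = {cxe: random.random()
       for cxe in itertools.product(range(3), [0, 1], [0, 1])}
Z = sum(raw.values())
P = {k: v / Z for k, v in raw.items()}
e = 1   # observed evidence value

def marg(pred):
    """Sum the joint over all entries whose key satisfies pred."""
    return sum(v for k, v in P.items() if pred(k))

p_e = marg(lambda k: k[2] == e)

# Direct computation of P(x=1 | e).
direct = marg(lambda k: k[1] == 1 and k[2] == e) / p_e

# Cutset-conditioning form: sum_c P(x=1 | c, e) P(c | e).
cutset = 0.0
for c in range(3):
    p_ce = marg(lambda k, c=c: k[0] == c and k[2] == e)   # P(c, e)
    p_x_given_ce = P[(c, 1, e)] / p_ce                    # P(x=1 | c, e)
    cutset += p_x_given_ce * (p_ce / p_e)                 # weight by P(c | e)

print(direct, cutset)   # the two expressions coincide
```

Cutset sampling replaces the exact weights P(c | e) with empirical frequencies count(c)/T, which is why the estimate converges to the conditioning sum.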

  66. Cutset Sampling Example
Estimating P(x_2 | e) for the sampled node X_2:
Sample 1: x_2^1 ← P(x_2 | x_5^0, x_9)
Sample 2: x_2^2 ← P(x_2 | x_5^1, x_9)
Sample 3: x_2^3 ← P(x_2 | x_5^2, x_9)
P̂(x_2 | x_9) = (1/3) [ P(x_2 | x_5^0, x_9) + P(x_2 | x_5^1, x_9) + P(x_2 | x_5^2, x_9) ]

  67. Cutset Sampling Example
Estimating P(x_3 | e) for the non-sampled node X_3:
c^1 = {x_2^1, x_5^1}:  P(x_3 | x_2^1, x_5^1, x_9)
c^2 = {x_2^2, x_5^2}:  P(x_3 | x_2^2, x_5^2, x_9)
c^3 = {x_2^3, x_5^3}:  P(x_3 | x_2^3, x_5^3, x_9)
P̂(x_3 | x_9) = (1/3) [ P(x_3 | x_2^1, x_5^1, x_9) + P(x_3 | x_2^2, x_5^2, x_9) + P(x_3 | x_2^3, x_5^3, x_9) ]

  68. CPCS54 Test Results
[Figure: MSE vs. number of samples (left) and vs. time (right), cutset sampling vs. Gibbs]
CPCS54: n = 54, |C| = 15, |E| = 3; ergodic, D(X_i) = 2.
Exact time = 30 sec using cutset conditioning.

  69. CPCS179 Test Results
[Figure: MSE vs. number of samples (left) and vs. time (right), cutset sampling vs. Gibbs]
CPCS179: n = 179, |C| = 8, |E| = 35; non-ergodic (1 deterministic CPT entry), 2 ≤ D(X_i) ≤ 4.
Exact time = 122 sec using cutset conditioning.
