Bayesian Belief Network Inference (RN, Chapter 14.4)
Decision Theoretic Agents
• Introduction to Probability [Ch13]
• Belief networks [Ch14]
    • Introduction [Ch14.1-14.2]
    • Bayesian Net Inference [Ch14.4] (Bucket Elimination)
• Dynamic Belief Networks [Ch15]
• Single Decision [Ch16]
• Sequential Decisions [Ch17]
    • Game Theory [Ch17.6-17.7]
Types of Reasoning
• Typical case: P( QueryVar | EvidenceVars = vals )
    • Eg: P( +Burglary | +JohnCalls, ¬MaryCalls )
• Diagnostic: from effect to (possible) causes
    P( +Burglary | +JohnCalls ) = 0.016
• Causal: from cause to effects
    P( +JohnCalls | +Burglary ) = 0.86
• InterCausal: between causes of a common effect
    P( +Burglary | +Alarm ) = 0.376
    P( +Burglary | +Alarm, +Earthquake ) = 0.003
    Earthquake EXPLAINS the alarm, and so Earthquake EXPLAINS AWAY burglary
• Mixed: combinations of the above
    P( Alarm | JohnCalls, ¬Earthquake ) = 0.03
Approaches to Belief Assessment
• Exact, Guaranteed
    • PolyTree Algorithm
    • Inherent complexity. . .
    • Clustering Approach
    • Bucket Elimination
    • CutSet Approach
• Approximate, Guaranteed
    • Algorithm Modification
    • Value Merging
    • Node Merging
    • Arc Removal
• Approximate, Probabilistic
    • Logic Sampling
    • Likelihood Sampling
Inherent Complexity
• Worst case:
    • NP-hard to get exact answer (#P-complete)
    • NP-hard to get answer within 0.5
    • Cannot get relative error within 2^(n^(1-ε)) unless P = NP
    • Cannot stochastically approximate 1 bit, unless P = RP
    [Figure: example clauses 1. A ∨ B ∨ C, 2. C ∨ D ∨ ¬A, 3. B ∨ C ∨ ¬D]
• Efficient algorithm . . .
    • for "PolyTree": poly time
        (PolyTree: ≤ 1 path between any two nodes)
    • if CPtable "bounded" (sub-exp time) wrt λ = M/m,
        where M = largest CPtable entry, m = smallest
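The PolyTree condition (≤ 1 path between any two nodes) can be checked mechanically. The sketch below is one possible test, not part of the slides: it ignores arc directions and uses a union-find structure to detect whether any arc closes a loop. The function name `is_polytree` and both arc lists are assumptions made for illustration (the burglary network, and a four-node network in which A is a parent of B and C, which are both parents of D).

```python
def is_polytree(nodes, arcs):
    """Return True if the DAG's undirected skeleton has no cycle.

    A polytree has at most one undirected path between any two nodes,
    which holds exactly when no arc closes a loop in the skeleton.
    """
    parent = {n: n for n in nodes}  # union-find forest over the nodes

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v were already connected: second path
        parent[ru] = rv
    return True


# Assumed arc lists (parent, child) for the two example networks.
burglary = [("B", "A"), ("E", "A"), ("A", "J"), ("A", "M")]
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

print(is_polytree("BEAJM", burglary))  # True
print(is_polytree("ABCD", diamond))    # False
```

Under these assumptions the burglary network qualifies for the poly-time algorithm, while the four-node network does not, because B and C give two distinct paths from A to D.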
Exact Inference: Re-arrange Sums
Marginalization: P(A=a) = ∑_b P(A=a, B=b)

P(+b, +j, +m) = ∑_e ∑_a P(+b, E=e, A=a, +j, +m)
              = ∑_e ∑_a P(+b) P(e) P(a|+b,e) P(+j|a) P(+m|a)
              = P(+b) ∑_e P(e) ∑_a P(a|+b,e) P(+j|a) P(+m|a)
Still Duplicated Computation!
P(+b, +j, +m) = P(+b) ∑_e P(e) ∑_a P(a|+b,e) P(+j|a) P(+m|a)
• Enumeration is inefficient: it repeats computation
    • Computes P(+j|a) P(+m|a) once for each value of E: { +e, -e }
• Better to use a DAG of subexpressions and re-use each COMMON SUBEXPRESSION!
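A small numeric sketch of the point above: the naive double sum and the rearranged sum give the same value, but the rearranged form computes P(+j|a) P(+m|a) only once per value of a. The CPtable values are the burglary-network entries used in this chapter; the entries 0.94 and 0.001 are assumed here as complements of the printed values 0.06 and 0.999. All function and variable names are illustrative.

```python
# CPtable entries for the burglary network (True = "+").
P_B = 0.001                       # P(+b)
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95,        # P(+a | b, e), keyed by (b, e)
       (True, False): 0.94,       # assumed: 1 - 0.06
       (False, True): 0.29,
       (False, False): 0.001}     # assumed: 1 - 0.999
P_J = {True: 0.90, False: 0.05}   # P(+j | a)
P_M = {True: 0.70, False: 0.01}   # P(+m | a)


def p_a(a, b, e):
    """P(A = a | b, e)."""
    p = P_A[(b, e)]
    return p if a else 1.0 - p


def enumerate_joint():
    """Naive double sum: every term multiplies all five factors."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            total += P_B * P_E[e] * p_a(a, True, e) * P_J[a] * P_M[a]
    return total


def rearranged():
    """Push sums inward and cache the part that does not depend on e."""
    jm = {a: P_J[a] * P_M[a] for a in (True, False)}  # computed once
    inner = {e: sum(p_a(a, True, e) * jm[a] for a in (True, False))
             for e in (True, False)}
    return P_B * sum(P_E[e] * inner[e] for e in (True, False))


print(enumerate_joint(), rearranged())  # both about 0.000592
```

With only two binary hidden variables the saving is small, but the cached term `jm` is exactly the kind of common subexpression that bucket elimination stores as a factor.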
Bucket-Elimination: Set-up
• Given
    • specific structure [Figure: network over nodes A, B, C, D]
    • specific CPtable entries

      θ_{A=1}   θ_{A=0}
      0.4       0.6

      a    θ_{B=1|A=a}   θ_{B=0|A=a}
      1    0.325         0.675
      0    0.440         0.550

      a    θ_{C=1|A=a}   θ_{C=0|A=a}
      1    0.200         0.800
      0    0.367         0.633

      b  c    θ_{D=1|B=b,C=c}   θ_{D=0|B=b,C=c}
      1  1    0.300             0.700
      1  0    0.333             0.667
      0  1    0.250             0.750
      0  0    0.450             0.550

• Fixed ordering over variables: π_0 = 〈A, B, C, D〉
• Create |Vars| + 1 buckets
    • b_{}, b_A, b_B, b_C, b_D
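As a rough illustration of the set-up, the sketch below creates the |Vars| + 1 buckets for the ordering 〈A, B, C, D〉 and files each CPtable under one of them. It assumes the usual bucket-elimination rule that a table goes into the bucket of its highest-index variable; the function name `assign_buckets` and the factor labels are hypothetical.

```python
def assign_buckets(order, scopes):
    """Place each factor in the bucket of its highest-index variable.

    order  -- list of variable names, lowest index first
    scopes -- dict mapping a factor name to the variables it mentions
    """
    index = {var: i for i, var in enumerate(order)}
    buckets = {var: [] for var in order}
    buckets["{}"] = []  # extra bucket for factors with no variables
    for name, scope in scopes.items():
        if scope:
            top = max(scope, key=index.__getitem__)
            buckets[top].append(name)
        else:
            buckets["{}"].append(name)
    return buckets


order = ["A", "B", "C", "D"]
scopes = {
    "theta_A": ["A"],
    "theta_B|A": ["B", "A"],
    "theta_C|A": ["C", "A"],
    "theta_D|B,C": ["D", "B", "C"],
}
buckets = assign_buckets(order, scopes)
for name in ["{}"] + order:
    print(name, buckets[name])
```

With no evidence, each bucket other than b_{} starts with exactly one CPtable, and b_{} starts empty.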
Evidence: -b, +j, +m

f_B(b) = λ〈b〉. f(b)
    b    f(b)
    0    0.999
    1    0.001

f_E(e) = λ〈e〉. f(e)
    e    f(e)
    0    0.998
    1    0.002

f_A(a,e,b) = λ〈A,E,B〉. f(a,e,b)
    a  e  b    f(a,e,b)
    1  1  1    0.95
    1  1  0    0.29
    :  :  :    :
    0  0  1    0.06
    0  0  0    0.999

f_J(j,a) = λ〈J,A〉. f(j,a)
    j  a    f(j,a)
    1  1    0.90
    1  0    0.05
    0  1    0.10
    0  0    0.95

f_M(m,a) = λ〈M,A〉. f(m,a)
    m  a    f(m,a)
    1  1    0.70
    1  0    0.01
    0  1    0.30
    0  0    0.99
Evidence: -b, +j, +m
Instantiating the evidence turns each table into a function of its remaining variables only:

f_{-b}() = λ〈〉. f(-b)
    b    f(b)
    0    0.999
    1    0.001

f_E(e) = λ〈e〉. f(e)
    e    f(e)
    0    0.998
    1    0.002

f_{A,-b}(a,e) = λ〈A,E〉. f(a,e,-b)
    a  e  b    f(a,e,b)
    1  1  1    0.95
    1  1  0    0.29
    :  :  :    :
    0  0  1    0.06
    0  0  0    0.999

f_{+j}(a) = λ〈A〉. f(+j,a)
    j  a    f(j,a)
    1  1    0.90
    1  0    0.05
    0  1    0.10
    0  0    0.95

f_{+m}(a) = λ〈A〉. f(+m,a)
    m  a    f(m,a)
    1  1    0.70
    1  0    0.01
    0  1    0.30
    0  0    0.99
Evidence: -b, +j, +m
After instantiation, only the rows consistent with the evidence remain:

f_{-b}() = λ〈〉. f(-b)
    f(-b)
    0.999

f_E(e) = λ〈e〉. f(e)
    e    f(e)
    0    0.998
    1    0.002

f_{A,-b}(a,e) = λ〈A,E〉. f(a,e,-b)
    a  e    f(a,e,-b)
    1  1    0.29
    :  :    :
    0  0    0.999

f_{+j}(a) = λ〈A〉. f(+j,a)
    a    f(+j,a)
    1    0.90
    0    0.05

f_{+m}(a) = λ〈A〉. f(+m,a)
    a    f(+m,a)
    1    0.70
    0    0.01

Initial buckets:
    Bucket   Factors
    b_{}     f_{},1() = θ_{-b}
    b_B      -
    b_E      f_E,1(e) = θ_e
    b_A      f_A,1(a,e) = θ_{a|-b,e};  f_A,2(a) = θ_{+j|a};  f_A,3(a) = θ_{+m|a}
    b_J      -
    b_M      -
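The instantiation step can be sketched as a small table operation: keep only the rows that agree with the evidence, then drop the evidence columns. The representation (a scope tuple plus a dict keyed by value tuples) and the name `restrict` are assumptions for illustration, not the slides' notation.

```python
def restrict(scope, table, evidence):
    """Fix evidence variables in a factor and drop them from its scope.

    scope    -- tuple of variable names, e.g. ("J", "A")
    table    -- dict mapping value tuples (in scope order) to numbers
    evidence -- dict mapping variable name to its observed value
    """
    keep = [i for i, v in enumerate(scope) if v not in evidence]
    new_scope = tuple(scope[i] for i in keep)
    new_table = {}
    for values, p in table.items():
        # A row survives only if it agrees with every observed variable.
        if all(evidence.get(v, x) == x for v, x in zip(scope, values)):
            new_table[tuple(values[i] for i in keep)] = p
    return new_scope, new_table


f_J = {(1, 1): 0.90, (1, 0): 0.05, (0, 1): 0.10, (0, 0): 0.95}
f_M = {(1, 1): 0.70, (1, 0): 0.01, (0, 1): 0.30, (0, 0): 0.99}
f_B = {(0,): 0.999, (1,): 0.001}
evidence = {"B": 0, "J": 1, "M": 1}   # -b, +j, +m

print(restrict(("J", "A"), f_J, evidence))  # function of A only
print(restrict(("M", "A"), f_M, evidence))  # function of A only
print(restrict(("B",), f_B, evidence))      # function of no variables
```

The restricted J and M factors depend only on A, which is why they end up in bucket b_A, and the restricted B factor is a single number, which is why it ends up in b_{}.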
"Variable Elimination": Factors
P(-b, +j, +m) = P(-b) ∑_e P(e) ∑_a P(a|-b,e) P(+j|a) P(+m|a)
    (the five terms correspond, in order, to B, E, A, J, M)
• Store intermediate results (factors) to avoid recomputation
• Factor for M: 2-element vector
• Factor for J:
• Factor for A: ≡ 4-element vector
BE Alg, cont'd
Process buckets, from highest to lowest
• g_X := elim_X [ f_X,1 ⋈ f_X,2 ⋈ … ⋈ f_X,k ]
• g_X is a function of ∪_i Vars( f_X,i ) - { X }
• Let the highest index among these variables be "Y"; store g_X into b_Y
Process b_A
• g_A(e) = elim_A [ f_A,1 ⋈ f_A,2 ⋈ f_A,3 ]
• add to b_E …

    Bucket   Factors
    b_{}     f_{},1() = θ_{-b}
    b_B      -
    b_E      f_E,1(e) = θ_e;  f_E,2(e) = elim_A [ f_A,1 ⋈ f_A,2 ⋈ f_A,3 ]
    b_A      f_A,1(a,e) = θ_{a|-b,e};  f_A,2(a) = θ_{+j|a};  f_A,3(a) = θ_{+m|a}
    b_J      -
    b_M      -
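To make the b_A step concrete, here is a minimal numeric sketch of g_A(e) = elim_A [ f_A,1 ⋈ f_A,2 ⋈ f_A,3 ], where the join is a pointwise product and elim_A sums over the values of A. Two entries of f_A,1 (0.71 and 0.001) are not printed in the tables and are assumed here to be the complements of the printed 0.29 and 0.999.

```python
# Instantiated factors sitting in bucket b_A (evidence -b, +j, +m).
f_A1 = {(1, 1): 0.29, (0, 1): 0.71,     # theta_{a | -b, e}, keyed (a, e)
        (1, 0): 0.001, (0, 0): 0.999}   # 0.71 and 0.001 are assumed
f_A2 = {1: 0.90, 0: 0.05}               # theta_{+j | a}
f_A3 = {1: 0.70, 0: 0.01}               # theta_{+m | a}

# Join the three factors, then eliminate A by summing it out.
g_A = {e: sum(f_A1[(a, e)] * f_A2[a] * f_A3[a] for a in (0, 1))
       for e in (0, 1)}
print(g_A)
```

The result is a function of E alone, so it is stored in b_E as the new factor f_E,2.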
BE Alg, cont'd
Process buckets, from highest to lowest
• g_X := elim_X [ f_X,1 ⋈ f_X,2 ⋈ … ⋈ f_X,k ]
• g_X is a function of ∪_i Vars( f_X,i ) - { X }
• Let the highest index among these variables be "Y"; store g_X into b_Y
Process b_E
• g_E() = elim_E [ f_E,1 ⋈ f_E,2 ]
• add to b_{} …

    Bucket   Factors
    b_{}     f_{},1() = θ_{-b};  f_{},2() = elim_E [ f_E,1 ⋈ f_E,2 ]
    b_B      -
    b_E      f_E,1(e) = θ_e;  f_E,2(e) = …
    b_A      f_A,1(a,e) = θ_{a|-b,e};  f_A,2(a) = θ_{+j|a};  f_A,3(a) = θ_{+m|a}
    b_J      -
    b_M      -
BE Alg, cont'd
Process buckets, from highest to lowest
• g_X := elim_X [ f_X,1 ⋈ f_X,2 ⋈ … ⋈ f_X,k ]
• g_X is a function of ∪_i Vars( f_X,i ) - { X }
• Let the highest index among these variables be "Y"; store g_X into b_Y
Process b_{}
• g_{}() = [ f_{},1 ⋈ f_{},2 ]
• Return g_{}

    Bucket   Factors
    b_{}     f_{},1() = θ_{-b};  f_{},2() = …
    b_B      -
    b_E      f_E,1(e) = θ_e;  f_E,2(e) = …
    b_A      f_A,1(a,e) = θ_{a|-b,e};  f_A,2(a) = θ_{+j|a};  f_A,3(a) = θ_{+m|a}
    b_J      -
    b_M      -

Return f_{},1 ⋈ f_{},2
Bucket Elimination Algorithm
Given:
• Belief Net BN = 〈N, A, C〉
• Order of nodes π = 〈X_1, …, X_|N|〉
• Evidence (nodes {E_i} ⊂ N, values {e_i})
• (Single) Query node X ∈ N
Compute: P(X | E_1 = e_1, …) by computing P(X = x, E_1 = e_1, …) ∀x, then normalizing over x
• Step #1: Initialize |N| + 1 "buckets"
    • . . . bucket b_i for variable X_i
    • Each "instantiated form of CPtable" is a function of variables
    • Store it in the bucket with the highest index among those variables
• Step #2: Process each bucket
    • . . . from highest index down
    • to eliminate the associated variable
• Step #3: Read off answer
    • . . . in "top" bucket, b_{}
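The three steps can be sketched in a few dozen lines. This is an illustrative implementation under simplifying assumptions, not the slides' code: all variables are binary (values 0/1), a factor is a pair (scope tuple, table dict), and the factors passed in are already instantiated with the evidence and with one value of the query variable. The names `join`, `eliminate`, `bucket_elimination`, and `burglary_factors` are hypothetical; the CPtable values are the burglary-network entries, with unprinted rows taken as complements.

```python
from itertools import product


def join(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for values in product((0, 1), repeat=len(scope)):
        row = dict(zip(scope, values))
        table[values] = (ft[tuple(row[v] for v in fs)]
                         * gt[tuple(row[v] for v in gs)])
    return scope, table


def eliminate(f, var):
    """Sum a variable out of a factor."""
    scope, table = f
    i = scope.index(var)
    out = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], out


def bucket_elimination(order, factors):
    """Return P(X = x, E = e) for evidence-instantiated factors."""
    index = {v: i for i, v in enumerate(order)}
    buckets = {i: [] for i in range(-1, len(order))}  # -1 plays b_{}

    def store(f):
        # Step 1 rule: file under the highest-index variable in scope.
        top = max((index[v] for v in f[0]), default=-1)
        buckets[top].append(f)

    for f in factors:
        store(f)
    for i in range(len(order) - 1, -1, -1):  # Step 2: highest first
        if buckets[i]:
            g = buckets[i][0]
            for f in buckets[i][1:]:
                g = join(g, f)
            store(eliminate(g, order[i]))
    result = 1.0                             # Step 3: read off b_{}
    for _, table in buckets[-1]:
        result *= table[()]
    return result


def burglary_factors(b):
    """Factors after instantiating B = b, +j, +m."""
    p_b = {1: 0.001, 0: 0.999}[b]
    p_a = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
    f_A1 = {}
    for e in (0, 1):                         # keyed (a, e)
        f_A1[(1, e)] = p_a[(b, e)]
        f_A1[(0, e)] = 1.0 - p_a[(b, e)]
    return [
        ((), {(): p_b}),
        (("E",), {(1,): 0.002, (0,): 0.998}),
        (("A", "E"), f_A1),
        (("A",), {(1,): 0.90, (0,): 0.05}),  # theta_{+j | a}
        (("A",), {(1,): 0.70, (0,): 0.01}),  # theta_{+m | a}
    ]


order = ["B", "E", "A", "J", "M"]
joint = {b: bucket_elimination(order, burglary_factors(b)) for b in (0, 1)}
posterior = joint[1] / (joint[0] + joint[1])  # P(+b | +j, +m)
print(joint, posterior)
```

Running the elimination once per query value and normalizing gives P(+b | +j, +m) of roughly 0.284 under these tables. The cost is driven by the largest intermediate factor, which depends on the chosen ordering.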
Remove "Dead Variables"
P(+b, +j) = ∑_e ∑_a ∑_m P(+b, E=e, A=a, +j, M=m)
          = ∑_e ∑_a ∑_m P(+b) P(e) P(a|+b,e) P(+j|a) P(m|a)
          = P(+b) ∑_e P(e) ∑_a P(a|+b,e) P(+j|a) ∑_m P(m|a)
• Note: for any A = a, ∑_m P( M=m | a ) = 1 ⇒ can remove this node!
• In general: need to keep only the query and evidence nodes and the nodes ABOVE them (their ancestors); remove any nodes below
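The pruning rule can be sketched as a walk up the parent links from the query and evidence nodes. The function name `relevant_nodes` and the `parents` dictionary (the burglary network's structure) are assumptions for illustration.

```python
def relevant_nodes(parents, targets):
    """Return the targets together with all of their ancestors."""
    keep, stack = set(), list(targets)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    return keep


# Assumed structure: B and E are parents of A; A is the parent of J and M.
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(sorted(relevant_nodes(parents, {"B", "J"})))  # ['A', 'B', 'E', 'J']
```

For the query P(+b, +j), node M is not an ancestor of B or J, so it drops out, matching the observation that ∑_m P(m|a) = 1.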
Logic Sampling
• What is P( WG = + )?
    • Get DataSample
    • Of 5 tuples, 2 have WG = +
    • Set P( WG = + ) = 2/5
• But … how to generate examples?
    • Uniform?? No!
      Eg, network A → B with
          a    P(+b|a)
          +    1.0
          -    0.0
      What is P(+a, -b)? It is 0, yet uniform sampling would still generate such tuples.
    • Based on distribution!!
Example of Logic Sampling
• To get value of "Cloudy": Flip 0.5-coin
    Assume "Cloudy = True"
• To get value of "Sprinkler": Flip 0.1-coin
    (as Cloudy = True, P( +s | +c ) = 0.10)
    Assume "Sprinkler = False"
• To get value of "Rain": Flip 0.8-coin
    (as Cloudy = True, P( +r | +c ) = 0.8)
    Assume "Rain = True"
• To get value of "WetGrass": Flip 0.9-coin
    (as Sprinkler = F, Rain = T, P( +w | ¬s, +r ) = 0.9)
    Assume "WetGrass = True"
• On other trials, get other results, as the coin-flips come out differently

    C   S   R   W
    +   0   +   +     (this trial: T, F, T, T)
    +   +   0   +
    0   0   +   0
    +   +   0   +
Stochastic Approximation 1: Logic Sampling
• To estimate P(X | E = e):
    • To produce random instance from BN: PriorSample
    • Note: if E ≠ e, just ignore instance
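A minimal sketch of logic sampling on the Cloudy/Sprinkler/Rain/WetGrass network follows. Only four CPtable values appear in the worked example (0.5, 0.10, 0.8, 0.9); every other entry below is an assumption chosen for illustration and is marked as such in the comments. The function names are hypothetical as well.

```python
import random

# CPtable entries P(+child | parents).
P_C = 0.5                                   # from the example
P_S = {True: 0.10, False: 0.50}             # P(+s | c); 0.50 assumed
P_R = {True: 0.80, False: 0.20}             # P(+r | c); 0.20 assumed
P_W = {(True, True): 0.99,                  # P(+w | s, r); assumed
       (True, False): 0.90,                 # assumed
       (False, True): 0.90,                 # from the example
       (False, False): 0.00}                # assumed


def prior_sample(rng):
    """Sample each variable in topological order, given its parents."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return {"C": c, "S": s, "R": r, "W": w}


def logic_sampling(query, evidence, n, rng):
    """Estimate P(query = True | evidence) from n prior samples."""
    kept = hits = 0
    for _ in range(n):
        sample = prior_sample(rng)
        if any(sample[v] != val for v, val in evidence.items()):
            continue                        # E != e: ignore the instance
        kept += 1
        hits += sample[query]
    return hits / kept if kept else float("nan")


rng = random.Random(0)
print(logic_sampling("W", {}, 20000, rng))
print(logic_sampling("R", {"W": True}, 20000, rng))
```

Under the assumed tables the exact value of P(+w) is 0.6471, so the first estimate should land close to that. Note the weakness of the method: when the evidence is unlikely, most samples are ignored and few remain to form the estimate.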
Aside: Flipping A Coin
• Consider flipping a (fair) coin m times
    … expect to observe ≈ 0.5 m heads
• Could have a "bad run" ... suggesting the coin is not fair
• How (un)likely to observe ≥ 55% heads? (10% more than expected)
• ... as a function of m: what's the probability of
    (1) m = 100, ≥ 55 heads
    (2) m = 500, ≥ 275 heads
    (3) m = 1000, ≥ 550 heads
    (4) m = 10,000, ≥ 5,500 heads ?
Using Chernoff Bounds
• X_i's are iid… for now, with μ = 0.5
• S_m = (1/m) ∑_i X_i is the observed fraction of heads
• Prob of S_m > 0.55 is < e^(-2 m 0.05^2)
    m = 100    ⇒ < 0.61
    m = 500    ⇒ < 0.083
    m = 1,000  ⇒ < 0.007
    m = 10,000 ⇒ < 2 × 10^-22
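The bound is easy to evaluate, and for a fair coin it can be compared against the exact binomial tail. The sketch below does both; the function names are illustrative, and the exact tail is included only as a sanity check on how loose the bound is.

```python
from math import ceil, comb, exp


def chernoff_bound(m, eps=0.05):
    """Upper bound exp(-2 m eps^2) on P(S_m - mu >= eps)."""
    return exp(-2 * m * eps ** 2)


def exact_tail(m, frac=0.55):
    """Exact P(at least frac*m heads) in m flips of a fair coin."""
    k_min = ceil(frac * m - 1e-9)  # guard against float round-up
    return sum(comb(m, k) for k in range(k_min, m + 1)) / 2 ** m


for m in (100, 500, 1000, 10000):
    print(m, chernoff_bound(m), exact_tail(m))
```

The bound is loose for small m but shrinks exponentially in m, which is what justifies trusting sample-based estimates once enough samples are drawn.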