Bayesian networks
Chapter 14.1–3

Outline
♦ Syntax
♦ Semantics
♦ Parameterized distributions

Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents: $P(X_i \mid \mathit{Parents}(X_i))$

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values

Example
Topology of network encodes conditional independence assertions:

[Figure: network over Weather, Cavity, Toothache, Catch]

Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity

Example
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?

Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call

Example contd.

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = .001
P(E) = .002

B  E  P(A|B,E)
T  T  .95
T  F  .94
F  T  .29
F  F  .001

A  P(J|A)
T  .90
F  .05

A  P(M|A)
T  .70
F  .01
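The syntax requirements for the burglary network can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the `NETWORK` dictionary layout and the `check_syntax` helper are illustrative assumptions, and it assumes P(B) = .001 and P(E) = .002. It verifies that the links form a directed acyclic graph and that each Boolean node's CPT has exactly one row per combination of parent values.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import product

# Hypothetical layout: name -> (parents, {parent values: P(name = true | parent values)})
NETWORK = {
    "Burglary": ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm": (
        ("Burglary", "Earthquake"),
        {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001},
    ),
    "JohnCalls": (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls": (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}


def check_syntax(network):
    """Return the nodes in a parents-first order, or raise ValueError."""
    for name, (parents, cpt) in network.items():
        unknown = set(parents) - set(network)
        if unknown:
            raise ValueError(f"{name}: unknown parents {sorted(unknown)}")
        # A Boolean node with k Boolean parents needs exactly 2^k CPT rows.
        expected_rows = set(product([True, False], repeat=len(parents)))
        if set(cpt) != expected_rows:
            raise ValueError(f"{name}: CPT rows do not match parent combinations")
        if not all(0.0 <= p <= 1.0 for p in cpt.values()):
            raise ValueError(f"{name}: CPT entry outside [0, 1]")
    # TopologicalSorter expects node -> predecessors, which here are the parents.
    graph = {name: set(parents) for name, (parents, _) in network.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError("links do not form a directed acyclic graph") from err


if __name__ == "__main__":
    print(check_syntax(NETWORK))
```

The returned order lists every node after its parents, which is also a valid order in which to add the variables when building the network.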
Compactness
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values

Each row requires one number $p$ for $X_i = \mathit{true}$ (the number for $X_i = \mathit{false}$ is just $1 - p$)

If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers

I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution

For burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$)

Global semantics
“Global” semantics defines the full joint distribution as the product of the local conditional distributions:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(X_i))$$

e.g.,
$$P(j \land m \land a \land \lnot b \land \lnot e) = P(j \mid a)\, P(m \mid a)\, P(a \mid \lnot b, \lnot e)\, P(\lnot b)\, P(\lnot e)$$
$$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$$

Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics

1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that
   $P(X_i \mid \mathit{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$

This choice of parents guarantees the global semantics:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \quad \text{(chain rule)}$$
$$= \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i)) \quad \text{(by construction)}$$
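The global semantics and the compactness count can both be reproduced in a few lines. The sketch below is illustrative only: the `NETWORK` layout and the function names `joint_probability` and `parameter_count` are assumptions, while the CPT numbers are those of the burglary network (with P(b) = 0.001 and P(e) = 0.002, matching the factors 0.999 and 0.998 in the worked product).

```python
from itertools import product

# Hypothetical layout: name -> (parents, {parent values: P(name = true | parent values)})
NETWORK = {
    "Burglary": ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm": (
        ("Burglary", "Earthquake"),
        {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001},
    ),
    "JohnCalls": (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls": (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}


def joint_probability(network, assignment):
    """Product over all nodes of P(x_i | parents(X_i)) for a full assignment."""
    p = 1.0
    for name, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[name] else 1.0 - p_true
    return p


def parameter_count(network):
    """One number per CPT row: 2^k rows for a node with k Boolean parents."""
    return sum(len(cpt) for _, cpt in network.values())


if __name__ == "__main__":
    # j, m, a, not b, not e
    example = {"JohnCalls": True, "MaryCalls": True, "Alarm": True,
               "Burglary": False, "Earthquake": False}
    print(joint_probability(NETWORK, example))
    print(parameter_count(NETWORK), "numbers vs.", 2 ** len(NETWORK) - 1)

    # Sanity check: the 32 joint entries should sum to 1 (up to rounding).
    names = list(NETWORK)
    total = sum(joint_probability(NETWORK, dict(zip(names, values)))
                for values in product([True, False], repeat=len(names)))
    print(total)
```

The first printed value is approximately 0.000628, which rounds to the 0.00063 given in the example.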
Example
Suppose we choose the ordering M, J, A, B, E

[Figure: network built by adding MaryCalls, JohnCalls, Alarm, Burglary, Earthquake in that order]

$P(J \mid M) = P(J)$?  No
$P(A \mid J, M) = P(A \mid J)$?  $P(A \mid J, M) = P(A)$?  No
$P(B \mid A, J, M) = P(B \mid A)$?  Yes
$P(B \mid A, J, M) = P(B)$?  No
$P(E \mid B, A, J, M) = P(E \mid A)$?  No
$P(E \mid B, A, J, M) = P(E \mid A, B)$?  Yes

Example contd.

[Figure: the resulting network over MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]

Deciding conditional independence is hard in noncausal directions
(Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions

Network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed
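The yes/no answers for the ordering M, J, A, B, E imply the parent sets J: {M}, A: {M, J}, B: {A}, E: {A, B}. One way to check them is by brute force: build the full joint from the causal burglary network, then for each variable search for the smallest set of predecessors that reproduces the conditional distribution given all predecessors. The sketch below does that. It is an illustration under assumptions, not the procedure used in the slides: the helper names (`minimal_parents`, `build`, `table_size`) and the numerical tolerance are hypothetical, and the search is exponential, so it only suits tiny networks.

```python
from itertools import combinations, product

# Causal burglary network; single-letter names as in the example.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def bern(p_true, value):
    """Probability of a Boolean value given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(w):
    """Full joint probability of a complete assignment w (name -> bool)."""
    return (bern(P_B, w["B"]) * bern(P_E, w["E"])
            * bern(P_A[(w["B"], w["E"])], w["A"])
            * bern(P_J[w["A"]], w["J"]) * bern(P_M[w["A"]], w["M"]))


NAMES = ["B", "E", "A", "J", "M"]
WORLDS = [dict(zip(NAMES, values)) for values in product([True, False], repeat=5)]


def prob(event):
    """Marginal probability of a partial assignment, by summing the joint."""
    return sum(joint(w) for w in WORLDS
               if all(w[name] == value for name, value in event.items()))


def cond(x, given):
    """P(x = true | given)."""
    return prob({**given, x: True}) / prob(given)


def minimal_parents(x, preds, tol=1e-9):
    """Smallest subset S of preds with P(x | S) = P(x | preds) for all values."""
    for size in range(len(preds) + 1):
        for subset in combinations(preds, size):
            if all(
                abs(cond(x, dict(zip(preds, values)))
                    - cond(x, {n: v for n, v in zip(preds, values) if n in subset}))
                <= tol
                for values in product([True, False], repeat=len(preds))
            ):
                return subset
    return tuple(preds)  # not reached: the full predecessor set always passes


def build(order):
    """Step 2 of the construction: pick parents from the earlier variables."""
    return {x: minimal_parents(x, order[:i]) for i, x in enumerate(order)}


def table_size(parents):
    """Total CPT rows: 2^k for each node with k Boolean parents."""
    return sum(2 ** len(p) for p in parents.values())


if __name__ == "__main__":
    for order in (["M", "J", "A", "B", "E"], ["B", "E", "A", "J", "M"]):
        parents = build(order)
        print(order, parents, table_size(parents))
```

With the noncausal ordering the search needs 13 numbers, and with the causal ordering B, E, A, J, M it needs 10, matching the counts stated in the slides.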
Example: Car diagnosis
Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters

[Figure: car diagnosis network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery meter, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, car won’t start, dipstick]

Example: Car insurance

[Figure: car insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, RiskAversion, VehicleYear, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, HomeBase, AntiTheft, CarValue, Airbag, Accident, Ruggedness, Theft, OwnDamage, Cushioning, OwnCost, OtherCost, MedicalCost, LiabilityCost, PropertyCost]

Compact conditional distributions
CPT grows exponentially with number of parents
CPT becomes infinite with continuous-valued parent or child

Solution: canonical distributions that are defined compactly

Deterministic nodes are the simplest case:
$X = f(\mathit{Parents}(X))$ for some function $f$

E.g., Boolean functions
$\mathit{NorthAmerican} \Leftrightarrow \mathit{Canadian} \lor \mathit{US} \lor \mathit{Mexican}$

E.g., numerical relationships among continuous variables
$$\frac{\partial \mathit{Level}}{\partial t} = \mathit{inflow} + \mathit{precipitation} - \mathit{outflow} - \mathit{evaporation}$$

Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes
1) Parents $U_1 \ldots U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone

$$\Rightarrow P(X \mid U_1 \ldots U_j, \lnot U_{j+1} \ldots \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$$

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents

Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

[Figure: network over Subsidy?, Harvest, Cost, Buys?]

Option 1: discretization, with possibly large errors and large CPTs
Option 2: finitely parameterized canonical families

1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

Continuous variables
Gaussian density
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

Uniform density
$P(X = x) = U[18, 26](x)$ = uniform density between 18 and 26

[Figure: uniform density of height 0.125 between 18 and 26, with an interval dx marked]
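The noisy-OR fever table can be regenerated from just three numbers, which is what "linear in number of parents" means in practice. The following is a small sketch under assumptions: the function name `noisy_or` and the dictionary `Q` are illustrative, and the failure probabilities q_Cold = 0.6, q_Flu = 0.2, q_Malaria = 0.1 are read off the single-cause rows of the table. No leak node is modelled.

```python
from itertools import product

# Failure probabilities q_i: chance that a present cause alone fails to produce fever.
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(q, present):
    """P(X = true | causes) = 1 - product of q_i over the causes that are present."""
    p_false = 1.0  # empty product: with no cause present, X is certainly false
    for cause, is_present in present.items():
        if is_present:
            p_false *= q[cause]
    return 1.0 - p_false


if __name__ == "__main__":
    causes = list(Q)
    print("  ".join(causes), " P(Fever)  P(not Fever)")
    # Same row order as the table: F F F first, T T T last.
    for values in product([False, True], repeat=len(causes)):
        p_fever = noisy_or(Q, dict(zip(causes, values)))
        flags = "  ".join("T" if v else "F" for v in values)
        print(f"{flags}  {p_fever:.3f}  {1.0 - p_fever:.3f}")
```

Three parameters yield all eight rows, where a full CPT over three Boolean parents would need eight independent numbers.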