


Bayesian networks
Chapter 14.1–3

Outline
♦ Syntax
♦ Semantics
♦ Parameterized distributions

Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

Example
Topology of network encodes conditional independence assertions:
[Figure: nodes Weather, Cavity, Toothache, Catch; Cavity is the parent of Toothache and Catch; Weather has no links]
Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.

Example
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call

Example contd.
[Figure: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls]
P(B) = .001    P(E) = .002

B  E | P(A|B,E)
T  T | .95
T  F | .94
F  T | .29
F  F | .001

A | P(J|A)        A | P(M|A)
T | .90           T | .70
F | .05           F | .01
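The syntax above has three parts: one node per variable, parent links that form a DAG, and one CPT per node. A few lines of code make it concrete. The sketch below is a minimal illustration, not a library API. The `NETWORK` dictionary layout and the helpers `check_cpts` and `topological_order` are hypothetical names chosen here. The sketch stores the burglary CPTs from the figure and checks that every Boolean CPT has exactly one row per combination of parent values. It then uses Kahn's algorithm to confirm that the parent links are acyclic.

```python
from collections import deque
from itertools import product

# Hypothetical encoding of the burglary network: node -> (parents, CPT).
# Each CPT maps a tuple of parent values to P(node = true | parents);
# P(node = false | parents) is implied as 1 - p.
NETWORK = {
    "Burglary": ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm": (("Burglary", "Earthquake"), {
        (True, True): 0.95, (True, False): 0.94,
        (False, True): 0.29, (False, False): 0.001,
    }),
    "JohnCalls": (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls": (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}


def check_cpts(network):
    """Each CPT needs exactly one row per combination of Boolean parent values."""
    for name, (parents, cpt) in network.items():
        expected_rows = set(product((True, False), repeat=len(parents)))
        if set(cpt) != expected_rows:
            raise ValueError(f"{name}: CPT rows do not match parent combinations")
        if not all(0.0 <= p <= 1.0 for p in cpt.values()):
            raise ValueError(f"{name}: entries must be probabilities")


def topological_order(network):
    """Kahn's algorithm: list parents before children, or raise on a cycle."""
    children = {name: [] for name in network}
    missing = {name: len(parents) for name, (parents, _) in network.items()}
    for name, (parents, _) in network.items():
        for parent in parents:
            children[parent].append(name)
    ready = deque(name for name, count in missing.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            missing[child] -= 1
            if missing[child] == 0:
                ready.append(child)
    if len(order) != len(network):
        raise ValueError("links contain a directed cycle: not a Bayesian network")
    return order


check_cpts(NETWORK)
print(topological_order(NETWORK))
# ['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']
```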

Compactness
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values.
Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1 - p$).
If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers, i.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.
For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).

Global semantics
“Global” semantics defines the full joint distribution as the product of the local conditional distributions:
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$$
e.g.,
$$P(j \land m \land a \land \lnot b \land \lnot e) = P(j \mid a)\, P(m \mid a)\, P(a \mid \lnot b, \lnot e)\, P(\lnot b)\, P(\lnot e) = 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$$

Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \quad \text{(chain rule)}$$
$$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i)) \quad \text{(by construction)}$$
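The global semantics and the parameter count can both be checked numerically. This is a minimal sketch that assumes the CPT values from the burglary figure, with $P(b) = 0.001$ and $P(e) = 0.002$. The single-letter variable names and the `local` and `joint` helpers are illustrative, not taken from the slides. The sketch multiplies the local conditionals to get one entry of the joint distribution. It also confirms that all 32 such products sum to 1 and counts one stored number per CPT row.

```python
from itertools import product

# Parents of each variable and P(var = true | parent values) from the figure.
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}


def local(var, value, assignment):
    """P(var = value | parents(var)), read from the CPT; false rows are 1 - p."""
    p_true = CPT[var][tuple(assignment[p] for p in PARENTS[var])]
    return p_true if value else 1.0 - p_true


def joint(assignment):
    """Global semantics: the product of the local conditional distributions."""
    result = 1.0
    for var, value in assignment.items():
        result *= local(var, value, assignment)
    return result


# P(j, m, a, not b, not e) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998
p = joint({"J": True, "M": True, "A": True, "B": False, "E": False})
print(round(p, 5))  # 0.00063

# The 2**5 = 32 products form a proper distribution: they sum to 1.
total = sum(joint(dict(zip(PARENTS, values)))
            for values in product((True, False), repeat=len(PARENTS)))
print(round(total, 10))  # 1.0

# One number per CPT row: 1 + 1 + 4 + 2 + 2 = 10, vs. 2**5 - 1 = 31.
print(sum(len(rows) for rows in CPT.values()), 2 ** len(PARENTS) - 1)  # 10 31
```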

Example
Suppose we choose the ordering M, J, A, B, E:
[Figure: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake added to the network in that order]
$P(J \mid M) = P(J)$? No
$P(A \mid J, M) = P(A \mid J)$? No
$P(A \mid J, M) = P(A)$? No
$P(B \mid A, J, M) = P(B \mid A)$? Yes
$P(B \mid A, J, M) = P(B)$? No
$P(E \mid B, A, J, M) = P(E \mid A)$? No
$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes

Example contd.
[Figure: resulting network: MaryCalls → JohnCalls; MaryCalls, JohnCalls → Alarm; Alarm → Burglary; Alarm, Burglary → Earthquake]
Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions.
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.
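The Yes/No answers for the ordering M, J, A, B, E can be verified by brute force. The sketch builds the full joint distribution from the causal burglary network and then compares conditionals computed from it. This is an illustration under those assumptions: it enumerates all 32 worlds exactly and tests equality with `math.isclose`. `same_conditional` and the other names are hypothetical helpers. A "Yes" means the smaller set is enough to serve as that variable's parent set.

```python
import math
from itertools import product

NAMES = ("B", "E", "A", "J", "M")
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}


def bern(p_true, value):
    """P(value) for a Boolean variable that is true with probability p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Joint probability from the causal network's CPTs (global semantics)."""
    return (bern(0.001, b) * bern(0.002, e) * bern(P_A[(b, e)], a)
            * bern(0.90 if a else 0.05, j) * bern(0.70 if a else 0.01, m))


# Full joint table: 2**5 = 32 (world, probability) pairs.
TABLE = [(dict(zip(NAMES, vals)), joint(*vals))
         for vals in product((True, False), repeat=len(NAMES))]


def prob(event):
    """Probability of a partial assignment such as {"J": True, "M": False}."""
    return sum(p for world, p in TABLE
               if all(world[k] == v for k, v in event.items()))


def same_conditional(var, given, reduced):
    """Does P(var | given) equal P(var | reduced) for every setting of `given`?"""
    for values in product((True, False), repeat=len(given)):
        context = dict(zip(given, values))
        smaller = {k: context[k] for k in reduced}
        full = prob({var: True, **context}) / prob(context)
        part = prob({var: True, **smaller}) / prob(smaller)
        if not math.isclose(full, part, rel_tol=1e-9):
            return False
    return True


def label(var, cond):
    return f"P({var}|{','.join(cond)})" if cond else f"P({var})"


# (variable, conditioning set, candidate parent set), as asked on the slide.
QUESTIONS = [
    ("J", ("M",), ()),                        # slide: No
    ("A", ("J", "M"), ("J",)),                # slide: No
    ("A", ("J", "M"), ()),                    # slide: No
    ("B", ("A", "J", "M"), ("A",)),           # slide: Yes
    ("B", ("A", "J", "M"), ()),               # slide: No
    ("E", ("B", "A", "J", "M"), ("A",)),      # slide: No
    ("E", ("B", "A", "J", "M"), ("A", "B")),  # slide: Yes
]
for var, given, reduced in QUESTIONS:
    verdict = "Yes" if same_conditional(var, given, reduced) else "No"
    print(f"{label(var, given)} = {label(var, reduced)}? {verdict}")

# Parents chosen under the ordering M, J, A, B, E; one number per CPT row.
CHOSEN = {"M": (), "J": ("M",), "A": ("J", "M"), "B": ("A",), "E": ("A", "B")}
print(sum(2 ** len(parents) for parents in CHOSEN.values()))  # 13
```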

Example: Car diagnosis
Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
[Figure: network over battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, battery meter, lights, oil light, gas gauge, dipstick, car won’t start]

Example: Car insurance
[Figure: network over SocioEcon, Age, GoodStudent, ExtraCar, Mileage, RiskAversion, VehicleYear, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, HomeBase, AntiTheft, CarValue, Airbag, Accident, Ruggedness, Theft, OwnDamage, Cushioning, OwnCost, OtherCost, MedicalCost, LiabilityCost, PropertyCost]

Compact conditional distributions
CPT grows exponentially with number of parents
CPT becomes infinite with continuous-valued parent or child
Solution: canonical distributions that are defined compactly
Deterministic nodes are the simplest case: $X = f(Parents(X))$ for some function $f$
E.g., Boolean functions: $NorthAmerican \Leftrightarrow Canadian \lor US \lor Mexican$
E.g., numerical relationships among continuous variables:
$$\frac{\partial Level}{\partial t} = inflow + precipitation - outflow - evaporation$$

Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1) Parents $U_1 \ldots U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
$$\Rightarrow P(X \mid U_1 \ldots U_j, \lnot U_{j+1} \ldots \lnot U_k) = 1 - \prod_{i=1}^{j} q_i$$

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
F     F    F       | 0.0      | 1.0
F     F    T       | 0.9      | 0.1
F     T    F       | 0.8      | 0.2
F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
T     F    F       | 0.4      | 0.6
T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1
Number of parameters linear in number of parents

Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
[Figure: Subsidy? and Harvest are the parents of Cost; Cost is the parent of Buys?]
Option 1: discretization, with possibly large errors and large CPTs
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

Continuous variables
Gaussian density:
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / 2\sigma^2}$$
Uniform density: $P(X = x) = U[18, 26](x)$ = uniform density between 18 and 26
[Figure: uniform density of height 0.125 on [18, 26], with a small interval dx marked]
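The noisy-OR formula can regenerate the whole fever table from three numbers. This sketch assumes the failure probabilities read off the single-cause rows ($q_{Cold} = 0.6$, $q_{Flu} = 0.2$, $q_{Malaria} = 0.1$). It also assumes no leak node, so $P(Fever) = 0$ when no cause is present. `noisy_or` is a hypothetical helper name.

```python
from itertools import product
from math import prod

# Failure probabilities from the fever table: q_i = P(not Fever | only cause i).
Q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(present, q=Q):
    """P(effect | exactly the causes in `present` are true), with no leak node.

    Each present cause independently fails to produce the effect with
    probability q_i, so the effect is absent only if every one of them fails.
    An empty product is 1, so no causes gives P(effect) = 0.
    """
    return 1.0 - prod(q[cause] for cause in present)


# Rebuild all 2**3 = 8 rows (Cold, Flu, Malaria order) from just 3 parameters.
for values in product((False, True), repeat=len(Q)):
    present = [cause for cause, on in zip(Q, values) if on]
    p_fever = noisy_or(present)
    flags = " ".join("T" if on else "F" for on in values)
    print(f"{flags}  P(Fever)={p_fever:.3f}  P(not Fever)={1 - p_fever:.3f}")
```

The rows print in the same order as the table. For example, the last row gives 0.988 and 0.012, matching $0.6 \times 0.2 \times 0.1$.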
