  1. Lecture 5: Bayesian Networks. Marco Chiarandini, Department of Mathematics & Computer Science, University of Southern Denmark. Slides by Stuart Russell and Peter Norvig.

  2. Course Overview
  ✔ Introduction
    ✔ Artificial Intelligence
    ✔ Intelligent Agents
  ✔ Search
    ✔ Uninformed Search
    ✔ Heuristic Search
  Games and Adversarial Search
    Minimax search and Alpha-beta pruning
    Multiagent search
  Knowledge representation and Reasoning
    Propositional logic
    First order logic
    Inference
    Planning
  Uncertain knowledge and Reasoning
    Probability and Bayesian approach
    Bayesian Networks
    Hidden Markov Chains
    Kalman Filters
  Learning
    Supervised Learning: Bayesian Networks, Neural Networks
    Unsupervised Learning: EM Algorithm
    Reinforcement Learning

  3. Outline
  1. Probability Basis
  2. Bayesian networks

  4. Summary: Probability Basis
  - Probability is a rigorous formalism for uncertain knowledge
  - The joint probability distribution specifies the probability of every atomic event
  - Queries can be answered by summing over atomic events
  - For nontrivial domains, we must find a way to reduce the joint size
  - Independence and conditional independence provide the tools

  5. Outline
  1. Probability Basis
  2. Bayesian networks

  6. Outline: Bayesian networks
  - Syntax
  - Semantics
  - Parameterized distributions

  7. Bayesian networks
  Definition: a simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
  Syntax:
  - a set of nodes, one per variable
  - a directed, acyclic graph (link ≈ "directly influences")
  - a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
  In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

  8. Example
  Topology of the network encodes conditional independence assertions.
  [Figure: network with nodes Weather, Cavity, Toothache, Catch; Cavity is the parent of Toothache and Catch, Weather is unconnected.]
  - Weather is independent of the other variables
  - Toothache and Catch are conditionally independent given Cavity

  9. Example
  I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
  Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
  Network topology reflects "causal" knowledge:
  - A burglar can set the alarm off
  - An earthquake can set the alarm off
  - The alarm can cause Mary to call
  - The alarm can cause John to call

  10. Example contd.
  P(B) = .001    P(E) = .002

  B  E  | P(A|B,E)
  T  T  | .95
  T  F  | .94
  F  T  | .29
  F  F  | .001

  A | P(J|A)      A | P(M|A)
  T | .90         T | .70
  F | .05         F | .01

  11. Compactness
  A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values.
  Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p).
  If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers.
  I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
  For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
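The parameter count above can be checked mechanically. A minimal sketch: each Boolean node with k Boolean parents contributes 2^k numbers, while the full joint over n Booleans needs 2^n − 1.

```python
# Parents per node in the burglary network (B, E have none; A has two; J, M have one).
num_parents = {"B": 0, "E": 0, "A": 2, "J": 1, "M": 1}

# Each node with k Boolean parents needs 2**k numbers (one per parent combination).
bn_params = sum(2 ** k for k in num_parents.values())
joint_params = 2 ** len(num_parents) - 1  # full joint over 5 Boolean variables

print(bn_params)     # 10
print(joint_params)  # 31
```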

  12. Global semantics
  "Global" semantics defines the full joint distribution as the product of the local conditional distributions:

    P(x1, ..., xn) = ∏_{i=1}^{n} P(xi | parents(Xi))

  e.g.,
    P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
      = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
      = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
      ≈ 0.00063
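The product above can be computed directly from the CPTs of slide 10. A small sketch, encoding each table as a dictionary keyed by parent values:

```python
# CPT values from the burglary network (probability of "true" given parents).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Global semantics: product of the local conditional probabilities."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e), as on the slide:
print(joint(b=False, e=False, a=True, j=True, m=True))  # ≈ 0.000628
```

Summing `joint` over all 32 assignments gives 1, confirming that the product of CPTs really defines a distribution.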

  13. Constructing Bayesian networks
  Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
  Choose an ordering of variables X1, ..., Xn.
  For i = 1 to n:
    add Xi to the network
    select parents from X1, ..., X_{i−1} such that P(Xi | Parents(Xi)) = P(Xi | X1, ..., X_{i−1})
  This choice of parents guarantees the global semantics:

    P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | X1, ..., X_{i−1})   (chain rule)
                   = ∏_{i=1}^{n} P(Xi | Parents(Xi))        (by construction)

  14. Example
  Suppose we choose the ordering M, J, A, B, E.
    P(J|M) = P(J)? No
    P(A|J,M) = P(A|J)? P(A|J,M) = P(A)? No
    P(B|A,J,M) = P(B|A)? Yes
    P(B|A,J,M) = P(B)? No
    P(E|B,A,J,M) = P(E|A)? No
    P(E|B,A,J,M) = P(E|A,B)? Yes
  Deciding conditional independence is hard in noncausal directions.
  (Causal models and conditional independence seem hardwired for humans!)
  Assessing conditional probabilities is hard in noncausal directions.
  The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed.

  15. Example: Car insurance
  [Figure: insurance network with nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, RiskAversion, VehicleYear, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, HomeBase, AntiTheft, CarValue, Airbag, Accident, Ruggedness, Theft, OwnDamage, Cushioning, OwnCost, OtherCost, MedicalCost, LiabilityCost, PropertyCost.]

  16. Compact conditional distributions
  A CPT grows exponentially with the number of parents, and becomes infinite with a continuous-valued parent or child.
  Solution: canonical distributions that are defined compactly.
  Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f.
  E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
  E.g., numerical relationships among continuous variables:
    ∂Level/∂t = inflow + precipitation − outflow − evaporation

  17. Compact conditional distributions contd.
  Noisy-OR distributions model multiple noninteracting causes:
  1) Parents U1 ... Uk include all causes (can add a leak node)
  2) Independent failure probability qi for each cause alone

    ⇒ P(X | U1 ... Uj, ¬U_{j+1} ... ¬Uk) = 1 − ∏_{i=1}^{j} qi

  Cold  Flu  Malaria | P(Fever) | P(¬Fever)
  F     F    F       | 0.0      | 1.0
  F     F    T       | 0.9      | 0.1
  F     T    F       | 0.8      | 0.2
  F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
  T     F    F       | 0.4      | 0.6
  T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
  T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
  T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1

  The number of parameters is linear in the number of parents.
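The whole fever table above follows from just the three failure probabilities qi. A minimal sketch of the noisy-OR computation, using the q values implied by the table (q_Cold = 0.6, q_Flu = 0.2, q_Malaria = 0.1):

```python
# Failure probability q_i: the chance that cause i alone fails to produce the effect.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(*present_causes):
    """Noisy-OR: P(Fever=true) given the causes that are present.

    The effect is absent only if every present cause independently fails,
    so P(no fever) is the product of the q_i of the present causes.
    """
    p_no_fever = 1.0
    for cause in present_causes:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

print(p_fever())                          # 0.0 (no causes and no leak node)
print(p_fever("Flu", "Malaria"))          # 0.98
print(p_fever("Cold", "Flu", "Malaria"))  # 0.988
```

Three numbers generate all eight table rows, which is the sense in which the parameter count is linear in the number of parents.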

  18. Hybrid (discrete + continuous) networks
  Discrete (Subsidy? and Buys?); continuous (Harvest and Cost).
  [Figure: Subsidy? and Harvest are parents of Cost; Cost is the parent of Buys?]
  Option 1: discretization (possibly large errors, large CPTs)
  Option 2: finitely parameterized canonical families:
  1) Continuous variable, discrete + continuous parents (e.g., Cost)
  2) Discrete variable, continuous parents (e.g., Buys?)

  19. Continuous child variables
  Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents.
  Most common is the linear Gaussian model, e.g.:

    P(Cost = c | Harvest = h, Subsidy = true)
      = N(a_t h + b_t, σ_t)(c)
      = (1 / (σ_t √(2π))) exp( −(1/2) ((c − (a_t h + b_t)) / σ_t)² )

  The mean Cost varies linearly with Harvest; the variance is fixed.
  Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow.
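The linear Gaussian density is straightforward to evaluate. A sketch with hypothetical parameter values (the slide does not give concrete a_t, b_t, σ_t, so the numbers below are only for illustration):

```python
import math

def linear_gaussian_density(c, h, a, b, sigma):
    """Density of Cost=c given Harvest=h: Gaussian with mean a*h + b, std dev sigma.

    The mean shifts linearly with the continuous parent h; sigma is fixed.
    """
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative (made-up) parameters: a = -0.5, b = 10, sigma = 1.
# At c equal to the mean, the density is 1 / (sigma * sqrt(2*pi)) ≈ 0.3989.
print(linear_gaussian_density(c=9.0, h=2.0, a=-0.5, b=10.0, sigma=1.0))
```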

  20. Continuous child variables
  [Figure: surface plot of P(Cost | Harvest, Subsidy? = true) over Cost and Harvest.]
  An all-continuous network with linear Gaussian distributions ⇒ the full joint distribution is a multivariate Gaussian.
  A discrete + continuous linear Gaussian network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

  21. Discrete variable w/ continuous parents
  Probability of Buys? given Cost should be a "soft" threshold.
  [Figure: cumulative probability of the standard normal distribution (µ = 0, σ = 1).]
  The probit distribution uses the integral of the Gaussian:

    Φ(x) = ∫_{−∞}^{x} N(0,1)(t) dt
    P(Buys? = true | Cost = c) = Φ((−c + µ)/σ)
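The standard normal CDF Φ can be written in terms of the error function, Φ(x) = (1 + erf(x/√2))/2, which the standard library provides. A sketch with hypothetical µ and σ (the slide fixes neither):

```python
import math

def probit(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_buys(c, mu, sigma):
    """Probit model: P(Buys?=true | Cost=c) = Phi((-c + mu) / sigma)."""
    return probit((mu - c) / sigma)

# Illustrative values: threshold mu = 5, softness sigma = 1.
print(p_buys(c=5.0, mu=5.0, sigma=1.0))  # 0.5, right at the threshold
print(p_buys(c=7.0, mu=5.0, sigma=1.0))  # small: expensive, unlikely to buy
```

The probability falls as Cost rises, passing through 0.5 at c = µ, which is exactly the "soft threshold" shape the slide describes.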

  22. Why the probit?
  1. It's sort of the right shape
  2. It can be viewed as a hard threshold whose location is subject to noise
  [Figure: Cost plus Noise feeding a hard threshold that determines Buys?]

  23. Discrete variable contd.
  The sigmoid (or logit) distribution is also used in neural networks:

    P(Buys? = true | Cost = c) = 1 / (1 + exp(−2(−c + µ)/σ))

  The sigmoid has a similar shape to the probit but much longer tails.
  [Figure: cumulative probability of the logistic distribution (location = 0, scale = 1).]
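The logit alternative is a one-liner. A sketch, again with hypothetical µ and σ, matching the formula above:

```python
import math

def p_buys_logit(c, mu, sigma):
    """Logit model: P(Buys?=true | Cost=c) = 1 / (1 + exp(-2*(mu - c)/sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - c) / sigma))

# Same illustrative threshold as the probit example: mu = 5, sigma = 1.
print(p_buys_logit(c=5.0, mu=5.0, sigma=1.0))  # 0.5 at the threshold, like the probit
print(p_buys_logit(c=8.0, mu=5.0, sigma=1.0))  # heavier tail than the probit here
```

Both models cross 0.5 at c = µ; the difference shows up far from the threshold, where the logistic tails decay only exponentially rather than like a Gaussian.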
