
Bayesian networks: basics
Machine Intelligence
Thomas D. Nielsen



  1. Bayesian networks: basics. Machine Intelligence. Thomas D. Nielsen. September 2008.

  2. Basics: Random/Chance Variables
  A random variable has a name and a state space:
  Weather: {sunny, cloudy, rain}
  Blood Pressure: {high, normal, low}
  Grade: {−3, 00, 02, 4, 7, 10, 12}
  Annual income: {1 DKK, 2 DKK, 3 DKK, 4 DKK, ...}
  Weight: x ∈ R
  and a probability distribution on the state space:
  sunny: 0.3, cloudy: 0.5, rain: 0.2
  Occurrence of k events within a time interval: P(k) = e^(−λ) λ^k / k! (Poisson distribution)
  [Figure: Poisson pmf plotted for k = 0, ..., 20]
  Continuous distribution: x ∼ N(µ, σ) (Gaussian distribution)
  Notation: sp(A) denotes the state space of random variable A.
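As a quick sketch (not from the slides), a finite random variable can be represented as a dict mapping states to probabilities, and the Poisson pmf is a one-liner; the names `weather` and `poisson_pmf` are hypothetical:

```python
import math

# A discrete random variable: a distribution over its state space.
weather = {"sunny": 0.3, "cloudy": 0.5, "rain": 0.2}
assert abs(sum(weather.values()) - 1.0) < 1e-9  # probabilities must sum to 1

def poisson_pmf(k, lam):
    """P(k events in an interval) = e^(-lam) * lam^k / k!  (Poisson)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

print(poisson_pmf(2, 3.0))  # ≈ 0.224
```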



  5. Basics: Joint Distribution
  Usually we are interested in the joint distribution of several variables, e.g. the probability that Weather is sunny and Grade is 10.
  Notation: we also write sp(A, B) for the joint state space of two (or more) variables.
  Example: sp(Weather, Grade) = {(sunny, −3), (sunny, 00), ..., (rain, 12)}
  Conditional Probabilities
  A joint distribution defines conditional probabilities:
  P(A = a | B = b) := P(A = a, B = b) / P(B = b)
  This is also known as the fundamental rule (when read as a theorem, not a definition).
  Bayes' Rule
  From the definition of conditional probability:
  P(B = b | A = a) = P(A = a | B = b) · P(B = b) / P(A = a)
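The fundamental rule and Bayes' rule can be checked numerically on a toy joint distribution; the variables, states, and numbers below are illustrative, not taken from the slides:

```python
# Joint distribution over two binary-ish variables A, B, keyed by (a, b).
joint = {
    ("a1", "b1"): 0.20, ("a1", "b2"): 0.10,
    ("a2", "b1"): 0.30, ("a2", "b2"): 0.40,
}

def marginal_a(a):
    return sum(p for (aa, _), p in joint.items() if aa == a)

def marginal_b(b):
    return sum(p for (_, bb), p in joint.items() if bb == b)

def cond_a_given_b(a, b):
    # Fundamental rule: P(A=a | B=b) = P(A=a, B=b) / P(B=b)
    return joint[(a, b)] / marginal_b(b)

def bayes_b_given_a(b, a):
    # Bayes' rule: P(B=b | A=a) = P(A=a | B=b) P(B=b) / P(A=a)
    return cond_a_given_b(a, b) * marginal_b(b) / marginal_a(a)

# Bayes' rule agrees with conditioning on A directly.
assert abs(bayes_b_given_a("b1", "a1") - joint[("a1", "b1")] / marginal_a("a1")) < 1e-12
```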



  8. Basics: Generalization
  If an equality (like Bayes' rule) is true for all possible values a, b of the random variables A, B, one simply writes it in the form
  P(B | A) = P(A | B) P(B) / P(A)
  Conditioning on context
  A probabilistic law remains valid when all probabilities are conditioned on a common "context" variable C. E.g. Bayes' rule:
  P(B | A, C) = P(A | B, C) P(B | C) / P(A | C)
  Chain rule
  For any set of random variables V1, V2, ..., Vn:
  P(V1, ..., Vn) = P(V1, ..., Vn−1) P(Vn | V1, ..., Vn−1)
                 = P(V1, ..., Vn−2) P(Vn−1 | V1, ..., Vn−2) P(Vn | V1, ..., Vn−1)
                 ...
                 = P(V1) P(V2 | V1) · · · P(Vi | V1, ..., Vi−1) · · · P(Vn | V1, ..., Vn−1)
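For two variables the chain rule reads P(V1, V2) = P(V1) P(V2 | V1), which holds for every configuration; a minimal numerical check on a made-up joint (names and numbers are hypothetical):

```python
from itertools import product

# Toy joint over V1 in {x, w} and V2 in {y, z}.
joint = {("x", "y"): 0.12, ("x", "z"): 0.28,
         ("w", "y"): 0.18, ("w", "z"): 0.42}

def p_v1(v1):
    return sum(p for (a, _), p in joint.items() if a == v1)

def p_v2_given_v1(v2, v1):
    return joint[(v1, v2)] / p_v1(v1)

# Chain rule: P(V1, V2) = P(V1) * P(V2 | V1) for every configuration.
for v1, v2 in product(("x", "w"), ("y", "z")):
    assert abs(joint[(v1, v2)] - p_v1(v1) * p_v2_given_v1(v2, v1)) < 1e-12
```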

  9. Basics: Conditional Independence
  A is conditionally independent of B given C if one of the following equivalent conditions holds:
  P(A, B | C) = P(A | C) P(B | C)
  P(A | B, C) = P(A | C)
  P(B | A, C) = P(B | C)
  This extends to sets of random variables. E.g.: A1, A2, A3 is independent of B1, B2 given C1, C2, C3 if
  P(A1, A2, A3, B1, B2 | C1, C2, C3) = P(A1, A2, A3 | C1, C2, C3) P(B1, B2 | C1, C2, C3)
  A conditional independence relation does not necessarily remain true under conditioning on an additional context variable:
  P(A, B | C) = P(A | C) P(B | C) does not imply P(A, B | C, D) = P(A | C, D) P(B | C, D)
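The first condition, P(A, B | C) = P(A | C) P(B | C), can be tested by brute force over a small joint; the sketch below builds a joint that satisfies it by construction (all numbers are illustrative):

```python
# Joint over binary A, B, C constructed so that A ⊥ B | C holds.
def joint(a, b, c):
    p_c = {0: 0.4, 1: 0.6}[c]
    p_a_c = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}[(a, c)]
    p_b_c = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1}[(b, c)]
    return p_c * p_a_c * p_b_c

def cond_indep(joint_fn):
    """Check P(A, B | C) = P(A | C) P(B | C) for every configuration."""
    for c in (0, 1):
        p_c = sum(joint_fn(a, b, c) for a in (0, 1) for b in (0, 1))
        for a in (0, 1):
            for b in (0, 1):
                p_ab_c = joint_fn(a, b, c) / p_c
                p_a_c = sum(joint_fn(a, bb, c) for bb in (0, 1)) / p_c
                p_b_c = sum(joint_fn(aa, b, c) for aa in (0, 1)) / p_c
                if abs(p_ab_c - p_a_c * p_b_c) > 1e-9:
                    return False
    return True

assert cond_indep(joint)  # A is conditionally independent of B given C
```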

  10. Basics: Chain Rule + Conditional Independence → Factorization
  Chain rule again:
  P(V1, ..., Vn) = P(V1) P(V2 | V1) · · · P(Vi | V1, ..., Vi−1) · · · P(Vn | V1, ..., Vn−1)
  Now suppose that for each i there is a set pa(Vi) ⊆ {V1, ..., Vi−1} such that
  P(Vi | V1, ..., Vi−1) = P(Vi | pa(Vi))
  (i.e. Vi is conditionally independent of {V1, ..., Vi−1} \ pa(Vi) given pa(Vi)).
  This gives the factorization of P(V1, ..., Vn):
  P(V1, ..., Vn) = ∏_{i=1}^{n} P(Vi | pa(Vi))

  11. Basics: Factorization → Bayesian Networks
  [Figure: a directed acyclic graph over variables A, B, C, D, E, with a conditional probability table attached to each node]
  A Bayesian network for the (discrete) random variables V = V1, ..., Vn is defined by
  a directed acyclic graph (V, →), and
  for each Vi a conditional probability table P(Vi | pa(Vi)) specifying the conditional distribution of Vi given its parents in the graph.
  The Bayesian network defines a joint distribution of V as:
  P(V1, ..., Vn) = ∏_{i=1}^{n} P(Vi | pa(Vi))
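A minimal sketch of this definition: a network A → C ← B represented as dicts of CPT entries, whose product is a proper joint distribution. The CPT numbers are hypothetical (the slide's own tables did not survive extraction):

```python
from itertools import product

# CPTs for the network A -> C <- B over binary variables.
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.3, 1: 0.7}
p_c1_given_ab = {  # P(C = 1 | A = a, B = b)
    (0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9,
}

def joint(a, b, c):
    """P(A, B, C) = P(A) P(B) P(C | A, B): the BN factorization."""
    pc1 = p_c1_given_ab[(a, b)]
    return p_a[a] * p_b[b] * (pc1 if c == 1 else 1.0 - pc1)

# The product of the CPTs defines a proper joint distribution: it sums to 1.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```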

  12. Basics: Elementary Conditional Independence Property
  Vi: node in a Bayesian network
  desc(Vi): descendants of Vi
  rest(Vi): nondescendants of Vi, excluding pa(Vi) and Vi itself
  [Figure: Vi with pa(Vi) above it, desc(Vi) below it, and rest(Vi) to the side]
  P(Vi | pa(Vi), rest(Vi)) = P(Vi | pa(Vi))
  "Vi is independent of its nondescendants, given its parents"

  13. Basics: The d-Separation Relation
  Let (V, →) be a directed acyclic graph and A, B, C ⊆ V disjoint subsets of nodes. C d-separates A from B if every undirected path that connects a node A ∈ A with a node B ∈ B satisfies at least one of the following two conditions:
  1. the path contains a node C ∈ C, and the edges that connect C are serial (... → C → ...) or diverging (... ← C → ...);
  2. the path contains a node U such that the edges that connect U are converging (... → U ← ...) and ({U} ∪ desc(U)) ∩ C = ∅.
  [Figure: serial, diverging, and converging connections]
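A brute-force sketch of the two blocking conditions: enumerate all undirected paths in a small DAG and test each triple on the path. The graph A → C ← B, C → D and all function names are hypothetical:

```python
# d-separation by exhaustive path checking on a small DAG.
edges = [("A", "C"), ("B", "C"), ("C", "D")]

parents, children = {}, {}
for u, v in edges:
    parents.setdefault(v, set()).add(u)
    children.setdefault(u, set()).add(v)

def descendants(u):
    out, stack = set(), [u]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def undirected_paths(src, dst):
    """All simple src-dst paths in the underlying undirected graph."""
    def step(path):
        if path[-1] == dst:
            yield path
            return
        for n in parents.get(path[-1], set()) | children.get(path[-1], set()):
            if n not in path:
                yield from step(path + [n])
    yield from step([src])

def blocked(path, Z):
    """Condition 1: a serial/diverging node on the path lies in Z.
    Condition 2: a converging node U has ({U} ∪ desc(U)) ∩ Z = ∅."""
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents.get(mid, set()) and nxt in parents.get(mid, set()):
            if mid not in Z and not (descendants(mid) & Z):
                return True  # condition 2 (converging)
        elif mid in Z:
            return True  # condition 1 (serial or diverging)
    return False

def d_separated(x, y, Z):
    return all(blocked(p, set(Z)) for p in undirected_paths(x, y))

# C is a collider on the only A-B path: A and B are d-separated by the
# empty set, but observing C (or its descendant D) opens the path.
assert d_separated("A", "B", set())
assert not d_separated("A", "B", {"C"})
assert not d_separated("A", "B", {"D"})
```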

  14. Basics: pa(A) d-separates A from rest(A)
  [Figure: a node A whose parents block every path connecting A to its nondescendants]

  15. Basics: d-Separation Theorem
  Let (V, →, {P(Vi | pa(Vi)) | i = 1, ..., n}) be a Bayesian network that defines the joint distribution P. Then for all pairwise disjoint A, B, C ⊆ V:
  If C d-separates A from B in (V, →), then P(A | B, C) = P(A | C).
  [The Elementary Conditional Independence Property is a special case.]
  A proof can be found in Verma & Pearl (1990).


  17. Basics: Basic Inference Problems
  Given a Bayesian network (V, →, {P(Vi | pa(Vi)) | i = 1, ..., n}).
  (a) Computation of a-posteriori distributions: Given E1, ..., Ek ∈ V and evidence values ei ∈ sp(Ei). Wanted: for all A ∈ V \ E, the conditional distribution of A given (the "evidence") E = e:
  P(A | E1 = e1, ..., Ek = ek)
  (b) Computation of most likely configurations (most probable explanations, MPE): Evidence E = e as in (a); let A := V \ E. Wanted:
  a_max = arg max_{a ∈ sp(A)} P(A = a | E = e)
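Both problems can be solved by brute-force enumeration on a tiny network; a sketch reusing a hypothetical A → C ← B network with made-up CPTs (not the slides' example):

```python
from itertools import product

# CPTs for A -> C <- B over binary variables (illustrative numbers).
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.3, 1: 0.7}
p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(C=1 | a, b)

def joint(a, b, c):
    pc1 = p_c1[(a, b)]
    return p_a[a] * p_b[b] * (pc1 if c == 1 else 1.0 - pc1)

# (a) a-posteriori distribution of A given evidence C = 1, by enumeration.
evidence_mass = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
posterior_a = {a: sum(joint(a, b, 1) for b in (0, 1)) / evidence_mass
               for a in (0, 1)}

# (b) MPE: the jointly most probable configuration (a, b) given C = 1.
mpe = max(product((0, 1), repeat=2), key=lambda ab: joint(ab[0], ab[1], 1))

print(posterior_a[1], mpe)  # ≈ 0.568, and mpe == (1, 1)
```

Note that the MPE maximizes the joint configuration of all unobserved variables at once, which in general differs from picking the mode of each marginal posterior separately.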
