
Probabilistic Reasoning: Graphical Models
Christian Borgelt
Intelligent Data Analysis and Graphical Models Research Unit
European Center for Soft Computing
c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres (Asturias), Spain


  1. Conditional Possibility and Independence
Definition: Let Ω be a (finite) sample space, R a discrete possibility measure on Ω, and E₁, E₂ ⊆ Ω events. Then R(E₁ | E₂) = R(E₁ ∩ E₂) is called the conditional possibility of E₁ given E₂.
Definition: Let Ω be a (finite) sample space, R a discrete possibility measure on Ω, and A, B, and C attributes with respective domains dom(A), dom(B), and dom(C). A and B are called conditionally relationally independent given C, written A ⊥⊥_R B | C, iff
  ∀a ∈ dom(A): ∀b ∈ dom(B): ∀c ∈ dom(C):
    R(A = a, B = b | C = c) = min{ R(A = a | C = c), R(B = b | C = c) }
  ⇔ R(A = a, B = b, C = c) = min{ R(A = a, C = c), R(B = b, C = c) }.
• Similar to the corresponding notions of probability theory.

  2. Conditional Independence: Simple Example
Example relation describing ten simple geometric objects by three attributes: color, shape, and size (the slide shows the relation split into the three size layers large, medium, and small).
• In this example relation, the color of an object is conditionally relationally independent of its size given its shape.
• Intuitively: if we fix the shape, the colors and sizes that are possible together with this shape can be combined freely.
• Alternative view: once we know the shape, the color does not provide additional information about the size (and vice versa).

  3. Relational Evidence Propagation
Due to the fact that color and size are conditionally independent given the shape, the reasoning result can be obtained using only the projections to the subspaces: the evidence on the color attribute is projected to the color–shape subspace, propagated to the shape attribute, and extended through the shape–size subspace to the size attribute (the slide illustrates this with the color/shape/size cube and its two two-dimensional projections).
This reasoning scheme can be formally justified with discrete possibility measures.
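
To make the scheme concrete, here is a minimal Python sketch of relational propagation via projections; the relation and attribute values are made-up assumptions for illustration, not the ten-object relation from the slides.

    # Minimal sketch of relational evidence propagation via projections.
    relation = {            # possible (color, shape, size) combinations
        ("red",   "circle",   "small"),
        ("red",   "triangle", "medium"),
        ("green", "triangle", "medium"),
        ("green", "square",   "large"),
        ("blue",  "square",   "large"),
    }

    # Projections to the two subspaces (the "decomposition" of the relation).
    proj_color_shape = {(col, shp) for col, shp, _ in relation}
    proj_shape_size  = {(shp, siz) for _, shp, siz in relation}

    def propagate(observed_colors):
        """Propagate evidence on the color attribute to shape and size."""
        shapes = {shp for col, shp in proj_color_shape if col in observed_colors}
        sizes  = {siz for shp, siz in proj_shape_size  if shp in shapes}
        return shapes, sizes

    # Observing that the object is green restricts shape and size:
    print(propagate({"green"}))   # shapes {'triangle', 'square'}, sizes {'medium', 'large'}

Only the two projections are used; the three-dimensional relation itself is never touched during propagation, which is exactly the point of the decomposition.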

  4. Relational Evidence Propagation, Step 1   (A: color, B: shape, C: size)
R(B = b | A = a_obs)
  = R( ⋁_{a ∈ dom(A)} ⋁_{c ∈ dom(C)} A = a, B = b, C = c | A = a_obs )
 (1) = max_{a ∈ dom(A)} max_{c ∈ dom(C)} { R(A = a, B = b, C = c | A = a_obs) }
 (2) = max_{a ∈ dom(A)} max_{c ∈ dom(C)} { min{ R(A = a, B = b, C = c), R(A = a | A = a_obs) } }
 (3) = max_{a ∈ dom(A)} max_{c ∈ dom(C)} { min{ R(A = a, B = b), R(B = b, C = c), R(A = a | A = a_obs) } }
     = max_{a ∈ dom(A)} { min{ R(A = a, B = b), R(A = a | A = a_obs), max_{c ∈ dom(C)} R(B = b, C = c) } }
       (the last maximum equals R(B = b) ≥ R(A = a, B = b) and can therefore be dropped from the minimum)
     = max_{a ∈ dom(A)} { min{ R(A = a, B = b), R(A = a | A = a_obs) } }.

  5. Relational Evidence Propagation, Step 1 (continued)
(1) holds because of the second axiom a discrete possibility measure has to satisfy.
(3) holds because the relation R_ABC can be decomposed w.r.t. the set M = {{A, B}, {B, C}}.   (A: color, B: shape, C: size)
(2) holds, since in the first place
  R(A = a, B = b, C = c | A = a_obs) = R(A = a, B = b, C = c, A = a_obs)
    = R(A = a, B = b, C = c), if a = a_obs;  0, otherwise,
and secondly
  R(A = a | A = a_obs) = R(A = a, A = a_obs)
    = R(A = a), if a = a_obs;  0, otherwise,
and therefore, since trivially R(A = a) ≥ R(A = a, B = b, C = c),
  R(A = a, B = b, C = c | A = a_obs) = min{ R(A = a, B = b, C = c), R(A = a | A = a_obs) }.

  6. Relational Evidence Propagation, Step 2   (A: color, B: shape, C: size)
R(C = c | A = a_obs)
  = R( ⋁_{a ∈ dom(A)} ⋁_{b ∈ dom(B)} A = a, B = b, C = c | A = a_obs )
 (1) = max_{a ∈ dom(A)} max_{b ∈ dom(B)} { R(A = a, B = b, C = c | A = a_obs) }
 (2) = max_{a ∈ dom(A)} max_{b ∈ dom(B)} { min{ R(A = a, B = b, C = c), R(A = a | A = a_obs) } }
 (3) = max_{a ∈ dom(A)} max_{b ∈ dom(B)} { min{ R(A = a, B = b), R(B = b, C = c), R(A = a | A = a_obs) } }
     = max_{b ∈ dom(B)} { min{ R(B = b, C = c), max_{a ∈ dom(A)} min{ R(A = a, B = b), R(A = a | A = a_obs) } } }
       (the inner maximum equals R(B = b | A = a_obs), as computed in step 1)
     = max_{b ∈ dom(B)} { min{ R(B = b, C = c), R(B = b | A = a_obs) } }.

  7. A Simple Example: The Probabilistic Case

  8. A Probability Distribution
The slide shows a three-dimensional probability distribution over the attributes color, shape, and size, together with its two-dimensional marginals (all numbers in parts per 1000; the three layers correspond to the sizes large, medium, and small).
The numbers state the probability of the corresponding value combination.
Compared to the example relation, the possible combinations are now frequent.

  9. Reasoning: Computing Conditional Probabilities
The slide shows the distribution after conditioning on the information that the given object is green (all numbers in parts per 1000): the observed color has a posterior probability of 1, and the probabilities of all value combinations with a different color drop to 0.

  10. Probabilistic Decomposition: Simple Example
• As for relational graphical models, the three-dimensional probability distribution can be decomposed into projections to subspaces, namely the marginal distribution on the subspace spanned by color and shape and the marginal distribution on the subspace spanned by shape and size.
• The original probability distribution can be reconstructed from the marginal distributions using the following formulas:
  ∀i, j, k:
  P( a_i^(color), a_j^(shape), a_k^(size) )
    = P( a_i^(color), a_j^(shape) ) · P( a_k^(size) | a_j^(shape) )
    = P( a_i^(color), a_j^(shape) ) · P( a_j^(shape), a_k^(size) ) / P( a_j^(shape) ).
• These equations express the conditional independence of the attributes color and size given the attribute shape, since they hold only if
  ∀i, j, k:  P( a_k^(size) | a_j^(shape) ) = P( a_k^(size) | a_i^(color), a_j^(shape) ).
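
A minimal Python sketch of this reconstruction; the two marginal tables are made-up assumptions (chosen to be consistent), not the numbers from the slides.

    # Reconstructing a three-dimensional distribution from its color-shape and
    # shape-size marginals via P(c, s, z) = P(c, s) * P(s, z) / P(s).
    P_color_shape = {("red", "circle"): 0.2, ("red", "square"): 0.1,
                     ("blue", "circle"): 0.3, ("blue", "square"): 0.4}
    P_shape_size  = {("circle", "small"): 0.4, ("circle", "large"): 0.1,
                     ("square", "small"): 0.2, ("square", "large"): 0.3}
    P_shape = {"circle": 0.5, "square": 0.5}

    def joint(color, shape, size):
        return P_color_shape[(color, shape)] * P_shape_size[(shape, size)] / P_shape[shape]

    # The reconstructed values form a proper distribution (they sum to 1):
    total = sum(joint(c, s, z)
                for c in ("red", "blue")
                for s in ("circle", "square")
                for z in ("small", "large"))
    print(round(total, 10))   # 1.0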

  11. Reasoning with Projections
Again the same result can be obtained using only projections to subspaces (marginal probability distributions): the evidence on the color attribute is propagated to the shape attribute through the color–shape marginal (each column is multiplied by the ratio of new to old color probability and the rows are summed) and from there to the size attribute through the shape–size marginal (the slide shows the old and new marginals for the three size layers s, m, l).
This justifies a graph representation:  color — shape — size.

  12. Probabilistic Graphical Models: Formalization

  13. Probabilistic Decomposition
Definition: Let U = {A₁, ..., Aₙ} be a set of attributes and p_U a probability distribution over U. Furthermore, let M = {M₁, ..., M_m} ⊆ 2^U be a set of nonempty (but not necessarily disjoint) subsets of U satisfying
  ⋃_{M ∈ M} M = U.
p_U is called decomposable or factorizable w.r.t. M iff it can be written as a product of m nonnegative functions φ_M : E_M → ℝ₀⁺, M ∈ M, i.e., iff
  ∀a₁ ∈ dom(A₁): ... ∀aₙ ∈ dom(Aₙ):
  p_U( ⋀_{A_i ∈ U} A_i = a_i ) = ∏_{M ∈ M} φ_M( ⋀_{A_i ∈ M} A_i = a_i ).
If p_U is decomposable w.r.t. M, the set of functions
  Φ_M = {φ_{M₁}, ..., φ_{M_m}} = {φ_M | M ∈ M}
is called the decomposition or the factorization of p_U. The functions in Φ_M are called the factor potentials of p_U.

  14. Conditional Independence
Definition: Let Ω be a (finite) sample space, P a probability measure on Ω, and A, B, and C attributes with respective domains dom(A), dom(B), and dom(C). A and B are called conditionally probabilistically independent given C, written A ⊥⊥_P B | C, iff
  ∀a ∈ dom(A): ∀b ∈ dom(B): ∀c ∈ dom(C):
  P(A = a, B = b | C = c) = P(A = a | C = c) · P(B = b | C = c).
Equivalent formula (sometimes more convenient):
  ∀a ∈ dom(A): ∀b ∈ dom(B): ∀c ∈ dom(C):
  P(A = a | B = b, C = c) = P(A = a | C = c).
• Conditional independences make it possible to consider parts of a probability distribution independent of others.
• Therefore it is plausible that a set of conditional independences may enable a decomposition of a joint probability distribution.
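
A small Python sketch of how this definition can be checked numerically for a finite joint distribution; the example distribution and the tolerance are illustrative assumptions.

    # Checking A ⊥⊥ B | C for a joint distribution given as {(a, b, c): probability}.
    from itertools import product

    def conditionally_independent(p_abc, tol=1e-9):
        a_vals = {a for a, _, _ in p_abc}
        b_vals = {b for _, b, _ in p_abc}
        c_vals = {c for _, _, c in p_abc}
        for c in c_vals:
            p_c = sum(p_abc.get((a, b, c), 0.0) for a, b in product(a_vals, b_vals))
            if p_c <= 0.0:
                continue          # nothing to check for impossible values of C
            for a, b in product(a_vals, b_vals):
                p_ab_c = p_abc.get((a, b, c), 0.0) / p_c
                p_a_c = sum(p_abc.get((a, b2, c), 0.0) for b2 in b_vals) / p_c
                p_b_c = sum(p_abc.get((a2, b, c), 0.0) for a2 in a_vals) / p_c
                if abs(p_ab_c - p_a_c * p_b_c) > tol:
                    return False
        return True

    # A distribution built as P(a, b, c) = P(c) * P(a | c) * P(b | c) must pass the test:
    p = {(a, b, c): 0.5 * pa * pb
         for c, (pa_row, pb_row) in {"c1": ((0.3, 0.7), (0.9, 0.1)),
                                     "c2": ((0.6, 0.4), (0.2, 0.8))}.items()
         for a, pa in zip(("a1", "a2"), pa_row)
         for b, pb in zip(("b1", "b2"), pb_row)}
    print(conditionally_independent(p))   # True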

  15. Conditional Independence: An Example
Dependence (fictitious) between smoking and life expectancy. Each dot of the scatter plot represents one person.
x-axis: age at death; y-axis: average number of cigarettes per day.
Weak, but clear dependence: the more cigarettes are smoked, the lower the life expectancy.
(Note that this data is artificial and thus should not be seen as revealing an actual dependence.)

  16. Conditional Independence: An Example
Conjectured explanation: there is a common cause, namely whether the person is exposed to stress at work. If this were correct, splitting the data should remove the dependence.
Group 1: exposed to stress at work.
(Note that this data is artificial and therefore should not be seen as an argument against health hazards caused by smoking.)

  17. Conditional Independence: An Example
Conjectured explanation: there is a common cause, namely whether the person is exposed to stress at work. If this were correct, splitting the data should remove the dependence.
Group 2: not exposed to stress at work.
(Note that this data is artificial and therefore should not be seen as an argument against health hazards caused by smoking.)

  18. Probabilistic Decomposition (continued)
Chain Rule of Probability:
  ∀a₁ ∈ dom(A₁): ... ∀aₙ ∈ dom(Aₙ):
  P( ⋀_{i=1}^{n} A_i = a_i ) = ∏_{i=1}^{n} P( A_i = a_i | ⋀_{j=1}^{i-1} A_j = a_j ).
• The chain rule of probability is valid in general (or at least for strictly positive distributions).
Chain Rule Factorization:
  ∀a₁ ∈ dom(A₁): ... ∀aₙ ∈ dom(Aₙ):
  P( ⋀_{i=1}^{n} A_i = a_i ) = ∏_{i=1}^{n} P( A_i = a_i | ⋀_{A_j ∈ parents(A_i)} A_j = a_j ).
• Conditional independence statements are used to "cancel" conditions.

  19. Reasoning with Projections
Due to the fact that color and size are conditionally independent given the shape, the reasoning result can be obtained using only the projections to the subspaces (the slide repeats the propagation of the color evidence through the color–shape and shape–size marginals, with old and new values for the three size layers s, m, l).
This reasoning scheme can be formally justified with probability measures.

  20. Probabilistic Evidence Propagation, Step 1   (A: color, B: shape, C: size)
P(B = b | A = a_obs)
  = P( ⋁_{a ∈ dom(A)} ⋁_{c ∈ dom(C)} A = a, B = b, C = c | A = a_obs )
 (1) = Σ_{a ∈ dom(A)} Σ_{c ∈ dom(C)} P(A = a, B = b, C = c | A = a_obs)
 (2) = Σ_{a ∈ dom(A)} Σ_{c ∈ dom(C)} P(A = a, B = b, C = c) · P(A = a | A = a_obs) / P(A = a)
 (3) = Σ_{a ∈ dom(A)} Σ_{c ∈ dom(C)} [ P(A = a, B = b) P(B = b, C = c) / P(B = b) ] · P(A = a | A = a_obs) / P(A = a)
     = Σ_{a ∈ dom(A)} [ P(A = a, B = b) · P(A = a | A = a_obs) / P(A = a) ] · Σ_{c ∈ dom(C)} P(C = c | B = b)
       (the last sum equals 1)
     = Σ_{a ∈ dom(A)} P(A = a, B = b) · P(A = a | A = a_obs) / P(A = a).

  21. Probabilistic Evidence Propagation, Step 1 (continued)
(1) holds because of Kolmogorov's axioms.
(3) holds because the distribution p_ABC can be decomposed w.r.t. the set M = {{A, B}, {B, C}}.   (A: color, B: shape, C: size)
(2) holds, since in the first place
  P(A = a, B = b, C = c | A = a_obs) = P(A = a, B = b, C = c, A = a_obs) / P(A = a_obs)
    = P(A = a, B = b, C = c) / P(A = a_obs), if a = a_obs;  0, otherwise,
and secondly
  P(A = a, A = a_obs) = P(A = a), if a = a_obs;  0, otherwise,
and therefore
  P(A = a, B = b, C = c | A = a_obs) = P(A = a, B = b, C = c) · P(A = a | A = a_obs) / P(A = a).

  22. Probabilistic Evidence Propagation, Step 2   (A: color, B: shape, C: size)
P(C = c | A = a_obs)
  = P( ⋁_{a ∈ dom(A)} ⋁_{b ∈ dom(B)} A = a, B = b, C = c | A = a_obs )
 (1) = Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} P(A = a, B = b, C = c | A = a_obs)
 (2) = Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} P(A = a, B = b, C = c) · P(A = a | A = a_obs) / P(A = a)
 (3) = Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} [ P(A = a, B = b) P(B = b, C = c) / P(B = b) ] · P(A = a | A = a_obs) / P(A = a)
     = Σ_{b ∈ dom(B)} [ P(B = b, C = c) / P(B = b) ] · Σ_{a ∈ dom(A)} P(A = a, B = b) · P(A = a | A = a_obs) / P(A = a)
       (the inner sum equals P(B = b | A = a_obs), as computed in step 1)
     = Σ_{b ∈ dom(B)} P(B = b, C = c) · P(B = b | A = a_obs) / P(B = b).
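
The two steps translate directly into code. Below is a minimal Python sketch that uses only the two marginal tables; all numbers are illustrative assumptions, not the slide example.

    # Two-step probabilistic evidence propagation with color-shape and shape-size marginals.
    P_AB = {("red", "circle"): 0.2, ("red", "square"): 0.1,     # P(A = a, B = b)
            ("blue", "circle"): 0.3, ("blue", "square"): 0.4}
    P_BC = {("circle", "small"): 0.4, ("circle", "large"): 0.1,  # P(B = b, C = c)
            ("square", "small"): 0.2, ("square", "large"): 0.3}
    P_A = {"red": 0.3, "blue": 0.7}
    P_B = {"circle": 0.5, "square": 0.5}

    a_obs = "blue"                      # observed color
    # P(A = a | A = a_obs) is 1 for the observed value and 0 otherwise.
    P_A_given_obs = {a: (1.0 if a == a_obs else 0.0) for a in P_A}

    # Step 1: P(B = b | A = a_obs) = sum_a P(a, b) * P(a | a_obs) / P(a)
    P_B_given_obs = {b: sum(P_AB[(a, b)] * P_A_given_obs[a] / P_A[a] for a in P_A)
                     for b in P_B}

    # Step 2: P(C = c | A = a_obs) = sum_b P(b, c) * P(b | a_obs) / P(b)
    P_C_given_obs = {c: sum(P_BC[(b, c)] * P_B_given_obs[b] / P_B[b] for b in P_B)
                     for c in ("small", "large")}

    print(P_B_given_obs)   # {'circle': 0.4285..., 'square': 0.5714...}
    print(P_C_given_obs)   # {'small': 0.5714..., 'large': 0.4285...}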

  23. Excursion: Possibility Theory

  24. Possibility Theory
• The best-known calculus for handling uncertainty is, of course, probability theory. [Laplace 1812]
• A less well-known, but noteworthy alternative is possibility theory. [Dubois and Prade 1988]
• In the interpretation we consider here, possibility theory can handle uncertain and imprecise information, while probability theory, at least in its basic form, was only designed to handle uncertain information.
• Types of imperfect information:
  ◦ Imprecision: disjunctive or set-valued information about the obtaining state, which is certain: the true state is contained in the disjunction or set.
  ◦ Uncertainty: precise information about the obtaining state (single case), which is not certain: the true state may differ from the stated one.
  ◦ Vagueness: the meaning of the information is in doubt: the interpretation of the given statements about the obtaining state may depend on the user.

  25. Possibility Theory: Axiomatic Approach
Definition: Let Ω be a (finite) sample space. A possibility measure Π on Ω is a function Π : 2^Ω → [0, 1] satisfying
1. Π(∅) = 0 and
2. ∀E₁, E₂ ⊆ Ω: Π(E₁ ∪ E₂) = max{ Π(E₁), Π(E₂) }.
• Similar to Kolmogorov's axioms of probability theory.
• From the axioms it follows that Π(E₁ ∩ E₂) ≤ min{ Π(E₁), Π(E₂) }.
• Attributes are introduced as random variables (as in probability theory).
• Π(A = a) is an abbreviation of Π({ ω ∈ Ω | A(ω) = a }).
• If an event E is possible without restriction, then Π(E) = 1. If an event E is impossible, then Π(E) = 0.
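
A tiny Python sketch of a possibility measure obtained from point-wise degrees of possibility; the sample space and the degrees are illustrative assumptions.

    # Pi(E) = maximum degree of possibility of the elementary events in E.
    pi = {"w1": 1.0, "w2": 0.7, "w3": 0.4, "w4": 0.0}   # degrees on Omega

    def Pi(event):
        """Possibility of an event (a set of elementary events)."""
        return max((pi[w] for w in event), default=0.0)   # Pi(empty set) = 0

    E1, E2 = {"w1", "w3"}, {"w2", "w3"}
    print(Pi(E1 | E2) == max(Pi(E1), Pi(E2)))    # axiom 2 holds: True
    print(Pi(E1 & E2) <= min(Pi(E1), Pi(E2)))    # derived inequality: True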

  26. Possibility Theory and the Context Model
Interpretation of degrees of possibility [Gebhardt and Kruse 1993]:
• Let Ω be the (nonempty) set of all possible states of the world, ω₀ the actual (but unknown) state.
• Let C = {c₁, ..., cₙ} be a set of contexts (observers, frame conditions etc.) and (C, 2^C, P) a finite probability space (context weights).
• Let Γ : C → 2^Ω be a set-valued mapping, which assigns to each context the most specific correct set-valued specification of ω₀. The sets Γ(c) are called the focal sets of Γ.
• Γ is a random set (i.e., a set-valued random variable) [Nguyen 1978]. The basic possibility assignment induced by Γ is the mapping
  π : Ω → [0, 1],  π(ω) = P({ c ∈ C | ω ∈ Γ(c) }).

  27. Example: Dice and Shakers
Five shakers, each chosen with probability 1/5, contain differently shaped dice: tetrahedron, hexahedron, octahedron, icosahedron, dodecahedron, showing the numbers 1–4, 1–6, 1–8, 1–10, and 1–12, respectively.
Degrees of possibility of the numbers:
  1 – 4:   1/5 + 1/5 + 1/5 + 1/5 + 1/5 = 1
  5 – 6:   1/5 + 1/5 + 1/5 + 1/5       = 4/5
  7 – 8:   1/5 + 1/5 + 1/5             = 3/5
  9 – 10:  1/5 + 1/5                   = 2/5
  11 – 12: 1/5                         = 1/5
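
The context-model reading of this table is easy to reproduce in code; the following Python sketch treats the shakers as contexts with weight 1/5 each and the number ranges as their focal sets (ranges as listed above).

    # Degrees of possibility for the dice-and-shakers example from the context model.
    contexts = {                       # context -> (weight, focal set)
        "shaker 1": (0.2, set(range(1, 5))),    # numbers 1 - 4
        "shaker 2": (0.2, set(range(1, 7))),    # numbers 1 - 6
        "shaker 3": (0.2, set(range(1, 9))),    # numbers 1 - 8
        "shaker 4": (0.2, set(range(1, 11))),   # numbers 1 - 10
        "shaker 5": (0.2, set(range(1, 13))),   # numbers 1 - 12
    }

    def basic_possibility(number):
        """pi(number) = probability of the contexts whose focal set contains it."""
        return sum(w for w, focal in contexts.values() if number in focal)

    for n in (1, 5, 7, 9, 11):
        print(n, round(basic_possibility(n), 2))
    # 1 -> 1.0, 5 -> 0.8, 7 -> 0.6, 9 -> 0.4, 11 -> 0.2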

  28. From the Context Model to Possibility Measures
Definition: Let Γ : C → 2^Ω be a random set. The possibility measure induced by Γ is the mapping
  Π : 2^Ω → [0, 1],  E ↦ P({ c ∈ C | E ∩ Γ(c) ≠ ∅ }).
Problem: From the given interpretation it follows only that
  ∀E ⊆ Ω:  max_{ω ∈ E} π(ω) ≤ Π(E) ≤ min{ 1, Σ_{ω ∈ E} π(ω) }.
(The slide illustrates the bounds with two random sets over Ω = {1, ..., 5} and context weights c₁ = 1/2, c₂ = 1/4, c₃ = 1/4 that induce different basic possibility assignments.)

  29. From the Context Model to Possibility Measures (cont.)
Attempts to solve the indicated problem:
• Require the focal sets to be consonant:
  Definition: Let Γ : C → 2^Ω be a random set with C = {c₁, ..., cₙ}. The focal sets Γ(c_i), 1 ≤ i ≤ n, are called consonant, iff there exists a sequence c_{i₁}, c_{i₂}, ..., c_{iₙ}, 1 ≤ i₁, ..., iₙ ≤ n, ∀1 ≤ j < k ≤ n: i_j ≠ i_k, so that Γ(c_{i₁}) ⊆ Γ(c_{i₂}) ⊆ ... ⊆ Γ(c_{iₙ}).
  → mass assignment theory [Baldwin et al. 1995]
  Problem: The "voting model" is not sufficient to justify consonance.
• Use the lower bound as the "most pessimistic" choice. [Gebhardt 1997]
  Problem: Basic possibility assignments represent negative information; the lower bound is actually the most optimistic choice.
• Justify the lower bound from decision making purposes. [Borgelt 1995, Borgelt 2000]

  30. From the Context Model to Possibility Measures (cont.)
• Assume that in the end we have to decide on a single event.
• Each event is described by the values of a set of attributes.
• Then it can be useful to assign to a set of events the degree of possibility of the "most possible" event in the set.
Example: (the slide shows a two-dimensional table of degrees of possibility together with its row and column maxima).

  31. Possibility Distributions
Definition: Let X = {A₁, ..., Aₙ} be a set of attributes defined on a (finite) sample space Ω with respective domains dom(A_i), i = 1, ..., n. A possibility distribution π_X over X is the restriction of a possibility measure Π on Ω to the set of all events that can be defined by stating values for all attributes in X. That is, π_X = Π|_{E_X}, where
  E_X = { E ∈ 2^Ω | ∃a₁ ∈ dom(A₁): ... ∃aₙ ∈ dom(Aₙ): E ≙ ⋀_{A_j ∈ X} A_j = a_j }
      = { E ∈ 2^Ω | ∃a₁ ∈ dom(A₁): ... ∃aₙ ∈ dom(Aₙ): E = { ω ∈ Ω | ⋀_{A_j ∈ X} A_j(ω) = a_j } }.
• Corresponds to the notion of a probability distribution.
• Advantage of this formalization: no index transformation functions are needed for projections, there are just fewer terms in the conjunctions.

  32. A Simple Example: The Possibilistic Case

  33. A Possibility Distribution
The slide shows a three-dimensional possibility distribution over the attributes color, shape, and size, together with its maximum projections (all numbers in parts per 1000; the three layers correspond to the sizes large, medium, and small).
• The numbers state the degrees of possibility of the corresponding value combination.

  34. Reasoning
The slide shows the possibility distribution after incorporating the information that the given object is green (all numbers in parts per 1000): the degrees of possibility of all value combinations with a different color drop to 0.

  35. Possibilistic Decomposition
• As for relational and probabilistic networks, the three-dimensional possibility distribution can be decomposed into projections to subspaces, namely:
  – the maximum projection to the subspace color × shape and
  – the maximum projection to the subspace shape × size.
• It can be reconstructed using the following formula:
  ∀i, j, k:
  π( a_i^(color), a_j^(shape), a_k^(size) )
    = min{ π( a_i^(color), a_j^(shape) ), π( a_j^(shape), a_k^(size) ) }
    = min{ max_k π( a_i^(color), a_j^(shape), a_k^(size) ), max_i π( a_i^(color), a_j^(shape), a_k^(size) ) }.
• Note the analogy to the probabilistic reconstruction formulas.
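
A minimal Python sketch of this minimum-of-maximum-projections reconstruction; the degrees below are made-up assumptions chosen to be decomposable, not the table from the slides.

    # Reconstructing a possibility distribution from its two maximum projections
    # via pi(c, s, z) = min( pi(c, s), pi(s, z) ).
    colors, shapes, sizes = ("red", "blue"), ("circle", "square"), ("small", "large")

    pi_cs = {("red", "circle"): 0.8, ("red", "square"): 0.3,
             ("blue", "circle"): 1.0, ("blue", "square"): 0.6}
    pi_sz = {("circle", "small"): 1.0, ("circle", "large"): 0.7,
             ("square", "small"): 0.4, ("square", "large"): 0.6}

    # Reconstructed three-dimensional distribution:
    pi_csz = {(c, s, z): min(pi_cs[(c, s)], pi_sz[(s, z)])
              for c in colors for s in shapes for z in sizes}

    # Decomposability requires that the maximum projections of the
    # reconstruction reproduce the two-dimensional distributions:
    ok_cs = all(max(pi_csz[(c, s, z)] for z in sizes) == pi_cs[(c, s)]
                for c in colors for s in shapes)
    ok_sz = all(max(pi_csz[(c, s, z)] for c in colors) == pi_sz[(s, z)]
                for s in shapes for z in sizes)
    print(ok_cs, ok_sz)   # True True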

  36. Reasoning with Projections
Again the same result can be obtained using only projections to subspaces (maximal degrees of possibility): the evidence on the color attribute is propagated to the shape attribute through the color–shape projection (taking the minimum of each column with the new color degrees and maximizing over the rows) and from there to the size attribute through the shape–size projection (the slide shows the old and new maximum projections for the three size layers s, m, l).
This justifies a graph representation:  color — shape — size.

  37. Possibilistic Graphical Models: Formalization

  38. Conditional Possibility and Independence
Definition: Let Ω be a (finite) sample space, Π a possibility measure on Ω, and E₁, E₂ ⊆ Ω events. Then Π(E₁ | E₂) = Π(E₁ ∩ E₂) is called the conditional possibility of E₁ given E₂.
Definition: Let Ω be a (finite) sample space, Π a possibility measure on Ω, and A, B, and C attributes with respective domains dom(A), dom(B), and dom(C). A and B are called conditionally possibilistically independent given C, written A ⊥⊥_Π B | C, iff
  ∀a ∈ dom(A): ∀b ∈ dom(B): ∀c ∈ dom(C):
  Π(A = a, B = b | C = c) = min{ Π(A = a | C = c), Π(B = b | C = c) }.
• Similar to the corresponding notions of probability theory.

  39. Possibilistic Evidence Propagation, Step 1   (A: color, B: shape, C: size)
π(B = b | A = a_obs)
  = π( ⋁_{a ∈ dom(A)} ⋁_{c ∈ dom(C)} A = a, B = b, C = c | A = a_obs )
 (1) = max_{a ∈ dom(A)} max_{c ∈ dom(C)} { π(A = a, B = b, C = c | A = a_obs) }
 (2) = max_{a ∈ dom(A)} max_{c ∈ dom(C)} { min{ π(A = a, B = b, C = c), π(A = a | A = a_obs) } }
 (3) = max_{a ∈ dom(A)} max_{c ∈ dom(C)} { min{ π(A = a, B = b), π(B = b, C = c), π(A = a | A = a_obs) } }
     = max_{a ∈ dom(A)} { min{ π(A = a, B = b), π(A = a | A = a_obs), max_{c ∈ dom(C)} π(B = b, C = c) } }
       (the last maximum equals π(B = b) ≥ π(A = a, B = b) and can therefore be dropped from the minimum)
     = max_{a ∈ dom(A)} { min{ π(A = a, B = b), π(A = a | A = a_obs) } }.

  40. Graphical Models: The General Theory

  41. (Semi-)Graphoid Axioms
Definition: Let V be a set of (mathematical) objects and (· ⊥⊥ · | ·) a three-place relation of subsets of V. Furthermore, let W, X, Y, and Z be four disjoint subsets of V. The four statements
  symmetry:      (X ⊥⊥ Y | Z) ⇒ (Y ⊥⊥ X | Z)
  decomposition: (W ∪ X ⊥⊥ Y | Z) ⇒ (W ⊥⊥ Y | Z) ∧ (X ⊥⊥ Y | Z)
  weak union:    (W ∪ X ⊥⊥ Y | Z) ⇒ (X ⊥⊥ Y | Z ∪ W)
  contraction:   (X ⊥⊥ Y | Z ∪ W) ∧ (W ⊥⊥ Y | Z) ⇒ (W ∪ X ⊥⊥ Y | Z)
are called the semi-graphoid axioms. A three-place relation (· ⊥⊥ · | ·) that satisfies the semi-graphoid axioms for all W, X, Y, and Z is called a semi-graphoid. The above four statements together with
  intersection:  (W ⊥⊥ Y | Z ∪ X) ∧ (X ⊥⊥ Y | Z ∪ W) ⇒ (W ∪ X ⊥⊥ Y | Z)
are called the graphoid axioms. A three-place relation (· ⊥⊥ · | ·) that satisfies the graphoid axioms for all W, X, Y, and Z is called a graphoid.

  42. Illustration of the (Semi-)Graphoid Axioms
(The slide illustrates the decomposition, weak union, contraction, and intersection axioms graphically with node sets W, X, Y, and Z.)
• Similar to the properties of separation in graphs.
• Idea: represent conditional independence by separation in graphs.

  43. Separation in Graphs
Definition: Let G = (V, E) be an undirected graph and X, Y, and Z three disjoint subsets of nodes. Z u-separates X and Y in G, written ⟨X | Z | Y⟩_G, iff all paths from a node in X to a node in Y contain a node in Z. A path that contains a node in Z is called blocked (by Z), otherwise it is called active.
Definition: Let G = (V, E) be a directed acyclic graph and X, Y, and Z three disjoint subsets of nodes. Z d-separates X and Y in G, written ⟨X | Z | Y⟩_G, iff there is no path from a node in X to a node in Y along which the following two conditions hold:
1. every node with converging edges either is in Z or has a descendant in Z,
2. every other node is not in Z.
A path satisfying the two conditions above is said to be active, otherwise it is said to be blocked (by Z).
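
For the undirected case the definition translates directly into a reachability test; here is a minimal Python sketch (the example graph is made up).

    # u-separation: Z u-separates X and Y iff no path from X to Y avoids Z.
    # We test this with a breadth-first search that may not enter nodes of Z.
    from collections import deque

    def u_separates(adjacency, X, Y, Z):
        """Return True iff Z u-separates the node sets X and Y."""
        visited = set(X) - set(Z)
        queue = deque(visited)
        while queue:
            node = queue.popleft()
            if node in Y:
                return False              # found an active path into Y
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited and neighbor not in Z:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return True

    adjacency = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B", "E"}, "E": {"D"}}
    print(u_separates(adjacency, {"A"}, {"E"}, {"B"}))   # True: every A-E path runs through B
    print(u_separates(adjacency, {"A"}, {"E"}, {"C"}))   # False: the path A-B-D-E avoids C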

  44. Separation in Directed Acyclic Graphs
Example graph over the attributes A₁, ..., A₉ (shown on the slide).
Valid separations:
  ⟨{A₁} | {A₃} | {A₄}⟩, ⟨{A₈} | {A₇} | {A₉}⟩, ⟨{A₃} | {A₄, A₆} | {A₇}⟩, ⟨{A₁} | ∅ | {A₂}⟩
Invalid separations:
  ⟨{A₁} | {A₄} | {A₂}⟩, ⟨{A₁} | {A₆} | {A₇}⟩, ⟨{A₄} | {A₃, A₇} | {A₆}⟩, ⟨{A₁} | {A₄, A₉} | {A₅}⟩

  45. Conditional (In)Dependence Graphs
Definition: Let (· ⊥⊥_δ · | ·) be a three-place relation representing the set of conditional independence statements that hold in a given distribution δ over a set U of attributes. An undirected graph G = (U, E) over U is called a conditional dependence graph or a dependence map w.r.t. δ, iff for all disjoint subsets X, Y, Z ⊆ U of attributes
  X ⊥⊥_δ Y | Z  ⇒  ⟨X | Z | Y⟩_G,
i.e., if G captures by u-separation all (conditional) independences that hold in δ and thus represents only valid (conditional) dependences. Similarly, G is called a conditional independence graph or an independence map w.r.t. δ, iff for all disjoint subsets X, Y, Z ⊆ U of attributes
  ⟨X | Z | Y⟩_G  ⇒  X ⊥⊥_δ Y | Z,
i.e., if G captures by u-separation only (conditional) independences that are valid in δ.
G is said to be a perfect map of the conditional (in)dependences in δ, if it is both a dependence map and an independence map.

  46. Conditional (In)Dependence Graphs
Definition: A conditional dependence graph is called maximal w.r.t. a distribution δ (or, in other words, a maximal dependence map w.r.t. δ) iff no edge can be added to it so that the resulting graph is still a conditional dependence graph w.r.t. the distribution δ.
Definition: A conditional independence graph is called minimal w.r.t. a distribution δ (or, in other words, a minimal independence map w.r.t. δ) iff no edge can be removed from it so that the resulting graph is still a conditional independence graph w.r.t. the distribution δ.
• Conditional independence graphs are sometimes required to be minimal.
• However, this requirement is not necessary for a conditional independence graph to be usable for evidence propagation.
• The disadvantage of a non-minimal conditional independence graph is that evidence propagation may be more costly computationally than necessary.

  47. Limitations of Graph Representations
Perfect directed map, no perfect undirected map (the slide shows a directed graph over A, B, C with edges converging at C):
  p_ABC      A = a₁, B = b₁   A = a₁, B = b₂   A = a₂, B = b₁   A = a₂, B = b₂
  C = c₁         4/24             3/24             3/24             2/24
  C = c₂         2/24             3/24             3/24             4/24
Perfect undirected map, no perfect directed map (the slide shows an undirected graph over A, B, C, D):
  p_ABCD             A = a₁, B = b₁   A = a₁, B = b₂   A = a₂, B = b₁   A = a₂, B = b₂
  C = c₁, D = d₁         1/47             1/47             1/47             2/47
  C = c₁, D = d₂         1/47             1/47             2/47             4/47
  C = c₂, D = d₁         1/47             2/47             1/47             4/47
  C = c₂, D = d₂         2/47             4/47             4/47            16/47

  48. Limitations of Graph Representations
• There are also probability distributions for which there exists neither a directed nor an undirected perfect map (the slide shows a graph over A, B, C):
  p_ABC      A = a₁, B = b₁   A = a₁, B = b₂   A = a₂, B = b₁   A = a₂, B = b₂
  C = c₁         2/12             1/12             1/12             2/12
  C = c₂         1/12             2/12             2/12             1/12
• In such cases either not all dependences or not all independences can be captured by a graph representation.
• In such a situation one usually decides to neglect some of the independence information, that is, to use only a (minimal) conditional independence graph.
• This is sufficient for correct evidence propagation; the existence of a perfect map is not required.

  49. Markov Properties of Undirected Graphs
Definition: An undirected graph G = (U, E) over a set U of attributes is said to have (w.r.t. a distribution δ) the
pairwise Markov property, iff in δ any pair of attributes which are nonadjacent in the graph are conditionally independent given all remaining attributes, i.e., iff
  ∀A, B ∈ U, A ≠ B:  (A, B) ∉ E  ⇒  A ⊥⊥_δ B | U − {A, B},
local Markov property, iff in δ any attribute is conditionally independent of all remaining attributes given its neighbors, i.e., iff
  ∀A ∈ U:  A ⊥⊥_δ U − closure(A) | boundary(A),
global Markov property, iff in δ any two sets of attributes which are u-separated by a third are conditionally independent given the attributes in the third set, i.e., iff
  ∀X, Y, Z ⊆ U:  ⟨X | Z | Y⟩_G  ⇒  X ⊥⊥_δ Y | Z.

  50. Markov Properties of Directed Acyclic Graphs
Definition: A directed acyclic graph G = (U, E) over a set U of attributes is said to have (w.r.t. a distribution δ) the
pairwise Markov property, iff in δ any attribute is conditionally independent of any non-descendant not among its parents given all remaining non-descendants, i.e., iff
  ∀A, B ∈ U:  B ∈ nondescs(A) − parents(A)  ⇒  A ⊥⊥_δ B | nondescs(A) − {B},
local Markov property, iff in δ any attribute is conditionally independent of all remaining non-descendants given its parents, i.e., iff
  ∀A ∈ U:  A ⊥⊥_δ nondescs(A) − parents(A) | parents(A),
global Markov property, iff in δ any two sets of attributes which are d-separated by a third are conditionally independent given the attributes in the third set, i.e., iff
  ∀X, Y, Z ⊆ U:  ⟨X | Z | Y⟩_G  ⇒  X ⊥⊥_δ Y | Z.

  51. Equivalence of Markov Properties
Theorem: If a three-place relation (· ⊥⊥_δ · | ·) representing the set of conditional independence statements that hold in a given joint distribution δ over a set U of attributes satisfies the graphoid axioms, then the pairwise, the local, and the global Markov property of an undirected graph G = (U, E) over U are equivalent.
Theorem: If a three-place relation (· ⊥⊥_δ · | ·) representing the set of conditional independence statements that hold in a given joint distribution δ over a set U of attributes satisfies the semi-graphoid axioms, then the local and the global Markov property of a directed acyclic graph G = (U, E) over U are equivalent. If (· ⊥⊥_δ · | ·) satisfies the graphoid axioms, then the pairwise, the local, and the global Markov property are equivalent.

  52. Markov Equivalence of Graphs
• Can two distinct graphs represent exactly the same set of conditional independence statements?
• The answer is relevant for learning graphical models from data, because it determines whether we can expect a unique graph as a learning result or not.
Definition: Two (directed or undirected) graphs G₁ = (U, E₁) and G₂ = (U, E₂) with the same set U of nodes are called Markov equivalent iff they satisfy the same set of node separation statements (with d-separation for directed graphs and u-separation for undirected graphs), or formally, iff
  ∀X, Y, Z ⊆ U:  ⟨X | Z | Y⟩_{G₁}  ⇔  ⟨X | Z | Y⟩_{G₂}.
• No two different undirected graphs can be Markov equivalent.
• The reason is that these two graphs, in order to be different, have to differ in at least one edge. However, the graph lacking this edge satisfies a node separation (and thus expresses a conditional independence) that is not satisfied (expressed) by the graph possessing the edge.

  53. Markov Equivalence of Graphs
Definition: Let G = (U, E) be a directed graph. The skeleton of G is the undirected graph G_s = (U, E_s) where E_s contains the same edges as E, but with their directions removed, or formally:
  E_s = { (A, B) ∈ U × U | (A, B) ∈ E ∨ (B, A) ∈ E }.
Definition: Let G = (U, E) be a directed graph and A, B, C ∈ U three nodes of G. The triple (A, B, C) is called a v-structure of G iff (A, B) ∈ E and (C, B) ∈ E, but neither (A, C) ∈ E nor (C, A) ∈ E, that is, iff G has converging edges from A and C at B, but A and C are unconnected.
Theorem: Let G₁ = (U, E₁) and G₂ = (U, E₂) be two directed acyclic graphs with the same node set U. The graphs G₁ and G₂ are Markov equivalent iff they possess the same skeleton and the same set of v-structures.
• Intuitively: edge directions may be reversed if this does not change the set of v-structures.
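
The theorem yields a simple test; here is a minimal Python sketch that checks Markov equivalence via skeletons and v-structures (the example graphs are made up).

    # Markov equivalence of DAGs: same skeleton and same set of v-structures.
    # Each DAG is a set of directed edges (parent, child).
    def skeleton(edges):
        return {frozenset(e) for e in edges}

    def v_structures(edges):
        parents = {}
        for a, b in edges:
            parents.setdefault(b, set()).add(a)
        skel = skeleton(edges)
        return {(frozenset({a, c}), b)
                for b, ps in parents.items()
                for a in ps for c in ps
                if a != c and frozenset({a, c}) not in skel}

    def markov_equivalent(edges1, edges2):
        return (skeleton(edges1) == skeleton(edges2)
                and v_structures(edges1) == v_structures(edges2))

    g1 = {("A", "B"), ("B", "C")}          # A -> B -> C
    g2 = {("B", "A"), ("B", "C")}          # A <- B -> C
    g3 = {("A", "B"), ("C", "B")}          # A -> B <- C  (a v-structure at B)
    print(markov_equivalent(g1, g2))       # True
    print(markov_equivalent(g1, g3))       # False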

  54. Markov Equivalence of Graphs
(The slide shows two pairs of example graphs over the nodes A, B, C, D.)
Graphs with the same skeleton, but converging edges at different nodes, which start from connected nodes, can be Markov equivalent.
Of several edges that converge at a node only a subset may actually represent a v-structure. This v-structure, however, is relevant.

  55. Undirected Graphs and Decompositions
Definition: A probability distribution p_V over a set V of variables is called decomposable or factorizable w.r.t. an undirected graph G = (V, E) iff it can be written as a product of nonnegative functions on the maximal cliques of G. That is, let M be a family of subsets of variables, such that the subgraphs of G induced by the sets M ∈ M are the maximal cliques of G. Then there exist functions φ_M : E_M → ℝ₀⁺, M ∈ M, such that
  ∀a₁ ∈ dom(A₁): ... ∀aₙ ∈ dom(Aₙ):
  p_V( ⋀_{A_i ∈ V} A_i = a_i ) = ∏_{M ∈ M} φ_M( ⋀_{A_i ∈ M} A_i = a_i ).
Example (graph over A₁, ..., A₆ shown on the slide):
  p_V(A₁ = a₁, ..., A₆ = a₆) = φ_{A₁A₂A₃}(A₁ = a₁, A₂ = a₂, A₃ = a₃)
                              · φ_{A₃A₅A₆}(A₃ = a₃, A₅ = a₅, A₆ = a₆)
                              · φ_{A₂A₄}(A₂ = a₂, A₄ = a₄)
                              · φ_{A₄A₆}(A₄ = a₄, A₆ = a₆).

  56. Directed Acyclic Graphs and Decompositions
Definition: A probability distribution p_U over a set U of attributes is called decomposable or factorizable w.r.t. a directed acyclic graph G = (U, E) over U, iff it can be written as a product of the conditional probabilities of the attributes given their parents in G, i.e., iff
  ∀a₁ ∈ dom(A₁): ... ∀aₙ ∈ dom(Aₙ):
  p_U( ⋀_{A_i ∈ U} A_i = a_i ) = ∏_{A_i ∈ U} P( A_i = a_i | ⋀_{A_j ∈ parents_G(A_i)} A_j = a_j ).
Example (graph over A₁, ..., A₇ shown on the slide):
  P(A₁ = a₁, ..., A₇ = a₇) = P(A₁ = a₁) · P(A₂ = a₂ | A₁ = a₁) · P(A₃ = a₃)
                            · P(A₄ = a₄ | A₁ = a₁, A₂ = a₂) · P(A₅ = a₅ | A₂ = a₂, A₃ = a₃)
                            · P(A₆ = a₆ | A₄ = a₄, A₅ = a₅) · P(A₇ = a₇ | A₅ = a₅).
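
A minimal Python sketch of evaluating a joint probability from such a factorization; the small graph and the conditional probability tables are illustrative assumptions, not the seven-attribute example from the slide.

    # Joint probability from the factorization induced by a DAG (A -> C <- B).
    parents = {"A": (), "B": (), "C": ("A", "B")}

    cpt = {
        "A": {(): {"a1": 0.4, "a2": 0.6}},
        "B": {(): {"b1": 0.7, "b2": 0.3}},
        "C": {("a1", "b1"): {"c1": 0.9, "c2": 0.1},
              ("a1", "b2"): {"c1": 0.5, "c2": 0.5},
              ("a2", "b1"): {"c1": 0.2, "c2": 0.8},
              ("a2", "b2"): {"c1": 0.6, "c2": 0.4}},
    }

    def joint(assignment):
        """P(assignment) = product over attributes of P(value | parent values)."""
        p = 1.0
        for attr, value in assignment.items():
            parent_values = tuple(assignment[q] for q in parents[attr])
            p *= cpt[attr][parent_values][value]
        return p

    print(joint({"A": "a1", "B": "b2", "C": "c1"}))   # 0.4 * 0.3 * 0.5 ≈ 0.06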

  57. Conditional Independence Graphs and Decompositions
Core Theorem of Graphical Models: Let p_V be a strictly positive probability distribution on a set V of (discrete) variables. A directed or undirected graph G = (V, E) is a conditional independence graph w.r.t. p_V if and only if p_V is factorizable w.r.t. G.
Definition: A Markov network is an undirected conditional independence graph of a probability distribution p_V together with the family of positive functions φ_M of the factorization induced by the graph.
Definition: A Bayesian network is a directed conditional independence graph of a probability distribution p_U together with the family of conditional probabilities of the factorization induced by the graph.
• Sometimes the conditional independence graph is required to be minimal if it is to be used as the graph underlying a Markov or Bayesian network.
• For correct evidence propagation it is not required that the graph is minimal. Evidence propagation may just be less efficient than possible.

  58. Probabilistic Graphical Models: Evidence Propagation in Undirected Trees

  59. Evidence Propagation in Undirected Trees
Node processors communicating by message passing: along an edge A — B the messages µ_{A→B} and µ_{B→A} are exchanged; they represent the information collected in the corresponding subgraphs.
Derivation of the Propagation Formulae
Computation of the marginal distribution:
  P(A_g = a_g) = Σ_{∀A_k ∈ U − {A_g}: a_k ∈ dom(A_k)} P( ⋀_{A_i ∈ U} A_i = a_i ).
Factor potential decomposition w.r.t. the undirected tree:
  P(A_g = a_g) = Σ_{∀A_k ∈ U − {A_g}: a_k ∈ dom(A_k)} ∏_{(A_i, A_j) ∈ E} φ_{A_i A_j}(a_i, a_j).

  60. Evidence Propagation in Undirected Trees
• All factor potentials have only two arguments, because we deal with a tree: the maximal cliques of a tree are simply its edges, as there are no cycles.
• In addition, a tree has the convenient property that by removing an edge it is split into two disconnected subgraphs.
• In order to be able to refer to such subgraphs, we define
  U_A^B = {A} ∪ { C ∈ U | A ∼_{G′} C, G′ = (U, E − {(A, B), (B, A)}) },
that is, U_A^B is the set of those attributes that can still be reached from the attribute A if the edge A − B is removed.
• Similarly, we introduce a notation for the edges in these subgraphs, namely
  E_A^B = E ∩ (U_A^B × U_A^B).
• Thus G_A^B = (U_A^B, E_A^B) is the subgraph containing all attributes that can be reached from the attribute B through its neighbor A (including A itself).

  61. Evidence Propagation in Undirected Trees
• In the next step we split the product over all edges into individual factors w.r.t. the neighbors of the goal attribute: we write one factor for each neighbor.
• Each of these factors captures the part of the factorization that refers to the subgraph consisting of the attributes that can be reached from the goal attribute through this neighbor, including the factor potential of the edge that connects the neighbor to the goal attribute.
• That is, we write
  P(A_g = a_g) = Σ_{∀A_k ∈ U − {A_g}: a_k ∈ dom(A_k)} ∏_{A_h ∈ neighbors(A_g)} [ φ_{A_g A_h}(a_g, a_h) ∏_{(A_i, A_j) ∈ E_{A_h}^{A_g}} φ_{A_i A_j}(a_i, a_j) ].
• Note that indeed each factor of the outer product in the above formula refers only to attributes in the subgraph that can be reached from the attribute A_g through the neighbor attribute A_h defining the factor.

  62. Evidence Propagation in Undirected Trees
• In the third step it is exploited that terms that are independent of a summation variable can be moved out of the corresponding sum.
• In addition we make use of  Σ_i Σ_j a_i b_j = (Σ_i a_i)(Σ_j b_j).
• This yields a decomposition of the expression for P(A_g = a_g) into factors:
  P(A_g = a_g) = ∏_{A_h ∈ neighbors(A_g)} [ Σ_{∀A_k ∈ U_{A_h}^{A_g}: a_k ∈ dom(A_k)} φ_{A_g A_h}(a_g, a_h) ∏_{(A_i, A_j) ∈ E_{A_h}^{A_g}} φ_{A_i A_j}(a_i, a_j) ]
               = ∏_{A_h ∈ neighbors(A_g)} µ_{A_h → A_g}(A_g = a_g).
• Each factor represents the probabilistic influence of the subgraph that can be reached through the corresponding neighbor A_h ∈ neighbors(A_g).
• Thus it can be interpreted as a message about this influence sent from A_h to A_g.

  63. Evidence Propagation in Undirected Trees
• With this formula the propagation formula can now easily be derived.
• The key is to consider a single factor of the above product and to compare it to the expression for P(A_h = a_h) for the corresponding neighbor A_h, that is, to
  P(A_h = a_h) = Σ_{∀A_k ∈ U − {A_h}: a_k ∈ dom(A_k)} ∏_{(A_i, A_j) ∈ E} φ_{A_i A_j}(a_i, a_j).
• Note that this formula is completely analogous to the formula for P(A_g = a_g) after the first step, that is, after the application of the factorization formula, with the only difference that this formula refers to A_h instead of A_g:
  P(A_g = a_g) = Σ_{∀A_k ∈ U − {A_g}: a_k ∈ dom(A_k)} ∏_{(A_i, A_j) ∈ E} φ_{A_i A_j}(a_i, a_j).
• We now identify terms that occur in both formulas.

  64. Evidence Propagation in Undirected Trees
• Exploiting that obviously U = U_{A_h}^{A_g} ∪ U_{A_g}^{A_h} and drawing on the distributive law again, we can easily rewrite this expression as a product with two factors:
  P(A_h = a_h) = [ Σ_{∀A_k ∈ U_{A_h}^{A_g} − {A_h}: a_k ∈ dom(A_k)} ∏_{(A_i, A_j) ∈ E_{A_h}^{A_g}} φ_{A_i A_j}(a_i, a_j) ]
               · [ Σ_{∀A_k ∈ U_{A_g}^{A_h}: a_k ∈ dom(A_k)} φ_{A_g A_h}(a_g, a_h) ∏_{(A_i, A_j) ∈ E_{A_g}^{A_h}} φ_{A_i A_j}(a_i, a_j) ],
where the second factor is the message µ_{A_g → A_h}(A_h = a_h).

  65. Evidence Propagation in Undirected Trees
• As a consequence, we obtain the simple expression
  µ_{A_h → A_g}(A_g = a_g) = Σ_{a_h ∈ dom(A_h)} φ_{A_g A_h}(a_g, a_h) · P(A_h = a_h) / µ_{A_g → A_h}(A_h = a_h)
                           = Σ_{a_h ∈ dom(A_h)} [ φ_{A_g A_h}(a_g, a_h) ∏_{A_i ∈ neighbors(A_h) − {A_g}} µ_{A_i → A_h}(A_h = a_h) ].
• This formula is very intuitive:
  ◦ In the upper form it says that all information collected at A_h (expressed as P(A_h = a_h)) should be transferred to A_g, with the exception of the information that was received from A_g.
  ◦ In the lower form the formula says that everything coming in through edges other than A_g − A_h has to be combined and then passed on to A_g.

  66. Evidence Propagation in Undirected Trees
• The second form of this formula also provides us with a means to start the message computations.
• Obviously, the value of the message µ_{A_h → A_g}(A_g = a_g) can immediately be computed if A_h is a leaf node of the tree. In this case the product has no factors and thus the equation reduces to
  µ_{A_h → A_g}(A_g = a_g) = Σ_{a_h ∈ dom(A_h)} φ_{A_g A_h}(a_g, a_h).
• After all leaves have computed these messages, there must be at least one node for which messages from all but one neighbor are known.
• This enables this node to compute the message to the neighbor it did not receive a message from.
• After that, there must again be at least one node which has received messages from all but one neighbor. Hence it can send a message, and so on, until all messages have been computed.
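
The scheduling argument just given can itself be written down as a small program. Here is a minimal Python sketch that produces one valid order in which the messages of an undirected tree can be computed (the example tree is made up).

    # A possible message schedule: leaves send first, then any node that has
    # received messages from all but one neighbor sends to that remaining neighbor.
    def message_schedule(neighbors):
        """Return a list of directed edges (sender, receiver) in a valid order."""
        pending = {(h, g) for g, nbs in neighbors.items() for h in nbs}
        received = {node: set() for node in neighbors}   # messages already computed
        schedule = []
        while pending:
            # In a tree some message is always computable, so this loop terminates.
            for h, g in sorted(pending):
                # mu_{h -> g} needs the messages from all neighbors of h except g.
                if neighbors[h] - {g} <= received[h]:
                    schedule.append((h, g))
                    received[g].add(h)
                    pending.discard((h, g))
                    break
        return schedule

    tree = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
    print(message_schedule(tree))
    # [('A', 'B'), ('C', 'B'), ('B', 'D'), ('D', 'B'), ('B', 'A'), ('B', 'C')]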

  67. Evidence Propagation in Undirected Trees
• Up to now we have assumed that no evidence has been added to the network, that is, that no attributes have been instantiated.
• However, if attributes are instantiated, the formulae change only slightly.
• We have to add to the joint probability distribution an evidence factor for each instantiated attribute: if U_obs is the set of observed (instantiated) attributes, we compute
  P( A_g = a_g | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs) )
    = α Σ_{∀A_k ∈ U − {A_g}: a_k ∈ dom(A_k)} P( ⋀_{A_i ∈ U} A_i = a_i ) ∏_{A_o ∈ U_obs} P(A_o = a_o | A_o = a_o^(obs)) / P(A_o = a_o),
where the factors P(A_o = a_o | A_o = a_o^(obs)) / P(A_o = a_o) are the evidence factors for the observed attributes A_o, the a_o^(obs) are the observed values, and α is a normalization constant,
  α = β · ∏_{A_j ∈ U_obs} P(A_j = a_j^(obs))   with   β = P( ⋀_{A_j ∈ U_obs} A_j = a_j^(obs) )^{-1}.

  68. Evidence Propagation in Undirected Trees
• The justification for this formula is analogous to the justification for the introduction of similar evidence factors for the observed attributes in the simple three-attribute example (color/shape/size):
  P( ⋀_{A_i ∈ U} A_i = a_i | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs) )
    = β P( ⋀_{A_i ∈ U} A_i = a_i, ⋀_{A_o ∈ U_obs} A_o = a_o^(obs) )
    = β P( ⋀_{A_i ∈ U} A_i = a_i ), if ∀A_i ∈ U_obs: a_i = a_i^(obs);  0, otherwise,
with β as defined above,
  β = P( ⋀_{A_j ∈ U_obs} A_j = a_j^(obs) )^{-1}.

  69. Evidence Propagation in Undirected Trees
• In addition, it is clear that
  ∀A_j ∈ U_obs:  P(A_j = a_j | A_j = a_j^(obs)) = 1, if a_j = a_j^(obs);  0, otherwise.
• Therefore we have
  ∏_{A_j ∈ U_obs} P(A_j = a_j | A_j = a_j^(obs)) = 1, if ∀A_j ∈ U_obs: a_j = a_j^(obs);  0, otherwise.
• Combining these equations, we arrive at the formula stated above:
  P( A_g = a_g | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs) )
    = α Σ_{∀A_k ∈ U − {A_g}: a_k ∈ dom(A_k)} P( ⋀_{A_i ∈ U} A_i = a_i ) ∏_{A_o ∈ U_obs} P(A_o = a_o | A_o = a_o^(obs)) / P(A_o = a_o).

  70. Evidence Propagation in Undirected Trees
• Note that we can neglect the normalization factor α, because it can always be recovered from the fact that a probability distribution, whether marginal or conditional, must be normalized.
• That is, instead of trying to determine α beforehand in order to compute P(A_g = a_g | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs)) directly, we confine ourselves to computing (1/α) P(A_g = a_g | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs)) for all a_g ∈ dom(A_g).
• Then we determine α indirectly with the equation
  Σ_{a_g ∈ dom(A_g)} P(A_g = a_g | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs)) = 1.
• In other words, the computed values (1/α) P(A_g = a_g | ⋀_{A_o ∈ U_obs} A_o = a_o^(obs)) are simply normalized to sum 1 in order to compute the desired probabilities.

  71. Evidence Propagation in Undirected Trees
• If the derivation is redone with the modified initial formula for the probability of a value of some goal attribute A_g, the evidence factors
  P(A_o = a_o | A_o = a_o^(obs)) / P(A_o = a_o)
directly influence only the formula for the messages that are sent out from the instantiated attributes.
• Therefore we obtain the following formula for the messages that are sent from an instantiated attribute A_o:
  µ_{A_o → A_i}(A_i = a_i)
    = Σ_{a_o ∈ dom(A_o)} φ_{A_i A_o}(a_i, a_o) · [ P(A_o = a_o) / µ_{A_i → A_o}(A_o = a_o) ] · [ P(A_o = a_o | A_o = a_o^(obs)) / P(A_o = a_o) ]
    = γ · φ_{A_i A_o}(a_i, a_o^(obs)),
where γ = 1 / µ_{A_i → A_o}(A_o = a_o^(obs)); only the term with a_o = a_o^(obs) contributes to the sum, all other terms vanish.

  72. Evidence Propagation in Undirected Trees
This formula is again very intuitive:
• In an undirected tree, any attribute A_o u-separates all attributes in a subgraph reached through one of its neighbors from all attributes in a subgraph reached through any other of its neighbors.
• Consequently, if A_o is instantiated, all paths through A_o are blocked and thus no information should be passed from one neighbor to any other.
• Note that in an implementation we can neglect γ, because it is the same for all values a_i ∈ dom(A_i) and thus can be incorporated into the constant α.
Rewriting the Propagation Formulae in Vector Form:
• We need to determine the probability of all values of the goal attribute and we have to evaluate the messages for all values of the attributes that are their arguments.
• Therefore it is convenient to write the equations in vector form, with a vector for each attribute that has as many elements as the attribute has values. The factor potentials can then be represented as matrices.
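
In this vector form the messages become matrix-vector products. Below is a minimal numpy sketch for the chain color — shape — size; the potentials are chosen so that their product is a valid joint distribution, and all numbers are illustrative assumptions, not the slide example.

    # Vector-form evidence propagation along the chain color - shape - size.
    import numpy as np

    # phi_cs[i, j] = P(color = i, shape = j), phi_sz[j, k] = P(size = k | shape = j)
    phi_cs = np.array([[0.2, 0.1],
                       [0.3, 0.4]])
    phi_sz = np.array([[0.8, 0.2],
                       [0.4, 0.6]])

    evidence_color = np.array([0.0, 1.0])   # the second color value was observed

    # Message color -> shape: evidence vector times the potential matrix.
    mu_color_shape = evidence_color @ phi_cs          # shape-indexed vector
    # Message shape -> size: combine the incoming message with the next potential.
    mu_shape_size = mu_color_shape @ phi_sz           # size-indexed vector

    def normalize(v):
        return v / v.sum()                            # recovers the constant alpha

    print(normalize(mu_color_shape))   # P(shape | observed color) ≈ [0.4286 0.5714]
    print(normalize(mu_shape_size))    # P(size  | observed color) ≈ [0.5714 0.4286]

The normalization at the end implements the remark above: the constants γ and α are never computed explicitly; the message values are simply rescaled to sum to 1.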

  73. Probabilistic Graphical Models: Evidence Propagation in Polytrees

  74. Evidence Propagation in Polytrees
Idea: Node processors communicating by message passing: π-messages are sent from parent to child and λ-messages are sent from child to parent (the slide shows an edge A → B with the messages π_{A→B} and λ_{B→A}).
Derivation of the Propagation Formulae
Computation of the marginal distribution:
  P(A_g = a_g) = Σ_{∀A_i ∈ U − {A_g}: a_i ∈ dom(A_i)} P( ⋀_{A_j ∈ U} A_j = a_j ).
Chain rule factorization w.r.t. the polytree:
  P(A_g = a_g) = Σ_{∀A_i ∈ U − {A_g}: a_i ∈ dom(A_i)} ∏_{A_k ∈ U} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ).

  75. Evidence Propagation in Polytrees (continued)
Decomposition w.r.t. subgraphs:
  P(A_g = a_g) = Σ_{∀A_i ∈ U − {A_g}: a_i ∈ dom(A_i)} [ P( A_g = a_g | ⋀_{A_j ∈ parents(A_g)} A_j = a_j )
    · ∏_{A_k ∈ U₊(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
    · ∏_{A_k ∈ U₋(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ].
Attribute sets underlying the subgraphs:
  U_A^B(C) = {C} ∪ { D ∈ U | D ∼_{G′} C, G′ = (U, E − {(A, B)}) },
  U₊(A) = ⋃_{C ∈ parents(A)} U_C^A(C),       U₊(A, B) = ⋃_{C ∈ parents(A) − {B}} U_C^A(C),
  U₋(A) = ⋃_{C ∈ children(A)} U_A^C(C),      U₋(A, B) = ⋃_{C ∈ children(A) − {B}} U_A^C(C).

  76. Evidence Propagation in Polytrees (continued)
Terms that are independent of a summation variable can be moved out of the corresponding sum. This yields a decomposition into two main factors:
  P(A_g = a_g) = [ Σ_{∀A_i ∈ parents(A_g): a_i ∈ dom(A_i)} P( A_g = a_g | ⋀_{A_j ∈ parents(A_g)} A_j = a_j )
                   · Σ_{∀A_i ∈ U₊*(A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₊(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ]
                 · [ Σ_{∀A_i ∈ U₋(A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ]
               = π(A_g = a_g) · λ(A_g = a_g),
where U₊*(A_g) = U₊(A_g) − parents(A_g).

  77. Evidence Propagation in Polytrees (continued)
  Σ_{∀A_i ∈ U₊*(A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₊(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
    = ∏_{A_p ∈ parents(A_g)} [ Σ_{∀A_i ∈ parents(A_p): a_i ∈ dom(A_i)} P( A_p = a_p | ⋀_{A_j ∈ parents(A_p)} A_j = a_j )
        · Σ_{∀A_i ∈ U₊*(A_p): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₊(A_p)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
        · Σ_{∀A_i ∈ U₋(A_p, A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_p, A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ]
    = ∏_{A_p ∈ parents(A_g)} [ π(A_p = a_p)
        · Σ_{∀A_i ∈ U₋(A_p, A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_p, A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ].

  78. Evidence Propagation in Polytrees (continued)
  Σ_{∀A_i ∈ U₊*(A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₊(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
    = ∏_{A_p ∈ parents(A_g)} [ π(A_p = a_p)
        · Σ_{∀A_i ∈ U₋(A_p, A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_p, A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ]
    = ∏_{A_p ∈ parents(A_g)} π_{A_p → A_g}(A_p = a_p),
and therefore
  π(A_g = a_g) = Σ_{∀A_i ∈ parents(A_g): a_i ∈ dom(A_i)} [ P( A_g = a_g | ⋀_{A_j ∈ parents(A_g)} A_j = a_j ) · ∏_{A_p ∈ parents(A_g)} π_{A_p → A_g}(A_p = a_p) ].

  79. Evidence Propagation in Polytrees (continued)
  λ(A_g = a_g) = Σ_{∀A_i ∈ U₋(A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
    = ∏_{A_c ∈ children(A_g)} Σ_{a_c ∈ dom(A_c)} [ Σ_{∀A_i ∈ parents(A_c) − {A_g}: a_i ∈ dom(A_i)} P( A_c = a_c | ⋀_{A_j ∈ parents(A_c)} A_j = a_j )
        · Σ_{∀A_i ∈ U₊*(A_c, A_g): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₊(A_c, A_g)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
        · Σ_{∀A_i ∈ U₋(A_c): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_c)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j ) ]
      (the last factor equals λ(A_c = a_c))
    = ∏_{A_c ∈ children(A_g)} λ_{A_c → A_g}(A_g = a_g).

  80. Propagation Formulae without Evidence
  π_{A_p → A_c}(A_p = a_p)
    = π(A_p = a_p) · Σ_{∀A_i ∈ U₋(A_p, A_c): a_i ∈ dom(A_i)} ∏_{A_k ∈ U₋(A_p, A_c)} P( A_k = a_k | ⋀_{A_j ∈ parents(A_k)} A_j = a_j )
    = P(A_p = a_p) / λ_{A_c → A_p}(A_p = a_p),
  λ_{A_c → A_p}(A_p = a_p)
    = Σ_{a_c ∈ dom(A_c)} λ(A_c = a_c) Σ_{∀A_i ∈ parents(A_c) − {A_p}: a_i ∈ dom(A_i)} [ P( A_c = a_c | ⋀_{A_j ∈ parents(A_c)} A_j = a_j )
        · ∏_{A_k ∈ parents(A_c) − {A_p}} π_{A_k → A_c}(A_k = a_k) ].

  81. Evidence Propagation in Polytrees (continued)
Evidence: The attributes in a set X_obs are observed.
  P( A_g = a_g | ⋀_{A_k ∈ X_obs} A_k = a_k^(obs) )
    = Σ_{∀A_i ∈ U − {A_g}: a_i ∈ dom(A_i)} P( ⋀_{A_j ∈ U} A_j = a_j | ⋀_{A_k ∈ X_obs} A_k = a_k^(obs) )
    = α Σ_{∀A_i ∈ U − {A_g}: a_i ∈ dom(A_i)} P( ⋀_{A_j ∈ U} A_j = a_j ) ∏_{A_k ∈ X_obs} P( A_k = a_k | A_k = a_k^(obs) ),
where α = 1 / P( ⋀_{A_k ∈ X_obs} A_k = a_k^(obs) ).
