

  1. Data Mining 2020: Bayesian Networks (1). Ad Feelders, Universiteit Utrecht.

  2. Do you like noodles?

     Race    Gender    Yes    No
     Black   Male       10    40
             Female     30    20
     White   Male      100   100
             Female    120    80

  3. Do you like noodles? Undirected graph: G - A - R (Gender and Race each linked to Answer). This graph encodes G ⊥⊥ R | A. Strange: Gender and Race are prior to Answer, but this model says they are independent given Answer!

  4. Do you like noodles? Marginal table for Gender and Race:

     Gender    Black   White
     Male        50     200
     Female      50     200

     From this table we conclude that Race and Gender are independent in the data: the cross-product ratio is cpr(G,R) = (50·200)/(200·50) = 1.

  5. Do you like noodles? Table for Gender and Race given Answer = yes:

     Gender    Black   White
     Male        10     100
     Female      30     120

     cpr(G,R) = (10·120)/(100·30) = 0.4

     Table for Gender and Race given Answer = no:

     Gender    Black   White
     Male        40     100
     Female      20      80

     cpr(G,R) = (40·80)/(100·20) = 1.6

     From these tables we conclude that Race and Gender are dependent given Answer.
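Not part of the original slides: a minimal Python sketch (the cpr helper and the counts layout are my own) that recomputes the cross-product ratios above, confirming the marginal-independence vs conditional-dependence pattern.

```python
# Sketch (not from the slides): cross-product ratios for the noodle data.
# cpr = (n11 * n22) / (n12 * n21) for a 2x2 table; cpr == 1 means no association.

def cpr(n11, n12, n21, n22):
    """Cross-product ratio (odds ratio) of a 2x2 contingency table."""
    return (n11 * n22) / (n12 * n21)

# Counts: (Gender, Race) -> (Yes, No)
counts = {
    ("Male",   "Black"): (10, 40),
    ("Female", "Black"): (30, 20),
    ("Male",   "White"): (100, 100),
    ("Female", "White"): (120, 80),
}

# Marginal table for Gender x Race (sum over Answer).
m = {k: sum(v) for k, v in counts.items()}
print(cpr(m["Male", "Black"], m["Male", "White"],
          m["Female", "Black"], m["Female", "White"]))   # 1.0

# Conditional tables given Answer = yes (index 0) and no (index 1).
for a, name in [(0, "yes"), (1, "no")]:
    print(name, cpr(counts["Male", "Black"][a], counts["Male", "White"][a],
                    counts["Female", "Black"][a], counts["Female", "White"][a]))
# yes: (10*120)/(100*30) = 0.4 ; no: (40*80)/(100*20) = 1.6
```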

  6. Do you like noodles? Directed graph: G → A ← R. This graph encodes G ⊥⊥ R, but not G ⊥⊥ R | A: Gender and Race are marginally independent, but dependent given Answer.

  7. Explaining away. Graph: S → L ← A. Smoking (S) and asbestos exposure (A) are independent, but become dependent if we observe that someone has lung cancer (L). If we observe L, this raises the probability of both S and A. If we subsequently observe S, then the probability of A drops (the explaining-away effect). A numeric sketch is given below.
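A small numeric sketch of explaining away, assuming made-up CPT values for P(S), P(A), and P(L | S, A) (none of these numbers are from the slides); it enumerates the joint distribution to show P(A | L) rising above P(A) and then dropping again once S is also observed.

```python
# Sketch (all numbers invented for illustration):
# S -> L <- A with S, A marginally independent; conditioning on L couples them.
from itertools import product

p_s = {1: 0.3, 0: 0.7}            # P(Smoking)
p_a = {1: 0.1, 0: 0.9}            # P(Asbestos)
p_l = {(1, 1): 0.8, (1, 0): 0.4,  # P(L=1 | S, A)
       (0, 1): 0.3, (0, 0): 0.05}

def joint(s, a, l):
    pl1 = p_l[s, a]
    return p_s[s] * p_a[a] * (pl1 if l == 1 else 1 - pl1)

def prob(query, evidence):
    """P(query | evidence); both are dicts over the names 's', 'a', 'l'."""
    num = den = 0.0
    for s, a, l in product([0, 1], repeat=3):
        w = joint(s, a, l)
        world = {"s": s, "a": a, "l": l}
        if all(world[k] == v for k, v in evidence.items()):
            den += w
            if all(world[k] == v for k, v in query.items()):
                num += w
    return num / den

print(prob({"a": 1}, {}))                  # P(A=1) = 0.1
print(prob({"a": 1}, {"l": 1}))            # ~0.24: raised by observing L
print(prob({"a": 1}, {"l": 1, "s": 1}))    # ~0.18: drops once S is observed
```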

  8. Directed Independence Graphs. G = (K, E), where K is a set of vertices and E is a set of directed edges (ordered pairs of vertices). There are no directed cycles, so G is a DAG. Key notions: parent/child, ancestor/descendant, ancestral set. Because G is a DAG, there exists a complete ordering of the vertices that is respected in the graph (edges point from lower-ordered to higher-ordered nodes).

  9. Parents of node i: pa(i). (figure: example DAG with pa(i) highlighted)

  10. Ancestors of node i: an(i). (figure: example DAG with an(i) highlighted)

  11. Ancestral set of node i: an+(i). (figure: example DAG with an+(i) highlighted)

  12. Children of node i: ch(i). (figure: example DAG with ch(i) highlighted)

  13. Descendants of node i: de(i). (figure: example DAG with de(i) highlighted)

  14. Construction of DAG. Suppose that prior knowledge tells us the variables can be labeled X_1, X_2, ..., X_k such that X_i is prior to X_{i+1} (for example: a causal or temporal ordering). Corresponding to this ordering we can use the product rule to factorize the joint distribution of X_1, X_2, ..., X_k as

      P(X) = P(X_1) P(X_2 | X_1) ··· P(X_k | X_{k-1}, X_{k-2}, ..., X_1)

      Note that:
      (1) this is an identity of probability theory; no independence assumptions have been made yet;
      (2) the joint probability of any initial segment X_1, X_2, ..., X_j (1 ≤ j ≤ k) is given by the corresponding initial segment of the factorization.
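A quick numeric check of point (1), assuming nothing beyond an arbitrary joint table: the chain-rule factors are computed from a random joint P(X_1, X_2, X_3) and their product is compared with the original.

```python
# Sketch: the product-rule factorisation is an identity, so it must hold
# for any (here: random) joint table, with no independence assumptions.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2)); p /= p.sum()        # arbitrary joint P(X1,X2,X3)

p1   = p.sum(axis=(1, 2))                      # P(X1)
p21  = p.sum(axis=2) / p1[:, None]             # P(X2 | X1)
p312 = p / p.sum(axis=2, keepdims=True)        # P(X3 | X1, X2)

rebuilt = p1[:, None, None] * p21[:, :, None] * p312
assert np.allclose(rebuilt, p)                 # identity holds exactly
```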

  15. Constructing a DAG from pairwise independencies. Starting from the complete graph (containing arrows i → j for all i < j), the arrow from i to j is removed if P(X_j | X_{j-1}, ..., X_1) does not depend on X_i, in other words, if

      j ⊥⊥ i | {1, ..., j} \ {i, j}

      or, more loosely, j ⊥⊥ i | prior variables. Compare this to pairwise independence j ⊥⊥ i | rest in undirected independence graphs. A code sketch of this rule follows below.
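A sketch of the removal rule as code; the independence oracle indep(j, i, S) is a hypothetical helper standing in for whatever test or prior knowledge supplies the independencies. Here it is hard-coded with the three independencies used in the worked example that follows.

```python
# Sketch: build a DAG from a prior ordering 1..k plus an independence oracle.

def build_dag(k, indep):
    """Nodes are 1..k in the given prior ordering; returns the edge set."""
    edges = {(i, j) for j in range(1, k + 1) for i in range(1, j)}
    for j in range(2, k + 1):
        for i in range(1, j):
            cond = set(range(1, j)) - {i}       # {1,...,j} \ {i,j}
            if indep(j, i, cond):
                edges.discard((i, j))           # drop the arrow i -> j
    return edges

# Hypothetical oracle encoding the three given independencies:
# 1 _||_ 2;  4 _||_ 3 | {1,2};  1 _||_ 3 | {2}.
given = {(2, 1, frozenset()), (4, 3, frozenset({1, 2})), (3, 1, frozenset({2}))}
indep = lambda j, i, S: (j, i, frozenset(S)) in given

print(sorted(build_dag(4, indep)))   # [(1, 4), (2, 3), (2, 4)]
```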

  16. Construction of DAG. (figure: complete DAG on nodes 1, 2, 3, 4)

      P(X) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

      Suppose the following independencies are given:
      (1) X_1 ⊥⊥ X_2
      (2) X_4 ⊥⊥ X_3 | (X_1, X_2)
      (3) X_1 ⊥⊥ X_3 | X_2

  17. Construction of DAG. (figure: current DAG)

      P(X) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

      (1) If X_1 ⊥⊥ X_2, then the factor P(X_2 | X_1) reduces to P(X_2). The edge 1 → 2 is removed.

  18. Construction of DAG. (figure: DAG with edge 1 → 2 removed)

      P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

  19. Construction of DAG. (figure: current DAG)

      P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

      (2) If X_4 ⊥⊥ X_3 | (X_1, X_2), then the factor P(X_4 | X_1, X_2, X_3) reduces to P(X_4 | X_1, X_2). The edge 3 → 4 is removed.

  20. Construction of DAG. (figure: DAG with edges 1 → 2 and 3 → 4 removed)

      P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2)

  21. Construction of DAG. (figure: current DAG)

      P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2)

      (3) If X_1 ⊥⊥ X_3 | X_2, then the factor P(X_3 | X_1, X_2) reduces to P(X_3 | X_2). The edge 1 → 3 is removed.

  22. Construction of DAG. We end up with this independence graph and corresponding factorization: (figure: DAG with edges 2 → 3, 1 → 4, 2 → 4)

      P(X) = P(X_1) P(X_2) P(X_3 | X_2) P(X_4 | X_1, X_2)

  23. Joint probability distribution of a Bayesian Network. We can write the joint probability distribution more elegantly as

      P(X_1, ..., X_k) = ∏_{i=1}^{k} P(X_i | X_{pa(i)})
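A sketch evaluating this product for the final graph of the worked example, parents = {1: [], 2: [], 3: [2], 4: [1, 2]}; the CPT numbers are invented for illustration.

```python
# Sketch (CPT values made up): joint probability as a product of
# local conditionals P(X_i | X_pa(i)), for binary variables.

parents = {1: [], 2: [], 3: [2], 4: [1, 2]}

# cpt[i] maps a tuple of parent values to P(X_i = 1 | parents).
cpt = {
    1: {(): 0.6},
    2: {(): 0.3},
    3: {(0,): 0.2, (1,): 0.7},
    4: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(x):
    """P(X1=x[1], ..., X4=x[4]) as the product of local conditionals."""
    p = 1.0
    for i, pa in parents.items():
        p1 = cpt[i][tuple(x[j] for j in pa)]
        p *= p1 if x[i] == 1 else 1 - p1
    return p

print(joint({1: 1, 2: 0, 3: 0, 4: 1}))   # 0.6 * 0.7 * 0.8 * 0.4 = 0.1344
```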

  24. Independence Properties of DAGs: d-separation and Moral Graphs. Can we infer other/stronger independence statements from the directed graph, like we did using separation in the undirected graphical models? Yes, the relevant concept is called d-separation. There are two routes: establishing d-separation directly (Pearl), or establishing d-separation via the moral graph and "normal" separation. We discuss the second approach.

  25. Independence Properties of DAGs: Moral Graph. Given a DAG G = (K, E) we construct the moral graph G^m by marrying parents and deleting directions, that is: (1) for each i ∈ K, we connect all vertices in pa(i) with undirected edges; (2) we replace all directed edges in E with undirected ones. (figure: a DAG and its moral graph)
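A sketch of moralization with networkx (assumed available); recent networkx versions also ship a built-in nx.moral_graph, but the manual version below shows the two steps explicitly.

```python
# Sketch: moralize a DAG by marrying co-parents, then dropping directions.
import itertools
import networkx as nx

def moralize(dag):
    """Moral graph: connect all pairs of parents, then make edges undirected."""
    moral = dag.to_undirected()
    for node in dag.nodes:
        for u, v in itertools.combinations(dag.predecessors(node), 2):
            moral.add_edge(u, v)    # marry each pair of parents of `node`
    return moral

dag = nx.DiGraph([(1, 4), (2, 4), (2, 3)])
print(sorted(moralize(dag).edges))  # [(1, 2), (1, 4), (2, 3), (2, 4)]
```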

  26. Independence Properties of DAGs: Moral Graph. The directed independence graph G possesses the conditional independence properties of its associated moral graph G^m. Why? We have the factorisation

      P(X) = ∏_{i=1}^{k} P(X_i | X_{pa(i)}) = ∏_{i=1}^{k} g_i(X_i, X_{pa(i)})

      by setting g_i(X_i, X_{pa(i)}) = P(X_i | X_{pa(i)}).

  27. Independence Properties of DAGs: Moral Graph. We have the factorisation

      P(X) = ∏_{i=1}^{k} g_i(X_i, X_{pa(i)})

      We thus have a factorisation of the joint probability distribution in terms of functions g_i(X_{a_i}) where a_i = {i} ∪ pa(i). By application of the factorisation criterion, the sets a_i become cliques in the undirected independence graph. These cliques are formed by moralization.

  28. Moralisation: Example. (figure: DAG on X_1, ..., X_5)

  29. Moralisation: Example. (figure: moral graph of the DAG on X_1, ..., X_5) {i} ∪ pa(i) becomes a complete subgraph in the moral graph (by marrying all unmarried parents).

  30. Moralisation Continued. Warning: the full moral graph can obscure independencies! To verify i ⊥⊥ j | S, construct the moral graph of the induced subgraph on

      A = an+({i, j} ∪ S),

      that is, A contains i, j, S and all their ancestors. Let G = (K, E) and A ⊆ K. The induced subgraph G_A contains nodes A and edges E′, where i → j ∈ E′ ⇔ i → j ∈ E and i ∈ A and j ∈ A.

  31. Moralisation Continued. Since for every ℓ ∈ A we have pa(ℓ) ⊆ A (A is an ancestral set), the joint distribution of X_A is given by

      P(X_A) = ∏_{ℓ ∈ A} P(X_ℓ | X_{pa(ℓ)}),

      which corresponds to the subgraph G_A of G.
      (1) This is a product of factors P(X_ℓ | X_{pa(ℓ)}), involving the variables X_{{ℓ} ∪ pa(ℓ)} only.
      (2) So it factorizes according to G^m_A, and thus the independence properties for undirected graphs apply.
      (3) Hence, if S separates i from j in G^m_A, then i ⊥⊥ j | S.
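A sketch of the full check, with networkx assumed: build the ancestral set, moralize the induced subgraph (here via networkx's built-in nx.moral_graph), remove S, and test ordinary separation.

```python
# Sketch: verify i _||_ j | S via the moral graph of the ancestral set,
# following the three steps above.
import networkx as nx

def d_separated(dag, i, j, S):
    """True if S separates i and j in the moral graph of an+({i,j} u S)."""
    A = set(S) | {i, j}
    for v in list(A):
        A |= nx.ancestors(dag, v)              # ancestral set an+({i,j} u S)
    moral = nx.moral_graph(dag.subgraph(A))    # moralised induced subgraph
    moral.remove_nodes_from(S)                 # condition on S
    return not nx.has_path(moral, i, j)        # ordinary separation

dag = nx.DiGraph([("G", "A"), ("R", "A")])     # collider G -> A <- R
print(d_separated(dag, "G", "R", set()))       # True: marginally independent
print(d_separated(dag, "G", "R", {"A"}))       # False: dependent given A
```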

  32. Full moral graph may obscure independencies: example. Graph: G → A ← R, with

      P(G, R, A) = P(G) P(R) P(A | G, R)

      Does G ⊥⊥ R hold? Summing out A we obtain:

      P(G, R) = Σ_a P(G, R, A = a)             (sum rule)
              = Σ_a P(G) P(R) P(A = a | G, R)  (BN factorisation)
              = P(G) P(R) Σ_a P(A = a | G, R)  (P(G)P(R) does not depend on a)
              = P(G) P(R)                      (Σ_a P(A = a | G, R) = 1)

      Hence G ⊥⊥ R holds.
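A short numeric confirmation, with arbitrary illustrative values for P(G), P(R), and P(A | G, R): summing out A indeed leaves the product P(G)P(R).

```python
# Sketch (illustrative numbers): summing out A from P(G)P(R)P(A|G,R)
# leaves P(G)P(R), confirming G _||_ R in the collider model G -> A <- R.
p_g, p_r = 0.5, 0.2
p_a = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.9}  # P(A=1|G,R)

for g in (0, 1):
    for r in (0, 1):
        pg = p_g if g else 1 - p_g
        pr = p_r if r else 1 - p_r
        marg = sum(pg * pr * (p_a[g, r] if a else 1 - p_a[g, r])
                   for a in (0, 1))
        assert abs(marg - pg * pr) < 1e-12   # P(G,R) = P(G)P(R)
```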
