Data Mining 2020: Bayesian Networks (1)

Ad Feelders (Universiteit Utrecht), Data Mining, 1 / 49
Do you like noodles?

                      Answer
Race    Gender     Yes     No
Black   Male        10     40
        Female      30     20
White   Male       100    100
        Female     120     80
Do you like noodles?

Undirected graph: G - A - R

This graph encodes G ⊥⊥ R | A.

Strange: Gender and Race are prior to Answer, but this model says they are independent given Answer!
Do you like noodles?

Marginal table for Gender and Race:

                 Race
Gender     Black   White
Male          50     200
Female        50     200

cpr(G, R) = (50 · 200) / (200 · 50) = 1

From this table we conclude that Race and Gender are independent in the data.
Do you like noodles?

Table for Gender and Race given Answer = yes:

                 Race
Gender     Black   White
Male          10     100
Female        30     120

cpr(G, R) = 0.4

Table for Gender and Race given Answer = no:

                 Race
Gender     Black   White
Male          40     100
Female        20      80

cpr(G, R) = 1.6

From these tables we conclude that Race and Gender are dependent given Answer.
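The cross-product ratios on these slides can be checked directly. A minimal sketch (the function name `cpr` follows the slides; the table layout, rows Male/Female by columns Black/White, is taken from the tables above):

```python
# Cross-product ratio (odds ratio) of a 2x2 table [[a, b], [c, d]]: cpr = a*d / (b*c).
# A value of 1 indicates independence in the table.
def cpr(table):
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Marginal Gender x Race table (rows: Male, Female; columns: Black, White).
marginal = [[50, 200], [50, 200]]

# Gender x Race tables conditional on Answer = yes and Answer = no.
given_yes = [[10, 100], [30, 120]]
given_no = [[40, 100], [20, 80]]

print(cpr(marginal))   # 1.0 -> marginally independent
print(cpr(given_yes))  # 0.4 -> dependent given Answer = yes
print(cpr(given_no))   # 1.6 -> dependent given Answer = no
```

The three printed values match the slides: independence in the margin, dependence within each Answer stratum.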
Do you like noodles?

Directed graph: G → A ← R

This graph encodes G ⊥⊥ R, but not G ⊥⊥ R | A.

Gender and Race are marginally independent (but dependent given Answer).
Explaining away

Directed graph: S → L ← A

Smoking (S) and asbestos exposure (A) are independent, but become dependent if we observe that someone has lung cancer (L).

If we observe L, this raises the probability of both S and A. If we subsequently observe S, then the probability of A drops (the explaining-away effect).
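The explaining-away effect can be reproduced numerically on the collider S → L ← A. The CPT numbers below are made up for illustration (the slides give no numbers); the structure, S and A marginally independent causes of L, is from the slide:

```python
from itertools import product

# Hypothetical CPTs for S -> L <- A (all numbers invented for illustration).
P_S = {1: 0.3, 0: 0.7}                 # P(S)
P_A = {1: 0.1, 0: 0.9}                 # P(A)
P_L = {(1, 1): 0.90, (1, 0): 0.60,     # P(L = 1 | S, A)
       (0, 1): 0.50, (0, 0): 0.05}

def joint(s, a, l):
    pl = P_L[(s, a)] if l == 1 else 1 - P_L[(s, a)]
    return P_S[s] * P_A[a] * pl

def prob(query, evidence):
    # P(query | evidence) by brute-force enumeration over (S, A, L).
    num = den = 0.0
    for s, a, l in product([0, 1], repeat=3):
        x = {'S': s, 'A': a, 'L': l}
        p = joint(s, a, l)
        if all(x[k] == v for k, v in evidence.items()):
            den += p
            if all(x[k] == v for k, v in query.items()):
                num += p
    return num / den

p_a = prob({'A': 1}, {})                       # prior P(A = 1)
p_a_l = prob({'A': 1}, {'L': 1})               # raised by observing L
p_a_ls = prob({'A': 1}, {'L': 1, 'S': 1})      # drops again once S explains L
print(p_a, p_a_l, p_a_ls)
```

With these (or any similar) numbers, observing L raises P(A = 1), and subsequently observing S lowers it again: the explaining-away pattern.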
Directed Independence Graphs

G = (K, E), where K is a set of vertices and E is a set of directed edges (ordered pairs of vertices). G contains no directed cycles, so it is a directed acyclic graph (DAG).

Terminology: parent/child, ancestor/descendant, ancestral set.

Because G is a DAG, there exists a complete ordering of the vertices that is respected in the graph (edges point from lower-ordered to higher-ordered nodes).
Notation (each illustrated with an example graph on the slides):

Parents of node i: pa(i)
Ancestors of node i: an(i)
Ancestral set of node i: an+(i) = {i} ∪ an(i)
Children of node i: ch(i)
Descendants of node i: de(i)
Construction of DAG

Suppose that prior knowledge tells us the variables can be labelled X_1, X_2, ..., X_k such that X_i is prior to X_{i+1} (for example: causal or temporal ordering).

Corresponding to this ordering we can use the product rule to factorise the joint distribution of X_1, X_2, ..., X_k as

P(X) = P(X_1) P(X_2 | X_1) ··· P(X_k | X_{k-1}, X_{k-2}, ..., X_1)

Note that:
1. This is an identity of probability theory; no independence assumptions have been made yet!
2. The joint probability of any initial segment X_1, X_2, ..., X_j (1 ≤ j ≤ k) is given by the corresponding initial segment of the factorisation.
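That the chain-rule factorisation is an identity, holding for any distribution without independence assumptions, can be verified by brute force. A sketch with an arbitrary (made-up) positive joint distribution over three binary variables:

```python
from itertools import product

# An arbitrary strictly positive joint distribution over (X1, X2, X3),
# normalised so the probabilities sum to 1. The weights are made up.
weights = dict(zip(product([0, 1], repeat=3), [3, 1, 4, 1, 5, 9, 2, 6]))
Z = sum(weights.values())
P = {xs: w / Z for xs, w in weights.items()}

def marg(assignment):
    # Marginal P(X_i = v for (i, v) in assignment), summing out the rest.
    return sum(p for xs, p in P.items()
               if all(xs[i] == v for i, v in assignment.items()))

# Chain rule: P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2),
# where each conditional is a ratio of marginals.
for x1, x2, x3 in product([0, 1], repeat=3):
    chain = (marg({0: x1})
             * marg({0: x1, 1: x2}) / marg({0: x1})
             * marg({0: x1, 1: x2, 2: x3}) / marg({0: x1, 1: x2}))
    assert abs(chain - P[(x1, x2, x3)]) < 1e-12
```

The assertion passes for every cell, whatever positive weights are chosen: the factorisation makes no independence assumption.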
Constructing a DAG from pairwise independencies

Starting from the complete graph (containing arrows i → j for all i < j), the arrow from i to j is removed if P(X_j | X_{j-1}, ..., X_1) does not depend on X_i, in other words, if

j ⊥⊥ i | {1, ..., j-1} \ {i}

More loosely: j ⊥⊥ i | prior variables.

Compare this to pairwise independence, j ⊥⊥ i | rest, in undirected independence graphs.
Construction Of DAG

Start from the complete DAG on nodes 1, 2, 3, 4:

P(X) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

Suppose the following independencies are given:
1. X_1 ⊥⊥ X_2
2. X_4 ⊥⊥ X_3 | (X_1, X_2)
3. X_1 ⊥⊥ X_3 | X_2
Construction Of DAG

P(X) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

1. Since X_1 ⊥⊥ X_2, we have P(X_2 | X_1) = P(X_2). The edge 1 → 2 is removed.
Construction Of DAG

P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)
Construction Of DAG

P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2, X_3)

2. Since X_4 ⊥⊥ X_3 | (X_1, X_2), we have P(X_4 | X_1, X_2, X_3) = P(X_4 | X_1, X_2). The edge 3 → 4 is removed.
Construction Of DAG

P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2)
Construction Of DAG

P(X) = P(X_1) P(X_2) P(X_3 | X_1, X_2) P(X_4 | X_1, X_2)

3. Since X_1 ⊥⊥ X_3 | X_2, we have P(X_3 | X_1, X_2) = P(X_3 | X_2). The edge 1 → 3 is removed.
Construction Of DAG

We end up with this independence graph and corresponding factorisation:

P(X) = P(X_1) P(X_2) P(X_3 | X_2) P(X_4 | X_1, X_2)
Joint probability distribution of Bayesian Network

We can write the joint probability distribution more elegantly as

P(X_1, ..., X_k) = ∏_{i=1}^{k} P(X_i | X_pa(i))
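A product of local conditional distributions, one per node given its parents, always defines a valid joint distribution. A sketch for the DAG obtained in the construction example (pa(3) = {2}, pa(4) = {1, 2}); all CPT numbers are hypothetical:

```python
from itertools import product

# Hypothetical CPTs for the factorisation P(X1) P(X2) P(X3|X2) P(X4|X1,X2).
P1 = {0: 0.6, 1: 0.4}          # P(X1)
P2 = {0: 0.3, 1: 0.7}          # P(X2)
P3 = {0: 0.8, 1: 0.2}          # P(X3 = 1 | X2)
P4 = {(0, 0): 0.1, (0, 1): 0.5,
      (1, 0): 0.4, (1, 1): 0.9}  # P(X4 = 1 | X1, X2)

def joint(x1, x2, x3, x4):
    # Product over nodes of P(X_i | X_pa(i)).
    p3 = P3[x2] if x3 == 1 else 1 - P3[x2]
    p4 = P4[(x1, x2)] if x4 == 1 else 1 - P4[(x1, x2)]
    return P1[x1] * P2[x2] * p3 * p4

# The product of proper CPTs sums to 1 over all joint configurations.
total = sum(joint(*xs) for xs in product([0, 1], repeat=4))
print(total)
```

Because each CPT is a proper conditional distribution, the product normalises automatically; no global normalisation constant is needed.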
Independence Properties of DAGs: d-separation and Moral Graphs

Can we infer other/stronger independence statements from the directed graph, like we did using separation in the undirected graphical models? Yes; the relevant concept is called d-separation. There are two routes:

1. establishing d-separation directly (Pearl)
2. establishing d-separation via the moral graph and "normal" separation

We discuss the second approach.
Independence Properties of DAGs: Moral Graph

Given a DAG G = (K, E), we construct the moral graph G^m by marrying parents and deleting directions, that is:

1. For each i ∈ K, we connect all vertices in pa(i) with undirected edges.
2. We replace all directed edges in E with undirected ones.

[figure: a DAG and its moral graph]
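The two moralisation steps translate directly into code. A minimal sketch, representing the DAG as a dict mapping each node to its parent set (the example DAG at the end is made up, chosen so that one marriage occurs):

```python
def moral_graph(parents):
    # parents: dict node -> set of parents (a DAG).
    # Returns an undirected graph as a dict node -> set of neighbours.
    und = {v: set() for v in parents}
    for child, pars in parents.items():
        for p in pars:                    # step 2: drop edge directions
            und[child].add(p)
            und[p].add(child)
        for a in pars:                    # step 1: marry all parents pairwise
            for b in pars:
                if a != b:
                    und[a].add(b)
                    und[b].add(a)
    return und

# Example DAG (hypothetical): pa(3) = {1, 2}, pa(4) = {3}, pa(5) = {3}.
dag = {1: set(), 2: set(), 3: {1, 2}, 4: {3}, 5: {3}}
gm = moral_graph(dag)
print(gm[1])  # {2, 3}: 1 and 2 are married because both are parents of 3
```

Note that nodes 1 and 2, unconnected in the DAG, become neighbours in the moral graph purely because they share the child 3.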
Independence Properties of DAGs: Moral Graph

The directed independence graph G possesses the conditional independence properties of its associated moral graph G^m. Why? We have the factorisation:

P(X) = ∏_{i=1}^{k} P(X_i | X_pa(i)) = ∏_{i=1}^{k} g_i(X_i, X_pa(i))

by setting g_i(X_i, X_pa(i)) = P(X_i | X_pa(i)).
Independence Properties of DAGs: Moral Graph

We have the factorisation:

P(X) = ∏_{i=1}^{k} g_i(X_i, X_pa(i))

We thus have a factorisation of the joint probability distribution in terms of functions g_i(X_{a_i}) where a_i = {i} ∪ pa(i). By application of the factorisation criterion, the sets a_i become cliques in the undirected independence graph. These cliques are formed by moralisation.
Moralisation: Example

[figure: a DAG on X_1, ..., X_5 and its moral graph]

{i} ∪ pa(i) becomes a complete subgraph in the moral graph (by marrying all unmarried parents).
Moralisation Continued

Warning: the moral graph of the full DAG can obscure independencies!

To verify i ⊥⊥ j | S, construct the moral graph of the induced subgraph on

A = an+({i, j} ∪ S),

that is, A contains i, j, S and all their ancestors.

Let G = (K, E) and A ⊆ K. The induced subgraph G_A contains nodes A and edges E′, where i → j ∈ E′ ⇔ i → j ∈ E and i ∈ A and j ∈ A.
Moralisation Continued

Since for ℓ ∈ A we have pa(ℓ) ⊆ A, the joint distribution of X_A is given by

P(X_A) = ∏_{ℓ ∈ A} P(X_ℓ | X_pa(ℓ))

which corresponds to the subgraph G_A of G.

1. This is a product of factors P(X_ℓ | X_pa(ℓ)), involving the variables X_{{ℓ} ∪ pa(ℓ)} only.
2. So it factorises according to G^m_A, and thus the independence properties for undirected graphs apply.
3. Hence, if S separates i from j in G^m_A, then i ⊥⊥ j | S.
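The whole recipe, take the ancestral set, restrict the DAG to it, moralise, then check ordinary graph separation, fits in a short sketch. Function names are my own; the final example is the collider G → A ← R from these slides:

```python
from collections import deque

def ancestral_set(parents, nodes):
    # an+(nodes): the given nodes plus all their ancestors.
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents[v])
    return result

def moralise(parents):
    # Marry parents, drop directions; returns node -> set of neighbours.
    und = {v: set() for v in parents}
    for child, pars in parents.items():
        for p in pars:
            und[child].add(p)
            und[p].add(child)
        for a in pars:
            for b in pars:
                if a != b:
                    und[a].add(b)
                    und[b].add(a)
    return und

def separated(und, i, j, S):
    # Ordinary separation: every path from i to j passes through S.
    seen, queue = {i}, deque([i])
    while queue:
        v = queue.popleft()
        for w in und[v]:
            if w == j:
                return False
            if w not in seen and w not in S:
                seen.add(w)
                queue.append(w)
    return True

def d_separated(parents, i, j, S):
    A = ancestral_set(parents, {i, j} | S)
    sub = {v: parents[v] & A for v in A}  # induced subgraph on an+({i,j} ∪ S)
    return separated(moralise(sub), i, j, S)

# Collider G -> A <- R: marginally independent, dependent given A.
dag = {'G': set(), 'R': set(), 'A': {'G', 'R'}}
print(d_separated(dag, 'G', 'R', set()))   # True
print(d_separated(dag, 'G', 'R', {'A'}))   # False
```

Restricting to the ancestral set first is what makes the first query come out True: with S empty, A is not an ancestor of {G, R}, so it is dropped before moralisation and G and R are never married.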
Full moral graph may obscure independencies: example

Directed graph: G → A ← R, so

P(G, R, A) = P(G) P(R) P(A | G, R)

Does G ⊥⊥ R hold? Summing out A we obtain:

P(G, R) = Σ_a P(G, R, A = a)                (sum rule)
        = Σ_a P(G) P(R) P(A = a | G, R)     (BN factorisation)
        = P(G) P(R) Σ_a P(A = a | G, R)     (rearranging the sum)
        = P(G) P(R)                         (Σ_a P(A = a | G, R) = 1)
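The derivation can be checked numerically: for any CPTs plugged into G → A ← R, summing out A must yield P(G, R) = P(G) P(R). The CPT numbers below are hypothetical:

```python
from itertools import product

# Hypothetical CPTs for the collider G -> A <- R (binary variables).
P_G = {0: 0.5, 1: 0.5}
P_R = {0: 0.2, 1: 0.8}
P_A = {(0, 0): 0.3, (0, 1): 0.6,
       (1, 0): 0.4, (1, 1): 0.9}  # P(A = 1 | G, R)

def joint(g, r, a):
    pa = P_A[(g, r)] if a == 1 else 1 - P_A[(g, r)]
    return P_G[g] * P_R[r] * pa

# Sum out A: P(G, R) factors as P(G) P(R), i.e. G and R are
# marginally independent, exactly as the derivation shows.
for g, r in product([0, 1], repeat=2):
    p_gr = sum(joint(g, r, a) for a in [0, 1])
    assert abs(p_gr - P_G[g] * P_R[r]) < 1e-12
```

The assertions hold for any choice of CPTs, because Σ_a P(A = a | G, R) = 1 regardless of the numbers. Yet in the full moral graph G and R are married (both are parents of A), so this independence is invisible there.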