10-418 / 10-618 Machine Learning for Structured Data Machine Learning Department School of Computer Science Carnegie Mellon University Directed Graphical Models + Undirected Graphical Models Matt Gormley Lecture 7 Sep. 18, 2019 1
Q&A Q: How will I earn the 5% participation points? A: Very gradually. There will be a few aspects of the course (polls, surveys, meetings with the course staff) that we will attach participation points to. That said, we might not actually use the whole 5% that is being held out. 2
Q&A Q: When should I prefer a directed graphical model to an undirected graphical model? A: As we’ll see today, the primary differences between them are: 1. the conditional independence assumptions they define 2. the normalization assumptions they make (Bayes Nets are locally normalized) (That said, we’ll also tie them together via a single framework: factor graphs.) There are also some practical differences (e.g. ease of learning) that result from the locally vs. globally normalized difference. 3
Reminders • Homework 1: DAgger for seq2seq – Out: Thu, Sep. 12 – Due: Thu, Sep. 26 at 11:59pm 4
SUPERVISED LEARNING FOR BAYES NETS 5
Recipe for Closed-form MLE
1. Assume data was generated i.i.d. from some model (i.e. write the generative story): x^(i) ~ p(x | θ)
2. Write the log-likelihood: ℓ(θ) = log p(x^(1) | θ) + … + log p(x^(N) | θ)
3. Compute partial derivatives (i.e. the gradient): ∂ℓ(θ)/∂θ_1 = …, ∂ℓ(θ)/∂θ_2 = …, …, ∂ℓ(θ)/∂θ_M = …
4. Set the derivatives to zero and solve for θ: ∂ℓ(θ)/∂θ_m = 0 for all m ∈ {1, …, M}; θ_MLE = solution to the system of M equations in M variables
5. Compute the second derivative and check that ℓ(θ) is concave down at θ_MLE
6
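To make the recipe concrete, here is a minimal sketch (not from the slides) that walks steps 1-5 for a Bernoulli model, where the closed-form answer is simply the empirical mean; the data values are made up.

```python
import numpy as np

# Step 1: assume x^(i) ~ Bernoulli(theta), i.i.d. (illustrative data)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
N, n1 = len(x), x.sum()

# Step 2: log-likelihood  l(theta) = n1*log(theta) + (N - n1)*log(1 - theta)
def log_likelihood(theta):
    return n1 * np.log(theta) + (N - n1) * np.log(1 - theta)

# Steps 3-4: dl/dtheta = n1/theta - (N - n1)/(1 - theta) = 0  =>  theta = n1/N
theta_mle = n1 / N

# Step 5: the second derivative, -n1/theta^2 - (N - n1)/(1 - theta)^2, is negative,
# so l(theta) is concave and theta_mle is a maximum.
print(theta_mle, log_likelihood(theta_mle))
```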
Machine Learning [diagram with components: Domain Knowledge, Mathematical Modeling, ML, Inference, Learning, Optimization, Combinatorial Optimization] The data inspires the structures we want to predict. Our model defines a score for each structure; it also tells us what to optimize. Inference finds { best structure, marginals, partition function } for a new observation. Learning tunes the parameters of the model. (Inference is usually called as a subroutine in learning.) 7
Machine Learning [diagram: Data (the sentence "time flies like an arrow"), Model (a directed graph over X1-X5), Objective, Inference, Learning] (Inference is usually called as a subroutine in learning.) 8
Learning Fully Observed BNs [graph over X1-X5] p(X1, X2, X3, X4, X5) = p(X5 | X3) p(X4 | X2, X3) p(X3) p(X2 | X1) p(X1) 9
Learning Fully Observed BNs [graph over X1-X5] p(X1, X2, X3, X4, X5) = p(X5 | X3) p(X4 | X2, X3) p(X3) p(X2 | X1) p(X1) How do we learn these conditional and marginal distributions for a Bayes Net? 11
Learning Fully Observed BNs Learning this fully observed Bayesian Network is equivalent to learning five (small / simple) independent networks from the same data: p(X1, X2, X3, X4, X5) = p(X5 | X3) p(X4 | X2, X3) p(X3) p(X2 | X1) p(X1) [diagrams of the five sub-networks] 12
Learning Fully Observed BNs How do we learn these conditional and marginal distributions for a Bayes Net?
θ* = argmax_θ log p(X1, X2, X3, X4, X5)
   = argmax_θ log p(X5 | X3, θ5) + log p(X4 | X2, X3, θ4) + log p(X3 | θ3) + log p(X2 | X1, θ2) + log p(X1 | θ1)
θ1* = argmax_θ1 log p(X1 | θ1)
θ2* = argmax_θ2 log p(X2 | X1, θ2)
θ3* = argmax_θ3 log p(X3 | θ3)
θ4* = argmax_θ4 log p(X4 | X2, X3, θ4)
θ5* = argmax_θ5 log p(X5 | X3, θ5)
13
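Because the log-likelihood decomposes as above, the MLE for a fully observed discrete Bayes net reduces to estimating each CPT separately from counts. Below is a minimal sketch for the five-variable network; the dataset and function names are illustrative assumptions, not course code.

```python
from collections import Counter

# Parent sets matching the factorization on the previous slide.
parents = {"X1": [], "X2": ["X1"], "X3": [], "X4": ["X2", "X3"], "X5": ["X3"]}

# Illustrative fully observed data: each row assigns a value to every variable.
data = [
    {"X1": 0, "X2": 1, "X3": 0, "X4": 1, "X5": 0},
    {"X1": 1, "X2": 1, "X3": 1, "X4": 0, "X5": 1},
    {"X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1},
    {"X1": 0, "X2": 1, "X3": 0, "X4": 0, "X5": 0},
]

def mle_cpt(var, pa, data):
    """MLE of p(var | pa): normalize counts within each parent configuration."""
    joint, parent = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in pa)
        joint[(key, row[var])] += 1
        parent[key] += 1
    return {(key, val): cnt / parent[key] for (key, val), cnt in joint.items()}

cpts = {v: mle_cpt(v, pa, data) for v, pa in parents.items()}
print(cpts["X2"])  # e.g. p(X2=1 | X1=0) = 2/3 for this toy dataset
```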
Learning Fully Observed BNs 14
INFERENCE FOR BAYESIAN NETWORKS 16
A Few Problems for Bayes Nets Suppose we already have the parameters of a Bayesian Network…
1. How do we compute the probability of a specific assignment to the variables? P(T=t, H=h, A=a, C=c)
2. How do we draw a sample from the joint distribution? t, h, a, c ∼ P(T, H, A, C)
3. How do we compute marginal probabilities? P(A) = …
4. How do we draw samples from a conditional distribution? t, h, a ∼ P(T, H, A | C=c)
5. How do we compute conditional marginal probabilities? P(H | C=c) = …
17
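Problems 1 and 2 have direct solutions once the CPTs are known: for a full assignment, multiply the relevant CPT entries (the factorization from the earlier slides); for a joint sample, draw each variable after its parents (ancestral sampling). Below is a minimal sketch reusing the X1-X5 network with made-up binary CPTs; it is an illustration, not code from the course.

```python
import random

# Assumed binary CPTs for the X1-X5 network from earlier slides (numbers are made up).
p_x1 = {1: 0.6, 0: 0.4}
p_x3 = {1: 0.3, 0: 0.7}
p_x2_given_x1 = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.9, 0: 0.1}}   # p(X2 | X1)
p_x5_given_x3 = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.7, 0: 0.3}}   # p(X5 | X3)
p_x4_given_x2_x3 = {                                          # p(X4 | X2, X3)
    (0, 0): {1: 0.1, 0: 0.9}, (0, 1): {1: 0.4, 0: 0.6},
    (1, 0): {1: 0.5, 0: 0.5}, (1, 1): {1: 0.8, 0: 0.2},
}

# Problem 1: probability of a full assignment = product of CPT entries (the factorization).
def joint(x1, x2, x3, x4, x5):
    return (p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3[x3]
            * p_x4_given_x2_x3[(x2, x3)][x4] * p_x5_given_x3[x3][x5])

# Problem 2: ancestral sampling -- sample each variable after its parents.
def bernoulli(p1):
    return 1 if random.random() < p1 else 0

def sample():
    x1 = bernoulli(p_x1[1])
    x2 = bernoulli(p_x2_given_x1[x1][1])
    x3 = bernoulli(p_x3[1])
    x4 = bernoulli(p_x4_given_x2_x3[(x2, x3)][1])
    x5 = bernoulli(p_x5_given_x3[x3][1])
    return x1, x2, x3, x4, x5

print(joint(1, 1, 0, 1, 0), sample())
```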
GRAPHICAL MODELS: DETERMINING CONDITIONAL INDEPENDENCIES
What Independencies does a Bayes Net Model? • In order for a Bayesian network to model a probability distribution, the following must be true: each variable is conditionally independent of all its non-descendants in the graph given the value of all its parents. • This follows from P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi)) = ∏_{i=1}^{n} P(Xi | X1, …, X_{i−1}). • But what else does it imply? Slide from William Cohen
What Independencies does a Bayes Net Model? Three cases of interest… Cascade: X → Y → Z. Common Parent: X ← Y → Z. V-Structure: X → Y ← Z. 20
What Independencies does a Bayes Net Model? Three cases of interest… Cascade (X → Y → Z): X ⊥ Z | Y. Common Parent (X ← Y → Z): X ⊥ Z | Y. In both cases, knowing Y decouples X and Z. V-Structure (X → Y ← Z): X ⊥̸ Z | Y; knowing Y couples X and Z. 21
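The v-structure case ("explaining away") is easy to verify numerically: with any CPTs for X → Y ← Z, P(X, Z) factorizes but P(X, Z | Y) does not. The sketch below uses made-up binary CPTs (an OR-like gate for Y) purely as an illustration.

```python
# Assumed CPTs for the v-structure X -> Y <- Z (numbers are made up).
p_x = {1: 0.5, 0: 0.5}
p_z = {1: 0.5, 0: 0.5}
p_y_given_xz = {
    (0, 0): {1: 0.1, 0: 0.9}, (0, 1): {1: 0.9, 0: 0.1},
    (1, 0): {1: 0.9, 0: 0.1}, (1, 1): {1: 0.9, 0: 0.1},
}

def joint(x, y, z):
    return p_x[x] * p_z[z] * p_y_given_xz[(x, z)][y]

# Marginally, X and Z are independent: P(X=1, Z=1) == P(X=1) * P(Z=1).
p_x1z1 = sum(joint(1, y, 1) for y in (0, 1))
print(p_x1z1, p_x[1] * p_z[1])  # both 0.25

# Conditioned on Y=1, they are coupled: P(X=1, Z=1 | Y=1) != P(X=1 | Y=1) * P(Z=1 | Y=1).
p_y1 = sum(joint(x, 1, z) for x in (0, 1) for z in (0, 1))
p_x1_given_y1 = sum(joint(1, 1, z) for z in (0, 1)) / p_y1
p_z1_given_y1 = sum(joint(x, 1, 1) for x in (0, 1)) / p_y1
p_x1z1_given_y1 = joint(1, 1, 1) / p_y1
print(p_x1z1_given_y1, p_x1_given_y1 * p_z1_given_y1)  # ~0.32 vs ~0.41 => coupled
```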
Whiteboard: Proof of conditional independence for the Common Parent case (X ← Y → Z): X ⊥ Z | Y. (The other two cases can be shown just as easily.) 22
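For reference, a sketch of the whiteboard argument for the common parent case:

```latex
% Common parent: X <- Y -> Z, so the joint factorizes as p(X, Y, Z) = p(Y) p(X | Y) p(Z | Y).
\begin{aligned}
p(X, Z \mid Y) &= \frac{p(X, Y, Z)}{p(Y)}
                = \frac{p(Y)\, p(X \mid Y)\, p(Z \mid Y)}{p(Y)} \\
               &= p(X \mid Y)\, p(Z \mid Y),
\end{aligned}
% which is exactly the definition of X \perp Z \mid Y.
```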
The "Burglar Alarm" example [graph: Burglar → Alarm ← Earthquake, Alarm → Phone Call] • Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes. • Earth arguably doesn't care whether your house is currently being burgled. • While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing. Uh oh! Quiz: True or False? Burglar ⊥ Earthquake | PhoneCall Slide from William Cohen
Markov Blanket (Directed) Def: the co-parents of a node are the parents of its children. Def: the Markov Blanket of a node in a directed graphical model is the set containing the node's parents, children, and co-parents. [example graph over X1-X13] 25
Markov Blanket (Directed) Def: the co-parents of a node are the parents of its children. Def: the Markov Blanket of a node in a directed graphical model is the set containing the node's parents, children, and co-parents. Example: the Markov Blanket of X6 is {X3, X4, X5, X8, X9, X10}, i.e. its parents, children, and co-parents in the graph shown. 26
Markov Blanket (Directed) Def: the co-parents of a node are the parents of its children. Def: the Markov Blanket of a node in a directed graphical model is the set containing the node's parents, children, and co-parents. Example: the Markov Blanket of X6 is {X3, X4, X5, X8, X9, X10}. Theorem: a node is conditionally independent of every other node in the graph given its Markov blanket. 27
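Given parent lists, the Markov blanket follows directly from the definition: parents, children, and co-parents (the other parents of the node's children). A minimal sketch with an assumed edge list (illustrative, not a transcription of the figure):

```python
# Assumed parent lists for a small directed graph (illustrative, not the figure's exact edges).
parents = {
    "X1": [], "X2": ["X1"], "X3": ["X1"], "X4": ["X1"],
    "X5": ["X2"], "X6": ["X3", "X4"], "X7": ["X4"],
    "X8": ["X5", "X6"], "X9": ["X6", "X10"], "X10": [],
}

def markov_blanket(node, parents):
    """Parents, children, and co-parents of `node` (the definition on the slide)."""
    children = [v for v, pa in parents.items() if node in pa]
    co_parents = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | set(children) | co_parents

print(markov_blanket("X6", parents))
# -> {'X3', 'X4', 'X5', 'X8', 'X9', 'X10'} for this assumed graph,
#    matching the slide's example blanket of X6.
```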
D-Separation If variables X and Z are d-separated given a set of variables E, then X and Z are conditionally independent given the set E. Definition #1: Variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is "blocked". A path is "blocked" whenever:
1. ∃ Y on the path s.t. Y ∈ E and Y is a "common parent" (X … ← Y → … Z), or
2. ∃ Y on the path s.t. Y ∈ E and Y is in a "cascade" (X … → Y → … Z), or
3. ∃ Y on the path s.t. neither Y nor any descendant of Y is in E and Y is in a "v-structure" (X … → Y ← … Z).
28
D-Separation If variables X and Z are d-separated given a set of variables E, then X and Z are conditionally independent given the set E. Definition #2: Variables X and Z are d-separated given a set of evidence variables E iff there does not exist a path between X and Z in the undirected ancestral moral graph with E removed.
1. Ancestral graph: keep only X, Z, E and their ancestors
2. Moral graph: add an undirected edge between all pairs of each node's parents
3. Undirected graph: convert all directed edges to undirected
4. Givens removed: delete any nodes in E
Example query: A ⫫ B | {D, E}. [Figure: the original graph and its ancestral, moral, undirected, and givens-removed versions] ⇒ A and B are still connected ⇒ not d-separated.
29
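Definition #2 translates directly into an algorithm: build the ancestral graph, moralize, drop edge directions, delete the evidence nodes, and test connectivity. A minimal sketch in plain Python; the graph in the usage example is a hypothetical one shaped like the slide's, and the code is my own illustration rather than the course's reference implementation.

```python
def d_separated(x, z, evidence, parents):
    """True iff x and z are d-separated given `evidence` (Definition #2).
    `parents` maps each node to a list of its parents."""
    # 1. Ancestral graph: keep only x, z, the evidence, and their ancestors.
    keep, stack = set(), [x, z, *evidence]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])
    # 2 & 3. Moralize (connect co-parents) and drop edge directions.
    adj = {n: set() for n in keep}
    for n in keep:
        pas = [p for p in parents[n] if p in keep]
        for p in pas:
            adj[n].add(p); adj[p].add(n)
        for i, p in enumerate(pas):          # "marry" the parents
            for q in pas[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 4. Remove the evidence nodes, then check whether x can still reach z.
    blocked = set(evidence)
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n == z:
            return False                     # connected => not d-separated
        if n in seen or n in blocked:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return True

# Usage on a hypothetical graph shaped like the slide's example (A and B are C's parents, etc.).
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"], "F": ["D"]}
print(d_separated("A", "B", {"D", "E"}, parents))  # False: moralization connects A and B
```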
Learning Objectives Bayesian Networks You should be able to…
1. Identify the conditional independence assumptions given by a generative story or a specification of a joint distribution
2. Draw a Bayesian network given a set of conditional independence assumptions
3. Define the joint distribution specified by a Bayesian network
4. Use domain knowledge to construct a (simple) Bayesian network for a real-world modeling problem
5. Depict familiar models as Bayesian networks
6. Use d-separation to prove the existence of conditional independencies in a Bayesian network
7. Employ a Markov blanket to identify conditional independence assumptions of a graphical model
8. Develop a supervised learning algorithm for a Bayesian network
30
TYPES OF GRAPHICAL MODELS 31
Three Types of Graphical Models: Directed Graphical Model, Undirected Graphical Model, Factor Graph. [Diagram: the same set of variables drawn as each of the three types of graph] 32