Let us try a different network

[Figure: a Bayesian network over A, B, C, D]

Again A ⊥ C | {B, D}
But the graph also implies B ⊥ D (unconditionally), which is not an independence in our distribution
You can try other networks
Turns out there is no Bayesian Network which can exactly capture the independence relations that we are interested in

Perfect Map: A graph G is a Perfect Map for a distribution P if the independence relations implied by the graph are exactly the same as those implied by the distribution

There is no Perfect Map for this distribution

Mitesh M. Khapra, CS7015 (Deep Learning): Lecture 18
The problem is that a directed graphical model is not suitable for this example

[Figure: nodes A, B, C, D]

A directed edge between two nodes implies some kind of direction in the interaction
For example, A → B could indicate that A influences B but not the other way round
But in our example A & B are equal partners (they both contribute to the study discussion)
We want to capture the strength of this interaction (and there is no direction here)
We move on from Directed Graphical Models to Undirected Graphical Models

[Figure: an undirected network over A, B, C, D]

Also known as Markov Networks
The Markov Network shown exactly captures the interactions inherent in the problem
But how do we parameterize this graph?
Module 18.2: Factors in Markov Network
Recall that in the directed case the factors were Conditional Probability Distributions (CPDs)

[Figure: Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter]

P(G, S, I, L, D) = P(I) P(D) P(G | I, D) P(S | I) P(L | G)

Each such factor captured the interaction (dependence) between the connected nodes
Can we use CPDs in the undirected case also?
CPDs don't make sense in the undirected case because there is no direction and hence no natural conditioning (is it A | B or B | A?)
So what should be the factors or parameters in this case?

[Figure: undirected network over A, B, C, D]

Question: What do we want these factors to capture?
Answer: The affinity between connected random variables
Just as the factors in the directed case captured the conditional dependence between a set of random variables, here we want them to capture the affinity between them
However, we can borrow the intuition from the directed case.
Even in the undirected case, we want each such factor to capture the interactions (affinity) between connected nodes
We could have factors φ1(A, B), φ2(B, C), φ3(C, D), φ4(D, A) which capture the affinity between the corresponding nodes.
Intuitively, it makes sense to have these factors associated with each pair of connected random variables.
We could now assign some values to these factors:

φ1(A, B):  a0 b0 = 30   a0 b1 = 5    a1 b0 = 1    a1 b1 = 10
φ2(B, C):  b0 c0 = 100  b0 c1 = 1    b1 c0 = 1    b1 c1 = 100
φ3(C, D):  c0 d0 = 1    c0 d1 = 100  c1 d0 = 100  c1 d1 = 1
φ4(D, A):  d0 a0 = 100  d0 a1 = 1    d1 a0 = 1    d1 a1 = 100

But who will give us these values? Well, now you need to learn them from data (same as in the directed case). If you have access to a lot of past interactions between A & B then you could learn these values (more on this later).

Roughly speaking, φ1(A, B) asserts that it is more likely for A and B to agree [∵ weights for a0 b0, a1 b1 > a0 b1, a1 b0]
φ1(A, B) also assigns more weight to the case when both do not have a misconception as compared to the case when both have the misconception (a0 b0 > a1 b1)
We could have similar assignments for the other factors
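A minimal sketch of how these pairwise factors could be represented in code (the dictionary names `phi1`–`phi4` are illustrative; the weights are the ones from the tables above):

```python
# Each factor maps an assignment of its two variables to a weight.
# These are affinities, not probabilities: they need not sum to 1.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi4(D, A)

# phi1 weights agreement (a0 b0, a1 b1) above disagreement (a0 b1, a1 b0),
# and "neither has the misconception" (a0 b0) above "both have it" (a1 b1)
assert phi1[(0, 0)] + phi1[(1, 1)] > phi1[(0, 1)] + phi1[(1, 0)]
assert phi1[(0, 0)] > phi1[(1, 1)]
```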
Notice a few things:
These tables do not represent probability distributions
They are just weights which can be interpreted as the relative likelihood of an event
For example, a = 0, b = 0 is more likely than a = 1, b = 1
But eventually we are interested in probability distributions
In the directed case, going from factors to a joint probability distribution was easy as the factors were themselves conditional probability distributions
We could just write the joint probability distribution as the product of the factors (without violating the axioms of probability)
What do we do in this case, when the factors are not probability distributions?
Well, we could still write it as a product of these factors and normalize it appropriately:

P(a, b, c, d) = (1/Z) φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

where

Z = Σ_{a,b,c,d} φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

Based on the values that we had assigned to the factors we can now compute the full joint probability distribution:

Assignment     Unnormalized   Normalized
a0 b0 c0 d0    300,000        4.17E-02
a0 b0 c0 d1    300,000        4.17E-02
a0 b0 c1 d0    300,000        4.17E-02
a0 b0 c1 d1    30             4.17E-06
a0 b1 c0 d0    500            6.94E-05
a0 b1 c0 d1    500            6.94E-05
a0 b1 c1 d0    5,000,000      6.94E-01
a0 b1 c1 d1    500            6.94E-05
a1 b0 c0 d0    100            1.39E-05
a1 b0 c0 d1    1,000,000      1.39E-01
a1 b0 c1 d0    100            1.39E-05
a1 b0 c1 d1    100            1.39E-05
a1 b1 c0 d0    10             1.39E-06
a1 b1 c0 d1    100,000        1.39E-02
a1 b1 c1 d0    100,000        1.39E-02
a1 b1 c1 d1    100,000        1.39E-02

Z is called the partition function.
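The computation above can be sketched directly (a minimal illustration; the factor dictionaries and the names `unnormalized`, `P` are assumptions, with the weights taken from the earlier tables):

```python
import itertools

phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi4(D, A)

def unnormalized(a, b, c, d):
    # Product of the four factors for one assignment
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]

# Partition function: sum of the unnormalized score over all 16 assignments
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=4))

def P(a, b, c, d):
    return unnormalized(a, b, c, d) / Z

print(Z)              # 7201840
print(P(0, 1, 1, 0))  # ~0.694, matching the 6.94E-01 row in the table
```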
Let us build on the original example by adding some more students

[Figure: undirected network over A, B, C, D, E, F]

Once again, there is an edge between two students if they study together
One way of interpreting these new connections is that {A, D, E} form a study group or a clique
Similarly, {A, F, B} form a study group, {C, D} form a study group, and {B, C} form a study group
Now, what should the factors be?
We could still have factors which capture pairwise interactions:

φ1(A, E), φ2(A, F), φ3(B, F), φ4(A, B), φ5(A, D), φ6(D, E), φ7(B, C), φ8(C, D)

But could we do something smarter (and more efficient)?
Instead of having a factor for each pair of nodes, why not have one for each maximal clique?

φ1(A, E, D), φ2(A, F, B), φ3(B, C), φ4(C, D)
What if we add one more student?

[Figure: the network with G added, connected to E, A, D]

What will be the factors in this case?
Remember, we are interested in maximal cliques
So instead of having factors φ(EAG), φ(GAD), φ(EGD) we will have a single factor φ(AEGD) corresponding to the maximal clique
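The maximal cliques can be found mechanically. A hedged brute-force sketch (fine for graphs this small; the edge set is transcribed from the figure as described above, and the helper names are illustrative):

```python
import itertools

# Undirected edges of the 7-student graph, G joining A, E, D
edges = {frozenset(e) for e in [
    ("A", "E"), ("A", "D"), ("D", "E"),  # study group {A, D, E}
    ("A", "F"), ("A", "B"), ("B", "F"),  # study group {A, F, B}
    ("B", "C"), ("C", "D"),              # pairs {B, C} and {C, D}
    ("E", "G"), ("A", "G"), ("D", "G"),  # edges added for G
]}
nodes = sorted({v for e in edges for v in e})

def is_clique(s):
    # Every pair inside s must be connected
    return all(frozenset(p) in edges for p in itertools.combinations(s, 2))

cliques = [set(s) for r in range(2, len(nodes) + 1)
           for s in itertools.combinations(nodes, r) if is_clique(s)]
# A clique is maximal if no strictly larger clique contains it
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(maximal)  # {A, D, E, G} appears instead of its sub-cliques
```

As the slide says, the triangles φ(EAG), φ(GAD), φ(EGD) disappear into the single maximal clique {A, E, G, D}.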
[Figure: the Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter, and the Markov network over A, B, C, D, E, F]

A distribution P factorizes over a Bayesian Network G if P can be expressed as

P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | Pa_{Xi})

A distribution P factorizes over a Markov Network H if P can be expressed as

P(X1, ..., Xn) = (1/Z) ∏_{i=1}^{m} φ(Di)

where each Di is a complete sub-graph (maximal clique) in H

A distribution is a Gibbs distribution parametrized by a set of factors Φ = {φ1(D1), ..., φm(Dm)} if it is defined as

P(X1, ..., Xn) = (1/Z) ∏_{i=1}^{m} φi(Di)
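The Gibbs-distribution definition can be sketched generically (a minimal illustration for binary variables; the function name `gibbs_distribution` and the (scope, table) encoding are assumptions, with the two factor tables reused from earlier slides):

```python
import itertools
from math import prod

def gibbs_distribution(factors):
    """factors: list of (scope, table) pairs, table mapping assignments
    of the scope to weights. Returns the normalized joint
    P(X1, ..., Xn) = (1/Z) * prod_i phi_i(D_i) as a dict."""
    variables = sorted({v for scope, _ in factors for v in scope})
    def weight(assign):
        return prod(table[tuple(assign[v] for v in scope)]
                    for scope, table in factors)
    assignments = [dict(zip(variables, vals))
                   for vals in itertools.product([0, 1], repeat=len(variables))]
    Z = sum(weight(a) for a in assignments)  # partition function
    return {tuple(sorted(a.items())): weight(a) / Z for a in assignments}

# Using phi1(A, B) and phi2(B, C) from the earlier tables
P = gibbs_distribution([
    (("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}),
    (("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}),
])
assert abs(sum(P.values()) - 1.0) < 1e-12  # a valid probability distribution
```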
Module 18.3: Local Independencies in a Markov Network