Let us try a different network

[Figure: a Bayesian network over A, B, C, D]

Again A ⊥ C | {B, D}
But the graph also implies B ⊥ D (unconditionally), which is not an independence in our distribution
You can try other networks
Turns out there is no Bayesian Network which can exactly capture the independence relations that we are interested in

Perfect Map: A graph G is a Perfect Map for a distribution P if the independence relations implied by the graph are exactly the same as those implied by the distribution

There is no Perfect Map for this distribution

Mitesh M. Khapra, CS7015 (Deep Learning): Lecture 18
The problem is that a directed graphical model is not suitable for this example

[Figure: nodes A, B, C, D]

A directed edge between two nodes implies some kind of direction in the interaction
For example, A → B could indicate that A influences B but not the other way round
But in our example A & B are equal partners (they both contribute to the study discussion)
We want to capture the strength of this interaction (and there is no direction here)
We move on from Directed Graphical Models to Undirected Graphical Models

[Figure: an undirected network over A, B, C, D]

Also known as Markov Networks
The Markov Network shown exactly captures the interactions inherent in the problem
But how do we parameterize this graph?
Module 18.2: Factors in Markov Network
Recall that in the directed case the factors were Conditional Probability Distributions (CPDs)

[Figure: Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter]

P(G, S, I, L, D) = P(I) P(D) P(G | I, D) P(S | I) P(L | G)

Each such factor captured the interaction (dependence) between the connected nodes
Can we use CPDs in the undirected case also?
CPDs don't make sense in the undirected case because there is no direction and hence no natural conditioning (is it A | B or B | A?)
So what should be the factors or parameters in this case?

[Figure: undirected network over A, B, C, D]

Question: What do we want these factors to capture?
Answer: The affinity between connected random variables
Just as the factors in the directed case captured the conditional dependence between a set of random variables, here we want them to capture the affinity between them
However, we can borrow the intuition from the directed case.
Even in the undirected case, we want each such factor to capture the interactions (affinity) between connected nodes
We could have factors φ1(A, B), φ2(B, C), φ3(C, D), φ4(D, A) which capture the affinity between the corresponding nodes.
Intuitively, it makes sense to have these factors associated with each pair of connected random variables.
We could now assign some values to these factors:

φ1(A, B):  a0 b0 = 30   a0 b1 = 5    a1 b0 = 1    a1 b1 = 10
φ2(B, C):  b0 c0 = 100  b0 c1 = 1    b1 c0 = 1    b1 c1 = 100
φ3(C, D):  c0 d0 = 1    c0 d1 = 100  c1 d0 = 100  c1 d1 = 1
φ4(D, A):  d0 a0 = 100  d0 a1 = 1    d1 a0 = 1    d1 a1 = 100

But who will give us these values? Well, now you need to learn them from data (same as in the directed case). If you have access to a lot of past interactions between A & B then you could learn these values (more on this later).

Roughly speaking, φ1(A, B) asserts that it is more likely for A and B to agree [∵ weights for a0 b0, a1 b1 > a0 b1, a1 b0]
φ1(A, B) also assigns more weight to the case when both do not have a misconception as compared to the case when both have the misconception (a0 b0 > a1 b1)
We could have similar assignments for the other factors
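A minimal sketch of how these pairwise factors could be represented in code (the dictionary names `phi1`–`phi4` are illustrative; the weights are the ones from the tables above):

```python
# Each factor maps an assignment of its two variables to a weight.
# These are affinities, not probabilities: they need not sum to 1.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi4(D, A)

# phi1 weights agreement (a0 b0, a1 b1) above disagreement (a0 b1, a1 b0),
# and "neither has the misconception" (a0 b0) above "both have it" (a1 b1)
assert phi1[(0, 0)] + phi1[(1, 1)] > phi1[(0, 1)] + phi1[(1, 0)]
assert phi1[(0, 0)] > phi1[(1, 1)]
```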
Notice a few things:
These tables do not represent probability distributions
They are just weights which can be interpreted as the relative likelihood of an event
For example, a = 0, b = 0 is more likely than a = 1, b = 1
But eventually we are interested in probability distributions
In the directed case, going from factors to a joint probability distribution was easy as the factors were themselves conditional probability distributions
We could just write the joint probability distribution as the product of the factors (without violating the axioms of probability)
What do we do in this case, when the factors are not probability distributions?
Well, we could still write it as a product of these factors and normalize it appropriately:

P(a, b, c, d) = (1/Z) φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

where

Z = Σ_{a,b,c,d} φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

Based on the values that we had assigned to the factors we can now compute the full joint probability distribution:

Assignment     Unnormalized   Normalized
a0 b0 c0 d0    300,000        4.17E-02
a0 b0 c0 d1    300,000        4.17E-02
a0 b0 c1 d0    300,000        4.17E-02
a0 b0 c1 d1    30             4.17E-06
a0 b1 c0 d0    500            6.94E-05
a0 b1 c0 d1    500            6.94E-05
a0 b1 c1 d0    5,000,000      6.94E-01
a0 b1 c1 d1    500            6.94E-05
a1 b0 c0 d0    100            1.39E-05
a1 b0 c0 d1    1,000,000      1.39E-01
a1 b0 c1 d0    100            1.39E-05
a1 b0 c1 d1    100            1.39E-05
a1 b1 c0 d0    10             1.39E-06
a1 b1 c0 d1    100,000        1.39E-02
a1 b1 c1 d0    100,000        1.39E-02
a1 b1 c1 d1    100,000        1.39E-02

Z is called the partition function.
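The computation above can be sketched directly (a minimal illustration; the factor dictionaries and the names `unnormalized`, `P` are assumptions, with the weights taken from the earlier tables):

```python
import itertools

phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi4(D, A)

def unnormalized(a, b, c, d):
    # Product of the four factors for one assignment
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]

# Partition function: sum of the unnormalized score over all 16 assignments
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=4))

def P(a, b, c, d):
    return unnormalized(a, b, c, d) / Z

print(Z)              # 7201840
print(P(0, 1, 1, 0))  # ~0.694, matching the 6.94E-01 row in the table
```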
Let us build on the original example by adding some more students

[Figure: undirected network over A, B, C, D, E, F]

Once again, there is an edge between two students if they study together
One way of interpreting these new connections is that {A, D, E} form a study group or a clique
Similarly, {A, F, B} form a study group, {C, D} form a study group, and {B, C} form a study group
Now, what should the factors be?
We could still have factors which capture pairwise interactions:

φ1(A, E), φ2(A, F), φ3(B, F), φ4(A, B), φ5(A, D), φ6(D, E), φ7(B, C), φ8(C, D)

But could we do something smarter (and more efficient)?
Instead of having a factor for each pair of nodes, why not have one for each maximal clique?

φ1(A, E, D), φ2(A, F, B), φ3(B, C), φ4(C, D)
What if we add one more student?

[Figure: the network with G added, connected to E, A, D]

What will be the factors in this case?
Remember, we are interested in maximal cliques
So instead of having factors φ(EAG), φ(GAD), φ(EGD) we will have a single factor φ(AEGD) corresponding to the maximal clique
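The maximal cliques can be found mechanically. A hedged brute-force sketch (fine for graphs this small; the edge set is transcribed from the figure as described above, and the helper names are illustrative):

```python
import itertools

# Undirected edges of the 7-student graph, G joining A, E, D
edges = {frozenset(e) for e in [
    ("A", "E"), ("A", "D"), ("D", "E"),  # study group {A, D, E}
    ("A", "F"), ("A", "B"), ("B", "F"),  # study group {A, F, B}
    ("B", "C"), ("C", "D"),              # pairs {B, C} and {C, D}
    ("E", "G"), ("A", "G"), ("D", "G"),  # edges added for G
]}
nodes = sorted({v for e in edges for v in e})

def is_clique(s):
    # Every pair inside s must be connected
    return all(frozenset(p) in edges for p in itertools.combinations(s, 2))

cliques = [set(s) for r in range(2, len(nodes) + 1)
           for s in itertools.combinations(nodes, r) if is_clique(s)]
# A clique is maximal if no strictly larger clique contains it
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(maximal)  # {A, D, E, G} appears instead of its sub-cliques
```

As the slide says, the triangles φ(EAG), φ(GAD), φ(EGD) disappear into the single maximal clique {A, E, G, D}.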
[Figure: the Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter, and the Markov network over A, B, C, D, E, F]

A distribution P factorizes over a Bayesian Network G if P can be expressed as

P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | Pa_{Xi})

A distribution P factorizes over a Markov Network H if P can be expressed as

P(X1, ..., Xn) = (1/Z) ∏_{i=1}^{m} φ(Di)

where each Di is a complete sub-graph (maximal clique) in H

A distribution is a Gibbs distribution parametrized by a set of factors Φ = {φ1(D1), ..., φm(Dm)} if it is defined as

P(X1, ..., Xn) = (1/Z) ∏_{i=1}^{m} φi(Di)
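The Gibbs-distribution definition can be sketched generically (a minimal illustration for binary variables; the function name `gibbs_distribution` and the (scope, table) encoding are assumptions, with the two factor tables reused from earlier slides):

```python
import itertools
from math import prod

def gibbs_distribution(factors):
    """factors: list of (scope, table) pairs, table mapping assignments
    of the scope to weights. Returns the normalized joint
    P(X1, ..., Xn) = (1/Z) * prod_i phi_i(D_i) as a dict."""
    variables = sorted({v for scope, _ in factors for v in scope})
    def weight(assign):
        return prod(table[tuple(assign[v] for v in scope)]
                    for scope, table in factors)
    assignments = [dict(zip(variables, vals))
                   for vals in itertools.product([0, 1], repeat=len(variables))]
    Z = sum(weight(a) for a in assignments)  # partition function
    return {tuple(sorted(a.items())): weight(a) / Z for a in assignments}

# Using phi1(A, B) and phi2(B, C) from the earlier tables
P = gibbs_distribution([
    (("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}),
    (("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}),
])
assert abs(sum(P.values()) - 1.0) < 1e-12  # a valid probability distribution
```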
Module 18.3: Local Independencies in a Markov Network