CS7015 (Deep Learning): Lecture 18, Markov Networks
Mitesh M. Khapra, Department of Computer Science and Engineering, Indian Institute of Technology Madras


  1. Let us try a different network. [Figure: a Bayesian network over the nodes A, B, C, D.] Again, $A \perp C \mid \{B, D\}$. But here $B \perp D$ (unconditionally). You can try other networks. It turns out that there is no Bayesian Network which can exactly capture the independence relations that we are interested in. Perfect Map: a graph $G$ is a Perfect Map for a distribution $P$ if the independence relations implied by the graph are exactly the same as those implied by the distribution. There is no Bayesian Network that is a Perfect Map for this distribution.

  9. The problem is that a directed graphical model is not suitable for this example. [Figure: graph over the nodes A, B, C, D.] A directed edge between two nodes implies some kind of direction in the interaction. For example, $A \to B$ could indicate that A influences B but not the other way round. But in our example A and B are equal partners (they both contribute to the study discussion). We want to capture the strength of this interaction (and there is no direction here).

  14. We move on from Directed Graphical Models to Undirected Graphical Models, also known as Markov Networks. [Figure: undirected graph over the nodes A, B, C, D.] The Markov Network in the figure exactly captures the interactions inherent in the problem. But how do we parameterize this graph?

  18. Module 18.2: Factors in Markov Network

  19. Recall that in the directed case the factors were Conditional Probability Distributions (CPDs). [Figure: Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter.]

      $$P(G, S, I, L, D) = P(I)\, P(D)\, P(G \mid I, D)\, P(S \mid I)\, P(L \mid G)$$

      Each such factor captured the interaction (dependence) between the connected nodes. Can we use CPDs in the undirected case also? CPDs don't make sense in the undirected case because there is no direction and hence no natural conditioning (is it $A \mid B$ or $B \mid A$?).

  23. So what should be the factors or parameters in this case? [Figure: undirected graph over the nodes A, B, C, D.] Question: What do we want these factors to capture? Answer: The affinity between connected random variables. Just as in the directed case the factors captured the conditional dependence between a set of random variables, here we want them to capture the affinity between them.

  27. However, we can borrow the intuition from the directed case. Even in the undirected case, we want each such factor to capture interactions (affinity) between connected nodes. We could have factors $\phi_1(A, B)$, $\phi_2(B, C)$, $\phi_3(C, D)$, $\phi_4(D, A)$ which capture the affinity between the corresponding nodes.

  30. Intuitively, it makes sense to have these factors associated with each pair of connected random variables. We could now assign some values to these factors:

      | $\phi_1(A, B)$ |    | $\phi_2(B, C)$ |     | $\phi_3(C, D)$ |     | $\phi_4(D, A)$ |     |
      |----------------|----|----------------|-----|----------------|-----|----------------|-----|
      | a0 b0          | 30 | b0 c0          | 100 | c0 d0          | 1   | d0 a0          | 100 |
      | a0 b1          | 5  | b0 c1          | 1   | c0 d1          | 100 | d0 a1          | 1   |
      | a1 b0          | 1  | b1 c0          | 1   | c1 d0          | 100 | d1 a0          | 1   |
      | a1 b1          | 10 | b1 c1          | 100 | c1 d1          | 1   | d1 a1          | 100 |

      Roughly speaking, $\phi_1(A, B)$ asserts that it is more likely for A and B to agree (because the weights for a0 b0 and a1 b1 are larger than those for a0 b1 and a1 b0). $\phi_1(A, B)$ also assigns more weight to the case when both do not have a misconception as compared to the case when both have the misconception (a0 b0 > a1 b1). We could have similar assignments for the other factors.

      But who will give us these values? Well, now you need to learn them from data (same as in the directed case). If you have access to a lot of past interactions between A and B then you could learn these values (more on this later).

  38. Notice a few things about the factor tables $\phi_1(A, B)$, $\phi_2(B, C)$, $\phi_3(C, D)$, $\phi_4(D, A)$. These tables do not represent probability distributions. They are just weights which can be interpreted as the relative likelihood of an event. For example, $a = 0, b = 0$ is more likely than $a = 1, b = 1$.

  42. But eventually we are interested in probability distributions. In the directed case going from factors to a joint probability distribution was easy, as the factors were themselves conditional probability distributions. We could just write the joint probability distribution as the product of the factors (without violating the axioms of probability). What do we do in this case, when the factors are not probability distributions?
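
The contrast above can be checked numerically. The following is a minimal sketch: the two-node Bayesian network X -> Y and its CPD values are hypothetical numbers chosen only for illustration, while the values of phi_1(A, B) are the ones given in the slides.

```python
from itertools import product

# Hypothetical CPDs for a two-node Bayesian network X -> Y (illustrative numbers only).
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# In the directed case the product of the CPDs is already a valid joint distribution.
joint_directed = {
    (x, y): p_x[x] * p_y_given_x[x][y] for x, y in product((0, 1), repeat=2)
}
directed_total = sum(joint_directed.values())

# phi_1(A, B) from the slides: affinities, not probabilities.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
phi1_total = sum(phi1.values())

print(directed_total)  # 1.0 up to floating-point rounding
print(phi1_total)      # 46
```

The CPD product sums to one without any extra work, whereas the factor entries sum to 46, so an explicit normalization step is needed in the undirected case.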

  46. Well, we could still write it as a product of these factors and normalize it appropriately:

      $$P(a, b, c, d) = \frac{1}{Z}\, \phi_1(a, b)\, \phi_2(b, c)\, \phi_3(c, d)\, \phi_4(d, a)$$

      where

      $$Z = \sum_{a, b, c, d} \phi_1(a, b)\, \phi_2(b, c)\, \phi_3(c, d)\, \phi_4(d, a)$$

      Based on the values that we had assigned to the factors we can now compute the full joint probability distribution. $Z$ is called the partition function.

      | Assignment  | Unnormalized | Normalized |
      |-------------|--------------|------------|
      | a0 b0 c0 d0 | 300,000      | 4.17E-02   |
      | a0 b0 c0 d1 | 300,000      | 4.17E-02   |
      | a0 b0 c1 d0 | 300,000      | 4.17E-02   |
      | a0 b0 c1 d1 | 30           | 4.17E-06   |
      | a0 b1 c0 d0 | 500          | 6.94E-05   |
      | a0 b1 c0 d1 | 500          | 6.94E-05   |
      | a0 b1 c1 d0 | 5,000,000    | 6.94E-01   |
      | a0 b1 c1 d1 | 500          | 6.94E-05   |
      | a1 b0 c0 d0 | 100          | 1.39E-05   |
      | a1 b0 c0 d1 | 1,000,000    | 1.39E-01   |
      | a1 b0 c1 d0 | 100          | 1.39E-05   |
      | a1 b0 c1 d1 | 100          | 1.39E-05   |
      | a1 b1 c0 d0 | 10           | 1.39E-06   |
      | a1 b1 c0 d1 | 100,000      | 1.39E-02   |
      | a1 b1 c1 d0 | 100,000      | 1.39E-02   |
      | a1 b1 c1 d1 | 100,000      | 1.39E-02   |
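
As a sanity check on the joint table, here is a small sketch that multiplies the four factors and normalizes by the partition function. The factor values are those from the slides; the dictionary layout and the helper name `unnormalized` are assumptions made for this example.

```python
from itertools import product

# Factor tables from the slides, keyed by the values of their two arguments.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}      # phi_1(a, b)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}    # phi_2(b, c)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}    # phi_3(c, d)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}    # phi_4(d, a)


def unnormalized(a, b, c, d):
    """Product of the four pairwise factors for one full assignment."""
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]


assignments = list(product((0, 1), repeat=4))

# Partition function: sum of the unnormalized measure over all 16 assignments.
Z = sum(unnormalized(*x) for x in assignments)

# Normalized joint distribution P(a, b, c, d).
joint = {x: unnormalized(*x) / Z for x in assignments}

print(Z)                              # 7201840
print(round(joint[(0, 1, 1, 0)], 3))  # 0.694
```

Enumerating all assignments is only feasible because there are four binary variables; the sum defining Z has exponentially many terms in general.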

  51. Let us build on the original example by adding some more students. [Figure: undirected graph over the nodes A, B, C, D, E, F.] Once again there is an edge between two students if they study together. One way of interpreting these new connections is that $\{A, D, E\}$ form a study group or a clique. Similarly, $\{A, F, B\}$ form a study group, $\{C, D\}$ form a study group and $\{B, C\}$ form a study group.

  56. Now, what should the factors be? [Figure: undirected graph over the nodes A, B, C, D, E, F.] We could still have factors which capture pairwise interactions: $\phi_1(A, E)$, $\phi_2(A, F)$, $\phi_3(B, F)$, $\phi_4(A, B)$, $\phi_5(A, D)$, $\phi_6(D, E)$, $\phi_7(B, C)$, $\phi_8(C, D)$. But could we do something smarter (and more efficient)? Instead of having a factor for each pair of nodes, why not have one for each maximal clique? That gives $\phi_1(A, E, D)$, $\phi_2(A, F, B)$, $\phi_3(B, C)$, $\phi_4(C, D)$.
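
The maximal cliques can also be found mechanically. The sketch below uses the basic Bron-Kerbosch recursion (without pivoting), which is one standard way to enumerate maximal cliques and is not something the lecture prescribes. The edge list is read off the eight pairwise factors above; the function name `maximal_cliques` is an assumption for this example.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored nodes.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# One edge per pairwise factor listed in the slides.
edges = [("A", "E"), ("A", "F"), ("B", "F"), ("A", "B"),
         ("A", "D"), ("D", "E"), ("B", "C"), ("C", "D")]

# Build an undirected adjacency map.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = maximal_cliques(adj)
print(sorted("".join(sorted(c)) for c in cliques))  # ['ABF', 'ADE', 'BC', 'CD']
```

The four cliques returned match the scopes of the four clique factors, so eight pairwise factors are replaced by four.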

  62. What if we add one more student? [Figure: the previous graph with an added node G.] What will be the factors in this case? Remember, we are interested in maximal cliques. So instead of having factors $\phi(E, A, G)$, $\phi(G, A, D)$, $\phi(E, G, D)$ we will have a single factor $\phi(A, E, G, D)$ corresponding to the maximal clique.
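
As a worked equation, the joint distribution for the seven-student graph could be written as below. This assumes that G is connected to A, E and D (as the factor names suggest) and that all other edges are unchanged; the factor indices are chosen here for illustration.

$$P(A, B, C, D, E, F, G) = \frac{1}{Z}\, \phi_1(A, E, G, D)\, \phi_2(A, F, B)\, \phi_3(B, C)\, \phi_4(C, D)$$

Under that assumption, the three-node cliques containing G are all subsets of $\{A, E, G, D\}$ and so do not need factors of their own.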

  67. [Figure: the Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter, next to the Markov network over A, B, C, D, E, F.] A distribution $P$ factorizes over a Bayesian Network $G$ if $P$ can be expressed as

      $$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_{X_i})$$

      A distribution $P$ factorizes over a Markov Network $H$ if $P$ can be expressed as

      $$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{i=1}^{m} \phi_i(D_i)$$

      where each $D_i$ is a complete sub-graph (maximal clique) in $H$. A distribution is a Gibbs distribution parametrized by a set of factors $\Phi = \{\phi_1(D_1), \dots, \phi_m(D_m)\}$ if it is defined as

      $$P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{i=1}^{m} \phi_i(D_i)$$
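
The Gibbs distribution definition works for factors of any scope, not only pairs. Here is a minimal sketch for binary variables. The function name `gibbs_distribution`, the `(scope, table)` representation and the factor values (`agree3`, `agree2`) are all hypothetical choices for illustration; only the clique scopes come from the slides.

```python
from itertools import product


def gibbs_distribution(variables, factors):
    """Normalize a product of factors over binary variables.

    variables: list of variable names.
    factors: list of (scope, table) pairs, where scope is a tuple of names and
             table maps value tuples (in scope order) to nonnegative weights.
    Returns the partition function and the normalized joint distribution.
    """
    def weight(assignment):
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(assignment[v] for v in scope)]
        return w

    rows = [dict(zip(variables, values))
            for values in product((0, 1), repeat=len(variables))]
    weights = [weight(row) for row in rows]
    z = sum(weights)
    joint = {tuple(row[v] for v in variables): w / z
             for row, w in zip(rows, weights)}
    return z, joint


# Hypothetical affinities: a clique gets a higher weight when all its members agree.
agree3 = {vals: (10.0 if len(set(vals)) == 1 else 1.0)
          for vals in product((0, 1), repeat=3)}
agree2 = {vals: (5.0 if vals[0] == vals[1] else 1.0)
          for vals in product((0, 1), repeat=2)}

variables = ["A", "B", "C", "D", "E", "F"]
factors = [(("A", "E", "D"), agree3), (("A", "F", "B"), agree3),
           (("B", "C"), agree2), (("C", "D"), agree2)]

z, joint = gibbs_distribution(variables, factors)
print(len(joint))  # 64
```

Each factor is indexed by the values of the variables in its own scope, which is what lets the same routine handle the three-node cliques and the two-node cliques together.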

  72. Module 18.3: Local Independencies in a Markov Network
