Probabilistic Graphical Models, Lecture 10: Undirected Models


  1. Probabilistic Graphical Models Lecture 10 – Undirected Models CS/CNS/EE 155 Andreas Krause

  2. Announcements Homework 2 is due this Wednesday (Nov 4) in class. Project milestones are due next Monday (Nov 9); about half the work should be done, with 4 pages of writeup in NIPS format (http://nips.cc/PaperInformation/StyleFiles).

  3. Markov Networks (a.k.a. Markov Random Fields, Gibbs Distributions, ...) A Markov Network consists of an undirected graph, where each node represents a random variable, and a collection of factors defined over cliques in the graph. [Figure: example undirected graph over X1, ..., X9.] A distribution factorizes over an undirected graph G if the joint probability is a normalized product of the clique factors.
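
     The factorization this slide refers to is the standard Gibbs form (reconstructed here; the slide's own equation is an image). A distribution P factorizes over G if it can be written as

         P(x_1, \dots, x_n) \;=\; \frac{1}{Z} \prod_{c \in \mathcal{C}(G)} \phi_c(x_c),
         \qquad
         Z \;=\; \sum_{x_1, \dots, x_n} \prod_{c \in \mathcal{C}(G)} \phi_c(x_c),

     where \mathcal{C}(G) is the set of cliques of G and Z is the partition function.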

  4. Computing Joint Probabilities Computing joint probabilities in BNs: the joint is simply the product of the CPDs. Computing joint probabilities in Markov Nets: the product of factors must additionally be normalized by the partition function Z.
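
     A minimal sketch (not from the slides) of the contrast on this slide, using a hypothetical pair of binary variables: in a BN the product of CPDs is already a probability, while in a Markov net the product of factors must be normalized by the partition function Z.

         import itertools

         # Toy Markov net over two binary variables A, B with one pairwise factor.
         # phi[(a, b)] is an arbitrary nonnegative potential, not a probability.
         phi = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

         # Toy Bayes net over the same variables: P(a, b) = P(a) * P(b | a).
         p_a = {0: 0.6, 1: 0.4}
         p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)

         def bn_joint(a, b):
             # CPDs are locally normalized, so the product is already a probability.
             return p_a[a] * p_b_given_a[(b, a)]

         # For the Markov net, the product of factors must be divided by the
         # partition function Z, which requires summing over ALL joint assignments.
         Z = sum(phi[(a, b)] for a, b in itertools.product([0, 1], repeat=2))

         def mn_joint(a, b):
             return phi[(a, b)] / Z

         print(bn_joint(1, 0))  # 0.08
         print(mn_joint(1, 0))  # 0.125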

  5. Local Markov Assumption for MNs The Markov Blanket MB(X) of a node X is the set of neighbors of X. Local Markov Assumption: X ⊥ (everything else) | MB(X). I_loc(G) = set of all local independences. G is called an I-map of distribution P if I_loc(G) ⊆ I(P). [Figure: example graph over X1, ..., X9.]

  6. Factorization Theorem for Markov Nets ("⇒") If the true distribution P can be represented exactly as a Markov net (G, P), i.e., P factorizes over G, then I_loc(G) ⊆ I(P), i.e., G is an I-map of P (independence map).

  7. Factorization Theorem for Markov Nets ("⇐"): Hammersley-Clifford Theorem If I_loc(G) ⊆ I(P), i.e., G is an I-map of P (independence map), and P > 0, then the true distribution P can be represented exactly as a Markov net (G, P), i.e., P factorizes over G.

  8. Global independencies A trail X—X_1—...—X_m—Y is called active for evidence E if none of X_1, ..., X_m ∈ E. Variables X and Y are called separated by E if there is no active trail for E connecting X and Y; write sep(X, Y | E). I(G) = {X ⊥ Y | E : sep(X, Y | E)}. [Figure: example graph over X1, ..., X9.]
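
     Separation in an undirected graph is just graph reachability once the evidence nodes are blocked. A minimal sketch (my own, not from the slides):

         from collections import deque

         def separated(adj, x, y, evidence):
             """Check sep(x, y | evidence) in an undirected graph.

             adj: dict mapping each node to a set of neighbors.
             A trail is active given E iff it avoids E, so x and y are separated
             by E exactly when every path between them passes through E.  We test
             this with a BFS that is not allowed to enter evidence nodes.
             """
             if x in evidence or y in evidence:
                 return True  # conditioning on x or y itself separates (by convention)
             seen, queue = {x}, deque([x])
             while queue:
                 node = queue.popleft()
                 for nbr in adj[node]:
                     if nbr == y:
                         return False          # found an active trail
                     if nbr not in seen and nbr not in evidence:
                         seen.add(nbr)
                         queue.append(nbr)
             return True

         # Chain X1 - X2 - X3: X1 and X3 are separated by {X2} but not by {}.
         adj = {"X1": {"X2"}, "X2": {"X1", "X3"}, "X3": {"X2"}}
         print(separated(adj, "X1", "X3", {"X2"}))  # True
         print(separated(adj, "X1", "X3", set()))   # False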

  9. Soundness of separation Know: for positive distributions P > 0, I_loc(G) ⊆ I(P) ⇔ P factorizes over G. Theorem (Soundness of separation): for positive distributions P > 0, I_loc(G) ⊆ I(P) ⇒ I(G) ⊆ I(P). Hence, separation captures only true independences. How about I(G) = I(P)?

  10. Completeness of separation Theorem (Completeness of separation): I(G) = I(P) for "almost all" distributions P that factorize over G. "Almost all": except for a set of potential parameterizations of measure 0 (assuming no finite set has positive measure).

  11. Minimal I-maps For BNs: the minimal I-map is not unique. [Figure: three different minimal I-map BN structures over the variables E, B, A, J, M.] For MNs: for positive P, the minimal I-map is unique!

  12. P-maps Do P-maps always exist? For BNs: no. How about Markov Nets?

  13. Exact inference in MNs Variable elimination and junction tree inference work exactly the same way as for BNs! Junction trees are constructed by obtaining a chordal graph through triangulation.
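
     One common way to obtain the chordal graph is greedy elimination with the min-fill heuristic; the sketch below is illustrative (my own construction, not the algorithm as presented on the slides).

         import itertools

         def min_fill_triangulate(adj):
             """Greedy min-fill triangulation of an undirected graph.

             adj: dict node -> set of neighbors.  Eliminating a node connects all of
             its remaining neighbors; the added (fill) edges make the graph chordal,
             which is what junction tree construction requires.
             Returns (elimination order, list of fill edges).
             """
             adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
             order, fill, remaining = [], [], set(adj)
             while remaining:
                 def fill_edges(v):
                     nbrs = adj[v] & remaining
                     return [(a, b) for a, b in itertools.combinations(sorted(nbrs), 2)
                             if b not in adj[a]]
                 # Pick the node whose elimination adds the fewest fill edges.
                 v = min(remaining, key=lambda u: len(fill_edges(u)))
                 for a, b in fill_edges(v):
                     adj[a].add(b); adj[b].add(a)
                     fill.append((a, b))
                 order.append(v)
                 remaining.remove(v)
             return order, fill

         # The 4-cycle X1-X2-X3-X4 is not chordal; one fill edge makes it chordal.
         adj = {"X1": {"X2", "X4"}, "X2": {"X1", "X3"},
                "X3": {"X2", "X4"}, "X4": {"X1", "X3"}}
         print(min_fill_triangulate(adj))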

  14. Pairwise MNs A pairwise MN is a MN where all factors are defined over single variables or pairs of variables. Any MN can be reduced to a pairwise MN! [Figure: example pairwise MN over X1, ..., X5.]

  15. Logarithmic representation Any positive distribution can be represented in the log domain.
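
     Concretely (a standard reconstruction of what the slide's equation states): since P > 0, every factor is strictly positive and can be moved into the exponent,

         P(x) \;=\; \frac{1}{Z} \prod_{c} \phi_c(x_c)
               \;=\; \frac{1}{Z} \exp\!\Big( \sum_{c} \log \phi_c(x_c) \Big)
               \;=\; \frac{1}{Z} \exp\!\Big( -\sum_{c} \epsilon_c(x_c) \Big),
         \qquad \epsilon_c(x_c) \;=\; -\log \phi_c(x_c).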

  16. Log-linear models Feature functions φ_i(D_i) defined over cliques. A log-linear model over an undirected graph G has feature functions φ_1(D_1), ..., φ_k(D_k), where the domains D_i can overlap, and a set of weights w_i learnt from data.
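
     In the standard form (a reconstruction; sign conventions vary), the joint distribution of a log-linear model is

         P(x \mid w) \;=\; \frac{1}{Z(w)} \exp\!\Big( \sum_{i=1}^{k} w_i\, \phi_i(D_i) \Big),
         \qquad
         Z(w) \;=\; \sum_{x} \exp\!\Big( \sum_{i=1}^{k} w_i\, \phi_i(D_i) \Big).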

  17. Converting BNs to MNs [Figure: example Bayes net over C, D, I, G, S, L, J, H.] Theorem: the moralized Bayes net is the minimal Markov I-map.
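
     Moralization itself is mechanical: keep every edge of the DAG, "marry" the parents of each node, and drop directions. A small illustrative sketch (my own, not from the slides):

         import itertools

         def moralize(parents):
             """Return the undirected adjacency of the moral graph of a Bayes net.

             parents: dict mapping each node to the list of its parents.
             """
             nodes = set(parents) | {p for ps in parents.values() for p in ps}
             adj = {v: set() for v in nodes}
             for child, ps in parents.items():
                 for p in ps:                                  # keep parent-child edges
                     adj[p].add(child); adj[child].add(p)
                 for a, b in itertools.combinations(ps, 2):    # marry co-parents
                     adj[a].add(b); adj[b].add(a)
             return adj

         # V-structure A -> C <- B: moralization adds the edge A - B.
         print(moralize({"C": ["A", "B"], "A": [], "B": []}))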

  18. Converting MNs to BNs [Figure: example MN over X1, ..., X9 and a corresponding Bayes net.] Theorem: the minimal Bayes I-map for an MN must be chordal.

  19. So far Markov Network representation: local/global Markov assumptions; separation; soundness and completeness of separation. Markov Network inference: variable elimination and junction tree inference work exactly as in Bayes Nets. How about learning Markov Nets?

  20. Parameter Learning for Bayes nets

  21. Algorithm for BN MLE
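
     The algorithm referenced here is the standard closed-form BN MLE, stated as a reminder (the slide's own equations are images): because the log-likelihood decomposes over families, each CPD entry is a normalized count,

         \hat{\theta}_{x \mid \mathbf{u}} \;=\; \frac{M[x, \mathbf{u}]}{M[\mathbf{u}]},

     where M[x, u] is the number of training examples in which X = x and its parents take the joint value u.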

  22. MLE for Markov Nets Log-likelihood of the data.
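
     For data D = {x^(1), ..., x^(M)} and clique potentials parameterized by θ, the log-likelihood takes the standard form (a reconstruction):

         \ell(\theta : \mathcal{D}) \;=\; \sum_{m=1}^{M} \sum_{c} \log \phi_c\!\big(x^{(m)}_c ; \theta\big) \;-\; M \log Z(\theta).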

  23. Log-likelihood doesn't decompose The log-likelihood l(D | θ) is a concave function! The log partition function log Z(θ) does not decompose over the cliques.

  24. Derivative of log-likelihood

  25. Derivative of log-likelihood
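
     The outcome of this two-slide derivation, written in the usual form for the log-potential parameterization θ_c(x_c) = log φ_c(x_c) (a reconstruction):

         \frac{\partial\, \ell(\theta : \mathcal{D})}{\partial\, \theta_c(x_c)}
         \;=\; M \Big[ \hat{P}(x_c) - P(x_c \mid \theta) \Big],

     so at the optimum the model's clique marginals P(x_c | θ) match the empirical clique marginals \hat{P}(x_c).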

  26. Computing the derivative [Figure: the derivative for the example network over C, D, I, G, S, L, J, H.] Computing P(c_i | θ) requires inference! Can optimize using conjugate gradient, etc.
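
     A minimal sketch of gradient-based weight learning for a tiny log-linear model, with the partition function and feature expectations computed by brute force (plain gradient ascent stands in for conjugate gradient; the features and data are hypothetical, not from the slides):

         import itertools, math

         # Two binary variables with two hypothetical features:
         # an "agreement" feature and a bias on the first variable.
         features = [lambda x: float(x[0] == x[1]),   # phi_1
                     lambda x: float(x[0] == 1)]      # phi_2
         assignments = list(itertools.product([0, 1], repeat=2))
         data = [(0, 0), (0, 0), (1, 1), (0, 1)]      # toy training set

         def model_expectations(w):
             scores = {x: math.exp(sum(wi * f(x) for wi, f in zip(w, features)))
                       for x in assignments}
             Z = sum(scores.values())
             # E_P[phi_i] = sum_x P(x) phi_i(x); this brute-force sum is the step
             # that requires inference in anything larger than a toy model.
             return [sum(scores[x] / Z * f(x) for x in assignments) for f in features]

         empirical = [sum(f(x) for x in data) / len(data) for f in features]
         w = [0.0, 0.0]
         for _ in range(500):      # gradient ascent on the average log-likelihood
             expected = model_expectations(w)
             w = [wi + 0.5 * (emp - exp) for wi, emp, exp in zip(w, empirical, expected)]

         print([round(wi, 3) for wi in w])  # weights that (approximately) match the
                                            # empirical feature expectations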

  27. Alternative approach: Iterative Proportional Fitting (IPF) At the optimum, the model's clique marginals must equal the empirical clique marginals. Solve this as a fixed point equation; the parameters must be recomputed every iteration.
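
     The fixed-point update this slide refers to is, in its standard form (reconstructed):

         \phi_c^{(t+1)}(x_c) \;=\; \phi_c^{(t)}(x_c)\, \frac{\hat{P}(x_c)}{P^{(t)}(x_c)},

     where \hat{P}(x_c) is the empirical clique marginal and P^{(t)}(x_c) is the current model marginal, which is why inference is needed in every iteration.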

  28. Parameter learning for log-linear models Feature functions φ_i(C_i) defined over cliques. Log-linear model over an undirected graph G: feature functions φ_1(C_1), ..., φ_k(C_k), where the domains C_i can overlap. Joint distribution as on slide 16: P(x | w) ∝ exp(∑_i w_i φ_i(C_i)). How do we get the weights w_i?

  29. Derivative of Log-likelihood 1

  30. Derivative of Log-likelihood 2
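
     The result of this derivation, in the standard feature-expectation form (a reconstruction):

         \frac{\partial}{\partial w_i}\, \ell(w : \mathcal{D})
         \;=\; \sum_{m=1}^{M} \phi_i\!\big(x^{(m)}\big) \;-\; M\, \mathbb{E}_{P(\cdot \mid w)}\!\big[\phi_i\big],

     i.e., the empirical feature count minus M times the expected feature value under the current model.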

  31. Optimizing parameters Gradient of the log-likelihood. Thus, w is the MLE if and only if the expected feature counts under the model match the empirical feature counts (moment matching).

  32. Regularization of parameters Put a prior on the parameters w.
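
     A common concrete choice (an example, not necessarily the prior used on the slide) is a zero-mean Gaussian prior on w, which turns MLE into MAP estimation with an L2 penalty:

         \hat{w} \;=\; \arg\max_{w}\; \ell(w : \mathcal{D}) \;-\; \frac{1}{2\sigma^2} \lVert w \rVert_2^2.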

  33. Summary: Parameter learning in MNs MLE in BNs is easy (the score decomposes). MLE in MNs requires inference (the score doesn't decompose). It can be optimized using gradient ascent or IPF.

  34. Tasks Read Koller & Friedman, Chapters 20.1-20.3 and 4.6.1.
