Probabilistic Graphical Models


  1. Introduction to Machine Learning: Probabilistic Graphical Models
  Yifeng Tao, School of Computer Science, Carnegie Mellon University
  Slides adapted from Eric Xing and Matt Gormley

  2. Recap of Basic Probability Concepts
  o Representation: how do we represent the joint probability distribution over multiple (say, eight) binary variables?
    o 2^8 = 256 state configurations in total
    o Do all of them need to be represented explicitly?
    o Do we gain any scientific/medical insight from a flat table?
  o Learning: where do we get all these probabilities?
    o Maximum-likelihood estimation?
  o Inference: if not all variables are observable, how do we compute the conditional distribution of latent variables given evidence?
    o Computing p(H | A) would require summing over all 2^6 configurations of the unobserved variables
  [Slide from Eric Xing.]
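  For concreteness, a minimal sketch (not from the slides; the axes chosen for H and A are hypothetical) of brute-force inference over a flat joint table on eight binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2,) * 8)   # one entry per configuration: 2^8 = 256
joint /= joint.sum()           # normalize into a valid distribution

# Hypothetical indices: axis 0 is H, axis 1 is A.
# p(H=1 | A=1) requires summing over all 2^6 configurations of the rest.
num = joint[1, 1].sum()        # p(H=1, A=1): sum over the other 6 axes
den = joint[:, 1].sum()        # p(A=1)
print("p(H=1 | A=1) =", num / den)
```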

  3. Graphical Model: Structure Simplifies Representation
  o Dependencies among variables
  [Slide from Eric Xing.]

  4. Probabilistic Graphical Models
  o If the X_i's are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms; a factorization consistent with the table sizes below is, e.g.,
    P(X_1, ..., X_8) = P(X_1) P(X_2) P(X_3|X_1) P(X_4|X_2) P(X_5|X_2) P(X_6|X_3,X_4) P(X_7|X_6) P(X_8|X_5,X_6)
  o Why might we favor a PGM?
    o Incorporation of domain knowledge and causal (logical) structures
    o 2+2+4+4+4+8+4+8 = 36 table entries, roughly a 7-fold reduction from the 2^8 = 256 entries of the full joint in representation cost!
  [Slide from Eric Xing.]
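  A small sketch of the representation-cost arithmetic, assuming the hypothetical parent structure above; each binary node with k parents needs a CPT of 2^(1+k) entries:

```python
# Hypothetical structure matching the 2+2+4+4+4+8+4+8 = 36 count above.
parents = {
    "X1": [], "X2": [],
    "X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
    "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"],
}

# Representation cost: one CPT of 2^(1 + #parents) entries per node.
cpt_entries = sum(2 ** (1 + len(p)) for p in parents.values())
print(cpt_entries)   # 36, versus 2**8 = 256 for the full joint table
```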

  5. Two types of GMs
  o Directed edges encode causal relationships (Bayesian Network, or Directed Graphical Model)
  o Undirected edges simply encode correlations between variables (Markov Random Field, or Undirected Graphical Model)
  [Slide from Eric Xing.]

  6. Bayesian Network
  o Definition: a Bayesian network consists of a graph G and the conditional probabilities P
  o These two parts fully specify the distribution:
    o Qualitative specification: G
    o Quantitative specification: P
  [Slide from Eric Xing.]

  7. Where does the qualitative specification come from?
  o Prior knowledge of causal relationships
  o Learning from data (i.e., structure learning)
  o We simply prefer a certain architecture (e.g., a layered graph)
  o ...
  [Slide from Matt Gormley.]

  8. Quantitative Specification
  o Example: conditional probability tables (CPTs) for discrete random variables
  [Slide from Eric Xing.]
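  A minimal sketch of what a CPT looks like in code, with hypothetical variables A, B, C and made-up probabilities:

```python
# CPT for P(C | A, B), all variables binary. Each row (one setting of the
# parents) must sum to 1.
cpt_c = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.1, 1: 0.9},
}

def p_c_given(a, b, c):
    """Look up P(C=c | A=a, B=b) in the table."""
    return cpt_c[(a, b)][c]

print(p_c_given(1, 0, 1))  # 0.7
```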

  9. Quantitative Specification
  o Example: conditional probability density functions (CPDs) for continuous random variables
  [Slide from Eric Xing.]
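  A minimal sketch of a continuous CPD, assuming a linear-Gaussian form (a common choice; the weights here are made up):

```python
# p(c | a, b) = N(c; w0 + w1*a + w2*b, sigma^2): a linear-Gaussian CPD for a
# continuous child C with continuous parents A and B.
import math

def cpd_c_given(c, a, b, w=(0.5, 1.0, -0.3), sigma=1.0):
    """Density of C at c given parent values a, b."""
    mean = w[0] + w[1] * a + w[2] * b
    return math.exp(-0.5 * ((c - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

print(cpd_c_given(0.4, a=1.0, b=0.2))
```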

  10. Observed Variables
  o In a graphical model, shaded nodes are "observed", i.e., their values are given
  [Slide from Matt Gormley.]

  11. GMs are your old friends
  o Density estimation: parametric and nonparametric methods
  o Regression: linear, conditional mixture, nonparametric
  o Classification: generative and discriminative approaches
  o Clustering
  [Slide from Eric Xing.]

  12. What Independencies does a Bayes Net Model?
  o Independence of X and Z given Y? P(X|Y) P(Z|Y) = P(X,Z|Y)
  o Three cases of interest: a cascade X → Y → Z, a common parent X ← Y → Z, and a v-structure X → Y ← Z
  o Proof?
  [Slide from Matt Gormley.]
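  A quick numerical check of the cascade case (hypothetical CPTs; not from the slides): the chain factorization forces P(X,Z|Y) = P(X|Y) P(Z|Y):

```python
import itertools

# Chain X -> Y -> Z with made-up CPTs.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

# Joint P(x, y, z) from the chain factorization.
joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def marg(**fixed):
    """Sum the joint over all entries matching the fixed values."""
    return sum(v for (x, y, z), v in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == t for k, t in fixed.items()))

for y in (0, 1):
    py = marg(y=y)
    for x, z in itertools.product((0, 1), repeat=2):
        lhs = marg(x=x, y=y, z=z) / py                  # P(X, Z | Y)
        rhs = (marg(x=x, y=y) / py) * (marg(y=y, z=z) / py)  # P(X|Y) P(Z|Y)
        assert abs(lhs - rhs) < 1e-12
print("X is independent of Z given Y in the chain")
```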

  13. The "Burglar Alarm" example
  o Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes.
  o The Earth arguably doesn't care whether your house is currently being burgled.
  o While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing.
  [Slide from Matt Gormley.]

  14. Markov Blanket
  o Def: the co-parents of a node are the parents of its children
  o Def: the Markov blanket of a node is the set containing the node's parents, children, and co-parents
  o Thm: a node is conditionally independent of every other node in the graph given its Markov blanket
  o Example: the Markov blanket of X_6 is {X_3, X_4, X_5, X_8, X_9, X_10}
  [Slide from Matt Gormley.]
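  A small sketch computing a Markov blanket from an edge list (the DAG here is hypothetical, not the slide's ten-node example):

```python
from collections import defaultdict

# Hypothetical DAG: A -> C <- B, C -> D <- E.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("E", "D")]

parents, children = defaultdict(set), defaultdict(set)
for u, v in edges:
    parents[v].add(u)
    children[u].add(v)

def markov_blanket(node):
    """Parents + children + co-parents (other parents of the children)."""
    co_parents = set()
    for child in children[node]:
        co_parents |= parents[child] - {node}
    return parents[node] | children[node] | co_parents

print(markov_blanket("C"))  # {'A', 'B', 'D', 'E'}
```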

  15. Markov Blanket
  o Example (figure): the Markov blanket of X_6 is {X_3, X_4, X_5, X_8, X_9, X_10}
  [Slide from Matt Gormley.]

  16. D-Separation
  o Thm: if variables X and Z are d-separated given a set of variables E, then X and Z are conditionally independent given the set E
  o Def: variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is "blocked"
  [Slide from Matt Gormley.]

  17. D-Separation
  o Variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is "blocked"
  o A path is blocked iff it passes through some node W such that either (a) the path forms a chain (→ W →) or a fork (← W →) at W and W is in E, or (b) the path forms a v-structure (→ W ←) at W and neither W nor any of its descendants is in E
  [Slide from Eric Xing.]
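  A sketch of a d-separation test using the moralized-ancestral-graph criterion, an equivalent formulation (not necessarily how the slides present it); the burglar-alarm v-structure illustrates explaining-away:

```python
from collections import defaultdict

def d_separated(edges, x, z, evidence):
    """X and Z are d-separated by E iff they are disconnected in the
    moralized ancestral graph with the evidence nodes removed."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)

    # 1. Ancestral set of {x, z} and the evidence.
    keep, stack = set(), [x, z, *evidence]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])

    # 2. Moralize the induced subgraph (marry co-parents), drop directions.
    adj = defaultdict(set)
    for v in keep:
        ps = parents[v] & keep
        for p in ps:
            adj[p].add(v); adj[v].add(p)
        for p in ps:
            for q in ps:
                if p != q:
                    adj[p].add(q)

    # 3. Delete evidence nodes, then test reachability from x to z.
    blocked = set(evidence)
    seen, stack = {x}, [x]
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in blocked and m not in seen:
                seen.add(m); stack.append(m)
    return z not in seen

# Burglar-alarm style v-structure B -> A <- E: B and E are marginally
# d-separated but become dependent once the alarm A is observed.
edges = [("B", "A"), ("E", "A")]
print(d_separated(edges, "B", "E", set()))   # True
print(d_separated(edges, "B", "E", {"A"}))   # False
```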

  18. Machine Learning (overview figure omitted)
  [Slide from Matt Gormley.]

  19. Recipe for Closed-form MLE (as typically presented: write the log-likelihood of the i.i.d. data, take derivatives with respect to the parameters, set them to zero, and solve in closed form; see the worked example below)
  [Slide from Matt Gormley.]
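  A worked instance of the recipe (a standard derivation, not taken from the slide): maximum-likelihood estimation of a Bernoulli parameter phi from i.i.d. flips x_1, ..., x_N:

```latex
\ell(\phi) = \sum_{n=1}^{N} \log p(x_n \mid \phi)
           = \Big(\textstyle\sum_n x_n\Big)\log\phi
           + \Big(N - \textstyle\sum_n x_n\Big)\log(1-\phi)

\frac{d\ell}{d\phi} = \frac{\sum_n x_n}{\phi} - \frac{N - \sum_n x_n}{1-\phi} = 0
\quad\Longrightarrow\quad
\hat{\phi}_{\mathrm{MLE}} = \frac{1}{N}\sum_{n=1}^{N} x_n
```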

  20. Learning Fully Observed BNs
  o How do we learn these conditional and marginal distributions for a Bayes net?
  [Slide from Matt Gormley.]

  21. Learning Fully Observed BNs
  o Learning this fully observed Bayesian network is equivalent to learning five (small/simple) independent networks from the same data
  [Slide from Matt Gormley.]

  22. Learning Fully Observed BNs (derivation figure omitted: the log-likelihood decomposes into one term per conditional distribution, so each CPT can be estimated independently by maximum likelihood)
  [Slide from Matt Gormley.]
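  A minimal sketch of this decomposition (hypothetical three-node network X → Y, X → Z): MLE for a fully observed Bayes net reduces to counting, one CPT at a time:

```python
from collections import Counter

# Fully observed samples of (x, y, z), made up for illustration.
data = [(0, 0, 1), (0, 1, 0), (1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 0, 1)]

# P(X): marginal counts.
cx = Counter(x for x, y, z in data)
p_x = {x: c / len(data) for x, c in cx.items()}

# P(Y | X): counts of (x, y), normalized per parent value x.
cxy = Counter((x, y) for x, y, z in data)
p_y_given_x = {(x, y): c / cx[x] for (x, y), c in cxy.items()}

print(p_x, p_y_given_x)
```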

  23. Learning Partially Observed BNs
  o Partially observed Bayesian network:
    o Maximum-likelihood estimation → incomplete log-likelihood
    o The log-likelihood contains unobserved latent variables
    o Solve with the EM algorithm
  o Example: Gaussian mixture models (GMMs)
  [Slide from Eric Xing.]
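  A minimal EM sketch for a one-dimensional, two-component GMM (synthetic data; not the slide's example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: responsibility of each component for each point.
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted MLE updates for weights, means, variances.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(mu)  # should approach the true component means near -2 and 3
```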

  24. Inference of BNs
  o Suppose we already have the parameters of a Bayesian network...
  [Slide from Matt Gormley.]

  25. Approaches to inference
  o Exact inference algorithms:
    o The elimination algorithm → message passing
    o Belief propagation
    o The junction tree algorithm
  o Approximate inference techniques:
    o Variational algorithms
    o Stochastic simulation / sampling methods
    o Markov chain Monte Carlo methods
  [Slide from Eric Xing.]

  26.–29. Marginalization and Elimination (worked example; the equations appear as figures on the slides)
  o A query marginal is computed by summing out the unobserved variables one at a time, with each summation producing a smaller intermediate factor
  o Step 8: Wrap-up, combining the final factors and normalizing to obtain the conditional query
  [Slides from Eric Xing.]
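  A minimal sketch of variable elimination on a hypothetical chain A → B → C → D (random CPTs), computing P(A | D=1) one summation at a time:

```python
import numpy as np

rng = np.random.default_rng(0)
def random_cpt():
    t = rng.random((2, 2))
    return t / t.sum(axis=1, keepdims=True)   # rows indexed by parent value

p_a = np.array([0.6, 0.4])
p_b_a, p_c_b, p_d_c = random_cpt(), random_cpt(), random_cpt()

# Eliminate B: m_b(a, c) = sum_b P(b|a) P(c|b)
m_b = p_b_a @ p_c_b
# Eliminate C, absorbing the evidence D=1: m_c(a) = sum_c m_b(a, c) P(D=1|c)
m_c = m_b @ p_d_c[:, 1]
# Wrap-up: P(a | D=1) is proportional to P(a) * m_c(a); normalize.
unnorm = p_a * m_c
print(unnorm / unnorm.sum())
```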

  30. Elimination algorithm
  o Elimination on trees is equivalent to message passing along branches
  o Message passing is consistent on trees
  o Application: HMMs
  [Slide from Eric Xing.]
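  As a concrete instance of message passing on a chain, a sketch of the HMM forward algorithm (hypothetical two-state model):

```python
import numpy as np

init = np.array([0.5, 0.5])                 # P(z_1)
trans = np.array([[0.9, 0.1], [0.2, 0.8]])  # P(z_t | z_{t-1})
emit = np.array([[0.7, 0.3], [0.1, 0.9]])   # P(x_t | z_t), two symbols

obs = [0, 1, 1, 0]
alpha = init * emit[:, obs[0]]              # forward message at t = 1
for x in obs[1:]:
    alpha = (alpha @ trans) * emit[:, x]    # pass the message along the chain
print("P(x_1..T) =", alpha.sum())           # likelihood in O(T * K^2) time
```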

  31.–33. Gibbs Sampling (algorithm walkthrough; shown as figures on the slides)
  o Gibbs sampling draws from a joint distribution by repeatedly resampling one variable at a time from its full conditional, given the current values of all the other variables
  [Slides from Matt Gormley.]

  34. Gibbs Sampling
  o Full conditionals only need to condition on the Markov blanket
  o It must be "easy" to sample from the conditionals
  o Many conditionals are log-concave and are amenable to adaptive rejection sampling
  [Slide from Matt Gormley.]
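  A minimal Gibbs sampling sketch, assuming a standard bivariate Gaussian target whose full conditionals are one-dimensional Gaussians:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8          # correlation of the target distribution
x = y = 0.0
samples = []
for t in range(20000):
    # Full conditionals of a standard bivariate Gaussian:
    # X | Y=y ~ N(rho*y, 1 - rho^2), and symmetrically for Y | X.
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    if t >= 1000:  # discard burn-in before the chain mixes
        samples.append((x, y))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho = 0.8
```

  Discarding an initial burn-in, as above, is standard practice so the retained draws reflect the stationary distribution rather than the arbitrary starting point.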

  35. Take home message
  o Graphical models portray the sparse dependencies among variables
  o Two types of graphical models: Bayesian networks and Markov random fields
  o Conditional independence, Markov blankets, and d-separation
  o Learning fully observed and partially observed Bayesian networks
  o Exact and approximate inference of Bayesian networks

  36. References
  o Eric Xing, Ziv Bar-Joseph. 10-701 Introduction to Machine Learning. http://www.cs.cmu.edu/~epxing/Class/10701/
  o Matt Gormley. 10-601 Introduction to Machine Learning. http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
