  1. Undirected Graphical Models Michael Gutmann Probabilistic Modelling and Reasoning (INFR11134) School of Informatics, University of Edinburgh Spring semester 2018

  2. Recap
  ◮ The number of free parameters in probabilistic models increases with the number of random variables involved.
  ◮ Making statistical independence assumptions reduces the number of free parameters that need to be specified.
  ◮ Starting with the chain rule and an ordering of the random variables, we used statistical independencies to simplify the representation.
  ◮ We thus obtained a factorisation in terms of a product of conditional pdfs that we visualised as a DAG.
  ◮ In turn, we used DAGs to define sets of distributions (“directed graphical models”).
  ◮ We discussed independence properties satisfied by the distributions, d-separation, and the equivalence to the factorisation.
  Michael Gutmann Undirected Graphical Models 2 / 51

  3. The directionality in directed graphical models
  ◮ So far we mainly exploited the property x ⊥⊥ y | z ⇔ p(y | x, z) = p(y | z).
  ◮ But when working with p(y | x, z) we impose an ordering or directionality from x and z to y.
  ◮ Directionality matters in directed graphical models. (Figure: two DAGs over x, y, z that differ only in the direction of their edges.)
  ◮ In some cases, directionality is natural, but in others we do not want to choose one direction over another.
  ◮ We now discuss how to represent independencies in a symmetric manner without assuming a directionality or ordering of the variables.

  4. Program
  1. Representing probability distributions without imposing a directionality between the random variables
  2. Undirected graphs, separation, and statistical independencies
  3. Definition of undirected graphical models
  4. Further independencies in undirected graphical models

  5. Program
  1. Representing probability distributions without imposing a directionality between the random variables
     Factorisation and statistical independence
     Gibbs distributions
     Visualising Gibbs distributions with undirected graphs
     Conditioning corresponds to removing nodes and edges from the graph
  2. Undirected graphs, separation, and statistical independencies
  3. Definition of undirected graphical models
  4. Further independencies in undirected graphical models

  6. Further characterisation of statistical independence
  ◮ From tutorials: for non-negative functions a(x, z), b(y, z):
    x ⊥⊥ y | z ⇔ p(x, y, z) = a(x, z) b(y, z)
  ◮ More general version of p(x, y, z) = p(x | z) p(y | z) p(z).
  ◮ No directionality or ordering of the variables is imposed.
  ◮ Unconditional version: for non-negative functions a(x), b(y):
    x ⊥⊥ y ⇔ p(x, y) = a(x) b(y)
  ◮ The important point is the factorisation of p(x, y, z) into two factors:
    ◮ if the factors share a variable z, then we have conditional independence,
    ◮ if not, we have unconditional independence.
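The equivalence above can be checked numerically. The sketch below (all factor values are made-up random tables over binary variables, not from the slides) builds p(x, y, z) ∝ a(x, z) b(y, z) and verifies that x ⊥⊥ y | z holds for every configuration:

```python
import itertools
import random

random.seed(0)
vals = (0, 1)

# Hypothetical non-negative factor tables over binary variables
# (made-up numbers; any non-negative values would do).
a = {(x, z): random.random() for x in vals for z in vals}
b = {(y, z): random.random() for y in vals for z in vals}

# Joint pmf p(x, y, z) proportional to a(x, z) * b(y, z).
unnorm = {(x, y, z): a[(x, z)] * b[(y, z)]
          for x, y, z in itertools.product(vals, repeat=3)}
Z = sum(unnorm.values())
p = {k: v / Z for k, v in unnorm.items()}

# Verify x indep. of y given z: p(x, y | z) = p(x | z) p(y | z) for every z.
for z in vals:
    pz = sum(p[(x, y, z)] for x in vals for y in vals)
    for x in vals:
        for y in vals:
            pxz = sum(p[(x, yy, z)] for yy in vals)
            pyz = sum(p[(xx, y, z)] for xx in vals)
            assert abs(p[(x, y, z)] / pz - (pxz / pz) * (pyz / pz)) < 1e-12
print("factorisation into a(x,z) b(y,z) implies x indep. of y given z")
```

Because the check runs over all configurations and all conditioning values, it exercises exactly the "⇐" direction of the equivalence for this discrete example.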

  7. Further characterisation of statistical independence
  ◮ Since p(x, y, z) must sum (integrate) to one, we must have
    Σ_{x,y,z} a(x, z) b(y, z) = 1
  ◮ The normalisation condition is often ensured by re-defining a(x, z) b(y, z):
    p(x, y, z) = (1/Z) φ_A(x, z) φ_B(y, z),    Z = Σ_{x,y,z} φ_A(x, z) φ_B(y, z)
  ◮ Z: normalisation constant (related to the partition function, see later)
  ◮ φ_i: factors (also called potential functions). They generally do not correspond to (conditional) probabilities; they measure “compatibility”, “agreement”, or “affinity”.

  8. What does it mean?
    x ⊥⊥ y | z ⇔ p(x, y, z) = (1/Z) φ_A(x, z) φ_B(y, z)
  “⇒”: If we want our model to satisfy x ⊥⊥ y | z, we should write the pdf (pmf) as p(x, y, z) ∝ φ_A(x, z) φ_B(y, z).
  “⇐”: If the pdf (pmf) can be written as p(x, y, z) ∝ φ_A(x, z) φ_B(y, z), then we have x ⊥⊥ y | z.
  An equivalent statement holds for the unconditional version.

  9. Example
  Consider p(x1, x2, x3, x4) ∝ φ1(x1, x2) φ2(x2, x3) φ3(x4). What independencies does p satisfy?
  ◮ We can write
    p(x1, x2, x3, x4) ∝ [φ1(x1, x2) φ2(x2, x3)] [φ3(x4)] = φ̃1(x1, x2, x3) φ3(x4)
    with φ̃1(x1, x2, x3) = φ1(x1, x2) φ2(x2, x3), so that x4 ⊥⊥ x1, x2, x3.
  ◮ Integrating out x4 gives
    p(x1, x2, x3) = ∫ p(x1, x2, x3, x4) dx4 ∝ φ1(x1, x2) φ2(x2, x3)
    so that x1 ⊥⊥ x3 | x2.
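Both independencies claimed on this slide can be confirmed by brute force for discrete variables. A minimal sketch, assuming made-up random binary factor tables φ1, φ2, φ3 (the numbers are hypothetical, only the factor scopes come from the slide):

```python
import itertools
import random

random.seed(1)
vals = (0, 1)

# Hypothetical non-negative factor tables matching the slide's scopes.
phi1 = {(a, b): random.random() for a in vals for b in vals}  # phi1(x1, x2)
phi2 = {(b, c): random.random() for b in vals for c in vals}  # phi2(x2, x3)
phi3 = {d: random.random() for d in vals}                     # phi3(x4)

unnorm = {(x1, x2, x3, x4): phi1[(x1, x2)] * phi2[(x2, x3)] * phi3[x4]
          for x1, x2, x3, x4 in itertools.product(vals, repeat=4)}
Z = sum(unnorm.values())
p = {k: v / Z for k, v in unnorm.items()}

def marg(keep):
    """Marginal pmf over the variables at index positions `keep`."""
    out = {}
    for assign, prob in p.items():
        key = tuple(assign[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

# x4 indep. of (x1, x2, x3): the joint equals the product of marginals.
p123, p4 = marg((0, 1, 2)), marg((3,))
for (x1, x2, x3, x4), prob in p.items():
    assert abs(prob - p123[(x1, x2, x3)] * p4[(x4,)]) < 1e-12

# x1 indep. of x3 given x2: p(x1, x3 | x2) = p(x1 | x2) p(x3 | x2).
p12, p23, p2 = marg((0, 1)), marg((1, 2)), marg((1,))
for (x1, x2, x3), prob in p123.items():
    lhs = prob / p2[(x2,)]
    rhs = (p12[(x1, x2)] / p2[(x2,)]) * (p23[(x2, x3)] / p2[(x2,)])
    assert abs(lhs - rhs) < 1e-12
print("both independencies hold")
```

The same brute-force pattern works for any small discrete Gibbs distribution, which is useful for sanity-checking independencies read off a graph.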

  10. Gibbs distributions
  ◮ The example is a special case of a class of pdfs/pmfs that factorise as
    p(x1, . . . , xd) = (1/Z) Π_c φ_c(X_c)
  ◮ X_c ⊆ {x1, . . . , xd}
  ◮ The φ_c are non-negative factors (potential functions). They generally do not correspond to (conditional) probabilities; they measure “compatibility”, “agreement”, or “affinity”.
  ◮ Z is a normalising constant so that p(x1, . . . , xd) integrates (sums) to one.
  ◮ Known as Gibbs (or Boltzmann) distributions.
  ◮ p̃(x1, . . . , xd) = Π_c φ_c(X_c) is an example of an unnormalised model: p̃ ≥ 0 but does not necessarily integrate (sum) to one.
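For small discrete models, the unnormalised model p̃ and the constant Z can be computed directly. A sketch under assumed scopes and random factor tables (all hypothetical; the structure Π_c φ_c(X_c) is the slide's):

```python
import itertools
import random

random.seed(3)
vals = (0, 1)

# Hypothetical factor scopes X_c (as variable index sets) and factor tables:
# phi1(x1, x2), phi2(x2, x3), phi3(x3).
scopes = [(0, 1), (1, 2), (2,)]
factors = [{k: random.random() for k in itertools.product(vals, repeat=len(s))}
           for s in scopes]

def p_tilde(x):
    """Unnormalised model: product of all factors at configuration x."""
    out = 1.0
    for scope, phi in zip(scopes, factors):
        out *= phi[tuple(x[i] for i in scope)]
    return out

# Normalising constant: sum the unnormalised model over all configurations.
Z = sum(p_tilde(x) for x in itertools.product(vals, repeat=3))
p = {x: p_tilde(x) / Z for x in itertools.product(vals, repeat=3)}
assert abs(sum(p.values()) - 1.0) < 1e-12
```

Note that the sum defining Z has exponentially many terms in the number of variables, which is why computing Z is the hard part of working with Gibbs distributions in general.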

  11. Energy-based model
  ◮ With φ_c(X_c) = exp(−E_c(X_c)), we have equivalently
    p(x1, . . . , xd) = (1/Z) exp(−Σ_c E_c(X_c))
  ◮ Σ_c E_c(X_c) is the energy of the configuration (x1, . . . , xd):
    low energy ⇔ high probability
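The energy view can be illustrated with a single pairwise term. The energies below are made-up (agreeing states get low energy); only the construction p ∝ exp(−E) is from the slide:

```python
import itertools
import math

# Hypothetical pairwise energy for two binary variables:
# agreeing states (x == y) get low energy, disagreeing states high energy.
E = {(x, y): 0.0 if x == y else 2.0
     for x, y in itertools.product((0, 1), repeat=2)}

# Gibbs / Boltzmann distribution p(x, y) = exp(-E(x, y)) / Z.
unnorm = {k: math.exp(-e) for k, e in E.items()}
Z = sum(unnorm.values())
p = {k: v / Z for k, v in unnorm.items()}

# Low-energy configurations are more probable than high-energy ones.
assert p[(0, 0)] > p[(0, 1)]
print(p)
```

Raising the energy gap makes the distribution concentrate more sharply on the low-energy (agreeing) configurations, which is the "low energy ⇔ high probability" correspondence in action.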

  12. Example
  Other examples of Gibbs distributions:
    p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)
    p(x1, . . . , x6) ∝ φ1(x1, x2) φ2(x2, x3) φ3(x2, x5) φ4(x1, x4) φ5(x4, x5) φ6(x5, x6) φ7(x3, x6)
  Independencies?
  ◮ In principle, the independencies follow from x ⊥⊥ y | z ⇔ p(x, y, z) ∝ φ_A(x, z) φ_B(y, z) with appropriately defined factors φ_A and φ_B.
  ◮ But the mathematical manipulations of grouping factors together and integrating variables out become unwieldy. Let us use graphs to better see what’s going on.

  13. Visualising Gibbs distributions with undirected graphs
    p(x1, . . . , xd) ∝ Π_c φ_c(X_c)
  ◮ Node for each x_i.
  ◮ For each factor φ_c: draw an undirected edge between all x_i and x_j that belong to X_c.
  ◮ This results in a fully connected subgraph over all x_i that are part of the same factor (such a subgraph is called a clique).
  Example: graph for p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6)
  (Figure: undirected graph over x1, . . . , x6 with cliques {x1, x2, x4}, {x2, x3, x4}, {x3, x5}, {x3, x6}.)
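The construction rule above is mechanical, so it is easy to script. The sketch below derives the edge set of the slide's example graph from the factor scopes (variable names as strings; the scopes are the slide's, the representation is an assumption):

```python
from itertools import combinations

# Factor scopes from the slide's example:
# p(x1..x6) propto phi1(x1,x2,x4) phi2(x2,x3,x4) phi3(x3,x5) phi4(x3,x6)
scopes = [("x1", "x2", "x4"), ("x2", "x3", "x4"), ("x3", "x5"), ("x3", "x6")]

# Each scope becomes a clique: connect every pair of variables within it.
edges = set()
for scope in scopes:
    for u, v in combinations(sorted(scope), 2):
        edges.add((u, v))

print(sorted(edges))
# Seven edges: x1-x2, x1-x4, x2-x3, x2-x4, x3-x4, x3-x5, x3-x6
```

Note that the edge (x2, x4) is contributed by both φ1 and φ2; using a set means shared edges are drawn only once, just as in the graph.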

  14. Effect of conditioning
  Let p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6).
  ◮ What is p(x1, x2, x4, x5, x6 | x3 = α)?
  ◮ By definition,
    p(x1, x2, x4, x5, x6 | x3 = α)
      = p(x1, x2, x3 = α, x4, x5, x6) / ∫ p(x1, x2, x3 = α, x4, x5, x6) dx1 dx2 dx4 dx5 dx6
      = φ1(x1, x2, x4) φ2(x2, α, x4) φ3(α, x5) φ4(α, x6) / ∫ φ1(x1, x2, x4) φ2(x2, α, x4) φ3(α, x5) φ4(α, x6) dx1 dx2 dx4 dx5 dx6
      = (1/Z(α)) φ1(x1, x2, x4) φ2^α(x2, x4) φ3^α(x5) φ4^α(x6)
  ◮ This is a Gibbs distribution with derived factors φ_i^α of reduced domain and a new normalisation “constant” Z(α).
  ◮ Note that Z(α) depends on the conditioning value α.

  15. Effect of conditioning
  Let p(x1, . . . , x6) ∝ φ1(x1, x2, x4) φ2(x2, x3, x4) φ3(x3, x5) φ4(x3, x6).
  ◮ The conditional p(x1, x2, x4, x5, x6 | x3 = α) is
    (1/Z(α)) φ1(x1, x2, x4) φ2^α(x2, x4) φ3^α(x5) φ4^α(x6)
  ◮ Conditioning on variables removes the corresponding nodes and their connecting edges from the undirected graph.
  (Figure: the previous graph with node x3 and its edges removed, leaving x1, x2, x4, x5, x6.)
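The point that Z(α) depends on the conditioning value can be seen numerically. A sketch with hypothetical random binary factor tables for the slide's factorisation (only the scopes are from the slide):

```python
import itertools
import random

random.seed(2)
vals = (0, 1)

# Hypothetical factor tables for
# p(x1..x6) propto phi1(x1,x2,x4) phi2(x2,x3,x4) phi3(x3,x5) phi4(x3,x6).
phi1 = {k: random.random() for k in itertools.product(vals, repeat=3)}
phi2 = {k: random.random() for k in itertools.product(vals, repeat=3)}
phi3 = {k: random.random() for k in itertools.product(vals, repeat=2)}
phi4 = {k: random.random() for k in itertools.product(vals, repeat=2)}

def Z_given(alpha):
    """Normalising constant of p(x1, x2, x4, x5, x6 | x3 = alpha):
    sum the clamped product over all remaining configurations."""
    total = 0.0
    for x1, x2, x4, x5, x6 in itertools.product(vals, repeat=5):
        total += (phi1[(x1, x2, x4)] * phi2[(x2, alpha, x4)]
                  * phi3[(alpha, x5)] * phi4[(alpha, x6)])
    return total

# Z(alpha) is a different number for each conditioning value.
print(Z_given(0), Z_given(1))
assert Z_given(0) != Z_given(1)
```

Clamping x3 = α turns φ2, φ3, φ4 into the reduced-domain factors φ2^α, φ3^α, φ4^α from the slide; summing their product over the remaining variables is exactly Z(α).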

  16. Program
  1. Representing probability distributions without imposing a directionality between the random variables
     Factorisation and statistical independence
     Gibbs distributions
     Visualising Gibbs distributions with undirected graphs
     Conditioning corresponds to removing nodes and edges from the graph
  2. Undirected graphs, separation, and statistical independencies
  3. Definition of undirected graphical models
  4. Further independencies in undirected graphical models

  17. Program
  1. Representing probability distributions without imposing a directionality between the random variables
  2. Undirected graphs, separation, and statistical independencies
     Separation in undirected graphs
     Statistical independencies from graph separation
     Global Markov property
     I-map
  3. Definition of undirected graphical models
  4. Further independencies in undirected graphical models
