Inference and computing with decomposable graphs
Peter Green (School of Mathematics, University of Bristol)
Alun Thomas (Genetic Epidemiology, University of Utah)
6 September 2011 / Bayes 250, Edinburgh
Outline
1 Decomposable graphs
2 Bayesian model determination
3 Examples
Decomposable graphs: Graphical models
The conditional independence graph of a multivariate distribution (for a random vector X, say) tells us much about the structure of the distribution. Recall that G = (V, E), where the vertex set V is the set of indices of the components of X, and there is an (undirected) edge between vertices i and j, written i ~ j, unless X_i ⊥⊥ X_j | X_{V \ {i,j}}.
Under conditions (positivity is sufficient), global and local Markov properties also hold.
Given i.i.d. observations on X, we are often interested in inferring G; this is sometimes known as structural learning.
Decomposable graphs: Decomposable graphical models
The case where G is decomposable has been much studied. Decomposability is a graph theory concept with statistical and computational implications.
A graph is complete if every pair of vertices is joined by an edge. A maximal complete subgraph is called a clique.
An ordering of the cliques of an undirected graph, (C_1, C_2, ..., C_c), is said to be perfect if for each i = 2, 3, ..., c there exists h = h(i) < i such that

    S_i = C_i ∩ (C_1 ∪ C_2 ∪ ... ∪ C_{i-1}) ⊆ C_{h(i)}

The sets S_i are called separators. If an undirected graph admits a perfect ordering, it is said to be decomposable.
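As a concrete illustration (not on the slide), here is a minimal Python sketch that checks whether a given clique ordering is perfect; the cliques are those of the 7-vertex example shown later in the talk, and the function name is ours, not the authors'.

    # Check the running-intersection property: for each i >= 2, the
    # separator S_i = C_i ∩ (C_1 ∪ ... ∪ C_{i-1}) must lie inside
    # some earlier clique C_h.
    def is_perfect_ordering(cliques):
        for i in range(1, len(cliques)):
            s_i = cliques[i] & set().union(*cliques[:i])
            if not any(s_i <= cliques[h] for h in range(i)):
                return False
        return True

    # Cliques of the 7-vertex example graph from the talk.
    cliques = [{3, 4, 5, 6}, {2, 3, 6}, {2, 6, 7}, {1, 2}]
    print(is_perfect_ordering(cliques))   # True: the ordering is perfect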
Decomposable graphs: Decomposability and junction trees
Decomposable graphs are also known as triangulated: a graph is decomposable if and only if it has no chordless k-cycles for k >= 4.
A perfect ordering guides the construction of a junction tree: a graph whose vertices are the cliques, with an edge between C_i and C_{h(i)}, often labelled with S_i, for i = 2, 3, ..., c.
There may be many perfect orderings, and many junction trees, for a given decomposable graph.
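The slide does not spell out how a junction tree is built in practice; one standard construction (our assumption, not stated in the talk) is a maximum-weight spanning tree on the clique graph, with each candidate edge weighted by the size of the clique intersection. A sketch:

    # Junction tree as a maximum-weight spanning tree of the clique
    # graph, edge weight |C_i ∩ C_j| (Kruskal with a small union-find).
    def junction_tree(cliques):
        parent = list(range(len(cliques)))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        candidates = [(len(ci & cj), i, j)
                      for i, ci in enumerate(cliques)
                      for j, cj in enumerate(cliques) if i < j]
        tree = []
        for w, i, j in sorted(candidates, reverse=True):
            if w > 0 and find(i) != find(j):
                parent[find(i)] = find(j)
                tree.append((i, j, cliques[i] & cliques[j]))  # separator
        return tree

    cliques = [{3, 4, 5, 6}, {2, 3, 6}, {2, 6, 7}, {1, 2}]
    for i, j, sep in junction_tree(cliques):
        print(cliques[i], "-", sorted(sep), "-", cliques[j])

Ties among the edge weights are exactly where the non-uniqueness mentioned above comes from: different tie-breaking yields different, equally valid junction trees.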
Decomposable graphs: A small decomposable graph
[Figure: a decomposable graph on vertices 1-7 with cliques {2,6,7}, {2,3,6}, {3,4,5,6} and {1,2}, shown with two different junction trees (separators such as {2,6}, {3,6} and {2}), illustrating the non-uniqueness of the junction tree.]
Decomposable graphs: Probabilistic significance of decomposability
If the distribution of a random vector X has a decomposable conditional independence graph, then it has a remarkable representation in terms of (often low-dimensional) marginals:

    p(X) = ∏_{i=1}^{c} p(X_{C_i}) / ∏_{i=2}^{c} p(X_{S_i})

This is the ultimate generalisation of the fact that for an ordinary Markov chain

    p(X) = p(X_0) ∏_{i=1}^{N} p(X_i | X_{i-1}) = ∏_{i=1}^{N} p(X_{{i-1,i}}) / ∏_{i=2}^{N} p(X_{i-1})

For a general decomposable graph, the same kind of factorisation follows the edges of the junction tree.
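A quick numerical check of this factorisation (a sketch with made-up numbers, not from the talk): for a three-variable Markov chain, the product of clique marginals over the product of separator marginals recovers the joint.

    import numpy as np

    # Chain X_0 - X_1 - X_2: cliques {0,1} and {1,2}, separator {1}.
    p0 = np.array([0.5, 0.3, 0.2])               # p(X_0), made up
    P = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.1, 0.3, 0.6]])              # p(X_i | X_{i-1})

    joint = p0[:, None, None] * P[:, :, None] * P[None, :, :]
    p01 = joint.sum(axis=2)                      # clique marginal p(X_0, X_1)
    p12 = joint.sum(axis=0)                      # clique marginal p(X_1, X_2)
    p1 = joint.sum(axis=(0, 2))                  # separator marginal p(X_1)

    # Product of clique marginals over product of separator marginals:
    reconstructed = p01[:, :, None] * p12[None, :, :] / p1[None, :, None]
    print(np.allclose(joint, reconstructed))     # True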
Decomposable graphs: Computational significance of decomposability
There are many consequences for computing with distributions on decomposable graphs, including junction tree algorithms (message passing / probability propagation) for Bayes nets (discrete graphical models).
Decomposable graphs: Message passing
[Figure: message passing for the chain A - B - C on the junction tree AB - [B] - BC, with binary variables. The tables shown are p(A | B), with rows (3/4, 1/4) and (2/3, 1/3), the joint p(B, C), with rows (.3, .1) and (.4, .2), and the separator table for B, which updates from all ones to p(B) = (.4, .6) as a message passes between the cliques; the AB table is updated to the joint p(A, B) accordingly.]
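A minimal sketch of one such message, using the tables that can be read off the figure; the variable names and the update rule (the standard Hugin-style scale-by-new-over-old-separator update) are our assumptions rather than quotes from the slide.

    import numpy as np

    # Junction tree AB - [B] - BC for the chain A - B - C.
    # Rows are indexed by B throughout.
    phi_ab = np.array([[3/4, 1/4],    # p(a | b): clique AB's initial table
                       [2/3, 1/3]])
    phi_bc = np.array([[0.3, 0.1],    # p(b, c): clique BC's table
                       [0.4, 0.2]])
    psi_b = np.ones(2)                # separator table, initially all ones

    # One message, BC -> AB, through the separator B:
    message = phi_bc.sum(axis=1)      # marginalise out C: p(b) = (.4, .6)
    phi_ab = phi_ab * (message / psi_b)[:, None]
    psi_b = message

    print(phi_ab)                     # AB now holds the joint p(b, a):
                                      # rows (.3, .1) and (.4, .2), up to rounding

After this message the AB clique holds the joint p(A, B); a message back in the other direction would leave every clique holding its own marginal.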
Decomposable graphs: Scheduling the messages
[Figure: scheduling the messages on a junction tree. Messages are first collected inwards towards a chosen root clique, then distributed back outwards from the root.]
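A sketch of such a schedule (our own illustration, with clique names from the earlier 7-vertex example and the tree given as an adjacency map): collect messages towards the root in reverse breadth-first order, then distribute in forward order.

    from collections import deque

    # Schedule the 2(c-1) messages on a junction tree: collect towards
    # a chosen root clique, then distribute back out from it.
    def schedule(tree_adj, root):
        parent, order = {root: None}, []
        queue = deque([root])
        while queue:                              # breadth-first from root
            u = queue.popleft()
            order.append(u)
            for v in tree_adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        collect = [(u, parent[u]) for u in reversed(order) if u != root]
        distribute = [(parent[u], u) for u in order if u != root]
        return collect + distribute

    adj = {"3456": ["236"], "236": ["3456", "267"],
           "267": ["236", "12"], "12": ["267"]}
    for src, dst in schedule(adj, "3456"):
        print(src, "->", dst)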
Decomposable graphs: Statistical significance of decomposability
Maximum likelihood estimates can be computed exactly for contingency tables and multivariate Gaussian distributions on decomposable graphs, and there are exact tests for conditional independence. Some of this theory extends to mixed data models based on CG distributions.
In Bayesian modelling, the ideas of hyper Markov modelling allow the construction of prior distributions respecting the graphical structure, which in turn supports the adoption of priors that are guaranteed to be consistent across models.
The clique-separator factorisation yields dramatic speed-ups in computing MCMC updates in structural learning, and in simulation and posterior analysis of fitted models.
Decomposable graphs: How restrictive is decomposability?
How many graphs are decomposable? There are 2^(v choose 2) graphs altogether on v vertices.
For v <= 3 vertices, all are decomposable; for 4 vertices, 61/64; for 6, about 80%; for 16, about 45%.
[Figure: the 3 non-decomposable graphs on 4 vertices, i.e. the chordless 4-cycles.]
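The 61/64 figure is easy to verify by brute force; here is a sketch of ours, using the fact that a graph is decomposable if and only if simplicial vertices can be eliminated one by one.

    from itertools import combinations, product

    # A graph is decomposable (chordal) iff we can repeatedly remove a
    # simplicial vertex: one whose neighbours form a complete subgraph.
    def is_decomposable(n, edges):
        vs, es = set(range(n)), set(edges)
        while vs:
            for v in list(vs):
                nbrs = {u for u in vs if frozenset((u, v)) in es}
                if all(frozenset((a, b)) in es
                       for a, b in combinations(nbrs, 2)):
                    vs.remove(v)
                    es = {e for e in es if v not in e}
                    break
            else:
                return False        # no simplicial vertex: a chordless cycle
        return True

    # Enumerate all 2^C(4,2) = 64 labelled graphs on 4 vertices.
    pairs = [frozenset(p) for p in combinations(range(4), 2)]
    count = sum(is_decomposable(4, {p for p, keep in zip(pairs, bits) if keep})
                for bits in product([0, 1], repeat=6))
    print(count, "/ 64")            # prints: 61 / 64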
Bayesian model determination: Bayesian graphical model determination
Given n i.i.d. samples X = (X_1, X_2, ..., X_n) from a multivariate distribution on R^v parameterised by the graph G and parameters ψ, a typical formulation takes the form

    p(G, ψ, X) = p(G) p(ψ | G) p(X | G, ψ)

and we perform joint structural/quantitative learning by computing the posterior p(G, ψ | X) ∝ p(G, ψ, X).
Decomposable G: see Giudici & G (1999) (Gaussian case) and Giudici, G & Tarantola (2000) (contingency table case). These follow the important work of Dawid & Lauritzen (1993) on hyper Markov laws, which encode parameter priors p(ψ | G) that are consistent across G.
Bayesian model determination: Bayesian graphical model determination (continued)
General G: earlier and later work, by Dellaportas & Forster and others, but these use non-hierarchical formulations whose priors are not necessarily consistent across models. See also Jones et al., Stat. Sci., 2005.
Bayesian model determination: Bayesian graphical model determination
The Giudici & G work on decomposable graphical Gaussian model determination considers the joint posterior p(G, ψ | X). In the Gaussian case X ~ N_v(µ, Σ), the graph G is encoded in the pattern of zeroes in the concentration (inverse variance) matrix:

    (Σ^{-1})_{ij} = 0  ⇔  X_i ⊥⊥ X_j | X_{V \ {i,j}}

The model places a hyper inverse Wishart prior on Σ, in various versions, and exploits ideas of covariance selection and positive definite matrix completion.
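A sketch of this zero-pattern correspondence with invented numbers (not the talk's example): for a Gaussian chain X_1 - X_2 - X_3 - X_4 the concentration matrix is tridiagonal, i.e. zero exactly off the edges of the path.

    import numpy as np

    # Covariance of a Gaussian chain X_1 - X_2 - X_3 - X_4 with made-up
    # lag-one correlation 0.6: Sigma_ij = 0.6^|i - j|.
    rho = 0.6
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
    K = np.linalg.inv(Sigma)          # the concentration matrix
    print(np.round(K, 6))
    # K is tridiagonal: entries such as K[0, 2] and K[1, 3] vanish (up to
    # rounding), matching the missing edges of the chain, e.g. X_1 and X_3
    # are conditionally independent given the rest.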