  1. An empirical Bayes procedure for the selection of Gaussian graphical models (Estimation bayésienne pour les modèles graphiques gaussiens décomposables). Jean-Michel Marin, I3M, Université Montpellier 2; joint work with Sophie Donnet, Université Paris Dauphine. Journée MSTGA, CIRAD Montpellier, 01/06/2011

  2. Introduction. The last decade has witnessed the emergence of applied problems typified by very high-dimensional variables, for instance in marketing databases or gene expression studies. Graphical modelling is a form of multivariate analysis that uses graphs to represent models. Graphical models enable concise representations of associational and causal relations between the variables under study.

  3. There are two main types of graphical models: • undirected graphical models; • directed acyclic graphical models. Lauritzen (1996). We shall concentrate on undirected graphs.

  4. Example of an undirected graph: 1841 employees of a car factory, 6 binary variables. S: smoking (yes or no); M: strenuous mental work (yes or no); P: strenuous physical work (yes or no); B: blood pressure (< 140 or ≥ 140); L: ratio of lipoproteins (< 3 or ≥ 3); F: family history of coronary heart disease (yes or no). Madigan and Raftery (1994)

  5. [Figure: undirected graph over the six variables of the car-factory example]

  6. If the graph is known, the parameters of the model are easily estimated. However, a quite challenging issue is the determination of the set of most appropriate graphs for a given dataset. We consider this problem in the case of decomposable Gaussian graphical models. Dawid and Lauritzen (1993)

  7. Plan: • Background on Bayesian model selection • Background on decomposable Gaussian graphical models • Bayesian tools for Gaussian graphical models • An empirical Bayes procedure via the SAEM-MCMC algorithm • A new Metropolis-Hastings sampler to explore the space of graphs • Numerical experiments

  8. Background on Bayesian model selection. Several models are available for the same observation: M_i : y ∼ f_i(y | θ_i), i ∈ I, where I can be finite or infinite.

  9. Probabilise the entire model/parameter space: • allocate probabilities p_i to all models M_i; • define priors π_i(θ_i) for each parameter space Θ_i; • compute P(M_i | y) = p_i ∫_{Θ_i} f_i(y | θ_i) π_i(θ_i) dθ_i / Σ_j p_j ∫_{Θ_j} f_j(y | θ_j) π_j(θ_j) dθ_j; • take the largest P(M_i | y) to determine the "best" model, or use the averaged predictive Σ_j P(M_j | y) ∫_{Θ_j} f_j(y′ | θ_j, y) π_j(θ_j | y) dθ_j.
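As a sketch of the computation on this slide, the posterior model probabilities can be evaluated from the log marginal likelihoods with a log-sum-exp normalisation for numerical stability. The three log marginal likelihood values below are hypothetical, chosen only to illustrate the mechanics:

```python
import math

def posterior_model_probs(log_marglik, prior):
    """Posterior P(M_i | y) ∝ p_i * ∫ f_i(y|θ_i) π_i(θ_i) dθ_i, computed
    from log marginal likelihoods via log-sum-exp for numerical stability."""
    logs = [lm + math.log(p) for lm, p in zip(log_marglik, prior)]
    m = max(logs)                       # shift by the max to avoid underflow
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [x / s for x in w]

# Hypothetical log marginal likelihoods for three competing models, flat prior
probs = posterior_model_probs([-104.2, -101.7, -103.0], [1/3, 1/3, 1/3])
best = max(range(3), key=lambda i: probs[i])
```

With a flat model prior the ranking reduces to comparing marginal likelihoods, so model 2 (index 1) is selected here.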

  10. Background on decomposable Gaussian graphical models. Let G = (V, E) be an undirected graph: • V = {1, …, p} is the vertex set; • E ⊆ {(i, j) : 1 ≤ i < j ≤ p} is the edge set: if (a, b) ∈ E then vertices a and b are adjacent in G. A graph or subgraph is complete if all its vertices are joined by an edge. A complete subgraph that is not contained within another complete subgraph is called a clique.

  11. Let C = {C_1, …, C_k} be the set of cliques of G. An ordering of the cliques (C_1, …, C_k) is said to be perfect if the vertices of each clique C_i also contained in any previous clique C_1, …, C_{i−1} are all members of one single previous clique; that is, for all i = 2, 3, …, k, S_i = C_i ∩ (C_1 ∪ … ∪ C_{i−1}) ⊆ C_h for some h = h(i) ∈ {1, 2, …, i − 1}. S = {S_2, …, S_k} is the set of separators associated with the perfect ordering (C_1, …, C_k). If an undirected graph admits a perfect ordering, it is said to be decomposable.

  12. The following graph (used as a benchmark in what follows) is decomposable: k = 5, C_1 = {1, 2, 3}, C_2 = {2, 3, 5, 6}, C_3 = {2, 4, 5}, C_4 = {5, 6, 7} and C_5 = {6, 7, 8, 9}; S_2 = {2, 3}, S_3 = {2, 5}, S_4 = {5, 6} and S_5 = {6, 7}.
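This clique ordering can be checked mechanically. A minimal sketch (pure Python, function name ours) that recomputes the separators S_2, …, S_5 from the ordered cliques and verifies the running-intersection condition of the previous slide:

```python
def separators(cliques):
    """Given an ordered list of cliques (as sets), return the separators
    S_i = C_i ∩ (C_1 ∪ … ∪ C_{i-1}) and check the running-intersection
    property: each S_i must lie inside a single earlier clique."""
    seps = []
    seen = set()            # union of all previous cliques
    for i, C in enumerate(cliques):
        if i > 0:
            S = C & seen
            if not any(S <= cliques[h] for h in range(i)):
                raise ValueError(f"ordering not perfect at clique {i + 1}")
            seps.append(S)
        seen |= C
    return seps

# Benchmark graph of this slide: five cliques in a perfect ordering
cliques = [{1, 2, 3}, {2, 3, 5, 6}, {2, 4, 5}, {5, 6, 7}, {6, 7, 8, 9}]
seps = separators(cliques)
# seps recovers S_2 = {2,3}, S_3 = {2,5}, S_4 = {5,6}, S_5 = {6,7}
```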

  13. If (2, 6) ∉ E and (3, 5) ∉ E, the graph is no longer decomposable: k = 5, C_1 = {1, 2, 3}, C_2 = {2, 4, 5}, C_3 = {3, 6}, C_4 = {5, 6, 7} and C_5 = {6, 7, 8, 9}.

  14. With p vertices, the number of possible edges is T = p(p − 1)/2 and the total number of graphs is 2^T. The total number of decomposable graphs with p vertices can be calculated for moderate values of p. For instance: if p = 6, there are 32,768 graphs and 18,154 are decomposable (around 55%); if p = 8, there are 268,435,456 graphs and 30,888,596 are decomposable (around 12%).
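These counts can be reproduced for very small p by brute force, using the characterisation of decomposable graphs as chordal graphs: a graph is chordal if and only if repeatedly deleting simplicial vertices (vertices whose remaining neighbours form a clique) empties it. A sketch, feasible only for small p:

```python
from itertools import combinations

def is_decomposable(vertices, edges):
    """Chordality test by simplicial-vertex elimination: the graph is
    chordal iff we can always find and remove a simplicial vertex."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(vertices)
    while remaining:
        simp = next(
            (v for v in remaining
             if all(u in adj[w]
                    for u, w in combinations(adj[v] & remaining, 2))),
            None)
        if simp is None:        # stuck: a chordless cycle remains
            return False
        remaining.discard(simp)
    return True

def count_decomposable(p):
    """Enumerate all 2^(p(p-1)/2) labelled graphs on p vertices and count
    the decomposable ones (exponential cost: small p only)."""
    verts = range(p)
    pairs = list(combinations(verts, 2))
    total = dec = 0
    for mask in range(2 ** len(pairs)):
        edges = [e for k, e in enumerate(pairs) if mask >> k & 1]
        total += 1
        dec += is_decomposable(verts, edges)
    return total, dec

# p = 4: of the 64 labelled graphs, only the three labelled 4-cycles
# (chordless cycles) fail to be decomposable.
```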

  15. A pair (A, B) of subsets of the vertex set V of an undirected graph G is said to form a decomposition of G if: • V = A ∪ B; • A ∩ B is complete; • A ∩ B separates A from B (any path from a vertex in A to a vertex in B goes through A ∩ B).

  16. To each vertex v ∈ V, we associate a random variable y_v. For A ⊆ V, y_A = (y_v)_{v ∈ A} denotes the collection of random variables {y_v : v ∈ A}. To ease the notation, let y = y_V. The probability distribution of y is said to be Markov with respect to G if, for any decomposition (A, B) of G, y_A is independent of y_B given y_{A ∩ B} (global Markov property). A graphical model is a family of distributions on y which are Markov with respect to a graph.

  17. A Gaussian graphical model is such that y | G, Σ_G ∼ N_p(0_p, Σ_G), (1) where Σ_G is a positive definite matrix which ensures that the distribution of y is Markov with respect to G. Σ_G ensures that the distribution of y is Markov if and only if (i, j) ∉ E ⟺ (Σ_G^{−1})_{ij} = 0. Dempster (1972) (covariance selection models)
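The equivalence between zeros of Σ_G^{−1} and missing edges can be illustrated on the chain graph 1 − 2 − 3. The precision matrix K below is a made-up example: its zero in position (1, 3) encodes y_1 ⊥ y_3 | y_2. Exact arithmetic with fractions shows the conditional covariance vanishes even though the marginal covariance does not:

```python
from fractions import Fraction

def inv3(K):
    """Exact inverse of a 3x3 integer matrix via the adjugate formula
    (a small-scale demo, not a general linear-algebra routine)."""
    a, b, c = K[0]
    d, e, f = K[1]
    g, h, i = K[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[Fraction(x, det) for x in row] for row in adj]

# Precision matrix of the chain 1 - 2 - 3: entry (1, 3) is zero, no edge (1, 3)
K = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]
Sigma = inv3(K)

# Marginally y1 and y3 are correlated (Sigma[0][2] != 0), but their
# covariance conditionally on y2 is exactly zero:
cond_cov = Sigma[0][2] - Sigma[0][1] * Sigma[1][2] / Sigma[1][1]
```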

  18. In a Gaussian graphical model, the global, local and pairwise Markov properties are equivalent. Local Markov property: every variable is conditionally independent of the remaining variables, given its neighbours. Pairwise Markov property: any non-adjacent pair of random variables are conditionally independent given the remaining variables.

  19. The mean parameter is typically set to zero: the data we analyze are expressed as deviations from the sample mean. We observe a sample y_1, …, y_n from (1) (the data are centered). We would like to identify the set of most relevant graphs. For the multivariate random phenomenon under consideration, we are interested in the set of most relevant conditional independence structures ⟹ explore a huge graph space.

  20. Bayesian tools for Gaussian graphical models. We adopt the Bayesian paradigm. Conditionally on G, we use a Hyper-Inverse Wishart (HIW) distribution associated with the graph G as the prior distribution on Σ_G: Σ_G | G, δ_G, Φ_G ∼ HIW_G(δ_G, Φ_G), where δ_G > 0 and Φ_G is a p × p symmetric positive definite matrix. Dawid and Lauritzen (1993), Giudici and Green (1999), Armstrong et al. (2006)

  21. Conditionally on G, the HIW distribution is conjugate: Σ_G | y_1, …, y_n, G, δ_G, Φ_G ∼ HIW_G(δ_G + n, Φ_G + Σ_{i=1}^n y_i y_i^T). (2) Moreover, for such a prior, f(y_1, …, y_n | G, δ_G, Φ_G) = (2π)^{−np/2} h_G(δ_G, Φ_G) / h_G(δ_G + n, Φ_G + Σ_{i=1}^n y_i y_i^T), where h_G is the normalizing constant of the HIW distribution associated with the graph G. Roverato (2002) extends the Hyper-Inverse Wishart distribution to the non-decomposable case.
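The conjugate update in (2) only requires accumulating the scatter matrix Σ_i y_i y_i^T. A minimal sketch (function name and plain-list matrix representation are ours):

```python
def hiw_posterior_params(Y, delta, Phi):
    """Conjugate HIW update: observing n centred vectors y_1..y_n (the rows
    of Y) turns HIW_G(delta, Phi) into HIW_G(delta + n, Phi + S_Y)."""
    n, p = len(Y), len(Y[0])
    # Scatter matrix S_Y = sum_i y_i y_i^T
    S = [[sum(y[a] * y[b] for y in Y) for b in range(p)] for a in range(p)]
    Phi_post = [[Phi[a][b] + S[a][b] for b in range(p)] for a in range(p)]
    return delta + n, Phi_post

# Toy data: two centred observations in dimension p = 2
delta_post, Phi_post = hiw_posterior_params(
    [[1.0, 2.0], [3.0, 4.0]], 3, [[1.0, 0.0], [0.0, 1.0]])
```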

  22. Let Y = (y_1, …, y_n) and S_Y = Σ_{i=1}^n y_i y_i^T. If we assume a uniform prior distribution on the space of graphs, π(G) ∝ 1, then π(G | Y, δ_G, Φ_G) ∝ f(Y | G, δ_G, Φ_G). A uniform distribution on the space of graphs is typically not satisfactory: with p vertices, the number of possible edges is equal to p(p − 1)/2 and, for a uniform prior over all graphs, the prior number of edges has its mode around p(p − 1)/4. Wong, Carter and Kohn (2003), Jones et al. (2005), Armstrong et al. (2009), Carvalho and Scott (2009)

  23. An alternative to the naive uniform prior is to place a Bernoulli distribution with parameter r on the inclusion of each edge: π(G | r) ∝ r^{k_G} (1 − r)^{p(p−1)/2 − k_G}, where k_G is the number of edges of G. The parameter r has to be calibrated. If r = 1/2, this prior reduces to the uniform one. We easily deduce that π(G | Y, δ_G, Φ_G, r) ∝ [h_G(δ_G, Φ_G) / h_G(δ_G + n, Φ_G + S_Y)] π(G | r). (3)
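Putting (3) into practice amounts to adding a log Bernoulli prior term to the log ratio of HIW normalising constants. In the sketch below, the ratio log h_G(δ_G, Φ_G) − log h_G(δ_G + n, Φ_G + S_Y) is assumed to be supplied by some external routine, and the numeric values are hypothetical; only the prior bookkeeping is shown:

```python
import math

def log_posterior_score(log_h_ratio, k_edges, p, r):
    """Unnormalised log posterior of a graph G with k_edges edges on p
    vertices, following (3): marginal-likelihood ratio plus the log of the
    Bernoulli edge-inclusion prior with parameter r."""
    T = p * (p - 1) // 2          # number of possible edges
    return (log_h_ratio
            + k_edges * math.log(r)
            + (T - k_edges) * math.log(1 - r))

# With r = 1/2 the prior term is identical for every graph, so the ranking
# depends on the marginal-likelihood ratios alone; a small r penalises
# dense graphs, as the comparison below illustrates.
score_sparse = log_posterior_score(-12.0, 3, 5, 0.1)
score_dense = log_posterior_score(-10.0, 9, 5, 0.1)
```

Here the sparser graph wins despite its lower marginal-likelihood ratio, because with r = 0.1 each extra edge costs log(0.1/0.9) in log posterior.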
