  1. A start of Variational Methods for ERGM
     Ranran Wang, UW MURI-UCI
     April 24, 2009

  2. Outline
     • Introduction to ERGM
     • Current methods of parameter estimation:
       – MCMC-MLE: Markov chain Monte Carlo maximum likelihood estimation
       – MPLE: maximum pseudo-likelihood estimation
     • Variational methods:
       – Exponential families and variational inference
       – Approximation of intractable families
       – Application to ERGM
       – Simulation study

  3. Introduction to ERGM: network notation
     • $m$ actors; $n = m(m-1)/2$ dyads
     • Sociomatrix (adjacency matrix): $Y = \{y_{i,j}\}_{i,j=1,\dots,m}$
     • Edge set: $\{(i,j) : y_{i,j} = 1\}$
     • Undirected network: $y_{i,j} = y_{j,i}$
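To make the notation concrete, here is a minimal Python sketch (my illustration, not part of the slides) building the sociomatrix of a hypothetical 4-actor undirected network:

```python
import numpy as np

# Hypothetical 4-actor undirected network with edges 1-2, 1-3, 2-3, 3-4
# (actors are 0-indexed here).
m = 4
Y = np.zeros((m, m), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    Y[i, j] = Y[j, i] = 1           # sociomatrix is symmetric

n_dyads = m * (m - 1) // 2           # n = m(m-1)/2 = 6 dyads
edge_set = [(i, j) for i in range(m) for j in range(i + 1, m) if Y[i, j]]
print(n_dyads, edge_set)             # 6 [(0, 1), (0, 2), (1, 2), (2, 3)]
```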

  4. ERGM
     Exponential-family random graph model (Frank and Strauss, 1986; Wasserman and Pattison, 1996; Handcock, Hunter, Butts, Goodreau and Morris, 2008):
     $$\log[P(Y = y; \eta)] = \eta^T \phi(y) - \kappa(\eta, \mathcal{Y}), \quad y \in \mathcal{Y},$$
     where
     • $Y$ is the random adjacency matrix and $\mathcal{Y}$ is the set of possible networks
     • $\eta \in \Omega \subset \mathbb{R}^q$ is the vector of model parameters
     • $\phi(y)$ is a $q$-vector of network statistics
     • $\kappa(\eta, \mathcal{Y}) = \log \sum_{z \in \mathcal{Y}} \exp\{\eta^T \phi(z)\}$ is the normalizing factor, which is difficult to calculate because the sum runs over all possible networks
     • R package: statnet
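To make the intractability of $\kappa$ concrete, here is a hedged sketch (my illustration, not from the slides) that computes $\kappa(\eta)$ by brute-force enumeration for a tiny network, assuming the single statistic $\phi(y)$ = edge count; the $2^{n}$ terms in the sum are exactly what becomes infeasible for realistic $m$:

```python
import numpy as np
from itertools import product, combinations

def log_partition(eta, m):
    """Brute-force kappa(eta) = log sum_z exp{eta * phi(z)} over all
    2^(m(m-1)/2) undirected graphs, with phi(z) = number of edges."""
    dyads = list(combinations(range(m), 2))
    vals = np.array([eta * sum(z)                 # edge-count statistic
                     for z in product([0, 1], repeat=len(dyads))])
    mx = vals.max()
    return mx + np.log(np.exp(vals - mx).sum())   # stable log-sum-exp

# For the edge-count ERGM, kappa has the closed form n * log(1 + e^eta),
# so the brute-force value should agree with it:
m, eta = 4, -0.5
n = m * (m - 1) // 2
print(log_partition(eta, m), n * np.log1p(np.exp(eta)))
```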

  5. Current estimation approaches for ERGM
     MCMC-MLE (Geyer and Thompson, 1992; Snijders, 2002; Hunter, Handcock, Butts, Goodreau and Morris, 2008):
     1. Set an initial value $\eta_0$ for the parameter $\eta$.
     2. Generate an MCMC sample $Y_1, \dots, Y_m$ from $P_{\eta_0}$ by the Metropolis algorithm.
     3. Iterate to obtain a maximizer $\tilde{\eta}$ of the approximate log-likelihood ratio
        $$(\eta - \eta_0)^T \phi(y_{\mathrm{obs}}) - \log\Big[\frac{1}{m} \sum_{i=1}^{m} \exp\{(\eta - \eta_0)^T \phi(Y_i)\}\Big].$$
     4. If the estimated variance of the approximate log-likelihood ratio is too large in comparison to the estimated log-likelihood at $\tilde{\eta}$, return to step 2 with $\eta_0 = \tilde{\eta}$.
     5. Return $\tilde{\eta}$ as the MCMC-MLE.
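A minimal sketch of step 3 (my illustration, not from the slides), assuming the simulated statistics $\phi(Y_i)$ from $P_{\eta_0}$ are already available as a NumPy array; the simulation step itself is omitted and the function name is hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def mcmcmle_step(phi_obs, phi_samples, eta0):
    """One Geyer-Thompson update: maximize the approximate log-likelihood
    ratio (eta - eta0)'phi_obs - log mean_i exp{(eta - eta0)'phi(Y_i)},
    given phi_samples (m x q) simulated from P_{eta0}."""
    def neg_llr(eta):
        d = eta - eta0
        w = phi_samples @ d                       # (eta - eta0)'phi(Y_i)
        mx = w.max()
        lse = mx + np.log(np.mean(np.exp(w - mx)))  # stable log-mean-exp
        return -(phi_obs @ d - lse)
    return minimize(neg_llr, x0=eta0, method="BFGS").x

# Toy usage with fake simulated statistics (illustration only):
rng = np.random.default_rng(0)
phi_samples = rng.poisson(5.0, size=(1000, 1)).astype(float)
eta_tilde = mcmcmle_step(np.array([4.0]), phi_samples, eta0=np.zeros(1))
```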

  6. MPLE (Besag, 1975; Strauss and Ikeda, 1990)
     Conditional formulation:
     $$\mathrm{logit}[P(Y_{ij} = 1 \mid Y^c_{ij} = y^c_{ij})] = \eta^T \delta(y^c_{ij}),$$
     where $\delta(y^c_{ij}) = \phi(y^+_{ij}) - \phi(y^-_{ij})$ is the change in $\phi(y)$ when $y_{ij}$ changes from 0 to 1 while the rest of the network remains fixed at $y^c_{ij}$.
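A hedged sketch of MPLE (my illustration, not from the slides) under an edge + 2-star specification chosen for illustration: compute the change statistic $\delta(y^c_{ij})$ for every dyad, then maximize the logistic pseudolikelihood directly:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def mple(Y):
    """MPLE for an edge + 2-star ERGM on a symmetric 0/1 matrix Y:
    build change statistics delta(y^c_ij) per dyad, then maximize
    the logistic pseudolikelihood."""
    m = Y.shape[0]
    X, y = [], []
    for i, j in combinations(range(m), 2):
        # Toggling y_ij from 0 to 1 adds one edge and deg(i) + deg(j)
        # 2-stars, with degrees computed holding y_ij at 0.
        deg_i = Y[i].sum() - Y[i, j]
        deg_j = Y[j].sum() - Y[i, j]
        X.append([1.0, deg_i + deg_j])
        y.append(Y[i, j])
    X, y = np.array(X), np.array(y, dtype=float)

    def neg_pll(eta):
        u = X @ eta
        return -np.sum(y * u - np.log1p(np.exp(u)))
    return minimize(neg_pll, x0=np.zeros(2), method="BFGS").x
```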

  7. Comparison
     Simulation study: van Duijn, Gile and Handcock (2008).

     MCMC-MLE:
     • Slow mixing
     • Highly dependent on initial values
     • Able to model various network characteristics together

     MPLE:
     • Deterministic; computation is fast
     • Unstable
     • Dyadic-independent model; cannot capture higher-order network characteristics

  8. Variational methods
     Exponential families and variational representations. Basics of an exponential family:
     $$\log[p(x; \theta)] = \langle \theta, \phi(x) \rangle - \kappa(\theta).$$
     • Sufficient statistics: $\phi(x)$
     • Log-partition function: $\kappa(\theta) = \log \sum_{x \in \mathcal{X}} \exp\langle \theta, \phi(x) \rangle$
     • Mean value parametrization: $\mu := E[\phi(x)] \in \mathbb{R}^q$
     • Mean value space (a convex hull):
       $$\mathcal{M} = \Big\{ \mu \in \mathbb{R}^q \;\Big|\; \exists\, p(\cdot) \text{ s.t. } \sum_{\mathcal{X}} \phi(x)\, p(x) = \mu \Big\}$$
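A tiny concrete instance of these definitions (my illustration, not from the slides): a single Bernoulli variable with $\phi(x) = x$, for which $\kappa(\theta) = \log(1 + e^\theta)$, $\mu(\theta) = e^\theta/(1+e^\theta)$, and $\mathcal{M} = [0,1]$:

```python
import numpy as np

# Bernoulli with phi(x) = x as a one-parameter exponential family:
# kappa(theta) = log(1 + e^theta), mu(theta) = sigmoid(theta), and the
# mean value space M is conv{phi(0), phi(1)} = [0, 1].
theta = 0.7
kappa = np.log1p(np.exp(theta))
mu = np.exp(theta) / (1.0 + np.exp(theta))
print(kappa, mu)   # 1.1032..., 0.6682...
```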

  9. The log-partition function is smooth and convex in $\theta$. Writing $\theta = (\theta_\alpha, \theta_\beta, \dots)$ and $\phi(x) = (\phi_\alpha(x), \phi_\beta(x), \dots)$:
     $$\frac{\partial \kappa(\theta)}{\partial \theta_\alpha} = E[\phi_\alpha(x)] := \sum_{x \in \mathcal{X}} \phi_\alpha(x)\, p(x; \theta), \tag{1}$$
     $$\frac{\partial^2 \kappa(\theta)}{\partial \theta_\alpha\, \partial \theta_\beta} = E[\phi_\alpha(x)\, \phi_\beta(x)] - E[\phi_\alpha(x)]\, E[\phi_\beta(x)]. \tag{2}$$
     So $\mu(\theta)$ can be re-expressed as $\mu(\theta) = \frac{\partial \kappa}{\partial \theta}(\theta)$, and it has gradient $\frac{\partial^2 \kappa}{\partial \theta^T \partial \theta}(\theta)$, the covariance matrix of $\phi(x)$.
     (Barndorff-Nielsen, 1978; Handcock, 2003; Wainwright and Jordan, 2003)
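Identity (1) can be checked numerically (my illustration, not from the slides) on a two-node binary model with $\phi(x) = (x_1, x_2, x_1 x_2)$, comparing a finite-difference gradient of $\kappa$ against $E[\phi(x)]$:

```python
import numpy as np
from itertools import product

def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]], dtype=float)

def kappa(theta):
    return np.log(sum(np.exp(theta @ phi(x))
                      for x in product([0, 1], repeat=2)))

theta = np.array([0.3, -0.2, 0.8])
probs = {x: np.exp(theta @ phi(x) - kappa(theta))
         for x in product([0, 1], repeat=2)}
mean_phi = sum(p * phi(x) for x, p in probs.items())

# Central-difference gradient of kappa should match E[phi(x)].
eps, grad = 1e-6, np.zeros(3)
for a in range(3):
    e = np.eye(3)[a] * eps
    grad[a] = (kappa(theta + e) - kappa(theta - e)) / (2 * eps)
print(np.allclose(grad, mean_phi, atol=1e-6))   # True
```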

  10. Example: Ising model on a graph $G(V, E)$
      $$\log p(x; \theta) = \sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st}\, x_s x_t - \kappa(\theta), \tag{3}$$
      where:
      • $x_s$, associated with $s \in V$, is a Bernoulli random variable;
      • components $x_s$ and $x_t$ are allowed to interact directly only if $s$ and $t$ are joined by an edge in the graph.
      The relevant mean parameters in this representation are
      $$\mu_s = E_\theta[x_s] = p(x_s = 1; \theta), \qquad \mu_{st} = E_\theta[x_s x_t] = p(x_s = 1, x_t = 1; \theta).$$
      For each edge $(s, t)$, the triplet $\{\mu_s, \mu_t, \mu_{st}\}$ uniquely determines a joint marginal $p(x_s, x_t; \mu)$ as follows:
      $$p(x_s, x_t; \mu) = \begin{bmatrix} 1 + \mu_{st} - \mu_s - \mu_t & \mu_t - \mu_{st} \\ \mu_s - \mu_{st} & \mu_{st} \end{bmatrix}.$$
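A short sketch of the matrix above (my illustration, not from the slides), showing that the entries sum to one and the row/column sums recover $\mu_s$ and $\mu_t$:

```python
import numpy as np

def pairwise_marginal(mu_s, mu_t, mu_st):
    """2x2 joint marginal of (x_s, x_t) determined by the mean
    parameters, indexed as p[x_s, x_t]."""
    return np.array([[1 + mu_st - mu_s - mu_t, mu_t - mu_st],
                     [mu_s - mu_st,            mu_st]])

P = pairwise_marginal(0.6, 0.5, 0.4)
print(P, P.sum())        # entries sum to 1
print(P[1].sum(), P[:, 1].sum())   # row/column sums give mu_s, mu_t
```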

  11. To ensure that $p(x_s, x_t; \mu)$ is a valid joint marginal, we impose non-negativity constraints on all four entries:
      $$1 + \mu_{st} - \mu_s - \mu_t \ge 0, \qquad \mu_{st} \ge 0, \qquad \mu_s - \mu_{st} \ge 0, \qquad \mu_t - \mu_{st} \ge 0.$$
      These inequalities define $\mathcal{M}$.

  12. Variational inference and mean value estimation
      For any $\mu \in \mathrm{ri}\,\mathcal{M}$ (ri: relative interior), we have the following representation:
      $$\kappa(\theta) = \sup_{\mu \in \mathcal{M}} \{\langle \theta, \mu \rangle - \kappa^*(\mu)\}. \tag{4}$$
      To see that the right-hand side is a lower bound:
      $$\begin{aligned}
      \kappa(\theta) &= \log \sum_{x \in \mathcal{X}} p(x; \theta)\, \frac{\exp\{\langle \theta, \phi(x) \rangle\}}{p(x; \theta)} \\
      &\ge \sum_{x \in \mathcal{X}} p(x; \theta)\, \log\Big( \frac{\exp\{\langle \theta, \phi(x) \rangle\}}{p(x; \theta)} \Big) \\
      &= \sum_{x \in \mathcal{X}} \langle \theta, \phi(x) \rangle\, p(x; \theta) - \sum_{x \in \mathcal{X}} \log(p(x; \theta))\, p(x; \theta) \\
      &= E\langle \theta, \phi(x) \rangle - E[\log p(x; \theta)] \\
      &= \langle \theta, \mu \rangle - \kappa^*(\mu).
      \end{aligned}$$
      The inequality follows from Jensen's inequality, and the last equality follows from $E[\phi(x)] = \mu$ and $\kappa^*(\mu) = E[\log p(x; \theta(\mu))]$, the negative entropy of the distribution $p(x; \theta)$.
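A numerical check of (4) (my illustration, not from the slides) on the same two-node binary model used earlier: the bound $\langle \theta, \mu \rangle - \kappa^*(\mu) \le \kappa(\theta)$ holds with equality at $\mu = \mu(\theta)$:

```python
import numpy as np
from itertools import product

def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]], dtype=float)

states = list(product([0, 1], repeat=2))
theta = np.array([0.3, -0.2, 0.8])
kap = np.log(sum(np.exp(theta @ phi(x)) for x in states))
p = np.array([np.exp(theta @ phi(x) - kap) for x in states])
mu = sum(pi * phi(x) for pi, x in zip(p, states))

# kappa*(mu) = E[log p(x; theta(mu))], the negative entropy.
kappa_star = np.sum(p * np.log(p))
print(theta @ mu - kappa_star, kap)   # equal at mu = mu(theta)
```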

  13. Why the variational method?
      • The variational representation turns the problem of computing an intractable sum or integral into an optimization problem (maximizing a lower bound of $\kappa$ over $\mathcal{M}$).
      • The problem of computing the mean parameters is solved simultaneously.
      Two main difficulties:
      • The constraint set $\mathcal{M}$ of realizable mean parameters is difficult to characterize explicitly.
      • $\kappa^*(\mu)$ lacks an explicit form and needs a proper approximation.

  14. Mean value estimation
      • $\mu$ is obtained by solving the optimization problem in (4).
      • However, the dual function $\kappa^*$ lacks an explicit form in many cases.
      • We therefore restrict the choice of $\mu$ to a tractable subset $\mathcal{M}_t(H)$ of $\mathcal{M}(G)$, where $H$ is a tractable subgraph of $G$. The lower bound in (4) then becomes computable.
      • The solution of the optimization problem
        $$\sup_{\mu \in \mathcal{M}_t(H)} \{\langle \mu, \theta \rangle - \kappa^*_H(\mu)\}$$
        specifies the optimal approximation $\tilde{\mu}_t$ of $\mu$.
      • The optimal $\tilde{\mu}_t$ in fact minimizes the Kullback-Leibler divergence between the tractable approximating distribution and the target distribution.

  15. Ising model on a graph: approximation of $\kappa^*$
      Assume the tractable graph $H_0$ is fully disconnected. Then the mean value parameter set is
      $$\mathcal{M}_0(H_0) = \{(\mu_s, \mu_{st}) \mid 0 \le \mu_s \le 1,\ \mu_{st} = \mu_s \mu_t\}.$$
      Here $\mu_s = p(x_s = 1)$ and $\mu_{st} = p(x_s = 1, x_t = 1) = \mu_s \mu_t$, so the distribution on $H_0$ is fully factorizable. Deriving from the Bernoulli distribution,
      $$\kappa^*_{H_0}(\mu) = \sum_{s \in V} [\mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s)].$$
      By (4), restricting to $\mathcal{M}_0(H_0)$ gives the mean-field lower bound
      $$\kappa(\theta) \ge \max_{\{\mu_s\} \in [0,1]^n} \Big\{ \sum_{s \in V} \theta_s \mu_s + \sum_{(s,t) \in E} \theta_{st}\, \mu_s \mu_t - \sum_{s \in V} [\mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s)] \Big\}. \tag{5}$$

  16. Taking the gradient of (5) and setting it to zero yields the following update for $\mu$:
      $$\mathrm{logit}(\mu_s) \leftarrow \theta_s + \sum_{t \in \mathcal{N}(s)} \theta_{st}\, \mu_t. \tag{6}$$
      Apply (6) iteratively (coordinate ascent) to each node until convergence is reached, as in the sketch below.
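A minimal sketch of the coordinate-ascent updates (6) (my illustration, not from the slides), assuming the edge parameters are stored as a symmetric matrix with zero diagonal:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def naive_mean_field(theta_s, theta_st, n_iter=200, tol=1e-8):
    """Coordinate-ascent updates (6) for the fully factorized (naive
    mean-field) approximation to an Ising model.  theta_s: length-n
    node parameters; theta_st: symmetric n x n edge-parameter matrix
    with zero diagonal.  A sketch, not production code."""
    n = len(theta_s)
    mu = np.full(n, 0.5)
    for _ in range(n_iter):
        mu_old = mu.copy()
        for s in range(n):
            # logit(mu_s) <- theta_s + sum_{t in N(s)} theta_st * mu_t
            mu[s] = sigmoid(theta_s[s] + theta_st[s] @ mu)
        if np.max(np.abs(mu - mu_old)) < tol:
            break
    return mu

# Toy usage: 3 nodes on a triangle with uniform parameters.
theta_st = 0.5 * (np.ones((3, 3)) - np.eye(3))
print(naive_mean_field(np.full(3, -0.2), theta_st))
```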

  17. Applications to ERGM: the dependence graph
      • $G_Y$ is a graph with $m$ actors and $n = m(m-1)/2$ dyads.
      • Construct a dependence graph $D_Y = G(V(D), E(D))$ to describe the dependence structure of $G_Y$:
        – Each dyad $(i, j)$, $i < j$, of $G_Y$ is a node of $D_Y$.
        – Each node $(ij) \in V(D)$ carries the binary variable $y_{ij}$.
        – An edge joins $(ij)$ and $(kl)$ in $D_Y$ if, as dyads of $G_Y$, they share an actor.
      • Frank and Strauss, 1986.
      Figure 1: a 4-actor graph $G$ (actors 1-4) and its dependence graph $D$ on the six dyads 12, 13, 14, 23, 24, 34.

  18. Example: Erdős-Rényi model
      For an undirected random graph $Y = \{Y_{ij}\}$ all dyads are mutually independent, so the dependence graph $D$ is fully disconnected, and each $y_{ij}$, $(ij) \in V(D)$, is a Bernoulli random variable. The model can be written as
      $$\log[P_\theta(Y = y)] = \sum_{i<j} \theta_{ij}\, y_{ij} - \kappa(\theta, \mathcal{Y}), \quad y \in \mathcal{Y}.$$
      Computing the entropy of the Bernoulli distribution, we have
      $$\kappa^*(\mu) = \sum_{i<j} [\mu_{ij} \log \mu_{ij} + (1 - \mu_{ij}) \log(1 - \mu_{ij})], \tag{7}$$
      where $\mu_{ij} = P(Y_{ij} = 1)$. Then
      $$\kappa(\theta) = \sup_{\mu \in \mathcal{M}} \{\langle \theta, \mu \rangle - \kappa^*(\mu)\} = \sum_{i<j} \log(1 + \exp(\theta_{ij})),$$
      with the supremum attained when $\theta_{ij} = \log\big(\frac{\mu_{ij}}{1 - \mu_{ij}}\big)$.
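The closed form can be verified by brute-force enumeration on a tiny graph (my illustration, not from the slides):

```python
import numpy as np
from itertools import product, combinations

# Check kappa(theta) = sum_{i<j} log(1 + exp(theta_ij)) for a
# dyad-independent model by enumerating all 2^n graphs on m = 4 actors.
m = 4
dyads = list(combinations(range(m), 2))
rng = np.random.default_rng(1)
theta = rng.normal(size=len(dyads))             # one theta_ij per dyad

vals = [theta @ np.array(z, dtype=float)
        for z in product([0, 1], repeat=len(dyads))]
brute = np.log(np.sum(np.exp(vals)))
closed = np.sum(np.log1p(np.exp(theta)))
print(np.allclose(brute, closed))               # True
```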

  19. The 2-star ERGM
      Analogous to the Ising model, on the dependence graph $D = G(V(D), E(D))$,
      $$\log P(Y; \theta) = \sum_{s \in V(D)} \theta_s y_s + \sum_{(s,t) \in E(D)} \theta_{st}\, y_s y_t - \kappa(\theta), \quad s = (ij) \in V(D).$$
      If $\theta_s = \eta_1$ for all $s \in V$ and $\theta_{st} = \eta_2$ for all $(s,t) \in E$, then
      $$\log P(Y; \eta) = \eta_1 \sum_{i<j} y_{ij} + \eta_2 \sum_i \sum_{j<k} y_{ij}\, y_{ik} - \kappa(\eta),$$
      which corresponds to the canonical 2-star model: the second sum counts the 2-stars centered at each actor $i$. A sketch combining this mapping with the mean-field updates of slide 16 follows.
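A hedged sketch (my illustration, not from the slides) that maps the 2-star ERGM onto an Ising model on the dependence graph $D$ and reuses the `naive_mean_field` function sketched after slide 16 to approximate the dyad probabilities:

```python
import numpy as np
from itertools import combinations

def two_star_mean_field(m, eta1, eta2, naive_mean_field):
    """Build the Ising parameters of the 2-star model on the dependence
    graph D of an m-actor network, then run naive mean field (passed in
    from the slide-16 sketch).  Illustrative only."""
    dyads = list(combinations(range(m), 2))
    n = len(dyads)
    theta_s = np.full(n, eta1)
    theta_st = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            # dyads interact in D iff they share an actor in G
            if set(dyads[a]) & set(dyads[b]):
                theta_st[a, b] = theta_st[b, a] = eta2
    return dyads, naive_mean_field(theta_s, theta_st)

# e.g. dyads, mu = two_star_mean_field(5, -1.0, 0.1, naive_mean_field)
# mu[k] then approximates P(Y_dyads[k] = 1) under the 2-star model.
```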
