Variational methods for overlapping and non-overlapping stochastic block models
Pierre Latouche
Université Paris 1 Panthéon-Sorbonne, Laboratoire SAMM
MSTGA 2012
Contents
Introduction
Real networks
Graph clustering
Stochastic block models
Model selection
The overlapping stochastic block model
Model selection
Bayesian framework
Inference
The regulation term β
Model selection
Experiments
Simulated data
The French blogosphere network
Real networks
◮ Many scientific fields:
  ◮ World Wide Web
  ◮ Biology, sociology, physics
◮ Nature of the data under study:
  ◮ Interactions between $N$ objects
  ◮ $O(N^2)$ possible interactions
◮ Network topology:
  ◮ Describes the way nodes interact (structure/function)
Figure: sample of 250 blogs (nodes) and the links between them (edges) from the French political blogosphere.
In biology
Figure: the metabolic network of the bacterium Escherichia coli (Lacroix et al., 2006).
In biology
Figure: subset of the yeast transcriptional regulatory network (Milo et al., 2002).
Real networks
◮ Properties:
  ◮ Sparsity: $m = O(N)$
  ◮ Existence of a giant component
  ◮ Heterogeneity
  ◮ Preferential attachment
  ◮ Small world
→ Topological structure (groups of vertices)
Graph clustering
◮ Existing methods look for:
  ◮ Community structure
  ◮ Disassortative mixing
  ◮ Heterogeneous structure
Stochastic Block Model (SBM)
◮ Nowicki and Snijders (2001)
◮ Earlier work: Govaert et al. (1977)
◮ $Z_i$ independent hidden variables:
  ◮ $Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$
  ◮ $Z_{ik} = 1$: vertex $i$ belongs to class $k$
◮ $X \mid Z$ edges drawn independently:
$$X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{B}(\pi_{kl})$$
◮ A mixture model for graphs:
$$X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$$
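The generative process above can be sketched in a few lines. This is only an illustration: it assumes an undirected graph without self-loops, and the function name `sample_sbm` and the parameter values are hypothetical, not taken from the talk.

```python
import random

def sample_sbm(n, alpha, pi, seed=0):
    """Draw class labels and a symmetric adjacency matrix from an SBM.

    Assumption: undirected graph, no self-loops.
    """
    rng = random.Random(seed)
    k = len(alpha)
    # Z_i ~ M(1, alpha): exactly one class label per vertex
    z = rng.choices(range(k), weights=alpha, k=n)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # X_ij | {Z_ik Z_jl = 1} ~ B(pi_kl)
            edge = 1 if rng.random() < pi[z[i]][z[j]] else 0
            x[i][j] = x[j][i] = edge
    return z, x

# Hypothetical parameters: three classes with dense within-class connections
alpha = [0.5, 0.3, 0.2]
pi = [[0.80, 0.05, 0.05],
      [0.05, 0.70, 0.05],
      [0.05, 0.05, 0.90]]
z, x = sample_sbm(30, alpha, pi)
```

Off-diagonal values of `pi` larger than the diagonal ones would instead produce disassortative mixing.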
Figure: example network with 10 vertices; edges are annotated with connection probabilities $\pi_{\bullet\bullet}$.
Maximum likelihood estimation
◮ Log-likelihoods of the model:
  ◮ Observed-data: $\log p(X \mid \alpha, \Pi) = \log \big\{ \sum_{Z} p(X, Z \mid \alpha, \Pi) \big\}$ → $K^N$ terms
◮ The Expectation Maximization (EM) algorithm requires the knowledge of $p(Z \mid X, \alpha, \Pi)$
Problem: $p(Z \mid X, \alpha, \Pi)$ is not tractable (no conditional independence)
Variational EM: Daudin et al. (2008)
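To make the $K^N$ count concrete, the observed-data log-likelihood can be evaluated by brute force on a toy graph. The sketch below assumes an undirected graph without self-loops; the function name and the example values are hypothetical, and the approach is only usable for a handful of vertices.

```python
import itertools
import math

def log_likelihood_bruteforce(x, alpha, pi):
    """Sum p(X, Z | alpha, Pi) over all K^N label assignments Z."""
    n, k = len(x), len(alpha)
    total = 0.0
    n_terms = 0
    for z in itertools.product(range(k), repeat=n):
        p = 1.0
        for i in range(n):
            p *= alpha[z[i]]                      # p(Z | alpha)
        for i in range(n):
            for j in range(i + 1, n):             # each vertex pair once
                p_edge = pi[z[i]][z[j]]
                p *= p_edge if x[i][j] else 1.0 - p_edge
        total += p
        n_terms += 1
    return math.log(total), n_terms

# Toy example: N = 4 vertices, K = 2 classes, so 2^4 = 16 terms
x_small = [[0, 1, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0]]
loglik, n_terms = log_likelihood_bruteforce(
    x_small, [0.6, 0.4], [[0.9, 0.1], [0.1, 0.5]])
```

Already at N = 50 and K = 3 the loop would visit 3^50 assignments, which is why a variational approximation is used instead.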
Model selection
Criteria
Since $\log p(X \mid \alpha, \Pi)$ is not tractable, we cannot rely on:
◮ $\mathrm{AIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - C$
◮ $\mathrm{BIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - \frac{C}{2} \log \frac{N(N-1)}{2}$
ICL: Biernacki et al. (2000) → Daudin et al. (2008)
Variational Bayes EM → ILvb: Latouche et al. (2012)
Bayesian framework
◮ Conjugate prior distributions:
  ◮ $p\big(\alpha \mid n^0 = \{n_1^0, \dots, n_K^0\}\big) = \mathrm{Dir}(\alpha; n^0)$
  ◮ $p\big(\Pi \mid \eta^0 = (\eta_{kl}^0), \zeta^0 = (\zeta_{kl}^0)\big) = \prod_{k \le l} \mathrm{Beta}(\pi_{kl}; \eta_{kl}^0, \zeta_{kl}^0)$
◮ Non-informative Jeffreys prior:
  ◮ $n_k^0 = 1/2$
  ◮ $\eta_{kl}^0 = \zeta_{kl}^0 = 1/2$
Variational Bayes EM
Latouche et al. (2009)
◮ $p(Z, \alpha, \Pi \mid X)$ not tractable
Decomposition
$$\log p(X) = \mathcal{L}(q) + \mathrm{KL}\big(q(\cdot) \,\|\, p(\cdot \mid X)\big)$$
where
$$\mathcal{L}(q) = \sum_{Z} \int\!\!\int q(Z, \alpha, \Pi) \log \left\{ \frac{p(X, Z, \alpha, \Pi)}{q(Z, \alpha, \Pi)} \right\} d\alpha \, d\Pi$$
Factorization
$$q(Z, \alpha, \Pi) = q(\alpha)\, q(\Pi)\, q(Z) = q(\alpha)\, q(\Pi) \prod_{i=1}^{N} q(Z_i)$$
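The decomposition holds for any distribution $q$, which can be checked numerically on a toy model. The sketch below replaces $(Z, \alpha, \Pi)$ by a single discrete latent variable with three states; the joint probabilities and the choice of $q$ are made-up numbers used only for illustration.

```python
import math

# Hypothetical joint p(x, z) for one fixed observation x and z in {0, 1, 2}
joint = [0.10, 0.25, 0.05]
# An arbitrary variational distribution q(z)
q = [0.2, 0.5, 0.3]

evidence = sum(joint)                          # p(x)
posterior = [p / evidence for p in joint]      # p(z | x)

# L(q) = sum_z q(z) log{ p(x, z) / q(z) }
lower_bound = sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint))
# KL(q || p(. | x)) = sum_z q(z) log{ q(z) / p(z | x) }
kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

# log p(x) = L(q) + KL, and KL >= 0 makes L(q) a lower bound
gap = math.log(evidence) - (lower_bound + kl)
```

Since the KL term is non-negative, maximizing the lower bound over $q$ is equivalent to minimizing the KL divergence to the true posterior.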
Variational Bayes EM
Latouche et al. (2009)
E-step
◮ $q(Z_i) = \mathcal{M}(Z_i; 1, \tau_i = \{\tau_{i1}, \dots, \tau_{iK}\})$
M-step
◮ $q(\alpha) = \mathrm{Dir}(\alpha; n)$
◮ $q(\Pi) = \prod_{k \le l}^{K} \mathrm{Beta}(\pi_{kl}; \eta_{kl}, \zeta_{kl})$
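A new model selection criterion: ILvb
Latouche et al. (2012)
◮ $\log p(X \mid K) = \mathcal{L}(q) + \mathrm{KL}(...)$
◮ After convergence, use $\mathcal{L}(q)$ as an approximation of $\log p(X \mid K)$
ILvb
$$\mathrm{IL}_{vb} = \log \left\{ \frac{\Gamma\big(\sum_{k=1}^{K} n_k^0\big) \prod_{k=1}^{K} \Gamma(n_k)}{\Gamma\big(\sum_{k=1}^{K} n_k\big) \prod_{k=1}^{K} \Gamma(n_k^0)} \right\} + \sum_{k \le l}^{K} \log \left\{ \frac{\Gamma(\eta_{kl}^0 + \zeta_{kl}^0)\, \Gamma(\eta_{kl})\, \Gamma(\zeta_{kl})}{\Gamma(\eta_{kl} + \zeta_{kl})\, \Gamma(\eta_{kl}^0)\, \Gamma(\zeta_{kl}^0)} \right\} - \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{ik} \log \tau_{ik}$$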
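Given the variational parameters at convergence, the criterion is a sum of log-gamma terms plus an entropy term. The sketch below evaluates the formula as written on the slide; the function name `il_vb` and its argument layout (lists for `n0` and `n`, K x K nested lists of which only entries with k <= l are read, and an N x K list `tau`) are assumptions made for illustration.

```python
import math

def il_vb(n0, n, eta0, zeta0, eta, zeta, tau):
    """Evaluate IL_vb from prior (n0, eta0, zeta0) and posterior (n, eta, zeta, tau)."""
    lg = math.lgamma
    k = len(n)
    # Dirichlet term: log{ G(sum n0) prod G(n_k) / (G(sum n) prod G(n0_k)) }
    value = lg(sum(n0)) - lg(sum(n))
    value += sum(lg(post) - lg(prior) for post, prior in zip(n, n0))
    # Beta terms, summed over class pairs with k <= l
    for a in range(k):
        for b in range(a, k):
            value += (lg(eta0[a][b] + zeta0[a][b]) + lg(eta[a][b]) + lg(zeta[a][b])
                      - lg(eta[a][b] + zeta[a][b]) - lg(eta0[a][b]) - lg(zeta0[a][b]))
    # Entropy of q(Z): - sum_i sum_k tau_ik log tau_ik (0 log 0 taken as 0)
    value -= sum(t * math.log(t) for row in tau for t in row if t > 0.0)
    return value

# Sanity check with Jeffreys priors and a posterior equal to the prior
half = [[0.5, 0.5], [0.5, 0.5]]
score = il_vb([0.5, 0.5], [0.5, 0.5], half, half, half, half, [[1.0, 0.0]])
```

In practice one would run the variational Bayes EM algorithm for several values of K and keep the K with the largest score.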
Overlaps in networks
Palla et al. (2006)
Problem: the stochastic block model (SBM) and most existing methods assume that each vertex belongs to a single class.
Stochastic Block Model (SBM)
◮ Nowicki and Snijders (2001)
◮ $Z_i$ independent hidden variables:
$$Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$$
Overlapping Stochastic Block Model (OSBM)
◮ Latouche et al. (2011)
◮ $Z_{ik}$ independent hidden variables:
$$Z_i \sim \prod_{k=1}^{K} \mathcal{B}(Z_{ik}; \alpha_k) = \prod_{k=1}^{K} \alpha_k^{Z_{ik}} (1 - \alpha_k)^{1 - Z_{ik}}$$
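The difference from the SBM prior is that each class membership is drawn on its own, so a vertex may end up in several classes or in none. A minimal sketch, with a hypothetical function name and made-up values of $\alpha$:

```python
import random

def sample_memberships(n, alpha, seed=0):
    """Draw Z_ik ~ Bernoulli(alpha_k) independently for every vertex and class."""
    rng = random.Random(seed)
    return [[1 if rng.random() < a_k else 0 for a_k in alpha] for _ in range(n)]

# Hypothetical class probabilities; unlike in the SBM they need not sum to 1
alpha = [0.4, 0.3, 0.2]
z = sample_memberships(10, alpha)
# Number of classes per vertex: 0 (no class), 1, or more (overlap)
counts = [sum(row) for row in z]
```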
Overlapping Stochastic Block Model (OSBM)
◮ Latouche et al. (2011)
◮ $X \mid Z$ edges drawn independently:
$$X_{ij} \mid Z_i, Z_j \sim \mathcal{B}\big(X_{ij}; \Pi_{Z_i, Z_j}\big)$$
◮ $\Pi_{Z_i, Z_j} = g\big(a_{Z_i, Z_j}\big)$
◮ $a_{Z_i, Z_j} = \underbrace{Z_i^\intercal W Z_j}_{i \leftrightarrow j} + \underbrace{Z_i^\intercal U}_{i \to ?} + \underbrace{V^\intercal Z_j}_{? \to j} + \underbrace{W^*}_{\text{bias}}$
◮ $g(t) = 1 / (1 + \exp(-t))$ is the logistic function
OSBM
◮ $\tilde{Z}_i = (Z_i, 1)^\intercal$
◮ $\tilde{W} = \begin{pmatrix} W & U \\ V^\intercal & W^* \end{pmatrix}$
◮ $a_{Z_i, Z_j} = \tilde{Z}_i^\intercal \tilde{W} \tilde{Z}_j$
◮ Parameter set: $(\alpha, \tilde{W})$
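The compact form is the four-term expression for $a_{Z_i, Z_j}$ rewritten with an extra constant coordinate. The sketch below computes both and passes the result through the logistic function; all function names and numerical values are hypothetical.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def a_expanded(zi, zj, w, u, v, w_star):
    """Z_i' W Z_j + Z_i' U + V' Z_j + W*."""
    k = len(zi)
    interaction = sum(zi[a] * w[a][b] * zj[b] for a in range(k) for b in range(k))
    outgoing = sum(zi[a] * u[a] for a in range(k))
    incoming = sum(v[b] * zj[b] for b in range(k))
    return interaction + outgoing + incoming + w_star

def a_compact(zi, zj, w, u, v, w_star):
    """Same quantity as tilde(Z_i)' tilde(W) tilde(Z_j)."""
    k = len(zi)
    # tilde(W) = [[W, U], [V', W*]]
    w_tilde = [row + [u_a] for row, u_a in zip(w, u)] + [v + [w_star]]
    zi_t, zj_t = zi + [1], zj + [1]
    return sum(zi_t[a] * w_tilde[a][b] * zj_t[b]
               for a in range(k + 1) for b in range(k + 1))

# Hypothetical parameters for K = 2 classes
w = [[2.0, -1.0], [-1.0, 3.0]]
u = [0.5, -0.5]
v = [0.25, 0.75]
w_star = -4.0
zi, zj = [1, 1], [0, 1]              # vertex i belongs to both classes
a_ij = a_expanded(zi, zj, w, u, v, w_star)
p_edge = logistic(a_compact(zi, zj, w, u, v, w_star))
```

For two vertices that belong to no class, only the bias remains, so the edge probability is $g(W^*)$.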
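Bayesian framework
◮ Conjugate prior distributions:
  ◮ $p(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta_k^0, \zeta_k^0)$
  ◮ $p(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; \tilde{W}_0^{vec}, S_0)$
◮ The vec operator: if
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
then
$$A^{vec} = \begin{pmatrix} A_{11} \\ A_{21} \\ A_{12} \\ A_{22} \end{pmatrix}$$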
Bayesian framework
◮ $x^\intercal A y = (y \otimes x)^\intercal A^{vec}$
◮ In practice: set $\tilde{W}_0^{vec} = 0$ and $S_0 = \frac{I}{\beta}$
Problem: $p(Z, \alpha, \tilde{W} \mid X)$ not tractable
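The identity $x^\intercal A y = (y \otimes x)^\intercal A^{vec}$ is what makes $a_{Z_i, Z_j}$ linear in $\tilde{W}^{vec}$, so that a Gaussian prior on $\tilde{W}^{vec}$ is convenient. It can be checked on a small example; the helper names below are hypothetical.

```python
def vec(a):
    """Stack the columns of a matrix into one vector."""
    rows, cols = len(a), len(a[0])
    return [a[i][j] for j in range(cols) for i in range(rows)]

def kron(y, x):
    """Kronecker product of two vectors: (y_1 x, y_2 x, ...)."""
    return [y_j * x_i for y_j in y for x_i in x]

def bilinear(x, a, y):
    """x' A y computed directly."""
    return sum(x[i] * a[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

a = [[1, 2], [3, 4]]
x = [1, 2]
y = [5, 6]
lhs = bilinear(x, a, y)
rhs = sum(p * q for p, q in zip(kron(y, x), vec(a)))
```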
q-transformation
Decomposition
$$\log p(X) = \mathcal{L}(r) + \mathrm{KL}(r \,\|\, p)$$
where
$$\mathcal{L}(r) = \sum_{Z} \int\!\!\int r(Z, \alpha, \tilde{W}) \log \left\{ \frac{p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W})}{r(Z, \alpha, \tilde{W})} \right\} d\alpha \, d\tilde{W}$$
Lower bound
$$\log p(X) \ge \mathcal{L}(r)$$
Problem: $\mathcal{L}(r)$ has too complex a form → no variational Bayes EM algorithm?
Local bound
◮ Use the bound of Jaakkola and Jordan (2000) for Bayesian logistic regression
$$\log p(X \mid Z, \tilde{W}) \ge \log h(Z, \tilde{W}, \xi), \quad \forall \xi \in \mathbb{R}^{N \times N}$$
where
$$\log h(Z, \tilde{W}, \xi) = \sum_{i \ne j}^{N} \left\{ \Big(X_{ij} - \frac{1}{2}\Big) a_{Z_i, Z_j} - \frac{\xi_{ij}}{2} + \log g(\xi_{ij}) - \lambda(\xi_{ij}) \big(a_{Z_i, Z_j}^2 - \xi_{ij}^2\big) \right\}$$
and
$$\lambda(\xi) = \frac{1}{4\xi} \tanh\Big(\frac{\xi}{2}\Big) = \frac{1}{2\xi} \Big\{ g(\xi) - \frac{1}{2} \Big\}$$
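The bound applies term by term, so it can be checked for a single pair $(i, j)$. The sketch below compares the exact Bernoulli log-probability with the corresponding summand of $\log h$; function names and test values are hypothetical.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def lam(xi):
    # lambda(xi) = tanh(xi / 2) / (4 xi); undefined at xi = 0 (its limit there is 1/8)
    return math.tanh(xi / 2.0) / (4.0 * xi)

def exact_term(x_ij, a):
    # log B(x_ij; g(a)) = x_ij * a - log(1 + exp(a))
    return x_ij * a - math.log1p(math.exp(a))

def bound_term(x_ij, a, xi):
    # One summand of log h(Z, W, xi)
    return ((x_ij - 0.5) * a - xi / 2.0 + math.log(logistic(xi))
            - lam(xi) * (a * a - xi * xi))

a_value = 0.7
loose = bound_term(1, a_value, 2.5)            # arbitrary xi: strictly below
tight = bound_term(1, a_value, abs(a_value))   # xi = |a|: bound is attained
exact = exact_term(1, a_value)
```

The bound is quadratic in $a_{Z_i, Z_j}$, hence quadratic in $\tilde{W}^{vec}$, and it is tight when $\xi_{ij}^2 = a_{Z_i, Z_j}^2$.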
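ξ-transformation
Lower bound
$$\log p(X) = \log \left\{ \sum_{Z} \int\!\!\int p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \right\} \ge \mathcal{L}(\xi)$$
where
$$\mathcal{L}(\xi) = \log \left\{ \sum_{Z} \int\!\!\int h(Z, \tilde{W}, \xi)\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \right\}$$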