

  1. Variational methods for overlapping and non-overlapping stochastic block models
     Pierre Latouche, Université Paris 1 Panthéon-Sorbonne, Laboratoire SAMM
     MSTGA 2012
     Pierre Latouche 1

  2. Contents
     Introduction
       Real networks
       Graph clustering
       Stochastic block models
       Model selection
     The overlapping stochastic block model
       Model selection
       Bayesian framework
       Inference
       The regulation term β
       Model selection
     Experiments
       Simulated data
       The French blogosphere network

  3. Real networks
     ◮ Many scientific fields: the World Wide Web; biology, sociology, physics
     ◮ Nature of the data under study: interactions between $N$ objects; $O(N^2)$ possible interactions
     ◮ Network topology: describes the way nodes interact (structure/function)
     [Figure: sample of 250 blogs (nodes) with their link relationships (edges) from the French political blogosphere.]

  4. In Biology
     [Figure: the metabolic network of the bacterium Escherichia coli (Lacroix et al., 2006).]

  5. In Biology
     [Figure: subset of the yeast transcriptional regulatory network (Milo et al., 2002).]

  6. Real networks
     ◮ Properties:
       ◮ Sparsity: $m = O(N)$
       ◮ Existence of a giant component
       ◮ Heterogeneity
       ◮ Preferential attachment
       ◮ Small world
     → Topological structure (groups of vertices)


  8. Graph clustering
     ◮ Existing methods look for:
       ◮ Community structure
       ◮ Disassortative mixing
       ◮ Heterogeneous structure


  12. Stochastic Block Model (SBM)
     ◮ Nowicki and Snijders (2001); earlier work: Govaert et al. (1977)
     ◮ $Z_i$ independent hidden variables:
       ◮ $Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$
       ◮ $Z_{ik} = 1$: vertex $i$ belongs to class $k$
     ◮ $X \mid Z$ edges drawn independently: $X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{B}(\pi_{kl})$
     ◮ A mixture model for graphs: $X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$
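The generative process on this slide can be simulated in a few lines of NumPy. This is a minimal sketch of my own (the function name and the example parameters are not from the talk), for an undirected graph without self-loops:

```python
import numpy as np

def sample_sbm(N, alpha, Pi, rng=None):
    """Draw an undirected graph from an SBM with class proportions
    alpha (length K) and connection probabilities Pi (K x K)."""
    rng = np.random.default_rng(rng)
    # Z_i ~ Multinomial(1, alpha): one class label per vertex
    z = rng.choice(len(alpha), size=N, p=alpha)
    # X_ij | classes ~ Bernoulli(Pi[z_i, z_j]), drawn independently
    X = rng.binomial(1, Pi[z[:, None], z[None, :]])
    X = np.triu(X, 1)   # keep the upper triangle...
    X = X + X.T         # ...and symmetrise (undirected, no self-loops)
    return X, z

# Two communities: dense within, sparse between
alpha = np.array([0.6, 0.4])
Pi = np.array([[0.25, 0.02],
               [0.02, 0.30]])
X, z = sample_sbm(200, alpha, Pi, rng=0)
```

With affinities like these the model recovers classical community structure, but nothing in the formulation forces $\pi_{kk} > \pi_{kl}$: the same code covers disassortative or heterogeneous structures.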

  13. [Figure: a toy graph of ten numbered vertices grouped into classes, annotated with the class-to-class connection probabilities π••.]

  14. Maximum likelihood estimation
     ◮ Log-likelihoods of the model:
       ◮ Observed-data: $\log p(X \mid \alpha, \Pi) = \log\big\{\sum_{Z} p(X, Z \mid \alpha, \Pi)\big\}$ → $K^N$ terms
     ◮ The Expectation Maximization (EM) algorithm requires the knowledge of $p(Z \mid X, \alpha, \Pi)$
     Problem: $p(Z \mid X, \alpha, \Pi)$ is not tractable (no conditional independence)
     → Variational EM, Daudin et al. (2008)
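To make the $K^N$ blow-up concrete, here is a brute-force evaluation of the observed-data log-likelihood by explicit summation over all assignments. A sketch of mine, only feasible for toy graphs:

```python
import itertools
import numpy as np

def exact_loglik(X, alpha, Pi):
    """log p(X | alpha, Pi) by summing p(X, Z | alpha, Pi) over all
    K**N assignments Z -- the sum EM cannot face for real networks."""
    N, K = X.shape[0], len(alpha)
    total = 0.0
    for z in itertools.product(range(K), repeat=N):   # K**N terms
        p = np.prod([alpha[k] for k in z])            # p(Z | alpha)
        for i in range(N):
            for j in range(i + 1, N):                 # undirected, no self-loops
                pij = Pi[z[i], z[j]]
                p *= pij if X[i, j] else 1.0 - pij
        total += p
    return np.log(total)

# 2**6 = 64 terms is fine; for K = 5 and N = 100 the sum has ~1e70 terms.
X = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
alpha = np.array([0.5, 0.5])
Pi = np.array([[0.9, 0.1], [0.1, 0.9]])
ll = exact_loglik(X, alpha, Pi)
```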


  17. Model selection
     Criteria: since $\log p(X \mid \alpha, \Pi)$ is not tractable, we cannot rely on:
     ◮ $\mathrm{AIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - C$
     ◮ $\mathrm{BIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - \frac{C}{2} \log N(N-1)$
     ICL, Biernacki et al. (2000) → Daudin et al. (2008)
     Variational Bayes EM → ILvb, Latouche et al. (2012)


  19. Bayesian framework
     ◮ Conjugate prior distributions:
       ◮ $p\big(\alpha \mid n^0 = \{n^0_1, \dots, n^0_K\}\big) = \mathrm{Dir}(\alpha; n^0)$
       ◮ $p\big(\Pi \mid \eta^0 = (\eta^0_{kl}), \zeta^0 = (\zeta^0_{kl})\big) = \prod_{k \le l} \mathrm{Beta}(\pi_{kl}; \eta^0_{kl}, \zeta^0_{kl})$
     ◮ Non-informative Jeffreys prior:
       ◮ $n^0_k = 1/2$
       ◮ $\eta^0_{kl} = \zeta^0_{kl} = 1/2$

  20. Variational Bayes EM, Latouche et al. (2009)
     ◮ $p(Z, \alpha, \Pi \mid X)$ not tractable
     Decomposition:
     $\log p(X) = \mathcal{L}(q) + \mathrm{KL}\big(q(\cdot) \,\|\, p(\cdot \mid X)\big)$
     where
     $\mathcal{L}(q) = \sum_{Z} \iint q(Z, \alpha, \Pi) \log\Big\{\frac{p(X, Z, \alpha, \Pi)}{q(Z, \alpha, \Pi)}\Big\} \, d\alpha \, d\Pi$
     Factorization:
     $q(Z, \alpha, \Pi) = q(\alpha)\, q(\Pi)\, q(Z) = q(\alpha)\, q(\Pi) \prod_{i=1}^{N} q(Z_i)$

  21. Variational Bayes EM, Latouche et al. (2009)
     E-step:
     ◮ $q(Z_i) = \mathcal{M}\big(Z_i; 1, \tau_i = \{\tau_{i1}, \dots, \tau_{iK}\}\big)$
     M-step:
     ◮ $q(\alpha) = \mathrm{Dir}(\alpha; n)$
     ◮ $q(\Pi) = \prod_{k \le l} \mathrm{Beta}(\pi_{kl}; \eta_{kl}, \zeta_{kl})$
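Given the E-step responsibilities $\tau$, the M-step just adds expected counts to the prior pseudo-counts. The following is a sketch under my own conventions, not the authors' code, written for a directed graph without self-loops to keep the indexing simple (the slides' undirected case pools the $(k, l)$ and $(l, k)$ terms):

```python
import numpy as np

def vb_m_step(tau, X, n0=0.5, eta0=0.5, zeta0=0.5):
    """Variational M-step: Dirichlet and Beta pseudo-count updates.
    tau is the N x K responsibility matrix from the E-step."""
    n = n0 + tau.sum(axis=0)                 # q(alpha) = Dir(alpha; n)
    # Expected edge / non-edge counts between classes k and l,
    # summed over ordered pairs i != j (diag(X) = 0 assumed):
    edges = tau.T @ X @ tau
    pairs = np.outer(tau.sum(0), tau.sum(0)) - tau.T @ tau
    eta = eta0 + edges                       # q(pi_kl) = Beta(eta_kl, zeta_kl)
    zeta = zeta0 + (pairs - edges)
    return n, eta, zeta
```

Alternating this with the E-step fixed-point update of $\tau$ until $\mathcal{L}(q)$ stabilises gives the full algorithm.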

  22. A new model selection criterion: ILvb, Latouche et al. (2012)
     ◮ $\log p(X \mid K) = \mathcal{L}(q) + \mathrm{KL}(\dots)$
     ◮ After convergence, use $\mathcal{L}(q)$ as an approximation of $\log p(X \mid K)$
     ILvb:
     $\mathrm{IL}_{\mathrm{vb}} = \log\Big\{\frac{\Gamma\big(\sum_{k=1}^{K} n^0_k\big) \prod_{k=1}^{K} \Gamma(n_k)}{\Gamma\big(\sum_{k=1}^{K} n_k\big) \prod_{k=1}^{K} \Gamma(n^0_k)}\Big\} + \sum_{k \le l} \log\Big\{\frac{\Gamma(\eta^0_{kl} + \zeta^0_{kl})\, \Gamma(\eta_{kl})\, \Gamma(\zeta_{kl})}{\Gamma(\eta_{kl} + \zeta_{kl})\, \Gamma(\eta^0_{kl})\, \Gamma(\zeta^0_{kl})} \Big\} - \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{ik} \log \tau_{ik}$
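With SciPy's `gammaln` the criterion is a one-screen function. A sketch of mine assuming scalar hyperparameters $n^0 = \eta^0 = \zeta^0 = 1/2$ (the Jeffreys choice from the earlier slide):

```python
import numpy as np
from scipy.special import gammaln

def il_vb(n, eta, zeta, tau, n0=0.5, eta0=0.5, zeta0=0.5):
    """ILvb evaluated at the converged variational parameters:
    n (K,), eta and zeta (K x K, upper triangle used), tau (N x K)."""
    K = len(n)
    # log of the Dirichlet normalising-constant ratio
    t_alpha = (gammaln(K * n0) + gammaln(n).sum()
               - gammaln(n.sum()) - K * gammaln(n0))
    # Beta normalising-constant ratios over the pairs k <= l
    k, l = np.triu_indices(K)
    t_pi = (gammaln(eta0 + zeta0) + gammaln(eta[k, l]) + gammaln(zeta[k, l])
            - gammaln(eta[k, l] + zeta[k, l])
            - gammaln(eta0) - gammaln(zeta0)).sum()
    # entropy of q(Z): - sum_ik tau_ik log tau_ik (clip to avoid log 0)
    t = np.clip(tau, 1e-12, 1.0)
    return t_alpha + t_pi - (t * np.log(t)).sum()
```

In practice one runs variational Bayes EM for each candidate $K$ and keeps the $K$ maximising this quantity.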

  23. Contents
     Introduction
       Real networks
       Graph clustering
       Stochastic block models
       Model selection
     The overlapping stochastic block model
       Model selection
       Bayesian framework
       Inference
       The regulation term β
       Model selection
     Experiments
       Simulated data
       The French blogosphere network

  24. Overlaps in networks, Palla et al. (2006)
     Problem: the stochastic block model (SBM) and most existing methods assume that each vertex belongs to a single class.

  25. Stochastic Block Model (SBM)
     ◮ Nowicki and Snijders (2001)
     ◮ $Z_i$ independent hidden variables: $Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$

  26. Overlapping Stochastic Block Model (OSBM)
     ◮ Latouche et al. (2011)
     ◮ $Z_{ik}$ independent hidden variables:
     $Z_i \sim \prod_{k=1}^{K} \mathcal{B}(Z_{ik}; \alpha_k) = \prod_{k=1}^{K} \alpha_k^{Z_{ik}} (1 - \alpha_k)^{1 - Z_{ik}}$

  27. Overlapping Stochastic Block Model (OSBM)
     ◮ Latouche et al. (2011)
     ◮ $X \mid Z$ edges drawn independently:
     $X_{ij} \mid Z_i, Z_j \sim \mathcal{B}(X_{ij}; \Pi_{Z_i, Z_j})$
     ◮ $\Pi_{Z_i, Z_j} = g(a_{Z_i, Z_j})$
     ◮ $a_{Z_i, Z_j} = \underbrace{Z_i^{\intercal} W Z_j}_{i \leftrightarrow j} + \underbrace{Z_i^{\intercal} U}_{i \to ?} + \underbrace{V^{\intercal} Z_j}_{? \to j} + \underbrace{W^{*}}_{\text{bias}}$
     ◮ $g(t) = 1/(1 + \exp(-t))$ is the logistic function

  28. OSBM
     ◮ $\tilde{Z}_i = (Z_i, 1)^{\intercal}$
     ◮ $\tilde{W} = \begin{pmatrix} W & U \\ V^{\intercal} & W^{*} \end{pmatrix}$
     ◮ $a_{Z_i, Z_j} = \tilde{Z}_i^{\intercal} \tilde{W} \tilde{Z}_j$
     ◮ Parameter set: $(\alpha, \tilde{W})$
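In this compact form the OSBM is easy to simulate. A sketch with invented parameters ($\tilde{Z}$ and $\tilde{W}$ follow the slide; everything else, including the example values, is mine):

```python
import numpy as np

def sample_osbm(N, alpha, W_tilde, rng=None):
    """Directed OSBM draw: Z_ik ~ B(alpha_k) independently (a vertex may
    belong to several classes, or to none), then
    X_ij ~ B(g(Ztilde_i^T W_tilde Ztilde_j))."""
    rng = np.random.default_rng(rng)
    Z = rng.binomial(1, alpha, size=(N, len(alpha)))   # overlapping memberships
    Zt = np.hstack([Z, np.ones((N, 1))])               # Ztilde_i = (Z_i, 1)^T
    A = Zt @ W_tilde @ Zt.T                            # all a_{Z_i, Z_j} at once
    X = rng.binomial(1, 1.0 / (1.0 + np.exp(-A)))      # logistic g, then Bernoulli
    np.fill_diagonal(X, 0)                             # no self-loops
    return X, Z

# K = 2 classes: positive within-class weights, negative bias W*
W_tilde = np.array([[ 2.0, -1.0,  0.0],
                    [-1.0,  2.5,  0.0],
                    [ 0.0,  0.0, -3.0]])
X, Z = sample_osbm(100, np.array([0.3, 0.4]), W_tilde, rng=0)
```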

  29. Contents
     Introduction
       Real networks
       Graph clustering
       Stochastic block models
       Model selection
     The overlapping stochastic block model
       Model selection
       Bayesian framework
       Inference
       The regulation term β
       Model selection
     Experiments
       Simulated data
       The French blogosphere network

  30. Bayesian framework
     ◮ Conjugate prior distributions:
       ◮ $p(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta^0_k, \zeta^0_k)$
       ◮ $p(\tilde{W}^{\mathrm{vec}}) = \mathcal{N}(\tilde{W}^{\mathrm{vec}}; \tilde{W}^{\mathrm{vec}}_0, S_0)$
     ◮ The vec operator: if
     $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$,
     then $A^{\mathrm{vec}} = (A_{11}, A_{21}, A_{12}, A_{22})^{\intercal}$

  31. Bayesian framework
     ◮ $x^{\intercal} A y = (y \otimes x)^{\intercal} A^{\mathrm{vec}}$
     ◮ In practice: set $\tilde{W}^{\mathrm{vec}}_0 = 0$ and $S_0 = I/\beta$
     Problem: $p(Z, \alpha, \tilde{W} \mid X)$ not tractable
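The Kronecker identity can be checked numerically; note that NumPy's column-major `ravel` matches the column-stacking vec operator defined on the previous slide:

```python
import numpy as np

# Check x^T A y = (y ⊗ x)^T A^vec on random data.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
x, y = rng.normal(size=3), rng.normal(size=3)
lhs = x @ A @ y
rhs = np.kron(y, x) @ A.ravel(order="F")   # order="F" stacks columns, i.e. vec(A)
```

This rewriting turns the quadratic form in $\tilde{W}$ into a linear form in $\tilde{W}^{\mathrm{vec}}$, which is what allows a Gaussian prior and posterior over $\tilde{W}^{\mathrm{vec}}$.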

  32. q Transformation
     Decomposition:
     $\log p(X) = \mathcal{L}(r) + \mathrm{KL}(r \,\|\, p)$
     where
     $\mathcal{L}(r) = \sum_{Z} \iint r(Z, \alpha, \tilde{W}) \log\Big\{\frac{p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W})}{r(Z, \alpha, \tilde{W})}\Big\} \, d\alpha \, d\tilde{W}$
     Lower bound: $\log p(X) \ge \mathcal{L}(r)$
     Problem: $\mathcal{L}(r)$ has too complex a form → no variational Bayes EM algorithm

  33. Local bound
     ◮ Use the bound of Jaakkola and Jordan (2000) for Bayesian logistic regression:
     $\log p(X \mid Z, \tilde{W}) \ge \log h(Z, \tilde{W}, \xi), \quad \forall \xi \in \mathbb{R}^{N \times N}$
     where
     $\log h(Z, \tilde{W}, \xi) = \sum_{i \ne j} \Big\{ \big(X_{ij} - \tfrac{1}{2}\big) a_{Z_i, Z_j} - \tfrac{\xi_{ij}}{2} + \log g(\xi_{ij}) - \lambda(\xi_{ij})\big(a^2_{Z_i, Z_j} - \xi^2_{ij}\big) \Big\}$
     and
     $\lambda(\xi) = \tfrac{1}{4\xi} \tanh\big(\tfrac{\xi}{2}\big) = \tfrac{1}{2\xi}\big\{ g(\xi) - \tfrac{1}{2} \big\}$
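The underlying scalar inequality is a quadratic lower bound on $\log g(a)$ that is tight at $a = \pm\xi$, and it is easy to verify numerically. A sketch (function names are mine):

```python
import numpy as np

def lam(xi):
    """lambda(xi) = tanh(xi/2) / (4 xi) for xi != 0 (its limit at 0 is 1/8)."""
    return np.tanh(xi / 2.0) / (4.0 * xi)

def log_g(t):
    """log of the logistic function, computed stably."""
    return -np.logaddexp(0.0, -t)

def jj_bound(a, xi):
    """Jaakkola-Jordan lower bound: quadratic in a, tight at a = +/- xi."""
    return log_g(xi) + (a - xi) / 2.0 - lam(xi) * (a**2 - xi**2)

a = np.linspace(-6.0, 6.0, 201)
gap = log_g(a) - jj_bound(a, xi=1.5)   # nonnegative, zero at a = +/- 1.5
```

Because the bound is quadratic in $a_{Z_i, Z_j}$, and hence in $\tilde{W}^{\mathrm{vec}}$, it restores conjugacy with the Gaussian prior; the variational parameters $\xi_{ij}$ are then optimised to tighten the bound.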

  34. ξ Transformation
     Lower bound:
     $\log p(X) = \log\Big\{ \sum_{Z} \iint p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \Big\} \ge \mathcal{L}(\xi)$
     where
     $\mathcal{L}(\xi) = \log\Big\{ \sum_{Z} \iint h(Z, \tilde{W}, \xi)\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \Big\}$
