
Variational methods for overlapping and non-overlapping stochastic block models
Pierre Latouche
Université Paris 1 Panthéon-Sorbonne, Laboratoire SAMM
MSTGA 2012

Contents
◮ Introduction
◮ Real networks
◮ Graph clustering
◮ Stochastic block models
◮ Model selection
◮ The overlapping stochastic block model
◮ Model selection
◮ Bayesian framework
◮ Inference
◮ The regulation term $\beta$
◮ Model selection
◮ Experiments
◮ Simulated data
◮ The French blogosphere network

Real networks
◮ Many scientific fields:
  ◮ World Wide Web
  ◮ Biology, sociology, physics
◮ Nature of data under study:
  ◮ Interactions between $N$ objects
  ◮ $O(N^2)$ possible interactions
◮ Network topology:
  ◮ Describes the way nodes interact, structure/function
[Figure: sample of 250 blogs (nodes) with their links (edges) from the French political blogosphere.]

In biology
[Figure: the metabolic network of the bacterium Escherichia coli (Lacroix et al., 2006).]

In biology
[Figure: subset of the yeast transcriptional regulatory network (Milo et al., 2002).]

Real networks
◮ Properties:
  ◮ Sparsity: $m = O(N)$
  ◮ Existence of a giant component
  ◮ Heterogeneity
  ◮ Preferential attachment
  ◮ Small world
→ Topological structure (groups of vertices)

Graph clustering
◮ Existing methods look for:
  ◮ Community structure
  ◮ Disassortative mixing
  ◮ Heterogeneous structure

Stochastic Block Model (SBM)
◮ Nowicki and Snijders (2001)
◮ Earlier work: Govaert et al. (1977)
◮ $Z_i$ independent hidden variables:
  ◮ $Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$
  ◮ $Z_{ik} = 1$: vertex $i$ belongs to class $k$
◮ $X \mid Z$ edges drawn independently:
$$X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{B}(\pi_{kl})$$
◮ A mixture model for graphs:
$$X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$$
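The generative process on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's code: the function name `sample_sbm`, the parameter values, and the choice of an undirected graph without self-loops are assumptions made for the example.

```python
import random

def sample_sbm(n, alpha, pi, seed=0):
    """Draw class labels and a symmetric adjacency matrix from an SBM.

    Assumption for this sketch: undirected graph, no self-loops.
    """
    rng = random.Random(seed)
    k = len(alpha)
    # Z_i ~ M(1, alpha): each vertex gets exactly one class
    labels = rng.choices(range(k), weights=alpha, k=n)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # X_ij | Z_ik Z_jl = 1 ~ B(pi_kl), independently for each pair
            if rng.random() < pi[labels[i]][labels[j]]:
                x[i][j] = x[j][i] = 1
    return labels, x

# Hypothetical parameters: three classes with dense within-class connections
alpha = [0.5, 0.3, 0.2]
pi = [[0.8, 0.1, 0.1],
      [0.1, 0.7, 0.1],
      [0.1, 0.1, 0.9]]
labels, x = sample_sbm(30, alpha, pi)
```

With a large diagonal in `pi` the sampled graph shows community structure; a large off-diagonal would instead produce disassortative mixing.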

[Figure: example network with vertices numbered 1 to 10, annotated with connection probabilities $\pi_{\bullet\bullet}$.]

Maximum likelihood estimation
◮ Log-likelihoods of the model:
  ◮ Observed-data: $\log p(X \mid \alpha, \Pi) = \log \big\{ \sum_{Z} p(X, Z \mid \alpha, \Pi) \big\}$ → $K^N$ terms
◮ The Expectation Maximization (EM) algorithm requires the knowledge of $p(Z \mid X, \alpha, \Pi)$
Problem
$p(Z \mid X, \alpha, \Pi)$ is not tractable (no conditional independence)
Variational EM
Daudin et al. (2008)
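To make the $K^N$ count concrete, the observed-data log-likelihood can be evaluated by brute force on a toy graph. The sketch below enumerates every class assignment; the helper names, the tiny graph, and the undirected, no-self-loop convention are assumptions made for illustration only.

```python
import itertools
import math

def log_joint(x, labels, alpha, pi):
    """log p(X, Z | alpha, Pi) for one assignment (undirected, no self-loops)."""
    n = len(x)
    lp = sum(math.log(alpha[k]) for k in labels)
    for i in range(n):
        for j in range(i + 1, n):
            p = pi[labels[i]][labels[j]]
            lp += math.log(p) if x[i][j] else math.log(1.0 - p)
    return lp

def log_marginal_bruteforce(x, alpha, pi):
    """Sum p(X, Z) over all K^N assignments with a log-sum-exp."""
    n, k = len(x), len(alpha)
    terms = [log_joint(x, z, alpha, pi)
             for z in itertools.product(range(k), repeat=n)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms)), len(terms)

# Hypothetical toy example: N = 4 vertices, K = 2 classes
alpha = [0.6, 0.4]
pi = [[0.9, 0.2],
      [0.2, 0.7]]
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
log_px, n_terms = log_marginal_bruteforce(x, alpha, pi)
print(n_terms)  # 2**4 assignments
```

The enumeration is already impractical for a few dozen vertices, which is the motivation for the variational approximation.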

Model selection
Criteria
Since $\log p(X \mid \alpha, \Pi)$ is not tractable, we cannot rely on:
◮ $\mathrm{AIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - C$
◮ $\mathrm{BIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - \frac{C}{2} \log \frac{N(N-1)}{2}$
ICL
Biernacki et al. (2000) → Daudin et al. (2008)
Variational Bayes EM
→ ILvb, Latouche et al. (2012)

Bayesian framework
◮ Conjugate prior distributions:
  ◮ $p\big(\alpha \mid n^0 = \{n^0_1, \dots, n^0_K\}\big) = \mathrm{Dir}(\alpha; n^0)$
  ◮ $p\big(\Pi \mid \eta^0 = (\eta^0_{kl}), \zeta^0 = (\zeta^0_{kl})\big) = \prod_{k \le l} \mathrm{Beta}(\pi_{kl}; \eta^0_{kl}, \zeta^0_{kl})$
◮ Non-informative Jeffreys prior:
  ◮ $n^0_k = 1/2$
  ◮ $\eta^0_{kl} = \zeta^0_{kl} = 1/2$

Variational Bayes EM
Latouche et al. (2009)
◮ $p(Z, \alpha, \Pi \mid X)$ not tractable
Decomposition
$$\log p(X) = \mathcal{L}(q) + \mathrm{KL}\big(q(\cdot) \,\|\, p(\cdot \mid X)\big)$$
where
$$\mathcal{L}(q) = \sum_{Z} \int\!\!\int q(Z, \alpha, \Pi) \log \left\{ \frac{p(X, Z, \alpha, \Pi)}{q(Z, \alpha, \Pi)} \right\} d\alpha \, d\Pi$$
Factorization
$$q(Z, \alpha, \Pi) = q(\alpha)\, q(\Pi)\, q(Z) = q(\alpha)\, q(\Pi) \prod_{i=1}^{N} q(Z_i)$$
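The decomposition can be checked numerically on a toy model. The sketch below assumes a single discrete latent variable with three states and no continuous parameters, so the integrals reduce to sums; the joint probabilities and the distribution `q` are made-up values.

```python
import math

# Hypothetical values of p(X, Z = z) for the observed X, z in {0, 1, 2}
joint = [0.10, 0.05, 0.25]
# Any distribution q(Z) with full support
q = [0.2, 0.3, 0.5]

evidence = sum(joint)                          # p(X)
posterior = [p / evidence for p in joint]      # p(Z | X)

# L(q) = sum_z q(z) log{ p(X, z) / q(z) }
lower_bound = sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint))
# KL(q || p(. | X)) = sum_z q(z) log{ q(z) / p(z | X) }
kl = sum(qz * math.log(qz / post) for qz, post in zip(q, posterior))

print(lower_bound + kl, math.log(evidence))
```

Because the KL term is non-negative, maximizing the lower bound over `q` is the same as minimizing the KL divergence to the posterior, which is what the variational algorithm exploits.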

Variational Bayes EM
Latouche et al. (2009)
E-step
◮ $q(Z_i) = \mathcal{M}\big(Z_i; 1, \tau_i = \{\tau_{i1}, \dots, \tau_{iK}\}\big)$
M-step
◮ $q(\alpha) = \mathrm{Dir}(\alpha; n)$
◮ $q(\Pi) = \prod_{k \le l}^{K} \mathrm{Beta}(\pi_{kl}; \eta_{kl}, \zeta_{kl})$

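A new model selection criterion: ILvb
Latouche et al. (2012)
◮ $\log p(X \mid K) = \mathcal{L}(q) + \mathrm{KL}(\dots)$
◮ After convergence, use $\mathcal{L}(q)$ as an approximation of $\log p(X \mid K)$
ILvb
$$IL_{vb} = \log \left\{ \frac{\Gamma\big(\sum_{k=1}^{K} n^0_k\big) \prod_{k=1}^{K} \Gamma(n_k)}{\Gamma\big(\sum_{k=1}^{K} n_k\big) \prod_{k=1}^{K} \Gamma(n^0_k)} \right\} + \sum_{k \le l}^{K} \log \left\{ \frac{\Gamma(\eta^0_{kl} + \zeta^0_{kl})\, \Gamma(\eta_{kl})\, \Gamma(\zeta_{kl})}{\Gamma(\eta_{kl} + \zeta_{kl})\, \Gamma(\eta^0_{kl})\, \Gamma(\zeta^0_{kl})} \right\} - \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{ik} \log \tau_{ik}$$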
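Given the hyperparameters and the $\tau_{ik}$ returned by a fitted model, the criterion is a direct sum of log-gamma terms. The sketch below is an assumed transcription of the formula on this slide, not the author's implementation; the function name `il_vb` and the argument layout (lists and nested lists, only entries with $k \le l$ read) are choices made for the example.

```python
import math

def il_vb(n0, n, eta0, zeta0, eta, zeta, tau):
    """Evaluate IL_vb from prior (n0, eta0, zeta0) and posterior (n, eta, zeta)
    hyperparameters and the N x K matrix tau of class probabilities."""
    k_classes = len(n)
    # Dirichlet term
    value = (math.lgamma(sum(n0)) + sum(math.lgamma(v) for v in n)
             - math.lgamma(sum(n)) - sum(math.lgamma(v) for v in n0))
    # Beta terms, summed over pairs k <= l
    for k in range(k_classes):
        for l in range(k, k_classes):
            value += (math.lgamma(eta0[k][l] + zeta0[k][l])
                      + math.lgamma(eta[k][l]) + math.lgamma(zeta[k][l])
                      - math.lgamma(eta[k][l] + zeta[k][l])
                      - math.lgamma(eta0[k][l]) - math.lgamma(zeta0[k][l]))
    # Entropy of q(Z); terms with tau_ik = 0 contribute 0
    value -= sum(t * math.log(t) for row in tau for t in row if t > 0.0)
    return value

# Toy check with K = 1 and two vertices joined by one edge, Jeffreys priors.
# The posterior values below follow from standard conjugate updates.
toy = il_vb(n0=[0.5], n=[2.5],
            eta0=[[0.5]], zeta0=[[0.5]],
            eta=[[1.5]], zeta=[[0.5]],
            tau=[[1.0], [1.0]])
print(toy)
```

In this toy case the exact marginal likelihood is the prior mean of $\pi_{11}$, so the value should agree with $\log(1/2)$. In practice the criterion would be computed for several candidate values of $K$ and the largest value retained.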

Overlaps in networks
Palla et al. (2006)
Problem
The stochastic block model (SBM) and most existing methods assume that each vertex belongs to a single class.

Stochastic Block Model (SBM)
◮ Nowicki and Snijders (2001)
◮ $Z_i$ independent hidden variables:
$$Z_i \sim \mathcal{M}\big(1, \alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)\big)$$

Overlapping Stochastic Block Model (OSBM)
◮ Latouche et al. (2011)
◮ $Z_{ik}$ independent hidden variables:
$$Z_i \sim \prod_{k=1}^{K} \mathcal{B}(Z_{ik}; \alpha_k) = \prod_{k=1}^{K} \alpha_k^{Z_{ik}} (1 - \alpha_k)^{1 - Z_{ik}}$$

Overlapping Stochastic Block Model (OSBM)
◮ Latouche et al. (2011)
◮ $X \mid Z$ edges drawn independently:
$$X_{ij} \mid Z_i, Z_j \sim \mathcal{B}\big(X_{ij}; \Pi_{Z_i, Z_j}\big)$$
◮ $\Pi_{Z_i, Z_j} = g\big(a_{Z_i, Z_j}\big)$
◮ $a_{Z_i, Z_j} = \underbrace{Z_i^\intercal W Z_j}_{i \leftrightarrow j} + \underbrace{Z_i^\intercal U}_{i \to ?} + \underbrace{V^\intercal Z_j}_{? \to j} + \underbrace{W^*}_{\text{bias}}$
◮ $g(t) = 1 / (1 + \exp(-t))$ is the logistic function
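A small simulation helps to see how the four terms of $a_{Z_i, Z_j}$ act together. The sketch below assumes a directed graph without self-loops and draws each $Z_{ik}$ from an independent Bernoulli with parameter $\alpha_k$; the function name `sample_osbm` and all numeric values are hypothetical.

```python
import math
import random

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_osbm(n, alpha, w, u, v, w_star, seed=0):
    """Draw binary membership vectors and a directed adjacency matrix."""
    rng = random.Random(seed)
    k = len(alpha)
    # Z_ik ~ B(alpha_k) independently: a vertex may belong to 0, 1 or several classes
    z = [[1 if rng.random() < alpha[q] else 0 for q in range(k)]
         for _ in range(n)]
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a = w_star                                          # bias
            a += sum(z[i][q] * w[q][l] * z[j][l]
                     for q in range(k) for l in range(k))       # Z_i' W Z_j
            a += sum(z[i][q] * u[q] for q in range(k))          # Z_i' U
            a += sum(v[l] * z[j][l] for l in range(k))          # V' Z_j
            if rng.random() < logistic(a):
                x[i][j] = 1
    return z, x

# Hypothetical parameters: two classes, strong within-class attraction,
# negative bias so that vertices with no class are rarely connected.
alpha = [0.4, 0.4]
w = [[4.0, -1.0], [-1.0, 4.0]]
u = [0.5, 0.5]
v = [0.5, 0.5]
z, x = sample_osbm(20, alpha, w, u, v, w_star=-3.0)
```

Vertices whose membership vector is all zeros act as outliers: their connection probability is driven by the bias and by the class memberships of the other endpoint only.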

OSBM
◮ $\tilde{Z}_i = (Z_i, 1)^\intercal$
◮ $\tilde{W} = \begin{pmatrix} W & U \\ V^\intercal & W^* \end{pmatrix}$
◮ $a_{Z_i, Z_j} = \tilde{Z}_i^\intercal \tilde{W} \tilde{Z}_j$
◮ Parameter set: $(\alpha, \tilde{W})$
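The compact form can be verified against the expanded sum $Z_i^\intercal W Z_j + Z_i^\intercal U + V^\intercal Z_j + W^*$ with a short check. The helper names and numeric values below are illustrative assumptions.

```python
def augment(w, u, v, w_star):
    """Build W-tilde = [[W, U], [V', W*]] as a (K+1) x (K+1) nested list."""
    rows = [list(w[q]) + [u[q]] for q in range(len(w))]
    rows.append(list(v) + [w_star])
    return rows

def bilinear(x, a, y):
    """Compute x' A y for plain Python lists."""
    return sum(x[i] * a[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Hypothetical parameters for K = 2 classes
w = [[2.0, -1.0], [0.5, 3.0]]
u = [0.25, -0.75]
v = [1.5, 0.5]
w_star = -2.0

z_i = [1, 0]   # vertex i belongs to class 1 only
z_j = [1, 1]   # vertex j belongs to both classes

expanded = (bilinear(z_i, w, z_j)
            + sum(a * b for a, b in zip(z_i, u))
            + sum(a * b for a, b in zip(v, z_j))
            + w_star)
compact = bilinear(z_i + [1], augment(w, u, v, w_star), z_j + [1])
print(expanded, compact)
```

Appending a constant 1 to each membership vector folds the two linear terms and the bias into a single bilinear form, so only one matrix has to be inferred.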

Bayesian framework
◮ Conjugate prior distributions:
  ◮ $p(\alpha) = \prod_{k=1}^{K} \mathrm{Beta}(\alpha_k; \eta^0_k, \zeta^0_k)$
  ◮ $p(\tilde{W}^{vec}) = \mathcal{N}(\tilde{W}^{vec}; \tilde{W}^{vec}_0, S_0)$
◮ The vec operator: if
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
then
$$A^{vec} = \begin{pmatrix} A_{11} \\ A_{21} \\ A_{12} \\ A_{22} \end{pmatrix}$$

Bayesian framework
◮ $x^\intercal A y = (y \otimes x)^\intercal A^{vec}$
◮ In practice: set $\tilde{W}^{vec}_0 = 0$ and $S_0 = \frac{I}{\beta}$
Problem
$p(Z, \alpha, \tilde{W} \mid X)$ not tractable
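The identity $x^\intercal A y = (y \otimes x)^\intercal A^{vec}$ is what makes $a_{Z_i, Z_j}$ linear in $\tilde{W}^{vec}$, so that a Gaussian prior on $\tilde{W}^{vec}$ can be used. A minimal check in plain Python, with `vec` stacking columns as in the definition of the vec operator (helper names and values are assumptions):

```python
def vec(a):
    """Stack the columns of a matrix into one list (column-major order)."""
    rows, cols = len(a), len(a[0])
    return [a[i][j] for j in range(cols) for i in range(rows)]

def kron(y, x):
    """Kronecker product of two vectors: (y1*x, y2*x, ...)."""
    return [yj * xi for yj in y for xi in x]

def bilinear(x, a, y):
    return sum(x[i] * a[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

a = [[1, 2],
     [3, 4]]
x = [1, 2]
y = [5, 6]

lhs = bilinear(x, a, y)
rhs = sum(p * q for p, q in zip(kron(y, x), vec(a)))
print(vec(a), lhs, rhs)  # [1, 3, 2, 4] 95 95
```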

q Transformation
Decomposition
$$\log p(X) = \mathcal{L}(r) + \mathrm{KL}(r \,\|\, p)$$
where
$$\mathcal{L}(r) = \sum_{Z} \int\!\!\int r(Z, \alpha, \tilde{W}) \log \left\{ \frac{p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W})}{r(Z, \alpha, \tilde{W})} \right\} d\alpha \, d\tilde{W}$$
Lower bound
$$\log p(X) \ge \mathcal{L}(r)$$
Problem
$\mathcal{L}(r)$ has too complex a form → no variational Bayes EM algorithm?

Local bound
◮ Use the bound of Jaakkola and Jordan (2000) for Bayesian logistic regression:
$$\log p(X \mid Z, \tilde{W}) \ge \log h(Z, \tilde{W}, \xi), \quad \forall \xi \in \mathbb{R}^{N \times N}$$
where
$$\log h(Z, \tilde{W}, \xi) = \sum_{i \ne j}^{N} \left\{ \Big(X_{ij} - \frac{1}{2}\Big) a_{Z_i, Z_j} - \frac{\xi_{ij}}{2} + \log g(\xi_{ij}) - \lambda(\xi_{ij}) \big(a_{Z_i, Z_j}^2 - \xi_{ij}^2\big) \right\}$$
and
$$\lambda(\xi) = \frac{1}{4\xi} \tanh\Big(\frac{\xi}{2}\Big) = \frac{1}{2\xi} \Big\{ g(\xi) - \frac{1}{2} \Big\}$$
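The bound acts term by term, so it can be checked on a single pair $(i, j)$. The sketch below compares the exact Bernoulli-logistic log-probability $X a - \log(1 + e^{a})$ with the corresponding term of $\log h$ on a small grid; function names and grid values are assumptions, and $\xi = 0$ is avoided because $\lambda$ is only defined there as a limit.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), for xi != 0."""
    return math.tanh(xi / 2.0) / (4.0 * xi)

def log_bernoulli(x, a):
    """Exact log p(x | a) with p(x = 1) = g(a)."""
    return x * a - math.log1p(math.exp(a))

def log_h_term(x, a, xi):
    """One term of the Jaakkola and Jordan lower bound."""
    return ((x - 0.5) * a - xi / 2.0 + math.log(logistic(xi))
            - lam(xi) * (a * a - xi * xi))

# Largest gap (bound minus exact value) over a grid; it should not be positive
worst = max(log_h_term(x, a, xi) - log_bernoulli(x, a)
            for x in (0, 1)
            for a in (-4.0, -1.0, 0.3, 2.0, 5.0)
            for xi in (0.1, 0.5, 1.0, 3.0, 6.0))
print(worst)
```

The bound is quadratic in $a_{Z_i, Z_j}$, which is what makes it compatible with the Gaussian prior on $\tilde{W}^{vec}$, and it is tight when $\xi_{ij} = \pm a_{Z_i, Z_j}$, which suggests how the $\xi_{ij}$ can be tuned.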

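ξ Transformation
Lower bound
$$\log p(X) = \log \left\{ \sum_{Z} \int\!\!\int p(X \mid Z, \tilde{W})\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \right\} \ge \mathcal{L}(\xi)$$
where
$$\mathcal{L}(\xi) = \log \left\{ \sum_{Z} \int\!\!\int h(Z, \tilde{W}, \xi)\, p(Z \mid \alpha)\, p(\alpha)\, p(\tilde{W}) \, d\alpha \, d\tilde{W} \right\}$$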
