on consistency of community detection in networks

On consistency of community detection in networks Yunpeng Zhao - PowerPoint PPT Presentation



  1. On consistency of community detection in networks. Yunpeng Zhao, Department of Statistics, George Mason University. Joint work with Elizaveta Levina and Ji Zhu.

  2. Outline: (1) Consistency of community detection criteria under degree-corrected block models; (2) Community extraction.

  3. Network data. Network data appear in many fields: social and friendship networks; citation networks; the World Wide Web; gene regulatory networks; food webs.

  4. Definition of networks. A network $N = (V, E)$: $V$ is the set of nodes, $|V| = n$, and $E$ is the set of edges. $N$ is represented by its $n \times n$ adjacency matrix $A$:
     $$A_{ij} = \begin{cases} 1 & \text{if there is an edge from node } i \text{ to node } j, \\ 0 & \text{otherwise.} \end{cases}$$
     $A$ can be symmetric (undirected networks) or asymmetric (directed networks). We focus only on undirected networks.
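
As a small illustration of the adjacency-matrix representation, the sketch below builds a symmetric 0/1 matrix from an edge list. The function name `adjacency_matrix` and the 4-node path graph are hypothetical choices made for this example, not part of the talk.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected network."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected network: mirror every edge
    return A


# Hypothetical example: the path 0 - 1 - 2 - 3
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
degrees = [sum(row) for row in A]  # row sums give node degrees
```

Row sums of `A` give the node degrees, which is the quantity the degree-corrected models later in the talk are built around.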

  5. From a statistical point of view. A network is an $n \times n$ random matrix $A = [A_{ij}]$. One may put a probability distribution $P$ on $A$. Examples of network models: block models (Holland et al 1983, Faust & Wasserman 1992); exponential random graph models (Robins et al 2006); latent space models (Hoff et al 2002).

  6. Statistical questions: (1) testing goodness of fit (Hunter et al 2008); (2) fitting models (Bickel & Chen 2009, Snijders 2002); (3) statistical inference and uncertainty assessment (Chatterjee & Diaconis 2011, Shalizi & Rinaldo 2011).

  7. Community detection. An important topic is community detection. Communities are cohesive groups of nodes; the most common interpretation is many links within communities and few links between them. The community detection problem is typically formulated as finding a disjoint partition $V = V_1 \cup \cdots \cup V_K$.

  8. Example: Karate club. A friendship network of a karate club (Zachary 1977), split into two groups, which can be used as “ground truth”. Node size is proportional to degree. [Figure: the karate club network, nodes labeled 0 to 33.]

  9. Community detection methods. Existing methods can be loosely classified into three categories:
     (a) Greedy algorithms: hierarchical clustering, edge removal (Girvan & Newman 2002).
     (b) Optimizing a global criterion over all partitions: normalized cuts (Shi & Malik 2000), modularity (Newman 2006), extraction (Zhao et al 2011b), and many others.
     (c) Fitting a model for a network with communities: block models (Bickel & Chen 2009), degree-corrected block models (Karrer & Newman 2010), and others.

  10. Block model (Holland et al 1983). (1) Each node is independently assigned a community label $c_i$, multinomial with parameter $\pi = (\pi_1, \ldots, \pi_K)^T$. (2) Given node labels $c$, the edges $A_{ij}$ are independent Bernoulli random variables with $P(A_{ij} = 1) = P_{c_i c_j}$, where $P = [P_{ab}]$ is a $K \times K$ symmetric matrix.
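
The two-step generative description of the block model can be sketched as a small sampler. This is a minimal sketch under assumptions that are not in the slides: labels are 0-indexed, self-loops are excluded, and the function name `sample_block_model`, the seed, and the example parameters are all hypothetical.

```python
import random


def sample_block_model(n, pi, P, seed=0):
    """Draw labels c and a symmetric adjacency matrix A from a block model."""
    rng = random.Random(seed)
    K = len(pi)
    # Step 1: independent multinomial labels with parameter pi (0-indexed).
    c = rng.choices(range(K), weights=pi, k=n)
    # Step 2: independent Bernoulli edges with P(A_ij = 1) = P[c_i][c_j].
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # assumption: no self-loops
            if rng.random() < P[c[i]][c[j]]:
                A[i][j] = A[j][i] = 1
    return c, A


# Hypothetical parameters: two communities, denser within than between.
c, A = sample_block_model(30, [0.5, 0.5], [[0.6, 0.1], [0.1, 0.6]])
```

Setting `pi = [1.0]` and `P = [[p]]` recovers the $K = 1$ case in which every edge forms independently with the same probability.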

  13. Block model. Fitting: MCMC (Snijders & Nowicki 1997), profile likelihood (Bickel & Chen 2009), or a variational approach (Daudin et al 2008). The “null” model ($K = 1$) is the Erdős-Rényi graph (all edges form independently with probability $p$). Limitation: node degrees within one community are homogeneous, which does not allow for “hubs”, i.e. nodes with very high degrees.

  14. Degree-corrected block model (Karrer & Newman 2010). Generalizes the block model to allow for varying degrees within communities. Each node is associated with a degree parameter $\theta_i$, and $P(A_{ij} = 1) = \theta_i \theta_j P_{c_i c_j}$. The standard block model corresponds to $\theta_i \equiv \text{const}$. The “null” model ($K = 1$) is the expected degree random graph, a.k.a. the configuration model (all edges form independently with $P(A_{ij} = 1) \propto \theta_i \theta_j$). It fits a number of datasets better than the block model.
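
To make the role of the degree parameters concrete, here is a minimal sampler sketch for the Bernoulli form $P(A_{ij} = 1) = \theta_i \theta_j P_{c_i c_j}$. Clipping the product at 1 and excluding self-loops are assumptions added so the code is well defined; the name `sample_dcbm` and the example values are hypothetical.

```python
import random


def sample_dcbm(c, theta, P, seed=0):
    """Draw a symmetric adjacency matrix from a degree-corrected block model."""
    rng = random.Random(seed)
    n = len(c)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # assumption: no self-loops
            # Assumption: clip at 1 so the value is a valid probability.
            p = min(1.0, theta[i] * theta[j] * P[c[i]][c[j]])
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A


# Hypothetical example: node 0 is a "hub" with a large degree parameter.
labels = [0, 0, 0, 1, 1, 1]
theta = [2.0, 0.5, 0.5, 1.0, 1.0, 1.0]
A = sample_dcbm(labels, theta, [[0.4, 0.05], [0.05, 0.4]])
```

With all `theta` equal to 1 this reduces to the standard block model sampler.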

  15. Example: Karate club. [Figure: two panels showing the karate club network, titled “Block model” and “With degree-correction”; nodes labeled 0 to 33.]

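  16. Notation. For any community label assignment $e = \{e_1, \ldots, e_n\}$, $e_i \in \{1, \ldots, K\}$, define
      $$O_{kl} = \sum_{ij} A_{ij} I\{e_i = k, e_j = l\} \quad \text{(\# edges between communities } k \text{ and } l\text{)},$$
      $$O_k = \sum_{l} O_{kl} \quad \text{(total degrees in community } k\text{)},$$
      $$L = \sum_{kl} O_{kl} \quad \text{(total \# edges)},$$
      $$n_k = \sum_{i} I\{e_i = k\} \quad \text{(\# nodes in community } k\text{)}.$$
      These quantities depend only on the data and the candidate assignment $e$, not on any model parameters.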
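
The counts defined above can be computed directly from an adjacency matrix and a label vector. The sketch below assumes 0-indexed labels, and the example graph (two triangles joined by one edge) is hypothetical. Because the sum runs over ordered pairs $(i, j)$, each undirected edge contributes twice to $O_{kl}$ and to $L$.

```python
def block_counts(A, e, K):
    """Return (O, O_k, L, n_k) for adjacency matrix A and labels e in 0..K-1."""
    n = len(A)
    O = [[0] * K for _ in range(K)]
    for i in range(n):
        for j in range(n):  # ordered pairs, so each edge is counted twice
            O[e[i]][e[j]] += A[i][j]
    O_k = [sum(row) for row in O]        # total degree in each community
    L = sum(O_k)                         # sum of all entries of A
    n_k = [e.count(k) for k in range(K)]  # community sizes
    return O, O_k, L, n_k


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

O, O_k, L, n_k = block_counts(A, [0, 0, 0, 1, 1, 1], 2)
```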

  17. Likelihood. Maximize the profile likelihood of the block model (Bickel & Chen 2009):
      $$Q_{BL}(e) = \sum_{kl} O_{kl} \log \frac{O_{kl}}{n_k n_l}.$$
      Maximize the profile likelihood of the degree-corrected block model (Karrer & Newman 2010):
      $$Q_{DCBL}(e) = \sum_{kl} O_{kl} \log \frac{O_{kl}}{O_k O_l}.$$
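
Both profile likelihood criteria are simple functions of the block counts, as the sketch below illustrates. It takes the matrix $O$ and the sizes $n_k$ as inputs and uses the convention $0 \log 0 = 0$, which is an assumption added here. The two count matrices are hypothetical: they correspond to a 6-node graph made of two triangles joined by one edge, under the natural split and under an alternating split.

```python
import math


def likelihood_criteria(O, n_k):
    """Return (Q_BL, Q_DCBL) from block counts O and community sizes n_k."""
    K = len(O)
    O_k = [sum(row) for row in O]
    q_bl = 0.0
    q_dcbl = 0.0
    for k in range(K):
        for l in range(K):
            if O[k][l] > 0:  # convention (assumed): 0 * log 0 = 0
                q_bl += O[k][l] * math.log(O[k][l] / (n_k[k] * n_k[l]))
                q_dcbl += O[k][l] * math.log(O[k][l] / (O_k[k] * O_k[l]))
    return q_bl, q_dcbl


# Hypothetical counts: natural split vs. alternating split of the same graph.
good = likelihood_criteria([[6, 1], [1, 6]], [3, 3])
bad = likelihood_criteria([[2, 5], [5, 2]], [3, 3])
```

In this toy example both criteria score the natural split higher than the alternating one.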

  18. Modularity. Maximize the observed number of edges within communities minus the number expected under a null model, over all label assignments $e$:
      $$\max_e Q(e), \qquad Q(e) = \sum_{ij} \left[ A_{ij} - E[A_{ij}] \right] I(e_i = e_j),$$
      where $E[A_{ij}]$ is the (estimated) expectation under the null model.

  19. Modularity. When the null model is the Erdős-Rényi graph, $E[A_{ij}] = L / n^2$ and $Q(e)$ becomes
      $$Q_{ERM}(e) = \sum_k \left( O_{kk} - \frac{n_k^2}{n^2} L \right).$$
      When the null model is the expected degree random graph, $E[A_{ij}] = k_i k_j / L$, where $k_i$ is the degree of node $i$, and $Q(e)$ becomes
      $$Q_{NGM}(e) = \sum_k \left( O_{kk} - \frac{O_k^2}{L} \right).$$
      This is the well-known Newman-Girvan modularity.
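
The two modularity criteria can likewise be evaluated from the block counts alone. The sketch below is a direct transcription of the two formulas; the function name and the example count matrices (two triangles joined by one edge, under the natural and an alternating split) are hypothetical.

```python
def modularity_criteria(O, n_k):
    """Return (Q_ERM, Q_NGM) from block counts O and community sizes n_k."""
    K = len(O)
    n = sum(n_k)
    O_k = [sum(row) for row in O]
    L = sum(O_k)
    # Erdos-Renyi null model: expected within-community count is n_k^2 * L / n^2.
    q_erm = sum(O[k][k] - n_k[k] ** 2 * L / n ** 2 for k in range(K))
    # Expected-degree null model: expected within-community count is O_k^2 / L.
    q_ngm = sum(O[k][k] - O_k[k] ** 2 / L for k in range(K))
    return q_erm, q_ngm


# Hypothetical counts: natural split vs. alternating split of the same graph.
good = modularity_criteria([[6, 1], [1, 6]], [3, 3])
bad = modularity_criteria([[2, 5], [5, 2]], [3, 3])
```

The two criteria coincide in this example because the communities have equal size and equal total degree; they differ once $n_k / n$ and $O_k / L$ differ.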

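  20. Community detection criteria.

      | Criterion  | Block model | Degree correction |
      |------------|-------------|-------------------|
      | Modularity | $\sum_k \left( O_{kk} - \frac{n_k^2}{n^2} L \right)$ | $\sum_k \left( O_{kk} - \frac{O_k^2}{L} \right)$ |
      | Likelihood | $\sum_{kl} O_{kl} \log \frac{O_{kl}}{n_k n_l}$ | $\sum_{kl} O_{kl} \log \frac{O_{kl}}{O_k O_l}$ |

      The block model measures “community size” by the number of nodes, and the degree-corrected block model by the number of edges. Modularity rewards partitions in which the number of edges within communities is larger than expected under the null model.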

  21. Consistency of label assignments. Strong consistency (Bickel & Chen 2009): a label estimator $\hat{c}$ is strongly consistent if
      $$P[\hat{c} = c] \to 1 \quad \text{as } n \to \infty.$$
      Weak consistency: a label estimator $\hat{c}$ is weakly consistent if
      $$\forall \varepsilon > 0, \quad P\left[ \frac{1}{n} \sum_{i=1}^{n} 1(\hat{c}_i \neq c_i) < \varepsilon \right] \to 1 \quad \text{as } n \to \infty.$$
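
The quantity inside the weak consistency definition is the fraction of misclassified nodes. The sketch below computes it with one addition that is an assumption, not something stated on the slide: since community labels are only meaningful up to relabeling, the error is minimized over all permutations of the $K$ labels. The function name and test vectors are hypothetical, and the brute-force search is only practical for small $K$.

```python
from itertools import permutations


def misclassification_rate(c_hat, c, K):
    """Fraction of nodes with c_hat != c, minimized over label permutations."""
    n = len(c)
    best = n
    for perm in permutations(range(K)):  # assumption: labels matter only up to relabeling
        errors = sum(1 for a, b in zip(c_hat, c) if perm[a] != b)
        best = min(best, errors)
    return best / n


truth = [0, 0, 0, 1, 1, 1]
rate_swapped = misclassification_rate([1, 1, 1, 0, 0, 0], truth, 2)
rate_one_off = misclassification_rate([1, 1, 0, 0, 0, 0], truth, 2)
```

Strong consistency corresponds to this rate being exactly zero with probability tending to one; weak consistency only requires it to fall below any fixed $\varepsilon$.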

  22. Consistency of label assignments. Parametrize the probability matrix by $P_n = \rho_n P$, where $\rho_n = P(A_{ij} = 1)$ is the probability of an edge, and $\lambda_n = n \rho_n$ is the average expected degree of the graph. Strong consistency assumes that $\lambda_n / \log n \to \infty$. Weak consistency assumes that $\lambda_n \to \infty$.

  23. A variant of the degree-corrected block model (our interpretation of Karrer & Newman). Given node labels $c$, each node is independently assigned a discrete “degree variable” $\theta_i$, with $E[\theta_i] = 1$ for identifiability. Given $c$ and $\theta$, the edges $A_{ij}$ are independent Bernoulli random variables with $P(A_{ij} = 1 \mid c, \theta) = \theta_i \theta_j P_{c_i c_j}$.

  24. A general theorem on consistency under degree-corrected block models. Theorem (Zhao, Levina, and Zhu 2011a): For any criterion $Q$ of the form
      $$Q(e) = F\left( \frac{O}{n^2}, \frac{n_1}{n}, \ldots, \frac{n_K}{n} \right),$$
      if $F$ satisfies some regularity conditions and its population version is uniquely maximized by the true partition, then $Q$ is consistent under degree-corrected block models.

  26. Notation. For simplicity, assume $\theta_i$ in the degree-corrected block model is discrete, with $P(c_i = k, \theta_i = d_m) = \Pi_{km}$. For any $k$, define $\tilde{\pi}_k = \sum_m d_m \Pi_{km}$. (For the standard block model, $\tilde{\pi}_k = \pi_k$.) Define
      $$\tilde{P}_0 = \sum_{kk'} \tilde{\pi}_k \tilde{\pi}_{k'} P_{kk'}, \qquad \tilde{W}_{kk'} = \frac{\tilde{\pi}_k \tilde{\pi}_{k'} P_{kk'}}{\tilde{P}_0}, \qquad \tilde{E} = \tilde{W} - (\tilde{W} \mathbf{1})(\tilde{W} \mathbf{1})^T.$$
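
The matrix $\tilde{E}$ is easy to evaluate numerically for given parameters. The sketch below is a direct transcription of the three definitions above; the function name `e_tilde` and the two-community parameter values are hypothetical.

```python
def e_tilde(pi_tilde, P):
    """Compute E~ = W~ - (W~ 1)(W~ 1)^T from pi~ and the K x K matrix P."""
    K = len(pi_tilde)
    # P~_0: normalizing constant so that the entries of W~ sum to 1.
    P0 = sum(pi_tilde[a] * pi_tilde[b] * P[a][b] for a in range(K) for b in range(K))
    W = [[pi_tilde[a] * pi_tilde[b] * P[a][b] / P0 for b in range(K)] for a in range(K)]
    row = [sum(W[a]) for a in range(K)]  # the vector W~ 1
    return [[W[a][b] - row[a] * row[b] for b in range(K)] for a in range(K)]


# Hypothetical parameters: two balanced communities, denser within than between.
E = e_tilde([0.5, 0.5], [[0.3, 0.1], [0.1, 0.3]])
```

For these values the diagonal entries of `E` are positive and the off-diagonal entries negative.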

  28. Consistency of modularity. Theorem (Zhao, Levina, and Zhu 2011a): Newman-Girvan modularity is consistent under the degree-corrected block model with the parameter constraint
      $$\tilde{E}_{kk} > 0, \qquad \tilde{E}_{kk'} < 0 \quad \text{for all } k \neq k'.$$
      When $K = 2$, the condition can be simplified as $P_{11} P_{22} > P_{12}^2$.
      Theorem (Zhao, Levina, and Zhu 2011a): Erdős-Rényi modularity is consistent under the block model with the parameter constraint
      $$P_{kk} > P_0, \qquad P_{kk'} < P_0 \quad \text{for all } k \neq k',$$
      where $P_0 = \sum_{kk'} \pi_k \pi_{k'} P_{kk'}$.
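
The simplification for $K = 2$ can be checked by hand. The following worked derivation is a sketch, not taken from the slides; the shorthand $a$, $b$, $d$ is introduced here for illustration, and it assumes $\tilde{\pi}_1, \tilde{\pi}_2 > 0$.

```latex
% Shorthand (introduced for this sketch): a = \tilde{W}_{11}, b = \tilde{W}_{12} = \tilde{W}_{21}, d = \tilde{W}_{22}.
% By the definition of \tilde{P}_0, the entries of \tilde{W} sum to one: a + 2b + d = 1.
\begin{align*}
\tilde{E}_{11} &= a - (a + b)^2 = a(a + 2b + d) - (a + b)^2 = ad - b^2, \\
\tilde{E}_{22} &= d - (b + d)^2 = d(a + 2b + d) - (b + d)^2 = ad - b^2, \\
\tilde{E}_{12} &= b - (a + b)(b + d) = b(a + 2b + d) - (a + b)(b + d) = b^2 - ad.
\end{align*}
% All three sign conditions reduce to ad > b^2. Substituting the definition of \tilde{W}:
\[
ad > b^2
\iff
\frac{\tilde{\pi}_1^2 \tilde{\pi}_2^2 P_{11} P_{22}}{\tilde{P}_0^2}
> \frac{\tilde{\pi}_1^2 \tilde{\pi}_2^2 P_{12}^2}{\tilde{P}_0^2}
\iff
P_{11} P_{22} > P_{12}^2 .
\]
```

Notably, the degree parameters drop out of the final condition, which involves only the entries of $P$.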

  30. Consistency of likelihood. Theorem (Bickel & Chen 2009): Block model likelihood is consistent under the block model. Theorem (Zhao, Levina, and Zhu 2011a): Degree-corrected block model likelihood is consistent under both the block model and the degree-corrected block model.

  31. Summary of consistency results Likelihoods are always consistent under their assumed model
