  1. Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph

  2. Citation => Published in PLoS ONE, volume 6, November 2011 => Authors: Yang Wang, Zengru Di, and Ying Fan, all from the Department of Systems Science, Beijing Normal University, China

  3. Overview • Networks represent the interaction structure among components in a wide range of real complex systems • Exploring network communities • reveals how the network is organized • provides a new perspective on dynamic processes • uncovers the relationships among the nodes • This paper devises a new approach to identify the important nodes without knowing the exact partition of the network

  4. Construction • Based on the observation that the spectrum of the adjacency matrix indicates the community structure of a network • Distinguishes two kinds of critical nodes: • community core: eigenvalues • bridge: graph Laplacian • Experiments on synthetic and real networks

  5. Definitions • Eigenvector: A non-zero column vector v is an eigenvector of a matrix A iff there exists a number λ such that Av = λv. • Eigenvalue: The number λ is called the eigenvalue corresponding to that eigenvector v.
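
The definition can be checked numerically. The snippet below is a minimal sketch using NumPy; the 3-node path graph is an illustrative choice and does not come from the slides.

```python
import numpy as np

# Adjacency matrix of a 3-node path graph 0 - 1 - 2 (illustrative example).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# eigh is meant for symmetric matrices: eigenvalues are returned in
# ascending order and the eigenvectors are the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(A)

lam = eigenvalues[-1]       # largest eigenvalue
v = eigenvectors[:, -1]     # matching eigenvector, unit length

# The defining relation A v = lambda v holds up to floating-point error.
residual = np.linalg.norm(A @ v - lam * v)
print(lam, residual)
```

Because the matrix is symmetric, the eigenvectors returned by `eigh` are also mutually orthogonal and normalized, which is the property the centrality metric relies on.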

  6. Identifying important nodes • Proposed method: a centrality metric based on the spectrum of the adjacency matrix • Definitions: binary network G=(V,E) • |V| = n, |E| = m • Eigenvectors are orthogonal and normalized • Objective function: • Estimate the change in the eigenvalues (λ) using perturbation theory • P_k is the relative change in the c largest eigenvalues as node k is removed

  7. Centrality Metric • P_k = Σ_{i=1}^{c} (v_ik)^2, where v_ik is the k-th element of v_i and P_k lies in the interval [0,1]. If a node k is important to the community structure, P_k will be large • In a network with n nodes and c communities, Σ_k P_k = c • To scale the index to 1, I_k = P_k / c, where Σ_k I_k = 1 • If the index I_k is larger than 1/n, node k is an important node
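
The metric can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: it assumes P_k is the sum of squared k-th components of the eigenvectors belonging to the c largest adjacency eigenvalues, and the toy graph and the name `community_centrality` are hypothetical.

```python
import numpy as np

def community_centrality(A, c):
    """I_k = P_k / c, with P_k assumed to be the sum of v_ik^2 over the c largest eigenvalues."""
    A = np.asarray(A, dtype=float)
    # eigh: ascending eigenvalues, orthonormal eigenvectors as columns.
    _, vectors = np.linalg.eigh(A)
    top = vectors[:, -c:]             # eigenvectors of the c largest eigenvalues
    P = np.sum(top ** 2, axis=1)      # each P_k lies in [0, 1]
    return P / c                      # scaled index, sums to 1 over all nodes

# Hypothetical toy graph: two 4-cliques, {0,1,2,3} and {5,6,7,8},
# joined only through connector node 4.
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(5, 9) for j in range(i + 1, 9)]
edges += [(3, 4), (4, 5)]

n = 9
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

index = community_centrality(A, c=2)
important = [k for k in range(n) if index[k] > 1.0 / n]
print(np.round(index, 3))
print("nodes above 1/n:", important)
```

The number of communities `c` has to be supplied by the caller; the partition itself is never computed.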

  8. Distinguish two kinds of important nodes • RatioCut technique: the cut of each community C_i is divided by |C_i|, the size of that community. The RatioCut problem reduces to the Mincut problem when the sizes of the communities are almost the same. • Case 1: c = 2. The partition is encoded by an index vector s with n elements

  9. Continued • The RatioCut function can be written in terms of the quadratic form s^T L s, where L is the graph Laplacian defined as L_ij = -A_ij for i≠j and L_ii = k_i, with k_i the degree of node i. There are also two constraints on s
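
For reference, the two-community case can be written out in the standard textbook form of RatioCut. The normalization of s used here is an assumption chosen so that the two constraints come out cleanly; the slides' own normalization may differ by a constant factor.

```latex
% Assumed normalization; cut(C_1, C_2) is the number of edges between the two communities.
\[
\mathrm{RatioCut}(C_1, C_2) = \mathrm{cut}(C_1, C_2)\left(\frac{1}{|C_1|} + \frac{1}{|C_2|}\right)
\]
\[
s_i =
\begin{cases}
 +\sqrt{\dfrac{|C_2|}{n\,|C_1|}} & i \in C_1 \\[2ex]
 -\sqrt{\dfrac{|C_1|}{n\,|C_2|}} & i \in C_2
\end{cases}
\]
\[
s^{T} L s = \frac{1}{2}\sum_{i,j} A_{ij}\,(s_i - s_j)^2 = \mathrm{RatioCut}(C_1, C_2),
\qquad \sum_i s_i = 0, \qquad \sum_i s_i^2 = 1
\]
```

Relaxing s to any real vector that satisfies the two constraints turns the discrete partition problem into an eigenvalue problem for L.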

  10. Continued • The partition problem can be posed as a minimization problem over s • The solution to this problem is the eigenvector corresponding to the second-smallest eigenvalue of L, denoted by u_2 • Community core nodes: |u_i2|, the magnitude of the i-th element of u_2, is relatively large • Bridge nodes: |u_i2| is near zero
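
The two-community rule can be tried on a small graph. The sketch below assumes NumPy and a hypothetical toy graph (two 4-cliques joined through a single connector node); it only illustrates the rule and is not taken from the paper.

```python
import numpy as np

# Hypothetical toy graph: two 4-cliques, {0,1,2,3} and {5,6,7,8},
# joined only through connector node 4.
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(5, 9) for j in range(i + 1, 9)]
edges += [(3, 4), (4, 5)]

n = 9
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Graph Laplacian: L_ii = k_i (degree of node i), L_ij = -A_ij for i != j.
L = np.diag(A.sum(axis=1)) - A

eigenvalues, U = np.linalg.eigh(L)   # eigenvalues in ascending order
u2 = U[:, 1]                         # eigenvector of the second-smallest eigenvalue

# The sign of each element gives the side of the cut; a magnitude near
# zero marks a node that sits between the two communities.
side = np.sign(np.round(u2, 12))
bridge_candidate = int(np.argmin(np.abs(u2)))
print(np.round(u2, 3))
print("closest to zero:", bridge_candidate)
```

The overall sign of `u2` is arbitrary, so only relative signs and magnitudes carry meaning.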

  11. Continued • Case 2: c > 2. A new n × c index matrix S is defined as s_ij = 1/√|C_j| if vertex i ∈ C_j, else 0. RatioCut = Tr(S^T L S). L is a symmetric matrix, so it can be written as L = U D U^T, where the columns of U are the eigenvectors of L and D is the diagonal matrix of eigenvalues, D_ii = β_i. RatioCut can then be written as Tr(S^T U D U^T S)

  12. Continued • Define the vertex vector of vertex i as r_i, with [r_i]_j = U_ij. Given that the network has almost equal-sized communities, RatioCut can be rewritten in terms of the sum of the vertex vectors within each community. [G_k: set of vertices in community k] Minimizing the RatioCut then equates to a maximization problem over the first p components of these sums, where p is a parameter. For clear community structure, p = c can be chosen.
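
The step from RatioCut to a maximization problem can be reconstructed from the definitions of S, U, D and r_i alone. The derivation below is such a reconstruction and may differ in constants or presentation from the equations in the original slides.

```latex
% Reconstruction from the stated definitions; G_k denotes community k (written C_k earlier).
\[
\mathrm{RatioCut} = \mathrm{Tr}\!\left(S^{T} U D U^{T} S\right)
 = \sum_{j=1}^{n} \beta_j \sum_{k=1}^{c} \left[(U^{T} S)_{jk}\right]^2,
\qquad
(U^{T} S)_{jk} = \frac{1}{\sqrt{|G_k|}} \sum_{i \in G_k} [r_i]_j
\]
% Almost equal-sized communities: |G_k| \approx n / c.
\[
\mathrm{RatioCut} \approx \frac{c}{n} \sum_{j=1}^{n} \beta_j \sum_{k=1}^{c}
 \Bigl( \sum_{i \in G_k} [r_i]_j \Bigr)^{2}
\]
% The total weight \sum_{j,k} [(U^T S)_{jk}]^2 = Tr(S^T S) = c is fixed, so with
% \beta_1 \le \beta_2 \le \dots the cut is small when the weight sits in the first p terms:
\[
\max_{\{G_k\}} \; \sum_{k=1}^{c} \sum_{j=1}^{p} \Bigl( \sum_{i \in G_k} [r_i]_j \Bigr)^{2}
\]
```

A vertex whose vector is short in the first p components contributes little to this sum whichever community it joins, which is why a short vertex vector signals a bridge.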

  13. Continued • If the community structure is quite clear, the magnitude |r_i| of the vertex vector over its first p terms identifies the bridge nodes; this index is denoted by b. If the index b of a given vertex is near zero, the presence of that node results in a large RatioCut, and hence it is a bridge node.

  14. Continued • In order to scale the index to 1, a new term w_k is defined as w_k = b_k / c • Considering an ER random network with n nodes as a null model, the index of each node would be 1/n • If the w-score of a node is smaller than 1/n, this vertex has nearly equal membership in more than one community, and hence it is a bridge node.
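
A w-score computation might look like the sketch below. It assumes that b_k is the squared magnitude of the vertex vector r_k over the first p Laplacian eigenvectors (those with the smallest eigenvalues), which is the choice that makes the scores sum to 1 when p = c; the function name `w_score` and the toy graph are hypothetical.

```python
import numpy as np

def w_score(A, c, p=None):
    """w_k = b_k / c, with b_k assumed to be the sum of U_kj^2 over the first p eigenvectors of L."""
    A = np.asarray(A, dtype=float)
    p = c if p is None else p            # p = c for clear community structure
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
    _, U = np.linalg.eigh(L)             # columns ordered by ascending eigenvalue
    b = np.sum(U[:, :p] ** 2, axis=1)    # squared length of r_k in its first p terms
    return b / c

# Hypothetical toy graph: two 4-cliques, {0,1,2,3} and {5,6,7,8},
# joined only through connector node 4.
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(5, 9) for j in range(i + 1, 9)]
edges += [(3, 4), (4, 5)]

n = 9
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

w = w_score(A, c=2)
bridges = [k for k in range(n) if w[k] < 1.0 / n]
print(np.round(w, 3))
print("w-score below 1/n:", bridges)
```

In this toy graph the connector node 4 receives the smallest score. The scores sum to 1 only when `p` equals `c`.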

  15. Pros of this approach • Low computational cost: O(mn)

  16. Experimental Results • Synthetic Network • The centrality metric I predicts nodes 1, 8 and 15 as important nodes. The w-score identifies node 15 as the bridge node • The ΔH index also gives the correct prediction, but at significant computational cost • M can identify cores only

  17. Experimental Results (contd.) Real World Network: Zachary’s karate club (social network) with c=2 • The centrality metric I identifies the community cores: node 1 (the instructor) and node 34 (the administrator). • The w-score identifies node 3 as the overlapping node, i.e. the bridge between these two communities

  18. Zachary’s karate club visualization • The diameter of each vertex is proportional to I • A large diameter indicates an important vertex • The color of each vertex is related to its w-score • Red vertices behave like “overlapping” nodes or bridges • Yellow vertices lie inside their own communities

  19. Word Association Network • Four communities: Intelligence, Astronomy, Light, Colors • The word Bright is related to all of them, as is Sun • Community critical nodes: Bright, Sun, Moon, Smart • Community cores: Moon and Smart • Bridges: Bright and Sun

  20. Scientist Collaboration Network • The network represents scientists whose research centers on the properties of networks of one kind or another • Edges are placed between scientists who have published at least one paper together • The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi • Their w-scores are not large, because they also collaborate with scientists outside their own communities

  21. C. elegans neural network • The network is divided into 3 communities (sensory, interneuron, motor neuron) • Each node represents a neuron and each edge represents a synaptic connection between neurons • High centrality metric I: important interneurons (AVA, AVB, …) • The w-score is very small for most of the important nodes: they act as bridges, because connections between communities are essential in this network

  22. Applications in weighted networks: Artificial Network • The adjacency matrix of an undirected network is real and symmetric • Works well in a small artificial network • 10 nodes with two communities • A higher weight means a closer relationship between vertices • 4 and 9 are the cores of the communities • 11 is the bridge between communities

  23. Applications in weighted networks (Contd.) Real Network: SFI (Santa Fe Institute) collaboration network • Vertices 2, 12 and 24 are group leaders (community cores) • Vertices 1, 9 and 11 are bridges • The result is different from that of the corresponding unweighted network • Edge weights might affect the results

  24. Limitations • When cluster sizes are highly heterogeneous, community identification fails • This limitation is a result of a property of the adjacency matrix • When N_small^2 < N_large, small communities cannot be detected • Size ratio: δ = N_large / N_small • The index I cannot identify the important nodes in the small communities when community sizes differ greatly

  25. Conclusion/Observation • The proposed method works well in many cases without knowing the exact community structure • However, the number of communities must be known • The paper does not say anything about the effect of removing or adding a node • Changes in the underlying community structure are not taken into consideration • The directed case is not considered and is left for future research • The identification of such key nodes is important and could potentially be used • to identify the organizer of a community in social networks, • to develop an immunization strategy in an epidemic process, • to identify key nodes in biological networks


