Identifying and Characterizing Nodes Important to Community Structure Using the Spectrum of the Graph
Citation => Published in PLoS ONE, volume 6, November 2011 => Authors: Yang Wang, Zengru Di, and Ying Fan, all from the Department of Systems Science, Beijing Normal University, China
Overview • Networks represent the interaction structure among components in a wide range of real complex systems • Exploring network communities • reveals the network structure • provides a new perspective on dynamic processes • uncovers the relationships among the nodes • This paper devises a new approach to identify the important nodes without knowing the exact partition of the network
Construction • Based on the observation that the spectrum of the adjacency matrix gives an indication of the community structure in a network • Distinguishes the critical nodes as • community cores (eigenvalues) • bridges (graph Laplacian) • Experiments on synthetic and real networks
Definitions • Eigenvector: A non-zero column vector v is an eigenvector of a matrix A iff there exists a number λ such that Av = λv. • Eigenvalue: The number λ is called the eigenvalue corresponding to the eigenvector v.
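As a quick numerical check of these definitions, the sketch below uses NumPy on a small symmetric matrix and verifies Av = λv for every eigenpair. The matrix is a made-up example, not taken from the paper.

```python
import numpy as np

# Hypothetical 2x2 symmetric matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
# and the eigenvectors are the (orthonormal) columns of the second result
eigenvalues, eigenvectors = np.linalg.eigh(A)

for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    # The defining relation: A v = lambda v
    assert np.allclose(A @ v, lam * v)
```

The same call is a natural fit for the adjacency matrix and the Laplacian of an undirected network, since both are real and symmetric.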
Identifying important nodes • Proposed method: a centrality metric based on the spectrum of the adjacency matrix • Definitions: binary network G = (V, E) • |V| = n, |E| = m • Eigenvectors are orthogonal and normalized • Objective function: • Find the nodes whose removal changes the eigenvalues (λ) the most, estimated using perturbation theory • P_k is the relative change in the c largest eigenvalues when node k is removed
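To make the idea of "relative change in an eigenvalue when a node is removed" concrete, here is a brute-force sketch. It is not the paper's perturbation formula: it recomputes the spectrum after deleting each node, and for simplicity it tracks only the largest eigenvalue (c = 1). The star graph and the function name `relative_eigenvalue_drop` are assumptions made for illustration.

```python
import numpy as np

def relative_eigenvalue_drop(A, k):
    """Exact relative change in the largest eigenvalue when node k is removed."""
    lam = np.linalg.eigvalsh(A)[-1]                 # largest eigenvalue of A
    keep = [i for i in range(A.shape[0]) if i != k]
    A_removed = A[np.ix_(keep, keep)]               # delete row and column k
    lam_removed = np.linalg.eigvalsh(A_removed)[-1]
    return (lam - lam_removed) / lam

# Hypothetical star graph: node 0 is the hub, nodes 1..4 are leaves
n = 5
A = np.zeros((n, n))
A[0, 1:] = 1
A[1:, 0] = 1

drops = np.array([relative_eigenvalue_drop(A, k) for k in range(n)])

# Squared components of the eigenvector belonging to the largest eigenvalue
v = np.linalg.eigh(A)[1][:, -1]
squared_components = v ** 2
```

On this graph both quantities single out the hub: removing it destroys every edge, and it also carries the largest squared eigenvector component. Recomputing the spectrum for every node is expensive, which is the motivation for an estimate based on the eigenvectors of the intact network.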
Centrality Metric • P_k = Σ_{i=1..c} (v_ik)^2, where v_ik is the k-th element of v_i and P_k lies in the interval [0, 1]. If node k is important to the community structure, P_k will be large • In a network with n nodes and c communities, Σ_k P_k = c • To scale the index so that it sums to 1, define I_k = P_k / c, so that Σ_k I_k = 1 • If the index I_k is larger than 1/n, node k is an important node
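A minimal sketch of the centrality index, assuming P_k is the sum of the squared k-th components of the eigenvectors belonging to the c largest adjacency eigenvalues (an assumption consistent with P_k lying in [0, 1] and with I_k = P_k / c summing to 1). The test graph, two 4-node cliques joined through a ninth node, and the function name `centrality_index` are hypothetical.

```python
import numpy as np

def centrality_index(A, c):
    """I_k = P_k / c, with P_k summed over the eigenvectors of the c largest eigenvalues."""
    _, eigenvectors = np.linalg.eigh(A)   # columns sorted by ascending eigenvalue
    top = eigenvectors[:, -c:]            # n x c block for the c largest eigenvalues
    P = np.sum(top ** 2, axis=1)          # P_k for every node k
    return P / c

# Hypothetical network: cliques {0,1,2,3} and {4,5,6,7}, node 8 linked to 0 and 4
A = np.zeros((9, 9))
A[0:4, 0:4] = 1
A[4:8, 4:8] = 1
np.fill_diagonal(A, 0)
A[0, 8] = A[8, 0] = 1
A[4, 8] = A[8, 4] = 1

I = centrality_index(A, c=2)
n = A.shape[0]
important = np.flatnonzero(I > 1 / n)     # nodes above the 1/n threshold
```

Because the eigenvectors are orthonormal, the index sums to 1 over all nodes, which is what makes 1/n a natural reference value.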
Distinguish two kinds of important nodes • RatioCut technique: |C_i| is the size of community C_i. The RatioCut problem reduces to the Mincut problem when the sizes of the communities are almost the same. • Case 1: c = 2 • Index vector s with n elements
Continued • The RatioCut function becomes proportional to s^T L s, where L is the graph Laplacian, defined as L_ij = -A_ij for i ≠ j and L_ii = k_i, with k_i the degree of node i. • There are also two constraints on s
Continued • The partition problem can be formulated as a constrained minimization problem • The solution is the eigenvector corresponding to the second-smallest eigenvalue of L, denoted by u_2 • Community core nodes: |(u_2)_i| is relatively large • Bridge nodes: |(u_2)_i| is near zero
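The two-community case can be sketched as follows: build the Laplacian, take the eigenvector of its second-smallest eigenvalue, and inspect the magnitude of each component. The network (two 4-node cliques joined through node 8) is a hypothetical example, not one of the paper's data sets.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian: L_ii = degree of node i, L_ij = -A_ij for i != j."""
    return np.diag(A.sum(axis=1)) - A

# Hypothetical network: cliques {0,1,2,3} and {4,5,6,7}, node 8 linked to 0 and 4
A = np.zeros((9, 9))
A[0:4, 0:4] = 1
A[4:8, 4:8] = 1
np.fill_diagonal(A, 0)
A[0, 8] = A[8, 0] = 1
A[4, 8] = A[8, 4] = 1

L = laplacian(A)
eigenvalues, U = np.linalg.eigh(L)   # ascending eigenvalues, eigenvectors as columns
u2 = U[:, 1]                         # eigenvector of the second-smallest eigenvalue

# The sign of u2 splits the nodes into two groups; a magnitude near zero
# marks a node that sits between them
bridge_candidate = int(np.argmin(np.abs(u2)))
```

In this symmetric example node 8 gets a component of essentially zero, while the two cliques receive components of opposite sign.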
Continued • Case 2: c > 2 • A new n × c index matrix S is defined as s_ij = 1/√|C_j| if vertex i ∈ C_j, else 0 • RatioCut = Tr(S^T L S) • L is a symmetric matrix, which can be written as L = U D U^T, where the columns of U are the eigenvectors of L and D is the diagonal matrix of eigenvalues, D_ii = β_i • RatioCut can then be written as Tr(S^T U D U^T S) = Σ_{j=1..n} β_j Σ_{k=1..c} [(U^T S)_jk]^2
Continued • Define the vertex vector of node i as r_i, with [r_i]_j = U_ij • RatioCut can then be rewritten in terms of the vertex vectors, given that the network has communities of almost equal size [G_k: set of vertices in community k] • Minimizing the RatioCut is equivalent to a maximization problem, where p is a parameter • For a clear community structure, p = c can be chosen
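The trace form of RatioCut can be checked numerically. The sketch below builds the index matrix S for a hypothetical partition and compares three values: Tr(S^T L S), a direct count of cut edges divided by community size, and the spectral expansion through L = U D U^T. The network, the partition, and the convention RatioCut = Σ_j cut(C_j) / |C_j| (with no factor of 1/2) are assumptions made for this example.

```python
import numpy as np

# Hypothetical network: cliques {0,1,2,3} and {4,5,6,7}, node 8 linked to 0 and 4
A = np.zeros((9, 9))
A[0:4, 0:4] = 1
A[4:8, 4:8] = 1
np.fill_diagonal(A, 0)
A[0, 8] = A[8, 0] = 1
A[4, 8] = A[8, 4] = 1
L = np.diag(A.sum(axis=1)) - A

# Hypothetical partition: node 8 is assigned to the first community
communities = [[0, 1, 2, 3, 8], [4, 5, 6, 7]]
n, c = A.shape[0], len(communities)

# Index matrix: s_ij = 1 / sqrt(|C_j|) if node i is in C_j, else 0
S = np.zeros((n, c))
for j, members in enumerate(communities):
    S[members, j] = 1 / np.sqrt(len(members))

ratio_cut_trace = np.trace(S.T @ L @ S)

def cut_size(A, members):
    """Total weight of edges leaving the given set of nodes."""
    inside = np.zeros(A.shape[0], dtype=bool)
    inside[members] = True
    return A[inside][:, ~inside].sum()

ratio_cut_direct = sum(cut_size(A, m) / len(m) for m in communities)

# Spectral expansion: sum_j beta_j * sum_k (U^T S)_jk^2
beta, U = np.linalg.eigh(L)
M = U.T @ S
ratio_cut_spectral = np.sum(beta[:, None] * M ** 2)
```

All three values agree, which illustrates why the partition problem can be studied through the eigenvectors of L rather than through the partition itself.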
Continued • If the community structure is quite clear, the magnitude of the vertex vector |r_i| over its first p terms identifies the bridge nodes; this index is denoted by b • If the index b of a given vertex is near zero, the presence of that node results in a large RatioCut, and hence it is a bridge node
Continued • To scale the index so that it sums to 1, a new term w_k is defined as w_k = b_k / c • Taking an ER random network with n nodes as a null model, the index of each node would be 1/n • If the w-score of a node is smaller than 1/n, that vertex has nearly equal membership in more than one community, and hence it is a bridge node
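A hedged sketch of the w-score follows. It assumes that b_k is the squared magnitude of the first p components of the vertex vector r_k (the entries of row k of U for the p smallest Laplacian eigenvalues) and that p = c, so that the scores sum to 1. That reading is inferred from the scaling w_k = b_k / c and may differ in detail from the paper. The network and the function name `w_score` are hypothetical.

```python
import numpy as np

def w_score(A, c, p=None):
    """w_k = b_k / c, with b_k assumed to be the sum of U_kj^2 over the first p eigenvectors of L."""
    p = c if p is None else p
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)               # eigenvectors sorted by ascending eigenvalue
    b = np.sum(U[:, :p] ** 2, axis=1)      # assumed definition of the bridge index b
    return b / c

# Hypothetical network: cliques {0,1,2,3} and {4,5,6,7}, node 8 linked to 0 and 4
A = np.zeros((9, 9))
A[0:4, 0:4] = 1
A[4:8, 4:8] = 1
np.fill_diagonal(A, 0)
A[0, 8] = A[8, 0] = 1
A[4, 8] = A[8, 4] = 1

w = w_score(A, c=2)
n = A.shape[0]
lowest = int(np.argmin(w))                 # strongest bridge candidate
below_null = np.flatnonzero(w < 1 / n)     # nodes under the 1/n null-model value
```

With p = c the scores sum to 1, so 1/n plays the same reference role here as it does for the centrality index I. In this example the node joining the two cliques receives the lowest score.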
Pros of this approach • Lower computational cost: O(mn)
Experimental Results • Synthetic network • The centrality metric I predicts nodes 1, 8 and 15 as important nodes • The w-score identifies node 15 as the bridge node • The ΔH index also gives the correct prediction, but requires significant computational cost • M can identify cores only
Experimental Results (contd.) • Real-world network: Zachary’s karate club (social network) with c = 2 • The centrality metric I identifies the community cores: node 1 and node 34 (the instructor and the administrator, respectively) • The w-score identifies node 3 as the overlapping node, i.e. the bridge between these two communities
Zachary’s karate club visualization • The diameter of each vertex is proportional to I; a large diameter indicates an important vertex • The color of each vertex is related to its w-score • Red vertices behave like “overlapping” nodes or bridges • Yellow vertices lie inside their own communities
Word Association Network • Four communities: Intelligence, Astronomy, Light, Colors • The word Bright is related to all of them, as is Sun • Community critical nodes: Bright, Sun, Moon, Smart • Community cores: Moon and Smart • Bridges: Bright and Sun
Scientist Collaboration Network • The network represents scientists whose research centers on the properties of networks of one kind or another • Edges are placed between scientists who have published at least one paper together • The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi • Their w-scores are not large, because they also collaborate with scientists outside their own communities
C. elegans neural network • The network is divided into 3 communities (sensory neurons, interneurons, motor neurons) • Each node represents a neuron and each edge represents a synaptic connection between neurons • High centrality metric I: important interneurons (AVA, AVB, …) • Their w-scores are very small because most of the important nodes act as bridges, since the connections between communities are essential
Applications in weighted networks • Artificial network • The adjacency matrix of an undirected network is real and symmetric • Works well in a small artificial network: 10 nodes with two communities • A higher weight means a closer relationship between vertices • Vertices 4 and 9 are the cores of the communities • Vertex 11 is the bridge between communities
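Because a weighted undirected adjacency matrix is still real and symmetric, the same eigenvector-based index can be computed without modification. The sketch below uses a made-up 7-node weighted network (two tightly weighted triangles and one weakly attached node), not the artificial network from the slides, and assumes the index is I_k = P_k / c with P_k summed over the eigenvectors of the c largest eigenvalues.

```python
import numpy as np

# Hypothetical weighted edges (i, j, weight); higher weight = closer relationship
edges = [(0, 1, 3), (0, 2, 3), (1, 2, 2),
         (3, 4, 3), (3, 5, 3), (4, 5, 2),
         (2, 6, 1), (5, 6, 1)]

n = 7
W = np.zeros((n, n))
for i, j, weight in edges:
    W[i, j] = W[j, i] = weight            # undirected: keep the matrix symmetric

c = 2
_, eigenvectors = np.linalg.eigh(W)       # valid because W is real and symmetric
top = eigenvectors[:, -c:]                # eigenvectors of the c largest eigenvalues
I = np.sum(top ** 2, axis=1) / c          # weighted analogue of the centrality index
```

Changing the weights changes the eigenvectors, which is one way to see why a weighted network and its unweighted counterpart can rank nodes differently.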
Applications in weighted networks (contd.) • Real network: SFI (Santa Fe Institute) collaboration network • Vertices 2, 12 and 24 are group leaders (community cores) • Vertices 1, 9 and 11 are bridges • The result differs from that of the corresponding unweighted network: edge weights might affect the results
Limitations • When cluster sizes are highly heterogeneous, community identification fails • This limitation results from a property of the adjacency matrix • If N_small^2 < N_large, small communities cannot be detected • Size ratio: δ = N_large / N_small • The metric I cannot identify the important nodes in small communities when the communities differ greatly in size
Conclusion/Observation • The proposed method works well in many cases without knowing the exact community structure • However, the number of communities must be known • This paper does not say anything about the effect of removing or adding a node • Changes in the underlying community structure are not taken into consideration • The directed case is not considered and is left to future research • The identification of such key nodes is important and could potentially be used to identify the organizer of a community in social networks, to develop an immunization strategy in an epidemic process, or to identify key nodes in biological networks