A multilevel approach for overlapping community detection Alan Valejo, Jorge Valverde-Rebaza and Alneu de Andrade Lopes Department of Computer Science ICMC, University of São Paulo C.P. 668, CEP 13560-970, São Carlos, SP, Brazil {alan,jvalverr,alneu}@icmc.usp.br October, 2014
Outline 1. Introduction 2. Multilevel overlapping community detection 3. Experiments 4. Conclusion and Future Work
Introduction
Introduction
• Graph partitioning techniques aim to divide the set of vertices of a graph into k disjoint partitions
  - Social networks
  - Biological networks
  - Information networks
  - Technology networks
• Vertices belonging to the same partition share common properties and have similar roles
• Graph partitioning is useful for understanding the topological structure and dynamic processes of networks
Introduction
Many real-world networks have an overlapping community structure, i.e. a vertex can belong to more than one community
Figure: In (a), the network is partitioned into disjoint communities. In (b), the network has overlapping communities. The black vertices belong to more than one community.
For instance:
• In social networks, users naturally create relationships with others from various communities, such as family, friends, colleagues, etc. [Reid et al., 2013].
• In addition, online social network users may belong to many groups [Valverde-Rebaza and Lopes, 2014].
• This also occurs in other types of complex networks, such as biological networks, where a large fraction of proteins belong to many complexes [Gavin et al., 2006].
Introduction
The graph partitioning problem is NP-complete
• The identification of an optimal solution is a computationally expensive task
• Infeasible for large-scale networks
Big Data
  - Facebook, Web networks, Biological, Biomedical, ...
Introduction
Table: Algorithms for overlapping community detection with their respective computational complexity. Adapted from [Xie et al., 2013].
Algorithm  Reference                     Complexity
CPM        [Palla et al., 2005]          O(n^2)
LFM        [Lancichinetti et al., 2009]  O(n^2)
HCL        [Ahn et al., 2010]            O(deg_max^2 n)
Game       [Chen et al., 2010]           O(m^2)
iLCD       [Cazabet et al., 2010]        O(n k^2)
OSLOM      [Lancichinetti et al., 2011]  O(n^2)
NMF        [Psorakis et al., 2011]       O(c n^2)
UEOC       [Jin et al., 2011]            O(l n^2)
CIS        [Kelley et al., 2012]         O(n^2)
Most algorithms in the literature achieve good accuracy, but their high computational cost is prohibitive for large-scale problems
• Recently, extensive research has emerged on multilevel partitioning (MLP) strategies for large-scale networks [Bichot, 2013]
• However, this strategy has not been explored in the context of overlapping communities
We propose a multilevel approach to overlapping community detection
Proposal Multilevel overlapping community detection
Proposal
Figure: The multilevel overlapping community detection scheme.
Coarsening Phase
Figure: The coarsening process from G_i to G_{i+1} uses the matching concept. All edge weights are 1. With rf = 0.5, the size of the matching is the number of vertices × 0.5.
• The reduction factor rf limits the number of pairs of vertices merged
• When rf = 0.5, the number of vertices in the graph is halved
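The coarsening step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides only say that a matching limited by rf is contracted, so the greedy heaviest-edge-first matching, the adjacency-dict representation, and the name `coarsen` are all assumptions made for the example.

```python
import math


def coarsen(adj, rf):
    """Contract up to floor(rf * n) matched vertex pairs of an undirected graph.

    adj maps each vertex (an int) to a dict {neighbor: edge weight}.
    Returns (coarse_adj, parent), where parent maps every original vertex
    to the super-vertex that represents it in the coarser graph.
    """
    n = len(adj)
    max_pairs = math.floor(rf * n)

    # Assumption: greedy matching that visits edges by decreasing weight.
    edges = sorted(
        ((w, u, v) for u in adj for v, w in adj[u].items() if u < v),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    matched, pairs = set(), []
    for _, u, v in edges:
        if len(pairs) == max_pairs:
            break
        if u not in matched and v not in matched:
            matched.update((u, v))
            pairs.append((u, v))

    # One super-vertex per matched pair, one per unmatched vertex.
    parent = {}
    for new_id, (u, v) in enumerate(pairs):
        parent[u] = parent[v] = new_id
    next_id = len(pairs)
    for u in sorted(adj):
        if u not in parent:
            parent[u] = next_id
            next_id += 1

    # Edges inside a super-vertex disappear; parallel edges sum their weights.
    coarse = {s: {} for s in range(next_id)}
    for u in adj:
        for v, w in adj[u].items():
            su, sv = parent[u], parent[v]
            if u < v and su != sv:
                coarse[su][sv] = coarse[su].get(sv, 0) + w
                coarse[sv][su] = coarse[sv].get(su, 0) + w
    return coarse, parent


# Path graph 0-1-2-3 with unit weights, coarsened with rf = 0.5.
path = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
coarse_path, parent_path = coarsen(path, 0.5)
```

With rf = 0.5 the four-vertex path is reduced to two super-vertices, matching the halving described above. Fewer pairs may be merged when the graph admits no matching of the requested size.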
Initial Overlapping Partitioning Phase
Computes the initial partition C in the coarsest graph G_N
We adapted two overlapping community detection algorithms:
• Clique Percolation Method (CPM) [Palla et al., 2005]
• Hierarchical Link Clustering (HCL) [Ahn et al., 2010]
We named them CPM-MLP and HCL-MLP, respectively
Overlapping Uncoarsening Phase
Figure: The uncoarsening process from G_i to G_{i-1}. Dashed ellipses represent communities. Black vertices belong to more than one community.
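One way to picture the projection from G_i back to G_{i-1} is shown below. The slides do not specify the projection rule, so this sketch assumes the simplest option: every vertex inherits all community labels of the super-vertex it was merged into. The name `project_communities` and the dict-based representation are hypothetical.

```python
def project_communities(coarse_membership, parent):
    """Project an overlapping cover from a coarse graph to the finer one.

    coarse_membership maps each super-vertex to its set of community labels.
    parent maps each fine vertex to the super-vertex it was merged into.
    Assumption: a fine vertex inherits every community of its super-vertex,
    so an overlapping super-vertex yields overlapping fine vertices.
    """
    return {v: set(coarse_membership[s]) for v, s in parent.items()}


# Super-vertex 1 belongs to both communities "A" and "B".
coarse_cover = {0: {"A"}, 1: {"A", "B"}, 2: {"B"}}
parent_map = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
fine_cover = project_communities(coarse_cover, parent_map)

# Fine vertices that belong to more than one community.
overlapping = sorted(v for v, labels in fine_cover.items() if len(labels) > 1)
```

Applying this function once per level, from the coarsest graph back to the original one, gives a cover of the original vertices. No refinement is performed in this sketch.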
Experiments
• Metrics
  - Sensitivity: proportion of true overlapping vertices correctly identified
  - Specificity: proportion of true non-overlapping vertices correctly identified
  - Accuracy: weighted average of sensitivity and specificity
  - Modularity: measures inter- and intra-community quality
  - Execution time (in seconds)
• Our multilevel algorithms (CPM-MLP and HCL-MLP) were configured to use three levels with reduction factors of 0.1, 0.2, 0.3, 0.4 and 0.5
• We carried out experiments on two popular real-world networks:
  - Facebook (social network)
  - Yeast (biological network)
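The sensitivity and specificity metrics above can be sketched for overlapping-vertex detection as follows. Overlapping vertices are treated as the positive class. The slides do not state the weights used for accuracy, so the equal weighting (`weight=0.5`) and the function name `overlap_metrics` are assumptions for illustration only.

```python
def overlap_metrics(true_overlap, pred_overlap, all_vertices, weight=0.5):
    """Return (sensitivity, specificity, accuracy) for overlap detection.

    true_overlap: vertices that overlap in the ground truth.
    pred_overlap: vertices the algorithm placed in more than one community.
    weight: share given to sensitivity in the weighted average (assumed 0.5).
    """
    vertices = set(all_vertices)
    positives = set(true_overlap) & vertices
    negatives = vertices - positives
    predicted = set(pred_overlap) & vertices

    tp = len(positives & predicted)   # overlapping, detected as overlapping
    fn = len(positives - predicted)   # overlapping, missed
    tn = len(negatives - predicted)   # non-overlapping, left non-overlapping
    fp = len(negatives & predicted)   # non-overlapping, flagged as overlapping

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = weight * sensitivity + (1 - weight) * specificity
    return sensitivity, specificity, accuracy


# Ten vertices, four truly overlapping, three predicted as overlapping.
sens, spec, acc = overlap_metrics({0, 1, 2, 3}, {0, 1, 4}, range(10))
```

In this toy case two of the four true overlapping vertices are found and one of the six non-overlapping vertices is wrongly flagged.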
Facebook ego-networks
Method    Sensitivity  Specificity  Accuracy  Modularity  Time (s)
CPM       0.557        0.639        0.595     0.179       310.179
CPM-MLP   0.448        0.779        0.569     0.171       5.201
HCL       0.302        0.801        0.439     0.166       68.559
HCL-MLP   0.318        0.917        0.472     0.211       0.904
[Plots: accuracy, modularity and time (s) versus reduction factor by three levels (0.0 to 0.5), for HCL and CPM]
Yeast protein complexes
Method    Sensitivity  Specificity  Accuracy  Modularity  Time (s)
CPM       0.606        0.586        0.596     0.438       7.09
CPM-MLP   0.809        0.444        0.627     0.593       0.19
HCL       0.419        0.658        0.538     0.497       8.26
HCL-MLP   0.678        0.667        0.672     0.642       0.30
[Plots: accuracy, modularity and time (s) versus reduction factor by three levels (0.0 to 0.5), for HCL and CPM]
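The time columns of the Facebook and Yeast tables can be turned into speedup factors with a few lines of arithmetic. The sketch below only divides the reported single-level time by the reported multilevel time; the dictionary layout and variable names are illustrative.

```python
# Reported execution times in seconds: (single-level, multilevel).
times = {
    "Facebook": {"CPM": (310.179, 5.201), "HCL": (68.559, 0.904)},
    "Yeast": {"CPM": (7.09, 0.19), "HCL": (8.26, 0.30)},
}

# Speedup = single-level time / multilevel time.
speedups = {
    network: {method: single / multi for method, (single, multi) in rows.items()}
    for network, rows in times.items()
}

for network, rows in speedups.items():
    for method, factor in rows.items():
        print(f"{network} {method}-MLP speedup: {factor:.1f}x")
```

These factors describe only the configurations reported in the tables and say nothing about how the speedup varies with the reduction factor.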
Conclusion
Conclusion
• We propose a multilevel approach to overlapping community detection
• The strategy proposed here is the first to employ a multilevel strategy in the context of overlapping communities
• Defining a multilevel strategy for the overlapping context allows computationally expensive algorithms to be used in large-scale applications without significant impact on overall performance.
• Experiments on real networks suggest that our approach produces partitions comparable to or better than those of the single-level approach, and does so substantially faster.
Future Work
• In the uncoarsening phase, refinement algorithms could be used to improve solution quality
References
Ahn, Y.-Y., Bagrow, J. P., and Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466:761–764.
Bichot, C.-E. (2013). A Partitioning Requiring Rapidity and Quality: The Multilevel Method and Partitions Refinement Algorithms, pages 27–63. John Wiley & Sons, Inc.
Cazabet, R., Amblard, F., and Hanachi, C. (2010). Detection of overlapping communities in dynamical social networks. In Social Computing (SocialCom), 2010 IEEE Second International Conference on, pages 309–314.
Chen, W., Liu, Z., Sun, X., and Wang, Y. (2010). A game-theoretic framework to identify overlapping communities in social networks. Data Mining and Knowledge Discovery, 21(2):224–240.