  1. Recent advances in local graph clustering and the transition to global analysis Kimon Fountoulakis @CS UWaterloo 02/07/2020 Workshop: From Local to Global Information

  2. Motivation: detection of small clusters in large and noisy graphs - Real large-scale graphs have rich local structure - We often have to detect small clusters in large and noisy graphs, rather than partitioning graphs with nice structure. Figures: US-Senate graph, with a nice bi-partition in year 1865, around the end of the American Civil War; protein-protein interaction graph, where color denotes similar functionality.

  3. Our goals Large-scale data with multiple noisy small-scale and meso-scale clusters determine the need for: - new methods that are able to probe graphs with billions of nodes and edges, - new methods whose running time depends on the size of the output instead of the size of the whole graph, - new methods supported by worst- and average-case theoretical guarantees.

  4. Existing and new local graph clustering methods The vast majority of methods perform some sort of linear diffusion, e.g., PageRank. We need models that are better than simple averaging of probabilities. - As a warm-up: non-linear PageRank. - Non-linear combinatorial diffusions. - Non-linear diffusions that balance between spectral and combinatorial diffusions.

  5. Current local and global developments for local graph clustering methods (diagram: local analysis, and the transition from local to global)

  6. About this talk - I will mostly discuss methods, demonstrate theoretical results, and present experiments that promote understanding of the methods within the available time. - For extensive experiments on real data, please check the cited papers: we have performed hundreds of experiments measuring the performance of local graph clustering methods.

  7. Local Graph Clustering

  8. The local graph clustering problem - Definition: given a seed node in a target set A, find a set of nodes B - Set B has good precision/recall w.r.t. set A - The running time depends on the size of B instead of the size of the whole graph
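The precision/recall criterion above can be sketched as follows; the sets here are illustrative and not from the talk's datasets:

```python
# A minimal sketch of the evaluation criterion for local graph clustering:
# given a target cluster A and a detected cluster B, measure how well B
# recovers A.

def precision_recall(target, detected):
    """Precision = |A ∩ B| / |B|, Recall = |A ∩ B| / |A|."""
    target, detected = set(target), set(detected)
    overlap = len(target & detected)
    return overlap / len(detected), overlap / len(target)

# Example: target cluster A has 4 nodes, detected cluster B has 5 nodes,
# and they share 3 nodes.
A = {1, 2, 3, 4}
B = {2, 3, 4, 7, 9}
p, r = precision_recall(A, B)
print(p, r)  # precision 3/5 = 0.6, recall 3/4 = 0.75
```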

  9. Facebook Johns Hopkins social network: color denotes class year Students of year 2009 Data: Facebook Johns Hopkins, A. L. Traud, P. J. Mucha and M. A. Porter, Physica A, 391(16), 2012

  10. Local graph clustering: example Data: Facebook Johns Hopkins, A. L. Traud, P. J. Mucha and M. A. Porter, Physica A, 391(16), 2012

  11. Protein structure similarity: color denotes similar function Data: The MIPS mammalian protein-protein interaction database. Bioinformatics, 21(6):832-834 , 2005

  12. Local graph clustering finds 2% of the graph Data: The MIPS mammalian protein-protein interaction database. Bioinformatics, 21(6):832-834 , 2005

  13. Local graph clustering finds 1% of the graph Data: The MIPS mammalian protein-protein interaction database. Bioinformatics, 21(6):832-834 , 2005

  14. Or we might want to detect galaxies

  15. Warm-up: non-linear PageRank

  16. Some definitions - Graph: G = (V, E), with |V| = n nodes and |E| = m edges - n x n adjacency matrix A: an element A_ij is equal to 1 if nodes i and j are connected, and 0 otherwise

  17. Some definitions - Degree matrix: D = diag(A 1_n), where 1_n is a vector of all ones. Each diagonal element of D is the number of neighbors of a node - Random walk matrix: A D^{-1} - Lazy random walk matrix: W = (1/2)(I + A D^{-1}) - Graph Laplacian: L = D - A
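The matrix definitions above can be sketched directly in NumPy; the 4-node path graph here is illustrative, not from the talk:

```python
import numpy as np

# Adjacency matrix of a 4-node path graph (illustrative example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

ones = np.ones(A.shape[0])
D = np.diag(A @ ones)             # degree matrix D = diag(A 1_n)
RW = A @ np.linalg.inv(D)         # random walk matrix A D^{-1}
W = 0.5 * (np.eye(4) + RW)        # lazy random walk matrix W = (I + A D^{-1}) / 2
L = D - A                         # graph Laplacian L = D - A

print(np.diag(D))      # node degrees: [1. 2. 2. 1.]
print(W.sum(axis=0))   # each column of W sums to 1 (column-stochastic)
```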

  18. Linear diffusion: personalized PageRank - Let α ∈ (0,1) be the teleportation parameter - Consider a diffusion process where we perform a lazy random walk step with probability 1 − α, and jump to a given seed node with probability α, i.e., the transition matrix α s 1_n^T + (1 − α) W - where s is an indicator vector of the seed node and α is the teleportation parameter - Simple idea: run the diffusion from a seed node. The nodes with the highest probability after k steps constitute a cluster.
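A minimal sketch of this diffusion: since the iterate stays a probability vector, applying the transition matrix reduces to the update p ← α s + (1 − α) W p. The graph, seed node, and α below are illustrative:

```python
import numpy as np

# Illustrative 4-node graph; node 0 is the seed.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)
W = 0.5 * (np.eye(4) + A / d)      # lazy random walk matrix (column-stochastic)

alpha = 0.15                        # teleportation parameter (illustrative value)
s = np.array([1.0, 0.0, 0.0, 0.0])  # indicator vector of the seed node

# Power method: lazy walk step with prob. 1 - alpha, teleport with prob. alpha.
p = s.copy()
for _ in range(200):
    p = alpha * s + (1 - alpha) * W @ p

print(p.round(3))  # stationary PPR vector; mass concentrates near the seed
```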

  19. Let's get rid of the tail - For the stationary personalized PageRank vector, most of the probability mass is concentrated around the seed node. - This means that the sorted personalized PageRank vector has a long tail of small values for nodes far away from the seed node. - We can efficiently cut the tail using l1-regularized PageRank, without even having to compute the long tail.

  20. Non-linear PageRank diffusion - Instead of using the power method to compute the PageRank vector, we can perform a non-linear power method where we do a random walk step first and then threshold small values to zero: p_{k+1} = prox_{ρα d ‖·‖_1}( (1 − α) W p_k + α s ), where (1 − α) W p_k + α s is the random walk step - The prox operator reduces components smaller than ρα d to zero, elementwise: prox_{ρα d ‖·‖_1}(x) = x − ρα d if x ≥ ρα d, and 0 otherwise.
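A minimal sketch of this non-linear power method, assuming the per-node threshold ρα d_i and a small illustrative graph, seed, and ρ (none of which come from the talk's experiments):

```python
import numpy as np

# Illustrative 4-node graph; node 0 is the seed.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)
W = 0.5 * (np.eye(4) + A / d)          # lazy random walk matrix

alpha, rho = 0.15, 0.05                 # illustrative parameter values
s = np.array([1.0, 0.0, 0.0, 0.0])      # seed indicator vector

def prox_l1(x, thresh):
    # Soft threshold: shrink entries at or above the (per-node) threshold,
    # set the rest to exact zero.
    return np.where(x >= thresh, x - thresh, 0.0)

p = np.zeros(4)
for _ in range(200):
    # Random walk step first, then threshold small values to zero.
    p = prox_l1((1 - alpha) * W @ p + alpha * s, rho * alpha * d)

print(p)  # sparse approximate PPR vector; the tail is truncated to zeros
```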

  21. A far-fetched relation to graph neural networks - Non-linear PageRank: p_{k+1} = prox_{ρα d ‖·‖_1}( (1 − α) W p_k + α s ), a random walk step followed by a non-linearity - Graph neural network layer: p_{k+1} = ReLU( random walk matrix × parameters × p_k )
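The structural parallel above, one step of each update side by side: both are a linear propagation followed by an elementwise non-linearity. The matrices, sizes, and parameter values here are illustrative stand-ins, not from any trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.full((4, 4), 0.25)           # stand-in column-stochastic walk matrix
Theta = rng.normal(size=(4, 4))     # illustrative GNN layer parameters
p = rng.random(4)

# Non-linear PageRank step: random walk step, then soft threshold (the prox).
alpha, thresh = 0.15, 0.02
s = np.eye(4)[0]                    # seed indicator
ppr_step = np.maximum((1 - alpha) * W @ p + alpha * s - thresh, 0.0)

# GNN layer: propagation and a learned linear map, then ReLU.
gnn_step = np.maximum(W @ Theta @ p, 0.0)

print(ppr_step, gnn_step)
```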
