SLIDE 1 Recent advances in local graph clustering and the transition to global analysis
Kimon Fountoulakis @CS UWaterloo 02/07/2020 Workshop: From Local to Global Information
SLIDE 2 Motivation: detection of small clusters in large and noisy graphs
- Real large-scale graphs have rich local structure
- We often have to detect small clusters in large and
noisy graphs: Rather than partitioning graphs with nice structure
protein-protein interaction graph, color denotes similar functionality; US-Senate graph, nice bi-partition in year 1865 around the end of the American Civil War
SLIDE 3 Our goals
Large-scale data with multiple noisy small-scale and meso-scale clusters determine the need for:
- new methods that are able to probe graphs with billions of nodes and edges,
- the running time of the new methods should depend on the size of the output instead of the size of the whole graph,
- the new methods should be supported by worst- and average-case theoretical guarantees.
SLIDE 4 Existing and new local graph clustering methods
The vast majority of methods perform some sort of linear diffusion, e.g., PageRank. We need models that are better than simple averaging of probabilities.
- As a warm-up: non-linear PageRank.
- Non-linear combinatorial diffusions.
- Non-linear diffusions which balance between spectral and combinatorial diffusions.
SLIDE 5
Current local and global developments for local graph clustering methods
Local analysis Local to global
SLIDE 6 About this talk
- I will mostly discuss methods, demonstrate theoretical results, and present experiments that promote understanding of the methods within the available time.
- For extensive experiments on real data please check the cited papers. We have literally performed hundreds of experiments measuring the performance of local graph clustering methods.
SLIDE 7
Local Graph Clustering
SLIDE 8 The local graph clustering problem
- Definition: given a seed node in a target cluster B, find a set of nodes A such that
- set A has good precision/recall w.r.t. set B, and
- the running time depends on the size of A instead of the whole graph.
SLIDE 9 Data: Facebook Johns Hopkins, A. L. Traud, P. J. Mucha and M. A. Porter, Physica A, 391(16), 2012
Facebook Johns Hopkins social network: color denotes class year
Students of year 2009
SLIDE 10 Local graph clustering: example
Data: Facebook Johns Hopkins, A. L. Traud, P. J. Mucha and M. A. Porter, Physica A, 391(16), 2012
SLIDE 11 Data: The MIPS mammalian protein-protein interaction database. Bioinformatics, 21(6):832-834, 2005
Protein structure similarity: color denotes similar function
SLIDE 12 Data: The MIPS mammalian protein-protein interaction database. Bioinformatics, 21(6):832-834, 2005
Local graph clustering finds 2% of the graph
SLIDE 13 Data: The MIPS mammalian protein-protein interaction database. Bioinformatics, 21(6):832-834, 2005
Local graph clustering finds 1% of the graph
SLIDE 14
Or we might want to detect galaxies
SLIDE 15
Warm-up: non-linear PageRank
SLIDE 16 Some definitions
- Graph: G = (V, E), where V are the nodes and E the edges, with |V| = n and |E| = m
- n x n adjacency matrix: A
- An element of A is equal to 1 if two nodes are connected
SLIDE 17 Some definitions
- Degree matrix: D = diag(A 1_n), where 1_n is a vector of all ones; each diagonal element of D is the number of neighbors of a node
- Random walk matrix: AD^{-1}
- Lazy random walk matrix: W = (1/2)(I + AD^{-1})
- Graph Laplacian: L = D - A
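As a minimal illustration (not from the slides), the matrices defined above can be assembled for a toy graph; the 4-node path graph below is an arbitrary choice:

```python
import numpy as np

# Adjacency matrix of a 4-node path graph (an illustrative choice).
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])

n = A.shape[0]
D = np.diag(A @ np.ones(n))        # degree matrix D = diag(A 1_n)
L = D - A                          # graph Laplacian L = D - A
P = A @ np.linalg.inv(D)           # random walk matrix A D^{-1}
W = 0.5 * (np.eye(n) + P)          # lazy random walk matrix W = (I + A D^{-1}) / 2
```

Each column of W sums to one, so applying W preserves probability mass, and the constant vector lies in the null space of L.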
SLIDE 18
Linear diffusion: personalized PageRank
- Simple idea: use a random walk from a seed node. The nodes with the highest probability after k steps form a cluster.
- Consider a diffusion process where we perform a lazy random walk step with probability 1 - α, and jump to a given seed node with probability α. Its transition matrix is
  αs1_n^T + (1 - α)W
- where s is an indicator vector of the seed node and α ∈ (0,1) is the teleportation parameter.
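The stationary vector of this diffusion satisfies p = αs + (1 - α)Wp and can be approximated by the power method; a small sketch (the path graph and α = 0.15 are illustrative assumptions, not values from the talk):

```python
import numpy as np

# Lazy random walk matrix of a 4-node path graph (illustrative).
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
n = A.shape[0]
W = 0.5 * (np.eye(n) + A @ np.linalg.inv(np.diag(A.sum(axis=0))))

alpha = 0.15                  # teleportation parameter
s = np.zeros(n); s[0] = 1.0   # indicator vector of the seed node

p = s.copy()
for _ in range(300):          # power method: p <- alpha*s + (1 - alpha)*W*p
    p = alpha * s + (1 - alpha) * W @ p

# Closed form of the stationary personalized PageRank vector.
p_star = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * W, s)
```

Each iteration contracts the error by a factor (1 - α), so the iterates converge geometrically to p_star.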
SLIDE 19 Let’s get rid of the tail
- For the stationary personalized PageRank vector, most of the probability mass is concentrated around the seed node.
- This means that the sorted personalized PageRank vector has a long tail for nodes far away from the seed node.
- We can efficiently cut the tail using l1-regularized PageRank without even having to compute the long tail.
SLIDE 20 Non-linear PageRank diffusion
- Instead of using the power method to compute the PageRank vector, we can perform a non-linear power method where we do a random walk step first and then threshold small values to zero:
  p_{k+1} = prox_{ραd‖⋅‖₁}((1 - α)Wp_k + αs)
- where the prox operator reduces components smaller than ραd to zero:
  prox_{ραd‖⋅‖₁}(x) = x - ραd if x ≥ ραd, and 0 otherwise.
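A sketch of this non-linear power method on a toy graph (the graph, α, and ρ are illustrative choices; the one-sided prox follows the definition above):

```python
import numpy as np

A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
n = A.shape[0]
d = A.sum(axis=0)                                  # node degrees
W = 0.5 * (np.eye(n) + A / d)                      # lazy walk; A / d normalizes columns

alpha, rho = 0.15, 1e-2
s = np.zeros(n); s[0] = 1.0                        # seed indicator

p = np.zeros(n)
for _ in range(500):
    z = (1 - alpha) * W @ p + alpha * s            # random walk step
    p = np.maximum(z - rho * alpha * d, 0.0)       # prox: threshold small values to zero
```

The iterate p stays non-negative and sparse: components whose walk mass never exceeds ραd are never created, which is exactly how the long tail is cut without being computed.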
SLIDE 21
A far-fetched relation to graph neural networks
- Non-linear PageRank: p_{k+1} = prox_{ραd‖⋅‖₁}((1 - α)Wp_k + αs), i.e., a random walk step followed by a thresholding non-linearity.
- Graph Neural Network layer: p_{k+1} = ReLU(Random Walk Matrix × Parameters × p_k)
SLIDE 22 L1-regularized PageRank
- The stationary vector of the non-linear PageRank diffusion corresponds to the optimal solution of the l1-regularized PageRank problem:
  minimize f(x) + g(x), where f(x) = (1/2)x^TQx - αx^Ts, g(x) = ρα‖Dx‖₁, and Q = αD + ((1 - α)/2)L
Fountoulakis et al. Variational Perspective of Local Graph Clustering, Mathematical Programming, 2017
SLIDE 23 Properties of the l1-regularized optimal solution
- Theorem
- If the graph is unweighted then the number of nonzero nodes in the optimal solution is bounded by 1/ρ.
- If the graph is weighted then the volume of the nonzero nodes in the optimal solution is bounded by 1/ρ.
Fountoulakis et al. Variational Perspective of Local Graph Clustering, Mathematical Programming, 2017
SLIDE 24 The solution path is monotonic
- Let x̂(ρ) be the solution of the l1-regularized problem as a function of ρ.
- x̂(ρ) is a component-wise monotone function: x̂(ρ₀) ≤ x̂(ρ₁) for ρ₀ > ρ₁.
- The inequality becomes strict when a component is positive.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 25 Stage-wise for recovering the whole path
- Stage-wise algorithm with step-size η:
  1) Choose the coordinate i for which the partial derivative ∇_i f(x_k) is most negative.
  2) Update [x_{k+1}]_i = [x_k]_i + η d_i.
- The running time of stage-wise depends on the nonzero nodes and their neighbors and not on the size of the whole graph.
- Corollary
- The stage-wise algorithm converges to the l1-regularized solution path if we drag the step-size η of the algorithm to zero.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
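A runnable reading of the stage-wise scheme (the steepest-coordinate choice and the d_i scaling follow the update above; the graph, α, and η are illustrative, and this dense version ignores the local bookkeeping that makes the real method output-sensitive):

```python
import numpy as np

A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
n = A.shape[0]
d = A.sum(axis=0)
D, L = np.diag(d), np.diag(d) - A

alpha, eta = 0.15, 1e-3
Q = alpha * D + 0.5 * (1 - alpha) * L              # Q = alpha*D + ((1-alpha)/2)*L
s = np.zeros(n); s[0] = 1.0                        # seed indicator

x = np.zeros(n)
for _ in range(5000):
    g = Q @ x - alpha * s                          # gradient of f(x) = 0.5 x'Qx - alpha x's
    i = int(np.argmin(g))                          # coordinate with most negative gradient
    if g[i] >= 0:
        break                                      # no descent coordinate left
    x[i] += eta * d[i]                             # small stage-wise step
```

Because every update is a non-negative increment, the iterates trace out a monotone path, mirroring the monotonicity of the l1-regularized solution path on the previous slide.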
SLIDE 26 Stage-wise for recovering the whole path - example
[Figure: l1-regularized solution path vs. stage-wise path with η = 10⁻⁴]
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 27 What if we do not want to recover the whole path?
Proximal gradient descent applied to the l1-regularized PageRank problem
  minimize f(x) + g(x), where f(x) = (1/2)x^TQx - αx^Ts and g(x) = ρα‖Dx‖₁:
  x_{k+1} := argmin_x g(x) + f(x_k) + ⟨∇f(x_k), x - x_k⟩ + (1/2)‖x - x_k‖₂²
- The linear term is the first-order Taylor approximation of f and the quadratic term is an upper bound on the approximation error.
Requires careful implementation to avoid excessive running time:
- Need to maintain a set of non-zero nodes.
- Update x and the gradient only for non-zero nodes and their neighbors at each iteration.
Fountoulakis et al. Variational Perspective of Local Graph Clustering, Mathematical Programming, 2017
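A dense sketch of the proximal gradient iteration for this objective (a truly local implementation would store x sparsely and touch only non-zero nodes and their neighbors; the graph, α, ρ, and the step size bound are illustrative assumptions):

```python
import numpy as np

def l1reg_pagerank(A, seed, alpha=0.15, rho=1e-2, iters=1000):
    """Proximal gradient on f(x) = 0.5 x'Qx - alpha x's with g(x) = rho*alpha*||Dx||_1."""
    n = A.shape[0]
    d = A.sum(axis=0)
    s = np.zeros(n); s[seed] = 1.0
    tau = 1.0 / d.max()                 # step size: lambda_max(Q) <= max degree
    x = np.zeros(n)
    for _ in range(iters):
        Qx = alpha * d * x + 0.5 * (1 - alpha) * (d * x - A @ x)   # Q = alpha*D + ((1-alpha)/2)*L
        z = x - tau * (Qx - alpha * s)                             # gradient step on f
        x = np.maximum(z - tau * rho * alpha * d, 0.0)             # one-sided soft-threshold (prox of g)
    return x

A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
x = l1reg_pagerank(A, seed=0)
```

The matrix Q is never formed explicitly; each iteration only needs degrees and one sparse matrix-vector product, which is what makes the local active-set variant possible.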
SLIDE 28 Theorem: non-decreasing non-zero nodes
[Figure: number of nonzeros of proximal gradient vs. iterations, increasing monotonically to the optimal number of non-zeros]
Fountoulakis et al. Variational Perspective of Local Graph Clustering, Mathematical Programming, 2017
SLIDE 29 Open problem: is accelerated prox. grad. a local algorithm?
[Figure: number of nonzeros vs. iterations for accelerated prox. grad. and proximal gradient]
- Proximal gradient has running time Õ(vol(Ŝ)/μ); does accelerated prox. grad. achieve a running time depending on vol(Ŝ) instead of vol(G)?
- Ŝ: support of the optimal solution, i.e., non-zero nodes.
- μ: strong convexity parameter of the problem.
SLIDE 30 Two ways to measure performance of the l1-regularized PageRank model
- Average-case: performance under a stochastic block model - recover a cluster using the output of l1-regularized PageRank.
- Worst-case: use conductance to measure the quality of the output. Show that the output has a conductance value similar to a target cluster around the seed node.
Fountoulakis et al. Variational Perspective of Local Graph Clustering, Mathematical Programming, 2017
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
Zhu et al. A local algorithm for finding well-connected clusters, ICML, 2013
SLIDE 31
Average-case guarantees
SLIDE 32 Average-case performance
Local random model
- Given a graph G with n nodes, let K be a target cluster inside G.
- Nodes in K are connected with probability p.
- Nodes in K and K^c are connected with probability q.
- The rest of the edges can be drawn using any other model.
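One simple instantiation of the local random model (here the "rest of the graph" is also drawn with probability q, which is just one admissible choice; the sizes and probabilities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p, q = 200, 20, 0.5, 0.02        # graph size, cluster size, in/out probabilities

U = rng.random((n, n))
A = (U < q).astype(float)              # all pairs connected with probability q
A[:k, :k] = (U[:k, :k] < p)            # pairs inside K = {0,...,k-1} use probability p
A = np.triu(A, 1)
A = A + A.T                            # undirected, no self-loops

inside = A[:k, :k].sum() / (k * (k - 1))    # empirical edge density inside K
across = A[:k, k:].sum() / (k * (n - k))    # empirical edge density from K to K^c
```

With p much larger than q the planted cluster K is visibly denser than its boundary, which is the structure the l1-regularized PageRank guarantees below exploit.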
SLIDE 33 Expected l1-regularized PageRank
- The optimal solution of the expected problem identifies the target cluster.
- Theorem
- Suppose that the seed node is selected from the target cluster K. The optimal solution of
  x* := argmin (1/2)x^T𝔼[Q]x - αx^Ts + ρα‖𝔼[D]x‖₁
- satisfies supp(x*) = K
- as long as ρ = O(p/d̄²), where d̄ is the expected degree of nodes in the target cluster.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 34 Results for l1-regularized PageRank for noisy data
- In practice, we do not have access to the expected graph. We are given a realization of the local random model that includes “noise”, i.e., edges from the target cluster to the rest of the graph.
- We have two results for the noisy case.
- First result: zero false negatives and bounded false positives.
- Second result: with additional assumptions on the seed nodes we can show exact recovery.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 35 Results for l1-regularized PageRank for noisy data
- Theorem (bounded false positives)
- Suppose p²k ≥ O(log k), where k is the size of the target cluster, and ρ = O(p/d̄²).
- Let γ = pk/d̄, i.e., the probability of staying inside the target cluster in one step.
- Then with probability at least 1 - 6exp(-O(p²k)), the optimal solution of the realized problem has zero false negatives and the false positives are bounded:
  vol(FP) ≤ vol(K)(O(1/γ²) - 1)
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 36 Results for l1-regularized PageRank for noisy data
Definitions
- γ = pk/d̄, i.e., the probability of staying inside the target cluster in one step.
- Conductance of a set B, assuming B is the smaller part of the graph:
  Φ(B) := (number of edges leaving B) / (sum of degrees of vertices in B)
SLIDE 37 Results for l1-regularized PageRank for noisy data
- Theorem (exact recovery)
- Let q = O(1/n) and d_j ≥ O(1/(γp)) for all j ∈ K^c.
- Then with probability at least 1 - O(e^{-k}) there exists a good seed node such that if we use that seed node we get supp(x̂) = K.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 38 Results for l1-regularized PageRank for noisy data
- The assumption that q = O(1/n) implies that there is a constant number of edges leaving the cluster, which sounds artificial,
- but it is not, because it also covers the case where the size of the target cluster is k = O(1).
- This is a realistic local graph clustering setting where we attempt to recover a very small target cluster of constant size with a constant number of edges leaving the cluster.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 39
Worst-case guarantees
SLIDE 40 Some definitions
- Conductance of target cluster B, assuming B is the smaller part of the graph:
  Φ(B) := (number of edges leaving B) / (sum of degrees of vertices in B)
- Internal connectivity of target cluster B:
  IC(B) := the minimum conductance of the subgraph induced by B
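The conductance of a set can be computed directly from the adjacency matrix; a small helper (the 4-node path graph is an arbitrary example):

```python
import numpy as np

def conductance(A, S):
    """Phi(S) = (# edges leaving S) / (sum of degrees of vertices in S)."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()       # edges crossing the boundary of S
    vol = A[mask].sum()                 # sum of degrees of vertices inside S
    return cut / vol

# 4-node path graph: conductance of its left half {0, 1}.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
phi = conductance(A, [0, 1])            # cut = 1 edge, vol = 1 + 2 = 3
```

Here phi = 1/3. Computing IC(B) is harder: it is a minimum of conductance over all cuts of the induced subgraph, so in practice it is only estimated.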
SLIDE 41 Worst-case performance
Zhu et al. A local algorithm for finding well-connected clusters, ICML, 2013
- Assume that the internal connectivity of the target cluster K is larger than its conductance: IC²(K)/(Φ(K)log vol(K)) ≥ Ω(1).
- False positives are bounded by vol(FP) ≤ O(Φ(K)/IC(K)) vol(K).
- False negatives are bounded by vol(FN) ≤ O(Φ(K)/IC(K)) vol(K).
SLIDE 42 Compare average- and worst-case
                 False Positives                         False Negatives
Average-case:    vol(FP) ≤ vol(K)(O(1/γ²) - 1)           zero
Worst-case:      vol(FP) ≤ vol(K)O((1 - γ)log k)         vol(FN) ≤ vol(K)O((1 - γ)log k)
- The average-case result on FP is stronger for large values of γ.
- Also, for the average-case we can prove exact recovery.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 43 Comparison to planted cluster model
- Using semidefinite programming one can achieve exact recovery as long as p = 1, q = O(log n/n) and k ≥ O(log n),
- while our results guarantee zero false negatives and a constant proportion of false positives.
- However, our model is not allowed to touch the whole graph.
- W. Ha, K. Fountoulakis, M. Mahoney. Statistical Guarantees of Local Graph Clustering. AISTATS-2020
SLIDE 44
Combinatorial Diffusion: Capacity Releasing Diffusion
SLIDE 45 Problem: spectral diffusions might leak mass
- ℓ1-regularized PageRank (best tuning): Precision = 0.73, Recall = 0.91
Target cluster: Students of Year 2008
Red nodes: output of the algorithm
Data: Facebook Colgate University, A. L. Traud, P. J. Mucha and M. A. Porter, Physica A, 391(16), 2012
SLIDE 46 Solving the problem of spreading mass indiscriminately by gradual release of edge capacity
Spectral diffusions:
- Even distribution of the residual probability mass to neighbors.
Capacity Releasing Diffusion:
- Controls the amount of mass to be sent over an edge by using the height “h” of a node.
- In theory this results in bounded mass leaked outside of the target cluster.
- In practice this results in much better precision and recall.
- D. Wang, K. Fountoulakis, M. Mahoney, S. Rao. Capacity Releasing Diffusion for Speed and Locality. ICML 2017
SLIDE 47 Capacity Releasing Diffusion algorithm
Maintain mass “m” and height “h” for each node:
- degree(v): #edges of node v
- Saturated nodes: m(v) >= deg(v)
- Excess mass = max(m(v) - deg(v), 0)
Initial state: m=0, h=0 for all nodes A-H.
- D. Wang, K. Fountoulakis, M. Mahoney, S. Rao. Capacity Releasing Diffusion for Speed and Locality. ICML 2017
SLIDE 48 Capacity Releasing Diffusion algorithm
Overflow the seed: m(A) = 2deg(A), so the seed now has m=4, h=1; all other nodes have m=0, h=0.
Algorithm: iterate
- Overflow: m(v) = 2m(v), maintaining m(v) <= 2deg(v) for all nodes v.
- Push excess mass to unsaturated nodes with lower height, until m(v) <= deg(v) for all nodes v.
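The overflow/push/relabel structure of these slides can be written out in a highly simplified form. This sketch omits the per-edge residual-capacity bookkeeping of the real algorithm and fixes arbitrary tie-breaking, so it illustrates the mechanism rather than the authors' implementation:

```python
import numpy as np

def crd_round(A, m, h, max_steps=10_000):
    """One simplified CRD round: overflow m(v) = 2m(v), then push excess mass downhill."""
    deg = A.sum(axis=1)
    m = 2.0 * m                                         # overflow step
    for _ in range(max_steps):
        excess = np.maximum(m - deg, 0.0)
        active = np.flatnonzero(excess)
        if active.size == 0:                            # m(v) <= deg(v) for all v: done
            break
        v = active[0]
        # unsaturated neighbors of v with strictly lower height
        nbrs = [u for u in np.flatnonzero(A[v]) if h[u] < h[v] and m[u] < deg[u]]
        if not nbrs:
            h[v] += 1                                   # relabel: raise the height of v
            continue
        u = nbrs[0]
        push = min(excess[v], deg[u] - m[u], h[v])      # gradual release: at most h(v)
        m[v] -= push
        m[u] += push
    return m, h

# Seed a 4-node path graph with mass equal to the seed's degree, then run one round.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
m = np.array([0., 2., 0., 0.])          # seed = node 1, m(seed) = deg(seed)
h = np.zeros(4)
m, h = crd_round(A, m, h)
```

Mass is conserved inside the round (only the overflow step doubles it), and the round ends with no node holding excess, matching the stopping condition m(v) <= deg(v) on the slide.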
SLIDE 49 Capacity Releasing Diffusion algorithm
State: the seed has m=4, h=1; all other nodes have m=0, h=0.
Step: push excess mass to unsaturated nodes with lower height.
SLIDE 50 Capacity Releasing Diffusion algorithm
State: the seed has m=4, h=1; all other nodes have m=0, h=0.
Step: pick node A (has excess mass) and a neighbor of A with lower height “h”.
SLIDE 51 Capacity Releasing Diffusion algorithm
State: the seed has m=4, h=1; all other nodes have m=0, h=0.
Step: pick node C.
SLIDE 52 Capacity Releasing Diffusion algorithm
State: the seed has m=4, h=1; all other nodes have m=0, h=0.
Step: pick node A (has excess mass) and push 1 unit to a chosen neighbor.
Gradual release: do not push more than the height of A, i.e., push at most “h” flow to a chosen neighbor.
SLIDE 53 Capacity Releasing Diffusion algorithm
State: the seed has m=3, h=1; one neighbor has m=1; all other nodes have m=0, h=0.
Step: push excess mass to unsaturated nodes with lower height.
SLIDE 54 Capacity Releasing Diffusion algorithm
State: the seed has m=3, h=1; one neighbor has m=1; all other nodes have m=0, h=0.
Step: pick node A (has excess mass) and a new edge of node A of residual flow less than “h”; push 1 unit.
SLIDE 55 Capacity Releasing Diffusion algorithm
State: the seed has m=2, h=1; two neighbors have m=1; all other nodes have m=0, h=0.
Step: push excess mass to unsaturated nodes with lower height; the invariant m(v) <= 2deg(v) holds throughout.
SLIDE 56 State for nodes A-H: m = (2, 4, 2, 0, 0, 0, 0, 0), h = (0, 1, 0, 0, 0, 0, 0, 0)
- Overflow: m(v) = 2m(v), so now m(v) <= 2 deg(v) for all nodes v
- Push excess mass to unsaturated nodes with lower height
SLIDE 57 State for nodes A-H: m = (2, 4, 2, 0, 0, 0, 0, 0), h = (0, 1, 0, 0, 0, 0, 0, 0)
- Pick node A (it has excess mass) and a neighbor of A with lower height "h"
SLIDE 58 State for nodes A-H: m = (2, 4, 2, 0, 0, 0, 0, 0), h = (0, 1, 0, 0, 0, 0, 0, 0)
- Push 1 unit per edge
- Gradual release: do not push more than the height of A, i.e., push at most "h" flow to a chosen neighbor
SLIDE 59 State for nodes A-H: m = (2, 3, 3, 0, 0, 0, 0, 0), h = (0, 1, 0, 0, 0, 0, 0, 0)
- Note: C now has excess mass, so it has to be added to the candidate nodes
SLIDE 60 State for nodes A-H: m = (2, 3, 3, 0, 0, 0, 0, 0), h = (0, 1, 0, 0, 0, 0, 0, 0)
- Pick node C (it has excess mass) and a neighbor of C with lower height "h"
SLIDE 61 State for nodes A-H: m = (2, 3, 3, 0, 0, 0, 0, 0), h = (0, 1, 1, 0, 0, 0, 0, 0)
- There is no neighbor of C with lower height, so increase the height of C by 1
SLIDE 62 State for nodes A-H: m = (2, 3, 3, 0, 0, 0, 0, 0), h = (0, 1, 1, 0, 0, 0, 0, 0)
- Repeat until there is no node with excess mass
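The push/relabel walkthrough above can be sketched in code. This is a deliberately simplified illustration, not the actual Capacity Releasing Diffusion of Wang et al.: the real algorithm also doubles mass between stages and tracks per-edge flow caps, and the `h_max` cutoff and the star-graph example below are assumptions for demonstration only.

```python
# Simplified sketch of the push/relabel loop from the walkthrough above,
# assuming an unweighted graph given as an adjacency list. NOT the full CRD
# algorithm: only the three illustrated rules are kept (push to a lower
# unsaturated neighbor, push at most h(v) units, relabel when stuck).
def crd_push_relabel(adj, mass, height, h_max=10):
    deg = {v: len(adj[v]) for v in adj}
    active = [v for v in adj if mass[v] > deg[v]]  # nodes with excess mass
    while active:
        v = active.pop()
        pushed = False
        for u in adj[v]:
            if mass[v] <= deg[v]:
                break  # excess is gone
            if height[u] < height[v] and mass[u] < deg[u]:
                # Gradual release: at most height[v] units along this edge.
                amount = min(mass[v] - deg[v], deg[u] - mass[u], height[v])
                if amount > 0:
                    mass[v] -= amount
                    mass[u] += amount
                    pushed = True
        if mass[v] > deg[v] and height[v] < h_max:
            if not pushed:
                height[v] += 1  # relabel: no lower neighbor accepted flow
            active.append(v)
    return mass, height

# Overflow the seed (node 0) of a small star graph: m(seed) = 2 deg(seed).
adj = {0: [1, 2], 1: [0], 2: [0]}
mass = {0: 4, 1: 0, 2: 0}
height = {0: 0, 1: 0, 2: 0}
mass, height = crd_push_relabel(adj, mass, height)
print(mass)  # excess settles on the leaves: {0: 2, 1: 1, 2: 1}
```

As in the slides, the seed cannot push at height 0, relabels to height 1, and then releases one unit per edge to its neighbors.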
SLIDE 63 Theoretical comparison to spectral diffusions
- Notation: B is the target cluster (assumed to be the smaller part of the graph), A is the output set.
- Conductance of the target ("noise"): Φ(B) := (number of edges leaving B) / (sum of degrees of vertices in B)
- Internal connectivity ("signal") of the target: IC(B) := the minimum conductance of the subgraph induced by B
- The theoretical bound on FP/FN needs the "signal" to be only polylogarithmically stronger than the "noise", as opposed to quadratically stronger for spectral methods.
- Worst-case guarantee: Φ(A) ≤ 𝒫(Φ(B)), as opposed to Φ(A) ≤ 𝒫(Φ(B)/IC(B)) for spectral methods.
- The running time is 1/IC(B) times faster than spectral methods.
- Summary: weaker assumptions, better running time, better worst-case guarantees.
- D. Wang, K. Fountoulakis, M. Mahoney, S. Rao. Capacity Releasing Diffusion for Speed and Locality. ICML 2017
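The conductance definition above is easy to compute directly. A minimal sketch, assuming the graph is stored as an adjacency list; the toy two-triangle graph is an invented example (IC(B), which needs the minimum conductance over all subsets of the induced subgraph, is intentionally not implemented here):

```python
# Sketch: conductance of a node set B in an undirected graph stored as an
# adjacency list {node: list_of_neighbors}. Matches the slide's definition:
# Phi(B) = (# edges leaving B) / (sum of degrees of vertices in B).
def conductance(adj, B):
    B = set(B)
    cut = sum(1 for u in B for v in adj[u] if v not in B)  # edges leaving B
    vol = sum(len(adj[u]) for u in B)                      # volume of B
    return cut / vol if vol > 0 else 0.0

# Invented toy graph: two triangles joined by one edge (2-3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(conductance(adj, {0, 1, 2}))  # 1 cut edge / volume 7
```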
SLIDE 64 Example on Facebook Colgate University social network
- Target cluster: year 2008
- ℓ1-regularized PageRank (best tuning): Precision = 0.73, Recall = 0.94
- Capacity Releasing Diffusion: Precision = 0.93, Recall = 0.94
- D. Wang, K. Fountoulakis, M. Mahoney, S. Rao. Capacity Releasing Diffusion for Speed and Locality. ICML 2017
SLIDE 65 Example on Facebook Johns Hopkins social network
- Target cluster: same major
- ℓ1-regularized PageRank (best tuning): Precision = 0.71, Recall = 0.91
- Capacity Releasing Diffusion: Precision = 0.87, Recall = 0.94
- D. Wang, K. Fountoulakis, M. Mahoney, S. Rao. Capacity Releasing Diffusion for Speed and Locality. ICML 2017
SLIDE 67 Spectrum of methods
- Spectral diffusions, e.g., PageRank: easy to understand, fast in practice.
- Combinatorial diffusions, e.g., capacity releasing diffusion: robust to noise.
- S. Yang, D. Wang, K. Fountoulakis. p-Norm Flow Diffusion for Local Graph Clustering.
SLIDE 68 Spectrum of methods
- p-norm flow diffusion is a family of convex optimization problems that characterizes the trade-off between spectral and combinatorial diffusions.
- This allows us to define methods that are the best of both worlds.
SLIDE 69 Some definitions - incidence matrix
- The incidence matrix B has one row per edge (A-B, A-C, B-C, C-D, D-E, D-F, D-G, F-H) and one column per node (A through H); each row has nonzero entries of opposite sign (±1) in the columns of its two endpoints.
- Ordering of edges and direction is arbitrary.
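The incidence matrix of the slide's example graph can be built in a few lines. A sketch with NumPy, assuming the sign convention -1 at the first endpoint and +1 at the second (any orientation works, as the slide notes):

```python
import numpy as np

# Sketch: signed edge-node incidence matrix B for the slide's example graph.
# One row per edge; -1 at the edge's first endpoint, +1 at the second.
nodes = list("ABCDEFGH")
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("F", "H")]
idx = {v: i for i, v in enumerate(nodes)}

B = np.zeros((len(edges), len(nodes)))
for r, (u, v) in enumerate(edges):
    B[r, idx[u]] = -1.0
    B[r, idx[v]] = 1.0

# For any edge orientation, B.T @ B is the graph Laplacian L = D - A.
L = B.T @ B
print(np.diag(L))  # node degrees: [2. 2. 3. 4. 1. 2. 1. 1.]
```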
SLIDE 70 Some definitions - flow variables
- Let f be a vector with one component per edge, for example: f = (fAB, fAC, fBC, fCD, fDE, fDF, fDG, fFH).
- The magnitude of a component of f is the amount of flow that passes through the corresponding edge.
- The sign of a component of f is the direction of the flow.
SLIDE 71 Some definitions - net flow
- Let Δ be a non-negative vector; each component of Δ indicates the initial mass at a node.
- BᵀF f is a flow; Bᵀf is a vector that captures the net flow on each node.
- Bᵀf + Δ indicates the net mass on every node.
- S. Yang, D. Wang, K. Fountoulakis. p-Norm Flow Diffusion for Local Graph Clustering.
SLIDE 72 Node capacities
- We will require that each node i has capacity equal to its degree di.
- We will say that the initial mass Δ has been diffused when the net mass on each node is at most its capacity: Bᵀf + Δ (net mass per node) ≤ d (capacity per node).
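The capacity condition is a simple vector inequality to check. A sketch on an invented 3-node path graph, with B oriented so a positive flow moves mass from an edge's first endpoint to its second:

```python
import numpy as np

# Sketch of the feasibility condition: mass Delta is fully diffused by flow f
# once the net mass B^T f + Delta is at most the capacity (degree) vector d
# at every node. Toy path graph A-B-C; rows of B are edges A->B and B->C,
# with -1 at the source endpoint and +1 at the sink.
B = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0]])
d = np.array([1.0, 2.0, 1.0])        # node degrees on the path
delta = np.array([2.0, 0.0, 0.0])    # 2 units of mass seeded at A

f = np.array([1.0, 0.0])             # route 1 unit from A to B
net = B.T @ f + delta                # net mass per node
print(net, bool((net <= d).all()))   # [1. 1. 0.] True
```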
SLIDE 73 Diffusion as an optimization formulation
minimize ∥f∥p subject to: Bᵀf + Δ ≤ d
- Out of all possible flows that satisfy the capacities, we are interested in the one with minimum ℓp norm, where p ∈ [2, ∞).
- S. Yang, D. Wang, K. Fountoulakis. p-Norm Flow Diffusion for Local Graph Clustering.
SLIDE 74 Relation to other methods
- For p = 2, the dual of the 2-norm flow diffusion problem is
minimize (1/2)∥Bx∥₂² − xᵀΔ + ∥Dx∥₁
which is a regularized spectral problem, very similar to ℓ1-regularized PageRank.
- For p → ∞, the dual of the ∞-norm flow diffusion problem is
minimize ∥Bx∥₁ − xᵀΔ + ∥Dx∥₁
which is a regularized min-cut problem, very similar to the so-called flow-improve methods.
SLIDE 75 Rounding
- In practice we solve the dual of the p-norm flow problem
minimize −xᵀΔ + ∥Dx∥₁ subject to: ∥Bx∥q ≤ 1, x ≥ 0
so we have direct access to the dual variables x.
- Sort the dual variables in descending order.
- Output the prefix set with smallest conductance.
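The rounding step above is the classic sweep cut. A self-contained sketch, assuming an adjacency-list graph and using the symmetric conductance (cut over the smaller of the two volumes); the score vector `x` below is an invented stand-in for the dual variables:

```python
# Sketch of the rounding step: sort nodes by score in descending order and
# return the prefix set with the smallest conductance (a sweep cut). Here
# conductance uses min(vol(S), vol(complement)) in the denominator.
def sweep_cut(adj, x):
    order = sorted(adj, key=lambda v: x[v], reverse=True)
    total_vol = sum(len(adj[v]) for v in adj)
    best, best_phi = None, float("inf")
    prefix, cut, vol = set(), 0, 0
    for v in order[:-1]:  # skip the trivial full-graph prefix
        prefix.add(v)
        vol += len(adj[v])
        # Adding v toggles each incident edge: edges into the prefix are no
        # longer cut, edges to outside nodes become cut.
        cut += sum(-1 if u in prefix else 1 for u in adj[v])
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best, best_phi = set(prefix), phi
    return best, best_phi

# Two triangles joined by one edge; a made-up score vector seeded on one side.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
x = {0: 3.0, 1: 3.0, 2: 2.0, 3: 1.0, 4: 0.0, 5: 0.0}
S, phi = sweep_cut(adj, x)
print(S, phi)  # the sweep recovers the left triangle
```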
SLIDE 76 p-norm network flow diffusions - conductance guarantees
- Let C be the target cluster with conductance Φ(C). If Δ is initialized inside C and the input seed set sufficiently overlaps with C, then the output A satisfies
Φ(A) ≤ 𝒫(Φ(C)^(1−1/p)).
- Constant-factor approximation when p → ∞, similar to combinatorial diffusions.
- Smooth transition for general values of p in between p = 2 and p → ∞.
- S. Yang, D. Wang, K. Fountoulakis. p-Norm Flow Diffusion for Local Graph Clustering.
SLIDE 77 p-norm network flow diffusions - algorithm
- Simple randomized coordinate descent.
- Running time: 𝒫((|Δ|/γ)(|Δ|/ϵ)^(1−2/p) log(1/ϵ))
- |Δ| represents the magnitude of the initial mass.
- γ is the strong convexity parameter of the dual problem.
- ϵ is the required accuracy.
- p = 2 gives ˜𝒫(|Δ|), the usual running time for spectral methods.
- p → ∞ gives ˜𝒫(|Δ|²/ϵ), the usual running time for combinatorial methods.
- S. Yang, D. Wang, K. Fountoulakis. p-Norm Flow Diffusion for Local Graph Clustering.
SLIDE 78 p-norm network flow diffusions - summary
- There is a trade-off between quality of output and running time.
- The larger p is, the better the output with respect to conductance.
- However, the larger p is, the longer the running time for solving the problem.
- In practice, small values p ∈ [2, 8] give the best of both worlds.
- S. Yang, D. Wang, K. Fountoulakis. p-Norm Flow Diffusion for Local Graph Clustering.
SLIDE 79 Performance
- LFR synthetic model, basically a stochastic block model.
- μ is a parameter that controls noise; the higher μ is, the more noise.
- [Figure: conductance and F1 measure versus μ ∈ [0.1, 0.4] for p = 2, p = 4, p = 8, ℓ1-regularized PageRank, and nonlinear power diffusion.]
SLIDE 80
Local to Global Applications: Network Community Profiles, Node Embeddings, Graph Visualization, Semi-Supervised Learning
(no theory ☹, preliminary work)
SLIDE 81
Network Community Profiles
Clusters with smallest conductance correspond to galaxies
SLIDE 82 Node embeddings
Types of node embeddings
- Global embeddings
- Local embeddings, i.e., spectral and combinatorial
- Goal: Represent a node with a low dimensional vector.
- We use node embeddings for graph visualization, semi-supervised learning
and graph partitioning.
SLIDE 83 Global embeddings
- Compute the Laplacian matrix L = D − A.
- Compute k non-trivial eigenvectors of L.
- Stack the eigenvectors as columns of an n × k matrix U.
- Each row of U is a vector representation (node embedding) of a node.
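The recipe above is a few lines of NumPy. A minimal sketch on an invented toy graph (two triangles joined by an edge), using a dense eigensolver for clarity; large graphs would need sparse methods:

```python
import numpy as np

# Sketch of the global embedding recipe: form L = D - A, take the k
# eigenvectors for the smallest non-trivial eigenvalues, and stack them as
# the columns of an n-by-k matrix U; row i of U embeds node i.
def global_embedding(A, k):
    D = np.diag(A.sum(axis=1))
    L = D - A
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:k + 1]         # skip the trivial constant eigenvector

# Toy graph: two triangles joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[u, v] = A[v, u] = 1.0
U = global_embedding(A, 1)
print(U[:, 0])  # the 1-d embedding separates the two triangles by sign
```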
SLIDE 84 Local spectral embeddings
- Pick N seed sets.
- For each seed set, run a local spectral algorithm.
- Stack the resulting vectors as columns of an n × N matrix X.
- Compute the k principal left singular vectors of X.
- Stack the singular vectors as columns of an n × k matrix U.
- Each row of U is a vector representation (node embedding) of a node.
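The stack-then-SVD step above can be sketched directly. Here the per-seed diffusion vectors are random sparse stand-ins (an assumption for illustration); in practice each column of X comes from a local spectral or flow algorithm run from one seed set:

```python
import numpy as np

# Sketch of the local-embedding recipe: given N per-seed diffusion vectors
# stacked as the columns of an n-by-N matrix X, take the top-k left singular
# vectors of X as the n-by-k embedding matrix U. The columns of X below are
# random sparse stand-ins for real local diffusion vectors.
rng = np.random.default_rng(0)
n, N, k = 100, 20, 5
X = np.where(rng.random((n, N)) < 0.1, rng.random((n, N)), 0.0)

# Top-k principal left singular vectors of X.
U_full, s, _ = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :k]
print(U.shape)  # each row of U is a k-dimensional node embedding
```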
SLIDE 85 Local flow embeddings
- Pick N seed sets.
- For each seed set, run a local flow algorithm.
- Stack the resulting flow vectors as columns of an n × N matrix X.
- Compute the k principal left singular vectors of X.
- Stack the singular vectors as columns of an n × k matrix U.
- Each row of U is a vector representation (node embedding) of a node.
SLIDE 86 Graph visualization - US highway network
- Edges represent nationally funded highways, and nodes represent intersections.
- Mostly a toy graph for demonstration purposes.
SLIDE 87 Graph visualization - global embeddings
- Color shows true longitude
- Global embeddings seem to correlate with longitude
- But they compress major regions of the northeastern US (Washington, New York, Boston) as well as the western US (Los Angeles, San Diego, Phoenix).
SLIDE 88 Graph visualization - local embeddings (panels: map, local spectral embeddings, local flow embeddings)
- With global embeddings Western US (Los Angeles, San Diego, Phoenix) was
quite compressed.
- Local embeddings help in de-compressing the region.
- Local spectral and flow embeddings seem to be qualitatively different.
SLIDE 89 Main Galaxy Sample data
- Each node is a galaxy
- Edges represent distance between galaxies.
- The distance is determined by measuring the distance between the emission spectra of two galaxies.
- There are 517182 galaxies (nodes) and each galaxy is connected to 4 neighbor galaxies (edges).
Mapping the similarities of spectra: global and local approaches to sdss galaxies. The Astrophysical Journal. 2016.
SLIDE 90
Local spectral and flow embeddings - Main Galaxy Sample data
(Panels: global embedding, local spectral, local flow; zoom-in on the dense region)
SLIDE 91 Local spectral and flow embeddings - Main Galaxy Sample data
- Structural differences in the visualization also translate into clusters with smaller conductance (shown for k = 50 and k = 100).
SLIDE 92 Semi-supervised learning
- Infer unknown labels for all nodes, when given a few nodes with known labels.
- We assume that the graph edges represent a high likelihood of sharing a
label.
- For each class, we randomly select a small subset of nodes, and we fix the labels of these nodes as known.
- We then run a spectral or a flow method where this set of nodes is the seed reference. This gives one spectral or flow vector per class.
- For each unlabelled node we look at the corresponding coordinate in the vectors and give it the label that corresponds to the class with the highest value.
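The assignment rule described above is a per-node argmax over the class vectors. A minimal sketch; the two score vectors below are hand-made stand-ins for the per-class spectral or flow vectors:

```python
import numpy as np

# Sketch of the label-assignment rule: one diffusion vector per class
# (seeded at that class's labelled nodes), and each node takes the class
# whose vector is largest at its coordinate. Scores below are invented.
score_class0 = np.array([0.9, 0.7, 0.4, 0.1, 0.0, 0.0])
score_class1 = np.array([0.0, 0.1, 0.3, 0.6, 0.8, 0.9])

scores = np.vstack([score_class0, score_class1])  # classes x nodes
labels = np.argmax(scores, axis=0)                # per-node argmax over classes
print(labels)  # [0 0 0 1 1 1]
```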
SLIDE 93 Semi-supervised learning
True labels included in seeds
- PubMed is a citation network: 19717 scientific publications about diabetes with 44338 citation links.
- By construction of the graph, articles about one type of diabetes cite others about the same type more often.
SLIDE 94
Software
LocalGraphClustering on GitHub
SLIDE 95
Thank you!