Extension
Can we identify competitors of an Ads campaign in a specific category?
[Figure: graph of Campaigns and Queries]
In this setting too, some pre-computation lets us compute the PPR efficiently.
Charles River Workshop on Private Analysis of Social Networks
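A minimal sketch of the idea, assuming a hypothetical undirected graph in which each campaign is linked to the queries it targets: compute PPR from the campaign of interest by power iteration and rank the other campaigns by their score. The node names, the helper `rank_competitors`, the teleport probability `alpha = 0.15` and the fixed iteration count are all assumptions; the pre-computation mentioned on the slide is not modelled.

```python
from collections import defaultdict


def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """PPR by power iteration: p = alpha * e_seed + (1 - alpha) * p W, W = D^-1 A."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] = alpha  # teleport back to the seed campaign
        for u, mass in p.items():
            share = (1 - alpha) * mass / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        p = nxt
    return p


def rank_competitors(edges, campaign, alpha=0.15):
    """Hypothetical helper: rank the other campaigns by PPR score from `campaign`."""
    adj = defaultdict(list)
    for c, q in edges:  # each edge links a campaign to a query it targets
        adj[c].append(q)
        adj[q].append(c)
    scores = personalized_pagerank(dict(adj), campaign, alpha)
    campaigns = {c for c, _ in edges}
    return sorted(campaigns - {campaign}, key=lambda c: scores[c], reverse=True)


edges = [
    ("c:A", "q:shoes"), ("c:A", "q:boots"),
    ("c:B", "q:shoes"), ("c:B", "q:boots"),
    ("c:C", "q:pizza"),
    ("c:D", "q:boots"), ("c:D", "q:pizza"),
]
print(rank_competitors(edges, "c:A"))  # ['c:B', 'c:D', 'c:C']
```

Campaign B shares both queries with A and ranks first; C shares none and ranks last.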
Local random walk and clustering in practice
Joint work with: Raimondas Kiveris (Google Research NY), Vahab Mirrokni (Google Research NY)
Some basic intuitions
It would be nice to know the number and the lengths of all the possible paths between two nodes.
This is infeasible.
Since we are interested only in strong relationships, we can sample.
Truncated random walk techniques
Run several truncated random walks of a fixed length.
Local algorithms based on this intuition: truncated random walks, Personalized PageRank, evolving sets.
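Sampling truncated random walks can be sketched as follows. This is a minimal illustration, not the authors' code: the walk length, the number of walks and the two-triangle test graph are hypothetical choices. The estimated endpoint distribution concentrates on the start node's own triangle, which is the "strong relationship" signal the sampling is meant to capture.

```python
import random
from collections import Counter


def truncated_walk_distribution(adj, start, length, num_walks, rng):
    """Estimate the endpoint distribution of a random walk with exactly
    `length` steps from `start` by sampling `num_walks` independent walks."""
    ends = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(length):
            node = rng.choice(adj[node])  # move to a uniformly random neighbour
        ends[node] += 1
    return {u: count / num_walks for u, count in ends.items()}


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
dist = truncated_walk_distribution(adj, 0, 3, 10_000, random.Random(0))
print(sum(dist.get(u, 0.0) for u in (0, 1, 2)))  # close to the exact value 58/72
```

For this graph the exact probability that a 3-step walk from node 0 ends in its own triangle is 58/72 (about 0.81), so the estimate should land near that value.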
Nice experimental properties of PPR
We can approximate it efficiently in MapReduce by analyzing short random walks recursively.
It works well in synthetic settings.
It works well in practice:
* On public graphs with 8M nodes -- Overlapping Clustering and Distributed Computation (WSDM'11, Andersen, Gleich, Mirrokni)
* On the YouTube co-watch graph with 100M nodes, using 100s of machines -- Large-scale Community Detection on YouTube graph (ICWSM'11, Gargi, Lu, Mirrokni, Yoon)
* For sybil detection in social networks -- The evolution of Sybil Defense via Social Networks (S&P'13, Alvisi, Clement, Epasto, Lattanzi, Panconesi)
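Why short random walks suffice can be seen in a single-machine Monte Carlo sketch: a walk that stops at each step with probability α has a geometric length, and its endpoint is distributed exactly as the PPR vector of the seed. This is an illustration under that standard definition of PPR, not the distributed, recursive MapReduce computation referred to on the slide. The value `alpha = 0.15`, the walk count and the test graph are assumptions.

```python
import random
from collections import Counter


def monte_carlo_ppr(adj, seed, alpha, num_walks, rng):
    """Estimate PPR(seed, .) from the endpoints of short random walks.

    Each walk stops at every step with probability alpha, so its length is
    geometric with mean (1 - alpha) / alpha and its endpoint is distributed
    exactly as the PPR vector of `seed` with teleport probability alpha.
    """
    ends = Counter()
    for _ in range(num_walks):
        node = seed
        while rng.random() >= alpha:  # continue with probability 1 - alpha
            node = rng.choice(adj[node])
        ends[node] += 1
    return {u: count / num_walks for u, count in ends.items()}


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ppr = monte_carlo_ppr(adj, 0, 0.15, 20_000, random.Random(1))
print(sorted(ppr.items()))
```

With α = 0.15 the walks average under six steps, which is what keeps each sample cheap.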
Why does it work?
Suppose we have a set C with few edges going outside it.
[Figure: set C containing the node v]
Most of the time, a random walk from v will stay in C.
It is possible to bound the amount of score that goes outside C.
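The bound can be checked numerically. The sketch below assumes the non-lazy walk W = D^-1 A, PPR defined by p = α s + (1 − α) p W, and a seed s spread over C in proportion to degree. Under these assumptions each step moves at most φ = |cut(C, V − C)|/Vol(C) of the mass out of C, so the PPR mass outside C is at most φ(1 − α)/α. The two-clique graph and α = 0.2 are hypothetical choices; here Vol(C) = Vol(V − C), so this φ equals the cut conductance defined on the next slides.

```python
def ppr_power_iteration(adj, seed_dist, alpha, iters=300):
    """p = alpha * s + (1 - alpha) * p W with W = D^-1 A (non-lazy walk)."""
    p = {u: seed_dist.get(u, 0.0) for u in adj}
    for _ in range(iters):
        nxt = {u: alpha * seed_dist.get(u, 0.0) for u in adj}
        for u, mass in p.items():
            share = (1 - alpha) * mass / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        p = nxt
    return p


def leaked_mass_and_bound(adj, cluster, alpha):
    """PPR mass outside `cluster` for a degree-proportional seed on the cluster,
    together with the bound phi * (1 - alpha) / alpha."""
    cluster = set(cluster)
    vol_c = sum(len(adj[u]) for u in cluster)
    cut = sum(1 for u in cluster for w in adj[u] if w not in cluster)
    phi = cut / vol_c  # fraction of the cluster's volume on cut edges
    seed = {u: len(adj[u]) / vol_c for u in cluster}
    p = ppr_power_iteration(adj, seed, alpha)
    leaked = sum(mass for u, mass in p.items() if u not in cluster)
    return leaked, phi * (1 - alpha) / alpha


def two_cliques(k):
    """Two k-cliques {0..k-1} and {k..2k-1} joined by the single edge (k-1, k)."""
    adj = {u: [] for u in range(2 * k)}
    for base in (0, k):
        for i in range(base, base + k):
            for j in range(i + 1, base + k):
                adj[i].append(j)
                adj[j].append(i)
    adj[k - 1].append(k)
    adj[k].append(k - 1)
    return adj


leaked, bound = leaked_mass_and_bound(two_cliques(5), range(5), alpha=0.2)
print(f"leaked {leaked:.4f} <= bound {bound:.4f}")
```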
Local clustering via random walk
How should we define a cluster?
Good clusters have low cut conductance φ:
$\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$
[Figure: example cluster with φ = 1/17]
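The cut-conductance formula translates directly into code. A minimal sketch, assuming an undirected graph stored as a dict of neighbour lists (so Vol is the sum of degrees); the two-triangle example graph is hypothetical.

```python
def conductance(adj, cluster):
    """phi(C) = |cut(C, V - C)| / min(Vol(C), Vol(V - C)).

    `adj` maps each node to its neighbour list; C must be a non-empty proper
    subset with non-zero volume on both sides.
    """
    cluster = set(cluster)
    vol_c = sum(len(adj[u]) for u in cluster)
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    denom = min(vol_c, vol_total - vol_c)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    cut = sum(1 for u in cluster for w in adj[u] if w not in cluster)
    return cut / denom


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(conductance(adj, {0, 1, 2}))  # 1 cut edge / volume 7
```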
Set of minimum conductance
The problem is NP-hard.
Algorithms (φ: optimal conductance; S: returned set):
- Spectral algorithms [Jerrum&Sinclair'89]: $\phi(S) = O(\sqrt{\phi})$
- [Leighton-Rao'99]: $\phi(S) = O(\log n)\,\phi$
- [Arora-Rao-Vazirani'04]: $\phi(S) = O(\sqrt{\log n})\,\phi$
The running time is at least linear in the size of the graph...
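For contrast with the approximation algorithms above, here is an exact but exponential-time baseline: try every non-trivial subset and keep the one of lowest conductance. It is only a sketch for tiny graphs (it examines 2^n − 2 subsets), and the example graph is hypothetical.

```python
from itertools import combinations


def conductance(adj, cluster):
    """|cut(C, V - C)| / min(Vol(C), Vol(V - C)); infinity if a side has no volume."""
    cluster = set(cluster)
    vol_c = sum(len(adj[u]) for u in cluster)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_c
    denom = min(vol_c, vol_rest)
    if denom == 0:
        return float("inf")
    cut = sum(1 for u in cluster for w in adj[u] if w not in cluster)
    return cut / denom


def min_conductance_brute_force(adj):
    """Exact minimum-conductance set by trying all non-trivial subsets."""
    nodes = sorted(adj)
    best_phi, best_set = float("inf"), None
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            phi = conductance(adj, subset)
            if phi < best_phi:
                best_phi, best_set = phi, set(subset)
    return best_phi, best_set


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(min_conductance_brute_force(adj))  # conductance 1/7 for the set {0, 1, 2}
```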
Local Graph Clustering
Do we really need to explore the whole graph?!?
Local Clustering Algorithm
Given a good node v, the algorithm:
- Returns a set around v of good conductance
- Runs in time proportional to the size of the output
- Explores only the local neighborhood of v
- Returns a set of roughly the same size as the target cluster S
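A push-style approximation of PPR has exactly these locality properties. The sketch below is a non-lazy variant of the Andersen-Chung-Lang push procedure, written under the assumptions of no isolated nodes and hypothetical parameters `alpha` and `eps`. It maintains p + PPR(r) = PPR(seed) and stops once every residual r(u) is below eps * d(u). Each push removes at least alpha * eps * d(u) residual mass, so the total work is at most 1/(alpha * eps), independent of the graph size.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate PPR (non-lazy variant of Andersen-Chung-Lang).

    Invariant: p + PPR(r) = PPR(seed). Stops when r[u] < eps * deg(u) for all u.
    Assumes every node has at least one neighbour.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        ru, du = r.get(u, 0.0), len(adj[u])
        if ru < eps * du:
            continue
        p[u] = p.get(u, 0.0) + alpha * ru  # keep an alpha fraction at u
        r[u] = 0.0
        share = (1 - alpha) * ru / du  # spread the rest over the neighbours
        for w in adj[u]:
            before = r.get(w, 0.0)
            r[w] = before + share
            if before < eps * len(adj[w]) <= r[w]:  # w just became active
                queue.append(w)
    return p, r


n = 100_000
cycle = {u: [(u - 1) % n, (u + 1) % n] for u in range(n)}
p, r = approximate_ppr(cycle, 0)
print(len(p), "of", n, "nodes received PageRank mass")
```

On a cycle of 100,000 nodes, only a few dozen nodes around the seed are ever touched.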
Previous results
| Algorithm | Approximation guarantee | Running time |
| Truncated random walk [Spielman-Teng'04] | $O(\phi^{1/3}\log^{2/3} n)$ | $\tilde{O}(\mathrm{Vol}(S)/\phi^{5/3})$ |
| Truncated random walk [Spielman-Teng'08] | $O(\sqrt{\phi}\log^{3/2} n)$ | $\tilde{O}(\mathrm{Vol}(S)/\phi^{2})$ |
| PageRank random walk [Andersen-Chung-Lang'06] | $O(\sqrt{\phi\log n})$ | $\tilde{O}(\mathrm{Vol}(S)/\phi)$ |
| Evolving Set [Andersen-Peres'08] | $O(\sqrt{\phi\log n})$ | $\tilde{O}(\mathrm{Vol}(S)/\sqrt{\phi})$ |
| Evolving Set [Gharan-Trevisan'12] | $O(\sqrt{\phi/\epsilon})$ | $\tilde{O}(\mathrm{Vol}(S)^{1+\epsilon}/\sqrt{\phi})$ |
Cheeger's inequality barrier: no approximation guarantee above beats $O(\sqrt{\phi})$.
The running time depends only on φ and S.
Clustering using PPR
- Compute the approximate Personalized PageRank vector for v.
- Sort the nodes by their normalized score $\frac{ppr(v, u)}{d(u)}$.
- Select the sweep cut of best conductance.
[Figure: example graph around v, annotated first with approximate PPR scores and then with normalized scores]
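The three steps can be sketched end to end on a toy graph. Assumptions: PPR is computed exactly enough by power iteration (a stand-in for the approximate vector), `alpha = 0.15`, and the two-triangle graph is hypothetical. The sweep updates the cut incrementally: adding u turns its edges into the current set into internal edges and its other edges into cut edges.

```python
def ppr_power_iteration(adj, seed, alpha=0.15, iters=200):
    """p = alpha * e_seed + (1 - alpha) * p W with W = D^-1 A."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] = alpha
        for u, mass in p.items():
            share = (1 - alpha) * mass / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        p = nxt
    return p


def sweep_cut(adj, p):
    """Sort nodes by p[u] / d(u) and return the prefix set of lowest conductance."""
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    order = sorted((u for u in p if p[u] > 0),
                   key=lambda u: p[u] / len(adj[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order, start=1):
        inside = sum(1 for w in adj[u] if w in in_set)
        in_set.add(u)
        vol += len(adj[u])
        cut += len(adj[u]) - 2 * inside  # edges into the set stop being cut
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return set(order[:best_k]), best_phi


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster, phi = sweep_cut(adj, ppr_power_iteration(adj, 0))
print(cluster, phi)  # {0, 1, 2} 0.14285714285714285
```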
Local clustering beyond Cheeger's barrier
Joint work with: Vahab Mirrokni (Google Research NY), Zeyuan Allen Zhu (MIT)
ICML 2013
How should we define a cluster?
Good clusters have low cut conductance φ:
$\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$
Is that enough to define a good cluster?
[Figure: two clusters with the same cut conductance]
Good clusters also have high set conductance ψ:
$\psi = \min_{\emptyset \neq S \subsetneq C} \frac{|\mathrm{cut}(S, C - S)|}{\min(\mathrm{Vol}(S), \mathrm{Vol}(C - S))}$
Can we do better when $\psi \gg \phi$?
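Set conductance can be computed by brute force for small clusters. The slide does not say in which graph the volumes are measured; this sketch assumes they are measured in the subgraph induced by C, which matches reading ψ as internal connectivity. The examples are hypothetical: a 4-clique and a 4-node path could have the same few outgoing edges (same φ), yet the path is much more loosely connected inside.

```python
from itertools import combinations


def set_conductance(adj, cluster):
    """psi(C): minimum conductance of a cut (S, C - S) inside C, by brute force.

    Assumption: volumes are measured in the subgraph induced by C.
    """
    members = sorted(cluster)
    inside = set(members)
    deg = {u: sum(1 for w in adj[u] if w in inside) for u in members}
    vol_c = sum(deg.values())
    best = float("inf")
    for size in range(1, len(members)):
        for subset in combinations(members, size):
            s = set(subset)
            cut = sum(1 for u in s for w in adj[u] if w in inside and w not in s)
            vol_s = sum(deg[u] for u in s)
            denom = min(vol_s, vol_c - vol_s)
            if denom > 0:
                best = min(best, cut / denom)
    return best


k4 = {u: [w for w in range(4) if w != u] for u in range(4)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(set_conductance(k4, range(4)), set_conductance(path, range(4)))  # 2/3 and 1/3
```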
Our hypothesis
We study the problem when $\frac{\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$.
A similar problem was studied by Makarychev et al. in STOC'12. They assume that $\phi < C\lambda$ and give a global SDP that can find communities with cut conductance φ.
Can we obtain the same results locally?
Can we obtain a similar result using Personalized PageRank?
Theorem: If there is a cluster with cut conductance φ and set conductance ψ, then normalized Personalized PageRank finds a cluster with conductance $\tilde{O}\left(\frac{\phi}{\psi}\right)$.
Main proof ideas
Bound the probability of leaving a set in t steps, knowing that at each step we leave with probability φ.
[Figure: a set inside V]
Suppose the walk is mixed inside C; then it leaks φ of its probability mass at each step.
So in $\frac{1}{\alpha}$ steps, it leaks $\frac{\phi}{\alpha}$.
If we start from a good node: $\sum_{u \notin S} pr(u) < \frac{2\phi}{\alpha}$
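The leakage argument can be written out as a short derivation. This is a sketch under stated assumptions, not the authors' proof: the non-lazy walk $W = D^{-1}A$, PPR defined by $pr = \alpha s + (1-\alpha)\,pr\,W$, a seed $\psi_C(u) = d(u)/\mathrm{Vol}(C)$ spread over C in proportion to degree, and $\phi = |\mathrm{cut}(C, V-C)|/\mathrm{Vol}(C)$, which equals the cut conductance when C is the smaller side.

```latex
% p_t(u) <= d(u)/Vol(C) because psi_C is dominated by the stationary
% distribution rescaled by 1/pi(C), and W preserves that domination.
\begin{align*}
p_t &= \psi_C W^t, \qquad p_t(u) \le \frac{d(u)}{\mathrm{Vol}(C)} \quad \text{for all } u \text{ and } t \ge 0,\\
p_{t+1}(V - C) - p_t(V - C) &\le \sum_{u \in C} p_t(u)\,\frac{|\mathrm{cut}(\{u\}, V - C)|}{d(u)}
  \le \frac{|\mathrm{cut}(C, V - C)|}{\mathrm{Vol}(C)} = \phi,\\
p_t(V - C) &\le t\,\phi,\\
pr(V - C) &= \sum_{t \ge 0} \alpha (1 - \alpha)^t\, p_t(V - C)
  \le \alpha\,\phi \sum_{t \ge 0} t\,(1 - \alpha)^t = \frac{(1 - \alpha)\,\phi}{\alpha} \le \frac{\phi}{\alpha}.
\end{align*}
```

PPR is linear in the seed, so this is the average leakage over single-node seeds $v \sim \psi_C$. By Markov's inequality, the nodes $v \in C$ whose leakage is at most $2\phi/\alpha$ carry at least half of $\mathrm{Vol}(C)$; these play the role of the good nodes.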
Main proof ideas
Inside S, the random walk mixes in $\frac{1}{\psi^2}$ steps.
So after $\frac{1}{\psi^2}$ steps, each node u would have a score of about $\frac{d(u)}{\mathrm{Vol}(S)}$.
We can express the score of a node v inside S as: $pr(v) \ge \widetilde{pr}(v) - pr_l(v)$
But we have a bound: $\sum_{v \in S} pr_l(v) = \sum_{z \notin S} ppr(z) \le \frac{2\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$
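The mixing step can be illustrated by tracking how fast a walk restricted to S approaches the distribution d(u)/Vol(S). This sketch assumes a lazy walk (stay put with probability 1/2, which avoids periodicity), degrees measured in the induced subgraph, and a connected S; the 4-clique is a hypothetical example. Under that convention ψ(K4) = 2/3, so 1/ψ² = 2.25 and roughly three steps should suffice. The slide's 1/ψ² is an order-of-magnitude statement.

```python
def lazy_walk_tv(adj, cluster, start, steps):
    """Total-variation distance to d_S(u) / Vol_S after each step of the lazy walk
    restricted to the subgraph induced by `cluster` (assumed connected)."""
    members = set(cluster)
    deg = {u: sum(1 for w in adj[u] if w in members) for u in members}
    vol = sum(deg.values())
    p = {u: 0.0 for u in members}
    p[start] = 1.0
    distances = []
    for _ in range(steps):
        nxt = {u: 0.5 * p[u] for u in members}  # stay put with probability 1/2
        for u in members:
            share = 0.5 * p[u] / deg[u]
            for w in adj[u]:
                if w in members:
                    nxt[w] += share
        p = nxt
        distances.append(0.5 * sum(abs(p[u] - deg[u] / vol) for u in members))
    return distances


k4 = {u: [w for w in range(4) if w != u] for u in range(4)}
print(lazy_walk_tv(k4, range(4), 0, 3))  # 1/4, 1/12, 1/36
```

On the 4-clique the distance shrinks by a factor of 3 per step, falling below 0.03 after three steps.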
Main proof ideas
We can prove that we find a set that partially overlaps with S:
- Most nodes in the cluster have a high score
- Most nodes outside the cluster have a low score
This implies a bound on the conductance!!
Can we do better?
Theorem 2: If there is a cluster with cut conductance φ and set conductance ψ, then in the worst case normalized Personalized PageRank finds a cluster with conductance $\Omega\left(\frac{\phi}{\psi}\right)$.
Results
Theorem 1 (upper bound): $\tilde{O}\left(\frac{\phi}{\psi}\right)$. Theorem 2 (lower bound): $\Omega\left(\frac{\phi}{\psi}\right)$.
Experiments
Experiments using the Watts-Strogatz model for the set S.
As the gap decreases, precision increases.
Conclusion and open problems
- Random-walk-based techniques can efficiently solve the similarity and clustering problems.
- Internal connectivity is very important for random walk techniques.
- Can we say something when the gap between internal and external connectivity is smaller?