Local clustering with graph diffusions and spectral solution paths
Kyle Kloster, joint with David F. Gleich (Purdue University)
Supported by NSF CAREER 1149756-CCF
Local Clustering
Given seed(s) S in G, find a good cluster near S.
"Near"? → local, a small set containing S
"Good"? → low conductance
Low-conductance sets are clusters
conductance(T) = (# edges leaving T) / (# edge endpoints in T), for small sets T, i.e. vol(T) < vol(G)/2
= "chance that a random edge touching T exits T"
For a global cluster, we could use Fiedler... but we want a local cluster.
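The definition above is easy to check directly. A minimal sketch in Python; the adjacency-dict representation and the two-triangle example graph are my illustrations, not from the talk:

```python
def conductance(adj, T):
    """phi(T) = (# edges leaving T) / (# edge endpoints in T),
    for small sets T with vol(T) <= vol(G)/2."""
    T = set(T)
    cut = sum(1 for u in T for v in adj[u] if v not in T)  # edges leaving T
    vol = sum(len(adj[u]) for u in T)                      # edge endpoints in T
    return cut / vol

# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(conductance(adj, {0, 1, 2}))  # one crossing edge / volume 7
```

Each triangle has conductance 1/7: one edge leaves it, and its three vertices have degrees 2, 2, 3.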
Fiedler
Compute the Fiedler vector v: L v = λ_2 D v
"Sweep" over v:
1. sort: v(1) ≥ v(2) ≥ ···
2. for each prefix set S_k = (1, …, k), compute conductance φ(S_k)
3. output the best S_k
Cheeger inequality: Fiedler finds a cluster "not too much worse" than the global optimum.
But we want local…
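The sweep step can be sketched in a few lines of Python; maintaining the cut and volume incrementally makes the whole sweep cost proportional to vol(G). The score vector and toy graph below are illustrative assumptions of mine:

```python
def sweep_cut(adj, scores):
    """Sort vertices by score (descending) and return the prefix set with
    the lowest conductance, updating cut and volume incrementally."""
    order = sorted(scores, key=scores.get, reverse=True)
    total_vol = sum(len(adj[u]) for u in adj)
    S, cut, vol = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order[:-1], start=1):  # skip the full vertex set
        vol += len(adj[u])
        for w in adj[u]:
            cut += -1 if w in S else 1  # a neighbor already in S closes a cut edge
        S.add(u)
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi

# Two triangles joined by edge (2, 3); higher scores on the left triangle.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
scores = {0: .9, 1: .8, 2: .7, 3: .3, 4: .2, 5: .1}
best_set, best_phi = sweep_cut(adj, scores)
print(best_set, best_phi)
```

With these scores the sweep recovers the left triangle {0, 1, 2} at conductance 1/7.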
Local Fiedler and diffusions
[Mahoney, Orecchia, Vishnoi 12] "A local spectral method…"
Fiedler: L v = D v [λ]
with local bias (MOV): L v = D v [λ] + "s" (normalized seed vector s)
THM: MOV is a scaling of personalized PageRank*!
Local Fiedler and diffusions
Intuition: why MOV ~ PageRank
Fiedler: L v = D v [λ]
with local bias: L v = D v [λ] + "s"
(I − D^{−1/2} A D^{−1/2}) v̂ = v̂ [λ] + "s"
A D^{−1} v̂ = v̂ [1 − λ] + "s"
(I − α P) v̂ = "s": a PageRank vector, a diffusion
PageRank and other diffusions
"Personalized" PageRank (PPR): standard setting (I − α P) x = ŝ; diffusion perspective x = Σ_{k=0}^∞ α^k P^k ŝ
[Andersen, Chung, Lang 06]: local Cheeger inequality and fast algorithm, the "push" procedure
Heat kernel diffusion (HK): f = Σ_{k=0}^∞ (t^k / k!) P^k ŝ (many more!)
Various diffusions explore different aspects of graphs.
[Plot: diffusion weight vs. walk length for PPR (α = 0.85, 0.99) and HK (t = 1, 5, 15)]
Diffusions, theory & practice

          | good conductance                                   | fast algorithm
PR        | Local Cheeger Inequality [Andersen Chung Lang 06]  | "PPR-push" is O(1/(ε(1−α))) [Andersen Chung Lang 06]
HK        | Local Cheeger Inequality [Chung 07]                | "HK-push" is O(e^t C/ε) [K., Gleich 2014]
TDPR      | Open question                                      | [Avron, Horesh 2015]
Gen Diff  | Open question                                      | This talk

David Gleich and I are working with Olivia Simpson (a student of Fan Chung's).
General diffusions: intuition
A diffusion propagates "rank" from a seed across a graph.
[Figure: seed node with high diffusion value; the surrounding region of high values forms a local cluster / low-conductance set]
General diffusions
A diffusion propagates "rank" from a seed across a graph.
General diffusion vector: f = Σ_{k=0}^∞ c_k P^k ŝ
f = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + …
Sweep over f!
General algorithm
1. Approximate f by f̂ so that ‖D^{−1}(f − f̂)‖_∞ ≤ ε
2. Scale: D^{−1} f̂
3. Then sweep!
How to do this efficiently?
Algorithm intuition
From parameters c_k, ε, and seed s: starting from all mass at the seed, how do we end up at f = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 + …?
Begin with the mass at the seed(s) in a "residual" staging area, r_0. The residuals r_k hold mass that is unprocessed; it's like error.
Idea: "push" any entry with r_k(j)/d_j > (some threshold).
Push operation
push: (1) remove an entry from r_k, (2) add it to f (weighted by c_k), (3) then scale it and spread it to the neighbors in the next residual r_{k+1}. (Repeat.)
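The three-step push can be sketched end to end. This is my paraphrase in Python, using the stage threshold r_k(j) ≥ d(j)·ε/(2N) from the proof sketch later in the talk; it is not the authors' reference code:

```python
def general_push(adj, seed, coeffs, eps):
    """Approximate f = sum_k c_k P^k s. r[k] holds unprocessed walk mass;
    pushing node j (1) removes r[k][j], (2) adds it to f with weight c_k,
    (3) spreads it to j's neighbors in the next residual r[k+1]."""
    N = len(coeffs)
    r = [dict() for _ in range(N + 1)]
    r[0][seed] = 1.0
    f = {}
    for k in range(N):
        for j, m in list(r[k].items()):
            if m < len(adj[j]) * eps / (2 * N):
                continue                              # below threshold: leave as error
            del r[k][j]                               # (1) remove from r_k
            f[j] = f.get(j, 0.0) + coeffs[k] * m      # (2) add to f, weight c_k
            share = m / len(adj[j])                   # (3) spread to neighbors
            for w in adj[j]:
                r[k + 1][w] = r[k + 1].get(w, 0.0) + share
    return f

# Heat-kernel weights on the two-triangle toy graph, seeded at node 0.
import math
coeffs = [math.exp(-2.0) * 2.0 ** k / math.factorial(k) for k in range(15)]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
approx = general_push(adj, 0, coeffs, 1e-9)
```

With a tiny ε every residual entry clears the threshold and the result matches the truncated series essentially exactly; the point of the thresholds is that for realistic ε the work stays local.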
Thresholds
The ERROR equals the weighted sum of the entries left in the residuals r_k (those below threshold).
→ Set the threshold so the "leftovers" sum to < ε.
The threshold for stage r_k is ε / (Σ_{j=k+1}^∞ c_j).
Then ‖D^{−1}(f − f̂)‖_∞ ≤ ε.
Another perspective
Fiedler: L v = D v [λ]
with local bias: L v = D v [λ] + "s"
(I − D^{−1/2} A D^{−1/2}) v̂ = v̂ [λ] + "s"
A D^{−1} v̂ = v̂ [1 − λ] + "s"
(I − α P) v̂ = "s": a PageRank vector, a diffusion
Another perspective
Fiedler: L V_k = D V_k Λ_k
with local bias: L V_k = D V_k Λ_k + S
(I − D^{−1/2} A D^{−1/2}) V̂_k = V̂_k Λ_k + Ŝ
A D^{−1} V̂_k = V̂_k (I − Λ_k) + Ŝ
Rearranging (absorbing signs and scalings into the seed term S̃): V̂_k − P V̂_k Γ = S̃
By the mixed-product property of the Kronecker product:
(I − Γ^T ⊗ P) vec(V̂_k) = vec(S̃)
Another perspective
(I − Γ^T ⊗ P) vec(V̂_k) = vec(S̃) generalizes PageRank, (I − α P) v̂ = s̃, to a "matrix teleportation parameter" Γ.
Standard spectral approach: Γ = (I − Λ_k)^{−1}
Our framework is equivalent to: Γ = diag(c̃_0, …, c̃_N)
(Details in [K., Gleich KDD 14])
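The "matrix teleportation parameter" claim can be sanity-checked numerically: when Γ is diagonal, the Kronecker system decouples into one ordinary PageRank solve per column. A small NumPy sketch; the path graph, γ values, and seed columns are arbitrary choices of mine:

```python
import numpy as np

# Random-walk matrix P = A D^{-1} for a 4-node path graph (toy example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)

gammas = np.array([0.85, 0.5])  # diagonal "matrix teleportation parameter"
Gamma = np.diag(gammas)
S = np.zeros((4, 2))
S[0, 0] = S[3, 1] = 1.0         # one seed per column

n, k = S.shape
# vec(V) - (Gamma^T kron P) vec(V) = vec(S); vec() stacks columns (order="F").
big = np.eye(n * k) - np.kron(Gamma.T, P)
V = np.linalg.solve(big, S.flatten(order="F")).reshape((n, k), order="F")

# Diagonal Gamma decouples: column i solves (I - gamma_i P) v_i = s_i.
v0 = np.linalg.solve(np.eye(n) - gammas[0] * P, S[:, 0])
print(np.allclose(V[:, 0], v0))  # True
```

The identity used is vec(P V Γ) = (Γ^T ⊗ P) vec(V), the mixed-product form of the Kronecker product with column-major vec.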
General diffusions: conclusion
THM: For diffusion coefficients c_k ≥ 0 satisfying
Σ_{k=0}^∞ c_k = 1 and Σ_{k=N+1}^∞ c_k ≤ ε/2 (the "rate of decay"),
"generalized push" approximates the diffusion f on a symmetric graph so that ‖D^{−1}(f − f̂)‖_∞ ≤ ε in work bounded by O(2N²/ε).
Constant for any inputs! (If the diffusion decays fast.)
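For a concrete diffusion the theorem's N is easy to compute; e.g. for heat-kernel weights c_k = e^{−t} t^k / k!, scan the series until the tail drops below ε/2. The helper name `choose_N` is mine:

```python
import math

def choose_N(t, eps):
    """Smallest N with sum_{k > N} e^{-t} t^k / k! <= eps / 2,
    i.e. the truncation length the theorem's decay condition
    requires for heat-kernel coefficients."""
    c = math.exp(-t)   # c_0
    head, N = c, 0
    while 1.0 - head > eps / 2:
        N += 1
        c *= t / N     # c_N = c_{N-1} * t / N
        head += c
    return N

print(choose_N(1.0, 0.1))   # 3
print(choose_N(5.0, 0.01))
```

For PPR with weights (1−α)α^k the tail is geometric, so the analogous N is about log(2/ε) / log(1/α).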
Proof sketch
1. Stop pushing after N terms: Σ_{k=N+1}^∞ c_k ≤ ε/2.
2. Push residual entries in the first N terms if r_k(j) ≥ d(j) ε/(2N).
3. Total work is the number of edge operations over all pushes (pushing node j costs d(j) work):
Σ_{k=0}^{N−1} Σ_{t=1}^{m_k} d(j_t) ≤ Σ_{k=0}^{N−1} Σ_{t=1}^{m_k} r_k(j_t) (2N)/ε
4. Each r_k sums to ≤ 1 (each push is added to f, which sums to 1): Σ_{t=1}^{m_k} r_k(j_t) ≤ 1.
Total: O(2N²/ε).
Solution Paths
What is the benefit of these "push" diffusions? A direct decomposition is a black box: feed in input, get output. In contrast, the iterative nature of "push" means running the algorithm is essentially "watching" the diffusion process occur.