ECS 231 Introduction to Spectral Clustering 1 / 42
Motivation: Image segmentation in computer vision
Motivation: Community detection in network analysis
Outline
I. Graph and graph Laplacian
 ◮ Graph
 ◮ Weighted graph
 ◮ Graph Laplacian
II. Graph clustering
 ◮ Graph clustering
 ◮ Normalized cut
 ◮ Spectral clustering
I.1 Graph

An (undirected) graph is $G = (V, E)$, where
◮ $V = \{v_i\}$ is a set of vertices;
◮ $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is a subset of $V \times V$.

Remarks:
◮ An edge is a pair $\{v_i, v_j\}$ with $v_i \neq v_j$ (no self-loop);
◮ There is at most one edge between $v_i$ and $v_j$ (simple graph).
I.1 Graph

◮ For every vertex $v_i \in V$, the degree $d(v_i)$ of $v_i$ is the number of edges adjacent to $v_i$:
$$d(v_i) = |\{v_j \in V \mid \{v_j, v_i\} \in E\}|.$$
◮ Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$.

$$D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$
I.1 Graph

◮ Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the incidence matrix $\tilde{D}(G)$ of $G$ is an $n \times m$ matrix with
$$\tilde{d}_{ij} = \begin{cases} 1, & \text{if } \exists\, k \text{ s.t. } e_j = \{v_i, v_k\} \\ 0, & \text{otherwise} \end{cases}$$

$$\tilde{D}(G) = \begin{array}{c|ccccc}
 & e_1 & e_2 & e_3 & e_4 & e_5 \\ \hline
v_1 & 1 & 1 & 0 & 0 & 0 \\
v_2 & 1 & 0 & 1 & 1 & 0 \\
v_3 & 0 & 1 & 1 & 0 & 1 \\
v_4 & 0 & 0 & 0 & 1 & 1
\end{array}$$
I.1 Graph

◮ Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the adjacency matrix $A(G)$ of $G$ is a symmetric $n \times n$ matrix with
$$a_{ij} = \begin{cases} 1, & \text{if } \{v_i, v_j\} \in E \\ 0, & \text{otherwise.} \end{cases}$$

$$A(G) = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
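All three matrices can be assembled directly from an edge list. A minimal sketch in NumPy for the 4-vertex, 5-edge example graph above (vertex and edge indices shifted to 0-based):

```python
import numpy as np

# Edge list of the example graph: e1..e5, with 0-based vertex indices
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)

A = np.zeros((n, n), dtype=int)   # adjacency matrix A(G)
B = np.zeros((n, m), dtype=int)   # incidence matrix
for k, (i, j) in enumerate(edges):
    A[i, j] = A[j, i] = 1         # symmetric, no self-loops
    B[i, k] = B[j, k] = 1         # each edge touches exactly two vertices

D = np.diag(A.sum(axis=1))        # degree matrix D(G)
```

The row sums of $A$ (equivalently, the row sums of the incidence matrix) reproduce the degrees $(2, 3, 3, 2)$ from the earlier example.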
I.2 Weighted graph

A weighted graph is $G = (V, W)$, where
◮ $V = \{v_i\}$ is a set of vertices and $|V| = n$;
◮ $W \in \mathbb{R}^{n \times n}$ is called the weight matrix, with
$$w_{ij} = w_{ji} \ge 0 \ \text{ if } i \neq j, \qquad w_{ij} = 0 \ \text{ if } i = j.$$

The underlying graph of $G$ is $\hat{G} = (V, E)$ with $E = \{\{v_i, v_j\} \mid w_{ij} > 0\}$.
◮ If $w_{ij} \in \{0, 1\}$, then $W = A$, the adjacency matrix of $\hat{G}$;
◮ Since $w_{ii} = 0$, there are no self-loops in $\hat{G}$.
I.2 Weighted graph

◮ For every vertex $v_i \in V$, the degree $d(v_i)$ of $v_i$ is the sum of the weights of the edges adjacent to $v_i$:
$$d(v_i) = \sum_{j=1}^{n} w_{ij}.$$
◮ Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$.
◮ Let $d = \mathrm{diag}(D)$ and denote $\mathbf{1} = (1, \ldots, 1)^T$; then $d = W\mathbf{1}$.
I.2 Weighted graph

◮ Given a subset of vertices $A \subseteq V$, we define the volume by
$$\mathrm{vol}(A) = \sum_{v_i \in A} d(v_i) = \sum_{v_i \in A} \sum_{j=1}^{n} w_{ij}.$$
◮ If $\mathrm{vol}(A) = 0$, all the vertices in $A$ are isolated.

Example: If $A = \{v_1, v_3\}$, then
$$\mathrm{vol}(A) = d(v_1) + d(v_3) = (w_{12} + w_{13}) + (w_{31} + w_{32} + w_{34}).$$
I.2 Weighted graph

◮ Given two subsets of vertices $A, B \subseteq V$, the quantity links is defined by
$$\mathrm{links}(A, B) = \sum_{v_i \in A,\, v_j \in B} w_{ij}.$$

Remarks:
◮ $A$ and $B$ are not necessarily distinct;
◮ Since $W$ is symmetric, $\mathrm{links}(A, B) = \mathrm{links}(B, A)$;
◮ $\mathrm{vol}(A) = \mathrm{links}(A, V)$.
I.2 Weighted graph

◮ The quantity $\mathrm{cut}(A)$ is defined by $\mathrm{cut}(A) = \mathrm{links}(A, V - A)$.
◮ The quantity $\mathrm{assoc}(A)$ is defined by $\mathrm{assoc}(A) = \mathrm{links}(A, A)$.

Remarks:
◮ $\mathrm{cut}(A)$ measures how many links escape from $A$;
◮ $\mathrm{assoc}(A)$ measures how many links stay within $A$;
◮ $\mathrm{cut}(A) + \mathrm{assoc}(A) = \mathrm{vol}(A)$.
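These identities are easy to check numerically. A small sketch with a hypothetical symmetric weight matrix and an arbitrary subset $A$ (both made up for illustration):

```python
import numpy as np

# Hypothetical symmetric weight matrix with zero diagonal
W = np.array([[0., 2., 1., 0.],
              [2., 0., 0., 3.],
              [1., 0., 0., 1.],
              [0., 3., 1., 0.]])

def links(W, A, B):
    """links(A, B) = sum of w_ij over v_i in A, v_j in B."""
    return W[np.ix_(A, B)].sum()

V = list(range(W.shape[0]))
A = [0, 2]                              # an arbitrary subset of vertices
Abar = [v for v in V if v not in A]     # its complement V - A

cut_A = links(W, A, Abar)               # weight of links escaping A
assoc_A = links(W, A, A)                # weight of links staying within A
vol_A = links(W, A, V)                  # vol(A) = links(A, V)
```

For this matrix, $\mathrm{cut}(A) = 3$, $\mathrm{assoc}(A) = 2$, and their sum equals $\mathrm{vol}(A) = 5$.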
I.3 Graph Laplacian

Given a weighted graph $G = (V, W)$, the (graph) Laplacian $L$ of $G$ is defined by
$$L = D - W,$$
where $D$ is the degree matrix of $G$, i.e., $D = \mathrm{diag}(W \cdot \mathbf{1})$.
I.3 Graph Laplacian

Properties of the Laplacian:
1. $x^T L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2$ for all $x \in \mathbb{R}^n$;
2. $L \succeq 0$ if $w_{ij} \ge 0$ for all $i, j$;
3. $L \cdot \mathbf{1} = 0$;
4. If the underlying graph of $G$ is connected, then $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \ldots \le \lambda_n$, where $\lambda_i$ are the eigenvalues of $L$;
5. If the underlying graph of $G$ is connected, then the dimension of the nullspace of $L$ is 1.
I.3 Graph Laplacian

Proof of Property 1. Since $L = D - W$, we have
$$
\begin{aligned}
x^T L x &= x^T D x - x^T W x \\
&= \sum_{i=1}^{n} d_i x_i^2 - \sum_{i,j=1}^{n} w_{ij} x_i x_j \\
&= \frac{1}{2}\Big( \sum_{i=1}^{n} d_i x_i^2 - 2 \sum_{i,j=1}^{n} w_{ij} x_i x_j + \sum_{j=1}^{n} d_j x_j^2 \Big) \\
&= \frac{1}{2}\Big( \sum_{i,j=1}^{n} w_{ij} x_i^2 - 2 \sum_{i,j=1}^{n} w_{ij} x_i x_j + \sum_{i,j=1}^{n} w_{ij} x_j^2 \Big) \\
&= \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2.
\end{aligned}
$$
I.3 Graph Laplacian

Proof of Property 2.
◮ Since $L^T = D - W^T = D - W = L$, $L$ is symmetric.
◮ Since $x^T L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2$ and $w_{ij} \ge 0$ for all $i, j$, we have $x^T L x \ge 0$.
I.3 Graph Laplacian

Proof of Property 3.
$$L \cdot \mathbf{1} = (D - W)\mathbf{1} = D\mathbf{1} - W\mathbf{1} = d - d = 0.$$
Proofs of Properties 4 and 5 are skipped; see §2.2 of [Gallier'13].
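Properties 1–3 can also be verified numerically. A sketch for a random weighted graph (the weights are arbitrary, generated only for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random symmetric nonnegative weight matrix with zero diagonal
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian

x = rng.standard_normal(n)
quad = x @ L @ x             # x^T L x
formula = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                    for i in range(n) for j in range(n))
```

The quadratic form agrees with the weighted sum of squared differences (Property 1), is nonnegative (Property 2), and $L\mathbf{1} = 0$ (Property 3).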
Outline
I. Graph and graph Laplacian
 ◮ Graph
 ◮ Weighted graph
 ◮ Graph Laplacian
II. Graph clustering
 ◮ Graph clustering
 ◮ Normalized cut
 ◮ Spectral clustering
II.1 Graph clustering

$k$-way partitioning: given a weighted graph $G = (V, W)$, find a partition $A_1, A_2, \ldots, A_k$ of $V$ such that
◮ $A_1 \cup A_2 \cup \ldots \cup A_k = V$;
◮ $A_i \cap A_j = \emptyset$ for $i \neq j$;
◮ for any $i$ and $j$, the edges between $(A_i, A_j)$ have low weight and the edges within $A_i$ have high weight.

If $k = 2$, it is a two-way partitioning.
II.1 Graph clustering

◮ Recall the (two-way) cut:
$$\mathrm{cut}(A) = \mathrm{links}(A, V - A) = \sum_{v_i \in A,\, v_j \in V - A} w_{ij}.$$
II.1 Graph clustering problems

The mincut is defined by
$$\min_A \mathrm{cut}(A) = \min_A \sum_{v_i \in A,\, v_j \in V - A} w_{ij}.$$
In practice, the mincut typically yields unbalanced partitions.

Example: $\min \mathrm{cut}(A) = 1 + 2 = 3$.
II.2 Normalized cut

The normalized cut¹ is defined by
$$\mathrm{Ncut}(A) = \frac{\mathrm{cut}(A)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(\bar{A})}{\mathrm{vol}(\bar{A})},$$
where $\bar{A} = V - A$.

¹ Jianbo Shi and Jitendra Malik, 2000.
II.2 Normalized cut

Minimal Ncut: $\min_A \mathrm{Ncut}(A)$.

Example:
$$\min \mathrm{Ncut}(A) = \frac{4}{3+6+6+3} + \frac{4}{9}.$$
II.2 Normalized cut

Let $x = (x_1, \ldots, x_n)$ be the indicator vector, such that
$$x_i = \begin{cases} 1 & \text{if } v_i \in A \\ -1 & \text{if } v_i \in \bar{A} = V - A. \end{cases}$$
Then
1. $(\mathbf{1} + x)^T D (\mathbf{1} + x) = 4 \sum_{v_i \in A} d_i = 4 \cdot \mathrm{vol}(A)$;
2. $(\mathbf{1} + x)^T W (\mathbf{1} + x) = 4 \sum_{v_i \in A,\, v_j \in A} w_{ij} = 4 \cdot \mathrm{assoc}(A)$;
3. $(\mathbf{1} + x)^T L (\mathbf{1} + x) = 4 \cdot (\mathrm{vol}(A) - \mathrm{assoc}(A)) = 4 \cdot \mathrm{cut}(A)$;
4. $(\mathbf{1} - x)^T D (\mathbf{1} - x) = 4 \sum_{v_i \in \bar{A}} d_i = 4 \cdot \mathrm{vol}(\bar{A})$;
5. $(\mathbf{1} - x)^T W (\mathbf{1} - x) = 4 \sum_{v_i \in \bar{A},\, v_j \in \bar{A}} w_{ij} = 4 \cdot \mathrm{assoc}(\bar{A})$;
6. $(\mathbf{1} - x)^T L (\mathbf{1} - x) = 4 \cdot (\mathrm{vol}(\bar{A}) - \mathrm{assoc}(\bar{A})) = 4 \cdot \mathrm{cut}(\bar{A})$;
7. $\mathrm{vol}(V) = \mathbf{1}^T D \mathbf{1}$.
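Identities 1–3 and 7 can be checked directly; a sketch with an arbitrary random weight matrix and an arbitrary subset $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
one = np.ones(n)

inA = np.array([True, True, False, True, False, False])  # arbitrary subset A
x = np.where(inA, 1.0, -1.0)                             # indicator vector

vol_A = D.diagonal()[inA].sum()          # vol(A)
assoc_A = W[np.ix_(inA, inA)].sum()      # assoc(A)
```

The same checks apply to identities 4–6 with $\mathbf{1} - x$ and the complement $\bar{A}$.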
II.2 Normalized cut

◮ With the above basic properties, $\mathrm{Ncut}(A)$ can now be written as
$$
\begin{aligned}
\mathrm{Ncut}(A) &= \frac{1}{4}\left( \frac{(\mathbf{1}+x)^T L (\mathbf{1}+x)}{k(\mathbf{1}^T D \mathbf{1})} + \frac{(\mathbf{1}-x)^T L (\mathbf{1}-x)}{(1-k)(\mathbf{1}^T D \mathbf{1})} \right) \\
&= \frac{1}{4} \cdot \frac{\big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)^T L \big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)}{b(\mathbf{1}^T D \mathbf{1})},
\end{aligned}
$$
where $k = \mathrm{vol}(A)/\mathrm{vol}(V)$, $b = k/(1-k)$, and $\mathrm{vol}(V) = \mathbf{1}^T D \mathbf{1}$.
◮ Let $y = (\mathbf{1}+x) - b(\mathbf{1}-x)$; then
$$\mathrm{Ncut}(A) = \frac{1}{4} \cdot \frac{y^T L y}{b(\mathbf{1}^T D \mathbf{1})},$$
where
$$y_i = \begin{cases} 2 & \text{if } v_i \in A \\ -2b & \text{if } v_i \in \bar{A}. \end{cases}$$
II.2 Normalized cut

◮ Since $b = k/(1-k) = \mathrm{vol}(A)/\mathrm{vol}(\bar{A})$, we have
$$\frac{1}{4}(y^T D y) = \sum_{v_i \in A} d_i + b^2 \sum_{v_i \in \bar{A}} d_i = \mathrm{vol}(A) + b^2\, \mathrm{vol}(\bar{A}) = b\big(\mathrm{vol}(\bar{A}) + \mathrm{vol}(A)\big) = b\,(\mathbf{1}^T D \mathbf{1}).$$
◮ In addition,
$$y^T D \mathbf{1} = y^T d = 2 \sum_{v_i \in A} d_i - 2b \sum_{v_i \in \bar{A}} d_i = 2\,\mathrm{vol}(A) - 2b\,\mathrm{vol}(\bar{A}) = 0.$$
II.2 Normalized cut

In summary, the minimal normalized cut is to solve the following binary optimization:
$$\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y_i \in \{2, -2b\},\ \ y^T D \mathbf{1} = 0. \tag{1}$$
By relaxation, we solve
$$\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y \in \mathbb{R}^n,\ \ y^T D \mathbf{1} = 0. \tag{2}$$
II.2 Normalized cut

Variational principle
◮ Let $A, B \in \mathbb{R}^{n \times n}$ with $A^T = A$, $B^T = B \succ 0$, and let $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ be the eigenvalues of $Au = \lambda Bu$ with corresponding eigenvectors $u_1, u_2, \ldots, u_n$.
◮ Then
$$\min_x \frac{x^T A x}{x^T B x} = \lambda_1, \qquad \arg\min_x \frac{x^T A x}{x^T B x} = u_1,$$
and
$$\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = \lambda_2, \qquad \arg\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = u_2.$$
◮ More general forms exist.
II.2 Normalized cut

◮ For the matrix pair $(L, D)$, it is known that $(\lambda_1, y_1) = (0, \mathbf{1})$.
◮ By the variational principle, the relaxed minimal Ncut (2) is equivalent to finding the second smallest eigenpair $(\lambda_2, y_2)$ of
$$Ly = \lambda D y. \tag{3}$$

Remarks:
◮ $L$ is extremely sparse and $D$ is diagonal;
◮ The precision requirement for the eigenvectors is low, say $O(10^{-3})$.
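A sketch of the two-way cut via the second generalized eigenvector, using SciPy's dense solver on a small made-up graph of two tightly connected triangles joined by one weak edge (for a large sparse $L$ one would use an iterative eigensolver instead):

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical weight matrix: two triangles (weight 5) joined by one weak edge
W = np.array([[0, 5, 5, 0, 0, 0],
              [5, 0, 5, 1, 0, 0],
              [5, 5, 0, 0, 0, 0],
              [0, 1, 0, 0, 5, 5],
              [0, 0, 0, 5, 0, 5],
              [0, 0, 0, 5, 5, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

# Generalized symmetric eigenproblem L y = lambda D y
evals, evecs = eigh(L, D)
y2 = evecs[:, 1]          # second smallest eigenpair
labels = y2 > 0           # split at the sign of y2
```

The sign pattern of $y_2$ separates the two triangles, cutting only the weak bridge edge.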
II.2 Normalized cut

Image segmentation: original graph

II.2 Normalized cut

Image segmentation: heatmap of eigenvectors

II.2 Normalized cut

Image segmentation: result of min Ncut
II.3 Spectral clustering

Ncut: remaining issues
◮ Once the indicator vector is computed, how do we search for the splitting point at which the resulting partition has the minimal $\mathrm{Ncut}(A)$ value?
◮ How do we use the extreme eigenvectors to do the $k$-way partitioning?

The above two problems are addressed by the spectral clustering algorithm.
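A standard recipe (a sketch, not necessarily the variant these slides go on to present) answers the second question by embedding each vertex with the $k$ smallest generalized eigenvectors of (3) and clustering the embedding rows with k-means:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k, seed=0):
    """k-way partition of a weighted graph G = (V, W): embed vertices
    with the k smallest generalized eigenvectors of L y = lambda D y,
    then run k-means on the rows of the embedding."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # k smallest eigenpairs of the pencil (L, D)
    _, U = eigh(L, D, subset_by_index=[0, k - 1])
    _, labels = kmeans2(U, k, seed=seed, minit='++')
    return labels
```

For $k = 2$ this reduces to splitting the second eigenvector; searching candidate splitting points of $y_2$ for the minimal $\mathrm{Ncut}(A)$ value addresses the first question.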