ECS 231 Introduction to Spectral Clustering 1 / 42
Motivation: Image segmentation in computer vision
Motivation: Community detection in network analysis
Outline
I. Graph and graph Laplacian
 ◮ Graph
 ◮ Weighted graph
 ◮ Graph Laplacian
II. Graph clustering
 ◮ Graph clustering
 ◮ Normalized cut
 ◮ Spectral clustering
I.1 Graph

An (undirected) graph is $G = (V, E)$, where
◮ $V = \{v_i\}$ is a set of vertices;
◮ $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is a subset of $V \times V$.

Remarks:
◮ An edge is a pair $\{v_i, v_j\}$ with $v_i \neq v_j$ (no self-loop);
◮ There is at most one edge between $v_i$ and $v_j$ (simple graph).
I.1 Graph

◮ For every vertex $v_i \in V$, the degree $d(v_i)$ of $v_i$ is the number of edges adjacent to $v_i$:
$$d(v_i) = |\{v_j \in V \mid \{v_j, v_i\} \in E\}|.$$
◮ Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$.

$$D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$
I.1 Graph

◮ Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the incidence matrix $\tilde{D}(G)$ of $G$ is an $n \times m$ matrix with
$$\tilde{d}_{ij} = \begin{cases} 1, & \text{if } \exists\, k \text{ s.t. } e_j = \{v_i, v_k\} \\ 0, & \text{otherwise} \end{cases}$$

$$\tilde{D}(G) = \begin{array}{c|ccccc}
 & e_1 & e_2 & e_3 & e_4 & e_5 \\ \hline
v_1 & 1 & 1 & 0 & 0 & 0 \\
v_2 & 1 & 0 & 1 & 1 & 0 \\
v_3 & 0 & 1 & 1 & 0 & 1 \\
v_4 & 0 & 0 & 0 & 1 & 1
\end{array}$$
I.1 Graph

◮ Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the adjacency matrix $A(G)$ of $G$ is a symmetric $n \times n$ matrix with
$$a_{ij} = \begin{cases} 1, & \text{if } \{v_i, v_j\} \in E \\ 0, & \text{otherwise.} \end{cases}$$

$$A(G) = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
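All three matrices can be assembled directly from an edge list. A minimal sketch in NumPy for the 4-vertex, 5-edge example graph above (vertex and edge indices shifted to 0-based):

```python
import numpy as np

# Edge list of the example graph: e1..e5, with 0-based vertex indices
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)

A = np.zeros((n, n), dtype=int)   # adjacency matrix A(G)
B = np.zeros((n, m), dtype=int)   # incidence matrix
for k, (i, j) in enumerate(edges):
    A[i, j] = A[j, i] = 1         # symmetric, no self-loops
    B[i, k] = B[j, k] = 1         # each edge touches exactly two vertices

D = np.diag(A.sum(axis=1))        # degree matrix D(G)
```

The row sums of $A$ (equivalently, the row sums of the incidence matrix) reproduce the degrees $(2, 3, 3, 2)$ from the earlier example.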
I.2 Weighted graph

A weighted graph is $G = (V, W)$, where
◮ $V = \{v_i\}$ is a set of vertices and $|V| = n$;
◮ $W \in \mathbb{R}^{n \times n}$ is called the weight matrix, with
$$w_{ij} = w_{ji} \ge 0 \ \text{ if } i \neq j, \qquad w_{ij} = 0 \ \text{ if } i = j.$$

The underlying graph of $G$ is $\hat{G} = (V, E)$ with $E = \{\{v_i, v_j\} \mid w_{ij} > 0\}$.
◮ If $w_{ij} \in \{0, 1\}$, then $W = A$, the adjacency matrix of $\hat{G}$;
◮ Since $w_{ii} = 0$, there are no self-loops in $\hat{G}$.
I.2 Weighted graph

◮ For every vertex $v_i \in V$, the degree $d(v_i)$ of $v_i$ is the sum of the weights of the edges adjacent to $v_i$:
$$d(v_i) = \sum_{j=1}^{n} w_{ij}.$$
◮ Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$.
◮ Let $d = \mathrm{diag}(D)$ and denote $\mathbf{1} = (1, \ldots, 1)^T$; then $d = W\mathbf{1}$.
I.2 Weighted graph

◮ Given a subset of vertices $A \subseteq V$, we define the volume by
$$\mathrm{vol}(A) = \sum_{v_i \in A} d(v_i) = \sum_{v_i \in A} \sum_{j=1}^{n} w_{ij}.$$
◮ If $\mathrm{vol}(A) = 0$, all the vertices in $A$ are isolated.

Example: If $A = \{v_1, v_3\}$, then
$$\mathrm{vol}(A) = d(v_1) + d(v_3) = (w_{12} + w_{13}) + (w_{31} + w_{32} + w_{34}).$$
I.2 Weighted graph

◮ Given two subsets of vertices $A, B \subseteq V$, the quantity links is defined by
$$\mathrm{links}(A, B) = \sum_{v_i \in A,\, v_j \in B} w_{ij}.$$

Remarks:
◮ $A$ and $B$ are not necessarily distinct;
◮ Since $W$ is symmetric, $\mathrm{links}(A, B) = \mathrm{links}(B, A)$;
◮ $\mathrm{vol}(A) = \mathrm{links}(A, V)$.
I.2 Weighted graph

◮ The quantity $\mathrm{cut}(A)$ is defined by $\mathrm{cut}(A) = \mathrm{links}(A, V - A)$.
◮ The quantity $\mathrm{assoc}(A)$ is defined by $\mathrm{assoc}(A) = \mathrm{links}(A, A)$.

Remarks:
◮ $\mathrm{cut}(A)$ measures how many links escape from $A$;
◮ $\mathrm{assoc}(A)$ measures how many links stay within $A$;
◮ $\mathrm{cut}(A) + \mathrm{assoc}(A) = \mathrm{vol}(A)$.
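These identities are easy to check numerically. A small sketch with a hypothetical symmetric weight matrix and an arbitrary subset $A$ (both made up for illustration):

```python
import numpy as np

# Hypothetical symmetric weight matrix with zero diagonal
W = np.array([[0., 2., 1., 0.],
              [2., 0., 0., 3.],
              [1., 0., 0., 1.],
              [0., 3., 1., 0.]])

def links(W, A, B):
    """links(A, B) = sum of w_ij over v_i in A, v_j in B."""
    return W[np.ix_(A, B)].sum()

V = list(range(W.shape[0]))
A = [0, 2]                              # an arbitrary subset of vertices
Abar = [v for v in V if v not in A]     # its complement V - A

cut_A = links(W, A, Abar)               # weight of links escaping A
assoc_A = links(W, A, A)                # weight of links staying within A
vol_A = links(W, A, V)                  # vol(A) = links(A, V)
```

For this matrix, $\mathrm{cut}(A) = 3$, $\mathrm{assoc}(A) = 2$, and their sum equals $\mathrm{vol}(A) = 5$.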
I.3 Graph Laplacian

Given a weighted graph $G = (V, W)$, the (graph) Laplacian $L$ of $G$ is defined by
$$L = D - W,$$
where $D$ is the degree matrix of $G$, i.e., $D = \mathrm{diag}(W \cdot \mathbf{1})$.
I.3 Graph Laplacian

Properties of the Laplacian:
1. $x^T L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2$ for all $x \in \mathbb{R}^n$;
2. $L \succeq 0$ if $w_{ij} \ge 0$ for all $i, j$;
3. $L \cdot \mathbf{1} = 0$;
4. If the underlying graph of $G$ is connected, then $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \ldots \le \lambda_n$, where $\lambda_i$ are the eigenvalues of $L$;
5. If the underlying graph of $G$ is connected, then the dimension of the nullspace of $L$ is 1.
I.3 Graph Laplacian

Proof of Property 1. Since $L = D - W$, we have
$$
\begin{aligned}
x^T L x &= x^T D x - x^T W x \\
&= \sum_{i=1}^{n} d_i x_i^2 - \sum_{i,j=1}^{n} w_{ij} x_i x_j \\
&= \frac{1}{2}\Big( \sum_{i=1}^{n} d_i x_i^2 - 2 \sum_{i,j=1}^{n} w_{ij} x_i x_j + \sum_{j=1}^{n} d_j x_j^2 \Big) \\
&= \frac{1}{2}\Big( \sum_{i,j=1}^{n} w_{ij} x_i^2 - 2 \sum_{i,j=1}^{n} w_{ij} x_i x_j + \sum_{i,j=1}^{n} w_{ij} x_j^2 \Big) \\
&= \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2.
\end{aligned}
$$
I.3 Graph Laplacian

Proof of Property 2.
◮ Since $L^T = D - W^T = D - W = L$, $L$ is symmetric.
◮ Since $x^T L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2$ and $w_{ij} \ge 0$ for all $i, j$, we have $x^T L x \ge 0$.
I.3 Graph Laplacian

Proof of Property 3.
$$L \cdot \mathbf{1} = (D - W)\mathbf{1} = D\mathbf{1} - W\mathbf{1} = d - d = 0.$$
Proofs of Properties 4 and 5 are skipped; see §2.2 of [Gallier'13].
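Properties 1–3 can also be verified numerically. A sketch for a random weighted graph (the weights are arbitrary, generated only for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random symmetric nonnegative weight matrix with zero diagonal
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian

x = rng.standard_normal(n)
quad = x @ L @ x             # x^T L x
formula = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                    for i in range(n) for j in range(n))
```

The quadratic form agrees with the weighted sum of squared differences (Property 1), is nonnegative (Property 2), and $L\mathbf{1} = 0$ (Property 3).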
Outline
I. Graph and graph Laplacian
 ◮ Graph
 ◮ Weighted graph
 ◮ Graph Laplacian
II. Graph clustering
 ◮ Graph clustering
 ◮ Normalized cut
 ◮ Spectral clustering
II.1 Graph clustering

$k$-way partitioning: given a weighted graph $G = (V, W)$, find a partition $A_1, A_2, \ldots, A_k$ of $V$ such that
◮ $A_1 \cup A_2 \cup \ldots \cup A_k = V$;
◮ $A_i \cap A_j = \emptyset$ for $i \neq j$;
◮ for any $i$ and $j$, the edges between $(A_i, A_j)$ have low weight and the edges within $A_i$ have high weight.

If $k = 2$, it is a two-way partitioning.
II.1 Graph clustering

◮ Recall the (two-way) cut:
$$\mathrm{cut}(A) = \mathrm{links}(A, V - A) = \sum_{v_i \in A,\, v_j \in V - A} w_{ij}.$$
II.1 Graph clustering problems

The mincut is defined by
$$\min_A \mathrm{cut}(A) = \min_A \sum_{v_i \in A,\, v_j \in V - A} w_{ij}.$$
In practice, the mincut typically yields unbalanced partitions.

Example: $\min \mathrm{cut}(A) = 1 + 2 = 3$.
II.2 Normalized cut

The normalized cut¹ is defined by
$$\mathrm{Ncut}(A) = \frac{\mathrm{cut}(A)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(\bar{A})}{\mathrm{vol}(\bar{A})},$$
where $\bar{A} = V - A$.

¹ Jianbo Shi and Jitendra Malik, 2000.
II.2 Normalized cut

Minimal Ncut: $\min_A \mathrm{Ncut}(A)$.

Example:
$$\min \mathrm{Ncut}(A) = \frac{4}{3+6+6+3} + \frac{4}{9}.$$
II.2 Normalized cut

Let $x = (x_1, \ldots, x_n)$ be the indicator vector, such that
$$x_i = \begin{cases} 1 & \text{if } v_i \in A \\ -1 & \text{if } v_i \in \bar{A} = V - A. \end{cases}$$
Then
1. $(\mathbf{1} + x)^T D (\mathbf{1} + x) = 4 \sum_{v_i \in A} d_i = 4 \cdot \mathrm{vol}(A)$;
2. $(\mathbf{1} + x)^T W (\mathbf{1} + x) = 4 \sum_{v_i \in A,\, v_j \in A} w_{ij} = 4 \cdot \mathrm{assoc}(A)$;
3. $(\mathbf{1} + x)^T L (\mathbf{1} + x) = 4 \cdot (\mathrm{vol}(A) - \mathrm{assoc}(A)) = 4 \cdot \mathrm{cut}(A)$;
4. $(\mathbf{1} - x)^T D (\mathbf{1} - x) = 4 \sum_{v_i \in \bar{A}} d_i = 4 \cdot \mathrm{vol}(\bar{A})$;
5. $(\mathbf{1} - x)^T W (\mathbf{1} - x) = 4 \sum_{v_i \in \bar{A},\, v_j \in \bar{A}} w_{ij} = 4 \cdot \mathrm{assoc}(\bar{A})$;
6. $(\mathbf{1} - x)^T L (\mathbf{1} - x) = 4 \cdot (\mathrm{vol}(\bar{A}) - \mathrm{assoc}(\bar{A})) = 4 \cdot \mathrm{cut}(\bar{A})$;
7. $\mathrm{vol}(V) = \mathbf{1}^T D \mathbf{1}$.
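Identities 1–3 and 7 can be checked directly; a sketch with an arbitrary random weight matrix and an arbitrary subset $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
one = np.ones(n)

inA = np.array([True, True, False, True, False, False])  # arbitrary subset A
x = np.where(inA, 1.0, -1.0)                             # indicator vector

vol_A = D.diagonal()[inA].sum()          # vol(A)
assoc_A = W[np.ix_(inA, inA)].sum()      # assoc(A)
```

The same checks apply to identities 4–6 with $\mathbf{1} - x$ and the complement $\bar{A}$.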
II.2 Normalized cut

◮ With the above basic properties, $\mathrm{Ncut}(A)$ can now be written as
$$
\begin{aligned}
\mathrm{Ncut}(A) &= \frac{1}{4}\left( \frac{(\mathbf{1}+x)^T L (\mathbf{1}+x)}{k(\mathbf{1}^T D \mathbf{1})} + \frac{(\mathbf{1}-x)^T L (\mathbf{1}-x)}{(1-k)(\mathbf{1}^T D \mathbf{1})} \right) \\
&= \frac{1}{4} \cdot \frac{\big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)^T L \big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)}{b(\mathbf{1}^T D \mathbf{1})},
\end{aligned}
$$
where $k = \mathrm{vol}(A)/\mathrm{vol}(V)$, $b = k/(1-k)$, and $\mathrm{vol}(V) = \mathbf{1}^T D \mathbf{1}$.
◮ Let $y = (\mathbf{1}+x) - b(\mathbf{1}-x)$; then
$$\mathrm{Ncut}(A) = \frac{1}{4} \cdot \frac{y^T L y}{b(\mathbf{1}^T D \mathbf{1})},$$
where
$$y_i = \begin{cases} 2 & \text{if } v_i \in A \\ -2b & \text{if } v_i \in \bar{A}. \end{cases}$$
II.2 Normalized cut

◮ Since $b = k/(1-k) = \mathrm{vol}(A)/\mathrm{vol}(\bar{A})$, we have
$$\frac{1}{4}(y^T D y) = \sum_{v_i \in A} d_i + b^2 \sum_{v_i \in \bar{A}} d_i = \mathrm{vol}(A) + b^2\, \mathrm{vol}(\bar{A}) = b\big(\mathrm{vol}(\bar{A}) + \mathrm{vol}(A)\big) = b\,(\mathbf{1}^T D \mathbf{1}).$$
◮ In addition,
$$y^T D \mathbf{1} = y^T d = 2 \sum_{v_i \in A} d_i - 2b \sum_{v_i \in \bar{A}} d_i = 2\,\mathrm{vol}(A) - 2b\,\mathrm{vol}(\bar{A}) = 0.$$
II.2 Normalized cut

In summary, the minimal normalized cut is to solve the following binary optimization:
$$\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y_i \in \{2, -2b\},\ \ y^T D \mathbf{1} = 0. \tag{1}$$
By relaxation, we solve
$$\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y \in \mathbb{R}^n,\ \ y^T D \mathbf{1} = 0. \tag{2}$$
II.2 Normalized cut

Variational principle
◮ Let $A, B \in \mathbb{R}^{n \times n}$ with $A^T = A$, $B^T = B \succ 0$, and let $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ be the eigenvalues of $Au = \lambda Bu$ with corresponding eigenvectors $u_1, u_2, \ldots, u_n$.
◮ Then
$$\min_x \frac{x^T A x}{x^T B x} = \lambda_1, \qquad \arg\min_x \frac{x^T A x}{x^T B x} = u_1,$$
and
$$\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = \lambda_2, \qquad \arg\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = u_2.$$
◮ More general forms exist.
II.2 Normalized cut

◮ For the matrix pair $(L, D)$, it is known that $(\lambda_1, y_1) = (0, \mathbf{1})$.
◮ By the variational principle, the relaxed minimal Ncut (2) is equivalent to finding the second smallest eigenpair $(\lambda_2, y_2)$ of
$$Ly = \lambda D y. \tag{3}$$

Remarks:
◮ $L$ is extremely sparse and $D$ is diagonal;
◮ The precision requirement for the eigenvectors is low, say $O(10^{-3})$.
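A sketch of the two-way cut via the second generalized eigenvector, using SciPy's dense solver on a small made-up graph of two tightly connected triangles joined by one weak edge (for a large sparse $L$ one would use an iterative eigensolver instead):

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical weight matrix: two triangles (weight 5) joined by one weak edge
W = np.array([[0, 5, 5, 0, 0, 0],
              [5, 0, 5, 1, 0, 0],
              [5, 5, 0, 0, 0, 0],
              [0, 1, 0, 0, 5, 5],
              [0, 0, 0, 5, 0, 5],
              [0, 0, 0, 5, 5, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

# Generalized symmetric eigenproblem L y = lambda D y
evals, evecs = eigh(L, D)
y2 = evecs[:, 1]          # second smallest eigenpair
labels = y2 > 0           # split at the sign of y2
```

The sign pattern of $y_2$ separates the two triangles, cutting only the weak bridge edge.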
II.2 Normalized cut

Image segmentation: original graph

II.2 Normalized cut

Image segmentation: heatmap of eigenvectors

II.2 Normalized cut

Image segmentation: result of min Ncut
II.3 Spectral clustering

Ncut: remaining issues
◮ Once the indicator vector is computed, how do we search for the splitting point at which the resulting partition has the minimal $\mathrm{Ncut}(A)$ value?
◮ How do we use the extreme eigenvectors to do the $k$-way partitioning?

The above two problems are addressed by the spectral clustering algorithm.
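A standard recipe (a sketch, not necessarily the variant these slides go on to present) answers the second question by embedding each vertex with the $k$ smallest generalized eigenvectors of (3) and clustering the embedding rows with k-means:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k, seed=0):
    """k-way partition of a weighted graph G = (V, W): embed vertices
    with the k smallest generalized eigenvectors of L y = lambda D y,
    then run k-means on the rows of the embedding."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # k smallest eigenpairs of the pencil (L, D)
    _, U = eigh(L, D, subset_by_index=[0, k - 1])
    _, labels = kmeans2(U, k, seed=seed, minit='++')
    return labels
```

For $k = 2$ this reduces to splitting the second eigenvector; searching candidate splitting points of $y_2$ for the minimal $\mathrm{Ncut}(A)$ value addresses the first question.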