1-means clustering and conductance

Twan van Laarhoven
Radboud University Nijmegen, The Netherlands
Institute for Computing and Information Sciences

November 11th, 2016
Outline

- Network community detection with conductance
- The relation to k-means clustering
- Algorithms
- Experiments
- Conclusions
Network community detection

Global community detection: given a network, find all tightly connected sets of nodes (communities).

Local community detection: given a network and a seed node, find the community or communities containing that seed, without inspecting the whole graph.
Communities as optima

Graphs: $G = (V, E)$, with adjacency $a_{ij} = a_{ji} = 1$ if $(i,j) \in E$ and $0$ otherwise.

Score function: $\phi_G : C(G) \to \mathbb{R}$.

Note: I'll consider sets and vectors interchangeably, so $C(G) = \mathcal{P}(V)$ or $C(G) = \mathbb{R}^V$.
Conductance

Definition: the fraction of incident edges that leave the community,
\[ \phi(c) = \frac{\#\{(i,j) \in E \mid i \in c,\ j \notin c\}}{\#\{(i,j) \in E \mid i \in c,\ j \in V\}}, \]
or equivalently
\[ \phi(c) = 1 - \frac{\sum_{i,j \in V} c_i a_{ij} c_j}{\sum_{i,j \in V} c_i a_{ij}}, \qquad c_i \in \{0, 1\}. \]

A very popular objective for finding network communities.
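To make the set form and the vector form concrete, here is a small sketch (my own illustration, not from the talk) that evaluates the vector form on a toy graph:

```python
import numpy as np

def conductance(A, c):
    """Conductance of the node set indicated by the 0/1 vector c.

    A is a symmetric 0/1 adjacency matrix. The numerator counts ordered
    edge endpoints inside c, the denominator all endpoints incident to c.
    """
    inside = c @ A @ c            # sum_{i,j} c_i a_ij c_j
    incident = c @ A.sum(axis=1)  # sum_{i,j} c_i a_ij
    return 1.0 - inside / incident

# Two triangles joined by a single edge; nodes 0-2 form one community.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
c = np.array([1, 1, 1, 0, 0, 0])
print(conductance(A, c))  # 1/7: of the 7 edge ends in c, 1 leaves it
```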
Continuous optimization

As an optimization problem:
\[ \min_c\ \phi(c) \quad \text{subject to } c_i \in \{0,1\} \text{ for all } i, \]
which we relax to
\[ \min_c\ \phi(c) \quad \text{subject to } 0 \le c_i \le 1 \text{ for all } i. \]

Karush-Kuhn-Tucker conditions: $c$ is a local optimum if, for all $i$, $0 \le c_i \le 1$ and
\[ \nabla\phi(c)_i \ge 0 \ \text{ if } c_i = 0, \qquad \nabla\phi(c)_i = 0 \ \text{ if } 0 < c_i < 1, \qquad \nabla\phi(c)_i \le 0 \ \text{ if } c_i = 1. \]
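As a sanity check (again my own sketch, not from the talk): the gradient of $\phi(c) = 1 - (c^\top A c)/(c^\top d)$, with $d$ the degree vector, has a simple closed form, so the KKT conditions can be tested numerically at a candidate $c$:

```python
import numpy as np

def grad_phi(A, c):
    """Gradient of phi(c) = 1 - (c^T A c) / (c^T d), d the degree vector."""
    d = A.sum(axis=1)
    N, D = c @ A @ c, c @ d
    return -(2 * (A @ c) * D - N * d) / D**2

def satisfies_kkt(A, c, tol=1e-9):
    """Test the KKT conditions of the box-constrained problem at c."""
    g = grad_phi(A, c)
    at_zero = np.all(g[c <= tol] >= -tol)        # c_i = 0: gradient >= 0
    at_one = np.all(g[c >= 1 - tol] <= tol)      # c_i = 1: gradient <= 0
    inner = (c > tol) & (c < 1 - tol)
    return at_zero and at_one and np.all(np.abs(g[inner]) <= tol)

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
c = np.array([1., 1., 1., 0., 0., 0.])
print(satisfies_kkt(A, c))  # True: the left triangle is a KKT point
```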
Local optima

Local optima are discrete: if $c$ is a strict local minimum of $\phi$, then $c_i \in \{0,1\}$ for all $i$.

Proof sketch: look at $\phi$ as a function of a single $c_i$,
\[ \phi(c_i) = \frac{\alpha_1 + \alpha_2 c_i + \alpha_3 c_i^2}{\alpha_4 + \alpha_5 c_i}. \]
If $0 < c_i < 1$ and $\phi'(c_i) = 0$, then $\phi''(c_i) = 2\alpha_3 / (\alpha_4 + \alpha_5 c_i)^3 \le 0$ (here $\alpha_3 \le 0$), so along this coordinate $c_i$ is at best flat or a local maximum, never a strict minimum.
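A quick numerical illustration (my own, using plain conductance, where $\alpha_3 = 0$ since $a_{ii} = 0$): along any single coordinate $\phi$ is a monotone rational function, so its minimum over $[0,1]$ sits at an endpoint.

```python
import numpy as np

def phi(A, c):
    return 1.0 - (c @ A @ c) / (c @ A.sum(axis=1))

# Vary one coordinate of an arbitrary interior point over [0, 1]:
# the minimum along that line always lands at an endpoint.
rng = np.random.default_rng(0)
n = 8
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                      # random symmetric 0/1 graph, a_ii = 0

c = rng.random(n)                # arbitrary fractional membership vector
i, ts = 3, np.linspace(0.0, 1.0, 101)
vals = np.array([phi(A, np.where(np.arange(n) == i, t, c)) for t in ts])
print(np.argmin(vals) in (0, len(ts) - 1))  # True
```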
k-means clustering

Weighted k-means clustering:
\[ \min_c\ \sum_{i=1}^n \sum_{j=1}^k w_i c_{ij} \|x_i - \mu_j\|_2^2, \]
subject to the constraint that exactly one $c_{ij}$ is 1 for every $i$ (with $w_i = 1$ this is standard k-means).

1-means clustering:
\[ \min_c\ \sum_i w_i \left( c_i \|x_i - \mu\|_2^2 + (1 - c_i)\,\lambda_i \right) \]
k-means clustering (cont.)

Optimal $\mu$: fix the cluster assignment $c$; then
\[ \mu = \frac{\sum_i w_i c_i x_i}{\sum_i w_i c_i}. \]

Optimal $c$: fix $\mu$; then $c_i$ is 1 if $\|x_i - \mu\|_2^2 < \lambda_i$, and 0 otherwise.
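These two updates suggest a Lloyd-style alternating scheme. A minimal sketch (my own illustration; the data and parameter values are made up):

```python
import numpy as np

def one_means(X, w, lam, iters=100):
    """Alternate the two optimal updates for weighted 1-means:
    min_c sum_i w_i (c_i ||x_i - mu||^2 + (1 - c_i) lam_i)."""
    c = np.ones(len(X))                          # start with everything inside
    for _ in range(iters):
        mu = (w * c) @ X / (w * c).sum()         # optimal mu given c
        d2 = ((X - mu) ** 2).sum(axis=1)
        c_new = (d2 < lam).astype(float)         # optimal c given mu
        if not c_new.any():                      # cluster died; stop
            break
        if np.array_equal(c_new, c):
            break
        c = c_new
    return c, mu

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),    # a tight cluster
               rng.normal(3.0, 0.3, (5, 2))])    # far-away noise points
c, mu = one_means(X, w=np.ones(25), lam=np.full(25, 1.0))
print(int(c.sum()), mu)  # 20 points kept, centroid near the origin
```

Here $\lambda_i$ acts as a per-point threshold: point $i$ stays in the cluster only while its squared distance to the centroid is below $\lambda_i$.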
Kernel k-means clustering

Kernels: $K(i,j) = \langle x_i, x_j \rangle$, so $\|x_i - x_j\|_2^2 = K(i,i) + K(j,j) - 2K(i,j)$.

Implicit centroid: the centroid is a linear combination of points, $\mu = \sum_i \mu_i x_i$, giving
\[ \|x_i - \mu\|_2^2 = K(i,i) - 2\sum_j \mu_j K(i,j) + \sum_{j,k} \mu_j K(j,k)\,\mu_k. \]

The optimal $\mu$ becomes
\[ \mu_i = \frac{w_i c_i}{\sum_j w_j c_j}. \]
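A small sanity check of the kernelized distance (my own sketch), using the plain inner-product kernel so the explicit centroid is also available for comparison:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))
w = rng.random(10)
c = np.zeros(10)
c[:4] = 1.0                                # an arbitrary cluster

K = X @ X.T                                # K(i, j) = <x_i, x_j>
mu_coef = w * c / (w * c).sum()            # implicit centroid coefficients
mu = mu_coef @ X                           # explicit centroid, for comparison

i = 0
d2_explicit = ((X[i] - mu) ** 2).sum()
d2_kernel = K[i, i] - 2 * K[i] @ mu_coef + mu_coef @ K @ mu_coef
print(np.isclose(d2_explicit, d2_kernel))  # True
```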
Kernel k-means clustering (cont.)

Substituting the implicit centroid into the 1-means objective:
\[ \min_c\ \sum_i w_i \left( c_i \|x_i - \mu\|_2^2 + (1 - c_i)\,\lambda_i \right) \]
\[ = \min_c\ \sum_i \Big( w_i c_i K(i,i) - 2 w_i c_i \sum_j \mu_j K(i,j) + w_i c_i \sum_{j,k} \mu_j K(j,k)\,\mu_k + w_i (1 - c_i)\,\lambda_i \Big) \]
\[ = \min_c\ \sum_i w_i c_i \big( K(i,i) - \lambda_i \big) + \sum_i w_i \lambda_i - \frac{\sum_{i,j} w_i c_i\, w_j c_j\, K(i,j)}{\sum_i w_i c_i} \]
\[ = \min_c\ 1 - \frac{\sum_{i,j} w_i c_i\, w_j c_j\, K(i,j)}{\sum_i w_i c_i}, \]
taking $\lambda_i = K(i,i)$: the first sum then vanishes, and the remaining term $\sum_i w_i \lambda_i$ is a constant that does not affect the minimizer.
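To check this reduction numerically (my own sketch): with $\lambda_i = K(i,i)$ the full objective and the reduced form differ by exactly the constant $\sum_i w_i \lambda_i - 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(12, 4))
w = rng.random(12)
c = np.zeros(12)
c[:5] = 1.0

K = X @ X.T
lam = np.diag(K).copy()                    # take lambda_i = K(i, i)
mu = (w * c) @ X / (w * c).sum()           # optimal centroid

full = ((w * c) * ((X - mu) ** 2).sum(axis=1)).sum() \
     + (w * (1 - c) * lam).sum()
wc = w * c
reduced = 1.0 - wc @ K @ wc / wc.sum()

print(np.isclose(full - reduced, (w * lam).sum() - 1.0))  # True
```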
What is the kernel?

Idea: take
\[ K = W^{-1} A W^{-1}, \qquad w_i = \sum_j a_{ij}, \]
which turns the objective into
\[ \min_c\ 1 - \frac{\sum_{i,j} c_i c_j a_{ij}}{\sum_{i,j} c_i a_{ij}} = \phi(c). \]

We get conductance! But this kernel is not positive definite.
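Again a small check (my own sketch) that this graph kernel reproduces conductance, and that it indeed has negative eigenvalues:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
w = A.sum(axis=1)                         # w_i = sum_j a_ij (degrees)
K = A / np.outer(w, w)                    # K = W^-1 A W^-1

c = np.array([1., 1., 1., 0., 0., 0.])
wc = w * c
kernel_objective = 1.0 - wc @ K @ wc / wc.sum()
phi = 1.0 - (c @ A @ c) / (c @ w)
print(np.isclose(kernel_objective, phi))  # True: the objectives coincide

print(np.linalg.eigvalsh(K).min() < 0)    # True: not positive definite
```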
Positive definite kernel

Add a diagonal:
\[ K = W^{-1} A W^{-1} + \sigma W^{-1}. \]
The objective becomes
\[ \min_c\ 1 - \frac{\sum_{i,j} c_i c_j a_{ij}}{\sum_{i,j} c_i a_{ij}} - \sigma \frac{\sum_{i,j} c_i^2 a_{ij}}{\sum_{i,j} c_i a_{ij}} = \phi_\sigma(c). \]
When $c_i \in \{0,1\}$, $c_i^2 = c_i$, so the last term is constant.
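A sketch (mine, on the same toy graph) verifying both claims: on discrete $c$ the extra term is the constant $\sigma$, so $\phi_\sigma(c) = \phi(c) - \sigma$, and the shifted kernel becomes positive definite once $\sigma$ is large enough:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
w = A.sum(axis=1)
sigma = 2.0

def phi_sigma(c, sigma):
    denom = c @ w
    return 1.0 - (c @ A @ c) / denom - sigma * (c**2 @ w) / denom

c = np.array([1., 1., 1., 0., 0., 0.])
print(np.isclose(phi_sigma(c, sigma), phi_sigma(c, 0.0) - sigma))  # True

K = A / np.outer(w, w) + sigma * np.diag(1.0 / w)
print(np.linalg.eigvalsh(K).min() > 0)    # True: now positive definite
```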
A look at local optima

Relaxing the optimization problem:
\[ \min_c\ \phi_\sigma(c) \quad \text{subject to } 0 \le c_i \le 1 \text{ for all } i \in V. \tag{1} \]

Theorem: when $\sigma \ge 2$, every discrete community $c$ is a local optimum of (1).

In practice: higher $\sigma$ ⇒ more clusters are local optima.
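The theorem can be spot-checked by brute force on a small graph (my own sketch): writing $\phi_\sigma(c) = 1 - c^\top (A + \sigma W) c \,/\, (c^\top w)$, test the first-order (KKT) conditions at every nonempty proper subset.

```python
import numpy as np
from itertools import combinations

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
n = len(A)
w = A.sum(axis=1)

def kkt_ok(c, sigma, tol=1e-9):
    M = A + sigma * np.diag(w)            # phi_sigma = 1 - (c^T M c)/(c^T w)
    N, D = c @ M @ c, c @ w
    g = -(2 * (M @ c) * D - N * w) / D**2
    return np.all(g[c == 0] >= -tol) and np.all(g[c == 1] <= tol)

for sigma in (0.0, 2.0):
    count = sum(kkt_ok(np.isin(np.arange(n), S).astype(float), sigma)
                for k in range(1, n)
                for S in combinations(range(n), k))
    print(sigma, count)  # at sigma = 2 all 62 subsets pass, as the theorem says
```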