Scalable Clustering of Signed Networks Using Balance Normalized Cut
Kai-Yang Chiang, Joyce Jiyoung Whang, Inderjit S. Dhillon
University of Texas at Austin
The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
Oct. 29 - Nov. 2, 2012
Joyce Jiyoung Whang
University of Texas at Austin
Contents
- Introduction
- Clustering of Unsigned Networks
- Signed Networks and Social Balance
- Clustering via Signed Laplacian
- k-way Signed Objectives for Clustering
- Multilevel Approach for Large-scale Signed Graph Clustering
- Experimental Results
- Conclusions
Introduction
Social Networks
- Nodes: the individual actors.
- Edges: the relationships (social interactions) between the actors.
Introduction
Signed Networks
- Positive relationships: friendship, collaboration.
- Negative relationships: distrust, disagreement.
Clustering problem in signed networks
- Entities within the same cluster have positive relationships.
- Entities in different clusters have negative relationships.
Contributions
- New k-way objectives and kernels for signed networks.
- A proof of equivalence between our new k-way objectives and a general weighted kernel k-means objective.
- A fast and scalable clustering algorithm for signed networks.
Clustering of Unsigned Networks
Graph Cuts on Unsigned Networks
Ratio Cut objective
- Minimizes the number of edges between different clusters relative to the size of the cluster.
- The graph Laplacian is $L = D - A$, where $D_{ii} = \sum_{j=1}^{n} A_{ij}$.
$$\min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T L x_c}{x_c^T x_c}$$
- Under the special case $k = 2$:
$$\min_{x} \; x^T L x, \quad \text{where } x_i = \begin{cases} \sqrt{|\pi_2| / |\pi_1|}, & \text{if node } i \in \pi_1, \\ -\sqrt{|\pi_1| / |\pi_2|}, & \text{if node } i \in \pi_2. \end{cases}$$
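A minimal NumPy sketch of the ratio cut, assuming a small hypothetical toy graph and 0/1 cluster indicators (the helpers `laplacian`, `ratio_cut`, and `two_way_indicator` are illustrative names, not from the slides). It also checks numerically that, for k = 2, the quadratic form with the indicator above equals n times the ratio cut, which follows from $x^T L x = \sum_{(i,j)} A_{ij}(x_i - x_j)^2$ and $(\sqrt{|\pi_2|/|\pi_1|} + \sqrt{|\pi_1|/|\pi_2|})^2 = n(1/|\pi_1| + 1/|\pi_2|)$.

```python
import numpy as np

def laplacian(A):
    """Unsigned graph Laplacian L = D - A, with D_ii = sum_j A_ij."""
    return np.diag(A.sum(axis=1)) - A

def ratio_cut(A, labels, k):
    """Sum over clusters of x_c^T L x_c / x_c^T x_c for 0/1 indicators x_c."""
    L = laplacian(A)
    total = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        total += x @ L @ x / (x @ x)   # weight of edges leaving c, divided by |pi_c|
    return total

def two_way_indicator(labels):
    """k = 2 indicator: sqrt(|pi_2|/|pi_1|) on pi_1 (label 0), -sqrt(|pi_1|/|pi_2|) on pi_2 (label 1)."""
    n1 = np.sum(labels == 0)
    n2 = np.sum(labels == 1)
    return np.where(labels == 0, np.sqrt(n2 / n1), -np.sqrt(n1 / n2))

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

for labels in (np.array([0, 0, 0, 1, 1, 1]), np.array([0, 0, 0, 0, 1, 1])):
    x = two_way_indicator(labels)
    # The two printed values agree: x^T L x / n equals the ratio cut.
    print(ratio_cut(A, labels, 2), x @ laplacian(A) @ x / len(labels))
```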
Graph Cuts on Unsigned Networks
Ratio Association objective
- Maximizes the number of edges within clusters relative to the size of the cluster.
$$\max_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T A x_c}{x_c^T x_c}, \quad \text{where } x_c(i) = \begin{cases} 1, & \text{if node } i \in \pi_c, \\ 0, & \text{otherwise}. \end{cases}$$
Normalized Association and Normalized Cut objectives
- Normalized by the volume of each cluster, where the volume of a cluster is the sum of the degrees of its nodes.
$$\max_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T A x_c}{x_c^T D x_c} \equiv \min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T L x_c}{x_c^T D x_c}$$
- The two are equivalent because $x_c^T L x_c = x_c^T D x_c - x_c^T A x_c$, so the two sums always add up to $k$.
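A small NumPy check of the normalized association / normalized cut equivalence, assuming 0/1 indicators and the same hypothetical two-triangle toy graph (the function name is illustrative). Because each cluster's cut term equals 1 minus its association term, the two objectives sum to k for any partition into nonempty clusters.

```python
import numpy as np

def normalized_assoc_and_cut(A, labels, k):
    """Return (normalized association, normalized cut) for 0/1 indicators x_c."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    nassoc = ncut = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        vol = x @ D @ x                # volume of cluster c: sum of its node degrees
        nassoc += x @ A @ x / vol
        ncut += x @ L @ x / vol
    return nassoc, ncut

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = np.array([0, 0, 0, 1, 1, 1])
nassoc, ncut = normalized_assoc_and_cut(A, labels, 2)
print(nassoc, ncut, nassoc + ncut)   # the sum equals k = 2 up to round-off
```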
Weighted Kernel k-means
A general weighted kernel k-means objective is equivalent to a weighted graph clustering objective (Dhillon et al. 2007).
Weighted kernel k-means objective
$$\min_{\pi_1,\dots,\pi_k} \sum_{c=1}^{k} \sum_{v_i \in \pi_c} w_i \, \| \phi(v_i) - m_c \|^2, \quad \text{where } m_c = \frac{\sum_{v_i \in \pi_c} w_i \phi(v_i)}{\sum_{v_i \in \pi_c} w_i}.$$
Algorithm
- For every node, compute the distance to each cluster centroid and assign the node to the closest cluster.
- After all nodes have been considered, update the centroids.
- Given the kernel matrix $K$, where $K_{ji} = \langle \phi(v_j), \phi(v_i) \rangle$, the distance can be computed without forming $\phi$ explicitly:
$$D(v_i, m_c) = K_{ii} - \frac{2 \sum_{j \in \pi_c} w_j K_{ji}}{\sum_{j \in \pi_c} w_j} + \frac{\sum_{j, l \in \pi_c} w_j w_l K_{jl}}{\left( \sum_{j \in \pi_c} w_j \right)^2}.$$
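A minimal NumPy sketch of the batch algorithm described above, assuming a dense symmetric kernel matrix and positive weights (`kernel_distances` and `weighted_kernel_kmeans` are hypothetical names). Distances come from the kernel-only formula, so the feature map is never formed; the example uses a linear kernel on 1-D points purely for illustration.

```python
import numpy as np

def kernel_distances(K, w, labels, k):
    """D(v_i, m_c) for all nodes i and clusters c, using only the kernel matrix K."""
    n = K.shape[0]
    dist = np.full((n, k), np.inf)      # empty clusters stay at +inf
    for c in range(k):
        members = labels == c
        if not members.any():
            continue
        wc = w[members]
        s = wc.sum()
        cross = 2.0 * (wc @ K[members, :]) / s                 # 2 sum_j w_j K_ji / sum_j w_j
        within = wc @ K[np.ix_(members, members)] @ wc / s**2  # sum_{j,l} w_j w_l K_jl / (sum_j w_j)^2
        dist[:, c] = np.diag(K) - cross + within
    return dist

def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch updates: reassign every node, then the (implicit) centroids change."""
    labels = labels.copy()
    for _ in range(max_iter):
        new_labels = kernel_distances(K, w, labels, k).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                        # no node changed cluster: converged
        labels = new_labels
    return labels

# Hypothetical example: linear kernel K_ji = <phi(v_j), phi(v_i)> = x_j * x_i on 1-D points.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
K = X @ X.T
w = np.ones(len(X))
print(weighted_kernel_kmeans(K, w, np.array([0, 1, 0, 1, 0, 1]), k=2))  # [0 0 0 1 1 1]
```

With unit weights and a linear kernel this reduces to ordinary batch k-means, which makes the kernel formula easy to sanity-check against explicit centroids.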
Signed Networks and Social Balance
Social Balance
Certain configurations of positive and negative edges are more plausible than others:
- A friend of my friend is my friend.
- An enemy of my friend is my enemy.
- An enemy of my enemy is my friend.
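All three rules are instances of a single multiplication rule: a triangle is balanced exactly when the product of its edge signs is positive. A tiny illustration, with hypothetical names:

```python
FRIEND, ENEMY = +1, -1

def implied_relation(me_to_b, b_to_c):
    """Sign of the (me, c) edge that keeps the triangle me-b-c balanced."""
    return me_to_b * b_to_c

print(implied_relation(FRIEND, FRIEND))  # friend of my friend -> 1 (friend)
print(implied_relation(FRIEND, ENEMY))   # enemy of my friend  -> -1 (enemy)
print(implied_relation(ENEMY, ENEMY))    # enemy of my enemy   -> 1 (friend)
```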
Balance Theory
A network is balanced iff (i) all of its edges are positive, or (ii) its nodes can be clustered into two groups such that edges within groups are positive and edges between groups are negative. (Cartwright and Harary)
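The definition can be tested by a breadth-first two-coloring: a positive edge forces both endpoints into the same group, a negative edge into opposite groups, and the network is balanced iff no contradiction arises. A minimal sketch, assuming edges are given as (i, j, sign) triples (the function name is hypothetical):

```python
from collections import deque

def is_balanced(n, edges):
    """Two-group balance test; edges are (i, j, sign) with sign +1 or -1.

    Returns (True, side) with a group label (0 or 1) per node, or (False, None).
    """
    adj = [[] for _ in range(n)]
    for i, j, s in edges:
        adj[i].append((j, s))
        adj[j].append((i, s))
    side = [None] * n
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0                   # each connected component starts in group 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # positive edge: same group; negative edge: opposite group
                want = side[u] if s > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return False, None    # contradiction: no valid two-group split
    return True, side

print(is_balanced(3, [(0, 1, +1), (1, 2, -1), (0, 2, -1)]))  # (True, [0, 0, 1])
print(is_balanced(3, [(0, 1, +1), (1, 2, +1), (0, 2, -1)]))  # (False, None)
print(is_balanced(3, [(0, 1, -1), (1, 2, -1), (0, 2, -1)]))  # (False, None)
```

An all-positive network lands entirely in group 0, which covers case (i) of the definition.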
Weak Balance Theory
Allows an enemy of one's enemy to still be an enemy.
A network is weakly balanced iff (i) all of its edges are positive, or (ii) its nodes can be clustered into k groups such that edges within groups are positive and edges between groups are negative. (Davis 1967)
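For weak balance, nodes joined by a path of positive edges must share a group, so the natural candidate groups are the connected components of the positive edges; the network is weakly balanced iff no negative edge falls inside one of them. A minimal union-find sketch under the same (i, j, sign) edge format (the function name is hypothetical):

```python
def is_weakly_balanced(n, edges):
    """k-group (weak) balance test; edges are (i, j, sign) with sign +1 or -1."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    # Nodes joined by positive edges must share a group.
    for i, j, s in edges:
        if s > 0:
            parent[find(i)] = find(j)
    # A negative edge inside such a group rules out any valid k-grouping.
    for i, j, s in edges:
        if s < 0 and find(i) == find(j):
            return False, None
    return True, [find(u) for u in range(n)]

ok, groups = is_weakly_balanced(3, [(0, 1, -1), (1, 2, -1), (0, 2, -1)])
print(ok, len(set(groups)))   # True 3: three mutually hostile singletons
print(is_weakly_balanced(3, [(0, 1, +1), (1, 2, +1), (0, 2, -1)])[0])  # False
```

The all-negative triangle is not balanced in the two-group sense, but it is weakly balanced with k = 3.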
Clustering via Signed Laplacian
Signed Laplacian
- The signed Laplacian is $\bar{L} = \bar{D} - A$, where $\bar{D}$ is the diagonal absolute degree matrix, i.e., $\bar{D}_{ii} = \sum_{j=1}^{n} |A_{ij}|$. (Kunegis et al. 2010)
- $\bar{L}$ is always positive semidefinite: for all $x \in \mathbb{R}^n$,
$$x^T \bar{L} x = \sum_{(i,j)} |A_{ij}| \left( x_i - \operatorname{sgn}(A_{ij}) \, x_j \right)^2 \ge 0,$$
where the sum counts each edge once.
k-way ratio cut for signed networks
- The sum of the positive edge weights of edges that lie between different clusters plus the sum of the absolute negative edge weights of edges that lie within the same cluster, normalized by each cluster's size.
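A quick numerical check of the edge-sum identity and of positive semidefiniteness, assuming a random symmetric signed adjacency matrix with zero diagonal (the matrix, seed, and helper name are illustrative only):

```python
import numpy as np

def signed_laplacian(A):
    """Signed Laplacian Lbar = Dbar - A, with Dbar_ii = sum_j |A_ij|."""
    return np.diag(np.abs(A).sum(axis=1)) - A

rng = np.random.default_rng(0)
n = 6
# Random symmetric signed adjacency matrix with entries in {-1, 0, +1} and zero diagonal.
upper = np.triu(rng.choice([-1.0, 0.0, 1.0], size=(n, n)), k=1)
A = upper + upper.T
Lbar = signed_laplacian(A)
x = rng.standard_normal(n)

quad = x @ Lbar @ x
# Each edge (i < j) counted once; absent edges (A_ij = 0) contribute nothing.
edge_sum = sum(abs(A[i, j]) * (x[i] - np.sign(A[i, j]) * x[j]) ** 2
               for i in range(n) for j in range(i + 1, n))
print(np.isclose(quad, edge_sum))                   # identity holds
print(np.linalg.eigvalsh(Lbar).min() >= -1e-10)     # PSD up to round-off
```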
Signed Laplacian
The 2-way signed ratio cut objective can be formulated as an optimization problem with a quadratic form:
$$\min_{x} \; x^T \bar{L} x,$$
where the 2-class indicator $x$ has the following form:
$$x_i = \begin{cases} \frac{1}{2} \left( \sqrt{|\pi_2| / |\pi_1|} + \sqrt{|\pi_1| / |\pi_2|} \right), & \text{if node } i \in \pi_1, \\ -\frac{1}{2} \left( \sqrt{|\pi_2| / |\pi_1|} + \sqrt{|\pi_1| / |\pi_2|} \right), & \text{if node } i \in \pi_2. \end{cases}$$
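Why this indicator works: every node gets $\pm a$ with $a = \frac{1}{2}(\sqrt{|\pi_2|/|\pi_1|} + \sqrt{|\pi_1|/|\pi_2|})$, so in the edge-sum form of $x^T \bar{L} x$ a positive edge across the cut or a negative edge inside a cluster contributes $4a^2 |A_{ij}|$, and every other edge contributes 0. Here $4a^2 = n(1/|\pi_1| + 1/|\pi_2|)$ supplies the size normalization. A NumPy check on a hypothetical 5-node signed graph:

```python
import numpy as np

def signed_laplacian(A):
    """Lbar = Dbar - A with absolute degrees on the diagonal."""
    return np.diag(np.abs(A).sum(axis=1)) - A

# Hypothetical toy signed graph; partition pi_1 = {0,1,2} (label 0), pi_2 = {3,4} (label 1).
A = np.zeros((5, 5))
for i, j, s in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, -1.0),   # (0,2): negative edge inside pi_1
                (2, 3, -1.0), (3, 4, 1.0), (1, 4, 2.0)]:  # (1,4): positive edge across the cut
    A[i, j] = A[j, i] = s
labels = np.array([0, 0, 0, 1, 1])

n1, n2 = np.sum(labels == 0), np.sum(labels == 1)
a = 0.5 * (np.sqrt(n2 / n1) + np.sqrt(n1 / n2))
x = np.where(labels == 0, a, -a)

# Weight of "bad" edges: positive between clusters plus |negative| within clusters.
bad = 0.0
for i in range(5):
    for j in range(i + 1, 5):
        same = labels[i] == labels[j]
        if (A[i, j] > 0 and not same) or (A[i, j] < 0 and same):
            bad += abs(A[i, j])

quad = x @ signed_laplacian(A) @ x
print(bad, quad, 4 * a**2 * bad)   # quad equals 4 a^2 times the bad-edge weight
```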
Extension of Signed Laplacian to k-way Clustering
Extension to a k-way objective
$$\min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T \bar{L} x_c}{x_c^T x_c}$$
Theorem: There is no choice of indicator vectors $\{x_1, \dots, x_k\}$ for which this objective minimizes the general k-way signed ratio cut.
- This direct extension therefore has a weakness: no matter how the indicator vectors are chosen, some desirable clustering patterns are always penalized.
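A small illustration of the weakness for one specific choice, the usual 0/1 indicators (the theorem covers every choice; this example shows only one). With 0/1 indicators, every edge leaving a cluster contributes $|A_{ij}|$ to that cluster's term regardless of its sign, so even a perfect clustering of a weakly balanced network pays for the negative edges between its clusters. The network and function names are hypothetical.

```python
import numpy as np

def signed_laplacian(A):
    """Lbar = Dbar - A with absolute degrees on the diagonal."""
    return np.diag(np.abs(A).sum(axis=1)) - A

def kway_signed_laplacian_objective(A, labels, k):
    """sum_c x_c^T Lbar x_c / x_c^T x_c with 0/1 indicators (one possible choice of x_c)."""
    Lbar = signed_laplacian(A)
    total = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        total += x @ Lbar @ x / (x @ x)
    return total

# Hypothetical, perfectly weakly balanced network: clusters {0,1}, {2,3}, {4,5},
# positive edges inside clusters and negative edges only between clusters.
A = np.zeros((6, 6))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (4, 5, 1.0),
                (1, 2, -1.0), (3, 4, -1.0), (5, 0, -1.0)]:
    A[i, j] = A[j, i] = s

ideal = np.array([0, 0, 1, 1, 2, 2])
# Nonzero, even though every edge agrees with the clustering:
# each cluster is charged for its two outgoing negative edges.
print(kway_signed_laplacian_objective(A, ideal, 3))
```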
k-way Signed Objectives for Clustering
Proposed k-way Signed Objectives
Adjacency matrix of a signed network:
$$A_{ij} \begin{cases} > 0, & \text{if the relationship of } (i, j) \text{ is positive}, \\ < 0, & \text{if the relationship of } (i, j) \text{ is negative}, \\ = 0, & \text{if the relationship of } (i, j) \text{ is unknown}. \end{cases}$$
We can break $A$ into its positive part $A^+$ and negative part $A^-$. Formally, $A^+_{ij} = \max(A_{ij}, 0)$ and $A^-_{ij} = -\min(A_{ij}, 0)$. By this definition, we have $A = A^+ - A^-$.
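In NumPy the split is two elementwise operations; a sketch on a hypothetical 3-node matrix:

```python
import numpy as np

# Hypothetical signed adjacency matrix: weight +2 between nodes 0 and 1, -1 between 0 and 2.
A = np.array([[0.0, 2.0, -1.0],
              [2.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0]])

A_pos = np.maximum(A, 0)    # A+_ij = max(A_ij, 0)
A_neg = -np.minimum(A, 0)   # A-_ij = -min(A_ij, 0), so A- is nonnegative
print(np.array_equal(A, A_pos - A_neg))  # True
```

Both parts are nonnegative, so each can be treated as an ordinary unsigned graph when building the objectives that follow.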
Proposed k-way Signed Objectives
Overview of k-way signed objectives
Proposed k-way Signed Objectives
Positive/Negative Ratio Association
- Positive Ratio Association
$$\max_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T A^+ x_c}{x_c^T x_c}$$
- Negative Ratio Association
$$\min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T A^- x_c}{x_c^T x_c}$$
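A sketch of both association objectives, assuming 0/1 indicators and a hypothetical 4-node network in which {0,1} and {2,3} are friendly groups with hostile edges across (`ratio_objective` is an illustrative helper). The ideal split scores high on positive association and zero on negative association; a worse split does the opposite.

```python
import numpy as np

def ratio_objective(M, labels, k):
    """sum_c x_c^T M x_c / x_c^T x_c with 0/1 cluster indicators x_c."""
    total = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        total += x @ M @ x / (x @ x)
    return total

# Hypothetical network: positive edges (0,1), (2,3); negative edges (1,2), (0,3).
A = np.zeros((4, 4))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -1.0), (0, 3, -1.0)]:
    A[i, j] = A[j, i] = s
A_pos, A_neg = np.maximum(A, 0), -np.minimum(A, 0)

for labels in (np.array([0, 0, 1, 1]), np.array([0, 0, 0, 1])):
    pos_assoc = ratio_objective(A_pos, labels, 2)   # to be maximized
    neg_assoc = ratio_objective(A_neg, labels, 2)   # to be minimized
    print(labels, pos_assoc, neg_assoc)
```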
Proposed k-way Signed Objectives
Positive/Negative Ratio Cut
- Positive Ratio Cut: minimizes the number of positive edges between clusters.
$$\min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T (D^+ - A^+) x_c}{x_c^T x_c} = \min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T L^+ x_c}{x_c^T x_c},$$
where $D^+$ is the diagonal degree matrix of $A^+$.
- The Negative Ratio Cut can also be defined similarly.
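A sketch of the positive ratio cut via $L^+ = D^+ - A^+$, assuming 0/1 indicators and the same hypothetical 4-node network; for such indicators $x_c^T L^+ x_c$ is the total positive weight leaving cluster c, which the helper `positive_weight_leaving` (an illustrative name) computes directly for comparison.

```python
import numpy as np

def positive_ratio_cut(A, labels, k):
    """sum_c x_c^T L+ x_c / x_c^T x_c, with L+ = D+ - A+ and 0/1 indicators."""
    A_pos = np.maximum(A, 0)
    L_pos = np.diag(A_pos.sum(axis=1)) - A_pos
    total = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        total += x @ L_pos @ x / (x @ x)
    return total

def positive_weight_leaving(A, labels, c):
    """Total positive edge weight between cluster c and the rest of the graph."""
    inside = labels == c
    return np.maximum(A, 0)[np.ix_(inside, ~inside)].sum()

# Hypothetical network: positive edges (0,1), (2,3); negative edges (1,2), (0,3).
A = np.zeros((4, 4))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -1.0), (0, 3, -1.0)]:
    A[i, j] = A[j, i] = s

for labels in (np.array([0, 0, 1, 1]), np.array([0, 0, 0, 1])):
    direct = sum(positive_weight_leaving(A, labels, c) / np.sum(labels == c) for c in range(2))
    print(labels, positive_ratio_cut(A, labels, 2), direct)   # the two values agree
```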
Proposed k-way Signed Objectives
Figure panels: (a) Balance Ratio Cut, (b) Balance Ratio Association
Balance Ratio Cut/Association
- Balance Ratio Cut
$$\min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T (D^+ - A) x_c}{x_c^T x_c}$$
- Balance Ratio Association
$$\max_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T (D^- + A) x_c}{x_c^T x_c}$$
where $D^-$ is the diagonal degree matrix of $A^-$.
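Since $A = A^+ - A^-$, the balance objectives decompose into the simpler ones: $D^+ - A = L^+ + A^-$, so the balance ratio cut is the positive ratio cut plus the negative ratio association; and $D^- + A = L^- + A^+$ with $L^- = D^- - A^-$, so the balance ratio association is the negative ratio cut plus the positive ratio association. A NumPy check on the hypothetical 4-node network (helper name illustrative):

```python
import numpy as np

def ratio_objective(M, labels, k):
    """sum_c x_c^T M x_c / x_c^T x_c with 0/1 cluster indicators x_c."""
    total = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        total += x @ M @ x / (x @ x)
    return total

# Hypothetical network: positive edges (0,1), (2,3); negative edges (1,2), (0,3).
A = np.zeros((4, 4))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -1.0), (0, 3, -1.0)]:
    A[i, j] = A[j, i] = s
A_pos, A_neg = np.maximum(A, 0), -np.minimum(A, 0)
D_pos, D_neg = np.diag(A_pos.sum(axis=1)), np.diag(A_neg.sum(axis=1))

labels = np.array([0, 0, 0, 1])
brc = ratio_objective(D_pos - A, labels, 2)   # balance ratio cut
bra = ratio_objective(D_neg + A, labels, 2)   # balance ratio association
# D+ - A = (D+ - A+) + A-   and   D- + A = (D- - A-) + A+
print(np.isclose(brc, ratio_objective(D_pos - A_pos, labels, 2) + ratio_objective(A_neg, labels, 2)))
print(np.isclose(bra, ratio_objective(D_neg - A_neg, labels, 2) + ratio_objective(A_pos, labels, 2)))
```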
Proposed k-way Signed Objectives
Balance Normalized Cut
- Objectives normalized by cluster volume instead of by the number of nodes in the clusters.
- Balance Normalized Cut
$$\min_{\{x_1,\dots,x_k\} \in \mathcal{I}} \sum_{c=1}^{k} \frac{x_c^T (D^+ - A) x_c}{x_c^T \bar{D} x_c}$$
Theorem: Minimizing the balance normalized cut is equivalent to maximizing the balance normalized association.
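The theorem follows from $(D^+ - A) + (D^- + A) = D^+ + D^- = \bar{D}$. Taking the balance normalized association to be $\sum_c x_c^T (D^- + A) x_c / x_c^T \bar{D} x_c$ (an assumption here, by analogy with the balance ratio association; the slide does not write it out), each cluster's cut and association terms add to 1, so the balance normalized cut equals k minus the balance normalized association. A NumPy check on the hypothetical 4-node network, assuming no cluster has zero volume:

```python
import numpy as np

def balance_normalized(A, labels, k):
    """Return (balance normalized cut, assumed balance normalized association)."""
    A_pos, A_neg = np.maximum(A, 0), -np.minimum(A, 0)
    D_pos, D_neg = np.diag(A_pos.sum(axis=1)), np.diag(A_neg.sum(axis=1))
    D_bar = D_pos + D_neg                   # absolute degrees, since |A_ij| = A+_ij + A-_ij
    bnc = bna = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        vol = x @ D_bar @ x                 # volume of cluster c (absolute degrees)
        bnc += x @ (D_pos - A) @ x / vol
        bna += x @ (D_neg + A) @ x / vol
    return bnc, bna

# Hypothetical network: positive edges (0,1), (2,3); negative edges (1,2), (0,3).
A = np.zeros((4, 4))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -1.0), (0, 3, -1.0)]:
    A[i, j] = A[j, i] = s

for labels in (np.array([0, 0, 1, 1]), np.array([0, 0, 0, 1])):
    bnc, bna = balance_normalized(A, labels, 2)
    print(labels, bnc, bna, bnc + bna)   # the sum is k = 2 for every partition
```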
Multilevel Approach for Large-scale Signed Graph Clustering
Equivalence of Objectives
Equivalence between k-way signed objectives and the weighted kernel k-means objective
Theorem (Equivalence of objectives): For any signed cut or association objective, there exists a corresponding weighted kernel k-means objective (with a properly chosen kernel matrix) such that the two objectives are mathematically equivalent.
- We can therefore use a k-means-like algorithm to optimize the signed objectives.
- This leads to a fast and scalable multilevel clustering algorithm for signed networks.
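The slide leaves the kernel unspecified. A minimal sketch, assuming the construction Dhillon et al. (2007) use for normalized association carries over with $\bar{D}$ in place of $D$: weights $w_i = \bar{D}_{ii}$ and $K = \sigma \bar{D}^{-1} + \bar{D}^{-1} (D^- + A) \bar{D}^{-1}$, with the balance normalized association taken as $\sum_c x_c^T (D^- + A) x_c / x_c^T \bar{D} x_c$. Because $\bar{D} K \bar{D} = \sigma \bar{D} + (D^- + A)$, the weighted kernel k-means objective then equals a constant minus that association, so minimizing one maximizes the other. The shift $\sigma$ (chosen here as the smallest value that makes $K$ positive semidefinite), the toy network, and all function names are assumptions for illustration, not the paper's stated kernel.

```python
import numpy as np

def signed_parts(A):
    """D- (negative degrees) and Dbar (absolute degrees) as diagonal matrices."""
    D_neg = np.diag(-np.minimum(A, 0).sum(axis=1))
    D_bar = np.diag(np.abs(A).sum(axis=1))
    return D_neg, D_bar

def balance_normalized_assoc(A, labels, k):
    """Assumed BNA: sum_c x_c^T (D- + A) x_c / x_c^T Dbar x_c with 0/1 indicators."""
    D_neg, D_bar = signed_parts(A)
    M = D_neg + A
    total = 0.0
    for c in range(k):
        x = (labels == c).astype(float)
        total += x @ M @ x / (x @ D_bar @ x)
    return total

def bna_kernel(A):
    """Assumed kernel K = sigma*Dbar^-1 + Dbar^-1 (D- + A) Dbar^-1 and weights w = diag(Dbar)."""
    D_neg, D_bar = signed_parts(A)
    d = np.diag(D_bar)                          # assumes no isolated nodes (d > 0)
    M = D_neg + A
    S = M / np.sqrt(np.outer(d, d))             # Dbar^-1/2 M Dbar^-1/2
    sigma = max(0.0, -np.linalg.eigvalsh(S).min())   # smallest shift making K PSD
    K = sigma * np.diag(1.0 / d) + M / np.outer(d, d)
    return K, d, sigma

def wkkm_objective(K, w, labels, k):
    """sum_c sum_{i in c} w_i ||phi(v_i) - m_c||^2, expanded with the kernel trick."""
    total = 0.0
    for c in range(k):
        wx = w * (labels == c)
        total += wx @ np.diag(K) - wx @ K @ wx / wx.sum()
    return total

# Hypothetical network: positive edges (0,1), (2,3); negative edges (1,2), (0,3).
A = np.zeros((4, 4))
for i, j, s in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -1.0), (0, 3, -1.0)]:
    A[i, j] = A[j, i] = s
K, w, sigma = bna_kernel(A)

for labels in (np.array([0, 0, 1, 1]), np.array([0, 0, 0, 1])):
    J = wkkm_objective(K, w, labels, 2)
    bna = balance_normalized_assoc(A, labels, 2)
    print(J, bna, J + bna)   # J + BNA is the same constant for both partitions
```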
Multilevel Framework of Graph Clustering
Overview