Hybrid Clustering of multi-view data via MLSVD Xinhai Liu, Lieven De Lathauwer, Wolfgang Glänzel, Bart De Moor ESAT-SCD Katholieke Universiteit Leuven TDA, September 14, 2010, Bari, Italy
Outline Introduction Hybrid clustering of multi-view data Experiments Discussion and Outlook Acknowledgement
Introduction Motivation ◮ Growing demand: grouping multi-view data to obtain better partitions (Web mining, social networks, literature analysis). ◮ Clustering methods ◮ Most methods: single-view data ◮ Hybrid clustering: multi-view data ◮ Tensor methods ◮ A powerful tool for handling multi-way data sources. ◮ Multilinear singular value decomposition (MLSVD) (Tucker, 1964 & 1966; De Lathauwer et al, 2000a)
Figure: Demo of hybrid clustering by MLSVD on synthetic 3D data sets (panels: original data; spectral clustering by spectral projection of each 2D single view; hybrid clustering by spectral projection via MLSVD)
Introduction Related work ◮ Hybrid clustering: multiple kernel fusion (MKF) (Joachims et al, 2001) and clustering ensembles (Strehl & Ghosh, 2002) ◮ MLSVD-based clustering for image recognition (Huang & Ding, 2008) ◮ Multi-way latent semantic analysis (Sun et al, 2006) ◮ CANDECOMP/PARAFAC (CP): scientific publication data with multiple linkages (Dunlavy, Kolda, et al, 2006; Selee, Kolda et al, 2007)
Introduction Main contributions ◮ An extendable hybrid clustering framework based on MLSVD ◮ Modelling the multi-view data as a tensor ◮ Seeking a joint optimal subspace by tensor analysis ◮ Two novel clustering algorithms: AHC-MLSVD and WHC-HOOI ◮ Experiments on synthetic data and a real application to the Web of Science (WoS) journal database
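Hybrid clustering Spectral clustering Given $S \in \mathbb{R}^{N \times N}$, the affinity (similarity) matrix of a graph $G$, and the degree matrix $D = \operatorname{diag}(S\mathbf{1})$, our Laplacian matrix is
$$L = D^{-1/2} S D^{-1/2}. \tag{1}$$
Let $U \in \mathbb{R}^{N \times M}$ be a relaxed indicator matrix, where $M$ is the number of clusters:
$$\max_{U} \operatorname{tr}(U^T L U) \quad \text{s.t.} \quad U^T U = I. \tag{2}$$
The solution of spectral clustering follows from the eigenvalue decomposition of $L$: the columns of $U$ are the $M$ leading eigenvectors of $L$ (Luxburg, 2007).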
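The single-view spectral clustering step can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the helper names (`normalized_affinity`, `spectral_embedding`, `kmeans`, `spectral_clustering`), the row normalization of $U$ and the simple k-means with farthest-point initialization are assumptions added for the example.

```python
import numpy as np

def normalized_affinity(S):
    """L = D^{-1/2} S D^{-1/2} from Eq. (1); assumes every node has positive degree."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))      # diagonal of D^{-1/2}
    return d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def spectral_embedding(L, M):
    """Maximizer of tr(U^T L U) s.t. U^T U = I: the M leading eigenvectors of L."""
    _, eigvecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :M]                 # eigenvectors of the M largest eigenvalues

def kmeans(X, M, n_iter=100):
    """Lloyd's k-means with farthest-point initialization (illustrative only)."""
    centers = [X[0]]
    for _ in range(1, M):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([X[labels == m].mean(axis=0) if np.any(labels == m) else centers[m]
                                for m in range(M)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(S, M):
    U = spectral_embedding(normalized_affinity(S), M)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row normalization (an added assumption)
    return kmeans(U, M)

# Usage: two blocks of three nodes, weakly connected to each other
S = np.kron(np.eye(2), np.ones((3, 3))) + 0.01
print(spectral_clustering(S, M=2))
```

Because Eq. (2) is a maximization, the sketch keeps the eigenvectors of the largest eigenvalues of $D^{-1/2} S D^{-1/2}$ rather than the smallest ones of $I - D^{-1/2} S D^{-1/2}$; both choices span the same subspace.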
Hybrid clustering Concept overview Figure: Concept overview. Spectral clustering based on SVD: a single data source gives one Laplacian matrix $L$; a matrix decomposition yields the optimal subspace $U$, which is then clustered. Hybrid clustering based on MLSVD: data sources $1, \dots, n$ give Laplacian matrices $L^{(1)}, \dots, L^{(n)}$, which are stacked into a Laplacian tensor; a tensor decomposition yields the optimal subspace $U$ of the objects and a weighting vector of the views, and $U$ is then clustered.
Hybrid clustering Laplacian tensor From a set of $K$ Laplacian matrices $L^{(i)} \in \mathbb{R}^{N \times N}$, $i = 1, \dots, K$, to a Laplacian tensor $\mathcal{A} \in \mathbb{R}^{N \times N \times K}$. Figure: The formulation of a Laplacian tensor (the $N \times N$ Laplacian matrices of views 1, 2, ..., K over the same $N$ objects are stacked along a third mode of $K$ views)
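Stacking the per-view matrices into the Laplacian tensor takes one NumPy call. The sketch below assumes that each view is given as a symmetric nonnegative similarity matrix over the same $N$ objects and applies Eq. (1) to each view; the function names are illustrative.

```python
import numpy as np

def laplacian(S):
    """L = D^{-1/2} S D^{-1/2} for a similarity matrix S with positive degrees."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def laplacian_tensor(similarities):
    """Stack K views into A (N x N x K) with frontal slices A[:, :, i] = L^(i)."""
    return np.stack([laplacian(S) for S in similarities], axis=2)

# Usage: K = 3 random symmetric views over N = 5 objects
rng = np.random.default_rng(0)
views = []
for _ in range(3):
    X = rng.random((5, 5))
    views.append((X + X.T) / 2)          # symmetric, nonnegative similarities
A = laplacian_tensor(views)
print(A.shape)                           # (5, 5, 3)
```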
Hybrid clustering AHC-MLSVD Averaging multi-view data for joint analysis. Figure: Average hybrid clustering of multi-view data (the $N \times N \times K$ Laplacian tensor is decomposed into an $M \times M \times K$ core tensor, the factor $U$ in both object modes and the identity matrix $I$ in the view mode). $U \in \mathbb{R}^{N \times M}$: the joint optimal subspace; $I \in \mathbb{R}^{K \times K}$: an identity matrix.
Hybrid clustering AHC-MLSVD The optimization of average hybrid clustering:
$$\max_{U} \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 I \right\|_F^2 \quad \text{s.t.} \quad U^T U = I. \tag{3}$$
It is solved by the MLSVD (Tucker, 1964 & 1966; De Lathauwer et al, 2000a): ◮ An approximate solution ◮ Usually satisfactory results ◮ An upper bound on the approximation error
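Because every frontal slice of $\mathcal{A}$ is symmetric, the objective in Eq. (3) equals $\sum_{k=1}^{K} \|U^T L^{(k)} U\|_F^2$, and the truncated MLSVD takes $U$ as the $M$ dominant left singular vectors of the mode-1 unfolding $[L^{(1)}, \dots, L^{(K)}]$. The NumPy sketch below computes that subspace; the function names are hypothetical, and the rows of $U$ would then be clustered (for example with k-means) as in the single-view case.

```python
import numpy as np

def mode1_unfolding(A):
    """N x (N*K) matrix whose columns are the mode-1 fibers A[:, j, k].
    The column order differs from the usual convention, which does not
    change the left singular vectors."""
    return A.reshape(A.shape[0], -1)

def ahc_mlsvd_subspace(A, M):
    """Joint subspace U (N x M): the M dominant left singular vectors of A_(1)."""
    U_full, _, _ = np.linalg.svd(mode1_unfolding(A), full_matrices=False)
    return U_full[:, :M]

def ahc_objective(A, U):
    """||A x1 U^T x2 U^T x3 I||_F^2, computed as sum_k ||U^T L^(k) U||_F^2."""
    core = np.einsum("ia,ijk,jb->abk", U, A, U)   # M x M x K core tensor
    return float(np.sum(core ** 2))

def normalized(S):
    """D^{-1/2} S D^{-1/2}, as in Eq. (1)."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

# Usage: two noisy views of the same two-block structure
blocks = np.kron(np.eye(2), np.ones((3, 3)))
A = np.stack([normalized(blocks + 0.01), normalized(blocks + 0.05)], axis=2)
U = ahc_mlsvd_subspace(A, M=2)
print(round(ahc_objective(A, U), 4))
```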
Hybrid clustering WHC-HOOI Taking the contribution of each single view into account. Figure: Weighted hybrid clustering of multi-view data (the $N \times N \times K$ Laplacian tensor is decomposed into an $M \times M \times 1$ core tensor, the factor $U$ in both object modes and the weighting vector $W$ in the view mode). $W = [\alpha_1, \alpha_2, \dots, \alpha_K]^T$: the weighting factors of the views.
Hybrid clustering WHC-HOOI The equivalent optimization of weighted hybrid clustering:
$$\max_{U, W} \left\| \mathcal{A} \times_1 U^T \times_2 U^T \times_3 W^T \right\|_F^2 \quad \text{s.t.} \quad U^T U = I \text{ and } W^T W = 1. \tag{4}$$
It is solved by higher-order orthogonal iteration (HOOI) (Kroonenberg & De Leeuw, 1980; De Lathauwer et al, 2000b): ◮ A (locally) optimal solution ◮ An appropriate weight for each view ◮ Other tensor methods can also be used
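One way to sketch HOOI-style alternating updates for Eq. (4) in NumPy is shown below. It rests on stated assumptions: the same $U$ is used in modes 1 and 2 (classical HOOI updates each mode's factor separately), the initialization comes from the MLSVD, the sign of $W$ is fixed so that its entries have a nonnegative sum, and the function and parameter names are hypothetical. Given $W$, the mode-1 unfolding of $\mathcal{A} \times_2 U^T \times_3 W^T$ is $L_W U$ with $L_W = \sum_k \alpha_k L^{(k)}$; given $U$, the mode-3 unfolding of $\mathcal{A} \times_1 U^T \times_2 U^T$ has rows $\operatorname{vec}(U^T L^{(k)} U)^T$.

```python
import numpy as np

def leading_left_singular_vectors(X, r):
    """First r left singular vectors of X (orthonormal columns)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r]

def whc_hooi(A, M, n_iter=50, tol=1e-10):
    """Alternating updates for max ||A x1 U^T x2 U^T x3 W^T||_F^2
    s.t. U^T U = I and W^T W = 1 (sharing U between modes 1 and 2 is an assumption)."""
    N, _, K = A.shape
    # MLSVD initialization from the mode-1 and mode-3 unfoldings
    U = leading_left_singular_vectors(A.reshape(N, -1), M)
    W = leading_left_singular_vectors(np.moveaxis(A, 2, 0).reshape(K, -1), 1)[:, 0]
    previous, value = -np.inf, -np.inf
    for _ in range(n_iter):
        # U-step: L_W = sum_k w_k L^(k); mode-1 unfolding of A x2 U^T x3 W^T is L_W U
        L_W = np.tensordot(A, W, axes=([2], [0]))
        U = leading_left_singular_vectors(L_W @ U, M)
        # W-step: row k of the mode-3 unfolding is vec(U^T L^(k) U)
        slices = np.einsum("ia,ijk,jb->kab", U, A, U).reshape(K, -1)
        W = leading_left_singular_vectors(slices, 1)[:, 0]
        value = float(np.sum((W @ slices) ** 2))   # objective of Eq. (4)
        if abs(value - previous) <= tol * max(1.0, abs(value)):
            break
        previous = value
    if W.sum() < 0:   # singular vectors are only defined up to sign
        W = -W
    return U, W, value

# Usage: three identical views, so the weights should come out equal (1/sqrt(3) each)
S = np.kron(np.eye(2), np.ones((3, 3))) + 0.01
L = S / S.sum(axis=1)[0]   # all degrees are equal here, so D^{-1/2} S D^{-1/2} = S / degree
A = np.stack([L, L, L], axis=2)
U, W, value = whc_hooi(A, M=2)
print(np.round(W, 3))
```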
Experiments Clustering of a multiplex network Multiplex network: a group of networks that share the same nodes but have different types of links (Mucha et al, 2010) The synthetic multiplex network: ◮ Three clusters with 50, 100 and 200 members, respectively ◮ Three views generated with different noise ◮ One interaction matrix per view; the three matrices are stacked into a tensor Figure: The adjacency matrices of a synthetic multiplex network
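A synthetic multiplex network of this kind can be sketched with a planted-partition model. The model itself, the within-cluster link probability `p_in` and the per-view noise levels in `noise` are hypothetical choices for illustration, not the settings used in the experiments; only the cluster sizes (50, 100, 200) and the number of views come from the slide.

```python
import numpy as np

def planted_partition_view(labels, p_in, p_out, rng):
    """Symmetric 0/1 adjacency matrix: links with probability p_in inside a cluster
    and p_out (the 'noise') between clusters; no self-loops."""
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)   # sample each node pair once
    return (upper | upper.T).astype(float)

def synthetic_multiplex(sizes=(50, 100, 200), noise=(0.02, 0.05, 0.10), p_in=0.3, seed=0):
    """Stack one adjacency matrix per view into an N x N x K tensor (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    views = [planted_partition_view(labels, p_in, p_out, rng) for p_out in noise]
    return np.stack(views, axis=2), labels

A, labels = synthetic_multiplex()
print(A.shape)   # (350, 350, 3)
```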
Clustering of a multiplex network Figure: Multiplex network A with views A1, A2 and A3; spectral clustering (spectral projection of each single view) versus hybrid clustering (spectral projection by MLSVD)
Experiments Application on Web of Science (WoS) journal database ◮ Objective: obtain a good science map of the WoS journals ◮ Integrating two views: textual information and journal cross-citations; $N = 8{,}305$ journals and $d_{\text{text}} = 669{,}700$ ◮ Cosine similarity matrices for both the text view and the cross-citation view
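Both similarity matrices are cosine similarities between row vectors: for instance, term vectors of the journals for the text view and rows of a journal cross-citation matrix for the citation view. The dense NumPy sketch below is illustrative only; with $d_{\text{text}} = 669{,}700$ a sparse representation (for example `scipy.sparse`) would be needed in practice, and the function name is hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """S[i, j] = <x_i, x_j> / (||x_i|| ||x_j||) for the rows x_i of X (N x d)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0     # all-zero rows get zero similarity to everything, themselves included
    Xn = X / norms[:, None]
    return Xn @ Xn.T

# Usage: three objects described by two features
S = cosine_similarity_matrix([[1, 0], [1, 1], [0, 2]])
print(np.round(S, 3))
```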
Experiments Clustering evaluation measures ◮ Standard categories: Essential Science Indicators (ESI) from WoS ◮ Normalized mutual information (NMI)
$$\mathrm{NMI} = \frac{2\, I(\{c_i\}; \{l_i\})}{H(\{c_i\}) + H(\{l_i\})}, \tag{5}$$
where $I(\{c_i\}; \{l_i\})$ is the mutual information between the clustering labels $\{c_i\}_{i=1}^{n}$ and the reference category indicators $\{l_i\}_{i=1}^{n}$, and $H(\{c_i\})$ and $H(\{l_i\})$ are their entropies. ◮ Cognitive analysis by a bibliometrician
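Eq. (5) can be computed directly from label counts. The sketch below uses natural logarithms (the base cancels in the ratio) and, as an assumption, returns 1 when both labelings consist of a single group; the function names are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(c, l):
    """I({c_i}; {l_i}) from the joint distribution of the two labelings."""
    _, c_idx = np.unique(np.asarray(c), return_inverse=True)
    _, l_idx = np.unique(np.asarray(l), return_inverse=True)
    joint = np.zeros((c_idx.max() + 1, l_idx.max() + 1))
    np.add.at(joint, (c_idx, l_idx), 1)            # contingency table
    p_joint = joint / len(c_idx)
    p_c = p_joint.sum(axis=1, keepdims=True)        # marginal of the clustering
    p_l = p_joint.sum(axis=0, keepdims=True)        # marginal of the reference categories
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / (p_c @ p_l)[nz])))

def nmi(c, l):
    """NMI = 2 I(c; l) / (H(c) + H(l)), as in Eq. (5)."""
    h_c, h_l = entropy(c), entropy(l)
    if h_c + h_l == 0:
        return 1.0   # convention: two single-group labelings agree
    return 2.0 * mutual_information(c, l) / (h_c + h_l)

# Usage: a clustering that refines the reference categories
print(round(nmi([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 1, 1]), 4))
```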
Experiments Clustering performance Figure: NMI validation of various clustering methods on the WoS journal database (number of clusters: 22); y-axis: NMI index; x-axis: clustering methods, grouped into multi-view and single-view methods
Experiments Visualization of the journal clusters obtained by HC-MLSVD
Cluster labels (top three text terms):
1. firm, price, market
2. steel, microstructur, corros
3. cultivar, plant, milk
4. ocean, seismic, rock
5. surgeri, clinic, arteri
6. semant, phonolog, cortex
7. algorithm, fuzzi, wireless
8. protein, cell, gene
9. catalyst, polym, acid
10. music, literari, essai
11. speci, habitat, forest
12. soil, water, sludg
13. crack, turbul, heat
14. galaxi, star, stellar
15. quantum, quark, neutrino
16. polit, social, court
17. dope, crystal, optic
18. tumor, cancer, carcinoma
19. student, teacher, school
20. algebra, theorem, asymptot
21. nurs, schizophrenia, health
22. dog, hors, infect
Figure: Visualization of 22 clusters on the WoS journal database (nodes: journal clusters, with circle size proportional to cluster size; edges: cross-citations between two journal clusters; annotated terms: the top three text terms of each journal cluster)
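The quantities shown in such a map can be derived from a cluster assignment. In the sketch below, which is an assumption rather than the authors' procedure, node size is the number of journals in a cluster, the edge weight between two clusters is the total number of cross-citations between their journals ($H^T C H$ with a 0/1 cluster indicator matrix $H$), and the top terms of a cluster are the terms with the largest summed weight over its journals. All names and the toy data are hypothetical.

```python
import numpy as np

def indicator_matrix(labels, M):
    """N x M 0/1 matrix with H[i, m] = 1 if journal i is in cluster m."""
    H = np.zeros((len(labels), M))
    H[np.arange(len(labels)), labels] = 1.0
    return H

def cluster_summary(labels, citations, term_weights, vocabulary, M, top=3):
    H = indicator_matrix(labels, M)
    sizes = H.sum(axis=0)                          # node (circle) size
    cluster_citations = H.T @ citations @ H        # citations aggregated between clusters
    np.fill_diagonal(cluster_citations, 0.0)       # keep only between-cluster edges
    term_scores = H.T @ term_weights               # M x d summed term weights
    top_terms = [[vocabulary[t] for t in np.argsort(-row)[:top]] for row in term_scores]
    return sizes, cluster_citations, top_terms

# Usage: three journals in two clusters, four terms
labels = np.array([0, 0, 1])
citations = np.array([[0, 2, 1], [0, 0, 3], [4, 0, 0]], dtype=float)
term_weights = np.array([[0.5, 0.1, 0.0, 0.2],
                         [0.4, 0.0, 0.35, 0.1],
                         [0.0, 0.9, 0.2, 0.05]])
sizes, edges, terms = cluster_summary(labels, citations, term_weights,
                                      ["gene", "star", "cell", "soil"], M=2)
print(sizes, edges, terms)
```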
Discussion and outlook Discussion ◮ Extendable hybrid clustering framework: ◮ Other learning tasks on multi-view data (classification, spectral embedding, collaborative filtering) ◮ Other tensor-based solutions ◮ Other matrices (similarity matrices, modularity matrices) Figure: A modularity tensor formed by stacking the $N \times N$ modularity matrices of views 1, 2, ..., K
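For the modularity variant, each view's Laplacian matrix would be replaced by the modularity matrix $B = G - \mathbf{k}\mathbf{k}^T / (2m)$ (Newman's definition, for a symmetric adjacency matrix $G$ with degree vector $\mathbf{k}$ and $2m = \sum_{ij} G_{ij}$), and the $K$ matrices would be stacked as before. A minimal sketch, with hypothetical function names and the assumption that every view has at least one edge:

```python
import numpy as np

def modularity_matrix(G):
    """B = G - k k^T / (2m) for a symmetric (possibly weighted) adjacency matrix G."""
    k = G.sum(axis=1)          # (weighted) degrees
    two_m = k.sum()            # 2m: total edge weight, each edge counted from both ends
    return G - np.outer(k, k) / two_m

def modularity_tensor(adjacencies):
    """Stack K modularity matrices as frontal slices of an N x N x K tensor."""
    return np.stack([modularity_matrix(np.asarray(G, dtype=float)) for G in adjacencies], axis=2)

# Usage: two views of a 4-node network
G1 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
G2 = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
B = modularity_tensor([G1, G2])
print(B.shape)   # (4, 4, 2)
```

Every row of a modularity matrix sums to zero, since $B\mathbf{1} = \mathbf{k} - \mathbf{k}(\mathbf{k}^T\mathbf{1})/(2m) = \mathbf{0}$.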
Discussion and outlook Outlook ◮ Scalability: large-scale databases and efficient implementation ◮ Tensors with more modes (currently 3-mode): dynamic data analysis ◮ Other potential tensor methods (CP, INDSCAL, DEDICOM)
Acknowledgement Research supported by (1) the KUL ESAT SISTA research group; (2) the China Scholarship Council (CSC, No. 2006153005). (3) We thank Dr. Carlos Alzate (K.U.Leuven) for discussions.
Thank you for your attention!