Adaptive Affinity Matrix for Unsupervised Metric Learning
Yaoyi Li, Junxuan Chen, Yiru Zhao and Hongtao Lu
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, P.R. China
July 2016
Yaoyi Li et al. (SJTU), Adaptive Affinity Matrix, July 2016
Background: Spectral Clustering
Spectral clustering: nonlinear feature reduction. The distribution of real data is not always uniform or Gaussian. Spectral clustering can preserve the local neighborhood information.
(Figure: three scatter plots, panels (a), (b), (c), illustrating a non-Gaussian data distribution.)
Background: Spectral Clustering
Spectral clustering demonstrates splendid performance on many challenging data sets.
Objective function:
$$ y = \arg\min_{y^\top D y = 1} \sum_{i,j} w_{ij} \, \| y_i - y_j \|_2^2 $$
where $w_{ij}$ is the similarity between data samples $x_i$ and $x_j$ (a.k.a. the affinity graph).
Shortcomings of spectral clustering:
- Out-of-sample extension is not straightforward
- Cubic time complexity
- Sensitive to the affinity graph
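The objective above can be illustrated concretely: minimizing $\sum_{i,j} w_{ij}\|y_i - y_j\|_2^2$ subject to $y^\top D y = 1$ reduces to the generalized eigenproblem $L y = \lambda D y$ with $L = D - W$. A minimal sketch (not the paper's code; the dense heat-kernel affinity and toy data are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))           # toy data: 30 samples, 2 features

# dense heat-kernel affinity as a simple choice of w_ij
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))             # degree matrix
L = D - W                              # graph Laplacian

# generalized eigenproblem L y = lambda D y; eigh returns
# D-orthonormal eigenvectors, so y^T D y = 1 holds automatically
vals, vecs = eigh(L, D)
y = vecs[:, 1]                         # skip the trivial constant eigenvector
```

The smallest nontrivial eigenvector `y` is the 1-D spectral embedding; clustering is then performed on such embedding coordinates.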
Background: Locality Preserving Projections
Locality Preserving Projections (LPP) [HN04] is a linear approximation of the Laplacian Eigenmap. LPP conducts dimensionality reduction by solving the optimization problem:
$$ a = \arg\min_{a^\top X D X^\top a = 1} \sum_{i,j} w_{ij} \, \| a^\top x_i - a^\top x_j \|_2^2 $$
The superiority of LPP:
- An explicit projection for out-of-sample extension
- Reduced complexity
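With samples as the columns of a $d \times n$ matrix $X$, the LPP problem above is solved by the smallest generalized eigenvector of $(X L X^\top) a = \lambda (X D X^\top) a$. A hedged sketch under that convention (toy data and the dense heat-kernel weights are assumptions, not the paper's setup):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 40))           # 5 features, 40 samples (columns)

# heat-kernel affinity between the sample columns
d2 = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W

M = X @ L @ X.T                        # numerator matrix  X L X^T
N = X @ D @ X.T                        # constraint matrix X D X^T
vals, vecs = eigh(M, N)                # generalized eigenproblem
a = vecs[:, 0]                         # projection direction (smallest eigenvalue)

z = a @ X                              # 1-D embedding; applies to unseen x too
```

The explicit direction `a` is what gives LPP its out-of-sample extension: a new sample $x$ is embedded simply as $a^\top x$.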
Motivation
The performance of spectral clustering methods depends heavily on the robustness of the affinity graph. Weighting schemes such as the k-NN heat kernel are easily corrupted by noise.
Our goal:
- Learn a robust affinity graph efficiently by optimization.
- Optimize the linear projection and the affinity graph simultaneously.
Related Works
- Dominant Neighbors [PP07] reduces the noise in the affinity matrix via maximal cliques.
- Consensus k-NNs [PK13] builds an affinity graph from consensus information.
- ClustRF-Strct [ZLG14] constructs an affinity graph via clustering random forests.
- CAN and PCAN [NWH14] learn the data similarity and the cluster structure simultaneously.
AdaAM: Assumption
Assumption 1: The affinity matrix $W$ is positive semidefinite; hence we can write $W = P P^\top$. This assumption also appears in [CC11].
Assumption 2: The ideal affinity matrix $W$ is low rank (1 for samples in the same class and 0 otherwise).
(Diagram: $W$ factorized as the product $P P^\top$.)
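Both assumptions are easy to verify numerically: any matrix of the form $P P^\top$ is positive semidefinite, and its rank is at most the number of columns of $P$. A quick sanity check with arbitrary toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(20, 3))           # 20 samples, rank-3 factor
W = P @ P.T                            # affinity matrix under Assumption 1

eigvals = np.linalg.eigvalsh(W)
assert eigvals.min() > -1e-10          # PSD: no (numerically) negative eigenvalues
assert np.linalg.matrix_rank(W) <= 3   # low rank, per Assumption 2
```

This is why learning $W$ can be reduced to learning the much smaller factor $P$.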
AdaAM: Diagram
A glance at our algorithm:
(Diagram: a low-rank intermediate affinity matrix Δ = P P^T is sparsified; a projection A is learned from it; the projected data yield an updated, sparsified affinity matrix, which is combined with the k-NN affinity W.)
AdaAM: Intermediate Affinity Matrix Δ
Let Δ be the intermediate affinity matrix, and assume $\Delta = P P^\top$. Compute $P$ by solving the optimization problem
$$ \min_{P^\top P = I} \operatorname{tr}\!\big( X^\top (D_\Delta - P P^\top) X \big) \;\Rightarrow\; \min_{P^\top P = I} \operatorname{tr}( X^\top D_\Delta X ) + \operatorname{tr}\!\big( X^\top (-P P^\top) X \big), $$
which is similar to spectral clustering. When $X$ is normalized to zero mean, we have $D_\Delta = 0$, and the problem above is equivalent to
$$ P = \arg\max_{P^\top P = I} \operatorname{tr}( P^\top X X^\top P ). $$
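The final maximization on this slide is a standard trace problem: its solution is the top-$m$ eigenvectors of $X X^\top$, i.e. the top-$m$ left singular vectors of the zero-mean $X$. A minimal sketch (dimensions and $m$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 8))           # 50 samples (rows), 8 features
X = X - X.mean(axis=0)                 # zero mean, so D_Delta = 0 as on the slide
m = 3                                  # target rank of Delta

# P = argmax_{P^T P = I} tr(P^T X X^T P): top-m left singular vectors of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U[:, :m]                           # orthonormal columns: P^T P = I

Delta = P @ P.T                        # intermediate affinity matrix (n x n)
```

Working with the thin SVD of $X$ instead of eigendecomposing the $n \times n$ matrix $X X^\top$ is what keeps this step cheap when $n \gg d$.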
AdaAM: Final Adaptive Affinity Matrix
With the intermediate affinity matrix Δ, we can solve the following problem for a linear projection $A$:
$$ A = \arg\min_{A^\top A = I} \operatorname{tr}\!\big( A^\top X^\top (L + L_\Delta) X A \big), $$
where $L + L_\Delta$ combines the Laplacian of the k-NN heat kernel with that of the intermediate affinity matrix.
With the linear projection $A$, we can rewrite the affinity optimization problem and update the matrix $P$ ($D_\Delta = 0$ still holds):
$$ P = \arg\max_{P^\top P = I} \operatorname{tr}( P^\top X A A^\top X^\top P ). $$
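The two steps on this slide can be condensed into a small sketch: $A$ is given by the smallest eigenvectors of $X^\top (L + L_\Delta) X$, and the updated $P$ by the top left singular vectors of $X A$. This is a hedged toy version (dense heat-kernel $W$, no sparsification, arbitrary dimensions), not the paper's implementation:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 6))           # 40 samples (rows), 6 features
X = X - X.mean(axis=0)                 # zero mean, so D_Delta = 0
m = 2

# intermediate affinity Delta = P P^T via top left singular vectors of X
U0 = np.linalg.svd(X, full_matrices=False)[0]
Delta = U0[:, :m] @ U0[:, :m].T

# Laplacians of a heat-kernel affinity and of Delta
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
L_Delta = np.diag(Delta.sum(axis=1)) - Delta

# A = argmin_{A^T A = I} tr(A^T X^T (L + L_Delta) X A): smallest eigenvectors
vals, vecs = eigh(X.T @ (L + L_Delta) @ X)
A = vecs[:, :m]

# P = argmax_{P^T P = I} tr(P^T X A A^T X^T P): top left singular vectors of X A
P = np.linalg.svd(X @ A, full_matrices=False)[0][:, :m]
W_new = P @ P.T                        # adaptive affinity matrix
```

Both updates are small eigendecompositions/SVDs in $d$ or $m$ dimensions, which is why the whole procedure stays efficient.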
Experiments
We evaluate the proposed approach on five image data sets: UMIST, COIL20, USPS, MNIST, ExYaleB.
We impose the same parameter selection criteria on all algorithms in our experiments:
- neighborhood size $k = \operatorname{Round}(\log_2(n / c))$
- the projected dimension equals the number of classes
We treat 10 runs of k-Means as one round and select the clustering result with the minimal within-cluster sum as the result of that round.
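The neighborhood-size rule above is simple to apply; for instance, with $n = 1024$ samples and $c = 8$ classes it gives $k = \log_2(128) = 7$ (the helper name below is ours, not from the paper):

```python
import math

def neighborhood_size(n: int, c: int) -> int:
    """k = Round(log2(n / c)), the slide's parameter-selection rule."""
    return round(math.log2(n / c))

assert neighborhood_size(1024, 8) == 7   # log2(128) = 7 exactly
```

The same rule fixes k for every compared method, so no algorithm gains an advantage from per-method tuning.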
Accuracy
We apply 100 rounds of k-Means to each algorithm to evaluate performance.

Table: Clustering accuracy on image data sets (%)

           AdaAM          k-NN           Cons-kNN       DN             ClustRF-Bi     PCAN-kMeans    PCAN
           Avg    Max     Avg    Max     Avg    Max     Avg    Max     Avg    Max     Avg    Max
UMIST      66.06  75.65   58.16  65.39   60.27  69.22   59.15  66.96   64.63  74.44   53.79  56.52   55.30
COIL20     74.72  87.29   71.89  81.18   75.53  84.31   71.95  82.01   76.50  85.07   72.28  83.75   81.74
USPS       69.36  69.61   68.25  68.35   68.21  68.34   68.08  68.31   58.74  65.90   64.04  67.95   64.20
MNIST      60.84  61.34   48.13  48.27   47.88  48.00   49.72  49.76   51.93  52.03   58.93  58.98   59.83
ExYaleB    54.36  57.87   24.17  26.76   25.63  28.75   24.21  27.42   23.10  26.43   25.74  27.63   25.89
Accuracy
We run 10 rounds of k-Means for the experiment on sensitivity to the neighborhood size.
(Figure: accuracy vs. neighborhood size k (4–20) for kNN, cons-kNN, DN, ClustRF-Bi, PCAN-kMeans, and AdaAM; panels (d) UMIST and (e) COIL20. Caption: Comparison of the methods under different neighborhood sizes k.)
Accuracy
(Figure: accuracy vs. neighborhood size k (4–20) for the same six methods; panels (a) USPS and (b) ExYaleB. Caption: Comparison of the methods under different neighborhood sizes k.)
Our approach requires more information from the pairwise similarity; for small k, it sometimes does not perform well.
Time Consumption
(Figure: time consumption in seconds, on a logarithmic scale from 3.912 s to 986.8 s, of kNN, cons-kNN, DN, ClustRF-Bi, PCAN-kMeans, and AdaAM versus the number of samples.)
Figure: Time consumption of the six approaches for different numbers of data instances.
Conclusion & Future Work
Conclusion:
- We present a novel affinity learning approach for unsupervised metric learning.
- The affinity matrix is learned within the same framework as spectral clustering.
- The affinity learning reduces to a singular value decomposition problem.
- We employ the low-rank trick to make our approach more efficient.
Future Work:
- A better way to learn the sparsification parameter
- A better way to fuse the low-rank Δ and the k-NN W
- More applications
Thanks
Thanks for your attention.
References
[CC11] Xinlei Chen and Deng Cai, Large-scale spectral clustering with landmark-based representation, AAAI, 2011.
[HN04] Xiaofei He and Partha Niyogi, Locality preserving projections, NIPS, vol. 16, 2004, p. 153.
[NWH14] Feiping Nie, Xiaoqian Wang, and Heng Huang, Clustering and projected clustering with adaptive neighbors, Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014, pp. 977–986.
[PK13] Vittal Premachandran and Ramakrishna Kakarala, Consensus of k-NNs for robust neighborhood selection on graph-based manifolds, CVPR, IEEE, 2013, pp. 1594–1601.
[PP07] Massimiliano Pavan and Marcello Pelillo, Dominant sets and pairwise clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), no. 1, 167–172.
[ZLG14] Xiatian Zhu, Chen Change Loy, and Shaogang Gong, Constructing robust affinity graphs for spectral clustering, CVPR, IEEE, 2014, pp. 1450–1457.