

1. Clustering by Support Vector Manifold Learning
Marcin Orchel, AGH University of Science and Technology in Poland

2. Problem and My Contributions

Problem. Clusters can be characterized by a boundary, a center (prototype), a cluster core, or a characteristic manifold of a cluster. The multiple manifold learning problem is to fit multiple manifolds (hypersurfaces) to data points and to generalize to unseen data.

Approach. Support vector manifold learning (SVML) transforms the feature space into a kernel-induced feature space and fits the data with a hypothesis space containing only hyperplanes, which generalizes well. Fitting the data with SVML requires a regression method that works entirely in the kernel-induced feature space. SVML duplicates and shifts the points in the kernel-induced feature space in the direction of any training vector and then solves a classification problem, as in the sketch below.
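
The sketch below illustrates the duplicate-and-shift idea on a toy circle, using an explicit quadratic feature map in place of a kernel-induced feature space; the feature map, data set, and parameter values are my own illustrative choices, not taken from the slides.

```python
import numpy as np
from sklearn.svm import SVC

def feature_map(X):
    # explicit quadratic feature map, standing in for a kernel-induced feature space
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, 100)
X = np.column_stack([np.cos(angles), np.sin(angles)])    # points on a circle

Phi = feature_map(X)
t = 0.01
c = Phi[0]                                               # shift direction: image of a training vector
Phi_dup = np.vstack([Phi + t * c, Phi - t * c])          # duplicate and shift the points
y = np.hstack([np.ones(len(Phi)), -np.ones(len(Phi))])   # +1 shifted up, -1 shifted down

clf = SVC(kernel="linear", C=100.0).fit(Phi_dup, y)      # classification problem between the copies
# the fitted manifold is the zero level set {x : w . phi(x) + b = 0};
# on the training circle the decision values should be close to zero
print(np.abs(clf.decision_function(feature_map(X))).max())
```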

3. Comparison of Manifold Learning Methods

Fig. 1: Manifold learning; three panels (a), (b), (c) in the (x, y) plane; points are examples. (a) For points generated from a circle: solid line, solution of one-class support vector machines (OCSVM) for C = 1.0, σ = 0.9; dashed line, solution of SVML for C = 100.0, σ = 0.9, t = 0.01; thin dotted line, solution of kernel principal component analysis (KPCA) for σ = 0.9. (b) For points generated from a Lissajous curve: solid line, solution of OCSVM for C = 1000.0, σ = 0.5; dashed line, solution of SVML for C = 100000.0, σ = 0.8, t = 0.01; thin dotted line, solution of KPCA for σ = 0.5. (c) Solid line, solution of SVML for $\vec{c} = \vec{0}$, C = 100.0, σ = 0.9, t = 0.01; dashed line, solution of SVML for random values of $\vec{c}$, C = 100.0, σ = 0.9, t = 0.01.

4. Support Vector Manifold Learning (SVML)

The kernel function for two data points $\vec{x}_i$ and $\vec{x}_j$, for $i, j = 1, \ldots, n$, is
$$K(\vec{x}_i, \vec{x}_j) = K_o(\vec{x}_i, \vec{x}_j) + y_j t K_o(\vec{x}_i, \vec{c}) + y_i t K_o(\vec{c}, \vec{x}_j) + y_i y_j t^2 K_o(\vec{c}, \vec{c}), \quad (1)\text{-}(2)$$
where $\vec{c}$ is the shifting direction defined in the original feature space, $t$ is the translation parameter, $y_i = 1$ for a point shifted up, and $y_i = -1$ for a point shifted down.

The cross kernel is
$$K(\vec{x}_i, \vec{x}) = K_o(\vec{x}_i, \vec{x}) + y_i t K_o(\vec{c}, \vec{x}). \quad (3)$$

The number of support vectors is at most $n + 1$. The solution is
$$\sum_{i=1}^{n} (\alpha_i - \alpha_{i+n}) K(\vec{x}_i, \vec{x}) + \sum_{i=1}^{n} (\alpha_i + \alpha_{i+n}) \, t K(\vec{c}, \vec{x}) + b = 0. \quad (4)$$
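
A minimal NumPy sketch of the shifted kernel in Eqs. (1)-(3), assuming an RBF base kernel $K_o$ and scikit-learn's precomputed-kernel interface for the classification step; the helper names (svml_gram, svml_cross) and all data and parameter values are mine.

```python
import numpy as np
from sklearn.svm import SVC

def rbf(A, B, sigma):
    # base kernel K_o: Gaussian RBF
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def svml_gram(X, y, c, t, sigma):
    # Eqs. (1)-(2): kernel between duplicated-and-shifted points
    Ko = rbf(X, X, sigma)
    kc = rbf(X, c[None, :], sigma).ravel()           # K_o(x_i, c)
    kcc = rbf(c[None, :], c[None, :], sigma)[0, 0]   # K_o(c, c)
    return Ko + t * (np.outer(kc, y) + np.outer(y, kc)) + t ** 2 * kcc * np.outer(y, y)

def svml_cross(X, y, X_new, c, t, sigma):
    # Eq. (3): cross kernel between training points and new points
    kcx = rbf(c[None, :], X_new, sigma).ravel()      # K_o(c, x)
    return rbf(X, X_new, sigma) + t * np.outer(y, kcx)

# duplicate every training point: the +1 copy is shifted up, the -1 copy down
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 2.0 * np.pi, 50)
X = np.column_stack([np.cos(a), np.sin(a)])
X2 = np.vstack([X, X])
y2 = np.hstack([np.ones(len(X)), -np.ones(len(X))])
c, t, sigma = X[0], 0.01, 0.9                        # shift along (the image of) a training vector

clf = SVC(C=100.0, kernel="precomputed").fit(svml_gram(X2, y2, c, t, sigma), y2)
# the fitted manifold is the zero level set of the decision function, Eq. (4)
grid = np.column_stack([np.linspace(-1.5, 1.5, 5), np.zeros(5)])
print(clf.decision_function(svml_cross(X2, y2, grid, c, t, sigma).T))
```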

5. Model with Shifted Hyperplanes

Proposition 1. Shifting a hyperplane by any value of $\vec{c}$ gives a new hyperplane that differs from the original only in the free term $b$.

Lemma 1. After duplicating and shifting an $(n-1)$-dimensional hyperplane constrained by an $(n-1)$-dimensional hypersphere, the maximal distance from the original center of the hypersphere to any point of the shifted $(n-2)$-dimensional hypersphere is attained at a point such that, after projecting this point onto the $(n-1)$-dimensional hyperplane (before the shift), the vector from $\vec{0}$ to this point is parallel to the vector from $\vec{0}$ to the projected center of one of the shifted $(n-2)$-dimensional hyperspheres.

6. Model with Shifted Hyperplanes

Lemma 2. The radius $R_n$ of a minimal hypersphere containing both hyperplanes constrained by an $(n-1)$-dimensional hypersphere after shifting is equal to
$$R_n = \left\| \vec{c} + R \, \vec{c}_m / \| \vec{c}_m \| \right\|, \quad (5)$$
where $\vec{c}_m$ is defined as
$$\vec{c}_m = \vec{c} - \frac{b + \vec{w} \cdot \vec{c}}{\| \vec{w} \|^2} \, \vec{w}, \quad (6)$$
and $\| \vec{c}_m \| \neq 0$. For $\| \vec{c}_m \| = 0$, we get $R_n = \sqrt{ \| \vec{c} \|^2 + R^2 }$.
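
A small numeric check of Eqs. (5)-(6) under toy values of $\vec{w}$, $b$, $\vec{c}$, and $R$ chosen by me; it simply evaluates the lemma's formulas, including the special case $\|\vec{c}_m\| = 0$.

```python
import numpy as np

def shifted_radius(w, b, c, R):
    c_m = c - ((b + w @ c) / (w @ w)) * w                          # Eq. (6)
    if np.linalg.norm(c_m) > 1e-12:
        return np.linalg.norm(c + R * c_m / np.linalg.norm(c_m))   # Eq. (5)
    return np.sqrt(c @ c + R ** 2)                                 # special case ||c_m|| = 0

w = np.array([0.0, 0.0, 1.0])
print(shifted_radius(w, b=-1.0, c=np.array([0.3, 0.1, 0.2]), R=1.0))
print(shifted_radius(w, b=0.0, c=np.array([0.0, 0.0, 0.5]), R=1.0))   # here c_m = 0
```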

7. Generalization Bounds for Shifted Hyperplanes

We can improve generalization bounds when
$$\frac{D^2 \left\| \vec{c} + R \, \vec{c}_m / \| \vec{c}_m \| \right\|^2}{\left( 1 + D \, \| \vec{c}_p \| \right)^2} \leq R^2 D^2, \quad (7)$$
equivalently,
$$\frac{\left\| \vec{c} + R \, \vec{c}_m / \| \vec{c}_m \| \right\|^2}{\left( 1 + D \, \| \vec{c}_p \| \right)^2} \leq R^2. \quad (8)$$
For a special case, when $\| \vec{c}_m \| = 0$, we get
$$\frac{\| \vec{c}_p \|^2}{\left( 1 + D \, \| \vec{c}_p \| \right)^2} \leq R^2. \quad (9)$$

8. Model with Shifted Hyperplanes

Proposition 2. When $\vec{c}_p$ is constant and $2 \| \vec{c}_m \| \leq R$, the solution maximizing the margin between the two $(n-2)$-dimensional hyperspheres is equivalent to the hyperplane that contains the $(n-2)$-dimensional hypersphere before duplicating and shifting.

9. Performance Measure

For OCSVM, the distance between a point $\vec{r}$ and the minimal hypersphere in a kernel-induced feature space can be computed as
$$R - \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j) - 2 \sum_{j=1}^{n} \alpha_j K(\vec{x}_j, \vec{r}) + K(\vec{r}, \vec{r}) \right]^{1/2}. \quad (10)\text{-}(11)$$
For kernels for which $K(\vec{x}, \vec{x})$ is constant, such as the radial basis function (RBF) kernel, the radius $R$ can be computed as
$$R = \sqrt{ K(\vec{x}, \vec{x}) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j) + 2 b^* }. \quad (12)$$
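
A sketch of the OCSVM performance measure in Eqs. (10)-(12), assuming an RBF kernel (so $K(\vec{x}, \vec{x}) = 1$); the coefficients alpha and the offset b_star are taken as given inputs, and the toy values below are for illustration only.

```python
import numpy as np

def rbf(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sphere_distance(X, alpha, r, R, sigma):
    # Eqs. (10)-(11): R minus the distance from phi(r) to the centre sum_i alpha_i phi(x_i)
    K = rbf(X, X, sigma)
    k_r = rbf(X, r[None, :], sigma).ravel()
    return R - np.sqrt(alpha @ K @ alpha - 2.0 * alpha @ k_r + 1.0)   # K(r, r) = 1 for RBF

def sphere_radius(X, alpha, b_star, sigma):
    # Eq. (12): valid for kernels with constant K(x, x); for the RBF kernel K(x, x) = 1
    K = rbf(X, X, sigma)
    return np.sqrt(1.0 + alpha @ K @ alpha + 2.0 * b_star)

a = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
X = np.column_stack([np.cos(a), np.sin(a)])
alpha = np.full(len(X), 1.0 / len(X))     # toy coefficients summing to one, for illustration only
print(sphere_distance(X, alpha, r=np.array([0.0, 0.0]), R=0.9, sigma=0.9))
```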

10. Performance Measure

For SVML, the distance between a point $\vec{r}$ and the hyperplane in a kernel-induced feature space can be computed as
$$\frac{| \vec{w}_c \cdot \vec{r} + b_c |}{\| \vec{w}_c \|_2} = \frac{\left| \sum_{i=1}^{n_c} y_i \alpha_i^* K(\vec{x}_i, \vec{r}) + b_c \right|}{\sqrt{ \sum_{i=1}^{n_c} \sum_{j=1}^{n_c} y_i y_j \alpha_i^* \alpha_j^* K(\vec{x}_i, \vec{x}_j) }}. \quad (13)\text{-}(14)$$
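
The same measure for SVML, Eqs. (13)-(14), sketched as a function of the dual coefficients; K_sv is the shifted-kernel Gram matrix over the $n_c$ support vectors, k_r holds the cross-kernel values $K(\vec{x}_i, \vec{r})$, and the argument names are mine.

```python
import numpy as np

def hyperplane_distance(K_sv, k_r, alpha, y, b_c):
    """Eqs. (13)-(14): distance from a point r to the SVML hyperplane
    in the kernel-induced feature space.

    K_sv  : (n_c, n_c) Gram matrix K(x_i, x_j) over the support vectors
    k_r   : (n_c,) cross-kernel values K(x_i, r), Eq. (3)
    alpha : (n_c,) dual coefficients alpha*_i
    y     : (n_c,) labels of the duplicated points (+1 up, -1 down)
    b_c   : free term of the decision function
    """
    v = y * alpha
    numerator = abs(v @ k_r + b_c)        # |w_c . r + b_c| expressed through the kernel
    w_norm = np.sqrt(v @ K_sv @ v)        # ||w_c||_2 expressed through the kernel
    return numerator / w_norm
```

With the precomputed-kernel sketch after slide 4, these inputs could for instance be read off a fitted scikit-learn SVC: alpha = np.abs(clf.dual_coef_.ravel()), y and K_sv restricted to clf.support_, and b_c = clf.intercept_[0].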

11. Comparison of Clustering Methods

First, we map any two points to the same cluster if there do not exist two points between them with different signs of the functional margin. Second, we map the remaining unassigned points to the clusters of their nearest neighbors among the assigned points (a sketch of this two-step assignment is given below, after Fig. 2).

Fig. 2: Clustering by manifold learning; three panels (a), (b), (c) in the (x, y) plane; points are examples, filled points are support vectors. (a) Solid line, solution of support vector clustering (SVCL) for C = 10000.0, σ = 0.35. (b) Solid line, solution of support vector manifold learning clustering (SVMLC) for C = 100000.0, σ = 1.1, t = 0.01. (c) Solid line, solution of KPCA.
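
A sketch of the two-step cluster assignment described above. The decision function is assumed to return the functional margin of the trained model; how many points to sample along a segment, and which points count as initially assignable, are my own choices, since the slide does not spell them out.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def same_cluster(decision_function, xi, xj, n_samples=10):
    # step 1 test: no two points between xi and xj may have different margin signs
    ts = np.linspace(0.0, 1.0, n_samples)
    segment = (1.0 - ts)[:, None] * xi[None, :] + ts[:, None] * xj[None, :]
    signs = np.sign(decision_function(segment))
    return np.all(signs >= 0) or np.all(signs <= 0)

def assign_clusters(X, decision_function, assignable):
    # step 1: connected components of the "same cluster" relation on assignable points
    idx = np.where(assignable)[0]
    adj = np.zeros((len(idx), len(idx)), dtype=int)
    for a in range(len(idx)):
        for b in range(len(idx)):
            adj[a, b] = same_cluster(decision_function, X[idx[a]], X[idx[b]])
    _, comp = connected_components(adj, directed=False)
    labels = np.full(len(X), -1)
    labels[idx] = comp
    # step 2: remaining points take the cluster of the nearest assigned point
    for a in np.where(~assignable)[0]:
        nearest = idx[np.argmin(np.linalg.norm(X[idx] - X[a], axis=1))]
        labels[a] = labels[nearest]
    return labels
```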

12. Results

For the manifold learning experiment, we check the average distance between points and a solution in a kernel-induced feature space. We validate clustering on classification data sets, assuming that data samples belonging to the same cluster have the same class in the classification problem.

Table 1: Performance of SVMLC, SVCL, KPCA, SVML, OCSVM for real-world data, part 2. The numbers in the column descriptions denote the methods: 1 = SVMLC, 2 = SVCL, 3 = KPCA for the first row; 1 = SVML, 2 = OCSVM, 3 = KPCA for the second row. The test with id = 0 covers all tests of the clustering experiment; the test with id = 1 covers all tests of the manifold learning experiment. Column descriptions: rs is the average rank of the method for the mean error; tsf is the Friedman statistic for the average ranks for the mean error; tsn is the Nemenyi statistic for the average ranks for the mean error, reported when the Friedman statistic is significant; sv is the average rank for the number of nonzero coefficients (support vectors for support vector machine (SVM) methods), where the smallest value is best. Values marked with * were set in bold in the original (best rank or significant statistic).

id  rs1    rs2   rs3   tsf     tsn12   tsn13  tsn23  sv1    sv2   sv3
0   1.71*  1.93  2.36  4.5     -       -      -      2.83   1.67  1.5*
1   1.49*  2.98  1.53  33.09*  -4.82*  0.3    5.13*  1.51*  2.38  2.11
