Clustering by Support Vector Manifold Learning

Marcin Orchel
AGH University of Science and Technology, Poland
Problem and My Contributions

Problem
Characterizations of clusters include the boundary, the center (prototype), the cluster core, and the characteristic manifold of a cluster. The multiple manifold learning problem is to fit multiple manifolds (hypersurfaces) to data points and to generalize to unseen data.

Approach
Support vector manifold learning (SVML) transforms a feature space into a kernel-induced feature space and then fits the data using a hypothesis space containing only hyperplanes, which generalizes well. For fitting the data with SVML, we need a regression method that works entirely in a kernel-induced feature space. SVML duplicates the points in the kernel-induced feature space, shifts the copies in the direction of any training vector, and solves a classification problem (see the sketch below).
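A minimal sketch of the duplicate-and-shift construction, restricted to a linear kernel so that the kernel-induced feature space coincides with the input space; the shift direction c, the parameter values, and the use of scikit-learn's SVC are illustrative assumptions rather than the exact procedure of the paper.

```python
import numpy as np
from sklearn.svm import SVC

def svml_linear_sketch(X, c, t=0.01, C=100.0):
    """Duplicate each point, shift the copies by +/- t*c, and solve a binary
    classification problem; the separating hyperplane then fits the data.
    Linear-kernel illustration of the SVML construction (assumption)."""
    X_up = X + t * c                      # points shifted "up", labeled +1
    X_down = X - t * c                    # points shifted "down", labeled -1
    X_shifted = np.vstack([X_up, X_down])
    y = np.hstack([np.ones(len(X)), -np.ones(len(X))])
    clf = SVC(kernel="linear", C=C).fit(X_shifted, y)
    return clf                            # clf.decision_function(x) ~ 0 on the fitted manifold

# usage: points sampled near the line y = x in 2-D
rng = np.random.default_rng(0)
X = np.column_stack([np.linspace(0, 1, 50)] * 2) + 0.01 * rng.normal(size=(50, 2))
c = X[0]                                  # shift in the direction of a training vector
model = svml_linear_sketch(X, c)
```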
Comparison of Manifold Learning Methods

Fig. 1: Manifold learning. Points—examples. (a) For points generated from a circle. Solid line—solution of one-class support vector machines (OCSVM) for C = 1.0, σ = 0.9; dashed line—solution of SVML for C = 100.0, σ = 0.9, t = 0.01; thin dotted line—solution of kernel principal component analysis (KPCA) for σ = 0.9. (b) For points generated from a Lissajous curve. Solid line—solution of OCSVM for C = 1000.0, σ = 0.5; dashed line—solution of SVML for C = 100000.0, σ = 0.8, t = 0.01; thin dotted line—solution of KPCA for σ = 0.5. (c) Solid line—solution of SVML for $\vec{c} = \vec{0}$, C = 100.0, σ = 0.9, t = 0.01; dashed line—solution of SVML for random values of $\vec{c}$, C = 100.0, σ = 0.9, t = 0.01.
Support Vector Manifold Learning (SVML)

The kernel function for two data points $\vec{x}_i$ and $\vec{x}_j$, for $i, j = 1, \ldots, n$, is

$$K(\vec{x}_i, \vec{x}_j) = K_o(\vec{x}_i, \vec{x}_j) + y_j t K_o(\vec{x}_i, \vec{c}) + y_i t K_o(\vec{c}, \vec{x}_j) + y_i y_j t^2 K_o(\vec{c}, \vec{c}),$$ (1)–(2)

where $\vec{c}$ is the shifting direction defined in the original feature space, $t$ is the translation parameter, $y_i = 1$ for a point shifted up, and $y_i = -1$ for a point shifted down. The cross kernel is

$$K(\vec{x}_i, \vec{x}) = K_o(\vec{x}_i, \vec{x}) + y_i t K_o(\vec{c}, \vec{x}).$$ (3)

The number of support vectors is at most $n + 1$. The solution is

$$\sum_{i=1}^{n} (\alpha_i - \alpha_{i+n}) K(\vec{x}_i, \vec{x}) + \sum_{i=1}^{n} (\alpha_i + \alpha_{i+n}) t K(\vec{c}, \vec{x}) + b = 0.$$ (4)
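A minimal sketch of the shifted kernel of Eqs. (1)–(2), assuming an RBF base kernel $K_o$ and a precomputed-kernel SVC for the classification step; function and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def rbf(A, B, sigma=0.9):
    """Base kernel K_o: RBF between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def shifted_kernel(X, y, c, t=0.01, sigma=0.9):
    """K(x_i, x_j) = K_o(x_i, x_j) + y_j t K_o(x_i, c)
                     + y_i t K_o(c, x_j) + y_i y_j t^2 K_o(c, c)   (Eqs. 1-2)."""
    c = c.reshape(1, -1)
    Ko = rbf(X, X, sigma)
    Kxc = rbf(X, c, sigma)[:, 0]          # K_o(x_i, c)
    Kcc = rbf(c, c, sigma)[0, 0]          # K_o(c, c)
    return (Ko
            + t * (y[None, :] * Kxc[:, None])
            + t * (y[:, None] * Kxc[None, :])
            + (t ** 2) * np.outer(y, y) * Kcc)

# duplicated points: the first n copies labeled +1, the next n labeled -1
X = np.random.default_rng(0).normal(size=(30, 2))
Xd = np.vstack([X, X])
y = np.hstack([np.ones(30), -np.ones(30)])
c = X[0]                                  # shift in the direction of a training vector
K = shifted_kernel(Xd, y, c)
clf = SVC(kernel="precomputed", C=100.0).fit(K, y)
```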
Model with Shifted Hyperplanes

Proposition 1
Shifting a hyperplane with any value of $\vec{c}$ gives a new hyperplane that differs from the original one only in the free term $b$.

Lemma 1
After duplicating and shifting an $(n-1)$-dimensional hyperplane constrained by an $(n-1)$-dimensional hypersphere, the maximal distance from the original center of the hypersphere to any point of a shifted $(n-2)$-dimensional hypersphere is attained by a point such that, after projecting this point onto the $(n-1)$-dimensional hyperplane (before the shift), the vector from $\vec{0}$ to this point is parallel to the vector from $\vec{0}$ to the projected center of one of the shifted $(n-2)$-dimensional hyperspheres.
Model with Shifted Hyperplanes

Lemma 2
The radius $R_n$ of a minimal hypersphere containing both hyperplanes constrained by an $(n-1)$-dimensional hypersphere after shifting is equal to

$$R_n = \left\| \vec{c} + R \, \vec{c}_m / \|\vec{c}_m\| \right\|,$$ (5)

where $\vec{c}_m$ is defined as

$$\vec{c}_m = \vec{c} - \frac{b + \vec{w} \cdot \vec{c}}{\|\vec{w}\|^2} \, \vec{w}$$ (6)

and $\|\vec{c}_m\| \neq 0$. For $\|\vec{c}_m\| = 0$, we get $R_n = \sqrt{\|\vec{c}\|^2 + R^2}$.
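A small numeric sketch of Eqs. (5)–(6), assuming the hyperplane is given by $\vec{w} \cdot \vec{x} + b = 0$ in the original feature space; the example values are arbitrary.

```python
import numpy as np

def shifted_radius(c, w, b, R):
    """R_n from Lemma 2: project c onto the hyperplane w.x + b = 0 (Eq. 6),
    then take the norm of c + R * c_m / ||c_m||  (Eq. 5)."""
    c_m = c - (b + w @ c) / (w @ w) * w
    norm_cm = np.linalg.norm(c_m)
    if norm_cm == 0.0:                    # special case of Lemma 2
        return np.sqrt(c @ c + R ** 2)
    return np.linalg.norm(c + R * c_m / norm_cm)

# arbitrary example values
print(shifted_radius(np.array([1.0, 0.5]), np.array([0.0, 1.0]), -0.5, 1.0))
```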
Generalization Bounds for Shifted Hyperplanes

We can improve the generalization bounds when

$$\frac{D^2 \left\| \vec{c} + R \, \vec{c}_m / \|\vec{c}_m\| \right\|^2}{\left( 1 + D \|\vec{c}_p\| \right)^2} \leq R^2 D^2,$$ (7)

$$\frac{\left\| \vec{c} + R \, \vec{c}_m / \|\vec{c}_m\| \right\|^2}{\left( 1 + D \|\vec{c}_p\| \right)^2} \leq R^2.$$ (8)

For the special case when $\|\vec{c}_m\| = 0$, we get

$$\frac{\|\vec{c}_p\|^2}{\left( 1 + D \|\vec{c}_p\| \right)^2} \leq R^2.$$ (9)
Model with Shifted Hyperplanes

Proposition 2
When $\vec{c}_p$ is constant and $2\|\vec{c}_m\| \leq R$, then the solution of maximizing a margin between the two $(n-2)$-dimensional hyperspheres is equivalent to the hyperplane that contains the $(n-2)$-dimensional hypersphere before duplicating and shifting.
Performance Measure

For OCSVM, the distance between a point $\vec{r}$ and the minimal hypersphere in a kernel-induced feature space can be computed as

$$R - \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j) - 2 \sum_{j=1}^{n} \alpha_j K(\vec{x}_j, \vec{r}) + K(\vec{r}, \vec{r}) \right)^{1/2}.$$ (10)–(11)

For kernels for which $K(\vec{x}, \vec{x})$ is constant, such as the radial basis function (RBF) kernel, the radius $R$ can be computed as

$$R = \sqrt{ K(\vec{x}, \vec{x}) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j) + 2 b^* }.$$ (12)
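A minimal sketch of Eqs. (10)–(12) for an RBF kernel, assuming the dual coefficients alpha and the offset b* are taken from an already trained OCSVM; the names and signatures here are illustrative, not a library API.

```python
import numpy as np

def rbf(A, B, sigma=0.9):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ocsvm_radius(X, alpha, b_star, sigma=0.9):
    """Eq. (12): R for kernels with constant K(x, x), such as the RBF kernel."""
    quad = alpha @ rbf(X, X, sigma) @ alpha
    return np.sqrt(1.0 + quad + 2.0 * b_star)         # K(x, x) = 1 for the RBF kernel

def ocsvm_distance(r, X, alpha, b_star, sigma=0.9):
    """Eqs. (10)-(11): signed distance of r to the minimal hypersphere."""
    r = r.reshape(1, -1)
    quad = alpha @ rbf(X, X, sigma) @ alpha
    cross = alpha @ rbf(X, r, sigma)[:, 0]
    dist_to_center = np.sqrt(quad - 2.0 * cross + 1.0)  # K(r, r) = 1
    return ocsvm_radius(X, alpha, b_star, sigma) - dist_to_center
```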
Performance Measure

For SVML, the distance between a point $\vec{r}$ and the hyperplane in a kernel-induced feature space can be computed as

$$\frac{|\vec{w}_c \cdot \vec{r} + b_c|}{\|\vec{w}_c\|_2} = \frac{\left| \sum_{i=1}^{n_c} y_i \alpha_i^* K(\vec{x}_i, \vec{r}) + b_c \right|}{\sqrt{\sum_{i=1}^{n_c} \sum_{j=1}^{n_c} y_i y_j \alpha_i^* \alpha_j^* K(\vec{x}_i, \vec{x}_j)}}.$$ (13)–(14)
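A minimal sketch of Eqs. (13)–(14), assuming the coefficients $y_i \alpha_i^*$ and the free term $b_c$ come from the classifier trained on the shifted kernel (in scikit-learn these correspond to dual_coef_ and intercept_), and that the cross kernel of Eq. (3) is built from an RBF base kernel; names are illustrative.

```python
import numpy as np

def rbf(A, B, sigma=0.9):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cross_kernel(Xd, y, c, r, t=0.01, sigma=0.9):
    """Eq. (3): K(x_i, r) = K_o(x_i, r) + y_i t K_o(c, r)."""
    r = r.reshape(1, -1)
    c = c.reshape(1, -1)
    return rbf(Xd, r, sigma)[:, 0] + y * t * rbf(c, r, sigma)[0, 0]

def svml_distance(r, Xd, y, alpha_star, b_c, K_shifted, c, t=0.01, sigma=0.9):
    """Eqs. (13)-(14): distance of r to the SVML hyperplane in the
    kernel-induced feature space. K_shifted is the training kernel matrix of
    Eqs. (1)-(2); alpha_star holds the dual coefficients of the training points."""
    numer = abs(np.sum(y * alpha_star * cross_kernel(Xd, y, c, r, t, sigma)) + b_c)
    denom = np.sqrt((y * alpha_star) @ K_shifted @ (y * alpha_star))
    return numer / denom
```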
Comparison of Clustering Methods

First, we assign any two points to the same cluster if there do not exist two points between them with opposite signs of the functional margin. Second, we assign the remaining unassigned points to the clusters of their nearest neighbors among the already assigned points (see the sketch below).

Fig. 2: Clustering by manifold learning. Points—examples; filled points—support vectors. (a) Solid line—solution of support vector clustering (SVCL) for C = 10000.0, σ = 0.35. (b) Solid line—solution of support vector manifold learning clustering (SVMLC) for C = 100000.0, σ = 1.1, t = 0.01. (c) Solid line—solution of KPCA.
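A minimal sketch of the first assignment step, assuming a vectorized decision function f whose sign plays the role of the functional margin and a fixed number of sample points on each segment; the second, nearest-neighbor step is omitted, and all parameters and names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def cluster_assignment(X, f, n_samples=10):
    """Connect two points when no two sampled points between them have
    functional margins of opposite sign, then take connected components."""
    n = len(X)
    adj = np.zeros((n, n), dtype=int)
    lam = np.linspace(0.0, 1.0, n_samples)[:, None]
    for i in range(n):
        for j in range(i + 1, n):
            seg = (1 - lam) * X[i] + lam * X[j]     # points between x_i and x_j
            signs = np.sign(f(seg))
            if not (np.any(signs > 0) and np.any(signs < 0)):
                adj[i, j] = adj[j, i] = 1
    _, labels = connected_components(csr_matrix(adj), directed=False)
    return labels
```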
Results

For the manifold learning experiment, we report the average distance between the points and the solution in a kernel-induced feature space. We validate clustering on classification data sets, assuming that data samples belonging to the same cluster share the same class in the classification problem.

Table 1: Performance of SVMLC, SVCL, KPCA, SVML, OCSVM for real-world data, part 2. The numbers in the column descriptions denote the methods: 1 - SVMLC, 2 - SVCL, 3 - KPCA for the first row; 1 - SVML, 2 - OCSVM, 3 - KPCA for the second row. The test with id=0 aggregates all tests of the clustering experiment; the test with id=1 aggregates all tests of the manifold learning experiment. Column descriptions: rs – the average rank of the method for the mean error (the best method is marked with *); tsf – the Friedman statistic for the average ranks for the mean error (a significant value is marked with *); tsn – the Nemenyi statistic for the average ranks for the mean error, reported only when the Friedman statistic is significant (significant values are marked with *); sv – the average rank for the number of nonzero coefficients (support vectors for support vector machine (SVM) methods); the smallest value is marked with *.

id  rs1    rs2   rs3   tsf     tsn12   tsn13  tsn23  sv1    sv2   sv3
0   1.71*  1.93  2.36  4.5     –       –      –      2.83   1.67  1.5*
1   1.49*  2.98  1.53  33.09*  -4.82*  0.3    5.13*  1.51*  2.38  2.11