Support Vector Manifold Learning for Solving Regression Problems via Clustering

Marcin Orchel
AGH University of Science and Technology, Poland
New kernel:
\[
(\phi(\vec{x}_i) + y_i t\,\phi(\vec{c}))^T(\phi(\vec{x}_j) + y_j t\,\phi(\vec{c}))
= \phi(\vec{x}_i)^T\phi(\vec{x}_j) + y_j t\,\phi(\vec{x}_i)^T\phi(\vec{c})
+ y_i t\,\phi(\vec{c})^T\phi(\vec{x}_j) + y_i y_j t^2\,\phi(\vec{c})^T\phi(\vec{c})
\]
\[
= K(\vec{x}_i, \vec{x}_j) + y_j t\,K(\vec{x}_i, \vec{c}) + y_i t\,K(\vec{c}, \vec{x}_j) + y_i y_j t^2\,K(\vec{c}, \vec{c}).
\]
Cross kernel:
\[
(\phi(\vec{x}_i) + y_i t\,\phi(\vec{c}))^T\phi(\vec{x})
= \phi(\vec{x}_i)^T\phi(\vec{x}) + y_i t\,\phi(\vec{c})^T\phi(\vec{x})
= K(\vec{x}_i, \vec{x}) + y_i t\,K(\vec{c}, \vec{x}).
\]
The number of support vectors is at most n + 1.
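As a concrete illustration, here is a minimal numpy sketch of these two kernel constructions, assuming an RBF base kernel; the helper names (rbf, shifted_gram, cross_kernel) are mine, not from the slides.

```python
# A minimal numpy sketch of the shifted Gram matrix and cross kernel above.
# Assumptions (not from the slides): an RBF base kernel and my own helper names.
import numpy as np

def rbf(A, B, sigma=1.5):
    """Base kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def shifted_gram(X, y, c, t, sigma=1.5):
    """K'[i,j] = K(x_i,x_j) + y_j t K(x_i,c) + y_i t K(c,x_j) + y_i y_j t^2 K(c,c)."""
    K = rbf(X, X, sigma)
    kc = rbf(X, c[None, :], sigma)[:, 0]            # K(x_i, c)
    kcc = rbf(c[None, :], c[None, :], sigma)[0, 0]  # K(c, c)
    return (K
            + t * (y[None, :] * kc[:, None] + y[:, None] * kc[None, :])
            + (t ** 2) * kcc * np.outer(y, y))

def cross_kernel(X, y, c, t, X_new, sigma=1.5):
    """K'[i,m] = K(x_i, x_m) + y_i t K(c, x_m) for unshifted test points x_m."""
    return rbf(X, X_new, sigma) + t * y[:, None] * rbf(c[None, :], X_new, sigma)
```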
Proposition 1
Shifting a hyperplane by any vector \vec{c} gives a new hyperplane that differs from the original only in the free term b.

Lemma 1
After duplicating and shifting an (n-1)-dimensional hyperplane constrained by an n-dimensional hypersphere, the maximal distance from the original center of the hypersphere to any point of the shifted hyperplane is attained at a point with the following property: after projecting this point onto the (n-1)-dimensional hyperplane (before the shift), the vector from \vec{0} to the projected point is parallel to the vector from \vec{0} to the projected center of one of the new hyperspheres (the shifted hyperplane).
Proposition 2
The radius R_n of the minimal hypersphere containing all points after shifting is equal to
\[
R_n = \left\| \vec{c} + R\,\frac{\vec{c}_m}{\|\vec{c}_m\|} \right\|,
\]
where \vec{c}_m is defined as
\[
\vec{c}_m = \vec{c} - \frac{b + \vec{w}\cdot\vec{c}}{\|\vec{w}\|^2}\,\vec{w}
\]
and \|\vec{c}_m\| \neq 0. For \|\vec{c}_m\| = 0, we get R_n = \|\vec{c}\| = \|\vec{c}_p\|.
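For intuition, a small worked example with values chosen purely for illustration: in the plane, take \vec{w} = (0, 1), b = 0, and \vec{c} = (1, 2); then
\[
\vec{c}_m = \vec{c} - \frac{b + \vec{w}\cdot\vec{c}}{\|\vec{w}\|^2}\,\vec{w}
          = (1, 2) - \frac{0 + 2}{1}\,(0, 1) = (1, 0),
\]
so \vec{c}_m is simply the projection of \vec{c} onto the hyperplane \vec{w}\cdot\vec{x} + b = 0.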
Proposition 3
Consider hyperplanes \vec{w}_c \cdot \vec{x} = 0, where \vec{w}_c is normalized so that the hyperplanes are in canonical form, that is, for a set of points A = \{\vec{x}_1, \ldots, \vec{x}_n\}
\[
\min_i |\vec{w}_c \cdot \vec{x}_i| = 1.
\]
The set of decision functions f_w(\vec{x}) = \mathrm{sgn}(\vec{x} \cdot \vec{w}_c) defined on A and satisfying the constraint \|\vec{w}_c\| \le D has a Vapnik-Chervonenkis (VC) dimension satisfying
\[
h \le \min\left(R^2 D^2, m\right) + 1,
\]
where R is the radius of the smallest sphere centered at the origin and containing A.
We can improve the generalization bound when
\[
\frac{D^2}{(1 + D\|\vec{c}_p\|)^2}\,\left\| \vec{c} + R\,\frac{\vec{c}_m}{\|\vec{c}_m\|} \right\|^2 \le R^2 D^2,
\]
that is, when
\[
\frac{\left\| \vec{c} + R\,\vec{c}_m/\|\vec{c}_m\| \right\|^2}{(1 + D\|\vec{c}_p\|)^2} \le R^2.
\]
For the special case \|\vec{c}_m\| = 0, we get
\[
\frac{\|\vec{c}_p\|^2}{(1 + D\|\vec{c}_p\|)^2} \le R^2.
\]
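A quick numeric check with illustrative values: in the special case, taking \|\vec{c}_p\| = 1, D = 2, and R = 1 gives
\[
\frac{\|\vec{c}_p\|^2}{(1 + D\|\vec{c}_p\|)^2} = \frac{1}{(1 + 2)^2} = \frac{1}{9} \le 1 = R^2,
\]
so the condition holds and the bound improves.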
Performance measure
For OCSVM the distance between a point \vec{r} and the minimal hypersphere in a kernel-induced feature space can be computed as
\[
R - \left( \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j)
  - 2\sum_{j=1}^{n} \alpha_j K(\vec{x}_j, \vec{r}) + K(\vec{r}, \vec{r}) \right)^{1/2}.
\]
For kernels for which K(\vec{x}, \vec{x}) is constant, such as the radial basis function (RBF) kernel, the radius R can be computed as follows:
\[
R = \sqrt{ K(\vec{x}, \vec{x}) + \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j) + 2 b^* }.
\]
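For reference, a direct numpy translation of the two formulas above (a sketch only; alpha, K, kr, krr, kxx and b_star are my shorthand for the quantities in the equations and are assumed to be given).

```python
# Sketch: OCSVM distance-to-sphere and radius, translated from the formulas above.
import numpy as np

def ocsvm_distance(R, alpha, K, kr, krr):
    """R - sqrt(sum_ij a_i a_j K(x_i,x_j) - 2 sum_j a_j K(x_j,r) + K(r,r))."""
    return R - np.sqrt(alpha @ K @ alpha - 2.0 * alpha @ kr + krr)

def ocsvm_radius(alpha, K, kxx, b_star):
    """Radius for kernels with constant K(x,x), e.g. the RBF kernel."""
    return np.sqrt(kxx + alpha @ K @ alpha + 2.0 * b_star)
```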
Performance measure
For SVML, the distance between a point \vec{r} and the hyperplane in a kernel-induced feature space can be computed as
\[
\frac{|\vec{w}_c \cdot \vec{r} + b_c|}{\|\vec{w}_c\|_2}
= \frac{\left| \sum_{i=1}^{n_c} y_i \alpha_i^* K(\vec{x}_i, \vec{r}) + b_c \right|}
       {\sqrt{ \sum_{i=1}^{n_c}\sum_{j=1}^{n_c} y_i y_j \alpha_i^* \alpha_j^* K(\vec{x}_i, \vec{x}_j) }}.
\]
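The same kind of translation for the SVML distance, again a sketch with my own variable names (ya stands for the products y_i alpha_i^*, K is the Gram matrix of the n_c points, kr the vector K(x_i, r), all assumed given).

```python
# Sketch: SVML point-to-hyperplane distance in the kernel-induced feature space.
import numpy as np

def svml_distance(ya, K, kr, b_c):
    """|sum_i y_i a_i^* K(x_i,r) + b_c| / sqrt(sum_ij y_i y_j a_i^* a_j^* K(x_i,x_j))."""
    return abs(ya @ kr + b_c) / np.sqrt(ya @ K @ ya)
```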
Fig. 3: Clustering based on curve learning. Points: examples. (a) Solid line: solution of SVCL. (b) Solid line: solution of SVMLC. (c) Solid line: solution of KPCA.
Fig. 4: Regression via clustering. (a) Clustering with SVCL. (b) Clustering with SVMLC. (c) Corresponding two regression functions for (a). (d) Corresponding two regression functions for (b).
Fig. 5: Regression via clustering. (a) Clustering with SVCL. (b) Clustering with SVMLC. (c) Corresponding two regression functions for (a). (d) Corresponding two regression functions for (b).
Goal
- develop a method for dimensionality reduction based on support vector machines (SVM)
- reduce dimensionality by fitting a curve to data given as plain vectors (neither classification nor regression data)
- it can be seen as a generalization of regression: regression fits a function to data, curve fitting fits a curve to data
- idea: duplicate the points, shift them in a kernel space, and use support vector classification (SVC); see the sketch below
- use recursive dimensionality reduction for a linear decision boundary in kernel space: project the points onto the solution curve and repeat all steps
- it could also be used for clustering, similarly to self-organizing maps
- it could be used for visualization
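Below is an end-to-end sketch of this duplicate-shift-classify pipeline. It is only an illustration under my own assumptions (an RBF base kernel, scikit-learn's SVC with a precomputed Gram matrix, toy circle data); it is not the paper's reference implementation.

```python
# Sketch of the pipeline: duplicate points, shift them in kernel space via the
# modified Gram matrix, train SVC, and read the curve off the zero level set.
import numpy as np
from sklearn.svm import SVC

def rbf(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_curve(X, c, t=0.005, C=100.0, sigma=1.5):
    n = len(X)
    # duplicate the points: originals get label +1, copies get label -1
    y = np.concatenate([np.ones(n), -np.ones(n)])
    Xd = np.vstack([X, X])
    # shifted Gram: K(x_i,x_j) + y_j t K(x_i,c) + y_i t K(c,x_j) + y_i y_j t^2 K(c,c)
    K = rbf(Xd, Xd, sigma)
    kc = rbf(Xd, c[None, :], sigma)[:, 0]
    kcc = rbf(c[None, :], c[None, :], sigma)[0, 0]
    G = (K + t * (y[None, :] * kc[:, None] + y[:, None] * kc[None, :])
           + (t ** 2) * kcc * np.outer(y, y))
    clf = SVC(C=C, kernel="precomputed").fit(G, y)
    return clf, Xd, y

def decision_values(clf, Xd, y, c, t, sigma, X_new):
    # cross kernel K(x_i, x) + y_i t K(c, x); the fitted curve is the zero level set
    Kx = rbf(X_new, Xd, sigma) + t * y[None, :] * rbf(X_new, c[None, :], sigma)
    return clf.decision_function(Kx)

# usage: noisy samples of a circle; the curve is where the decision value is zero
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 60)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((60, 2))
c = np.array([0.1, 0.0])
clf, Xd, y = fit_curve(X, c)
vals = decision_values(clf, Xd, y, c, 0.005, 1.5, X)
```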
Shifting in kernel space
Shifting points in the kernel space:
\[
(\phi(\vec{x}_i) + y_i t\,\phi(\vec{c}))^T(\phi(\vec{x}_j) + y_j t\,\phi(\vec{c}))
= \phi(\vec{x}_i)^T\phi(\vec{x}_j) + y_j t\,\phi(\vec{x}_i)^T\phi(\vec{c})
+ y_i t\,\phi(\vec{c})^T\phi(\vec{x}_j) + y_i y_j t^2\,\phi(\vec{c})^T\phi(\vec{c}),
\]
where t is a translation parameter, \vec{c} is a shifting point, and \phi(\cdot) is the feature map of some symmetric kernel.
Cross kernel:
\[
(\phi(\vec{x}_i) + y_i t\,\phi(\vec{c}))^T\phi(\vec{x}) = \phi(\vec{x}_i)^T\phi(\vec{x}) + y_i t\,\phi(\vec{c})^T\phi(\vec{x}).
\]
We preserve sparsity: for two duplicated points, where y_i = 1 and y_{i+size} = -1,
\[
y_i \alpha_i \left( \phi(\vec{x}_i)^T\phi(\vec{x}) + t\,\phi(\vec{c})^T\phi(\vec{x}) \right)
+ y_{i+size} \alpha_{i+size} \left( \phi(\vec{x}_i)^T\phi(\vec{x}) + y_{i+size} t\,\phi(\vec{c})^T\phi(\vec{x}) \right)
\]
\[
= (y_i \alpha_i + y_{i+size} \alpha_{i+size})\,\phi(\vec{x}_i)^T\phi(\vec{x})
+ (y_i \alpha_i + \alpha_{i+size})\,t\,\phi(\vec{c})^T\phi(\vec{x}).
\]
Shifting in a kernel space, cont.
The second term can be summed over all i:
\[
\sum_i (y_i \alpha_i + \alpha_{i+size})\,t\,\phi(\vec{c})^T\phi(\vec{x}).
\]
When \alpha_i = \alpha_{i+size} = C, each pair contributes
\[
2Ct\,\phi(\vec{c})^T\phi(\vec{x}),
\]
so it is like adding an artificial point \vec{c} to the solution curve with the parameter 2Ct; these contributions can be summed over multiple points.
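As an illustration of this collapsing step, a short sketch assuming a scikit-learn SVC clf fitted on the shifted, precomputed Gram of the 2n duplicated points (originals first, copies second), as in the earlier pipeline sketch; the function name is mine.

```python
# Collapse the dual coefficients of the 2n duplicated points into n coefficients
# on the original points plus one coefficient on the artificial point c.
import numpy as np

def collapse_coefficients(clf, n, t):
    ya = np.zeros(2 * n)                        # y_k * alpha_k for all 2n points
    ya[clf.support_] = clf.dual_coef_[0]
    alpha = np.abs(ya)                          # alpha_k >= 0 since y_k = +/- 1
    beta = ya[:n] + ya[n:]                      # coefficient of K(x_i, .) per original point
    gamma = t * (alpha[:n] + alpha[n:]).sum()   # single coefficient of K(c, .)
    return beta, gamma, clf.intercept_[0]

# decision value at x: sum_i beta_i K(x_i, x) + gamma * K(c, x) + b
```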
Shifting in kernel space
- when the kernel is linear, we recover δ support vector regression (δ-SVR)
- hypothesis: because the decision boundary is linear in the kernel space, it does not matter how we choose the shifting point; for example, we can shift in only one direction, e.g. for three dimensions \vec{c} = (0.0, 0.0, 1.0)
- this shifting strategy has already been tested in the input space for regression in δ-SVR and works well
Dimensionality reduction
The parametric form of the straight line through the point \phi(\vec{x}_1) in the direction of \vec{w} is \vec{l} = \phi(\vec{x}_1) + t\vec{w}. The point \vec{l} must belong to the hyperplane, so after substituting
\[
\vec{w}^T\vec{l} + b = 0,
\]
\[
\vec{w}^T(\phi(\vec{x}_1) + t\vec{w}) + b = 0,
\]
we can compute t:
\[
t = \frac{-b - \vec{w}^T\phi(\vec{x}_1)}{\|\vec{w}\|^2}.
\]
After substituting t we get the projected point
\[
\vec{z} = \phi(\vec{x}_1) - \frac{b + \vec{w}^T\phi(\vec{x}_1)}{\|\vec{w}\|^2}\,\vec{w}.
\]
Dimensionality reduction
\vec{z}_1 and \vec{z}_2 are new points in the kernel space, so to compute their kernel we just compute a dot product:
\[
\vec{z}_1^T\vec{z}_2 =
\left( \phi(\vec{x}_1) - \frac{b + \vec{w}^T\phi(\vec{x}_1)}{\|\vec{w}\|^2}\,\vec{w} \right)^T
\left( \phi(\vec{x}_2) - \frac{b + \vec{w}^T\phi(\vec{x}_2)}{\|\vec{w}\|^2}\,\vec{w} \right)
\]
\[
= \phi(\vec{x}_1)^T\phi(\vec{x}_2)
- \frac{b + \vec{w}^T\phi(\vec{x}_2)}{\|\vec{w}\|^2}\,\vec{w}^T\phi(\vec{x}_1)
- \frac{b + \vec{w}^T\phi(\vec{x}_1)}{\|\vec{w}\|^2}\,\vec{w}^T\phi(\vec{x}_2)
+ \frac{(b + \vec{w}^T\phi(\vec{x}_1))(b + \vec{w}^T\phi(\vec{x}_2))}{\|\vec{w}\|^2}.
\]
Dimensionality reduction
Expanding and collecting terms,
\[
\vec{z}_1^T\vec{z}_2 = \phi(\vec{x}_1)^T\phi(\vec{x}_2)
- \frac{b\,\vec{w}^T\phi(\vec{x}_1) + b\,\vec{w}^T\phi(\vec{x}_2) + 2\,\vec{w}^T\phi(\vec{x}_1)\,\vec{w}^T\phi(\vec{x}_2)}{\|\vec{w}\|^2}
+ \frac{b^2 + b\,\vec{w}^T\phi(\vec{x}_1) + b\,\vec{w}^T\phi(\vec{x}_2) + \vec{w}^T\phi(\vec{x}_1)\,\vec{w}^T\phi(\vec{x}_2)}{\|\vec{w}\|^2}
\]
\[
= K(\vec{x}_1, \vec{x}_2) + \frac{b^2 - \vec{w}^T\phi(\vec{x}_1)\,\vec{w}^T\phi(\vec{x}_2)}{\|\vec{w}\|^2}.
\]
We use this iteratively: in the next reduction we use the kernel values from the previous reduction, in the first iteration we use the shifted kernel, and each new \vec{w} is perpendicular to the previously computed \vec{w}.
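In code, the recursion only needs Gram matrices. A sketch under my own naming, given the current Gram matrix K, the dual coefficients beta of w = sum_k beta_k phi(x_k), and the bias b:

```python
# Gram matrix of the projected points z_i = phi(x_i) - f_i * w, computed purely
# from kernel values (a sketch, not the paper's code).
import numpy as np

def projected_gram(K, beta, b):
    s = K @ beta                  # s_i = w^T phi(x_i)
    w2 = beta @ K @ beta          # ||w||^2
    f = (b + s) / w2              # f_i = (b + w^T phi(x_i)) / ||w||^2
    # z_i^T z_j = K_ij - f_j s_i - f_i s_j + f_i f_j ||w||^2
    return K - np.outer(s, f) - np.outer(f, s) + w2 * np.outer(f, f)
```

The returned matrix can then be fed back into the same training step for the next reduction, matching the iterative use described above.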
Example of proposed curve fitting for the folium of Descartes
Fig. 6: Prediction of the folium of Descartes. Parameters: RBF kernel, σ = 1.5, C = 100.0, t = 0.005, \vec{c} = (0.1, 0.0).
Example of proposed curve fitting for the folium of Descartes
Fig. 7: Prediction of the folium of Descartes. Parameters: RBF kernel, σ = 1.5, C = 10000.0, t = 0.005, \vec{c} = (0.1, 0.0).