MACHINE LEARNING – 2012

Kernels
Kernels: Intuition

How to separate the red class from the grey class?

[Figure: the data in the original coordinates (x1, x2) and in polar coordinates (r, θ ∈ [0°, 360°]).]

In polar coordinates, the data become linearly separable.
Kernels: Intuition

How to separate the red class from the grey class? What is $\phi$?

Assume a model (an equation) for the transformation; we need at least 3 datapoints to solve for it.

$$\phi: \mathbb{R}^2 \to \mathbb{R}^2, \quad x^i \mapsto (r^i, \theta^i)$$

Solve for $\theta^i, r^i$ with $r^i \ge 0$, such that $x_1^i = r^i \cos\theta^i$ and $x_2^i = r^i \sin\theta^i$.
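As an illustration beyond the slides, here is a minimal sketch of this polar-coordinate feature map in Python; the ring-and-blob dataset is a hypothetical example:

```python
import numpy as np

def polar_map(X):
    """Map 2-D points (x1, x2) to polar coordinates (r, theta)."""
    r = np.linalg.norm(X, axis=1)
    theta = np.arctan2(X[:, 1], X[:, 0])  # angle in [-pi, pi]
    return np.column_stack([r, theta])

# Hypothetical example: a ring (grey class) around a blob (red class).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
ring = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])
blob = rng.normal(scale=0.5, size=(100, 2))

# In polar coordinates the two classes are separated by a threshold on r,
# i.e. by a line in the (r, theta) plane.
print(polar_map(ring)[:, 0].mean())  # ~3.0
print(polar_map(blob)[:, 0].mean())  # ~0.6
```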
Kernels: Intuition

[Figure: the original space (x1, x2) and the feature space H in polar coordinates (r, θ).]

Idea: send the data X into a feature space H through the nonlinear map $\phi$:

$$X = \{x^i\}_{i=1 \ldots M}, \; x^i \in \mathbb{R}^N \;\longrightarrow\; \phi(X) = \{\phi(x^1), \ldots, \phi(x^M)\}$$

In feature space, computation is simpler (e.g. one can perform linear classification).
Kernels: Intuition

In most cases, determining the transformation $\phi$ beforehand may be difficult. Which representation of the data allows the three groups of datapoints to be classified easily?
Kernels: Intuition

In most cases, determining the transformation $\phi$ beforehand may be difficult. What if the groups live in N dimensions, with N ≫ 1? Grouping may require separate sets of dimensions and can no longer be visualized.
Kernel-Induced Feature Space

Idea: send the data X into a feature space H through the nonlinear map $\phi$:

$$X = \{x^i\}_{i=1 \ldots M}, \; x^i \in \mathbb{R}^N \;\longrightarrow\; \phi(X) = \{\phi(x^1), \ldots, \phi(x^M)\} \subset H$$

While the dimension of the original space is N, the dimension of the feature space may be greater than N! X is lifted onto H. Determining $\phi$ is difficult → the kernel trick.
The Kernel Trick

In most cases, determining the transformation $\phi$ may be difficult.

Key idea behind the kernel trick: most algorithms for classification, regression, or clustering compute an inner product across pairs of observations to determine the separating line, the fit, or the grouping of datapoints, respectively.

Inner product across two datapoints: $\langle x^i, x^j \rangle$
The Kernel Trick

There is no need to compute the transformation $\phi$ if one expresses everything as a function of the inner product in feature space. Proceed as follows:

1) Define a kernel function

$$k: X \times X \to \mathbb{R}, \quad k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle.$$

The function k can be used as a metric of similarity across datapoints in feature space; it can extract features that are either common to, or that distinguish, groups of datapoints.

2) Use this kernel to perform classical classification, regression, or clustering as in the linear case (a sketch follows below).
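A minimal sketch (not from the slides) of the trick in code: a precomputed Gram matrix of pairwise kernel values replaces explicit feature vectors. Kernel ridge regression is used as the stand-in linear algorithm, and the data, kernel width, and regularizer are assumptions:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||X_i - Y_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2 * X @ Y.T)
    return np.exp(-sq_dists / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

# Kernel ridge regression: the fit depends on the data only through K.
lam = 0.1  # ridge regularizer (assumed value)
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predictions also need only inner products (kernel values), never phi(x).
X_test = np.linspace(-3, 3, 5)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha
print(y_pred)
```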
Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. Which representation of the data allows the two groups of datapoints to be separated linearly?
Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. The data become linearly separable when projected onto the first two principal components of kernel PCA with an RBF kernel (see next lecture; a sketch follows below).
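As a hedged illustration (kernel PCA itself is covered in the next lecture), one way to reproduce this effect with scikit-learn; the concentric-rings dataset and the kernel width `gamma` are assumptions:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric rings: not linearly separable in the original space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Project onto the first two principal components in RBF feature space.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=5.0)  # gamma assumed
Z = kpca.fit_transform(X)

# In Z, a simple threshold on the first component separates the classes.
print(Z[y == 0, 0].mean(), Z[y == 1, 0].mean())
```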
Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. Which representation of the data allows each of the three groups of datapoints to be assigned easily to a different cluster?
Use of Kernels: Example

Key idea: some problems are made simpler if you change the representation of the data. The data are correctly clustered by kernel K-means with an RBF kernel (see next week's lecture; a sketch follows below).
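A minimal sketch of kernel K-means, previewed here under stated assumptions (random initialization, fixed iteration budget, no special handling of empty clusters); it assigns each point to the nearest centroid in feature space using only Gram-matrix entries:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Cluster points given a precomputed M x M Gram matrix K."""
    M = K.shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=M)
    for _ in range(n_iter):
        dist = np.zeros((M, n_clusters))
        for c in range(n_clusters):
            mask = labels == c
            n_c = max(mask.sum(), 1)  # guard against empty clusters
            # ||phi(x_i) - mu_c||^2 up to the constant K[i, i] term:
            dist[:, c] = (-2 * K[:, mask].sum(axis=1) / n_c
                          + K[np.ix_(mask, mask)].sum() / n_c**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Paired with the `rbf_kernel` from the earlier sketch, `kernel_kmeans(rbf_kernel(X, X), 3)` clusters data such as the three groups above.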
Popular Kernels

• Gaussian / RBF kernel (translation-invariant):

$$k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}, \quad \sigma \in \mathbb{R}.$$

• Homogeneous polynomial kernels:

$$k(x, x') = \langle x, x' \rangle^p, \quad p \in \mathbb{N};$$

• Inhomogeneous polynomial kernels:

$$k(x, x') = \left( \langle x, x' \rangle + c \right)^p, \quad p \in \mathbb{N}, \; c > 0.$$
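For concreteness, a minimal sketch of these three kernels evaluated on single points (the test vectors are arbitrary):

```python
import numpy as np

def rbf(x, xp, sigma=1.0):
    """Gaussian / RBF kernel: exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma**2))

def poly_homogeneous(x, xp, p=2):
    """Homogeneous polynomial kernel: <x, x'>^p."""
    return np.dot(x, xp) ** p

def poly_inhomogeneous(x, xp, p=2, c=1.0):
    """Inhomogeneous polynomial kernel: (<x, x'> + c)^p, with c > 0."""
    return (np.dot(x, xp) + c) ** p

x, xp = np.array([1.0, 0.0]), np.array([0.5, 0.5])
print(rbf(x, xp), poly_homogeneous(x, xp), poly_inhomogeneous(x, xp))
```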
Kernels: Exercise I

Using the RBF kernel $k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}$:

a) Draw the isolines for:
- one datapoint $x^1$, i.e. find all $x$ s.t. $k(x, x^1) = \mathrm{cst}$;
- two datapoints $x^1, x^2$: find all $x$ s.t. $k(x, x^1) + k(x, x^2) = \mathrm{cst}$;
- two datapoints $x^1, x^2$: find all $x$ s.t. $k(x, x^1) - k(x, x^2) = \mathrm{cst}$;
- three datapoints.

b) Discuss the effect of $\sigma$ on the isolines.

c) Determine a metric in feature space.

A plotting sketch for part (a) follows below.
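A minimal plotting sketch for the one-datapoint case of part (a); the datapoint location, grid, and kernel width are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def rbf(X, xp, sigma=1.0):
    return np.exp(-np.sum((X - xp) ** 2, axis=-1) / (2 * sigma**2))

# Evaluate k(x, x1) on a grid and draw its isolines (concentric circles).
x1 = np.array([0.0, 0.0])
gx, gy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.stack([gx, gy], axis=-1)

plt.contour(gx, gy, rbf(grid, x1, sigma=1.0), levels=10)
plt.scatter(*x1, c="k")
plt.gca().set_aspect("equal")
plt.show()
```

Summing or subtracting a second call to `rbf` gives the two-datapoint isolines of the exercise.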
Kernels: Solution Exercise I

[Figure: RBF kernel isolines; M = 1, i.e. one datapoint.]
Kernels: Solution Exercise I

[Figure: Gaussian kernel isolines; M = 2, i.e. two datapoints.]
Kernels: Solution Exercise I

[Figure: Gaussian kernel isolines; M = 2, i.e. two datapoints; small kernel width vs. large kernel width.]
Kernels: Solution Exercise I

[Figures: Gaussian kernel isolines; M = 3, i.e. three datapoints (three different configurations).]
Kernels: Exercise II

Using the homogeneous polynomial kernel $k(x, x') = \langle x, x' \rangle^p$, $p \in \mathbb{N}$, draw the isolines as in the previous exercise for:

a) one datapoint
b) two datapoints
c) three datapoints

Discuss the effect of $p$ on the isolines.
Kernels: Solution Exercise II

[Figure: polynomial kernel isolines for orders p = 1, 2, 3; M = 1, i.e. one datapoint.]

The isolines are lines perpendicular to the vector pointing from the origin to the datapoint. The order p does not change the geometry; it only changes the values of the isolines (see the derivation below).
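To see why, note that for one datapoint $x^1$ the isoline condition is

$$k(x, x^1) = \langle x, x^1 \rangle^p = \mathrm{cst} \;\Longleftrightarrow\; \langle x, x^1 \rangle = \pm\,\mathrm{cst}^{1/p},$$

which is the equation of one (or two) straight lines with normal vector $x^1$, whatever the order $p$; only the isoline values are rescaled.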
Kernels: Solution Exercise II

[Figure: polynomial kernel isolines for orders p = 1, 2, 3; M = 2, i.e. two datapoints.]

For p = 1, the isolines are lines perpendicular to the combination of the two datapoint vectors. With p = 2, we obtain ellipses; with p = 3, hyperbolas. Orders p > 3 are similar in concept, with the signs of the isoline values changing depending on whether p is odd or even.
Kernels: Solution Exercise II

[Figure: polynomial kernel isolines for orders p = 1, 2, 3; M = 3, i.e. three datapoints.]

Solutions with p > 1 present a symmetry around the origin.
Kernels: Exercise III

Another relatively popular kernel is the linear kernel: $k(x, x') = x^T x'$.

1) Can you tell what this kernel measures?
2) Find an application where using the linear kernel provides an interesting measure.
Kernels: Solution Exercise III

Bags of words: [machine, learning, kernel, rbf, robot, vision, dimension, blue, speed, ...]. You want to group webpages with common groups of words.

Set $x \in \{0, 1\}^{1000}$, with entry $x_i$ set to 1 if word i is present, else 0. E.g. $x^1 = (1, 1, 1, 0, 0, 0, \ldots)$ contains the words machine, learning, and kernel, and nothing else.

Features live in a low-dimensional space (a common group of webpages shares a small number of combinations of words):

$$k(x^i, x^j) = (x^i)^T x^j = x_1^i x_1^j + x_2^i x_2^j + x_3^i x_3^j + x_4^i x_4^j + \ldots$$

The isoline $k(x^1, x^j) = 3$ delineates the set of webpages that share the same set of three keywords as $x^1$.
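A minimal sketch of this bag-of-words similarity (the vocabulary and documents are toy assumptions, and the vocabulary is truncated for brevity):

```python
import numpy as np

vocabulary = ["machine", "learning", "kernel", "rbf", "robot", "vision"]

def bag_of_words(words):
    """Binary vector: entry i is 1 if vocabulary[i] occurs in the document."""
    return np.array([1 if w in words else 0 for w in vocabulary])

x1 = bag_of_words({"machine", "learning", "kernel"})
x2 = bag_of_words({"machine", "learning", "robot"})

# The linear kernel counts the keywords the two pages share.
print(x1 @ x2)  # 2 ("machine" and "learning")
```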
Kernels: Solution Exercise III

Sequences of strings (e.g. genetic code): [IPTSLQDVBUV, ...]. We want to group strings with common substrings.

Set $x \in \mathbb{R}^{1000}$, with $x_i$ the number of times substring i appears in the string. Apply the same reasoning as before for grouping.
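One concrete instance of this idea is the k-spectrum kernel, sketched below under the assumption of fixed-length substrings; the strings are toy examples:

```python
from collections import Counter

def substring_counts(s, k=3):
    """Count occurrences of every length-k substring of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=3):
    """Linear kernel on substring-count vectors (the k-spectrum kernel)."""
    cs, ct = substring_counts(s, k), substring_counts(t, k)
    return sum(cs[sub] * ct[sub] for sub in cs)

# Strings sharing the substrings "IPTS" and "LQDV" score higher.
print(spectrum_kernel("IPTSLQDVBUV", "XXLQDVIPTSX"))
```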