ADVANCED MACHINE LEARNING
Kernel PCA
Overview: Today's Lecture
• Brief recap of classical Principal Component Analysis (PCA)
• Derivation of kernel PCA
• Exercises to develop a geometrical intuition of kernel PCA
Principal Component Analysis: Overview
Take samples of two classes (yellow and pink classes).
Each image is a high-dimensional vector: $x \in \mathbb{R}^{230400}$ (320 x 240 pixels x 3 channels = 230,400 dimensions).
Principal Component Analysis: Overview
Project the images onto a lower-dimensional space, $y \in \mathbb{R}^2$ (from 230,400 down to 2 dimensions), through a matrix A: $y = Ax$.
[Figure: the two classes in the projected 2-D space, separated by a line.]
Principal Component Analysis: Overview
Project the images onto a lower-dimensional space, $y \in \mathbb{R}^2$, through a matrix A: $y = Ax$.
What is A? PCA discovers the matrix A.
Principal Component Analysis: Overview
There is an infinite number of choices for the projection matrix A; we need criteria to narrow down the choice.
Criterion 1: minimum information loss (minimal reconstruction error).
[Figure: 2-D data cloud with axes $x_1$, $x_2$.]
What is the 2D-to-1D projection that minimizes the reconstruction error?
Principal Component Analysis: Overview
There is an infinite number of choices for the projection matrix A; we need criteria to narrow down the choice.
Criterion 1: minimum information loss (minimal reconstruction error).
Criterion 2: equivalently, find the direction with maximum variance.
[Figure: the largest breadth of the data is conserved along the retained direction; the smallest breadth is lost. The reconstruction after projection is shown.]
What is the 2D-to-1D projection that minimizes the reconstruction error?
Principal Component Analysis: Overview
Dataset $X = [x^1\; x^2\; \dots\; x^M]$ (the data are centered, i.e. $E\{X\} = 0$).
Compute the covariance matrix of the dataset: $C = E\{XX^T\} = \frac{1}{M} X X^T$.
Find the eigenvalue decomposition $C = V \Lambda V^T$, with $V = [e^1 \dots e^N]$ the matrix of eigenvectors and $\Lambda$ the diagonal matrix of eigenvalues $\lambda_1, \dots, \lambda_N$.
Order the eigenvalues such that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$.
The eigenvectors form a basis of the space; $e^1$ is aligned with the axis of maximum variance.
Project the data onto the eigenvectors. Remove the projections with low $\lambda$ (noise).
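These steps map directly onto a few lines of NumPy. A minimal sketch (the function name, the (N, M) data layout with one sample per column, and the `keep_ratio` noise threshold are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def pca(X, keep_ratio=1e-6):
    """Linear PCA on a dataset X of shape (N, M): M samples of dimension N.

    Returns the retained eigenvalues (descending), the corresponding
    eigenvectors (columns), and the projections of the centered data.
    """
    M = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)    # center the data so that E{X} = 0
    C = (Xc @ Xc.T) / M                       # covariance matrix C = (1/M) X X^T
    lam, V = np.linalg.eigh(C)                # C is symmetric, so use eigh
    order = np.argsort(lam)[::-1]             # order lambda_1 >= ... >= lambda_N
    lam, V = lam[order], V[:, order]
    keep = lam > keep_ratio * lam[0]          # drop directions with low lambda (noise)
    return lam[keep], V[:, keep], V[:, keep].T @ Xc
```

Calling `lam, V, Y = pca(X)` returns the ordered eigenvalues, the retained basis `V`, and the projections `Y` of the data onto that basis.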
PCA for Data Compression
The original image is encoded in $x \in \mathbb{R}^N$. The compressed image is $y = A_p x$, with $y \in \mathbb{R}^p$ and $p \approx 0.1\,N$.
The rows of $A_p$ contain the first $p$ eigenvectors.
[Figure: original image vs. image compressed by 90%.]
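As a sketch, the compression and reconstruction steps are just two matrix products, reusing the eigenvector matrix `V` from the PCA sketch above (function and variable names are illustrative):

```python
def compress(x, V, p):
    """Encode x with the first p eigenvectors: y = A_p x, where the rows of A_p
    are the first p eigenvectors (columns of V)."""
    return V[:, :p].T @ x            # compressed code y, of dimension p << N

def decompress(y, V, p):
    """Approximate reconstruction of the original vector from its code y."""
    return V[:, :p] @ y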
PCA for Feature Extraction
Results of the decomposition with Principal Component Analysis (eigenvectors):
The main differences across groups of images are encapsulated in the first eigenvectors.
Detailed features (e.g. glasses) are encapsulated next, in the following eigenvectors.
Principal Component Analysis: Pros & Cons
Advantages:
a) The projection through PCA ensures minimal reconstruction error.
b) The projection does not distort the space (it is a rotation of the space). Ease of visualization/interpretation: the features that appear in the projections are often visually interpretable.
Limitations:
a) PCA assumes a linear transformation: with centered data, it can only perform a rotation of the space.
b) It fails at finding directions that require a non-linear transformation.
Revisiting the hypotheses of PCA
PCA assumed a linear transformation.
Non-linear PCA (kernel PCA): find a non-linear embedding of the data and then perform linear PCA.
Recall: Principle of Kernel Methods
Going back to linearity: find a non-linear transformation that sends the data into a space where linear computation is again feasible.
Kernel PCA: Principle
Determine a transformation which brings out features of the data so as to make subsequent computation easier.
[Figure: original space (axes $x_1$, $x_2$) and the data in feature space after lifting (axes $v_1$, $v_2$).]
Example above: the data become linearly separable when using an RBF kernel and projecting onto the first two principal components of kernel PCA.
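This effect is easy to reproduce with an off-the-shelf implementation. A sketch using scikit-learn's `KernelPCA` on synthetic concentric rings (the dataset and the value of `gamma` are arbitrary illustrative choices, not the data of the figure):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Lift with an RBF kernel and keep the first two kernel-PCA components.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
Z = kpca.fit_transform(X)

# In the (v1, v2) coordinates of Z, the two rings can be separated by a straight line.
```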
Kernel PCA: Principle
Idea: send the data X into a feature space H through the non-linear map $\varphi$:
$X = \{x^i\}_{i=1,\dots,M}$, $x^i \in \mathbb{R}^N$ $\;\rightarrow\;$ $X^\varphi = \{\varphi(x^1), \dots, \varphi(x^M)\}$
Perform linear PCA in feature space and project onto the set of eigenvectors in feature space.
[Figure: original space and feature space H; a point $\varphi(x)$ and its projection $P\varphi(x)$.]
(Schölkopf et al., Neural Computation, 1998)
Kernel PCA: Principle
Idea: send the data X into a feature space H through the non-linear map $\varphi$:
$X = \{x^i\}_{i=1,\dots,M}$, $x^i \in \mathbb{R}^N$ $\;\rightarrow\;$ $X^\varphi = \{\varphi(x^1), \dots, \varphi(x^M)\}$
Perform linear PCA in feature space and project onto the set of eigenvectors in feature space.
[Figure: original space (axes $x_1$, $x_2$) and the data projected onto the two first principal components in feature space (axes $v_1$, $v_2$).]
Determining $\varphi$ explicitly is difficult $\Rightarrow$ kernel trick.
Linear PCA in Feature Space
Send the data into feature space through $\varphi$: $\varphi: X \rightarrow H$, $x \mapsto \varphi(x)$.
Assume that, in feature space H, the data are centered: $\sum_{i=1}^{M} \varphi(x^i) = 0$.
The covariance matrix in feature space is $C_\varphi = \frac{1}{M} F F^T$,
where the columns of $F$ are the $\varphi(x^i)$, $i = 1, \dots, M$.
Linear PCA in Feature Space
As in the original space, the covariance matrix in feature space can be diagonalized, and we now have to find the eigenvalues $\lambda_i > 0$ and eigenvectors $v^i$ satisfying $C_\varphi v^i = \lambda_i v^i$.
Primal eigenvalue problem: finding the eigenvectors $v^i$ of $C_\varphi$.
This is not possible in feature space! $\Rightarrow$ Formulate everything as dot products and use the kernel trick.
From Linear PCA to Kernel PCA
Each eigenvector $v^1, \dots, v^M$ can be expressed as a linear combination of the images of the datapoints. Rewriting PCA in terms of dot products:
Using $C_\varphi v^i = \frac{1}{M} \sum_{j=1}^{M} \varphi(x^j)\,\varphi(x^j)^T v^i$ together with $C_\varphi v^i = \lambda_i v^i$, we obtain
$v^i = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \varphi(x^j)\,\varphi(x^j)^T v^i = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha_j^i\, \varphi(x^j)$,
where $\alpha_j^i := \langle \varphi(x^j), v^i \rangle$ is a scalar.
Linear PCA in Feature Space
Multiplying the equation $C_\varphi v^i = \lambda_i v^i$ by $\varphi(x^j)^T$ on both sides, we have
$\langle \varphi(x^j), C_\varphi v^i \rangle = \lambda_i \langle \varphi(x^j), v^i \rangle, \quad \forall\, i, j = 1, \dots, M$,
with $C_\varphi v^i = \frac{1}{M} \sum_{l=1}^{M} \varphi(x^l)\,\varphi(x^l)^T v^i$ and $v^i = \frac{1}{\lambda_i M} \sum_{k=1}^{M} \alpha_k^i\, \varphi(x^k)$.
Linear PCA in Feature Space
Substituting these expressions gives, for each $j = 1, \dots, M$:
$\frac{1}{M^2} \sum_{k=1}^{M} \sum_{l=1}^{M} \alpha_l^i \langle \varphi(x^j), \varphi(x^k) \rangle \langle \varphi(x^k), \varphi(x^l) \rangle = \frac{\lambda_i}{M} \sum_{l=1}^{M} \alpha_l^i \langle \varphi(x^j), \varphi(x^l) \rangle$
(the dot products are the Gram-matrix entries $K_{jk}$, $K_{kl}$, $K_{jl}$).
Use the kernel trick: $k(x^i, x^j) := \langle \varphi(x^i), \varphi(x^j) \rangle = K_{ij}$.
In matrix form this reads $K^2 \alpha^i = \lambda_i M K \alpha^i$, which reduces to an eigenvalue problem of the form
$K \alpha^i = \lambda_i M \alpha^i$, with $K$ the Gram matrix.
Dual eigenvalue problem: finding the dual eigenvectors $\alpha^i$.
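In code, the dual problem only needs the Gram matrix. A NumPy sketch, assuming (as the slides do) that the mapped data are already centered in feature space; the function names and the example RBF kernel are illustrative:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); sigma is an illustrative choice
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def kernel_pca_dual(X, kernel):
    """Solve the dual eigenvalue problem K alpha^i = lambda_i M alpha^i.

    X has shape (N, M): M datapoints as columns. Assumes the mapped data are
    centered in feature space, as on the earlier slide.
    """
    M = X.shape[1]
    K = np.array([[kernel(X[:, i], X[:, j]) for j in range(M)] for i in range(M)])
    mu, alphas = np.linalg.eigh(K)           # eigenvalues of K: mu_i = lambda_i * M
    order = np.argsort(mu)[::-1]
    mu, alphas = mu[order], alphas[:, order]
    return mu / M, alphas                    # lambda_i and the dual eigenvectors alpha^i (columns)
```

For example, `lam, alphas = kernel_pca_dual(X, rbf_kernel)` returns the eigenvalues $\lambda_i$ and the dual eigenvectors as columns of `alphas`.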
Linear PCA in Feature Space
The solutions to the dual eigenvalue problem are given by all the eigenvectors $\alpha^1, \dots, \alpha^M$ with non-zero eigenvalues $\lambda_1, \dots, \lambda_M$.
Kernel PCA finds at most M eigenvectors, where M is the number of datapoints (often M >> N, the dimension of each datapoint).
Eigenvalue problem of the form $K \alpha^i = \lambda_i M \alpha^i$, with $K$ the Gram matrix.
Linear PCA in Feature Space
Requesting that the eigenvectors $v^i$ of $C_\varphi$ be normalized, i.e. $\langle v^i, v^i \rangle = 1$, $i = 1, \dots, M$, is equivalent to asking that the dual eigenvectors $\alpha^1, \dots, \alpha^M$ are such that $\langle \alpha^i, \alpha^i \rangle = 1/\lambda_i$.
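Numerical eigensolvers such as `np.linalg.eigh` return unit-norm vectors, so enforcing this condition is only a rescaling. A sketch, taking the condition above at face value (the helper is hypothetical, not from the slides):

```python
import numpy as np

def rescale_dual(alphas, lam):
    """Rescale unit-norm dual eigenvectors (columns) so that <alpha^i, alpha^i> = 1 / lambda_i."""
    return alphas / np.sqrt(lam)     # divides column i by sqrt(lambda_i)
```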
Constructing the kPCA projections
We cannot see the projection in feature space! We can only compute the projection of each point onto each eigenvector.
Projection of a query point $x$ onto eigenvector $v^i$:
$\langle v^i, \varphi(x) \rangle = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha_j^i \langle \varphi(x^j), \varphi(x) \rangle = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha_j^i\, k(x^j, x)$
(the sum runs over all training points).
Isolines group points with equal projection: all points $x$ such that $\langle v^i, \varphi(x) \rangle = \text{cst}$.
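A sketch of this projection, reusing the dual eigenvectors from the earlier sketch (names are illustrative). Evaluating it on a grid of query points and contouring the result gives the isolines:

```python
import numpy as np

def kpca_project(x, X, alphas, lam, kernel):
    """Projection of a query point x onto each feature-space eigenvector v^i:
    <v^i, phi(x)> = 1/(lambda_i M) * sum_j alpha_j^i k(x^j, x)."""
    M = X.shape[1]
    k_x = np.array([kernel(X[:, j], x) for j in range(M)])   # k(x^j, x) over all training points
    return (alphas.T @ k_x) / (lam * M)                      # one value per eigenvector
```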
kPCA projections: Exercise
Recall: the projection of a query point $x$ onto eigenvector $v^i$ is
$\langle v^i, \varphi(x) \rangle = \frac{1}{\lambda_i M} \sum_{j=1}^{M} \alpha_j^i\, k(x^j, x)$,
where the $\alpha^i$ are the dual eigenvectors, solutions of the eigenvalue decomposition of $K$.
Consider a 2-dimensional data space with two datapoints and the RBF kernel $k(x, x') = e^{-\|x - x'\|^2 / 2}$.
a) How many dual eigenvectors do you have, and what is their dimension?
b) Compute the eigenvectors and draw the isolines for the projections onto each eigenvector.
c) Repeat (b) for a homogeneous polynomial kernel with p = 2: $k(x, x') = \langle x, x' \rangle^2$.
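For a quick numerical check of your answers, one can plug two points into the dual problem (the chosen points are an illustrative assumption; the analytic derivation and the isoline drawings remain the point of the exercise):

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # columns are the two datapoints x^1 = (1,0), x^2 = (1,1)

def rbf(x, y):
    return np.exp(-np.linalg.norm(x - y) ** 2 / 2.0)   # k(x, x') = exp(-||x - x'||^2 / 2)

def poly2(x, y):
    return (x @ y) ** 2                                  # homogeneous polynomial kernel, p = 2

for kernel in (rbf, poly2):
    M = X.shape[1]
    K = np.array([[kernel(X[:, i], X[:, j]) for j in range(M)] for i in range(M)])
    mu, alphas = np.linalg.eigh(K)   # K alpha^i = (lambda_i M) alpha^i, so mu_i = lambda_i * M
    print(kernel.__name__)
    print("eigenvalues of K:", mu)
    print("dual eigenvectors (columns):\n", alphas)
```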