

  1. ADVANCED MACHINE LEARNING: Kernel PCA

  2. Overview: Today's Lecture
     • Brief recap of classical Principal Component Analysis (PCA)
     • Derivation of kernel PCA
     • Exercises to develop a geometrical intuition of kernel PCA

  3. Principal Component Analysis: Overview
     Take samples of two classes (the yellow and pink classes). Each image is a high-dimensional vector: $x \in \mathbb{R}^{320 \times 240 \times 3} = \mathbb{R}^{230400}$.

  4. Principal Component Analysis: Overview
     Project the images onto a lower-dimensional space, $y \in \mathbb{R}^{2}$, through a matrix $A \in \mathbb{R}^{2 \times 230400}$: $y = Ax$.
     (Figure: projected data with a separating line between the two classes.)

  5. Principal Component Analysis: Overview
     Project the images onto a lower-dimensional space, $y \in \mathbb{R}^{2}$, through a matrix $A \in \mathbb{R}^{2 \times 230400}$: $y = Ax$.
     What is A? PCA discovers the matrix A.

  6. Principal Component Analysis: Overview
     There is an infinite number of choices for the projection matrix A, so we need criteria to reduce the choice.
     Criterion 1: minimum information loss (minimal reconstruction error).
     (Figure: 2-D data on axes $x_1$, $x_2$.) What is the 2-D to 1-D projection that minimizes the reconstruction error?

  7. Principal Component Analysis: Overview
     There is an infinite number of choices for the projection matrix A, so we need criteria to reduce the choice.
     Criterion 1: minimum information loss (minimal reconstruction error).
     Criterion 2: equivalent to finding the direction with maximum variance.
     (Figure, axes $x_1$, $x_2$: the largest breadth of the data is conserved, the smallest breadth of the data is lost; the reconstruction after projection is shown.) What is the 2-D to 1-D projection that minimizes the reconstruction error?
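
A small numerical check of this equivalence (not from the lecture): the sketch below scans candidate 1-D projection directions of centered 2-D data and confirms that the direction maximizing the projected variance is the same one minimizing the mean reconstruction error. The data distribution and the angle grid are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.2], [1.2, 0.8]], size=500)
X = X - X.mean(axis=0)                             # center the data

angles = np.linspace(0.0, np.pi, 360, endpoint=False)
best_var, best_err = None, None
for theta in angles:
    w = np.array([np.cos(theta), np.sin(theta)])   # candidate unit direction
    proj = X @ w                                   # 1-D projections
    recon = np.outer(proj, w)                      # reconstruction back in 2-D
    var = proj.var()
    err = ((X - recon) ** 2).sum(axis=1).mean()    # mean squared reconstruction error
    if best_var is None or var > best_var[0]:
        best_var = (var, theta)
    if best_err is None or err < best_err[0]:
        best_err = (err, theta)

print("max-variance direction      :", best_var[1])
print("min-reconstruction direction:", best_err[1])   # the two angles coincide
```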

  8. Principal Component Analysis: Overview
     Dataset $X = \{x^1, x^2, \ldots, x^M\}$ (the data are centered, i.e. $E[X] = 0$).
     Compute the covariance matrix of the dataset: $C_X = E[XX^T] = \frac{1}{M} XX^T$.
     Find the eigenvalue decomposition $C_X = V \Lambda V^T$, with $V = [e^1, \ldots, e^N]$ the matrix of eigenvectors and $\Lambda$ the diagonal matrix of eigenvalues.
     Order $e^1, \ldots, e^N$ such that $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_N$.
     The eigenvectors form a basis of the space; $e^1$ is aligned with the axis of maximum variance.
     Project the data onto the eigenvectors and remove the projections with low $\lambda$ (noise).
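
A minimal NumPy sketch of the recipe above, assuming the data are stored with one column per datapoint; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def pca(X, n_components):
    """X: (N, M) array of M datapoints of dimension N, one column per point."""
    Xc = X - X.mean(axis=1, keepdims=True)          # center: E[X] = 0
    C = (Xc @ Xc.T) / Xc.shape[1]                   # covariance C_X = (1/M) X X^T
    eigvals, eigvecs = np.linalg.eigh(C)            # C_X = V Lambda V^T (symmetric)
    order = np.argsort(eigvals)[::-1]               # sort lambda_1 >= ... >= lambda_N
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :n_components].T                 # rows of A: leading eigenvectors
    Y = A @ Xc                                      # projections y = A x
    return Y, A, eigvals

# Example: project 5-dimensional points down to 2 dimensions.
X = np.random.default_rng(1).normal(size=(5, 100))
Y, A, eigvals = pca(X, n_components=2)
print(Y.shape, eigvals[:2])
```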

  9. PCA for Data Compression
     The original image is encoded in $\mathbb{R}^N$; the compressed image is $y \in \mathbb{R}^p$, with $p \approx 0.1\,N$.
     $y = A_p\, x$, where the rows of $A_p$ contain the first $p$ eigenvectors.
     (Figure: original image vs. image compressed by 90%.)
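
A hedged sketch of the compression step: keep the first p eigenvectors as the rows of A_p, encode x as y = A_p x, and reconstruct with A_p^T y. The dimensions, the 10% ratio, and the random stand-in data are made up for illustration (real images would compress far better than random noise).

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 1024, 200                           # "image" dimension and number of images
p = int(0.1 * N)                           # keep ~10% of the components

X = rng.normal(size=(N, M))                # stand-in data, one column per image
Xc = X - X.mean(axis=1, keepdims=True)
eigvals, V = np.linalg.eigh((Xc @ Xc.T) / M)
V = V[:, np.argsort(eigvals)[::-1]]        # eigenvectors by decreasing eigenvalue

A_p = V[:, :p].T                           # (p, N): rows are the first p eigenvectors
x = Xc[:, 0]                               # one (centered) image
y = A_p @ x                                # compressed code, y in R^p
x_hat = A_p.T @ y                          # approximate reconstruction in R^N
print(y.shape, np.linalg.norm(x - x_hat))
```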

  10. PCA for Feature Extraction
      Results of the decomposition with Principal Component Analysis: the eigenvectors encapsulate the main differences across groups of images (in the first eigenvectors); detailed features (e.g. glasses) get encapsulated next (in the following eigenvectors).

  11. Principal Component Analysis: Pros & Cons
      Advantages:
      a) The projection through PCA ensures minimal reconstruction error.
      b) The projection does not distort the space (it is a rotation in space), which eases visualization and interpretation: the features that appear in the projections are often interpretable visually.
      Limitations:
      a) PCA assumes a linear transformation: with centering of the data, one can only do a rotation in space.
      b) It fails at finding directions that require a non-linear transformation.

  12. Revisiting the Hypotheses of PCA
      PCA assumed a linear transformation. Non-linear PCA (kernel PCA): find a non-linear embedding of the data and then perform linear PCA.

  13. Recall: Principle of Kernel Methods
      Going back to linearity: find a non-linear transformation that sends the data into a space where linear computation is again feasible.

  14. Kernel PCA: Principle
      Determine a transformation which brings out features of the data so as to make subsequent computation easier.
      (Figure: original space with axes $x_1$, $x_2$; the data after lifting into feature space, plotted on axes $v_1$, $v_2$.)
      Example above: the data become linearly separable when using an RBF kernel and projecting onto the first two PCs of kernel PCA.
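
For intuition, here is a short illustration of this picture using scikit-learn's KernelPCA (a stand-in for whatever toolbox the lecture uses): two concentric classes that no line can separate in input space become essentially separable along the first kernel principal components under an RBF kernel. The dataset, the gamma value, and the crude threshold check are made-up choices.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
Z = kpca.fit_transform(X)                  # coordinates on the first two kernel PCs

# Crude separability check: a simple threshold on the first kernel PC already
# splits the two circles reasonably well, which no line in input space can do.
threshold = np.median(Z[:, 0])
pred = (Z[:, 0] > threshold).astype(int)
print("agreement with true labels:", max((pred == y).mean(), (pred != y).mean()))
```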

  15. Kernel PCA: Principle
      Idea: send the data X into a feature space H through the non-linear map $f$:
      $X = \{x^i\}_{i=1\ldots M},\ x^i \in \mathbb{R}^N \;\rightarrow\; f(X) = \{f(x^1), \ldots, f(x^M)\}$.
      Perform linear PCA in feature space and project onto the set of eigenvectors in feature space.
      (Figure: original space and feature space H; $P f(x)$ denotes the projection of $f(x)$.)
      Schölkopf et al., Neural Computation, 1998.

  16. Kernel PCA: Principle
      Idea: send the data X into a feature space H through the non-linear map $f$:
      $X = \{x^i\}_{i=1\ldots M},\ x^i \in \mathbb{R}^N \;\rightarrow\; f(X) = \{f(x^1), \ldots, f(x^M)\}$.
      Perform linear PCA in feature space and project onto the set of eigenvectors in feature space.
      (Figure: original space with axes $x_1$, $x_2$; data projected onto the two first principal components $v_1$, $v_2$ in feature space.)
      Determining $f$ is difficult → Kernel Trick.

  17. Linear PCA in Feature Space
      Send the data into feature space through $f$: $f: X \rightarrow H,\ x \mapsto f(x)$.
      Assume that, in feature space H, the data are centered: $\sum_{i=1}^{M} f(x^i) = 0$.
      The covariance matrix in the feature space is $C_f = \frac{1}{M} F F^T$, where the columns of $F$, $i = 1 \ldots M$, are the $f(x^i)$.
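
In practice the map $f$ is never computed, so the centering assumption above is usually enforced on the Gram matrix instead (following Schölkopf et al., 1998): $\tilde{K} = K - 1_M K - K 1_M + 1_M K 1_M$, where $1_M$ is the $M \times M$ matrix with every entry equal to $1/M$. A minimal sketch, with a random PSD matrix standing in for a real Gram matrix:

```python
import numpy as np

def center_gram(K):
    """Gram-matrix centering, equivalent to centering the data in feature space."""
    M = K.shape[0]
    one_M = np.full((M, M), 1.0 / M)
    return K - one_M @ K - K @ one_M + one_M @ K @ one_M

# Example with a random symmetric PSD matrix standing in for a Gram matrix.
A = np.random.default_rng(3).normal(size=(5, 5))
K = A @ A.T
Kc = center_gram(K)
print(np.allclose(Kc.sum(axis=0), 0.0))    # rows/columns of a centered Gram sum to 0
```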

  18. Linear PCA in Feature Space
      As in the original space, the covariance matrix in feature space can be diagonalized, and we now have to find the eigenvalues $\lambda_i > 0$ and eigenvectors $v^i$ satisfying:
      $C_f\, v^i = \lambda_i\, v^i$.
      Primal eigenvalue problem: finding the eigenvectors $v$ of $C_f$. This is not possible in feature space! ⇒ Formulate everything as a dot product and use the kernel trick.

  19. From Linear PCA to Kernel PCA
      Each eigenvector $v^1, \ldots, v^M$ can be expressed as a linear combination of the images of the datapoints. Rewriting PCA in terms of dot products:
      Using $C_f\, v^i = \frac{1}{M} \sum_{j=1}^{M} f(x^j)\, f(x^j)^T\, v^i$ together with $C_f\, v^i = \lambda_i\, v^i$, we obtain
      $v^i = \frac{1}{M \lambda_i} \sum_{j=1}^{M} \langle f(x^j), v^i \rangle\, f(x^j) = \frac{1}{M \lambda_i} \sum_{j=1}^{M} \alpha_j^i\, f(x^j)$,
      where each $\alpha_j^i = \langle f(x^j), v^i \rangle$ is a scalar.

  20. Linear PCA in Feature Space
      Multiplying the equation $C_f\, v^i = \lambda_i\, v^i$ by $f(x^j)$ on both sides, we have:
      $\langle f(x^j),\, C_f\, v^i \rangle = \lambda_i\, \langle f(x^j),\, v^i \rangle, \quad \forall\, i, j = 1, \ldots, M$,
      with $v^i = \frac{1}{M \lambda_i} \sum_{l=1}^{M} \alpha_l^i\, f(x^l)$ and $C_f\, v^i = \frac{1}{M} \sum_{k=1}^{M} f(x^k)\, f(x^k)^T\, v^i$.

  21. Linear PCA in Feature Space
      $\frac{1}{M^2} \sum_{k=1}^{M} \sum_{l=1}^{M} \alpha_l^i\, \langle f(x^j), f(x^k) \rangle\, \langle f(x^k), f(x^l) \rangle = \frac{\lambda_i}{M} \sum_{l=1}^{M} \alpha_l^i\, \langle f(x^j), f(x^l) \rangle$
      Use the kernel trick: $k(x^i, x^j) = \langle f(x^i), f(x^j) \rangle =: K_{ij}$, so each inner product above is an entry $K_{jk}$, $K_{kl}$ or $K_{jl}$ of the Gram matrix.
      This yields an eigenvalue problem of the form $K \alpha^i = M \lambda_i\, \alpha^i$, with $K$ the Gram matrix.
      Dual eigenvalue problem: finding the dual eigenvectors $\alpha^i$.

  22. Linear PCA in Feature Space
      The solutions to the dual eigenvalue problem $K \alpha^i = M \lambda_i\, \alpha^i$ ($K$: Gram matrix) are given by all the eigenvectors $\alpha^1, \ldots, \alpha^M$ with non-zero eigenvalues $\lambda_1, \ldots, \lambda_M$.
      Kernel PCA finds at most M eigenvectors, where M is the number of datapoints (often $M \gg N$, the dimension of each datapoint).
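
A short from-scratch sketch of this dual problem: build the Gram matrix, solve $K \alpha^i = M \lambda_i \alpha^i$, and count the components with non-zero eigenvalue. The RBF kernel, the kernel width, the data, and the tolerance are illustrative choices, and K is used without explicit centering for brevity.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """X: (M, N) array of M datapoints; returns the M x M Gram matrix."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))                 # M = 50 datapoints in N = 3 dimensions
M = X.shape[0]

K = rbf_gram(X, sigma=1.5)
mu, alphas = np.linalg.eigh(K)               # columns of `alphas` are dual eigenvectors
lambdas = mu[::-1] / M                       # decreasing order; slides' lambda_i = mu_i / M

n_kept = int((lambdas > 1e-12).sum())        # discard numerically-zero eigenvalues
print("kernel PCs with non-zero eigenvalue:", n_kept, "out of at most", M)
```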

  23. Linear PCA in Feature Space
      Requiring that the eigenvectors $v^i$ of $C_f$ be normalized, i.e. $\langle v^i, v^i \rangle = 1,\ \forall\, i = 1, \ldots, M$,
      is equivalent to asking that the dual eigenvectors $\alpha^1, \ldots, \alpha^M$ are such that $\frac{1}{M \lambda_i} \langle \alpha^i, \alpha^i \rangle = 1$.

  24. Constructing the kPCA Projections
      We cannot see the projection in feature space! We can only compute the projection of each point onto each eigenvector.
      Projection of a query point x onto eigenvector $v^i$:
      $\langle v^i, f(x) \rangle = \frac{1}{M \lambda_i} \sum_{j=1}^{M} \alpha_j^i\, \langle f(x^j), f(x) \rangle = \frac{1}{M \lambda_i} \sum_{j=1}^{M} \alpha_j^i\, k(x^j, x)$
      (the sum runs over all training points).
      Isolines group points with equal projection: all points x such that $\langle v^i, f(x) \rangle = \text{cst}$.
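
A sketch of this projection formula: the dual eigenvectors come from the eigendecomposition of the Gram matrix and are rescaled to satisfy the normalization condition on slide 23, after which the coordinate of a query point on component i is computed exactly as written above. The RBF kernel, the kernel width, and the training data are made-up choices.

```python
import numpy as np

def rbf(a, b, sigma=1.5):
    """RBF kernel; 'a' may be a single point or an array of points."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2.0 * sigma ** 2))

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))                        # training data, M = 50 points
M = X.shape[0]

K = rbf(X[:, None, :], X[None, :, :])               # M x M Gram matrix
mu, alphas = np.linalg.eigh(K)                      # K alpha = mu alpha
mu, alphas = mu[::-1], alphas[:, ::-1]              # decreasing order
keep = mu > 1e-10                                   # keep non-zero eigenvalues only
mu, alphas = mu[keep], alphas[:, keep]
lambdas = mu / M                                    # slides' lambda_i = mu_i / M

# Rescale the dual eigenvectors so that (1/(M*lambda_i)) <alpha^i, alpha^i> = 1,
# i.e. so that each v^i has unit norm (normalization condition of slide 23).
alphas = alphas * np.sqrt(M * lambdas)

def kpca_projection(x_query, i):
    """<v^i, f(x)> = (1/(M*lambda_i)) * sum_j alpha_j^i * k(x^j, x)."""
    k_vec = rbf(X, x_query)                         # k(x^j, x) for every training point
    return (alphas[:, i] @ k_vec) / (M * lambdas[i])

# Points with the same value of kpca_projection(., i) lie on the same isoline.
print(kpca_projection(np.array([0.0, 0.0]), 0))
```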

  25. kPCA Projections: Exercise
      Recall: the projection of a query point x onto eigenvector $v^i$ is
      $\langle v^i, f(x) \rangle = \frac{1}{M \lambda_i} \sum_{j=1}^{M} \alpha_j^i\, k(x^j, x)$,
      where the $\alpha^i$ are the dual eigenvectors, solutions of the eigenvalue decomposition of $K$.
      Consider a 2-dimensional data space with two datapoints and the RBF kernel $k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}$.
      a) How many dual eigenvectors do you have, and what is their dimension?
      b) Compute the eigenvectors and draw the isolines for the projections on each eigenvector.
      c) Repeat (b) for a homogeneous polynomial kernel with p = 2: $k(x, x') = \langle x, x' \rangle^2$.
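
As a numerical companion to the exercise (a quick check, not a full solution), the snippet below builds the 2 × 2 Gram matrices for two made-up datapoints under the RBF kernel and the p = 2 polynomial kernel, which confirms part (a): with two datapoints there are two dual eigenvectors, each of dimension 2. The points and sigma are arbitrary choices.

```python
import numpy as np

x1, x2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])   # two made-up datapoints
sigma = 1.0

def k_rbf(a, b):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def k_poly2(a, b):                                     # kernel for part (c)
    return np.dot(a, b) ** 2

K_rbf = np.array([[k_rbf(x1, x1), k_rbf(x1, x2)],
                  [k_rbf(x2, x1), k_rbf(x2, x2)]])
mu, alphas = np.linalg.eigh(K_rbf)
print("RBF Gram matrix:\n", K_rbf)
print("dual eigenvectors (columns):\n", alphas)        # answer to (a): 2 vectors in R^2

K_poly = np.array([[k_poly2(x1, x1), k_poly2(x1, x2)],
                   [k_poly2(x2, x1), k_poly2(x2, x2)]])
print("polynomial-kernel Gram matrix:\n", K_poly)
```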
