Lecture 25: Autoencoders, Kernel PCA. Aykut Erdem, January 2017. PowerPoint PPT Presentation transcript.


  1. Lecture 25: Autoencoders, Kernel PCA. Aykut Erdem, January 2017, Hacettepe University

  2. Today • Motivation • PCA algorithms • Applications • PCA shortcomings • Autoencoders • Kernel PCA

  3. Autoencoders

  4. Relation to Neural Networks • PCA is closely related to a particular form of neural network. • An autoencoder is a neural network whose outputs are its own inputs. • The goal is to minimize reconstruction error. (slide by Sanja Fidler)

  5. Autoencoders • Define $z = f(Wx)$, $\hat{x} = g(Vz)$. (slide by Sanja Fidler)

  6. Autoencoders • Define $z = f(Wx)$, $\hat{x} = g(Vz)$. • Goal: $\min_{W,V} \frac{1}{2N}\sum_{n=1}^{N} \lVert x^{(n)} - \hat{x}^{(n)} \rVert^2$ (slide by Sanja Fidler)

  7. Autoencoders • Define $z = f(Wx)$, $\hat{x} = g(Vz)$. • Goal: $\min_{W,V} \frac{1}{2N}\sum_{n=1}^{N} \lVert x^{(n)} - \hat{x}^{(n)} \rVert^2$ • If $g$ and $f$ are linear: $\min_{W,V} \frac{1}{2N}\sum_{n=1}^{N} \lVert x^{(n)} - VWx^{(n)} \rVert^2$ (slide by Sanja Fidler)

  8. Autoencoders • Define $z = f(Wx)$, $\hat{x} = g(Vz)$. • Goal: $\min_{W,V} \frac{1}{2N}\sum_{n=1}^{N} \lVert x^{(n)} - \hat{x}^{(n)} \rVert^2$ • If $g$ and $f$ are linear: $\min_{W,V} \frac{1}{2N}\sum_{n=1}^{N} \lVert x^{(n)} - VWx^{(n)} \rVert^2$ • In other words, the optimal solution is PCA. (slide by Sanja Fidler)

  9. Autoencoders: Nonlinear PCA • What if $g(\cdot)$ is not linear? • Then we are basically doing nonlinear PCA. • There are some subtleties, but in general this is an accurate description. (slide by Sanja Fidler)
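
The claim that a linear autoencoder recovers PCA, and that a nonlinearity gives nonlinear PCA, can be checked with a few lines of code. Below is a minimal sketch, assuming PyTorch; the class name, the toy data, and the training settings are illustrative choices rather than anything from the lecture. With the identity activation the network minimizes the reconstruction error from the previous slides over a 3-d code, i.e. the PCA objective; passing nonlinear=True swaps in a tanh encoder.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, d_in, d_code, nonlinear=False):
        super().__init__()
        self.W = nn.Linear(d_in, d_code, bias=False)   # encoder: z = f(W x)
        self.V = nn.Linear(d_code, d_in, bias=False)   # decoder: x_hat = g(V z)
        self.f = torch.tanh if nonlinear else (lambda t: t)

    def forward(self, x):
        return self.V(self.f(self.W(x)))

# Toy data: 100 points in R^10 lying close to a 3-d subspace.
torch.manual_seed(0)
X = torch.randn(100, 3) @ torch.randn(3, 10) + 0.01 * torch.randn(100, 10)

model = Autoencoder(d_in=10, d_code=3, nonlinear=False)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((X - model(X)) ** 2).sum(dim=1).mean() / 2   # (1/2N) sum_n ||x^(n) - x_hat^(n)||^2
    loss.backward()
    opt.step()
print("final reconstruction error:", loss.item())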

  10. Comparing Reconstructions • Figure: real data vs. reconstructions from a 30-d deep autoencoder, 30-d logistic PCA, and 30-d PCA. (slide by Sanja Fidler)

  11. Kernel PCA

  12. Dimensionality Reduction • Data representation: inputs are real-valued vectors in a high-dimensional space. • Linear structure (PCA): does the data live in a low-dimensional subspace? • Nonlinear structure: does the data live on a low-dimensional submanifold? (slide by Rita Osadchy)

  13. The “magic” of high dimensions • Given some problem, how do we know what classes of functions are capable of solving it? • VC (Vapnik-Chervonenkis) theory tells us that mappings into a space of higher dimension than the input space often provide greater classification power. (slide by Rita Osadchy)

  14. Example in $\mathbb{R}^2$ • These classes are linearly inseparable in the input space. (slide by Rita Osadchy)

  15. Example: High-Dimensional Mapping • We can make the problem linearly separable by a simple mapping $\Phi: \mathbb{R}^2 \to \mathbb{R}^3$, $(x_1, x_2) \mapsto (x_1^2, x_2^2, x_1 x_2)$. (slide by Rita Osadchy)
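
A quick numerical illustration of this mapping (a sketch in numpy; the two-circle data set is an assumed example, not from the slide): after applying $\Phi$, the squared radius $x_1^2 + x_2^2$ is a linear function of the new coordinates, so a plane separates the two classes.

import numpy as np

def phi(x):
    # Phi: R^2 -> R^3, (x1, x2) -> (x1^2, x2^2, x1*x2)
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1 ** 2, x2 ** 2, x1 * x2], axis=-1)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=100)
inner = np.stack([np.cos(theta), np.sin(theta)], axis=1)      # circle of radius 1
outer = 3 * np.stack([np.cos(theta), np.sin(theta)], axis=1)  # circle of radius 3

# In the mapped space the sum of the first two coordinates is the squared
# radius, so the two circles become linearly separable.
print(phi(inner)[:, :2].sum(axis=1).max())   # ~1.0
print(phi(outer)[:, :2].sum(axis=1).min())   # ~9.0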

  16. Kernel Trick • High-dimensional mappings can seriously increase computation time. • Can we get around this problem and still get the benefit of high dimensions? • Yes! The kernel trick: $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ • Given any algorithm that can be expressed solely in terms of dot products, this trick allows us to construct different nonlinear versions of it. (slide by Rita Osadchy)
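
A tiny check of this identity (illustrative code; here $\phi$ is the quadratic map of the previous slide with a $\sqrt{2}$ factor on the cross term, the scaling under which the dot products match exactly): the feature-space inner product equals the polynomial kernel $K(x, y) = (x \cdot y)^2$, so $\phi$ never has to be evaluated explicitly.

import numpy as np

def phi(x):
    # quadratic feature map with sqrt(2) on the cross term
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(phi(x) @ phi(y))   # explicit dot product in the 3-d feature space
print((x @ y) ** 2)      # same value, computed directly in the input space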

  17. Popular Kernels (figure). (slide by Rita Osadchy)
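
The slide itself is a figure; commonly listed kernels include the linear, polynomial, and Gaussian (RBF) kernels. A sketch of their usual definitions (parameter names and defaults are illustrative):

import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, degree=3, c=1.0):
    return (x @ y + c) ** degree

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))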

  18. Kernel Principal Component Analysis • Extends conventional principal component analysis (PCA) to a high-dimensional feature space using the “kernel trick”. • Can extract up to $n$ (the number of samples) nonlinear principal components without expensive computations. (slide by Rita Osadchy)

  19. Making PCA Non-Linear • Suppose that instead of using the points $x_i$ we first map them to some nonlinear feature space $\phi(x_i)$. E.g., using polar coordinates instead of Cartesian coordinates would help us deal with the circle. • Extract the principal components in that space (PCA). • The result will be non-linear in the original data space! (slide by Rita Osadchy)

  20. Derivation • Suppose that the mean of the data in the feature space is zero: $\mu = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i) = 0$ • Covariance: $C = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T$ • Eigenvectors: $Cv = \lambda v$ (slide by Rita Osadchy)
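
In the linear case (where $\phi$ is the identity) these quantities can be verified directly; a small numpy sketch with synthetic data follows. In kernel PCA, of course, $\phi$ is never formed explicitly, which is what the rest of the derivation works around.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X -= X.mean(axis=0)                    # zero-mean data, as assumed on the slide
C = (X.T @ X) / X.shape[0]             # C = (1/n) sum_i x_i x_i^T
eigvals, eigvecs = np.linalg.eigh(C)   # columns of eigvecs are the eigenvectors v
v, lam = eigvecs[:, -1], eigvals[-1]   # top principal direction
print(np.allclose(C @ v, lam * v))     # True: C v = lambda v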

  21. Derivation • Eigenvectors can be expressed as a linear combination of the features: $v = \sum_{i=1}^{n} \alpha_i \phi(x_i)$ • Proof: $Cv = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T v = \lambda v$, thus $v = \frac{1}{\lambda n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T v = \frac{1}{\lambda n}\sum_{i=1}^{n} (\phi(x_i) \cdot v)\,\phi(x_i)$ (slide by Rita Osadchy)

  22. Showing that $xx^T v = (x \cdot v)\,x$: since $x^T v$ is a scalar, $xx^T v = x(x^T v) = (x \cdot v)\,x$. (slide by Rita Osadchy)

  23. Showing that $xx^T v = (x \cdot v)\,x$ (continued). (slide by Rita Osadchy)

  24. Derivation • So, from before we had: $v = \frac{1}{\lambda n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T v = \frac{1}{\lambda n}\sum_{i=1}^{n} (\phi(x_i) \cdot v)\,\phi(x_i)$, where $\phi(x_i) \cdot v$ is just a scalar. • This means that all solutions $v$ with $\lambda \neq 0$ lie in the span of $\phi(x_1), \dots, \phi(x_n)$, i.e., $v = \sum_{i=1}^{n} \alpha_i \phi(x_i)$ • Finding the eigenvectors is equivalent to finding the coefficients $\alpha_i$. (slide by Rita Osadchy)

  25. Derivation • By substituting this back into the equation we get: $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T \left(\sum_{l=1}^{n} \alpha_{jl}\phi(x_l)\right) = \lambda_j \sum_{l=1}^{n} \alpha_{jl}\phi(x_l)$ • We can rewrite it as $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i) \left(\sum_{l=1}^{n} \alpha_{jl} K(x_i, x_l)\right) = \lambda_j \sum_{l=1}^{n} \alpha_{jl}\phi(x_l)$ • Multiplying this by $\phi(x_k)^T$ from the left: $\frac{1}{n}\sum_{i=1}^{n} \phi(x_k)^T\phi(x_i) \left(\sum_{l=1}^{n} \alpha_{jl} K(x_i, x_l)\right) = \lambda_j \sum_{l=1}^{n} \alpha_{jl}\, \phi(x_k)^T\phi(x_l)$ (slide by Rita Osadchy)

  26. Derivation • By plugging in the kernel and rearranging we get: $K^2 \alpha_j = n\lambda_j K \alpha_j$ • We can remove a factor of $K$ from both sides of the equation (this only affects the eigenvectors with zero eigenvalue, which will not be principal components anyway): $K\alpha_j = n\lambda_j \alpha_j$ • We have a normalization condition for the $\alpha_j$ vectors: $v_j^T v_j = 1 \Rightarrow \sum_{k=1}^{n}\sum_{l=1}^{n} \alpha_{jl}\alpha_{jk}\, \phi(x_l)^T\phi(x_k) = 1 \Rightarrow \alpha_j^T K \alpha_j = 1$ (slide by Rita Osadchy)

  27. Derivation • By multiplying $K\alpha_j = n\lambda_j\alpha_j$ by $\alpha_j^T$ and using the normalization condition we get: $n\lambda_j\, \alpha_j^T \alpha_j = 1, \;\forall j$ • For a new point $x$, its projection onto the principal components is: $\phi(x)^T v_j = \sum_{i=1}^{n} \alpha_{ji}\, \phi(x)^T\phi(x_i) = \sum_{i=1}^{n} \alpha_{ji}\, K(x, x_i)$ (slide by Rita Osadchy)

  28. Normalizing the feature space • In general, $\phi(x_i)$ may not be zero mean. • Centered features: $\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{n}\sum_{k=1}^{n}\phi(x_k)$ • The corresponding kernel is: $\tilde{K}(x_i, x_j) = \tilde{\phi}(x_i)^T\tilde{\phi}(x_j) = \left(\phi(x_i) - \frac{1}{n}\sum_{k=1}^{n}\phi(x_k)\right)^T\left(\phi(x_j) - \frac{1}{n}\sum_{k=1}^{n}\phi(x_k)\right) = K(x_i, x_j) - \frac{1}{n}\sum_{k=1}^{n} K(x_i, x_k) - \frac{1}{n}\sum_{k=1}^{n} K(x_j, x_k) + \frac{1}{n^2}\sum_{l,k=1}^{n} K(x_l, x_k)$ (slide by Rita Osadchy)

  29. Normalizing the feature space • $\tilde{K}(x_i, x_j) = K(x_i, x_j) - \frac{1}{n}\sum_{k=1}^{n} K(x_i, x_k) - \frac{1}{n}\sum_{k=1}^{n} K(x_j, x_k) + \frac{1}{n^2}\sum_{l,k=1}^{n} K(x_l, x_k)$ • In matrix form: $\tilde{K} = K - \mathbf{1}_{1/n} K - K \mathbf{1}_{1/n} + \mathbf{1}_{1/n} K \mathbf{1}_{1/n}$, where $\mathbf{1}_{1/n}$ is the $n \times n$ matrix with all elements equal to $1/n$. (slide by Rita Osadchy)
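
In numpy this centering is a few lines over the uncentered kernel matrix (an illustrative sketch that matches the element-wise formula on the previous slide):

import numpy as np

def center_kernel(K):
    n = K.shape[0]
    one_n = np.ones((n, n)) / n   # the matrix 1_{1/n} with all entries 1/n
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n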

  30. Summary of Kernel PCA • Pick a kernel. • Construct the normalized kernel matrix of the data (dimension $n \times n$): $\tilde{K} = K - \mathbf{1}_{1/n} K - K \mathbf{1}_{1/n} + \mathbf{1}_{1/n} K \mathbf{1}_{1/n}$ • Solve the eigenvalue problem: $\tilde{K}\alpha_i = \lambda_i \alpha_i$ • For any data point (new or old), we can represent it as $y_j = \sum_{i=1}^{n} \alpha_{ji}\, K(x, x_i), \; j = 1, \dots, d$ (slide by Rita Osadchy). A numpy sketch of these steps follows below.
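
A compact end-to-end sketch of these steps (illustrative numpy code; the RBF kernel, its bandwidth, and all names are assumed choices rather than part of the lecture). The last line of kernel_pca evaluates $y_j(x_i) = \sum_k \alpha_{jk} K(x_i, x_k)$ for the training points; projecting a genuinely new point would additionally require centering its kernel row in the same way.

import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def kernel_pca(X, n_components=2, sigma=1.0):
    K = rbf_kernel_matrix(X, sigma)                          # 1. pick a kernel
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n      # 2. center the kernel matrix
    mu, A = np.linalg.eigh(K_c)                              # 3. solve the eigenvalue problem
    mu, A = mu[::-1][:n_components], A[:, ::-1][:, :n_components]
    A = A / np.sqrt(np.maximum(mu, 1e-12))                   # normalize: alpha_j^T K alpha_j = 1
    return K_c @ A                                           # 4. projections, one row per point

# Toy usage: two concentric circles in R^2.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
r = np.r_[np.ones(100), 3 * np.ones(100)]
X = np.c_[r * np.cos(t), r * np.sin(t)]
Y = kernel_pca(X, n_components=2, sigma=1.0)
print(Y.shape)   # (200, 2)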

  31. Input points before kernel PCA (figure). http://en.wikipedia.org/wiki/Kernel_principal_component_analysis (slide by Rita Osadchy)

  32. Output after kernel PCA • The three groups are distinguishable using the first component only (figure). (slide by Rita Osadchy)

  33. Example: De-noising images (figure). (slide by Rita Osadchy)

  34. Properties of KPCA • Kernel PCA can give a good re-encoding of the data when it lies along a non-linear manifold. • The kernel matrix is $n \times n$, so kernel PCA will have difficulties if we have lots of data points. (slide by Rita Osadchy)
