Lecture 25: Autoencoders and Kernel PCA. Aykut Erdem, January 2017, Hacettepe University
Today • Motivation • PCA algorithms • Applications • PCA shortcomings • Autoencoders • Kernel PCA
Autoencoders
Relation to Neural Networks • PCA is closely related to a particular form of neural network • An autoencoder is a neural network whose target outputs are its own inputs • The goal is to minimize reconstruction error. slide by Sanja Fidler
Autoencoders • Define $z = f(Wx)$ and $\hat{x} = g(Vz)$ • Goal: $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \| x^{(n)} - \hat{x}^{(n)} \|^2$ • If $g$ and $f$ are linear: $\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \| x^{(n)} - VW x^{(n)} \|^2$ • In other words, the optimal solution is PCA: the best linear autoencoder reconstructs through the subspace spanned by the top principal components. slide by Sanja Fidler
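To make the linear case concrete, here is a small numpy check (not from the lecture; the toy dataset and the choice k = 3 are illustrative assumptions): using the top-k principal directions as encoder/decoder weights attains the PCA reconstruction error, while an arbitrary rank-k projection does worse. The optimal W, V are not unique; any pair whose product VW projects onto the top-k principal subspace is equally good.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # toy data, 500 samples
Xc = X - X.mean(axis=0)                                     # center the data

# PCA via SVD: rows of Vt are the principal directions
k = 3
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
U_k = Vt[:k].T                                              # (10, k)

# Linear autoencoder with encoder W = U_k^T and decoder V = U_k
Z = Xc @ U_k                  # codes z = W x
X_hat = Z @ U_k.T             # reconstructions x_hat = V z
pca_err = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))
print("reconstruction error with PCA weights:", pca_err)

# Any other rank-k choice of VW does no better, e.g. a random projection:
Q, _ = np.linalg.qr(rng.normal(size=(10, k)))
rand_err = np.mean(np.sum((Xc - Xc @ Q @ Q.T) ** 2, axis=1))
print("reconstruction error with random weights:", rand_err)  # larger
```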
Autoencoders: Nonlinear PCA • What if $g(\cdot)$ is not linear? • Then we are basically doing nonlinear PCA • There are some subtleties, but in general this is an accurate description. slide by Sanja Fidler
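Below is a minimal sketch of such a nonlinear autoencoder of the form $z = f(Wx)$, $\hat{x} = g(Vz)$ with $f = g = \tanh$, trained by plain gradient descent; the toy 2-D dataset, the 1-D code size, and the learning rate are assumptions for illustration, not the lecture's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data near a 1-D curve (an arc) embedded in 2-D
t = rng.uniform(0.0, np.pi, size=(500, 1))
X = np.hstack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(500, 2))
N, D, K = X.shape[0], X.shape[1], 1      # samples, input dim, code dim

W = 0.1 * rng.normal(size=(D, K))        # encoder weights
V = 0.1 * rng.normal(size=(K, D))        # decoder weights
lr = 1.0

for step in range(5000):
    A1 = X @ W
    Z = np.tanh(A1)                      # z = f(Wx), f = tanh
    A2 = Z @ V
    X_hat = np.tanh(A2)                  # x_hat = g(Vz), g = tanh
    err = X_hat - X
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))

    # Backpropagation through the two layers
    dA2 = (err / N) * (1.0 - X_hat ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dV = Z.T @ dA2
    dZ = dA2 @ V.T
    dA1 = dZ * (1.0 - Z ** 2)
    dW = X.T @ dA1

    W -= lr * dW
    V -= lr * dV

print("final reconstruction loss:", loss)
```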
Comparing Reconstructions [figure: real data alongside reconstructions from a 30-d deep autoencoder, 30-d logistic PCA, and 30-d PCA] slide by Sanja Fidler
Kernel PCA
Dimensionality Reduction • Data representation: inputs are real-valued vectors in a high-dimensional space. • Linear structure (PCA): does the data live in a low-dimensional subspace? • Nonlinear structure: does the data live on a low-dimensional submanifold? slide by Rita Osadchy
The “magic” of high dimensions • Given some problem, how do we know what classes of functions are capable of solving it? • VC (Vapnik-Chervonenkis) theory tells us that mappings into a space of higher dimension than the input space often provide greater classification power. slide by Rita Osadchy
Example in $\mathbb{R}^2$: these classes are linearly inseparable in the input space. [figure omitted] slide by Rita Osadchy
Example: High-Dimensional Mapping • We can make the problem linearly separable by a simple mapping $\Phi: \mathbb{R}^2 \to \mathbb{R}^3$, $(x_1, x_2) \mapsto (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2)$ slide by Rita Osadchy
Kernel Trick • High-dimensional mappings can seriously increase computation time. • Can we get around this problem and still get the benefit of high dimensions? • Yes! The kernel trick: $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ • Given any algorithm that can be expressed solely in terms of dot products, this trick allows us to construct different nonlinear versions of it. slide by Rita Osadchy
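A quick numerical illustration of this identity, using the explicit degree-2 map from the previous slide; pairing it with the homogeneous polynomial kernel $K(x, y) = (x^T y)^2$ is a standard choice stated here as an assumption, not taken from the slides.

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3 from the previous slide."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2.0) * x1 * x2])

def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: K(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)

# The kernel computes the dot product in the 3-D feature space
# without ever forming phi(x) explicitly.
print(np.dot(phi(x), phi(y)))   # explicit mapping, then dot product
print(poly2_kernel(x, y))       # same value via the kernel trick
```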
Popular Kernels [table of common kernel functions omitted] slide by Rita Osadchy
Kernel Principal Component Analysis • Extends conventional principal component analysis (PCA) to a high-dimensional feature space using the “kernel trick”. • Can extract up to $n$ (the number of samples) nonlinear principal components without expensive computations. slide by Rita Osadchy
Making PCA Non-Linear • Suppose that instead of using the points $x_i$ we first map them to some nonlinear feature space $\phi(x_i)$ - E.g., using polar instead of Cartesian coordinates would help us deal with the circle. • Extract the principal components in that space (PCA) • The result will be non-linear in the original data space! slide by Rita Osadchy
Derivation • Suppose that the mean of the data in the feature space is $\mu = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i) = 0$ • Covariance: $C = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^T$ • Eigenvectors: $C v = \lambda v$ slide by Rita Osadchy
Derivation • Eigenvectors can be expressed as a linear combination of the features: $v = \sum_{i=1}^{n} \alpha_i \phi(x_i)$ • Proof: $C v = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^T v = \lambda v$, thus $v = \frac{1}{\lambda n}\sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^T v = \frac{1}{\lambda n}\sum_{i=1}^{n} (\phi(x_i) \cdot v)\, \phi(x_i)$ slide by Rita Osadchy
Showing that $x x^T v = (x \cdot v)\, x$ slide by Rita Osadchy
Derivation • So, from before we had $v = \frac{1}{\lambda n}\sum_{i=1}^{n} (\phi(x_i) \cdot v)\, \phi(x_i)$, where each $(\phi(x_i) \cdot v)$ is just a scalar • This means that all solutions $v$ with $\lambda \neq 0$ lie in the span of $\phi(x_1), \ldots, \phi(x_n)$, i.e., $v = \sum_{i=1}^{n} \alpha_i \phi(x_i)$ • Finding the eigenvectors is equivalent to finding the coefficients $\alpha_i$ slide by Rita Osadchy
Derivation • By substituting this back into the equation we get: $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^T \sum_{l=1}^{n} \alpha_{jl}\,\phi(x_l) = \lambda_j \sum_{l=1}^{n} \alpha_{jl}\,\phi(x_l)$ • We can rewrite it as $\frac{1}{n}\sum_{i=1}^{n} \phi(x_i) \sum_{l=1}^{n} \alpha_{jl}\, K(x_i, x_l) = \lambda_j \sum_{l=1}^{n} \alpha_{jl}\,\phi(x_l)$ • Multiply this by $\phi(x_k)^T$ from the left: $\frac{1}{n}\sum_{i=1}^{n} \phi(x_k)^T \phi(x_i) \sum_{l=1}^{n} \alpha_{jl}\, K(x_i, x_l) = \lambda_j \sum_{l=1}^{n} \alpha_{jl}\, \phi(x_k)^T \phi(x_l)$ slide by Rita Osadchy
Derivation • By plugging in the kernel and rearranging we get: $K^2 \alpha_j = n \lambda_j K \alpha_j$ • We can remove a factor of $K$ from both sides (this only affects eigenvectors with zero eigenvalue, which will not be a principal component anyway): $K \alpha_j = n \lambda_j \alpha_j$ • We have a normalization condition for the $\alpha_j$ vectors: $v_j^T v_j = 1 \;\Rightarrow\; \sum_{k=1}^{n}\sum_{l=1}^{n} \alpha_{jl}\,\alpha_{jk}\, \phi(x_l)^T \phi(x_k) = 1 \;\Rightarrow\; \alpha_j^T K \alpha_j = 1$ slide by Rita Osadchy
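A small numerical check of this normalization condition, as a sketch: the explicit degree-2 feature map used earlier stands in for $\phi$ so that $v = \sum_i \alpha_i \phi(x_i)$ can be formed directly, and centering is ignored for simplicity (the identity $v^T v = \alpha^T K \alpha$ holds either way). These choices are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

def phi(x):                                  # explicit degree-2 feature map (assumed stand-in)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

Phi = np.array([phi(x) for x in X])          # (n, 3) explicit features
K = Phi @ Phi.T                              # kernel matrix K_ij = phi(x_i)^T phi(x_j)

eigvals, eigvecs = np.linalg.eigh(K)
alpha = eigvecs[:, -1]                       # eigenvector for the largest eigenvalue

# Rescale so that alpha^T K alpha = 1
alpha = alpha / np.sqrt(alpha @ K @ alpha)

v = Phi.T @ alpha                            # v = sum_i alpha_i phi(x_i)
print(np.linalg.norm(v))                     # ~1.0, i.e. v^T v = 1
```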
Derivation • By multiplying $K \alpha_j = n \lambda_j \alpha_j$ by $\alpha_j^T$ and using the normalization condition we get: $n \lambda_j\, \alpha_j^T \alpha_j = 1, \quad \forall j$ • For a new point $x$, its projection onto the $j$-th principal component is: $\phi(x)^T v_j = \sum_{i=1}^{n} \alpha_{ji}\, \phi(x)^T \phi(x_i) = \sum_{i=1}^{n} \alpha_{ji}\, K(x, x_i)$ slide by Rita Osadchy
Normalizing the feature space • In general, $\phi(x_i)$ may not be zero mean. • Centered features: $\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{n}\sum_{k=1}^{n} \phi(x_k)$ • The corresponding kernel is: $\tilde{K}(x_i, x_j) = \tilde{\phi}(x_i)^T \tilde{\phi}(x_j) = \left(\phi(x_i) - \frac{1}{n}\sum_{k=1}^{n}\phi(x_k)\right)^T \left(\phi(x_j) - \frac{1}{n}\sum_{k=1}^{n}\phi(x_k)\right) = K(x_i, x_j) - \frac{1}{n}\sum_{k=1}^{n} K(x_i, x_k) - \frac{1}{n}\sum_{k=1}^{n} K(x_j, x_k) + \frac{1}{n^2}\sum_{l,k=1}^{n} K(x_l, x_k)$ slide by Rita Osadchy
Normalizing the feature space • $\tilde{K}(x_i, x_j) = K(x_i, x_j) - \frac{1}{n}\sum_{k=1}^{n} K(x_i, x_k) - \frac{1}{n}\sum_{k=1}^{n} K(x_j, x_k) + \frac{1}{n^2}\sum_{l,k=1}^{n} K(x_l, x_k)$ • In matrix form: $\tilde{K} = K - \mathbf{1}_{1/n} K - K\, \mathbf{1}_{1/n} + \mathbf{1}_{1/n} K\, \mathbf{1}_{1/n}$, where $\mathbf{1}_{1/n}$ is the $n \times n$ matrix with all elements equal to $1/n$. slide by Rita Osadchy
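A numpy sketch of this centering step, checked against explicitly centering the features; the degree-2 map is again used as an assumed stand-in feature space so the reference computation is possible.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

def phi(x):                                  # assumed explicit feature map for the check
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

Phi = np.array([phi(x) for x in X])
K = Phi @ Phi.T
n = K.shape[0]

# Matrix form of the centering: K_tilde = K - 1_{1/n} K - K 1_{1/n} + 1_{1/n} K 1_{1/n}
one_n = np.full((n, n), 1.0 / n)
K_tilde = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Reference: center the explicit features, then form the kernel matrix
Phi_c = Phi - Phi.mean(axis=0)
print(np.allclose(K_tilde, Phi_c @ Phi_c.T))   # True
```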
Summary of Kernel PCA • Pick a kernel • Construct the normalized (centered) kernel matrix of the data (dimension $n \times n$): $\tilde{K} = K - \mathbf{1}_{1/n} K - K\, \mathbf{1}_{1/n} + \mathbf{1}_{1/n} K\, \mathbf{1}_{1/n}$ • Solve the eigenvalue problem: $\tilde{K} \alpha_i = \lambda_i \alpha_i$ • For any data point (new or old), we can represent it as $y_j = \sum_{i=1}^{n} \alpha_{ji}\, K(x, x_i), \quad j = 1, \ldots, d$ slide by Rita Osadchy
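To tie the summary together, here is a compact numpy sketch of kernel PCA (a sketch, not the lecture's code); the Gaussian/RBF kernel, the bandwidth gamma, and the concentric-rings dataset are illustrative assumptions. It follows the steps above: build K, center it, solve the eigenvalue problem, normalize the coefficients so that $\alpha^T \tilde{K} \alpha = 1$, and project.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix K_ij = exp(-gamma * ||x_i - y_j||^2)."""
    sq = np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_pca(X, d=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)

    # 1. Center the kernel matrix
    one_n = np.full((n, n), 1.0 / n)
    K_tilde = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 2. Solve the eigenvalue problem K_tilde alpha = lambda alpha
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]       # largest eigenvalues first

    # 3. Normalize the coefficients: unit eigenvectors divided by sqrt(lambda)
    #    so that alpha^T K_tilde alpha = 1
    alphas = eigvecs[:, :d] / np.sqrt(eigvals[:d])

    # 4. Project the training points: y_j = sum_i alpha_ji K_tilde(x, x_i)
    return K_tilde @ alphas

# Example: three noisy concentric rings, projected onto two kernel principal components
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=300)
radii = np.repeat([1.0, 2.0, 3.0], 100) + 0.05 * rng.normal(size=300)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
Y = kernel_pca(X, d=2)
print(Y.shape)   # (300, 2)
```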
Input points before kernel PCA [figure omitted] slide by Rita Osadchy (image: http://en.wikipedia.org/wiki/Kernel_principal_component_analysis)
Output after kernel PCA: the three groups are distinguishable using the first component only. [figure omitted] slide by Rita Osadchy
Example: De-noising images [figure omitted] slide by Rita Osadchy
Properties of KPCA • Kernel PCA can give a good re-encoding of the data when it lies along a non-linear manifold. • The kernel matrix is $n \times n$, so kernel PCA will have difficulties if we have lots of data points. slide by Rita Osadchy