On the Eigenspectrum of the Gram Matrix and the Generalisation Error of Kernel PCA (Shawe-Taylor et al., 2005)
Ameet Talwalkar
02/13/07
Outline
- Background
  - Motivation
  - PCA, MDS
  - (Isomap)
  - Kernel PCA
- Generalisation Error of Kernel PCA
Dimensional Reduction: Motivation
- Lossy
  - Computational efficiency
  - Visualization of data requires 2D or 3D representations
  - Curse of Dimensionality: learning algorithms require "reasonably" good sampling
  - (Diagram: dimensionality reduction x -> x' turns an intractable learning problem A(x) into a tractable one A(x'))
- Lossless - "Manifold Learning"
  - Assumes existence of an "intrinsic dimension," or a reduced representation containing all independent variables
Linear Dimensional Reduction
- Assumes input data is a linear function of the independent variables
- Common methods:
  - Principal Component Analysis (PCA)
  - Multidimensional Scaling (MDS)
PCA - Big Picture
- Linearly transform input data in a way that:
  - Maximizes signal (variance)
  - Minimizes redundancy of signal (covariance)
PCA - Simple Example
- Original data points
  - E.g. shoe size measured in ft and cm
- y = x provides a good approximation of the data
PCA - Simple Example (cont.)
- Original data restored using only the first principal component
PCA - Covariance
- Covariance is a measure of how much two variables vary together:
  cov(x, y) = E[(x - x̄)(y - ȳ)]
- cov(x, x) = var(x)
- If x and y are independent, then cov(x, y) = 0
PCA - Covariance Matrix
- Stores pairwise covariances of the variables
  - Diagonal entries are the variances
  - Symmetric, positive semi-definite
- Start with m column-vector observations of n variables
  - The covariance matrix is n x n:
    C_X = E[(X - E[X])(X - E[X])^T]
  - For mean-centered data:
    C_X = (1/m) X X^T = (1/m) Σ_{i=1}^m x_i x_i^T
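A minimal sketch of this computation in Python with NumPy, assuming the data are stored as on the slide (m column-vector observations of n variables, so X is n x m); the variable names and the random data are illustrative, not from the paper.

```python
import numpy as np

# Illustrative data: n = 3 variables, m = 100 observations stored as columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))

# Center each variable (row) so that C = (1/m) X X^T matches E[(X - E[X])(X - E[X])^T].
X_centered = X - X.mean(axis=1, keepdims=True)

m = X_centered.shape[1]
C = (X_centered @ X_centered.T) / m   # n x n covariance matrix

# Diagonal entries are the variances of the individual variables.
print(np.allclose(np.diag(C), X_centered.var(axis=1)))  # True
```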
Eigendecomposition
- Eigenvectors (v) and eigenvalues (λ) of an n x n matrix A are pairs (v, λ) such that:
  Av = λv
- If A is a real symmetric matrix, it can be diagonalized into A = E D E^T
  - E = A's orthonormal eigenvectors
  - D = diagonal matrix of A's eigenvalues
  - A is positive semi-definite => eigenvalues are non-negative
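A short sketch of the symmetric eigendecomposition A = E D E^T; np.linalg.eigh is the appropriate routine because the matrices on these slides (covariance and Gram matrices) are real symmetric. The matrix built here is just an example.

```python
import numpy as np

# A real symmetric, positive semi-definite matrix, e.g. B B^T.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 6))
A = B @ B.T

# eigh is specialised for symmetric matrices: eigenvalues in ascending order,
# orthonormal eigenvectors as the columns of E.
eigvals, E = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(A, E @ D @ E.T))   # A = E D E^T
print(np.all(eigvals >= -1e-10))     # PSD => eigenvalues non-negative (up to round-off)
```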
PCA - Goal (x3)
- Linearly transform input data in a way that:
  - Maximizes signal (variance)
  - Minimizes redundancy of signal (covariance)
- Algorithm:
  - Select the variance-maximizing direction in input space
  - Find the next variance-maximizing direction that is orthogonal to all previously selected directions
  - Repeat k-1 times
- Equivalently: find a transformation P such that Y = PX and C_Y is diagonalized
- Solution: project the data onto the eigenvectors of C_X
PCA - Algorithm
Goal: Find P where Y = PX s.t. C_Y is diagonalized
- Select P = E^T, i.e. a matrix whose rows are the eigenvectors of C_X
- Then, writing A = (1/m) X X^T = E D E^T = P^T D P:
  C_Y = (1/m) Y Y^T
      = (1/m) (PX)(PX)^T
      = (1/m) P X X^T P^T
      = P A P^T
      = P (P^T D P) P^T
      = D
  (inverse = transpose for an orthonormal matrix)
- C_Y is diagonalized
  - The PCs are the eigenvectors of C_X
  - The i-th diagonal value of C_Y is the variance of X along p_i
- Note: the eigenvectors in E are orthonormal
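A compact PCA sketch following this recipe (center, form C_X, take the top-k eigenvectors as the rows of P, project Y = PX). The function name, variable names, and the toy data mirror the earlier shoe-size example but are my own illustration, not code from the paper.

```python
import numpy as np

def pca_project(X, k):
    """Project an n x m data matrix X (observations as columns) onto its top-k principal components."""
    X_centered = X - X.mean(axis=1, keepdims=True)
    m = X_centered.shape[1]
    C_x = (X_centered @ X_centered.T) / m        # n x n covariance matrix

    eigvals, eigvecs = np.linalg.eigh(C_x)       # ascending eigenvalue order
    order = np.argsort(eigvals)[::-1]            # sort descending by variance
    P = eigvecs[:, order[:k]].T                  # rows of P = top-k eigenvectors of C_X

    Y = P @ X_centered                           # k x m projected data
    return Y, P, eigvals[order[:k]]

# Illustrative use: 2-D data that is nearly 1-D (y ≈ x), reduced to k = 1.
rng = np.random.default_rng(2)
t = rng.normal(size=200)
X = np.vstack([t, t + 0.05 * rng.normal(size=200)])
Y, P, variances = pca_project(X, k=1)
print(P, variances)   # the first PC points roughly along (1, 1)/sqrt(2)
```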
Gram Matrix (Kernel Matrix)
- Given X, a collection of m column-vector observations of n variables
- Gram matrix of X: the matrix of dot products of the inputs
  - m x m, real, symmetric
  - Positive semi-definite
  - A "similarity matrix"
- K = X^T X, i.e. K_ij = x_i · x_j
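A small sketch of the Gram matrix K = X^T X for observations stored as columns of X, plus a kernelised variant; the RBF kernel and the value of sigma are only an illustration of how K generalises beyond plain dot products, not a choice made on the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 30))           # n = 5 variables, m = 30 observations as columns

# Linear Gram matrix: K_ij = x_i . x_j
K = X.T @ X                            # m x m, symmetric, positive semi-definite
print(K.shape, np.allclose(K, K.T))

# Kernel ("similarity") matrix with an RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
sigma = 1.0
sq_dists = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # m x m squared distances
K_rbf = np.exp(-sq_dists / (2 * sigma ** 2))
```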
Classical Multidimensional Scaling
- Given m objects and a dissimilarity δ_ij for each pair, find a space in which δ_ij ≈ Euclidean distance
- If δ_ij is a Euclidean distance (δ_ij^2 = squared Euclidean distance):
  - Can convert the dissimilarity matrix to a Gram matrix (or we can just start with the Gram matrix)
  - MDS yields the same answer as PCA
Classical Multidimensional Scaling
- Convert the dissimilarity matrix to a Gram matrix (K)
- Eigendecomposition of K:
  K = E D E^T = E D^{1/2} D^{1/2} E^T = (E D^{1/2})(E D^{1/2})^T
- K = X^T X, so X = (E D^{1/2})^T
- Reduce dimension:
  - Construct X from a subset of the eigenvectors/eigenvalues
- Identical to PCA
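A sketch of classical MDS starting from squared pairwise Euclidean distances: double-center to recover the Gram matrix, eigendecompose, and keep the top-k components as coordinates. The double-centering step K = -1/2 J D² J is the standard conversion the slide alludes to but does not spell out; the function and test data are illustrative.

```python
import numpy as np

def classical_mds(D_sq, k):
    """Embed points given an m x m matrix of squared pairwise Euclidean distances."""
    m = D_sq.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m         # centering matrix
    K = -0.5 * J @ D_sq @ J                     # Gram matrix recovered from distances

    eigvals, E = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:k]       # largest k eigenvalues
    L = np.sqrt(np.maximum(eigvals[order], 0))  # clip tiny negatives from round-off
    return E[:, order] * L                      # m x k coordinates, rows = embedded points

# Illustrative check: recover a 2-D configuration from its own distance matrix.
rng = np.random.default_rng(4)
pts = rng.normal(size=(10, 2))
D_sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)
emb = classical_mds(D_sq, k=2)
D_sq_emb = np.sum((emb[:, None, :] - emb[None, :, :]) ** 2, axis=2)
print(np.allclose(D_sq, D_sq_emb))              # True: pairwise distances are preserved
```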
Limitations of Linear Methods
- Cannot account for a non-linear relationship of the data in input space
- Data may still have a linear relationship in some feature space
- Isomap: use geodesic distance to recover the manifold
  - Length of the shortest curve on a manifold connecting two points on the manifold
- (Figure: two points on a curved manifold with a small Euclidean distance but a large geodesic distance)
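A rough sketch of the geodesic-distance idea behind Isomap: build a nearest-neighbour graph weighted by Euclidean distance and take graph shortest paths as geodesic estimates. This uses SciPy's shortest-path routine; the neighbourhood size k and the toy curve are illustrative values, not parameters from the paper.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(points, k=6):
    """Approximate geodesic distances via shortest paths on a k-NN graph."""
    m = points.shape[0]
    D = np.sqrt(np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2))

    # Keep only each point's k nearest neighbours; zero entries mean "no edge".
    graph = np.zeros_like(D)
    for i in range(m):
        nn = np.argsort(D[i])[1:k + 1]          # skip the point itself
        graph[i, nn] = D[i, nn]

    # Symmetrise and run all-pairs shortest paths (Dijkstra).
    graph = np.maximum(graph, graph.T)
    return shortest_path(graph, method="D", directed=False)

# Points along a curved 1-D manifold: the geodesic distance between the endpoints
# is much larger than their straight-line Euclidean distance.
theta = np.linspace(0, 3 * np.pi / 2, 60)
curve = np.column_stack([np.cos(theta), np.sin(theta)])
G = geodesic_distances(curve, k=4)
print(np.linalg.norm(curve[0] - curve[-1]), G[0, -1])
```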
Local Estimation of Manifolds
- Small patches on a non-linear manifold look linear
- Locally linear neighborhoods are defined in two ways:
  - k-nearest neighbors: find the k nearest points to a given point
  - ε-ball: find all points that lie within ε of a given point
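A small sketch contrasting the two neighbourhood definitions on the same data; the values of k and ε are arbitrary illustrative choices.

```python
import numpy as np

def knn_neighbors(points, i, k):
    """Indices of the k nearest points to points[i] (excluding the point itself)."""
    dists = np.linalg.norm(points - points[i], axis=1)
    return np.argsort(dists)[1:k + 1]

def eps_ball_neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i] (excluding the point itself)."""
    dists = np.linalg.norm(points - points[i], axis=1)
    return np.flatnonzero((dists <= eps) & (dists > 0))

rng = np.random.default_rng(5)
pts = rng.normal(size=(50, 2))
print(knn_neighbors(pts, 0, k=5))            # always exactly 5 neighbours
print(eps_ball_neighbors(pts, 0, eps=0.5))   # count depends on local density
```

The design trade-off: k-NN guarantees a fixed neighbourhood size but its radius varies with local density, while the ε-ball fixes the radius but may return very few (or very many) points.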