
  1. Principal Component Analysis Ken Kreutz-Delgado (Nuno Vasconcelos) UCSD — ECE 175A — Winter 2012

  2. Curse of dimensionality Typical observation in Bayes decision theory: • Error increases when the number of features is large. Even for simple models (e.g. Gaussian) we need a large number of examples n to have good estimates. Q: what does "large" mean? It depends on the dimension of the space. The best way to see this is to think of a histogram: • suppose you have 100 points and you need at least 10 bins per axis to get a reasonable quantization. For uniform data you get, on average:

      dimension      1      2      3
      points/bin    10      1      0.1

  which is decent in 1D, bad in 2D, and terrible in 3D (9 out of every 10 bins are empty!)
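
A quick numeric sanity check of the points-per-bin argument (a minimal sketch, not part of the slides; it just evaluates n / b**d for n = 100 points and b = 10 bins per axis):

```python
# Average number of points per bin for uniform data:
# n points, b bins per axis, d dimensions -> n / b**d points per bin.
n, b = 100, 10
for d in (1, 2, 3):
    print(f"dimension {d}: {n / b**d:.1f} points per bin on average")
# dimension 1: 10.0, dimension 2: 1.0, dimension 3: 0.1
```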

  3. Curse of Dimensionality This is the curse of dimensionality: • For a given classifier, the number of examples required to maintain classification accuracy increases exponentially with the dimension of the feature space • In higher dimensions the classifier has more parameters • Therefore: higher complexity and harder to learn

  4. Dimensionality Reduction What do we do about this? Avoid unnecessary dimensions. "Unnecessary" features arise in two ways: 1. features are not discriminant 2. features are not independent (they are highly correlated) Non-discriminant means that they do not separate the classes well [Figure: a discriminant feature vs. a non-discriminant one]

  5. Dimensionality Reduction Q: How do we detect the presence of feature correlations? A: The data "lives" in a low-dimensional subspace (up to some amount of noise). [Figure: salary vs. car-loan scatter; projecting the data onto a 1D subspace gives the new feature y = aᵀx] In the example above we have a 3D hyperplane in 5D. If we can find this hyperplane we can: • Project the data onto it • Get rid of two dimensions without introducing significant error
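
To make the projection idea concrete, here is a small sketch (not from the lecture; the salary/car-loan data and the direction a are synthetic assumptions) showing that highly correlated 2D data can be replaced by a single projected feature y = aᵀx with little error:

```python
import numpy as np

rng = np.random.default_rng(0)
salary = rng.normal(50, 10, size=200)            # hypothetical feature 1
car_loan = 0.3 * salary + rng.normal(0, 1, 200)  # hypothetical, highly correlated feature 2
X = np.column_stack([salary, car_loan])          # n x 2 data matrix

a = np.array([1.0, 0.3])
a = a / np.linalg.norm(a)        # unit vector spanning the 1D subspace
y = X @ a                        # new 1D feature: y = a^T x for each example
X_hat = np.outer(y, a)           # map y back into 2D
err = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print("mean squared projection error:", err)     # small ...
print("total data variance:", X.var(axis=0).sum())  # ... compared to the data's spread
```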

  6. Principal Components Basic idea: • If the data lives in a (lower-dimensional) subspace, it is going to look very flat when viewed from the full space, e.g. a 2D subspace in 3D, or a 1D subspace in 2D This means that: • If we fit a Gaussian to the data, the iso-probability contours are going to be highly skewed ellipsoids • The directions that explain most of the variance of the fitted data give the Principal Components of the data

  7. Principal Components How do we find these ellipsoids? When we talked about metrics we said that the • Mahalanobis distance $d^2(x, y) = (x - y)^T \Sigma^{-1} (x - y)$ measures the "natural" units for the problem, because it is "adapted" to the covariance of the data We also know that the Gaussian density is $p(x) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left\{ -\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right\}$ • What is special about it is that it uses $\Sigma^{-1}$ Hence, information about possible subspace structure must be in the covariance matrix Σ
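
A minimal numpy sketch of the Mahalanobis distance (the 2D covariance below is an invented example):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance d^2(x, mu) = (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))  # solve instead of an explicit inverse

mu = np.zeros(2)
Sigma = np.array([[4.0, 0.0],
                  [0.0, 0.25]])   # much more variance along the first axis
print(mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma))  # 1.0: "one sigma" along axis 1
print(mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma))  # 16.0: far in the low-variance direction
```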

  8. Multivariate Gaussian Review The equiprobability contours (level sets) of a Gaussian are the points such that $(x - \mu)^T \Sigma^{-1} (x - \mu) = c$ for some constant c Let's consider the change of variable $z = x - \mu$, which only moves the origin by μ The equation $z^T \Sigma^{-1} z = c$ is the equation of an ellipse (a hyperellipse) This is easy to see when Σ is diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$: $\frac{z_1^2}{\sigma_1^2} + \cdots + \frac{z_d^2}{\sigma_d^2} = 1$ (taking the constant to be 1)

  9. Gaussian Review This is the equation of an ellipse with principal lengths σᵢ • E.g. when d = 2, $\frac{z_1^2}{\sigma_1^2} + \frac{z_2^2}{\sigma_2^2} = 1$ is the ellipse [Figure: ellipse in the (z₁, z₂) plane with semi-axes σ₁ and σ₂]

  10. Gaussian Review Introduce a transformation $y = \Phi z$ Then y has covariance $\Sigma_y = \Phi \Sigma_z \Phi^T$ If Φ is proper orthogonal this is just a rotation [Figure: y = Φz maps the axis-aligned ellipse with semi-axes σ₁, σ₂ in the (z₁, z₂) plane to a rotated ellipse in the (y₁, y₂) plane with axes along φ₁ and φ₂] We obtain a rotated ellipse with principal components φ₁ and φ₂, which are the columns of Φ Note that $\Sigma_y = \Phi \, \mathrm{diag}(\sigma_1^2, \sigma_2^2) \, \Phi^T$ is the eigendecomposition of $\Sigma_y$

  11. Principal Component Analysis (PCA) If y is Gaussian with covariance Σ, the equiprobability contours are the ellipses whose • Principal Components φᵢ are the eigenvectors of Σ • Principal Values (lengths) σᵢ are the square roots of the eigenvalues λᵢ of Σ By computing the eigenvalues we know if the data is flat: σ₁ >> σ₂: flat; σ₁ = σ₂: not flat [Figure: an elongated ellipse (σ₁ >> σ₂) vs. a nearly circular one (σ₁ = σ₂) in the (y₁, y₂) plane]
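
A sketch of PCA as the eigendecomposition of the sample covariance, on synthetic data (not the course's code; numpy's cov and eigh are used for convenience):

```python
import numpy as np

rng = np.random.default_rng(1)
# "Flat" 2D data: large variance along one direction, small along the other.
X = rng.normal(size=(500, 2)) @ np.diag([3.0, 0.3])   # n x d

Sigma = np.cov(X, rowvar=False, bias=True)      # sample covariance (1/n convention)
eigvals, eigvecs = np.linalg.eigh(Sigma)        # eigh: Sigma is symmetric
order = np.argsort(eigvals)[::-1]               # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("principal values (sigma_i):", np.sqrt(eigvals))   # roughly [3.0, 0.3] -> flat data
print("principal components (columns):\n", eigvecs)
```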

  12. Learning-based PCA

  13. Learning-based PCA

  14. Principal Component Analysis How do we determine the number of eigenvectors to keep? One possibility is to plot the eigenvalue magnitudes • This is called a Scree Plot • Usually there is a fast decrease in the eigenvalue magnitude followed by a flat area • One good choice is the knee of this curve
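
A sketch of a scree plot (assumes matplotlib is available; the eigenvalues below are hypothetical and would normally come from the covariance eigendecomposition):

```python
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([5.2, 2.1, 0.4, 0.3, 0.25, 0.2])   # hypothetical, sorted eigenvalues

plt.plot(np.arange(1, len(eigvals) + 1), eigvals, "o-")
plt.xlabel("component index k")
plt.ylabel("eigenvalue magnitude")
plt.title("Scree plot: keep components up to the knee of the curve")
plt.show()
```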

  15. Principal Component Analysis Another possibility: Percentage of Explained Variance • Remember that the eigenvalues are a measure of variance along the principal directions (eigenvectors): λᵢ = σᵢ² [Figure: y = Φz maps the ellipse with semi-axes σ₁, σ₂ onto the principal axes, with λ₁ = σ₁² and λ₂ = σ₂²] • The ratio $r_k = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{n} \sigma_i^2}$ measures the % of the total variance contained in the top k eigenvalues • It is a measure of the fraction of data variability along the associated eigenvectors

  16. Principal Component Analysis Given r_k, a natural criterion is to pick the eigenvectors that explain p% of the data variability • This can be done by plotting the ratio r_k as a function of k • E.g. we need 3 eigenvectors to cover 70% of the variability of this dataset
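
A sketch of the explained-variance criterion (the eigenvalues below are hypothetical, so the chosen k will differ from the 3-eigenvector example on the slide):

```python
import numpy as np

eigvals = np.array([5.2, 2.1, 0.9, 0.4, 0.3, 0.1])   # hypothetical, sorted descending
p = 0.70                                              # target fraction of variance

r = np.cumsum(eigvals) / np.sum(eigvals)              # r_k for k = 1..n
k = int(np.searchsorted(r, p) + 1)                    # smallest k with r_k >= p
print("explained-variance ratios:", np.round(r, 2))
print(f"keep k = {k} eigenvectors to explain at least {p:.0%} of the variance")
```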

  17. PCA by SVD There is an alternative way to compute the principal components, based on the singular value decomposition ("Condensed") Singular Value Decomposition (SVD): • Any full-rank n x m matrix (n > m) can be decomposed as $A = M \Pi N^T$, where • M is an n x m (non-square) column-orthogonal matrix of left singular vectors (the columns of M), with $M^T M = I_m$ • Π is an m x m (square) diagonal matrix containing the m singular values (which are nonzero and strictly positive) • N is an m x m orthogonal matrix of right singular vectors (the columns of N = the rows of Nᵀ), with $N^T N = N N^T = I_m$
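
A quick numpy check of the condensed SVD (numpy names the factors U, S, Vh rather than M, Π, N, but the shapes and orthogonality properties match):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 3))                 # full-rank n x m matrix with n > m

M, pi, Nt = np.linalg.svd(A, full_matrices=False)  # condensed SVD: A = M @ diag(pi) @ Nt
print(M.shape, pi.shape, Nt.shape)                 # (6, 3) (3,) (3, 3)
print(np.allclose(A, M @ np.diag(pi) @ Nt))        # True: the factorization holds
print(np.allclose(M.T @ M, np.eye(3)))             # columns of M are orthonormal
print(np.allclose(Nt @ Nt.T, np.eye(3)))           # N is orthogonal
```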

  18. PCA by SVD To relate this to PCA, we construct the d x n Data Matrix $X = [\, x_1 \;\; \cdots \;\; x_n \,]$, with one example per column The sample mean is $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} X \mathbf{1}$, where $\mathbf{1}$ is the n x 1 vector of ones

  19. PCA by SVD We center the data by subtracting the mean from each column of X This yields the d x n Centered Data Matrix $X_c = [\, x_1 - \mu \;\; \cdots \;\; x_n - \mu \,] = X - \mu \mathbf{1}^T = X - \frac{1}{n} X \mathbf{1}\mathbf{1}^T = X \left( I - \frac{1}{n} \mathbf{1}\mathbf{1}^T \right)$

  20. PCA by SVD The Sample Covariance is the d x d matrix $\Sigma = \frac{1}{n} \sum_i (x_i - \mu)(x_i - \mu)^T = \frac{1}{n} \sum_i x_i^c (x_i^c)^T$, where $x_i^c = x_i - \mu$ is the i-th column of $X_c$ This can be written as $\Sigma = \frac{1}{n} X_c X_c^T$

  21. PCA by SVD The transposed centered data matrix $X_c^T = [\, x_1^c \;\; \cdots \;\; x_n^c \,]^T$ is n x d Assuming it has rank d, it has the SVD $X_c^T = M \Pi N^T$ with $M^T M = I$ and $N^T N = N N^T = I$ This yields $\Sigma = \frac{1}{n} X_c X_c^T = \frac{1}{n} N \Pi M^T M \Pi N^T = \frac{1}{n} N \Pi^2 N^T$

  22. PCA by SVD $\Sigma = N \left( \frac{1}{n} \Pi^2 \right) N^T$ Noting that N is d x d and orthonormal, and $\Pi^2$ is diagonal, shows that this is just the eigenvalue decomposition of Σ It follows that • The eigenvectors of Σ are the columns of N • The eigenvalues of Σ are $\lambda_i = \frac{\pi_i^2}{n} = \sigma_i^2$ This gives an alternative algorithm for PCA

  23. PCA by SVD Summary of the computation of PCA by SVD: Given X with one example per column • 1) Create the (transposed) Centered Data Matrix $X_c^T = \left( I - \frac{1}{n} \mathbf{1}\mathbf{1}^T \right) X^T$ • 2) Compute its SVD $X_c^T = M \Pi N^T$ • 3) The Principal Components are the columns of N; the Principal Values are $\sigma_i = \sqrt{\lambda_i} = \frac{\pi_i}{\sqrt{n}}$
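
A sketch of this three-step recipe in numpy (synthetic data; X has one example per column as in the slides, and the centering is done by subtracting column means of Xᵀ, which is equivalent to multiplying by I − (1/n)11ᵀ):

```python
import numpy as np

def pca_by_svd(X):
    """PCA of a d x n data matrix X (one example per column) via the SVD."""
    d, n = X.shape
    # 1) transposed, centered data matrix: Xc^T = (I - (1/n) 1 1^T) X^T
    Xc_T = X.T - X.T.mean(axis=0, keepdims=True)        # n x d
    # 2) condensed SVD: Xc^T = M Pi N^T
    M, pi, Nt = np.linalg.svd(Xc_T, full_matrices=False)
    # 3) principal components = columns of N; principal values sigma_i = pi_i / sqrt(n)
    components = Nt.T
    principal_values = pi / np.sqrt(n)
    return components, principal_values

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 200)) * np.array([[5.0], [1.0], [0.2]])   # hypothetical 3 x n data
N, sigmas = pca_by_svd(X)
print("principal values:", sigmas)          # roughly [5, 1, 0.2]
print("first principal component:", N[:, 0])
```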

  24. Principal Component Analysis Principal components are often quite informative about the structure of the data Example: • Eigenfaces, the principal components for the space of images of faces • The figure only shows the first 16 eigenvectors (eigenfaces) • Note how they capture lighting, facial structure, etc.

  25. Principal Components Analysis PCA has been applied to virtually all learning problems E.g. eigenshapes for face morphing [Figure: eigenshapes and the resulting morphed faces]

  26. Principal Component Analysis [Figure: sound images; the average and the eigensounds corresponding to the three highest eigenvalues]

  27. Principal Component Analysis [Figure: turbulence; flames and the corresponding eigenflames]

  28. Principal Component Analysis [Figure: video frames, eigenrings, and the reconstruction]

  29. Principal Component Analysis Text: Latent Semantic Indexing • Represent each document by a word histogram • Perform the SVD on the documents x terms matrix [Figure: the documents x terms matrix factors into (documents x concepts) x (concepts x terms)] • The principal components give the directions of the semantic concepts
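
A minimal sketch of LSI on a tiny, made-up documents x terms count matrix, just to show the shapes of the factors (a real system would use tf-idf weighting, sparse matrices, and a truncated SVD):

```python
import numpy as np

# Rows = documents, columns = terms (hypothetical word counts).
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2]], dtype=float)

k = 2                                               # number of semantic "concepts" to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_concepts = U[:, :k] * s[:k]                     # documents x concepts representation
concept_terms = Vt[:k, :]                           # concepts x terms directions
print("documents x concepts:\n", np.round(doc_concepts, 2))
print("concepts x terms:\n", np.round(concept_terms, 2))
```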
