Principal component analysis

Ingo Blechschmidt
December 17th, 2014
Kleine Bayessche AG


  1. (Title slide) Principal component analysis. Ingo Blechschmidt, December 17th, 2014, Kleine Bayessche AG.

  21. Outline
      1 Theory: Singular value decomposition; Pseudoinverses; Low-rank approximation
      2 Applications: Image compression; Proper orthogonal decomposition; Principal component analysis; Eigenfaces; Digit recognition

  22. Singular value decomposition
      Let A ∈ R^(n×m). Then there exist numbers σ_1 ≥ σ_2 ≥ · · · ≥ σ_m ≥ 0, an orthonormal basis v_1, . . . , v_m of R^m, and an orthonormal basis w_1, . . . , w_n of R^n, such that A v_i = σ_i w_i for i = 1, . . . , m.
      In matrix language: A = W Σ V^t, where W = (w_1 | . . . | w_n) ∈ R^(n×n) is orthogonal, V = (v_1 | . . . | v_m) ∈ R^(m×m) is orthogonal, and Σ = diag(σ_1, . . . , σ_m) ∈ R^(n×m).
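
A minimal sketch (my addition, not part of the slides) of the decomposition with NumPy; numpy.linalg.svd returns W, the singular values, and V^t directly:

```python
import numpy as np

# A rectangular matrix A in R^(n x m) with n = 4, m = 3.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0]])

# full_matrices=True yields W in R^(n x n) and V^t in R^(m x m).
W, sigma, Vt = np.linalg.svd(A, full_matrices=True)

# Check the defining property A v_i = sigma_i w_i for i = 1, ..., m.
for i, s in enumerate(sigma):
    assert np.allclose(A @ Vt[i], s * W[:, i])
```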

  23. Remarks:
      • The singular value decomposition (SVD) exists for every real matrix, even rectangular ones.
      • The singular values σ_i are unique.
      • The basis vectors are not unique.
      • If A is orthogonally diagonalizable with eigenvalues λ_i (for instance, if A is symmetric), then σ_i = |λ_i|.
      • ‖A‖_Frobenius = (Σ_ij A_ij²)^(1/2) = (tr(A^t A))^(1/2) = (Σ_i σ_i²)^(1/2).
      • There is a generalization to complex matrices. In this case, the matrix A can be decomposed as W Σ V^⋆, where V^⋆ is the complex conjugate of V^t and W and V are unitary matrices.
      • The singular value decomposition can also be formulated in a basis-free manner as a result about linear maps between finite-dimensional Hilbert spaces.
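
A quick numerical check (my addition, assuming NumPy) of two of these remarks, the Frobenius identity and σ_i = |λ_i| for symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Frobenius norm equals the square root of the sum of squared singular values.
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2)))

# For a symmetric matrix the singular values are the absolute eigenvalues.
S = A @ A.T - 2.0 * np.eye(5)                 # some symmetric matrix
lam = np.linalg.eigvalsh(S)
assert np.allclose(np.sort(np.abs(lam)), np.sort(np.linalg.svd(S, compute_uv=False)))
```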

  24. Existence proof (sketch):
      1. Consider the eigenvalue decomposition of the symmetric and positive-semidefinite matrix A^t A: we obtain an orthonormal basis v_i of eigenvectors corresponding to eigenvalues λ_i ≥ 0.
      2. Set σ_i := √λ_i.
      3. Set w_i := (1/σ_i) A v_i (for those i with λ_i ≠ 0).
      4. Then A v_i = σ_i w_i holds trivially.
      5. The w_i are orthonormal: (w_i, w_j) = (1/(σ_i σ_j)) (A^t A v_i, v_j) = (λ_i/(σ_i σ_j)) δ_ij = δ_ij.
      6. If necessary, extend the w_i to an orthonormal basis.
      This proof gives rise to an algorithm for calculating the SVD, but unless A^t A is small, it has undesirable numerical properties. (Note that one can also use A A^t!) Since the 1960s there has been a stable iterative algorithm by Golub and van Loan.
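
A direct transcription of this construction in NumPy, a sketch for illustration only: it squares the condition number (exactly the numerical issue mentioned above) and assumes A has full column rank, so the extension in step 6 is not needed.

```python
import numpy as np

def svd_via_ata(A):
    """SVD via the eigendecomposition of A^t A, mirroring the proof sketch.

    Numerically naive (squares the condition number of A); assumes A has
    full column rank, so step 6 (extending the w_i) is not needed.
    """
    lam, V = np.linalg.eigh(A.T @ A)           # eigenvalues ascending, columns are v_i
    order = np.argsort(lam)[::-1]              # reorder descending
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))   # step 2: sigma_i := sqrt(lambda_i)
    W = (A @ V) / sigma                        # step 3: w_i := A v_i / sigma_i, columnwise
    return W, sigma, V.T                       # "thin" SVD: W is n x m

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
W, sigma, Vt = svd_via_ata(A)
assert np.allclose(W @ np.diag(sigma) @ Vt, A)   # A = W Sigma V^t
assert np.allclose(W.T @ W, np.eye(3))           # step 5: the w_i are orthonormal
```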

  25. The pseudoinverse of a matrix
      Let A ∈ R^(n×m) and b ∈ R^n. Then the solutions to the optimization problem ‖A x − b‖² → min over x ∈ R^m are given by
      x = A^+ b + V (0, ⋆)^t,
      where ⋆ stands for arbitrary entries in the components belonging to vanishing singular values, A = W Σ V^t is the SVD, and
      A^+ = V Σ^+ W^t,   Σ^+ = diag(σ_1^(−1), . . . , σ_m^(−1)) ∈ R^(m×n).
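
A small sketch (my addition, assuming NumPy): assemble A^+ from the SVD and check it against numpy.linalg.pinv and the built-in least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# A^+ = V Sigma^+ W^t, with 1/sigma_i replaced by 0 for vanishing singular values.
W, sigma, Vt = np.linalg.svd(A, full_matrices=False)
sigma_plus = np.array([1.0 / s if s > 1e-12 else 0.0 for s in sigma])
A_plus = Vt.T @ np.diag(sigma_plus) @ W.T

x = A_plus @ b
assert np.allclose(x, np.linalg.pinv(A) @ b)                  # NumPy's pseudoinverse
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])   # least-squares minimizer
```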

  26. Remarks:
      • In the formula for Σ^+, set 0^(−1) := 0.
      • If A happens to be invertible, then A^+ = A^(−1).
      • The pseudoinverse can be used for polynomial approximation: let data points (x_i, y_i) ∈ R², 1 ≤ i ≤ N, be given. We want to find a polynomial p(z) = Σ_{k=0}^n α_k z^k, n ≪ N, such that
        Σ_{i=1}^N |p(x_i) − y_i|² → min.
        In matrix language, this problem reads ‖A u − y‖² → min, where u = (α_0, . . . , α_n)^t ∈ R^(n+1),
        A = ( 1  x_1  x_1²  · · ·  x_1^n )
            ( 1  x_2  x_2²  · · ·  x_2^n )   ∈ R^(N×(n+1)),
            ( ⋮    ⋮     ⋮             ⋮   )
            ( 1  x_N  x_N²  · · ·  x_N^n )
        and y = (y_1, . . . , y_N)^t ∈ R^N.
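
A sketch of this fit with NumPy (the data below is made up for illustration); numpy.vander builds exactly the matrix A above:

```python
import numpy as np

# Made-up data: noisy samples of a cubic polynomial on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.05 * rng.standard_normal(x.size)

n = 3                                     # polynomial degree, n << N = 50
A = np.vander(x, n + 1, increasing=True)  # rows (1, x_i, x_i^2, x_i^3)
u = np.linalg.pinv(A) @ y                 # coefficients alpha_0, ..., alpha_n

print("coefficients:", u)
print("residual norm:", np.linalg.norm(A @ u - y))
```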

  27. Low-rank approximation
      Let A = W Σ V^t ∈ R^(n×m) and 1 ≤ r ≤ min(n, m). Then a solution to the optimization problem ‖A − M‖_Frobenius → min over all matrices M with rank M ≤ r is given by M = W Σ_r V^t, where Σ_r = diag(σ_1, . . . , σ_r, 0, . . . , 0).
      The approximation error is ‖A − W Σ_r V^t‖_F = (σ_{r+1}² + · · · + σ_m²)^(1/2).
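
A short check of the statement (my sketch, assuming NumPy): build the truncated SVD and compare the Frobenius error with the tail of the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
r = 2

W, sigma, Vt = np.linalg.svd(A, full_matrices=False)
# Truncation: keep the r largest singular values, zero the rest.
M = W[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]

assert np.linalg.matrix_rank(M) <= r
# Eckart-Young: the error equals sqrt(sigma_{r+1}^2 + ... + sigma_m^2).
assert np.isclose(np.linalg.norm(A - M, 'fro'), np.sqrt(np.sum(sigma[r:]**2)))
```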

  28. • This is the Eckart–Young(–Mirsky) theorem. • Beware of false and incomplete proofs in the literature!

  29. Image compression
      Think of images as matrices. Substitute a matrix W Σ V^t by W Σ_r V^t with r small. To reconstruct W Σ_r V^t, one only needs to know
      • the r singular values σ_1, . . . , σ_r (r numbers),
      • the first r columns of W (height · r numbers), and
      • the top r rows of V^t (width · r numbers).
      Total amount: r · (1 + height + width) ≪ height · width.
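
A minimal sketch of such a compression (my own illustration, not the svd-image.py script referenced on the next slide; it assumes Pillow is installed and that a grayscale-convertible file photo.png exists):

```python
import numpy as np
from PIL import Image   # assumes the Pillow package is available

# Load a (hypothetical) image file and treat it as a grayscale matrix.
img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
W, sigma, Vt = np.linalg.svd(img, full_matrices=False)

r = 30                                             # number of singular values to keep
compressed = W[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]

height, width = img.shape
print("numbers stored:", r * (1 + height + width), "instead of", height * width)
Image.fromarray(np.clip(compressed, 0, 255).astype(np.uint8)).save("photo-rank30.png")
```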

  30. Remarks:
      • See http://speicherleck.de/iblech/stuff/pca-images.pdf for sample compressions and http://pizzaseminar.speicherleck.de/skript4/08-principal-component-analysis/svd-image.py for the Python code producing these images.
      • Image compression by singular value decomposition is mostly of academic interest only.
      • This might be for the following reasons: other compression algorithms have more efficient implementations; other algorithms are tailored to the specific properties of human vision; and the basis vectors of other approaches (for instance, the DCT) are similar to the most important singular basis vectors of a sufficiently large corpus of images.
      • See http://dsp.stackexchange.com/questions/7859/relationship-between-dct-and-pca.

  31. Proper orthogonal decomposition
      Given data points x_i ∈ R^N, we want to find a low-dimensional linear subspace which approximately contains the x_i. Minimize
      J(U) := Σ_i ‖x_i − P_U(x_i)‖²
      over all r-dimensional subspaces U ⊆ R^N, r ≪ N, where P_U : R^N → R^N is the orthogonal projection onto U.
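
A sketch of POD with NumPy on synthetic data (my addition): the minimizing subspace is spanned by the r leading left singular vectors of the matrix whose columns are the data points, and the residual J(U) equals the energy of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_samples, r = 50, 200, 3

# Synthetic data points x_i (columns of X) lying near a 3-dimensional subspace.
basis = np.linalg.qr(rng.standard_normal((N, r)))[0]
X = basis @ rng.standard_normal((r, n_samples)) + 0.01 * rng.standard_normal((N, n_samples))

# The optimal subspace is spanned by the r leading left singular vectors of X.
W, sigma, _ = np.linalg.svd(X, full_matrices=False)
U = W[:, :r]                        # orthonormal basis of the POD subspace
P = U @ U.T                         # orthogonal projection P_U

J = np.sum(np.linalg.norm(X - P @ X, axis=0) ** 2)
print("J(U) =", J, "  discarded energy =", np.sum(sigma[r:] ** 2))
```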
