

  1. Probability and Statistics for Computer Science. Principal Component Analysis --- Exploring the data in fewer dimensions. Credit: Wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.27.2020

  2. Last time ✺ Review of Bayesian inference ✺ Visualizing high dimensional data & Summarizing data ✺ The covariance matrix

  3. Objectives ✺ Principal Component Analysis ✺ Examples of PCA

  4. Diagonalization of a symmetric matrix ✺ If A is an n × n symmetric square matrix, its eigenvalues are real. ✺ If the eigenvalues are also distinct, their eigenvectors are orthogonal. ✺ We can then scale the eigenvectors to unit length and place them into an orthogonal matrix U = [u_1 u_2 … u_n]. ✺ We can write the diagonal matrix Λ = U^T A U such that the diagonal entries of Λ are λ_1, λ_2, …, λ_n in that order.

  5. Diagonalization example ✺ For A = [5 3; 3 5], find U and Λ.
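A minimal NumPy sketch of this diagonalization (not from the slides; np.linalg.eigh is one standard way to diagonalize a symmetric matrix, and for this A the eigenvalues come out as 8 and 2):

```python
import numpy as np

# The symmetric matrix from slide 5.
A = np.array([[5.0, 3.0],
              [3.0, 5.0]])

# eigh is meant for symmetric matrices: real eigenvalues, orthonormal eigenvectors.
eigvals, U = np.linalg.eigh(A)           # eigvals ascending: [2., 8.]

# Sort eigenvalues in decreasing order, as the slides do.
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

Lam = U.T @ A @ U                        # diagonal matrix Λ
print(eigvals)                           # [8. 2.]
print(np.round(Lam, 6))                  # [[8. 0.]
                                         #  [0. 2.]]
```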

  6. Covariance for a pair of components in a data set ✺ For the jth and kth components of a data set {x_i}: cov({x}; j, k) = Σ_i (x_i^(j) − mean({x^(j)})) (x_i^(k) − mean({x^(k)})) / N
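A small NumPy sketch of this formula, assuming the data set is stored as a d × N matrix with one data item per column (the helper name cov_jk is just for illustration):

```python
import numpy as np

def cov_jk(X, j, k):
    """cov({x}; j, k) for a d x N data matrix X (one data item per column),
    using the N divisor from this slide."""
    N = X.shape[1]
    xj = X[j] - X[j].mean()   # center the jth component
    xk = X[k] - X[k].mean()   # center the kth component
    return (xj * xk).sum() / N
```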

  7. Covariance matrix ✺ [Diagram] A data set {x} stored as a 7×8 matrix (7 components, 8 data items) yields a 7×7 covariance matrix Covmat({x}); for example, cov({x}; 3, 5) sits in row 3, column 5 of Covmat({x}).

  8. Properties of Covariance matrix ✺ The diagonal elements of the 7×7 matrix Covmat({x}) are just the variances of each jth component: cov({x}; j, j) = var({x^(j)}) ✺ The off-diagonals are covariances between different components.

  9. Properties of Covariance matrix ✺ cov({x}; j, k) = cov({x}; k, j), so the covariance matrix is symmetric! ✺ And it's positive semi-definite, that is, all λ_i ≥ 0 ✺ The covariance matrix is diagonalizable.
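A quick numerical sanity check of these two properties on random data, a sketch rather than anything from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 8))                    # 7 components, 8 data items, as in the slides
Xc = X - X.mean(axis=1, keepdims=True)         # mean-center each component
C = Xc @ Xc.T / X.shape[1]                     # Covmat({x}) with the N divisor

print(np.allclose(C, C.T))                     # True: symmetric
print(np.linalg.eigvalsh(C).min() >= -1e-12)   # True: all eigenvalues >= 0 (up to round-off)
```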

  10. Properties of Covariance matrix ✺ If we define x_c as the mean-centered matrix for dataset {x}, then Covmat({x}) = x_c x_c^T / N ✺ The covariance matrix is a d×d matrix (here d = 7).

  11. Example: covariance matrix of a data set (I) What are the dimensions of the covariance matrix of this data? A_0 = [X^(1); X^(2)] = [5 4 3 2 1; −1 1 0 1 −1] A) 2 by 2 B) 5 by 5 C) 5 by 2 D) 2 by 5

  12. Example: covariance matrix of a data set (I) Mean centering: A_0 = [5 4 3 2 1; −1 1 0 1 −1] ⇒ A_1 = [2 1 0 −1 −2; −1 1 0 1 −1] (II) A_2 = A_1 A_1^T; the inner products of each pair of rows give A_2[1,1] = 10, A_2[2,2] = 4, A_2[1,2] = 0 (III) Divide the matrix by N, the number of data points: Covmat({x}) = (1/N) A_2 = (1/5) [10 0; 0 4] = [2 0; 0 0.8]
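The same computation in NumPy, a short sketch of steps (I)-(III):

```python
import numpy as np

A0 = np.array([[ 5., 4., 3., 2., 1.],
               [-1., 1., 0., 1., -1.]])

A1 = A0 - A0.mean(axis=1, keepdims=True)   # (I)   mean-center each row
A2 = A1 @ A1.T                             # (II)  inner products of the rows
C  = A2 / A0.shape[1]                      # (III) divide by N = 5

print(C)   # [[2.  0. ]
           #  [0.  0.8]]
```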

  13. What do the data look like when Covmat({x}) is diagonal? ✺ [Scatter plot of the data points in the X^(1)-X^(2) plane] A_0 = [5 4 3 2 1; −1 1 0 1 −1], Covmat({x}) = (1/N) A_2 = [2 0; 0 0.8]

  14. What is the correlation between the 2 components for the data m? Covmat(m) = [20 25; 25 40]
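Working this out with the usual definition of the correlation coefficient (not restated on the slide), corr = cov / (σ_1 σ_2):

```latex
\mathrm{corr}\left(m^{(1)}, m^{(2)}\right)
  = \frac{\mathrm{cov}(m; 1, 2)}{\sqrt{\mathrm{var}(m^{(1)})\,\mathrm{var}(m^{(2)})}}
  = \frac{25}{\sqrt{20 \cdot 40}}
  = \frac{25}{\sqrt{800}}
  \approx 0.88
```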

  15. Q. Is this true? Transforming the data with an orthonormal matrix only rotates the data. A. Yes B. No

  16. Dimension Reduction ✺ Instead of showing more dimensions through visualization, it's a good idea to do dimension reduction in order to see the major features of the data set. ✺ For example, principal component analysis helps find the major components of the data set. ✺ PCA is essentially about finding eigenvectors of the covariance matrix of the data set {x}

  17. Dimension reduction from 2D to 1D Credit: Prof. Forsyth

  18. Step 1: subtract the mean Credit: Prof. Forsyth

  19. Step 2: Rotate to diagonalize the covariance Credit: Prof. Forsyth

  20. Step 3: Drop component(s) Credit: Prof. Forsyth

  21. Principal Components ✺ The columns of U are the normalized eigenvectors of Covmat({x}) and are called the principal components of the data {x}

  22. Principal components analysis ✺ We reduce the dimensionality of dataset {x}, represented by the d×n matrix D, from d to s (s < d). ✺ Step 1. Define the d×n matrix m such that m = D − mean(D). ✺ Step 2. Define the d×n matrix r such that r_i = U^T m_i, where U satisfies Λ = U^T Covmat({x}) U, the diagonalization of Covmat({x}) with the eigenvalues sorted in decreasing order, and U is the matrix of orthonormal eigenvectors. ✺ Step 3. Define the d×n matrix p such that p is r with the last d−s components of r made zero.
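A minimal NumPy sketch of these three steps (the helper name pca_project and the choice of the N divisor are mine, not from the slide):

```python
import numpy as np

def pca_project(D, s):
    """Apply the three PCA steps to a d x n data matrix D (one data item per column),
    keeping the first s principal components and zeroing the rest."""
    # Step 1: subtract the mean of each component.
    m = D - D.mean(axis=1, keepdims=True)

    # Diagonalize the covariance matrix; sort eigenvalues in decreasing order.
    C = m @ m.T / D.shape[1]
    lam, U = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]

    # Step 2: rotate the centered data into the eigenvector basis.
    r = U.T @ m

    # Step 3: zero out the last d - s components.
    p = r.copy()
    p[s:, :] = 0.0
    return p, r, U, lam
```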

  23. What happened to the mean? ✺ Step 1. mean(m) = mean(D − mean(D)) = 0 ✺ Step 2. mean(r) = U^T mean(m) = U^T 0 = 0 ✺ Step 3. mean(p_i) = mean(r_i) = 0 for i ∈ 1:s, and mean(p_i) = 0 for i ∈ s+1:d

  24. What happened to the covariances? ✺ Step 1. Covmat(m) = Covmat(D) = Covmat({x}) ✺ Step 2. Covmat(r) = U^T Covmat(m) U = Λ ✺ Step 3. Covmat(p) is Λ with the last/smallest d−s diagonal terms turned to 0.

  25. Sample covariance matrix ✺ In many statistical programs, the sample covariance matrix is defined to be Covmat(m) = m m^T / (N − 1) ✺ Similar to what happens with the unbiased standard deviation
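For instance, NumPy's np.cov normalizes by N − 1 by default; here it is checked against the mean-centered matrix m used in the example on the next slides:

```python
import numpy as np

m = np.array([[ 3., -4., 7., 1., -4., -3.],
              [ 7., -6., 8., -1., -1., -7.]])   # already mean-centered

print(np.cov(m))                       # [[20. 25.]
                                       #  [25. 40.]]
print(m @ m.T / (m.shape[1] - 1))      # same result, written out as m m^T / (N - 1)
```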

  26. PCA an example ✺ Step 1. D = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ⇒ mean(D) = [0; 0], so m = D − mean(D) = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ✺ Step 2. ✺ Step 3.

  27. PCA an example ✺ Step 1. D = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ⇒ mean(D) = [0; 0], m = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ✺ Step 2. Covmat(m) = [20 25; 25 40] ⇒ λ_1 ≈ 57; λ_2 ≈ 3 ⇒ U^T = [0.5606288 0.8280672; −0.8280672 0.5606288], U = [0.5606288 −0.8280672; 0.8280672 0.5606288] ✺ Step 3.

  28. PCA an example ✺ Step 1. D = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ⇒ mean(D) = [0; 0], m = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ✺ Step 2. Covmat(m) = [20 25; 25 40] ⇒ λ_1 ≈ 57; λ_2 ≈ 3 ⇒ U^T = [0.5606288 0.8280672; −0.8280672 0.5606288], U = [0.5606288 −0.8280672; 0.8280672 0.5606288] ⇒ r = U^T m = [7.478 −7.211 10.549 −0.267 −3.071 −7.478; 1.440 −0.052 −1.311 −1.389 2.752 −1.440] ✺ Step 3.

  29. PCA an example ✺ Step 1. D = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ⇒ mean(D) = [0; 0], m = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7] ✺ Step 2. Covmat(m) = [20 25; 25 40] ⇒ λ_1 ≈ 57; λ_2 ≈ 3 ⇒ U^T = [0.5606288 0.8280672; −0.8280672 0.5606288], U = [0.5606288 −0.8280672; 0.8280672 0.5606288] ⇒ r = U^T m = [7.478 −7.211 10.549 −0.267 −3.071 −7.478; 1.440 −0.052 −1.311 −1.389 2.752 −1.440] ✺ Step 3. ⇒ p = [7.478 −7.211 10.549 −0.267 −3.071 −7.478; 0 0 0 0 0 0]
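A NumPy sketch that reproduces this worked example end to end (eigenvector signs are arbitrary, so the second row of r may come out with flipped signs relative to the slide):

```python
import numpy as np

D = np.array([[ 3., -4., 7., 1., -4., -3.],
              [ 7., -6., 8., -1., -1., -7.]])

# Step 1: mean(D) is [0, 0], so m equals D after centering.
m = D - D.mean(axis=1, keepdims=True)

# Step 2: diagonalize the sample covariance, eigenvalues in decreasing order.
C = m @ m.T / (D.shape[1] - 1)           # [[20, 25], [25, 40]]
lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]         # lam ≈ [56.9, 3.1]
r = U.T @ m

# Step 3: keep s = 1 component, zero the other d - s = 1 component.
p = r.copy()
p[1:, :] = 0.0

print(np.round(lam, 1))
print(np.round(r, 3))
print(np.round(p, 3))
```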

  30. What is this matrix for the previous example? U T Covmat ( m ) U =?

  31. What is this matrix for the previous example? U^T Covmat(m) U = ? ⇒ [57 0; 0 3]

  32. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d−s eigenvalues in Λ: (1/(N−1)) Σ_i ||r_i − p_i||² = (1/(N−1)) Σ_i Σ_{j=s+1..d} (r_i^(j))²

  33. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d−s eigenvalues in Λ: (1/(N−1)) Σ_i ||r_i − p_i||² = (1/(N−1)) Σ_i Σ_{j=s+1..d} (r_i^(j))² = Σ_{j=s+1..d} (1/(N−1)) Σ_i (r_i^(j))²

  34. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d−s eigenvalues in Λ: (1/(N−1)) Σ_i ||r_i − p_i||² = (1/(N−1)) Σ_i Σ_{j=s+1..d} (r_i^(j))² = Σ_{j=s+1..d} (1/(N−1)) Σ_i (r_i^(j))² = Σ_{j=s+1..d} var({r^(j)})

  35. The Mean square error of the projection ✺ The mean square error is the sum of the smallest d−s eigenvalues in Λ: (1/(N−1)) Σ_i ||r_i − p_i||² = (1/(N−1)) Σ_i Σ_{j=s+1..d} (r_i^(j))² = Σ_{j=s+1..d} (1/(N−1)) Σ_i (r_i^(j))² = Σ_{j=s+1..d} var({r^(j)}) = Σ_{j=s+1..d} λ_j
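A quick numerical check of this identity on the worked example from slides 26-29 (a sketch, using the same N − 1 divisor): dropping the last component should give a mean square error equal to λ_2 ≈ 3.

```python
import numpy as np

D = np.array([[ 3., -4., 7., 1., -4., -3.],
              [ 7., -6., 8., -1., -1., -7.]])
m = D - D.mean(axis=1, keepdims=True)
C = m @ m.T / (D.shape[1] - 1)

lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

r = U.T @ m
p = r.copy()
p[1:, :] = 0.0                                   # drop the last d - s = 1 component

mse = ((r - p) ** 2).sum() / (D.shape[1] - 1)    # (1/(N-1)) sum_i ||r_i - p_i||^2
print(round(mse, 3), round(lam[1], 3))           # both ≈ 3.074
```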
