Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), with applications to Microarrays
Prof. Tesler, Math 283, Fall 2018
Covariance

Let X and Y be random variables, possibly dependent. Recall that the covariance of X and Y is defined as
    Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
and that an alternate formula is
    Cov(X, Y) = E(XY) − E(X) E(Y)

Previously we used
    Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
and, when X_1, ..., X_n are independent (or at least uncorrelated),
    Var(X_1 + X_2 + ··· + X_n) = Var(X_1) + ··· + Var(X_n)
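As a quick numerical illustration (not from the original slides), the Var(X + Y) identity can be checked on data; the NumPy sketch below uses made-up samples and the sample (n − 1) estimators, which satisfy the same identity exactly.

```python
# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) with sample estimates.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)    # dependent on x, so Cov(x, y) != 0

lhs = np.var(x + y, ddof=1)            # sample Var(X + Y)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * np.cov(x, y)[0, 1]
print(lhs, rhs)                        # the two numbers agree up to rounding
```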
Covariance properties

- Cov(X, X) = Var(X)
- Cov(X, Y) = Cov(Y, X)
- Cov(aX + b, cY + d) = ac Cov(X, Y)

Sign of covariance: Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
- When Cov(X, Y) is positive: there is a tendency to have X > µ_X when Y > µ_Y, and X < µ_X when Y < µ_Y.
- When Cov(X, Y) is negative: there is a tendency to have X > µ_X when Y < µ_Y, and X < µ_X when Y > µ_Y.
- When Cov(X, Y) = 0:
  a) X and Y might be independent, but it's not guaranteed.
  b) Var(X + Y) = Var(X) + Var(Y)
Sample variance

Variance of a random variable:
    σ² = Var(X) = E((X − µ_X)²) = E(X²) − (E(X))²

Sample variance from data x_1, ..., x_n:
    s² = var(x) = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)² = (1/(n−1)) Σ_{i=1}^n x_i² − (n/(n−1)) x̄²

Vector formula:
    Centered data: M = [ x_1 − x̄   x_2 − x̄   ···   x_n − x̄ ]
    s² = (M · M)/(n − 1) = M M′/(n − 1)
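A minimal sketch (made-up data, not from the slides) verifying that the three expressions for the sample variance agree:

```python
# Sample variance three ways: the definition, the shortcut formula,
# and the vector formula s^2 = M M' / (n - 1) on the centered row M.
import numpy as np

x = np.array([3.1, 0.9, 9.9, 4.4, 6.2])
n = len(x)
xbar = x.mean()

s2_def      = ((x - xbar) ** 2).sum() / (n - 1)
s2_shortcut = (x ** 2).sum() / (n - 1) - n / (n - 1) * xbar ** 2
M = x - xbar                              # centered data as a row vector
s2_vector   = M @ M / (n - 1)             # M M' / (n - 1)
print(s2_def, s2_shortcut, s2_vector)     # all three match np.var(x, ddof=1)
```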
Sample covariance

Covariance between random variables X, Y:
    σ_XY = Cov(X, Y) = E((X − µ_X)(Y − µ_Y)) = E(XY) − E(X) E(Y)

Sample covariance from data (x_1, y_1), ..., (x_n, y_n):
    s_XY = cov(x, y) = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = (1/(n−1)) Σ_{i=1}^n x_i y_i − (n/(n−1)) x̄ ȳ

Vector formula:
    M_X = [ x_1 − x̄   x_2 − x̄   ···   x_n − x̄ ]
    M_Y = [ y_1 − ȳ   y_2 − ȳ   ···   y_n − ȳ ]
    s_XY = (M_X · M_Y)/(n − 1) = M_X M_Y′/(n − 1)
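The same kind of check works for the sample covariance (again with illustrative data, not from the slides):

```python
# Sample covariance via the definition and via the vector formula
# s_XY = M_X M_Y' / (n - 1) on the centered rows.
import numpy as np

x = np.array([3.1, 0.9, 9.9, 4.4, 6.2])
y = np.array([12.9, 10.8, 8.8, 13.6, 9.5])
n = len(x)

MX = x - x.mean()                         # centered x's (row vector)
MY = y - y.mean()                         # centered y's (row vector)
s_xy_def    = (MX * MY).sum() / (n - 1)
s_xy_vector = MX @ MY / (n - 1)           # dot product form
print(s_xy_def, s_xy_vector, np.cov(x, y)[0, 1])   # all agree
```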
Covariance matrix

For problems with many simultaneous random variables, put them into vectors:
    X = (R, S)′        Y = (T, U, V)′
and then form a covariance matrix:
    Cov(X, Y) = [ Cov(R, T)  Cov(R, U)  Cov(R, V) ]
                [ Cov(S, T)  Cov(S, U)  Cov(S, V) ]

In matrix/vector notation,
    Cov(X, Y) = E[ (X − E(X)) (Y − E(Y))′ ]
where Cov(X, Y) is 2 × 3, X − E(X) is 2 × 1, and (Y − E(Y))′ is (3 × 1)′ = 1 × 3.
Covariance matrix (a.k.a. variance-covariance matrix)

Often there's one vector with all the variables: X = (R, S, T)′.
    Cov(X) = Cov(X, X) = E[ (X − E(X)) (X − E(X))′ ]

            [ Cov(R, R)  Cov(R, S)  Cov(R, T) ]   [ Var(R)     Cov(R, S)  Cov(R, T) ]
          = [ Cov(S, R)  Cov(S, S)  Cov(S, T) ] = [ Cov(R, S)  Var(S)     Cov(S, T) ]
            [ Cov(T, R)  Cov(T, S)  Cov(T, T) ]   [ Cov(R, T)  Cov(S, T)  Var(T)    ]

This matrix is symmetric (it equals its own transpose). The diagonal entries are ordinary variances.
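As an illustration (random stand-in data), NumPy's np.cov builds exactly this matrix when each row of the input holds one variable's samples:

```python
# Estimate Cov(X) for a vector X = (R, S, T)' from samples stored as the
# rows of a 3 x n data matrix.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(3, 200))          # rows: R, S, T; columns: samples

C = np.cov(data)                          # 3 x 3 sample covariance matrix
print(np.allclose(C, C.T))                # True: the matrix is symmetric
print(np.diag(C))                         # diagonal = sample variances of R, S, T
```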
Covariance matrix properties

- Cov(X, Y) = Cov(Y, X)′
- Cov(AX + B, Y) = A Cov(X, Y)
- Cov(X, CY + D) = Cov(X, Y) C′
- Cov(AX + B) = A Cov(X) A′
- Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y)
- Cov(X, Y_1 + Y_2) = Cov(X, Y_1) + Cov(X, Y_2)

Here A, C are constant matrices, B, D are constant vectors, and all dimensions must be correct for matrix arithmetic.
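A quick numerical check of one of these rules, Cov(AX + B) = A Cov(X) A′, using sample covariances and made-up matrices (the identity holds exactly for the sample versions as well, since centering removes B):

```python
# Verify Cov(A X + B) = A Cov(X) A' with sample covariance matrices.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 500))             # 3 variables, 500 samples
A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])          # constant 2 x 3 matrix
B = np.array([[5.0], [-3.0]])             # constant 2 x 1 shift (drops out)

lhs = np.cov(A @ X + B)                   # Cov(A X + B), a 2 x 2 matrix
rhs = A @ np.cov(X) @ A.T                 # A Cov(X) A'
print(np.allclose(lhs, rhs))              # True up to floating point
```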
Example (2D, but works for higher dimensions too)

Data (x_1, y_1), ..., (x_100, y_100):
    M_0 = [ x_1  ···  x_100 ] = [  3.0858   0.8806   9.8850  ···   4.4106 ]
          [ y_1  ···  y_100 ]   [ 12.8562  10.7804   8.7504  ···  13.5627 ]

[Figure: scatter plot of the original data]
Centered data

[Figures: scatter plots of the original data (left) and the centered data (right)]
Computing the sample covariance matrix

Original data: 100 (x, y) points in a 2 × 100 matrix M_0:
    M_0 = [ x_1  ···  x_100 ] = [  3.0858   0.8806   9.8850  ···   4.4106 ]
          [ y_1  ···  y_100 ]   [ 12.8562  10.7804   8.7504  ···  13.5627 ]

Centered data: subtract x̄ from the x's and ȳ from the y's to get M; here x̄ = 5, ȳ = 10:
    M = [ −1.9142  −4.1194   4.8850  ···  −0.5894 ]
        [  2.8562   0.7804  −1.2496  ···   3.5627 ]

Sample covariance:
    C = M M′/(100 − 1) = [  31.9702  −16.5683 ] = [ s_XX  s_XY ] = [ s_X²  s_XY ]
                         [ −16.5683   13.0018 ]   [ s_YX  s_YY ]   [ s_XY  s_Y² ]
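A minimal sketch of this computation on a generic 2 × n data matrix (the data here is a random stand-in, since the slides list only a few of the 100 actual points):

```python
# Center the data and form C = M M' / (n - 1).
import numpy as np

rng = np.random.default_rng(3)
M0 = rng.normal(loc=[[5.0], [10.0]], scale=[[5.0], [3.0]], size=(2, 100))

means = M0.mean(axis=1, keepdims=True)    # column vector (xbar, ybar)'
M = M0 - means                            # centered data
C = M @ M.T / (M0.shape[1] - 1)           # 2 x 2 sample covariance matrix
print(C)
print(np.allclose(C, np.cov(M0)))         # matches NumPy's built-in estimate
```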
Orthonormal matrix

Recall that for vectors v, w, we have v · w = |v| |w| cos(θ), where θ is the angle between the vectors.

Orthogonal means perpendicular: v and w are orthogonal when the angle between them is θ = 90° = π/2 radians, so cos(θ) = 0 and v · w = 0.

Vectors v_1, ..., v_n are orthonormal when
- v_i · v_j = 0 for i ≠ j (different vectors are orthogonal);
- v_i · v_i = 1 for all i (each vector has length 1; they are all unit vectors).
In short: v_i · v_j = δ_ij, which is 0 if i ≠ j and 1 if i = j.

Example: î, ĵ, k̂ (the 3D unit vectors along the x, y, z axes) are orthonormal. These can be rotated into other orientations to give new "axes" in other directions; that will be our focus.
Orthonormal matrix

Form an n × n matrix of orthonormal vectors
    V = [ v_1 | ··· | v_n ]
by loading n-dimensional column vectors into the columns of V. Transpose it to convert the vectors to row vectors:
    V′ = [ v_1′ ]
         [ v_2′ ]
         [  ⋮   ]
         [ v_n′ ]

(V′V)_ij is the ith row of V′ dotted with the jth column of V:
    (V′V)_ij = v_i · v_j = δ_ij,   so   V′V = [ 1  0  ···  0 ]
                                              [ 0  1  ···  0 ]
                                              [ ⋮      ⋱   ⋮ ]
                                              [ 0  0  ···  1 ]

Thus V′V = I (the n × n identity matrix), so V′ = V⁻¹.

An n × n matrix V is orthonormal when V′V = I (or equivalently, VV′ = I), where I is the n × n identity matrix.
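For instance (an assumed example, not from the slides), a 2 × 2 rotation matrix has orthonormal columns, and NumPy confirms V′V = I and V⁻¹ = V′:

```python
# A rotation matrix is orthonormal: V'V = I and inv(V) = V'.
import numpy as np

theta = 0.7
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal columns

print(np.allclose(V.T @ V, np.eye(2)))    # True: V'V = I
print(np.allclose(np.linalg.inv(V), V.T)) # True: V^{-1} = V'
```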
Diagonalizing the sample covariance matrix C

    C = V D V′
    [  31.9702  −16.5683 ] = [ −0.8651  −0.5016 ] [ 41.5768   0      ] [ −0.8651   0.5016 ]
    [ −16.5683   13.0018 ]   [  0.5016  −0.8651 ] [  0        3.3952 ] [ −0.5016  −0.8651 ]

C is a real-valued symmetric matrix. It can be shown that:
- C can be diagonalized (recall that not all matrices are diagonalizable), in the special form C = VDV′ with V orthonormal, so V⁻¹ = V′;
- all eigenvalues are real numbers ≥ 0, so we can put them on the diagonal of D in decreasing order: λ_1 ≥ λ_2 ≥ ··· ≥ 0.
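A sketch of this diagonalization in NumPy, using the sample covariance matrix from the example above. np.linalg.eigh is designed for symmetric matrices and returns real eigenvalues in ascending order, so they are reordered to be decreasing; eigenvectors are only determined up to sign, so the columns of V may differ from the slide's by a factor of −1.

```python
# Diagonalize C = V D V' with an orthonormal V and decreasing eigenvalues.
import numpy as np

C = np.array([[ 31.9702, -16.5683],
              [-16.5683,  13.0018]])

evals, evecs = np.linalg.eigh(C)          # real eigenvalues, ascending order
order = np.argsort(evals)[::-1]           # indices for decreasing order
D = np.diag(evals[order])                 # approx diag(41.5768, 3.3952)
V = evecs[:, order]                       # orthonormal eigenvector columns

print(np.allclose(V @ D @ V.T, C))        # True: C = V D V'
print(np.allclose(V.T @ V, np.eye(2)))    # True: V is orthonormal
```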
Diagonalizing the sample covariance matrix C

Since C is symmetric, if v is a right eigenvector with eigenvalue λ, then v′ is a left eigenvector with eigenvalue λ, and vice-versa:
    C v = λ v,   so   v′ C = v′ C′ = (C v)′ = (λ v)′ = λ v′.

Diagonalization C = VDV⁻¹ loads right and left eigenvectors into V and V⁻¹. Here those eigenvectors are transposes of each other, leading to the special form C = VDV′.

Also, all eigenvalues are ≥ 0 ("C is positive semidefinite"): for all vectors w,
    w′ C w = w′ M M′ w/(n − 1) = |M′ w|²/(n − 1) ≥ 0.
The eigenvector equation C w = λ w gives w′ C w = λ w′ w = λ |w|². So λ |w|² ≥ 0, giving λ ≥ 0.
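The positive-semidefiniteness argument can also be checked numerically (random stand-in data): for any w, the quadratic form w′Cw equals |M′w|²/(n − 1), which is never negative.

```python
# Check w' C w = |M'w|^2 / (n - 1) >= 0 for several random vectors w.
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(2, 100))
M = M - M.mean(axis=1, keepdims=True)     # centered data
C = M @ M.T / (M.shape[1] - 1)            # sample covariance matrix

for _ in range(5):
    w = rng.normal(size=2)
    quad = w @ C @ w                                   # w' C w
    norm = (M.T @ w) @ (M.T @ w) / (M.shape[1] - 1)    # |M'w|^2 / (n - 1)
    print(quad >= 0, np.isclose(quad, norm))           # True True each time
```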