Principal Component Analysis
Proseminar Data Mining
Tobias Holl, Technische Universität München
2017-06-09
Introduction | Theory | Applications
The Problem
Data. Lots of data.
An Example
Reference Energy Disaggregation Data Set [1]
▶ Power usage over >100 days for >200 devices
▶ Measured every 2 s
▶ Over 500 GB of compressed data
Iris Data Set [2]
▶ 150 flowers of 3 different species
▶ Petal and sepal widths and lengths
[Figure: scatter-plot matrix of the Iris data set (sepal length, sepal width, petal length, petal width)]
[Figure: scatter-plot matrix restricted to petal length and petal width]
[Figure: petal width plotted against petal length]
Clear correlation
Unnecessary redundancy
Data Matrices
\[
X = \begin{pmatrix}
x_{11} & \cdots & x_{1n} \\
\vdots & \ddots & \vdots \\
x_{m1} & \cdots & x_{mn}
\end{pmatrix} \in \mathbb{R}^{m \times n}
\]
Row $i$ holds Measurement $i$ ($i = 1, \dots, m$); column $j$ holds Variable $j$ ($j = 1, \dots, n$).
Assume that $X$ is centered around 0, i.e. every column (variable) has mean zero.
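The centering assumption can be sketched in NumPy as below. This is only an illustration: the helper name `center_columns` and the sample values are made up here and are not part of the slides.

```python
import numpy as np

def center_columns(X):
    """Subtract each column's mean so that every variable has mean zero."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

# Hypothetical data: m = 4 measurements (rows) of n = 2 variables (columns).
X_raw = np.array([[1.0, 10.0],
                  [2.0, 12.0],
                  [3.0, 14.0],
                  [4.0, 16.0]])
X = center_columns(X_raw)
```

After this step the column means of `X` are zero (up to floating-point error), which is what the covariance formulas on the following slides rely on.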
Some Statistics
\[
\operatorname{cov}(x_a, x_b) = \frac{1}{m-1} x_a^T x_b
\]
$\operatorname{cov}(x_a, x_b)$ is the covariance of $x_a$ and $x_b$, two columns of the centered matrix $X$.
Covariance describes how strongly $x_a$ and $x_b$ vary together: its sign gives the direction of the linear relationship, and a covariance of 0 means the two variables are uncorrelated.
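As a small sketch of the covariance formula for centered vectors, assuming NumPy is available; the function name `covariance` and the sample vectors are hypothetical:

```python
import numpy as np

def covariance(x_a, x_b):
    """Sample covariance of two centered vectors: x_a^T x_b / (m - 1)."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    m = x_a.shape[0]
    return float(x_a @ x_b) / (m - 1)

# Hypothetical measurements; they are centered before the formula is applied.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 9.0])
a_c = a - a.mean()
b_c = b - b.mean()
cov_ab = covariance(a_c, b_c)
```

The result should agree with the off-diagonal entry of `np.cov(a, b)`, which also uses the $m - 1$ normalization by default.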
The covariance of a variable with itself is its variance:
\[
\operatorname{cov}(v, v) = \frac{1}{m-1} v^T v = \operatorname{var}(v)
\]
The covariance matrix of $X$ therefore has the variances on its diagonal:
\[
\operatorname{cov}(X) =
\begin{pmatrix}
\operatorname{var}(x_1) & \cdots & \operatorname{cov}(x_1, x_n) \\
\vdots & \ddots & \vdots \\
\operatorname{cov}(x_1, x_n) & \cdots & \operatorname{var}(x_n)
\end{pmatrix}
= \frac{1}{m-1} X^T X
\]
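The matrix form $\frac{1}{m-1} X^T X$ can be checked numerically. The sketch below uses random data (an assumption for illustration only) and a hypothetical helper `covariance_matrix`:

```python
import numpy as np

def covariance_matrix(X):
    """Covariance matrix of a column-centered data matrix: X^T X / (m - 1)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    return X.T @ X / (m - 1)

# Hypothetical data: 50 measurements of 3 variables.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 3))
X = X_raw - X_raw.mean(axis=0)  # center every column
C = covariance_matrix(X)
```

`C` is symmetric, its diagonal holds the per-variable sample variances, and it should match `np.cov(X_raw, rowvar=False)`.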
What We Really Want
Eliminate unnecessary redundancies
Transform $X$ linearly into $Y = XP$ so that
\[
\operatorname{cov}(y_a, y_b) = 0 \quad \forall a \neq b
\]
that is,
\[
\operatorname{cov}(Y) =
\begin{pmatrix}
\operatorname{var}(y_1) & & 0 \\
& \ddots & \\
0 & & \operatorname{var}(y_n)
\end{pmatrix}
\]
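A short derivation, not spelled out on the slide, shows how the covariance of $Y = XP$ relates to that of $X$. It assumes $X$ is centered, so that $Y$ is centered as well:

```latex
\[
\operatorname{cov}(Y)
= \frac{1}{m-1} Y^T Y
= \frac{1}{m-1} (XP)^T (XP)
= P^T \left( \frac{1}{m-1} X^T X \right) P
= P^T \operatorname{cov}(X) \, P
\]
```

The task is therefore to find a matrix $P$ for which $P^T \operatorname{cov}(X) \, P$ is diagonal.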
Diagonalizing Matrices
Theorem
Every symmetric real matrix $A$ has an eigenvalue decomposition
\[
A = V D V^T,
\]
where $D$ is a diagonal matrix composed of the eigenvalues of $A$, and $V$ is orthogonal (its columns are orthonormal). The columns of $V$ are the eigenvectors corresponding to the matching entries in $D$.
$D = V^T A V$ follows trivially, since $V^T V = I$.
Applied to the symmetric matrix $A = \operatorname{cov}(X)$ with $P = V$:
\[
\operatorname{cov}(Y) = V^T \operatorname{cov}(X) \, V
\]
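The whole construction can be sketched with NumPy's `np.linalg.eigh`, which handles symmetric matrices and returns the eigenvectors as columns. The function name `pca_transform`, the synthetic data, and the choice to sort eigenvalues in descending order are assumptions made for this illustration, not something the slides prescribe.

```python
import numpy as np

def pca_transform(X):
    """Return (Y, V, eigenvalues) with Y = X V, so that cov(Y) is diagonal.

    Assumes the columns of X are already centered.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    C = X.T @ X / (m - 1)               # covariance matrix of X
    eigenvalues, V = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]  # assumed convention: largest first
    eigenvalues, V = eigenvalues[order], V[:, order]
    return X @ V, V, eigenvalues

# Hypothetical correlated data: the second variable is roughly twice the first.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_raw = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=200)])
X = X_raw - X_raw.mean(axis=0)

Y, V, eigenvalues = pca_transform(X)
cov_Y = Y.T @ Y / (Y.shape[0] - 1)
```

Up to floating-point error, `cov_Y` is diagonal with the eigenvalues of `cov(X)` on its diagonal, so the transformed variables are uncorrelated.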