
T-61.3050 Machine Learning: Basic Principles. Clustering. Kai Puolamäki.



  1. T-61.3050 Machine Learning: Basic Principles. Clustering. Kai Puolamäki, Laboratory of Computer and Information Science (CIS), Department of Computer Science and Engineering, Helsinki University of Technology (TKK). Autumn 2007.

  2. Remaining Lectures
     6 Nov: Dimensionality Reduction & Clustering (Alpaydin Ch 6 & 7)
     13 Nov: Clustering & Algorithms in Data Analysis (PDF chapter)
     20 Nov: Assessing Algorithms & Decision Trees (Alpaydin Ch 14 & 9)
     27 Nov: Machine Learning @ Google / TBA (additionally, a Google recruitment talk in the afternoon at 16:00 in T1, see http://www.cis.hut.fi/googletalk07/)
     4 Dec: Decision Trees & Linear Discrimination (Alpaydin Ch 10)
     (7 Dec: last problem session.)
     11 Dec: Recap
     The plan is preliminary (it may still change).

  3. About the Textbook
     This course uses Alpaydin (2004) as its textbook. The lecture slides (neither mine nor those on Alpaydin's site) are not meant to replace the textbook; it is important to also read the book chapters. The library has some reading-room copies (and is planning to order more). If nothing else, you should probably at least copy some key chapters.

  4. Outline
     1 Dimensionality Reduction: Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA)
     2 Clustering: Introduction; K-means Clustering; EM Algorithm

  5. Principal Component Analysis (PCA)
     PCA finds a low-dimensional linear subspace such that when x is projected onto it, the information loss (here defined via variance) is minimized; in other words, it finds the directions of maximal variance.
     Projection pursuit: find a direction w such that some measure, here the variance Var(w^T x), is maximized.
     This is equivalent to finding the eigenvalues and eigenvectors of the covariance or correlation matrix (see the short derivation below).
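The slide states this equivalence without proof; the following is a standard derivation (not spelled out on the slide), written in LaTeX, maximizing the variance under the unit-norm constraint w^T w = 1:

    % maximize Var(w^T x) = w^T S w  subject to  w^T w = 1  (Lagrange multiplier alpha)
    \begin{aligned}
      L(w, \alpha) &= w^T S w - \alpha \, (w^T w - 1) \\
      \nabla_w L = 2 S w - 2 \alpha w &= 0
        \quad\Longrightarrow\quad S w = \alpha w ,
    \end{aligned}

so a stationary w must be an eigenvector of the covariance matrix S, with Var(w^T x) = w^T S w = alpha; the maximum is attained at the eigenvector with the largest eigenvalue.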

  6. Principal Component Analysis (PCA)
     [Figure 6.1: Principal components analysis centers the sample and then rotates the axes to line up with the directions of highest variance. If the variance on z_2 is too small, it can be ignored and we have dimensionality reduction from two to one. From: E. Alpaydın. 2004. Introduction to Machine Learning. (c) The MIT Press.]

  7. Principal Component Analysis (PCA)
     More formally: data X = {x^t}_{t=1}^N, x^t ∈ R^d.
     Center the data: y^t = x^t − m, where m = Σ_t x^t / N.
     Two options:
       use the covariance matrix S = Σ_t y^t (y^t)^T / N, or
       use the correlation matrix R, where R_ij = S_ij / sqrt(S_ii S_jj).
     Diagonalize S (or R) using Singular Value Decomposition (SVD): C^T S C = D, where C is an orthogonal (rotation) matrix satisfying C C^T = C^T C = 1, and D is a diagonal matrix whose diagonal elements are the eigenvalues λ_1 ≥ ... ≥ λ_d ≥ 0. The i-th column of C is the i-th eigenvector.
     Project the data vectors y^t onto the principal components: z^t = C^T y^t (equivalently y^t = C z^t).
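As an illustration only (not course code; the toy data and variable names are my own), a minimal R sketch of these steps, using the slide's 1/N normalization rather than R's default 1/(N − 1):

    set.seed(1)
    X <- matrix(rnorm(200 * 3), ncol = 3)    # toy data: N = 200 points in R^3
    N <- nrow(X)

    m <- colMeans(X)                         # sample mean m
    Y <- sweep(X, 2, m)                      # centered data, y^t = x^t - m
    S <- crossprod(Y) / N                    # covariance matrix, sum_t y^t (y^t)^T / N

    eig    <- eigen(S, symmetric = TRUE)     # diagonalization C^T S C = D
    C      <- eig$vectors                    # i-th column = i-th eigenvector
    lambda <- eig$values                     # eigenvalues lambda_1 >= ... >= lambda_d
    Z      <- Y %*% C                        # principal components, z^t = C^T y^t

If the correlation matrix is preferred, cov2cor(S) converts S into R before the eigendecomposition.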

  8. Principal Component Analysis (PCA)
     Observation: the covariance matrix of {z^t}_{t=1}^N is the diagonal matrix D whose diagonal elements are the variances:
     S_z = Σ_t z^t (z^t)^T / N = C^T ( Σ_t y^t (y^t)^T / N ) C = C^T S C = D,
     where the diagonal elements of D are the variances D_ii = σ²_{z_i}. Eigenvalues λ_i ⇔ variances σ²_i.

  9. Principal Component Analysis (PCA)
     Idea: in the PC space (z space), the first k principal components explain the data well enough, where k < d. "Well enough" means here that the reconstruction error is small enough.
     More formally: project the data vectors y^t into R^k using ẑ^t = W^T y^t, where W ∈ R^{d×k} is the matrix containing the first k columns of C ("W <- C[,1:k]"). ẑ^t is a representation of y^t in k dimensions.
     Project ẑ^t back to the y^t space: ŷ^t = W ẑ^t = W W^T y^t.
     What is the average reconstruction error E = Σ_t (ŷ^t − y^t)^T (ŷ^t − y^t) / N? (A small code sketch follows below; the next slide gives the answer.)
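A continuation of the earlier R sketch (again illustrative only, with my own toy data), projecting to k dimensions and reconstructing:

    set.seed(1)
    X <- matrix(rnorm(200 * 5), ncol = 5)        # toy data: N = 200, d = 5
    Y <- sweep(X, 2, colMeans(X))                # centered data
    C <- eigen(crossprod(Y) / nrow(Y), symmetric = TRUE)$vectors

    k    <- 2
    W    <- C[, 1:k]                             # the slide's "W <- C[,1:k]" (a d x k matrix)
    Zhat <- Y %*% W                              # k-dimensional representation, zhat^t = W^T y^t
    Yhat <- Zhat %*% t(W)                        # reconstruction, yhat^t = W zhat^t = W W^T y^t
    E    <- mean(rowSums((Yhat - Y)^2))          # average reconstruction error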

  10. Principal Component Analysis (PCA)
     What is the average reconstruction error E = Σ_t (ŷ^t − y^t)^T (ŷ^t − y^t) / N?
     E = Tr( E[ (ŷ − y)(ŷ − y)^T ] )
       = Tr( (W W^T − 1) E[y y^T] (W W^T − 1) )
       = Tr( W W^T C D C^T W W^T ) + Tr( C D C^T ) − 2 Tr( W^T C D C^T W )
       = Σ_{i=k+1}^d λ_i,
     where we have used the fact that S = C D C^T = E[y y^T] and the cyclic property of the trace, Tr(AB) = Tr(BA).
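A quick numerical check of this result (my own illustrative R sketch, not course material): the empirical reconstruction error should equal the sum of the discarded eigenvalues up to rounding.

    set.seed(1)
    X   <- matrix(rnorm(500 * 6), ncol = 6)      # toy data: N = 500, d = 6
    Y   <- sweep(X, 2, colMeans(X))
    eig <- eigen(crossprod(Y) / nrow(Y), symmetric = TRUE)

    k    <- 3
    W    <- eig$vectors[, 1:k]
    Yhat <- Y %*% W %*% t(W)                     # reconstruction through the first k PCs

    mean(rowSums((Yhat - Y)^2))                  # empirical average reconstruction error
    sum(eig$values[(k + 1):6])                   # sum of the discarded eigenvalues: the same value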

  11. Principal Component Analysis (PCA)
     Result: PCA is a linear projection of the data from R^d into R^k such that the average reconstruction error E = E[ (ŷ − y)^T (ŷ − y) ] is minimized.
     Proportion of Variance (PoV) explained: PoV = Σ_{i=1}^k λ_i / Σ_{i=1}^d λ_i.
     Some rules of thumb for finding a good k: PoV ≈ 0.9, or the PoV curve has an elbow (see the sketch below).
     Dimension reduction: it may be sufficient to use ẑ^t instead of x^t to train a classifier etc.
     Visualization: plot the data as ẑ^t using k = 2 (the first thing to do with new data).
     Data compression: instead of storing the full data vectors y^t, it may be sufficient to store only ẑ^t and reconstruct the original data as ŷ^t = W ẑ^t when necessary.
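A small R sketch of the PoV rule of thumb (illustrative only; substitute a real data matrix for the toy data):

    set.seed(1)
    X <- matrix(rnorm(300 * 10), ncol = 10)      # toy data, d = 10
    Y <- sweep(X, 2, colMeans(X))
    lambda <- eigen(crossprod(Y) / nrow(Y), symmetric = TRUE)$values

    pov <- cumsum(lambda) / sum(lambda)          # PoV for k = 1, ..., d
    k   <- which(pov >= 0.9)[1]                  # smallest k with PoV >= 0.9

    plot(pov, type = "b", xlab = "k", ylab = "Proportion of Variance explained")
    abline(h = 0.9, lty = 2)                     # look for the 0.9 crossing or an elbow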

  12. Example: Optdigits
     The optdigits data set contains 5620 instances of digitized handwritten digits in the range 0–9. Each digit is a vector in R^64: 8 × 8 = 64 pixels, 16 gray levels.
     [Figure: example digits (0, 4, 6, 2) shown as 8 × 8 images.]
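For a visualization along the lines of the following slides, an R sketch (illustrative only; the file name and URL are assumptions, not part of the course material): it assumes the optdigits training file has been downloaded from the UCI Machine Learning Repository, e.g. https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra, where each row holds 64 comma-separated pixel values followed by the class label 0–9.

    dat   <- read.csv("optdigits.tra", header = FALSE)    # assumed local file name
    X     <- as.matrix(dat[, 1:64])                       # 64 pixel values per digit
    label <- dat[, 65]                                     # class label 0-9

    Y <- sweep(X, 2, colMeans(X))                          # center the data
    C <- eigen(crossprod(Y) / nrow(Y), symmetric = TRUE)$vectors
    Z <- Y %*% C[, 1:2]                                    # first two principal components

    plot(Z, type = "n", xlab = "PC 1", ylab = "PC 2")      # scatter of digits in PC space
    text(Z, labels = label, col = label + 1, cex = 0.6)    # label points by digit class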

  13. Example: Optdigits
     [Figure from the lecture notes for E. Alpaydın, 2004, Introduction to Machine Learning, (c) The MIT Press.]

  14. [Figure from the lecture notes for E. Alpaydın, 2004, Introduction to Machine Learning, (c) The MIT Press.]
