

  1. Scientific Computing – Maastricht Science Program – Week 5 – Frans Oliehoek <frans.oliehoek@maastrichtuniversity.nl>

  2. Announcements
     ● I will be more strict!
     ● Requirements updated... YOU are responsible that the submission satisfies the requirements!!!
     ● I will not email you until the rest have their marks.

  3. Recap: Last Two Weeks
     ● Supervised Learning: find f that maps {x_1^{(j)}, ..., x_D^{(j)}} → y^{(j)}
       ● Interpolation: f goes through the data points
       ● Linear regression: lossy fit, minimizes the 'vertical' SSE
     ● Unsupervised Learning: we just have data points {x_1^{(j)}, ..., x_D^{(j)}}
       ● PCA: minimizes the orthogonal projection error
     [figure: 2-D scatter plot with axes x_1 and x_2 and a direction u = (u_1, u_2)]

  4. Recap: Clustering
     ● Clustering (or Cluster Analysis) has many applications
       ● Understanding: astronomy, biology, etc.
       ● Data (pre)processing: summarization of a data set, compression
     ● Are there questions about k-means clustering?

  5. This Lecture
     ● Last week: unlabeled data (also 'unsupervised learning'), data: just x
       ● Clustering
       ● Principal Component Analysis (PCA) – what?
     ● This week
       ● Principal Component Analysis (PCA) – how?
       ● Numerical differentiation and integration.

  6. Part 1: Principal Component Analysis ● Recap ● How to do it?

  7. PCA – Intuition
     ● How would you summarize this data using 1 dimension? (Which variable contains the most information?)
     ● Very important idea: the most information is contained by the variable with the largest spread,
       ● i.e., the highest variance (Information Theory)
     [figure: 2-D scatter plot of the data with axes x_1 and x_2]
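
     A minimal Octave/MATLAB sketch of this 'largest spread' idea; the toy data matrix X below is made up for illustration and is not from the slides.

        % Which variable carries the most 'information', measured as variance?
        X = [0.5 * randn(1, 100); 3.0 * randn(1, 100)];  % toy 2 x 100 data: x_1 has a small spread, x_2 a large one
        v = var(X, 0, 2);                                % sample variance of each row (variable)
        [vmax, best] = max(v);                           % variable with the largest spread
        fprintf('keep variable x_%d (variance %.2f)\n', best, vmax);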

  8. PCA – Intuition
     ● How would you summarize this data using 1 dimension? (Which variable contains the most information?)
     ● Very important idea: the most information is contained by the variable with the largest spread,
       ● i.e., the highest variance (Information Theory)
     ● So if we have to choose between x_1 and x_2 → remember x_2
     ● Transform of the k-th point: (x_1^{(k)}, x_2^{(k)}) → (z_1^{(k)}), where z_1^{(k)} = x_2^{(k)}

  9. PCA – Intuition
     ● How would you summarize this data using 1 dimension?
     ● Transform of the k-th point: (x_1^{(k)}, x_2^{(k)}) → (z_1^{(k)}),
       where z_1 is the orthogonal scalar projection onto the (unit) vector u^{(1)}:
       z_1^{(k)} = u_1^{(1)} x_1^{(k)} + u_2^{(1)} x_2^{(k)} = (u^{(1)}, x^{(k)})
     [figure: 2-D scatter plot with axes x_1 and x_2 and the direction u]
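
     A small Octave/MATLAB sketch of this scalar projection; the points and the direction u1 below are made-up examples.

        % Orthogonal scalar projection of 2-D points onto a unit vector u^{(1)}.
        X  = [2 4 1; 1 3 0.5];           % toy 2 x 3 data: each column is a point (x_1, x_2)
        u1 = [1; 2] / norm([1; 2]);      % a unit-length direction (chosen arbitrarily here)
        z1 = u1' * X;                    % z_1^{(k)} = (u^{(1)}, x^{(k)}) for every point k
        disp(z1)                         % one scalar per point: the 1-D summary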

  10. More Principal Components
      ● u^{(2)} is the direction with the most 'remaining' variance
        ● orthogonal to u^{(1)}!
      ● In general:
        ● If the data is D-dimensional, we can find D directions u^{(1)}, ..., u^{(D)}
        ● Each direction is itself a D-vector: u^{(i)} = (u_1^{(i)}, ..., u_D^{(i)})
        ● Each direction is orthogonal to the others: (u^{(i)}, u^{(j)}) = 0
        ● The first direction has the most variance
        ● The least variance is in direction u^{(D)}

  11. PCA – Goals
      ● All directions of high variance might be useful in themselves
        ● Analysis of data: in the lab you will analyze the ECG signal of a patient with a heart disease.

  12. PCA – Goals
      ● All directions of high variance might be useful in themselves
      ● But not for dimension reduction...
      ● Given X (N data points of D variables) → convert to Z (N data points of d variables):
        (x_1^{(0)}, x_2^{(0)}, ..., x_D^{(0)}) → (z_1^{(0)}, z_2^{(0)}, ..., z_d^{(0)})
        (x_1^{(1)}, x_2^{(1)}, ..., x_D^{(1)}) → (z_1^{(1)}, z_2^{(1)}, ..., z_d^{(1)})
        ...
        (x_1^{(n)}, x_2^{(n)}, ..., x_D^{(n)}) → (z_1^{(n)}, z_2^{(n)}, ..., z_d^{(n)})
      ● The vector (z_i^{(0)}, z_i^{(1)}, ..., z_i^{(n)}) is called the i-th principal component (of the data set)
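
      A hedged Octave/MATLAB sketch of this transform (toy data and an arbitrarily chosen orthonormal U, not the real PCA directions yet): with the directions as columns of U, all components of all points follow from one matrix product.

        X = [1 2 3 4 5; 1 3 2 5 4];       % toy D x N data, D = 2, N = 5
        X = X - mean(X, 2);               % zero-mean, as assumed later in the lecture
        U = [1 1; 1 -1] / sqrt(2);        % orthonormal directions as columns (illustrative only)
        Z = U' * X;                       % Z is D x N; row i of Z is the i-th principal component
        d = 1;
        Zd = Z(1:d, :);                   % dimension reduction: keep only the first d components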

  13. PCA – Dimension Reduction
      ● Approach
        ● Step 1: find all directions (and principal components)
          (x_1^{(0)}, x_2^{(0)}, ..., x_D^{(0)}) → (z_1^{(0)}, z_2^{(0)}, ..., z_D^{(0)})
          (x_1^{(1)}, x_2^{(1)}, ..., x_D^{(1)}) → (z_1^{(1)}, z_2^{(1)}, ..., z_D^{(1)})
          ...
          (x_1^{(n)}, x_2^{(n)}, ..., x_D^{(n)}) → (z_1^{(n)}, z_2^{(n)}, ..., z_D^{(n)})
        ● Step 2: ...?

  14. PCA – Dimension Reduction
      ● Approach
        ● Step 1: find all directions (and principal components)
          (x_1^{(k)}, x_2^{(k)}, ..., x_D^{(k)}) → (z_1^{(k)}, z_2^{(k)}, ..., z_D^{(k)})   for k = 0, ..., n
        ● Step 2: keep only the directions with the most information!
          ● the first d < D PCs contain high variance → these are the principal components with much information
          (x_1^{(k)}, x_2^{(k)}, ..., x_D^{(k)}) → (z_1^{(k)}, z_2^{(k)}, ..., z_d^{(k)})   for k = 0, ..., n

  16. PCA – More Concrete
      ● PCA
        ● finding all the directions, and
        ● the principal components
      ● Data compression using PCA
        ● computing the compressed representation
        ● computing the reconstruction

  17. PCA – More Concrete
      ● PCA (using the eigendecomposition of the covariance matrix)
        ● finding all the directions → still to be shown
        ● principal components → easy! for the k-th point: z_j^{(k)} = (u^{(j)}, x^{(k)})
      ● Data compression using PCA
        ● computing the compressed representation → easy! for the k-th point just keep (z_1^{(k)}, ..., z_d^{(k)})
        ● computing the reconstruction → still to be shown (we show that the data is a linear combination of the PCs)
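
      A hedged Octave/MATLAB sketch of the two 'easy' steps (variable names and data are made up): U holds the directions as columns, X is zero-mean data, and the reconstruction is a linear combination of the kept directions.

        U = [1 1; 1 -1] / sqrt(2);        % toy orthonormal directions (columns)
        X = [3 -1 -2; 1 1 -2];            % toy zero-mean 2 x 3 data
        d = 1;
        Zd   = U(:, 1:d)' * X;            % compressed representation: the first d components of each point
        Xrec = U(:, 1:d) * Zd;            % reconstruction: each point as a linear combination of the kept directions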

  19. Computing the directions U
      Note: X is now D x N (before: N x D)
      Algorithm (X is the D x N data matrix):
      1) Preprocessing:
         ● scale the features
         ● make X zero mean
      2) Compute the data covariance matrix
      3) Perform the eigendecomposition
      ● The directions u_i are the eigenvectors of C
      ● The variance of u_i is the corresponding eigenvalue

  20. Computing the directions U
      Algorithm (X is the D x N data matrix):
      1) Preprocessing:
         ● scale the features:
           x_i^{(k)} ← x_i^{(k)} / (max_l x_i^{(l)} − min_l x_i^{(l)})
         ● make X zero mean
      2) Compute the data covariance matrix
      3) Perform the eigendecomposition
      ● The directions u_i are the eigenvectors of C
      ● The variance of u_i is the corresponding eigenvalue
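
      A one-step sketch of this scaling in Octave/MATLAB; the formula on the slide is partly garbled, so this assumes dividing each variable (row) by its range, on toy data.

        X = [1 5 3; 100 300 200];                 % toy D x N data: two variables on very different scales
        r = max(X, [], 2) - min(X, [], 2);        % range of each variable over the N points
        X = X ./ r;                               % assumed scaling: divide each row by its range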

  21. Computing the directions U
      Algorithm (X is the D x N data matrix):
      1) Preprocessing:
         ● scale the features
         ● make X zero mean:
           ● compute the mean data point: μ_i = (1/N) Σ_{k=1}^{N} x_i^{(k)}
           ● subtract the mean from each point: x^{(k)} ← x^{(k)} − μ
      2) Compute the data covariance matrix
      3) Perform the eigendecomposition
      ● The directions u_i are the eigenvectors of C
      ● The variance of u_i is the corresponding eigenvalue
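
      The zero-mean step as a short Octave/MATLAB sketch (toy data):

        X  = [1 2 3; 10 20 30];           % toy D x N data
        mu = mean(X, 2);                  % the mean data point mu (one entry per variable)
        X  = X - mu;                      % subtract the mean from every point (column)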

  22. Computing the directions U
      Algorithm (X is the D x N data matrix):
      1) Preprocessing:
         ● scale the features
         ● make X zero mean
      2) Compute the data covariance matrix: C = (1/N) X X^T
      3) Perform the eigendecomposition
      ● The directions u_i are the eigenvectors of C
      ● The variance of u_i is the corresponding eigenvalue
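
      The covariance step as a short sketch, directly mirroring C = (1/N) X X^T (toy zero-mean data):

        X = [1 -1 2 -2; 0.5 -0.5 1 -1];   % toy zero-mean D x N data, D = 2, N = 4
        N = size(X, 2);
        C = (1 / N) * (X * X');           % D x D data covariance matrix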

  23. Computing the directions U
      Algorithm (X is the D x N data matrix):
      1) Preprocessing:
         ● scale the features
         ● make X zero mean
      2) Compute the data covariance matrix
      3) Perform the eigendecomposition:
         ● a square matrix has eigenvectors: vectors that it maps to a multiple of themselves,
           C x = λ x   (x is an eigenvector, the scalar λ is its eigenvalue)
      ● The directions u_i are the eigenvectors of C
      ● The variance of u_i is the corresponding eigenvalue
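
      A numerical check of the eigenvector property C x = λ x in Octave/MATLAB (the small symmetric matrix below is made up):

        C = [4 2; 2 3];                   % a small symmetric matrix
        [V, L] = eig(C);                  % columns of V are eigenvectors, diag(L) the eigenvalues
        x = V(:, 1);  lambda = L(1, 1);
        disp(norm(C * x - lambda * x))    % ~0: C maps x to a multiple of itself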

  24. Computing the directions U
      Algorithm (X is the D x N data matrix):
      1) Preprocessing:
         ● scale the features
         ● make X zero mean
      2) Compute the data covariance matrix
      3) Perform the eigendecomposition:
         ● a square matrix has eigenvectors: C x = λ x (x an eigenvector, λ its scalar eigenvalue)

           [eigenvectors, eigenvals] = eig(C);
           % 'eig' delivers the eigenvectors in the wrong order (smallest
           % eigenvalue first), so we flip the matrix left-right
           U = fliplr(eigenvectors);
           % U(:, i) now is the i-th direction

      ● The directions u_i are the eigenvectors of C
      ● The variance of u_i is the corresponding eigenvalue
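
      Putting slides 19-24 together, a hedged end-to-end sketch in Octave/MATLAB (toy data and my own variable names; instead of relying on fliplr it sorts the eigenvalues explicitly, which has the same effect when eig returns them in ascending order):

        X = randn(3, 200);                            % toy D x N data, D = 3, N = 200
        X = X ./ (max(X, [], 2) - min(X, [], 2));     % 1) scale the features (assumed: divide by the range)
        X = X - mean(X, 2);                           %    make X zero mean
        C = (1 / size(X, 2)) * (X * X');              % 2) data covariance matrix
        [V, L] = eig(C);                              % 3) eigendecomposition
        [vars, idx] = sort(diag(L), 'descend');       % largest variance first
        U = V(:, idx);                                % U(:, i) is the i-th direction, vars(i) its variance
        Z = U' * X;                                   % all principal components (rows of Z)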
