Learning From Data Lecture 27: Learning Aides


  1. Learning From Data, Lecture 27: Learning Aides. Input Preprocessing; Dimensionality Reduction and Feature Selection; Principal Components Analysis (PCA); Hints, Data Cleaning, Validation, ... M. Magdon-Ismail, CSCI 4100/6100.

  2. Learning Aides. Additional tools that can be applied to all learning techniques:
  - Preprocess data to account for arbitrary choices made during data collection (input normalization).
  - Remove irrelevant dimensions that can mislead learning (PCA).
  - Incorporate known properties of the target function (hints and invariances).
  - Remove detrimental data (deterministic and stochastic noise).
  - Better ways to validate (estimate E_out) for model selection.

  3. Nearest Neighbor. Mr. Good and Mr. Bad were both given credit cards by the Bank of Learning (BoL). In (Age in years, Income in $1,000s), Mr. Good is at (47, 35) and Mr. Bad is at (22, 40). Mr. Unknown, who has "coordinates" (21 yrs, $36K), applies for credit. Should the BoL give him credit, according to the nearest neighbor algorithm? What if income is measured in dollars instead of "K" (thousands of dollars)? [Plot: the three applicants in the (Age (yrs), Income (K)) plane.]

  4. Nearest Neighbor Uses Euclidean Distance. Mr. Good and Mr. Bad were both given credit cards by the Bank of Learning (BoL). In (Age in years, Income in $), Mr. Good is at (47, 35000) and Mr. Bad is at (22, 40000). Mr. Unknown, who has "coordinates" (21 yrs, $36,000), applies for credit. Should the BoL give him credit, according to the nearest neighbor algorithm, now that income is measured in dollars instead of "K" (thousands of dollars)? [Plot: the three applicants in the (Age (yrs), Income ($)) plane.]
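
  To make the scaling issue concrete, here is a small sketch in Python with NumPy (the numbers are the ones on the slides, but the code itself is mine, not the lecture's) that runs the nearest neighbor comparison with income in thousands and then in raw dollars:

  import numpy as np

  # Applicants as (age in years, income); income starts in units of $1,000.
  good = np.array([47.0, 35.0])      # Mr. Good
  bad = np.array([22.0, 40.0])       # Mr. Bad
  unknown = np.array([21.0, 36.0])   # Mr. Unknown

  for scale, unit in [(1.0, "thousands of dollars"), (1000.0, "dollars")]:
      rescale = np.array([1.0, scale])          # only the income axis changes units
      d_good = np.linalg.norm(unknown * rescale - good * rescale)
      d_bad = np.linalg.norm(unknown * rescale - bad * rescale)
      nearest = "Mr. Good" if d_good < d_bad else "Mr. Bad"
      print(f"Income in {unit}: nearest neighbor is {nearest}")

  With income in thousands the nearest neighbor is Mr. Bad; with income in dollars the age difference becomes negligible and the nearest neighbor flips to Mr. Good. Nothing about the applicants changed, only the units.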

  5. Uniform Treatment of Dimensions. Most learning algorithms treat each dimension equally:
  - Nearest neighbor: d(x, x') = ||x - x'||
  - Weight decay: Ω(w) = λ w^T w
  - SVM: the margin is defined using Euclidean distance
  - RBF: the bump function decays with Euclidean distance
  Input preprocessing: unless you want to emphasize certain dimensions, the data should be preprocessed to present each dimension on an equal footing.

  6. Input Preprocessing is a Data Transform. The data matrix X = [x_1^T; x_2^T; ...; x_N^T] is transformed row by row, x_n ↦ z_n, and the final hypothesis has the form g(x) = g̃(Φ(x)). The raw {x_n} have (for example) arbitrary scalings in each dimension; the transformed {z_n} will not.

  7. Centering. [Plots: raw data and centered data.] Centering: z_n = x_n - x̄, where x̄ = (1/N) Σ_{n=1}^N x_n. After centering, the transformed data has mean z̄ = 0.

  8. Normalizing. [Plots: raw, centered, and normalized data.] Normalizing (after centering): z_n = D x_n, where D is diagonal with D_ii = 1/σ_i and σ_i^2 = (1/N) Σ_{n=1}^N x_{ni}^2. After normalizing, every dimension has unit scale, σ̃_i = 1.

  9. Whitening. [Plots: raw, centered, normalized, and whitened data.] Whitening (after centering): z_n = Σ^(-1/2) x_n, where Σ = (1/N) Σ_{n=1}^N x_n x_n^T = (1/N) X^T X. After whitening, the data is uncorrelated with unit variance: (1/N) Z^T Z = I.
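
  A minimal NumPy sketch of the three transforms (not the lecture's code; it assumes the data matrix X has one example per row, and that normalizing and whitening are applied to centered data, as on the slides):

  import numpy as np

  def center(X):
      """Subtract the mean of each dimension: z_n = x_n - x_bar, so z_bar = 0."""
      return X - X.mean(axis=0)

  def normalize(X):
      """Center, then scale each dimension by 1/sigma_i, so every sigma_i becomes 1."""
      Z = center(X)
      return Z / Z.std(axis=0)

  def whiten(X):
      """Center, then apply Sigma^(-1/2), so that (1/N) Z^T Z = I."""
      Z = center(X)
      Sigma = Z.T @ Z / Z.shape[0]                 # Sigma = (1/N) X^T X on centered data
      vals, vecs = np.linalg.eigh(Sigma)           # eigendecomposition of Sigma
      Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
      return Z @ Sigma_inv_sqrt

  # Correlated, badly scaled toy data; after whitening its covariance is the identity.
  X = np.random.randn(500, 2) @ np.array([[30.0, 2.0], [0.0, 0.5]]) + [100.0, 5.0]
  Z = whiten(X)
  print(np.round(Z.T @ Z / Z.shape[0], 3))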

  10. Only Use Training Data For Preprocessing. WARNING! Transforming data into a more convenient format has a hidden trap which leads to data snooping. When using a test set, determine the input transformation z = Φ(x) from the training data D_train only: D_train -> input preprocessing -> g(x) = g̃(Φ(x)), and only then evaluate E_test on the locked-away D_test. Rule: lock away the test data until you have your final hypothesis. [Plot: cumulative profit (%) over 500 days for runs with and without snooping; the snooping curve looks far more profitable.]
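
  One way to respect this rule in code (a sketch of mine, not from the lecture): fit the transform's parameters on D_train only, then apply the frozen transform to both the training and the test set.

  import numpy as np

  def fit_normalizer(X_train):
      """Estimate the transform's parameters from the training set only."""
      return X_train.mean(axis=0), X_train.std(axis=0)

  def apply_normalizer(X, mean, std):
      """Apply a previously fitted transform; no statistics of X itself are used."""
      return (X - mean) / std

  # Hypothetical split; the test set stays locked away while the transform is fit.
  X = np.random.randn(500, 3) * [1.0, 10.0, 1000.0]
  X_train, X_test = X[:400], X[400:]

  mean, std = fit_normalizer(X_train)              # parameters come from D_train only
  Z_train = apply_normalizer(X_train, mean, std)
  Z_test = apply_normalizer(X_test, mean, std)     # same transform, no snooping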

  11. Principal Components Analysis. [Plots: original data and rotated data.] Rotate the data so that it is easy to: identify the dominant directions (information); throw away the smaller dimensions (noise). The transform takes (x_1, x_2) -> (z_1, z_2) -> (z_1).

  12. Projecting the Data to Maximize Variance. (Always center the data first.) [Plot: original data with a candidate direction v.] Project each point onto v, giving z_n = x_n^T v, and find the v that maximizes the variance of z.

  13. Maximizing the Variance.
  var[z] = (1/N) Σ_{n=1}^N z_n^2
         = (1/N) Σ_{n=1}^N v^T x_n x_n^T v
         = v^T ( (1/N) Σ_{n=1}^N x_n x_n^T ) v
         = v^T Σ v.
  Choose v to be v_1, the top eigenvector of Σ: the top principal component (PCA).

  14. The Principal Components. The PCA features are z_1 = x^T v_1, z_2 = x^T v_2, z_3 = x^T v_3, ..., where v_1, v_2, ..., v_d are the eigenvectors of Σ with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d. Theorem [Eckart-Young]: these directions give the best reconstruction of the data; they also capture the maximum variance. [Plot: original data with the principal direction v.]
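
  Putting slides 12-14 together, a short sketch (mine, not the lecture's) that centers the data, builds Σ, and takes its top eigenvectors as the principal directions:

  import numpy as np

  def pca(X, k):
      """Return the top-k principal directions v_1..v_k and the PCA features Z."""
      Z0 = X - X.mean(axis=0)                  # always center the data first
      Sigma = Z0.T @ Z0 / Z0.shape[0]          # Sigma = (1/N) sum_n x_n x_n^T
      vals, vecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
      order = np.argsort(vals)[::-1][:k]       # pick lambda_1 >= ... >= lambda_k
      V = vecs[:, order]                       # columns are v_1, ..., v_k
      return V, Z0 @ V                         # z_i = x^T v_i for every example

  X = np.random.randn(300, 10)
  V, Z = pca(X, 2)
  print(V.shape, Z.shape)                      # (10, 2) (300, 2)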

  15. PCA Features for Digits Data. [Plots: % reconstruction error versus the number of principal components k (for k up to 200); and the digits data projected onto the top two PCA features (z_1, z_2), with "1" separated from "not 1".] Principal components are automated: they capture the dominant directions of the data, but may not capture the dominant dimensions for f.
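
  A reconstruction-error curve like the one on this slide can be computed from the spectrum of Σ alone: the variance lost by keeping k components is the sum of the discarded eigenvalues. A sketch (using random data as a stand-in for the digits features, which are not reproduced here):

  import numpy as np

  def pct_reconstruction_error(X, k):
      """Percent of the centered data's variance lost when keeping only k components."""
      Z0 = X - X.mean(axis=0)
      vals = np.sort(np.linalg.eigvalsh(Z0.T @ Z0 / Z0.shape[0]))[::-1]
      return 100.0 * vals[k:].sum() / vals.sum()

  X = np.random.randn(1000, 256)               # stand-in for 16x16 digit images
  for k in (1, 2, 10, 50, 200):
      print(k, round(pct_reconstruction_error(X, k), 1))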

  16. Other Learning Aides.
  1. Nonlinear dimension reduction. [Plots: three data sets in the (x_1, x_2) plane.]
  2. Hints (invariances and prior information): rotational invariance, monotonicity, symmetry, ...
  3. Removing noisy data. [Plots: three snapshots of the digits data in the (intensity, symmetry) plane.]
  4. Advanced validation techniques: Rademacher and permutation penalties, which are more efficient than CV and more convenient and accurate than the VC bound.
