Feature Selection and Feature Extraction: Reducing Dimensionality
Steven J. Zeil
Old Dominion University
Fall 2010
Outline

1. Feature Selection
2. Feature Extraction
   - Principal Components Analysis (PCA)
   - Factor Analysis (FA)
   - Multidimensional Scaling (MDS)
   - Linear Discriminants Analysis (LDA)
Motivation

- Reduction in the complexity of prediction and training
- Reduction in the cost of data extraction
- Simpler models, hence reduced variance
- Easier to visualize and analyze results, identify outliers, etc.
Basic Approaches

Given an input population characterized by d attributes:

- Feature Selection: find the k < d dimensions that give the most information and discard the other d − k.
  - Subset selection
- Feature Extraction: find k ≤ d dimensions that are linear combinations of the original d.
  - Principal Components Analysis (unsupervised)
  - Related: Factor Analysis and Multidimensional Scaling
  - Linear Discriminants Analysis (supervised)

The text also mentions nonlinear methods (Isometric Feature Mapping and Locally Linear Embedding), but gives too little information to justify covering them here.
Subset Selection

Assume we have a suitable error function and can evaluate it for a variety of models (e.g., by cross-validation):

- misclassification error for classification problems
- mean-squared error for regression

We cannot evaluate all 2^d subsets of d features, so we search greedily:

- Forward selection: start with an empty feature set. Repeatedly add the feature that reduces the error the most. Stop when the decrease is insignificant. (A sketch of this procedure follows below.)
- Backward selection: start with all features. Repeatedly remove the feature that decreases the error the most (or increases it the least). Stop when any further removal increases the error significantly.

Both directions require O(d²) model evaluations. Both are hill climbing, so neither is guaranteed to find the global optimum.
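As a concrete illustration of the greedy search, here is a minimal sketch of forward selection that uses cross-validated accuracy as the (negated) error estimate. The slides do not prescribe an implementation; the data arrays X and y, the k-nearest-neighbor classifier, and the stopping threshold min_gain are illustrative choices only.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_selection(X, y, min_gain=0.01, cv=5):
    """Greedy forward feature selection on a NumPy array X (n x d) and labels y."""
    d = X.shape[1]
    selected = []            # indices of the features chosen so far
    best_score = 0.0         # best cross-validated accuracy so far
    while len(selected) < d:
        candidate_scores = {}
        for j in range(d):
            if j in selected:
                continue
            cols = selected + [j]                       # try adding feature j
            model = KNeighborsClassifier(n_neighbors=3) # placeholder classifier
            candidate_scores[j] = cross_val_score(model, X[:, cols], y, cv=cv).mean()
        j_best = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[j_best] - best_score < min_gain:
            break            # improvement is insignificant: stop
        selected.append(j_best)
        best_score = candidate_scores[j_best]
    return selected, best_score

# Example use: selected, acc = forward_selection(X_train, y_train)
```

Each pass over the remaining features costs at most d model evaluations, and at most d passes are made, giving the O(d²) cost noted above.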
Notes

- A variant, floating search, adds multiple features at once and then backtracks to see which features can be removed.
- Selection is less useful in very high-dimensional problems where individual features are of limited use but clusters of features are significant.
Feature Extraction

- Principal Components Analysis (PCA)
- Factor Analysis (FA)
- Multidimensional Scaling (MDS)
- Linear Discriminants Analysis (LDA)
Principal Components Analysis (PCA)

- Find a mapping z = A x onto a lower-dimensional space.
- Unsupervised method: seeks to maximize the variance of the projected data.
- Intuitively: try to spread the points apart as far as possible.
1st Principal Component

Assume x ∼ N(µ, Σ). Then wᵀx ∼ N(wᵀµ, wᵀΣw).

Find z₁ = w₁ᵀx, with ‖w₁‖ = 1, that maximizes Var(z₁) = w₁ᵀΣw₁.

That is, maximize w₁ᵀΣw₁ − α(w₁ᵀw₁ − 1) over w₁, with α ≥ 0.

Solution: Σw₁ = αw₁.

This is an eigenvalue problem on Σ. We want the solution (eigenvector) corresponding to the largest eigenvalue α.
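The step from the constrained maximization to the eigenvalue condition is the usual Lagrange-multiplier argument; spelled out (this derivation is implied by, not shown on, the slide):

```latex
\frac{\partial}{\partial w_1}\Bigl[\, w_1^T \Sigma w_1 - \alpha\,(w_1^T w_1 - 1) \Bigr]
  = 2\,\Sigma w_1 - 2\,\alpha\, w_1 = 0
  \quad\Longrightarrow\quad
  \Sigma w_1 = \alpha\, w_1 .
```

Substituting back, Var(z₁) = w₁ᵀΣw₁ = α w₁ᵀw₁ = α, so choosing the largest eigenvalue maximizes the variance.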
2nd Principal Component

Next find z₂ = w₂ᵀx, with ‖w₂‖ = 1 and w₂ᵀw₁ = 0, that maximizes Var(z₂) = w₂ᵀΣw₂.

Solution: Σw₂ = α₂w₂. Choose the solution (eigenvector) corresponding to the 2nd-largest eigenvalue α₂.

Because Σ is symmetric, its eigenvectors are mutually orthogonal.
Visualizing PCA

z = Wᵀ(x − m)
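A minimal NumPy sketch of this projection, using the sample mean for m and the sample covariance in place of the true Σ; the function name and arguments are illustrative, not part of the slides:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n x d) onto the k leading principal components."""
    m = X.mean(axis=0)                        # sample mean: the 'm' in z = W^T (x - m)
    Xc = X - m                                # center the data
    Sigma = np.cov(Xc, rowvar=False)          # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh, since Sigma is symmetric
    order = np.argsort(eigvals)[::-1]         # sort eigenvalues, largest first
    W = eigvecs[:, order[:k]]                 # d x k matrix of leading eigenvectors
    Z = Xc @ W                                # n x k projected data: z = W^T (x - m)
    return Z, W, eigvals[order]
```

Using eigh exploits the symmetry of Σ, which is also why the resulting columns of W are mutually orthogonal, as noted on the previous slide.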
Is Spreading the Space Enough?

Although we can argue that spreading the points leads to a better-conditioned problem: what does this have to do with reducing dimensionality?