  1. Introduction to Machine Learning CentraleSupélec Paris, Fall 2017 3. Dimensionality Reduction Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr

  2. Learning objectives ● Give reasons why one would wish to reduce the dimensionality of a data set. ● Explain the difference between feature selection and feature extraction. ● Implement some filter strategies. ● Implement some wrapper strategies. ● Derive the computation of principal components from a “max variance” definition. ● Implement PCA.

  3. Curse of dimensionality ● Methods / intuitions that work in low dimension may not apply to high dimensions. ● p=2: Fraction of the points within a square that fall outside of the circle inscribed in it? ?

  4. Curse of dimensionality ● Methods / intuitions that work in low dimension may not apply to high dimensions. ● p=2: Fraction of the points within a square that fall outside of the circle inscribed in it: 1 − π/4 ≈ 0.21. [Figure: circle of radius r inscribed in a square.]

  5. Curse of dimensionality ● Methods / intuitions that work in low dimension may not apply to high dimensions. ● p=3: Fraction of the points within a cube that fall outside of the sphere inscribed in it: 1 − π/6 ≈ 0.48. [Figure: sphere of radius r inscribed in a cube.]

  6. Curse of dimensionality ● Volume of a p-sphere of radius r: V_p(r) = π^(p/2) r^p / Γ(p/2 + 1). The Gamma function Γ generalizes the factorial: Γ(n) = (n−1)! ● When p ↗ the proportion of a hypercube that lies outside of its inscribed hypersphere approaches 1. ● What this means: – hyperspace is very big – all points are far apart ⇒ dimensionality reduction.
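
To make the last point concrete, here is a small numerical check (a sketch added here, not part of the original slides; the function name fraction_outside_inscribed_sphere is mine). It uses the p-sphere volume formula above to compute the fraction of a hypercube that falls outside its inscribed hypersphere for a few values of p:

```python
# Fraction of a hypercube's volume lying outside its inscribed hypersphere.
# Based on the volume formula V_p(r) = pi^(p/2) r^p / Gamma(p/2 + 1).
import math

def fraction_outside_inscribed_sphere(p):
    # A hypercube of side 2r contains a hypersphere of radius r:
    # inside fraction = V_p(r) / (2r)^p = pi^(p/2) / (2^p * Gamma(p/2 + 1)).
    inside = math.pi ** (p / 2) / (2 ** p * math.gamma(p / 2 + 1))
    return 1 - inside

for p in (2, 3, 5, 10, 20):
    print(p, round(fraction_outside_inscribed_sphere(p), 4))
# Approximately: 2 -> 0.2146, 3 -> 0.4764, 5 -> 0.8355, 10 -> 0.9975, 20 -> 1.0;
# the fraction approaches 1 as p grows.
```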

  7. More reasons to reduce dimensionality ● Computational complexity (time and space) ● Interpretability ● Simpler models are more robust (less variance) ● Data visualization ● Cost of data acquisition ● Eliminate non-relevant attributes that can make it harder for an algorithm to learn.

  8. Approaches to dimensionality reduction ● Feature selection Choose m < p features, ignore the remaining (p − m). – Filtering approaches Apply a statistical measure to assign a score to each feature (correlation, χ²-test). – Wrapper approaches Search problem: find the best set of features for a given predictive model. – Embedded approaches Simultaneously fit a model and learn which features should be included.



  11. Approaches to dimensionality reduction ● Feature selection Choose m < p features, ignore the remaining (p − m). – Filtering approaches Apply a statistical measure to assign a score to each feature (correlation, χ²-test). – Wrapper approaches Search problem: find the best set of features for a given predictive model. – Embedded approaches Simultaneously fit a model and learn which features should be included. Are those approaches supervised or unsupervised? ? All these feature selection approaches are supervised.
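
As an illustration of the filtering approach above, here is a minimal sketch (an example added here, not taken from the course material; the helper filter_select and the toy data are assumptions). It scores each feature by its absolute Pearson correlation with the target and keeps the m highest-scoring ones:

```python
# Filter-style feature selection: score features independently of any model.
import numpy as np

def filter_select(X, y, m):
    # X: (n_samples, p) feature matrix; y: (n_samples,) target.
    # Score each feature by |Pearson correlation with y|, keep the top m.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:m]

# Toy usage: y depends on features 2 and 7 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(size=100)
print(filter_select(X, y, m=2))   # typically selects features 2 and 7
```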

  12. Feature selection: Overview [Diagram comparing the three routes from the full feature set to a predictor: filter approaches score features directly; wrapper approaches search over feature subsets with the predictor in the loop (subset selection: forward selection, backward selection, floating selection); embedded approaches (e.g. Lasso, Elastic Net; see Chap. 7) fit the predictor and select features jointly.]
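
The overview lists the Lasso and the Elastic Net as embedded approaches (see Chap. 7). As a quick illustration (an example added here, with assumed toy data), fitting a Lasso and keeping the features with non-zero coefficients performs selection and model fitting in a single step:

```python
# Embedded feature selection: the L1 penalty of the Lasso zeroes out
# the coefficients of irrelevant features while fitting the model.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)   # indices of the features kept
print(selected)                          # typically [2 7]
```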

  13. Feature selection: Subset selection

  14. Approaches to dimensionality reduction ● Feature selection Choose m < p features, ignore the remaining (p − m). – Filtering approaches Apply a statistical measure to assign a score to each feature (correlation, χ²-test). – Wrapper approaches Search problem: find the best set of features for a given predictive model. – Embedded approaches Simultaneously fit a model and learn which features should be included. All these feature selection approaches are supervised.

  15. Subset selection ● Goal: Find the subset of features that leads to the best-performing algorithm. ● How many subsets of p features are there? ?

  16. Subset selection ● Goal: Find the subset of features that leads to the best-performing algorithm. ● Issue: there are 2^p such sets.

  17. Subset selection ● Goal: Find the subset of features that leads to the best-performing algorithm. ● Issue: there are 2^p such sets. ● Greedy approach: forward search. Add the “best” feature at each step. E(S): error of a predictor trained only using the features in S. – Initially: S = ∅ – New best feature: j* = argmin_{j ∉ S} E(S ∪ {j}) – Stop if E(S ∪ {j*}) ≥ E(S) – Else: S ← S ∪ {j*}

  18. Subset selection ● Goal: Find the subset of features that leads to the best-performing algorithm. ● Issue: there are 2^p such sets. ● Greedy approach: forward search. Add the “best” feature at each step. E(S): error of a predictor trained only using the features in S. – Initially: S = ∅ – New best feature: j* = argmin_{j ∉ S} E(S ∪ {j}) – Stop if E(S ∪ {j*}) ≥ E(S) – Else: S ← S ∪ {j*} What is the complexity of this algorithm? ?

  19. Subset selection ● Goal: Find the subset of features that leads to the best-performing algorithm. ● Issue: there are 2^p such sets. ● Greedy approach: forward search. Add the “best” feature at each step. E(S): error of a predictor trained only using the features in S. – Initially: S = ∅ – New best feature: j* = argmin_{j ∉ S} E(S ∪ {j}) – Stop if E(S ∪ {j*}) ≥ E(S) – Else: S ← S ∪ {j*} Complexity: O(p² × C), where C = complexity of training and evaluating the model (which may itself depend on p). Much better than O(2^p)!

  20. Subset selection ● Greedy approach: forward search. Add the “best” feature at each step. E(S): error of a predictor trained only using the features in S. – Initially: S = ∅ – New best feature: j* = argmin_{j ∉ S} E(S ∪ {j}) – Stop if E(S ∪ {j*}) ≥ E(S) – Else: S ← S ∪ {j*} Complexity: O(p²) model fits. ● Alternative strategies: – Backward search: start from {1, …, p}, eliminate features. – Floating search: alternately add q features and remove r features.
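
Below is a minimal sketch of this greedy forward search (an illustration added here, not the course's reference implementation). It uses the cross-validated mean squared error of a scikit-learn estimator as E(S); the function name forward_selection is an assumption:

```python
# Greedy forward feature selection (wrapper approach).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, estimator=None):
    estimator = estimator or LinearRegression()
    p = X.shape[1]

    def error(S):
        # E(S): cross-validated error of a model trained only on the features in S.
        return -cross_val_score(estimator, X[:, sorted(S)], y,
                                scoring="neg_mean_squared_error", cv=5).mean()

    selected, best_err = set(), np.inf
    while len(selected) < p:
        # Evaluate E(S ∪ {j}) for every feature j not yet selected.
        err, j_star = min((error(selected | {j}), j)
                          for j in range(p) if j not in selected)
        if err >= best_err:              # stop: no remaining feature improves E
            break
        selected, best_err = selected | {j_star}, err
    return sorted(selected)
```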

  21. Approaches to dimensionality reduction ● Feature extraction Project the p features onto m < p new dimensions. Linear: ● Principal Components Analysis (PCA) ● Factor Analysis (FA) ● Non-negative Matrix Factorization (NMF) ● Linear Discriminant Analysis (LDA) (supervised). Non-linear: ● Multidimensional scaling (MDS) ● Isometric feature mapping (Isomap) ● Locally Linear Embedding (LLE) ● Autoencoders. Most of these approaches are unsupervised.

  22. Feature extraction: Principal Component Analysis

  23. Principal Components Analysis (PCA) ● Goal: Find a low-dimensional space such that information loss is minimized when the data is projected onto that space.

  24. Principal Components Analysis (PCA) ● Goal: Find a low-dimensional space such that information loss is minimized when the data is projected onto that space. ● Unsupervised: we're only looking at the data, not at any labels.

  25. Principal Components Analysis (PCA) ● Goal: Find a low-dimensional space such that information loss is minimized when the data is projected onto that space. ● Unsupervised: we're only looking at the data, not at any labels. In PCA, we want the variance of the projected data to be maximized.

  26. Principal Components Analysis (PCA) ● Goal: Find a low-dimensional space such that information loss is minimized when the data is projected onto that space. ● Unsupervised: we're only looking at the data, not at any labels. In PCA, we want the variance of the projected data to be maximized. [Figure: a 2-D data cloud and its projections on x1 and on x2.]

  27. Principal Components Analysis (PCA) ● Goal: Find a low-dimensional space such that information loss is minimized when the data is projected onto that space. ● Unsupervised: we're only looking at the data, not at any labels. In PCA, we want the variance of the projected data to be maximized. Warning! This requires standardizing the features. [Figure: a 2-D data cloud and its projections on x1 and on x2.]

  28. Feature standardization ● Variance of feature j in data set D: ?

  29. Feature standardization ● Variance of feature j in data set D: Var(x_j) = (1/n) Σ_{i=1..n} (x_j^(i) − μ_j)², where μ_j = (1/n) Σ_{i=1..n} x_j^(i) is the mean of feature j over D. ● Features that take large values will have large variance. Compare [10, 20, 30, 40, 50] with [0.1, 0.2, 0.3, 0.4, 0.5]. ● Standardization: – mean centering: give each feature a mean of 0 – variance scaling: give each feature a variance of 1


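To tie standardization and PCA together, here is a minimal end-to-end sketch (an example added here, not the course's reference code): standardize the features, then take the top-m eigenvectors of the covariance matrix, i.e. the directions of maximal variance, as the principal components.

```python
# PCA from the "max variance" point of view: principal components are the
# leading eigenvectors of the covariance matrix of the standardized data.
import numpy as np

def standardize(X):
    # Mean-center each feature and scale it to unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca(X, m):
    Xs = standardize(X)
    cov = np.cov(Xs, rowvar=False)            # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest eigenvalues
    components = eigvecs[:, order]            # p x m matrix of principal directions
    return Xs @ components, components        # projected data, loadings

# Toy usage:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z, W = pca(X, m=2)
print(Z.shape)   # (200, 2)
```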
