PCA & ICA - CE-717: Machine Learning, Sharif University of Technology (slide transcript)
1. PCA & ICA
   CE-717: Machine Learning
   Sharif University of Technology, Spring 2018
   Soleymani

2. Dimensionality Reduction: Feature Selection vs. Feature Extraction
   - Feature selection
     - Select a subset of a given feature set: [y_1, ..., y_d]^T → [y_{i_1}, ..., y_{i_{d'}}]^T
   - Feature extraction
     - A linear or non-linear transform on the original feature space: [y_1, ..., y_d]^T → z = g(y), with z ∈ R^{d'}
   - In both cases d' < d.

3. Feature Extraction
   - Mapping of the original data to another space
   - The criterion for feature extraction can differ based on the problem setting:
     - Unsupervised task: minimize the information loss (reconstruction error)
     - Supervised task: maximize the class discrimination in the projected space
   - Feature extraction algorithms
     - Linear methods
       - Unsupervised: e.g., Principal Component Analysis (PCA)
       - Supervised: e.g., Linear Discriminant Analysis (LDA), also known as Fisher's Discriminant Analysis (FDA)
     - Non-linear methods
       - Supervised: e.g., MLP neural networks
       - Unsupervised: e.g., autoencoders

4. Feature Extraction
   - Unsupervised feature extraction: a mapping g: R^d → R^{d'}
     - Uses only the N×d data matrix Y (rows y^(1), ..., y^(N))
     - Output: the transformed N×d' data matrix Y'
   - Supervised feature extraction: a mapping g: R^d → R^{d'}
     - Uses the data matrix Y together with the labels z = [z^(1), ..., z^(N)]^T
     - Output: the transformed N×d' data matrix Y'

5. Unsupervised Feature Reduction
   - Visualization and interpretation: projection of high-dimensional data onto 2D or 3D
   - Data compression: efficient storage, communication, or retrieval
   - Pre-processing: improve accuracy by reducing the number of features
     - As a preprocessing step to reduce dimensions for supervised learning tasks
     - Helps avoid overfitting
   - Noise removal
     - E.g., "noise" in images introduced by minor lighting variations or slightly different imaging conditions

6. Linear Transformation
   - For a linear transformation, we find an explicit mapping g(y) = B^T y that can also transform new data vectors:
     y' = B^T y,   y ∈ R^d (original data),   y' ∈ R^{d'} (reduced data),   B^T ∈ R^{d'×d},   d' < d

7. Linear Transformation
   - Linear transformations are simple mappings:
     x' = A^T x,   i.e.,   x'_k = a_k^T x,   k = 1, ..., d'
   - A = [a_1, ..., a_{d'}] ∈ R^{d×d'}; each column a_k defines one new feature.
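Not from the slides: a minimal NumPy sketch of the linear map y' = B^T y from the previous slide, applied both to a whole data matrix and to a new data vector. The data and the projection matrix B here are synthetic, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_reduced = 5, 2                      # original and reduced dimensions (d' < d)

Y = rng.normal(size=(100, d))            # N x d data matrix (rows are data points)
B = rng.normal(size=(d, d_reduced))      # d x d' matrix; columns define the new features

Y_reduced = Y @ B                        # y'^(i) = B^T y^(i) for every row at once
y_new = rng.normal(size=d)               # a previously unseen data vector
y_new_reduced = B.T @ y_new              # the same explicit mapping transforms new data

print(Y_reduced.shape)                   # (100, 2)
print(y_new_reduced.shape)               # (2,)
```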

8. Linear Dimensionality Reduction
   - Unsupervised
     - Principal Component Analysis (PCA)
     - Independent Component Analysis (ICA)
     - Singular Value Decomposition (SVD)
     - Multidimensional Scaling (MDS)
     - Canonical Correlation Analysis (CCA)
     - ...

9. Principal Component Analysis (PCA)
   - Also known as the Karhunen-Loève (KL) transform
   - Principal Components (PCs): orthogonal vectors, ordered by the fraction of the total information (variation) in the corresponding directions
   - Find the directions along which the data approximately lie
   - When the data are projected onto the first PC, the variance of the projected data is maximized

10. Principal Component Analysis (PCA)
    - The "best" linear subspace (i.e., the one providing the least reconstruction error of the data):
      - Work with the mean-centered data
      - The axes are rotated to new (principal) axes such that:
        - Principal axis 1 has the highest variance, ...
        - Principal axis i has the i-th highest variance
      - The principal axes are uncorrelated: the covariance between each pair of principal axes is zero
    - Goal: reduce the dimensionality of the data while preserving as much of the variation present in the dataset as possible
    - PCs can be found as the "best" eigenvectors of the covariance matrix of the data points

11. Principal Components
    - If the data have a Gaussian distribution N(μ, Σ), the direction of largest variance is given by the eigenvector of Σ that corresponds to the largest eigenvalue of Σ.
    - [Figure: a Gaussian point cloud with principal directions w_1 and w_2.]
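Not part of the slides: a small NumPy check of this claim on a synthetic 2-D Gaussian with a known covariance Σ. The top eigenvector of Σ should match the direction of largest sample variance, up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])                 # known covariance of the Gaussian
mu = np.array([2.0, -1.0])

Y = rng.multivariate_normal(mu, Sigma, size=20000)

# Eigenvector of Sigma with the largest eigenvalue (eigh returns ascending order).
vals, vecs = np.linalg.eigh(Sigma)
w1 = vecs[:, -1]

# Empirical direction of largest variance, from the sample covariance.
S_hat = np.cov(Y, rowvar=False)
vals_hat, vecs_hat = np.linalg.eigh(S_hat)
w1_hat = vecs_hat[:, -1]

# The two directions agree up to sign.
print(abs(w1 @ w1_hat))                        # close to 1.0
```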

12. Example: random direction
    - [Figure: data projected onto a random direction.]

13. Example: principal component
    - [Figure: the same data projected onto the first principal component.]

14. Covariance Matrix
    - Mean vector: μ_y = [μ_1, ..., μ_d]^T = [E(y_1), ..., E(y_d)]^T
    - Covariance matrix: Σ = E[(y − μ_y)(y − μ_y)^T]
    - ML estimate of the covariance matrix from the data points {y^(i)}_{i=1}^N:
      Σ̂ = (1/N) Σ_{i=1}^N (y^(i) − μ)(y^(i) − μ)^T = (1/N) Ỹ^T Ỹ
      where μ = (1/N) Σ_{i=1}^N y^(i) and Ỹ is the mean-centered data matrix whose i-th row is (y^(i) − μ)^T.
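Not in the slides: a short NumPy sketch of the ML estimate Σ̂ = (1/N) Ỹ^T Ỹ, checked against np.cov with bias=True (which also divides by N rather than N − 1). The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(500, 3))            # N x d data matrix

mu = Y.mean(axis=0)                      # sample mean of the data points
Y_centered = Y - mu                      # mean-centered data (one row per point)

Sigma_hat = (Y_centered.T @ Y_centered) / len(Y)   # (1/N) Ỹᵀ Ỹ

# np.cov with bias=True also divides by N, so the two estimates coincide.
assert np.allclose(Sigma_hat, np.cov(Y, rowvar=False, bias=True))
print(Sigma_hat)
```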

15. PCA: Steps
    - Input: N×d data matrix Y (each row contains a d-dimensional data point)
    - μ = (1/N) Σ_{i=1}^N y^(i)
    - Ỹ ← Y with the mean vector μ subtracted from each row
    - Σ = (1/N) Ỹ^T Ỹ (covariance matrix)
    - Compute the eigenvalues and eigenvectors of Σ
    - Pick the d' eigenvectors corresponding to the largest eigenvalues and put them in the columns of B = [w_1, ..., w_{d'}] (first PC, ..., d'-th PC)
    - Y' = Ỹ B
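A minimal NumPy implementation of the steps listed above (a sketch, not the course's reference code); the explicit eigenvalue sort and the synthetic data are my additions.

```python
import numpy as np

def pca(Y, d_reduced):
    """PCA following the slide's steps: center, covariance, eigendecompose, project."""
    mu = Y.mean(axis=0)                          # mean of the data points
    Y_centered = Y - mu                          # subtract the mean from every row
    Sigma = (Y_centered.T @ Y_centered) / len(Y) # (1/N) Ỹᵀ Ỹ
    eigvals, eigvecs = np.linalg.eigh(Sigma)     # ascending order for a symmetric matrix
    order = np.argsort(eigvals)[::-1][:d_reduced]
    B = eigvecs[:, order]                        # columns: first PC, ..., d'-th PC
    return Y_centered @ B, B, eigvals[order], mu

rng = np.random.default_rng(3)
Y = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated synthetic data
Y_reduced, B, top_eigvals, mu = pca(Y, d_reduced=2)
print(Y_reduced.shape, top_eigvals)              # (200, 2) and the two largest variances
```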

16. Find Principal Components
    - Assume that the data are centered.
    - Find the vector w that maximizes the sample variance of the projected data:
      argmax_w (1/N) Σ_{j=1}^N (w^T y^(j))^2 = (1/N) w^T Y^T Y w   s.t.   w^T w = 1
    - Lagrangian: L(w, λ) = w^T Y^T Y w − λ (w^T w − 1)
      ∂L/∂w = 0 ⇒ 2 Y^T Y w − 2λw = 0 ⇒ Y^T Y w = λw
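Not from the slides: a quick numerical sanity check that the eigenvector obtained from Y^T Y w = λw does beat arbitrary unit directions in projected sample variance, on synthetic centered data.

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(size=(300, 3)) @ np.diag([3.0, 1.0, 0.3])   # synthetic data
Y = Y - Y.mean(axis=0)                                     # center it exactly

def proj_variance(w):
    w = w / np.linalg.norm(w)                              # enforce wᵀw = 1
    return (Y @ w) @ (Y @ w) / len(Y)                      # (1/N) Σ_j (wᵀ y^(j))²

eigvals, eigvecs = np.linalg.eigh(Y.T @ Y)
w_best = eigvecs[:, -1]                                    # eigenvector with the largest eigenvalue

random_dirs = rng.normal(size=(1000, 3))
assert all(proj_variance(w_best) >= proj_variance(w) - 1e-12 for w in random_dirs)
print(proj_variance(w_best), eigvals[-1] / len(Y))         # equal: λ_max / N
```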

17. Find Principal Components
    - For symmetric matrices, there exist eigenvectors that are orthogonal.
    - Let w_1, ..., w_d denote the eigenvectors of Y^T Y such that:
      w_j^T w_k = 0 for all j ≠ k,   and   w_j^T w_j = 1 for all j

18. Find Principal Components
    - Y^T Y w = λw ⇒ w^T Y^T Y w = λ w^T w = λ
    - λ is the amount of variance along the found dimension w (called the energy along that dimension).
    - Eigenvalues: λ_1 ≥ λ_2 ≥ λ_3 ≥ ...
    - The first PC w_1 is the eigenvector of the sample covariance matrix Y^T Y associated with the largest eigenvalue.
    - The 2nd PC w_2 is the eigenvector of the sample covariance matrix Y^T Y associated with the second largest eigenvalue.
    - And so on ...

19. Another Interpretation: Least Squares Error
    - PCs are linear least squares fits to the samples, each orthogonal to the previous PCs:
      - The first PC is a minimum-distance fit to a vector in the original feature space.
      - The second PC is a minimum-distance fit to a vector in the plane perpendicular to the first PC.
      - ...

20. Least Squares Error and Maximum Variance Views Are Equivalent (1-D Interpretation)
    - When the data are mean-removed:
      - Minimizing the sum of squared distances to the line is equivalent to maximizing the sum of squares of the projections onto that line (Pythagoras).
    - [Figure: a data point, its projection onto a line through the origin, and the residual.]
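A worked one-liner of the Pythagorean step (my phrasing, not the slide's): for a unit vector w and a mean-removed point y, the squared length splits into the squared projection plus the squared distance to the line, so with the total Σ_j ‖y_j‖² fixed the two objectives are equivalent.

```latex
% Decompose y along the unit vector w (w^\top w = 1) and its orthogonal residual:
%   y = (w^\top y)\,w + r, \qquad r = y - (w^\top y)\,w, \qquad w^\top r = 0
\|y\|^2 = (w^\top y)^2 + \|\,y - (w^\top y)\,w\,\|^2
% Summing over the mean-removed points: \sum_j \|y_j\|^2 is constant, so maximizing
% \sum_j (w^\top y_j)^2 is the same as minimizing \sum_j \|y_j - (w^\top y_j)\,w\|^2.
```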

21. Two Interpretations
    - Maximum variance subspace: argmax_w Σ_{j=1}^N (w^T y^(j))^2 = w^T Y^T Y w
    - Minimum reconstruction error: argmin_w Σ_{j=1}^N ‖y^(j) − (w^T y^(j)) w‖^2
    - For each data point (green), the squared projection (red) plus the squared distance to the line (blue) satisfy blue² + red² = green². Since green² is fixed by the data, maximizing red² is equivalent to minimizing blue².
    - [Figure: a point y, the line through the origin along w, and the projection w^T y.]
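Not part of the slides: a numerical check, on synthetic mean-removed data, that blue² + red² = green² holds and therefore the variance-maximizing first PC is also the reconstruction-error minimizer.

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.normal(size=(400, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
Y = Y - Y.mean(axis=0)                         # mean-removed data

w = np.linalg.eigh(Y.T @ Y)[1][:, -1]          # first PC (unit vector)

proj = Y @ w                                   # red: signed projections wᵀy^(j)
resid = Y - np.outer(proj, w)                  # blue: y^(j) − (wᵀy^(j)) w

green2 = (Y ** 2).sum()                        # fixed total: Σ_j ‖y^(j)‖²
red2 = (proj ** 2).sum()                       # maximized by the first PC
blue2 = (resid ** 2).sum()                     # therefore minimized by the first PC

print(np.isclose(green2, red2 + blue2))        # True: blue² + red² = green² overall
```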

22. PCA: Uncorrelated Features
    - y' = B^T y
    - S_{y'} = E[y' y'^T] = E[B^T y y^T B] = B^T E[y y^T] B = B^T S_y B
    - If B = [b_1, ..., b_d], where b_1, ..., b_d are orthonormal eigenvectors of S_y:
      S_{y'} = B^T S_y B = B^T B Λ B^T B = Λ
      ⇒ E[y'_j y'_k] = 0 for all j ≠ k,   j, k = 1, ..., d
    - Then mutually uncorrelated features are obtained.
    - Completely uncorrelated features avoid information redundancies.
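Not from the slides: a NumPy sketch confirming that projecting zero-mean data onto the orthonormal eigenvectors of S_y gives features whose covariance is the diagonal matrix Λ (off-diagonals vanish up to round-off). The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(6)
Y = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # correlated synthetic features
Y = Y - Y.mean(axis=0)                                     # zero mean, as the slide assumes

S_y = (Y.T @ Y) / len(Y)                                   # covariance of the original features
eigvals, B = np.linalg.eigh(S_y)                           # B: orthonormal eigenvectors of S_y

Y_new = Y @ B                                              # y' = Bᵀ y for every row
S_new = (Y_new.T @ Y_new) / len(Y)                         # covariance of the new features

# Off-diagonal entries vanish (up to round-off): S_{y'} = Bᵀ S_y B = Λ
print(np.allclose(S_new, np.diag(np.diag(S_new))))         # True
print(np.allclose(np.diag(S_new), eigvals))                # True
```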

23. Reconstruction
    - y' = B^T y = [w_1^T y, ..., w_{d'}^T y]^T,   with B = [w_1, ..., w_{d'}]
    - Incorporating all eigenvectors in B = [w_1, ..., w_d]:
      B y' = B B^T y = y ⇒ y = B y'
      ⟹ If d' = d, then y can be reconstructed exactly from y'.
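Not in the slides: a NumPy sketch on synthetic centered data showing exact reconstruction when all d eigenvectors are kept (B B^T = I) and an approximate reconstruction when only d' < d PCs are kept.

```python
import numpy as np

rng = np.random.default_rng(7)
Y = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
Y = Y - Y.mean(axis=0)                                   # centered data

S_y = (Y.T @ Y) / len(Y)
eigvals, eigvecs = np.linalg.eigh(S_y)
order = np.argsort(eigvals)[::-1]

y = Y[0]                                                 # one data point

B_full = eigvecs[:, order]                               # all d eigenvectors: d' = d
y_exact = B_full @ (B_full.T @ y)                        # B Bᵀ y = y, exact reconstruction
print(np.allclose(y_exact, y))                           # True

B_small = eigvecs[:, order[:2]]                          # keep only the first 2 PCs: d' < d
y_approx = B_small @ (B_small.T @ y)                     # approximate reconstruction
print(np.linalg.norm(y - y_approx))                      # small, but generally nonzero
```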

24. PCA Derivation: Relation Between Eigenvalues and Variances
    - The k-th largest eigenvalue of S_y is the variance along the k-th PC:
      var(y'_k) = w_k^T S_y w_k = λ_k
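Not part of the slides: a final NumPy check, on synthetic centered data, that the variance of each projected coordinate equals the corresponding eigenvalue of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(8)
Y = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 3))
Y = Y - Y.mean(axis=0)

S_y = (Y.T @ Y) / len(Y)                         # sample covariance (1/N convention)
eigvals, eigvecs = np.linalg.eigh(S_y)
order = np.argsort(eigvals)[::-1]                # sort PCs by decreasing eigenvalue

for k in range(3):
    w_k = eigvecs[:, order[k]]
    var_k = ((Y @ w_k) ** 2).mean()              # variance of the k-th projected coordinate
    print(np.isclose(var_k, eigvals[order[k]]))  # True: var(y'_k) = λ_k
```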
