Dimensionality Reduction contd. Aarti Singh Machine Learning - PowerPoint PPT Presentation

Dimensionality Reduction contd. …
Aarti Singh
Machine Learning 10-601, Nov 10, 2011
Slides Courtesy: Tom Mitchell, Eric Xing, Lawrence Saul

Principal Component Analysis (PCA)
Principal Components are the eigenvectors of the matrix of sample correlations $XX^T$ of the data.
New set of axes $V = [v_1, v_2, \dots, v_D]$, where $XX^T = V \Lambda V^T$.
• Geometrically: centering followed by rotation
– Linear transformation
Original representation of data points: $x_i = [x_i^1, x_i^2, \dots, x_i^D]$, with $x_i^j = e_j^T x_i$, where $e_j = [0 \dots 0\ 1\ 0 \dots 0]^T$ has its 1 in the $j$th coordinate.
Transformed representation of data points: $[v_1^T x_i, v_2^T x_i, \dots, v_D^T x_i]$.
[Figure: a data point $x_i$ with coordinates $x_i^1$, $x_i^2$ on the original axes $x^1$, $x^2$.]
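The "centering followed by rotation" view can be sketched in NumPy. This is a minimal illustration, not the lecture's code: it assumes the data matrix `X` is D x n with one data point per column, and the helper name `pca_axes` and the synthetic data are made up for the example.

```python
import numpy as np

def pca_axes(X):
    """Return eigenvalues (descending), axes V (D x D), and the centered data."""
    Xc = X - X.mean(axis=1, keepdims=True)    # centering
    lam, V = np.linalg.eigh(Xc @ Xc.T)        # symmetric eigendecomposition (ascending order)
    order = np.argsort(lam)[::-1]             # reorder so the largest eigenvalue comes first
    return lam[order], V[:, order], Xc

# Hypothetical data: 200 points in 3 dimensions with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200)) * np.array([[3.0], [1.0], [0.2]])

lam, V, Xc = pca_axes(X)
Z = V.T @ Xc    # rotation: row j holds the coordinates v_j^T x_i of all points
```

Because `V` is orthogonal, `Z` carries the same information as the centered data, only expressed in the new axes.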

Dimensionality Reduction using PCA
Original representation: $[x_i^1, x_i^2, \dots, x_i^D]$ (D-dimensional vector)
$x_i = \sum_{j=1}^{D} x_i^j e_j = \sum_{j=1}^{D} (e_j^T x_i)\, e_j$
$(x_i^j)^2 = (e_j^T x_i)^2$ = energy/variance of data point $i$ along coordinate $j$
Transformed representation: $[v_1^T x_i, v_2^T x_i, \dots, v_D^T x_i]$ (D-dimensional vector)
$x_i = \sum_{j=1}^{D} (v_j^T x_i)\, v_j$
$(v_j^T x_i)^2$ = energy/variance of data point $i$ along principal component $v_j$
$\lambda_j = \sum_{i=1}^{n} (v_j^T x_i)^2$ = energy/variance of all points along $v_j$
Dimensionality reduction: $[v_1^T x_i, v_2^T x_i, \dots, v_d^T x_i]$ (d-dimensional vector)
$\hat{x}_i = \sum_{j=1}^{d} (v_j^T x_i)\, v_j$
Only keep data projections onto principal components which capture enough energy/variance of the data: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_D$.
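A small sketch of keeping only the top $d$ components, again assuming a D x n data matrix with points as columns. The function name `pca_reduce` and the random data are illustrative assumptions. It also shows the two energy statements numerically: the squared projections along $v_j$ sum to $\lambda_j$, and the total squared reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

def pca_reduce(X, d):
    """Project centered data onto the top-d principal components and reconstruct."""
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, V = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    Z = V[:, :d].T @ Xc       # d x n: reduced representation [v_1^T x_i, ..., v_d^T x_i]
    X_hat = V[:, :d] @ Z      # D x n: reconstruction from the d kept components
    return lam, Z, X_hat, Xc

# Hypothetical data: 100 points in 5 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 100)) * np.array([[4.0], [2.0], [1.0], [0.5], [0.1]])

d = 2
lam, Z, X_hat, Xc = pca_reduce(X, d)
energy_kept = (Z ** 2).sum(axis=1)               # should match lam[:d]
reconstruction_error = ((Xc - X_hat) ** 2).sum() # should match lam[d:].sum()
```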

Another interpretation
Maximum Variance Subspace: PCA finds vectors $v$ such that projections onto the vectors capture maximum variance in the data.
Minimum Reconstruction Error: PCA finds vectors $v$ such that projection onto the vectors yields minimum MSE reconstruction.
[Figure: one-direction approximation, showing a data point $x_i$, a direction $v$, and the projection $v^T x_i$.]
Recall: [formula involving $x_i \cdot v$; not recoverable from the extracted text]
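Why the two interpretations agree can be sketched for a single direction. The derivation below is a worked step under the assumptions that the data are centered and that $\|v\| = 1$; it is not taken from the slide itself.

```latex
\begin{aligned}
\|x_i - (v^T x_i)\,v\|^2
  &= \|x_i\|^2 - 2\,(v^T x_i)^2 + (v^T x_i)^2\,\|v\|^2 \\
  &= \|x_i\|^2 - (v^T x_i)^2 \qquad (\|v\| = 1), \\
\sum_{i=1}^{n} \|x_i - (v^T x_i)\,v\|^2
  &= \sum_{i=1}^{n} \|x_i\|^2 \;-\; \sum_{i=1}^{n} (v^T x_i)^2 .
\end{aligned}
```

The first sum on the right does not depend on $v$, so minimizing the reconstruction error over unit vectors is the same as maximizing the projected variance $\sum_i (v^T x_i)^2$.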

Another way to compute PCs
Principal Components – eigenvectors of $XX^T$ ($D \times D$ matrix). Problematic for high-dimensional datasets!
Another way to compute PCs: Singular Value Decomposition (SVD)
$X = \tilde{V} S U^T$, with $X$: $D \times n$ (column $i$ is the data point $x_i = [x_i^1, x_i^2, \dots, x_i^D]^T$), $\tilde{V}$: $D \times n$, $S$: $n \times n$, $U^T$: $n \times n$
$\Rightarrow XX^T = \tilde{V} S U^T U S \tilde{V}^T = \tilde{V} S^2 \tilde{V}^T = V \Lambda V^T$
• Columns of $\tilde{V}$: left singular vectors, $v_j$ = Principal Components
• $S$: singular values = $\sqrt{\text{eigenvalues}}$
• Columns of $U$: right singular vectors
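The relation between the SVD of $X$ and the eigendecomposition of $XX^T$ can be checked with `numpy.linalg.svd`. This sketch assumes a centered D x n matrix with D much larger than n; the variable names (`V_tilde`, `s`, `Ut`) are chosen here to mirror the slide's notation.

```python
import numpy as np

# Hypothetical high-dimensional data: D = 50 features, n = 8 points (columns).
rng = np.random.default_rng(0)
D, n = 50, 8
X = rng.normal(size=(D, n))
X = X - X.mean(axis=1, keepdims=True)    # center the data

# Thin SVD: X = V_tilde @ diag(s) @ Ut, with V_tilde of shape D x n.
V_tilde, s, Ut = np.linalg.svd(X, full_matrices=False)

# Largest n eigenvalues of the D x D matrix X X^T, in descending order.
lam = np.linalg.eigvalsh(X @ X.T)[::-1][:n]
```

Here `s ** 2` matches `lam`, and the columns of `V_tilde` play the role of the principal components, without ever forming the D x D matrix in the SVD route.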

Another way to compute PC projections
Singular Value Decomposition: $X = \tilde{V} S U^T$
Projection of data points onto PCs:
$[v_1^T x_i, v_2^T x_i, \dots, v_n^T x_i] = [\sigma_1 u_1(i), \sigma_2 u_2(i), \dots, \sigma_n u_n(i)]$
$\tilde{V}^T X = \tilde{V}^T \tilde{V} S U^T = S U^T$ (since $\tilde{V}^T \tilde{V} = I$; SVD ⇒ eigenvectors are orthonormal)
$U$ and $S$ can be obtained by eigendecomposition of $X^T X$!
$X^T X = U S \tilde{V}^T \tilde{V} S U^T = U S^2 U^T$ ($n \times n$ matrix)
• Principal Components are obtained by eigendecomposition of $XX^T$ ($D \times D$ matrix)
• Projection of data points onto PCs can be obtained by eigendecomposition of $X^T X$ ($n \times n$ matrix)
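A sketch of obtaining the projections from the small $n \times n$ matrix $X^T X$ only. It assumes centered data with points as columns and keeps the top `d` directions whose singular values are clearly nonzero; the recovery step `V_d = X U / sigma` is a standard identity added here for the check, not something stated on the slide.

```python
import numpy as np

# Hypothetical data: D = 50 features, n = 8 points (columns), top d = 3 components.
rng = np.random.default_rng(0)
D, n, d = 50, 8, 3
X = rng.normal(size=(D, n))
X = X - X.mean(axis=1, keepdims=True)

# Eigendecomposition of the n x n matrix X^T X = U S^2 U^T.
mu, U = np.linalg.eigh(X.T @ X)
order = np.argsort(mu)[::-1]
mu, U = mu[order], U[:, order]
sigma = np.sqrt(np.clip(mu, 0.0, None))      # singular values (clip tiny negatives)

# Rows of S U^T: projection of every data point onto PC j is sigma_j * u_j(i).
projections = sigma[:d, None] * U[:, :d].T   # d x n

# For comparison only: recover the matching principal components v_j = X u_j / sigma_j.
V_d = X @ U[:, :d] / sigma[:d]               # D x d
```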

Independent Component Analysis (ICA)
• PCA seeks “orthogonal” directions that capture maximum variance in data, or that minimize squared reconstruction error.
• ICA seeks “statistically independent” directions in the data.

Dimensionality Reduction
“Unrolling the swiss roll”

Nonlinear Methods
Data often lies on or near a nonlinear low-dimensional curve, aka manifold.


Laplacian Eigenmaps
Linear methods – lower-dimensional linear projection that preserves distances between all points
Laplacian Eigenmaps (key idea) – preserve local information only
• Construct graph from data points (capture local information)
• Project points into a low-dim space using “eigenvectors of the graph”

Nonlinear Embedding Results

Step 1 - Graph Construction
Similarity Graphs: model local neighborhood relations between data points
G(V, E, W)
V – Vertices (data points)
E – Edges
(1) Edge if $\|x_i - x_j\| \le \varepsilon$ (ε-neighborhood graph)
(2) Edge if k-nearest neighbor (k-NN graph)
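The ε-neighborhood rule can be sketched as follows. For convenience this example stores the data as an n x D array `points` with one point per row (an assumption of the sketch), and the function name `epsilon_graph` is hypothetical.

```python
import numpy as np

def epsilon_graph(points, eps):
    """Boolean adjacency matrix: edge (i, j) iff ||x_i - x_j|| <= eps and i != j."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # n x n Euclidean distances
    A = dist <= eps
    np.fill_diagonal(A, False)                 # no self-loops
    return A

# Hypothetical points: three close together, one far away.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
A = epsilon_graph(points, eps=1.2)
```

With `eps=1.2`, point 0 is linked to points 1 and 2 (distance 1), points 1 and 2 are not linked to each other (distance about 1.41), and the far point stays isolated.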

Step 1 - Graph Construction
Similarity Graphs: model local neighborhood relations between data points
(2) E – Edge if k-nearest neighbor (k-NN); yields a directed graph
• connect A with B if A → B OR A ← B (symmetric kNN graph)
• connect A with B if A → B AND A ← B (mutual kNN graph)
[Figure panels: directed nearest neighbors; (symmetric) kNN graph; mutual kNN graph]
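The difference between the symmetric and mutual kNN graphs can be sketched with boolean matrices. The data layout (n x D array `points`, one point per row) and the helper name `knn_directed` are assumptions made for this illustration.

```python
import numpy as np

def knn_directed(points, k):
    """Directed kNN adjacency: A[i, j] is True iff j is among the k nearest neighbors of i."""
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]       # indices of the k nearest neighbors
    n = len(points)
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return A

# Hypothetical points on a line with growing gaps: 0, 1, 3, 7.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
directed = knn_directed(points, k=1)
symmetric = directed | directed.T    # A -> B OR  A <- B
mutual = directed & directed.T       # A -> B AND A <- B
```

In this example the symmetric graph is the chain 0-1-2-3, while the mutual graph keeps only the edge 0-1, since only points 0 and 1 are each other's nearest neighbor.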

Step 1 - Graph Construction
Similarity Graphs: model local neighborhood relations between data points
G(V, E, W)
V – Vertices (data points)
E – Edges
W – Weights
(1) $W_{ij} = 1$ if edge present, 0 otherwise
(2) Gaussian kernel similarity function (aka heat kernel)
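The slide's exact kernel formula did not survive in the text. A common form of the Gaussian (heat) kernel, assumed here, is $W_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ on graph edges and 0 elsewhere. The function name `heat_kernel_weights` and the tiny example graph are hypothetical.

```python
import numpy as np

def heat_kernel_weights(points, adjacency, sigma):
    """Gaussian weights on existing edges; 0 where there is no edge (assumed kernel form)."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.where(adjacency, np.exp(-sq_dist / (2.0 * sigma ** 2)), 0.0)

# Hypothetical 3-point chain: edges 0-1 and 1-2 only.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
adjacency = np.array([[False, True, False],
                      [True, False, True],
                      [False, True, False]])
W = heat_kernel_weights(points, adjacency, sigma=1.0)
```

Closer neighbors get weights nearer to 1, so the graph encodes how strongly two connected points should be kept together.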

Step 1 - Graph Construction
Similarity Graphs: model local neighborhood relations between data points
Choice of $\sigma^2$, $\varepsilon$ and $k$:
$\varepsilon$, $k$ – chosen so that neighborhoods on the graph represent neighborhoods on the manifold (no “shortcuts” connecting different arms of the swiss roll)
Mostly ad hoc

Step 2 – Projection using Graph
Original representation: data point $x_i$ (D-dimensional vector) → Transformed representation: projections $(f_1(i), \dots, f_d(i))$ (d-dimensional vector)
Basic Idea: find vector $f$ such that, if $x_i$ is close to $x_j$ in the graph (i.e. $W_{ij}$ is large), then the projections of the points $f(i)$ and $f(j)$ are close.

Step 2 – Projection using Graph
• Graph Laplacian (unnormalized version): $L = D - W$ ($n \times n$ matrix)
W – Weight matrix
D – Degree matrix = $\mathrm{diag}(d_1, \dots, d_n)$, where $d_i = \sum_j W_{ij}$
Note: if the graph is connected, $\mathbf{1}$ is an eigenvector with 0 as eigenvalue.
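A quick numerical look at the unnormalized Laplacian and its zero eigenvalue. The two small graphs and the helper name `graph_laplacian` are made up for this sketch; degrees are taken as row sums of `W`.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with degrees d_i = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

# Connected example: path graph 0-1-2-3 with unit weights.
W_path = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
# Disconnected example: two separate edges 0-1 and 2-3.
W_split = np.array([[0, 1, 0, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)

L_path = graph_laplacian(W_path)
eig_path = np.linalg.eigvalsh(L_path)                      # ascending eigenvalues
eig_split = np.linalg.eigvalsh(graph_laplacian(W_split))
zeros_path = int((np.abs(eig_path) < 1e-9).sum())          # zero eigenvalues, connected graph
zeros_split = int((np.abs(eig_split) < 1e-9).sum())        # zero eigenvalues, two components
```

The all-ones vector is mapped to zero in both cases. The connected graph has a single zero eigenvalue, while the graph with two components has two.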

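Step 2 – Projection using Graph
• Justification – points connected on the graph stay as close as possible after embedding
$\min_f \frac{1}{2} \sum_{i,j} W_{ij}\,(f(i) - f(j))^2 = \min_f f^T (D - W) f$
RHS $= f^T (D - W) f = f^T D f - f^T W f = \sum_i d_i f(i)^2 - \sum_{i,j} W_{ij} f(i) f(j) = \frac{1}{2} \sum_{i,j} W_{ij}\,(f(i) - f(j))^2$ = LHS
(using $d_i = \sum_j W_{ij}$ and $W_{ij} = W_{ji}$)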
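The identity behind the justification, that $f^T (D - W) f$ equals half the weighted sum of squared differences $W_{ij}(f(i) - f(j))^2$, can be checked numerically. The random symmetric weight matrix and the vector `f` below are arbitrary test inputs, not data from the lecture.

```python
import numpy as np

# Hypothetical symmetric, non-negative weight matrix with zero diagonal.
rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.uniform(size=(n, n)), k=1)
W = upper + upper.T

f = rng.normal(size=n)                 # arbitrary embedding coordinates f(1), ..., f(n)
D = np.diag(W.sum(axis=1))             # degree matrix

# Left-hand side: half the weighted sum of squared differences over all pairs.
lhs = 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum()
# Right-hand side: the quadratic form with the graph Laplacian.
rhs = f @ (D - W) @ f
```

Since every term on the left is non-negative, this also shows that the quadratic form cannot be negative.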

Step 2 – Projection using Graph
• Justification – points connected on the graph stay as close as possible after embedding
$\min_f f^T L f$ s.t. $f^T f = 1$
Similar to PCA with $XX^T$ replaced by $L$
Lagrangian (wrap constraint into the objective function): $\min_f f^T L f - \lambda\,(f^T f - 1)$
$\Rightarrow (L - \lambda I) f = 0 \Rightarrow L f = \lambda f$
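A small check that the constrained problem leads to an eigenvalue problem: each unit-norm eigenvector attains objective value $f^T L f = \lambda$, and no unit vector does better than the smallest eigenvalue. The path graph and the random trial vectors are assumptions of this sketch.

```python
import numpy as np

# Hypothetical example: unnormalized Laplacian of the path graph 0-1-2-3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
lam, F = np.linalg.eigh(L)     # ascending eigenvalues, orthonormal eigenvectors as columns

def objective(f):
    """Value of f^T L f after rescaling f to unit norm (the constraint f^T f = 1)."""
    f = f / np.linalg.norm(f)
    return f @ L @ f

eigen_values_of_objective = np.array([objective(F[:, j]) for j in range(4)])

rng = np.random.default_rng(0)
random_values = np.array([objective(rng.normal(size=4)) for _ in range(1000)])
```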

Step 2 – Projection using Graph Laplacian
• Graph Laplacian (unnormalized version): $L = D - W$ ($n \times n$ matrix)
Find eigenvectors of the graph Laplacian: $L f = \lambda f$
Ordered eigenvalues: $0 = \lambda_1 \le \lambda_2 \le \lambda_3 \le \dots \le \lambda_n$
To embed data points in d-dim space, project data points onto eigenvectors associated with $\lambda_1, \lambda_2, \dots, \lambda_d$
Original representation: data point $x_i$ (D-dimensional vector) → Transformed representation: projections $(f_1(i), \dots, f_d(i))$ (d-dimensional vector)
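The two steps can be put together in a short end-to-end sketch. Several choices here are assumptions rather than the lecture's prescription: points are rows of an n x D array, the graph is a symmetric kNN graph, weights use a Gaussian kernel of the form $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, and the sketch skips the first eigenvector (constant for a connected graph, so it carries no information) and returns the next `d`. The function name `laplacian_eigenmaps` and the semicircle data are hypothetical.

```python
import numpy as np

def laplacian_eigenmaps(points, k, sigma, d):
    """Embed points (n x D) into d dimensions using the unnormalized graph Laplacian."""
    n = len(points)
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)                 # exclude self from neighbor search
    nbrs = np.argsort(masked, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    A = A | A.T                                      # symmetric kNN graph
    W = np.where(A, np.exp(-sq / (2.0 * sigma ** 2)), 0.0)
    L = np.diag(W.sum(axis=1)) - W                   # L = D - W
    lam, F = np.linalg.eigh(L)                       # ascending eigenvalues
    return lam, F[:, 1:d + 1]                        # assumption: drop the constant eigenvector

# Hypothetical data: 40 points along a semicircle (a 1-D curve living in 2-D).
theta = np.linspace(0.0, np.pi, 40)
points = np.column_stack([np.cos(theta), np.sin(theta)])

lam, embedding = laplacian_eigenmaps(points, k=2, sigma=1.0, d=1)
# How well the 1-D embedding tracks position along the curve (sign is arbitrary).
corr = np.corrcoef(embedding[:, 0], theta)[0, 1]
```

In this toy case the one-dimensional embedding follows the position along the arc closely, which is the "unrolling" behavior in miniature.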

Unrolling the swiss roll
[Figure: swiss roll embeddings plotted on axes $f_2$, $f_3$.]
N = number of nearest neighbors, t = the heat kernel parameter (Belkin & Niyogi ’03)

Example – Understanding syntactic structure of words
• 300 most frequent words of Brown corpus
• Information about the frequency of its left and right neighbors (600-dimensional space)
• The algorithm run with N = 14, t = 1
[Figure labels: verbs; prepositions]
