Dimensionality Reduction Lecture 23 David Sontag New York - PowerPoint PPT Presentation

Dimensionality Reduction
Lecture 23
David Sontag
New York University
Slides adapted from Carlos Guestrin and Luke Zettlemoyer

Assignments
• Last homework assignment released tonight, due next Thursday (Dec. 5)
• Final project write-up due December 15
• 10 minute presentations (1 per group)
  – Part of your grade
  – During final exam period, Dec. 17, 10-11:50am
• I need 4 groups to volunteer to give their presentation on Dec. 12

Dimensionality reduction
• Input data may have thousands or millions of dimensions!
  – e.g., text data has ???, images have ???
• Dimensionality reduction: represent data with fewer dimensions
  – easier learning: fewer parameters
  – visualization: show high dimensional data in 2D
  – discover "intrinsic dimensionality" of data
    • high dimensional data that is truly lower dimensional
    • noise reduction

Dimension reduction
• Assumption: data (approximately) lies on a lower dimensional space
• Examples:
Slide from Yi Zhang

Lower dimensional projections
• Rather than picking a subset of the features, we can obtain new ones by combining existing features $x_1, \ldots, x_n$:
$$z_1 = w_0^{(1)} + \sum_i w_i^{(1)} x_i$$
$$\vdots$$
$$z_k = w_0^{(k)} + \sum_i w_i^{(k)} x_i$$
• New features are linear combinations of old ones
• Reduces dimension when $k < n$
• Let's consider how to do this in the unsupervised setting
  – just X, but no Y
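As a minimal sketch of the mapping on this slide, the snippet below computes k new features as offset-plus-weighted-sum of the n old ones. The function name `linear_features` and the weight values are hypothetical, chosen only for illustration.

```python
import numpy as np

def linear_features(X, W, w0):
    """Map each row x of X (m x n) to z = w0 + W x, giving an m x k matrix.

    W holds one weight vector w^(j) per row; w0 holds the offsets w_0^(j).
    """
    return X @ W.T + w0

# m = 2 data points with n = 3 features each
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
# k = 2 new features (hypothetical weights, not learned from data)
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])
w0 = np.array([0.0, 1.0])

Z = linear_features(X, W, w0)
print(Z.shape)  # k < n, so each point now has fewer coordinates
```

PCA, discussed on the later slides, is one way of choosing W and w0 without labels.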

Which projection is better?
From notes by Andrew Ng

Reminder: Vector Projections
• Basic definitions:
  – $A \cdot B = |A||B| \cos\theta$
  – $\cos\theta = |\text{adj}| / |\text{hyp}|$
• Assume $|B| = 1$ (unit vector)
  – $A \cdot B = |A| \cos\theta$
  – So, dot product is length of projection!!!
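A small numeric check of this identity, using vectors picked purely for illustration:

```python
import numpy as np

A = np.array([3.0, 4.0])
B = np.array([1.0, 0.0])   # unit vector, |B| = 1

# Dot product with a unit vector gives the length of A's projection onto B.
proj_len = A @ B

# The same quantity computed as |A| cos(theta).
cos_theta = (A @ B) / (np.linalg.norm(A) * np.linalg.norm(B))
via_angle = np.linalg.norm(A) * cos_theta

print(proj_len, via_angle)
```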

Using a new basis for the data
• Project a point into a (lower dimensional) space:
  – point: $x = (x_1, \ldots, x_n)$
  – select a basis: set of unit (length 1) basis vectors $(u_1, \ldots, u_k)$
    • we consider orthonormal basis:
      – $u_i \cdot u_i = 1$, and $u_i \cdot u_j = 0$ for $i \neq j$
  – select a center $\bar{x}$, defines offset of space
  – best coordinates in lower dimensional space defined by dot-products: $(z_1, \ldots, z_k)$, $z_j = (x - \bar{x}) \cdot u_j$
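The coordinate formula can be sketched as follows. The helper names `project` and `reconstruct`, and the specific basis and center, are assumptions made for this example; the reconstruction step is the usual inverse (center plus weighted basis vectors) and is not spelled out on the slide.

```python
import numpy as np

def project(x, center, U):
    """Coordinates z_j = (x - center) . u_j; rows of U are orthonormal u_1..u_k."""
    return U @ (x - center)

def reconstruct(z, center, U):
    """Map coordinates back to the original space: center + sum_j z_j u_j."""
    return center + U.T @ z

x = np.array([2.0, 3.0, 5.0])
center = np.array([1.0, 1.0, 1.0])
# Orthonormal basis for a 2D subspace of R^3 (hypothetical choice)
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

z = project(x, center, U)
x_hat = reconstruct(z, center, U)
print(z, x_hat)
```

The third coordinate of `x` is lost because neither basis vector points along it, so `x_hat` differs from `x` there.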

Maximize variance of projection
Let $x^{(i)}$ be the $i$-th data point minus the mean. Choose unit-length $u$ to maximize:
$$\frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)T} u\right)^2 = \frac{1}{m}\sum_{i=1}^{m} u^T x^{(i)} x^{(i)T} u = u^T \left(\frac{1}{m}\sum_{i=1}^{m} x^{(i)} x^{(i)T}\right) u.$$
Let $\|u\| = 1$ and maximize. Using the method of Lagrange multipliers, one can show that the solution is given by the principal eigenvector of the covariance matrix! (shown on board)
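The board derivation is not part of the slides. A standard sketch of it follows, writing $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} x^{(i)T}$ for the covariance matrix and introducing a multiplier $\lambda$ (both symbols are notation assumed here):

$$\mathcal{L}(u, \lambda) = u^T \Sigma u - \lambda\,(u^T u - 1)$$
$$\nabla_u \mathcal{L} = 2\Sigma u - 2\lambda u = 0 \;\Longrightarrow\; \Sigma u = \lambda u$$
$$u^T \Sigma u = \lambda\, u^T u = \lambda$$

So every stationary point is an eigenvector of $\Sigma$, and the variance it attains equals its eigenvalue. The maximum is therefore reached at the eigenvector with the largest eigenvalue.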

Basic PCA algorithm
• Start from m by n data matrix X
• Recenter: subtract mean from each row of X
  – $X_c \leftarrow X - \bar{X}$
• Compute covariance matrix:
  – $\Sigma \leftarrow \frac{1}{m} X_c^T X_c$
• Find eigenvectors and eigenvalues of $\Sigma$
• Principal components: k eigenvectors with highest eigenvalues
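The steps above can be sketched in NumPy as follows. This is one possible implementation, not the lecture's own code; the function name `pca_eig` and the synthetic data are assumptions. It uses `np.linalg.eigh`, which applies because the covariance matrix is symmetric.

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix.

    Returns the mean, the top-k eigenvalues, and the matching
    eigenvectors as rows (one principal component per row).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                          # recenter
    cov = Xc.T @ Xc / X.shape[0]           # n x n covariance, 1/m scaling
    eigvals, eigvecs = np.linalg.eigh(cov) # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    return mean, eigvals[order], eigvecs[:, order].T

# Synthetic 2D data lying close to the line y = 2x (illustrative only)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])

mean, eigvals, components = pca_eig(X, k=1)
print(components)  # close to +/- (1, 2) / sqrt(5)
```

The sign of each component is arbitrary, since both u and -u are unit eigenvectors.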

PCA example
[Figure panels: Data, Projection, Reconstruction]

Dimensionality reduction with PCA
In high-dimensional problems, data usually lies near a linear subspace, as noise introduces small variability.
Only keep data projections onto principal components with large eigenvalues.
Can ignore the components of lesser significance.
[Figure: bar chart of Variance (%) for PC1 through PC10]
You might lose some information, but if the eigenvalues are small, you don't lose much.
Slide from Aarti Singh
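One common way to decide how many components to keep is to look at the fraction of total variance each eigenvalue accounts for, as in the bar chart. The sketch below assumes a cumulative-variance threshold as the selection rule; the helper names, the eigenvalues, and the threshold are all hypothetical.

```python
import numpy as np

def explained_variance_ratio(eigvals):
    """Fraction of total variance carried by each component, largest first."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return eigvals / eigvals.sum()

def components_needed(eigvals, threshold):
    """Smallest k whose cumulative variance fraction reaches the threshold."""
    cumulative = np.cumsum(explained_variance_ratio(eigvals))
    return int(np.searchsorted(cumulative, threshold) + 1)

eigvals = [5.0, 3.0, 1.0, 0.5, 0.5]   # hypothetical covariance eigenvalues
ratio = explained_variance_ratio(eigvals)
k = components_needed(eigvals, threshold=0.85)
print(ratio, k)
```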

Eigenfaces [Turk, Pentland '91]
• Input images:
• Principal components:

Eigenfaces reconstruction
• Each image corresponds to adding together (weighted versions of) the principal components:
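A rough illustration of this weighted-sum view, using random arrays as stand-ins for face images (no real image data is assumed). The mean image is added back after summing the weighted components, which is the usual convention when the data were centered first.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((20, 64))   # 20 hypothetical 8x8 images, flattened to rows
mean = images.mean(axis=0)

# Rows of Vt are the principal components of the centered images.
_, _, Vt = np.linalg.svd(images - mean, full_matrices=False)

def reconstruct(image, k):
    """Approximate an image as mean + weighted sum of the first k components."""
    weights = Vt[:k] @ (image - mean)   # one weight per component
    return mean + weights @ Vt[:k]

# Reconstruction error for the first image as more components are used.
errors = [np.linalg.norm(images[0] - reconstruct(images[0], k))
          for k in (1, 5, 20)]
print(errors)
```

The error cannot increase as k grows, and it vanishes (up to rounding) once all components are used.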

Scaling up
• Covariance matrix can be really big!
  – $\Sigma$ is n by n
  – 10000 features can be common!
  – finding eigenvectors is very slow…
• Use singular value decomposition (SVD)
  – Finds k eigenvectors
  – great implementations available, e.g., Matlab svd

SVD
• Write $X = W S V^T$
  – X ← data matrix, one row per datapoint
  – W ← weight matrix, one row per datapoint: coordinate of $x_i$ in eigenspace
  – S ← singular value matrix, diagonal matrix
    • in our setting each entry is a singular value $\sigma_j$; for centered X, $\sigma_j^2 / m$ equals the eigenvalue $\lambda_j$ of the covariance matrix
  – $V^T$ ← singular vector matrix
    • in our setting each row is eigenvector $v_j$
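The factorization can be checked numerically. In NumPy, `np.linalg.svd` returns the diagonal of S as a vector of singular values; for centered data, their squares divided by m match the eigenvalues of the covariance matrix $\frac{1}{m} X_c^T X_c$. The random data below is only a stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))     # m = 50 points, n = 4 features (synthetic)
Xc = X - X.mean(axis=0)          # center the data first
m = Xc.shape[0]

# Thin SVD: W is m x n, s holds the diagonal of S, Vt is n x n.
W, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of the covariance matrix, sorted largest first.
cov_eigvals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / m))[::-1]

print(s**2 / m)
print(cov_eigvals)
```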

PCA using SVD algorithm
• Start from m by n data matrix X
• Recenter: subtract mean from each row of X
  – $X_c \leftarrow X - \bar{X}$
• Call SVD algorithm on $X_c$: ask for k singular vectors
• Principal components: k singular vectors with highest singular values (rows of $V^T$)
  – Coefficients: project each point onto the new vectors
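A minimal sketch of this procedure in NumPy follows. The function name `pca_svd` and the toy data are assumptions. Note that `np.linalg.svd` computes the full thin decomposition and the code then keeps the first k rows of $V^T$; a truncated solver that only computes k vectors would be closer to "ask for k singular vectors" on large problems.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix.

    Returns the mean, the top-k principal components (rows of V^T),
    and the coefficients of each point in that basis.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                   # recenter
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                             # singular values come sorted, largest first
    coeffs = Xc @ components.T                      # project each point
    return mean, components, coeffs

# Three points on the line y = x, so one component captures everything.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
mean, components, coeffs = pca_svd(X, k=1)

X_hat = mean + coeffs @ components   # reconstruction from 1 coordinate per point
print(components, X_hat)
```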
