The Variational Nyström Method for Large-Scale Spectral Problems
Max Vladymyrov (Google Inc.) and Miguel Carreira-Perpiñán (EECS, UC Merced)
June 20, 2016
Graph-based dimensionality reduction methods

Given high-dimensional data points $Y_{D\times N} = (\mathbf{y}_1, \dots, \mathbf{y}_N)$:
1. Convert the data points to an affinity matrix $M_{N\times N}$.
2. Find low-dimensional coordinates $X_{d\times N} = (\mathbf{x}_1, \dots, \mathbf{x}_N)$ so that their similarity is as close as possible to $M$.

[Figure: high-dimensional input $Y \in \mathbb{R}^D$ → affinity matrix $M$ → low-dimensional output $X \in \mathbb{R}^d$.]
Spectral methods

• Consider a spectral problem:
$$\min_X \operatorname{tr}(X M X^T) \quad \text{s.t.}\ X X^T = I,$$
where $M_{N\times N}$ is a symmetric psd affinity matrix.
• Examples:
‣ Laplacian eigenmaps: $M$ is a graph Laplacian.
‣ ISOMAP: $M$ is given by a matrix of shortest distances.
‣ Kernel PCA, MDS, Locally Linear Embedding (LLE), etc.
• The solution is unique and can be found in closed form from the eigenvectors of $M$: $X = U^T$.
• With large $N$, solving the eigenproblem is infeasible even if $M$ is sparse.
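To make the closed-form solution concrete, here is a minimal NumPy sketch (the function name is ours; it follows the minimization convention above and takes the $d$ trailing eigenvectors, as in Laplacian eigenmaps; kernel-PCA-style problems would take the leading ones instead):

```python
import numpy as np

def exact_spectral_embedding(M, d):
    """Minimize tr(X M X^T) s.t. X X^T = I for a symmetric psd M.

    The minimizer is U^T, where U holds the d eigenvectors of M with
    the smallest eigenvalues. Cost is O(N^3), which is what motivates
    the landmark approximations that follow.
    """
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, :d].T               # X = U^T, a d x N embedding
```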
Learning with landmarks

The goal is to find a fast, approximate solution for the embedding $X$ using only a subset of $L$ of the original points from $Y$:
1. Select $L$ landmarks (e.g. a random subset).
2. Compute the reduced $L\times L$ affinity matrix.
3. Learn the landmark representation.
4. Project the rest of the points.

[Figure: pipeline from the input in $\mathbb{R}^D$ through the $L\times L$ affinity matrix to the landmark embedding and the full embedding in $\mathbb{R}^d$.]
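As a small illustration of step 1 (random landmark selection; the helper name and the seed are hypothetical choices, not from the paper):

```python
import numpy as np

def select_landmarks(Y, L, seed=0):
    """Pick L random landmark columns from the D x N data matrix Y."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(Y.shape[1], size=L, replace=False)
    return idx, Y[:, idx]  # landmark indices and the D x L landmark matrix
```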
Nyström method

Writing the affinity matrix by blocks (landmarks first):
$$M = \begin{pmatrix} A & B_{21}^T \\ B_{21} & B_{22} \end{pmatrix}, \qquad C = \begin{pmatrix} A \\ B_{21} \end{pmatrix}.$$
Given the eigendecomposition $A = U_A \Lambda_A U_A^T$, the approximation to the eigenvectors of $M$ is:
$$\widetilde{U}_M = \begin{pmatrix} U_A \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix} = C U_A \Lambda_A^{-1}.$$
Essentially, an out-of-sample formula:
1. Solve the eigenproblem for a subset of the points.
2. Predict the rest of the points through the interpolation formula.
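A minimal sketch of the formula, assuming $C$ is the $N\times L$ landmark column block of $M$ with the landmarks stacked first (so `C[:L]` is $A$). It takes the $d$ leading eigenpairs, as for a psd kernel; a minimization problem would take the trailing ones, at the price of dividing by near-zero eigenvalues:

```python
import numpy as np

def nystrom_embedding(C, d):
    """Nyström approximation U_tilde = C U_A Lambda_A^{-1} (N x d)."""
    L = C.shape[1]
    A = C[:L, :]                      # L x L landmark-landmark block
    lam, U_A = np.linalg.eigh(A)      # eigenvalues in ascending order
    lam, U_A = lam[-d:], U_A[:, -d:]  # keep the d leading eigenpairs
    return (C @ U_A) / lam            # column j is scaled by 1/lam[j]
```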
Column Sampling method

Writing the affinity matrix by blocks (landmarks first):
$$M = \begin{pmatrix} A & B_{21}^T \\ B_{21} & B_{22} \end{pmatrix}, \qquad C = \begin{pmatrix} A \\ B_{21} \end{pmatrix}.$$
The approximation to the eigendecomposition is given by the left singular vectors of $C$:
$$C = U_C \Sigma_C V_C^T \;\Rightarrow\; \widetilde{U}_M = U_C.$$
Uses more information from the affinity matrix $M$ than Nyström, but still ignores the non-landmark/non-landmark block $B_{22}$.
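The corresponding sketch for Column Sampling is just a thin SVD of the same block $C$ (function name is ours):

```python
import numpy as np

def column_sampling_embedding(C, d):
    """Column Sampling: the d leading left singular vectors of C."""
    U_C, _, _ = np.linalg.svd(C, full_matrices=False)  # singular values descend
    return U_C[:, :d]
```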
Locally Linear Landmarks (LLL) (Vladymyrov & Carreira-Perpiñán, 2013)

• Construct the local linear projection matrix $Z_{N\times L}$ from the input $Y$:
$$\mathbf{y}_n \approx \sum_{l=1}^L z_{nl}\, \widetilde{\mathbf{y}}_l,\quad n = 1,\dots,N \;\Rightarrow\; Y \approx \widetilde{Y} Z^T.$$
• Additional assumption: this projection is satisfied in the embedding space: $X = \widetilde{X} Z^T$.
• Plugging the projection into the original objective function:
$$\min_X \operatorname{tr}(X M X^T)\ \text{s.t.}\ X X^T = I,\ X = \widetilde{X} Z^T \;\Rightarrow\; \min_{\widetilde{X}} \operatorname{tr}\bigl(\widetilde{X} Z^T M Z \widetilde{X}^T\bigr)\ \text{s.t.}\ \widetilde{X} Z^T Z \widetilde{X}^T = I.$$
• The solution is given by the reduced generalized eigenproblem: $\widetilde{X} = \operatorname{eig}(Z^T M Z,\ Z^T Z)$.
• The final embedding is predicted as $X = \widetilde{X} Z^T$.
• This solution is optimal given the constraint $X = \widetilde{X} Z^T$.
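A minimal sketch of both LLL steps. The LLE-style weight solver, the number of neighbors `K`, and the regularization constant are our assumptions, not the paper's exact choices; the generalized eigensolve assumes $Z^T Z$ is positive definite:

```python
import numpy as np
from scipy.linalg import eigh, solve

def lll_weights(Y, Y_land, K=5, reg=1e-3):
    """Local linear weights Z (N x L): reconstruct each y_n from its
    K nearest landmarks, with weights that sum to one."""
    N, L = Y.shape[1], Y_land.shape[1]
    Z = np.zeros((N, L))
    for n in range(N):
        d2 = np.sum((Y_land - Y[:, n:n + 1]) ** 2, axis=0)
        idx = np.argsort(d2)[:K]            # K nearest landmarks
        G = Y_land[:, idx] - Y[:, n:n + 1]  # centred neighbours, D x K
        C_loc = G.T @ G                     # local Gram matrix, K x K
        C_loc += reg * np.eye(K)            # regularize for stability
        w = solve(C_loc, np.ones(K))
        Z[n, idx] = w / w.sum()
    return Z

def lll_embedding(M, Z, d):
    """Reduced eigenproblem (Z^T M Z, Z^T Z), then X = X_tilde Z^T."""
    vals, V = eigh(Z.T @ M @ Z, Z.T @ Z)    # ascending eigenvalues
    X_tilde = V[:, :d].T                    # trailing d solutions, d x L
    return X_tilde @ Z.T                    # final d x N embedding
```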
Generalizing approximations

Nyström: expand the upper block of the $N\times L$ approximation:
$$\widetilde{U}_M = \begin{pmatrix} U_A \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix} = \begin{pmatrix} A U_A \Lambda_A^{-1} \\ B_{21} U_A \Lambda_A^{-1} \end{pmatrix} = C U_A \Lambda_A^{-1}.$$
Column Sampling: rewrite using the eigendecomposition of the $L\times L$ matrix $C^T C$:
$$\widetilde{U}_M = U_C = C V_C \Sigma_C^{-1} = C\, U_{C^T C}\, \Lambda_{C^T C}^{-1/2}.$$
LLL: rewrite the solution as $\widetilde{U}_M = Z \widetilde{X}^T$, where $\widetilde{X}$ is computed optimally (given $Z$) as $\widetilde{X} = \operatorname{eig}(Z^T M Z,\ Z^T Z)$.
Generalizing approximations

Nyström:
1. Solve the smaller $L\times L$ eigendecomposition: $A = U_A \Lambda_A U_A^T$.
2. Apply the $N\times L$ out-of-sample matrix: $\widetilde{U}_M = C U_A \Lambda_A^{-1}$.

Column Sampling:
1. Solve the smaller $L\times L$ eigendecomposition: $C^T C = U_{C^T C} \Lambda_{C^T C} U_{C^T C}^T$.
2. Apply the $N\times L$ out-of-sample matrix: $\widetilde{U}_M = C\, U_{C^T C}\, \Lambda_{C^T C}^{-1/2}$.

LLL:
1. Solve the smaller $L\times L$ eigenproblem: $\widetilde{X} = \operatorname{eig}(Z^T M Z,\ Z^T Z)$.
2. Apply the $N\times L$ out-of-sample matrix: $\widetilde{U}_M = Z \widetilde{X}^T$.
Generalizing approximations

Each approximation consists of the following steps:
• define an out-of-sample matrix $Z_{N\times L}$,
• compute some reduced eigenproblem $A\mathbf{u} = \lambda B\mathbf{u}$ and a matrix $Q_{L\times d}$ that depends on it,
• the final approximation is $\widetilde{U}_M = Z Q$.

Method              | Eigenproblem $(A, B)$ | $Q_{L\times d}$    | $Z_{N\times L}$
Nyström             | $(A,\ I)$             | $U \Lambda^{-1}$   | $C$
Column Sampling     | $(Z^T Z,\ I)$         | $U \Lambda^{-1/2}$ | $C$
LLL                 | $(Z^T M Z,\ Z^T Z)$   | $U$                | computed from $Y \approx \widetilde{Y} Z^T$
Random Projection   | $(Z^T M Z,\ Z^T Z)$   | $U$                | $\operatorname{qr}(M^q S)$
Variational Nyström | $(Z^T M Z,\ Z^T Z)$   | $U$                | $C$
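The table can be turned into a single dispatch routine; a sketch under our naming, shown for the $d$ leading eigenpairs (a minimization such as Laplacian eigenmaps would take the trailing ones, and the Nyström scaling $1/\lambda$ is ill-conditioned when those eigenvalues are near zero):

```python
import numpy as np
from scipy.linalg import eigh

def reduced_embedding(M, Z, method, d):
    """Unified view U_tilde = Z Q of the landmark approximations.

    'nystrom' and 'column_sampling' assume Z = C with the L x L block
    A = Z[:L] on top; the generalized branch covers LLL and Variational
    Nyström (and assumes Z^T Z is positive definite).
    """
    L = Z.shape[1]
    if method == "nystrom":
        lam, U = np.linalg.eigh(Z[:L])       # eigenproblem (A, I)
        Q = U[:, -d:] / lam[-d:]             # Q = U Lambda^{-1}
    elif method == "column_sampling":
        lam, U = np.linalg.eigh(Z.T @ Z)     # eigenproblem (Z^T Z, I)
        Q = U[:, -d:] / np.sqrt(lam[-d:])    # Q = U Lambda^{-1/2}
    else:                                    # LLL / Variational Nyström
        lam, U = eigh(Z.T @ M @ Z, Z.T @ Z)  # generalized eigenproblem
        Q = U[:, -d:]                        # Q = U
    return Z @ Q                             # N x d approximation
```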
Variational Nyström

Add the Nyström out-of-sample constraint to the spectral problem:
$$\min_X \operatorname{tr}(X M X^T)\ \text{s.t.}\ X X^T = I,\ X = \widetilde{X} C^T \;\Rightarrow\; \min_{\widetilde{X}} \operatorname{tr}\bigl(\widetilde{X} C^T M C \widetilde{X}^T\bigr)\ \text{s.t.}\ \widetilde{X} C^T C \widetilde{X}^T = I.$$
From the LLL perspective:
• replace the custom-built out-of-sample matrix $Z$ with the readily available column matrix $C$,
• abandon the local linearity assumption behind the weights $Z$,
• save the computation of $Z$,
• $C$ is usually sparser than $Z$ (due to locality).
Variational Nyström

Add the Nyström out-of-sample constraint to the spectral problem:
$$\min_X \operatorname{tr}(X M X^T)\ \text{s.t.}\ X X^T = I,\ X = \widetilde{X} C^T \;\Rightarrow\; \min_{\widetilde{X}} \operatorname{tr}\bigl(\widetilde{X} C^T M C \widetilde{X}^T\bigr)\ \text{s.t.}\ \widetilde{X} C^T C \widetilde{X}^T = I.$$
From the Nyström perspective:
• use the same out-of-sample matrix $C$, but optimize the choice of the reduced eigenproblem,
• for a fixed $C$, gives a better approximation than Nyström or Column Sampling (the solution is optimal for the out-of-sample kernel $C$),
• uses all the elements of $M$ to construct the reduced eigenproblem,
• forgoes the interpolating property of Nyström.
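A minimal sketch of Variational Nyström under the minimization convention above, so the trailing $d$ generalized eigenvectors are taken (the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def variational_nystrom(M, C, d):
    """Solve the reduced generalized eigenproblem (C^T M C, C^T C),
    then map all points back with X = X_tilde C^T.

    Note that C^T M C touches every element of M, including the
    non-landmark block B22 that Nyström and Column Sampling ignore.
    """
    vals, V = eigh(C.T @ M @ C, C.T @ C)  # ascending eigenvalues
    X_tilde = V[:, :d].T                  # trailing d solutions, d x L
    return X_tilde @ C.T                  # final d x N embedding
```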
Subsampling the graph Laplacian

• Consider $M$ given by the normalized graph Laplacian matrix $\mathcal{L} \propto D^{-1/2} W D^{-1/2}$, with
- Gaussian affinity matrix: $w_{nm} = \exp\bigl(-\lVert \mathbf{y}_n - \mathbf{y}_m \rVert^2 / 2\sigma^2\bigr)$,
- degree matrix: $D = \operatorname{diag}\bigl(\sum_{m=1}^N w_{nm}\bigr)$.
• One of the most widely used kernels (Laplacian eigenmaps, spectral clustering).
• The graph Laplacian kernel is data dependent: the $L\times L$ graph Laplacian computed for a subset of $L$ points $\neq$ the $L\times L$ subset of the $N\times N$ graph Laplacian constructed for all $N$ input points.
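A sketch of this kernel construction (the bandwidth `sigma` is a hypothetical choice). It also makes the data dependence visible: every entry is divided by degrees that sum over all $N$ points, so subsampling the points changes the normalization:

```python
import numpy as np

def normalized_affinity(Y, sigma=1.0):
    """Gaussian affinities w_nm = exp(-||y_n - y_m||^2 / (2 sigma^2))
    normalized as D^{-1/2} W D^{-1/2}, for a D x N data matrix Y."""
    sq = np.sum(Y ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Y.T @ Y  # squared distances
    W = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))
    dinv = 1.0 / np.sqrt(W.sum(axis=1))             # D^{-1/2} as a vector
    return (W * dinv[:, None]) * dinv[None, :]      # D^{-1/2} W D^{-1/2}
```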
Subsampling the graph Laplacian

• Data dependence can be a problem for methods that depend on subsampling:
- Nyström,
- Column Sampling,
- Variational Nyström.
• Not a problem for methods where there is no subsampling:
- LLL,
- Random projection.
• Our solution: normalize the subsampled kernel separately, but in a way that interpolates over the landmarks and gives the exact solution when $L = N$:
$$D_1^{-1/2}\, C\, D_2^{-1/2} \;\to\; M \quad \text{as } L \to N.$$
Subsampling graph Laplacian D 1 D − 1 / 2 D − 1 / 2 D 2 C M L → N • For Nyström and Column Sampling: • we propose different forms for and , D 1 D 2 • we evaluate empirically which one is the best. • For Variational Nyström: • we showed that factors out, D 2 • any leads to the exact solution when . L = N D 1 For the graph Laplacian kernel, the Variational Nyström approximation is more general. 16
Experiments: Laplacian eigenmaps

• Reduce the dimensionality of $N = 20{,}000$ MNIST digits to $d = 10$.
• Run 5 times for different randomly chosen landmarks, from $L = 11$ to $L = 19{,}900$.

[Plot: error with respect to the exact objective function vs. the number of landmarks $L$; methods: Nys, CS, LLL, oNys (Variational Nyström), Halko ($q = 1, 2, 3$).]
Experiments: Laplacian eigenmaps

• Reduce the dimensionality of $N = 20{,}000$ MNIST digits to $d = 10$.
• Run 5 times for different randomly chosen landmarks, from $L = 11$ to $L = 19{,}900$.

[Plot: runtime vs. the number of landmarks $L$ for the same methods.]
Experiments: Laplacian eigenmaps

• Reduce the dimensionality of $N = 20{,}000$ MNIST digits to $d = 10$.
• Run 5 times for different randomly chosen landmarks, from $L = 11$ to $L = 19{,}900$.

[Plot: error with respect to the exact objective function vs. runtime.]

Variational Nyström is winning! 2x as fast as LLL!
Experiments: Spectral clustering

[Figure: original image; exact spectral clustering, $t = 512$ s; Variational Nyström, $t = 25$ s; Nyström, $t = 25$ s.]

20x speedup!