Online Principal Component Analysis
Edo Liberty
PCA Motivation
PCA Objective
Given $X \in \mathbb{R}^{d \times n}$ and $k < d$, minimize over $Y \in \mathbb{R}^{k \times n}$ and $\Phi$:
$$\min_{\Phi} \|X - \Phi Y\|_F^2 \quad \text{or} \quad \min_{\Phi} \sum_t \|x_t - \Phi y_t\|^2$$
Think of $X = [x_1, x_2, \ldots]$ and $Y = [y_1, y_2, \ldots]$ as collections of column vectors.
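A minimal numpy sketch of this objective; the toy data and the fixed isometry $\Phi$ are illustrative placeholders, not from the deck:

```python
import numpy as np

def pca_objective(X, Phi, Y):
    """Squared-Frobenius PCA objective ||X - Phi Y||_F^2,
    equivalently the sum of per-column errors ||x_t - Phi y_t||^2."""
    return np.linalg.norm(X - Phi @ Y, ord="fro") ** 2

# Toy data: d = 5 dimensions, n = 100 columns, target dimension k = 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))
Phi = np.linalg.qr(rng.standard_normal((5, 2)))[0]  # an arbitrary isometry
Y = Phi.T @ X                                       # best Y for this fixed Phi
print(pca_objective(X, Phi, Y))
```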
Optimal Offline Solution
Let $U_k$ span the top $k$ left singular vectors of $X$.
■ Set $Y = U_k^T X$
■ Set $\Phi = U_k$
■ Computing $U_k$ is possible offline using the Singular Value Decomposition.
■ The optimal reconstruction $\Phi$ turns out to be an isometry.
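In numpy the offline solution is a few lines; a sketch with toy data:

```python
import numpy as np

def offline_pca(X, k):
    """Optimal offline solution: U_k spans the top-k left singular
    vectors of X; Phi = U_k is an isometry and Y = U_k^T X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]
    return U_k, U_k.T @ X   # (Phi, Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))
Phi, Y = offline_pca(X, k=2)
assert np.allclose(Phi.T @ Phi, np.eye(2))  # Phi is an isometry
```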
Pass-efficient PCA
We can compute $U_k$ from $XX^T$, and $XX^T = \sum_t x_t x_t^T$. This requires $\Theta(nd^2)$ time (potentially) and $\Theta(d^2)$ space. Approximating $U_k$ in one pass more efficiently is possible. [FKV04, DK03, Sar06, DMM08, DRVW06, RV07, WLRT08, CW09, Oli10, CW12, Lib13, GP14, GLPW15] Nevertheless, a second pass is required to map $x_t \mapsto y_t = U_k^T x_t$.
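A sketch of the two-pass approach, making explicit why the second pass is needed; `stream_factory` is a hypothetical callable that replays the data stream:

```python
import numpy as np

def two_pass_pca(stream_factory, d, k):
    """Pass 1: accumulate the d x d covariance XX^T = sum_t x_t x_t^T
    in Theta(d^2) space.  Pass 2: map each x_t to y_t = U_k^T x_t."""
    C = np.zeros((d, d))
    for x in stream_factory():                   # first pass
        C += np.outer(x, x)
    # Eigenvectors of XX^T are the left singular vectors of X.
    eigvals, eigvecs = np.linalg.eigh(C)
    U_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return [U_k.T @ x for x in stream_factory()]  # second pass

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 5))
ys = two_pass_pca(lambda: iter(data), d=5, k=2)
```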
Online PCA
As in online clustering (e.g. [CCFM97, LSS14]) or online facility location (e.g. [Mey01]), the PCA algorithm must output $y_t$ before receiving $x_{t+1}$.
Online Regression
Note that this is nontrivial even when $d = 2$ and $k = 1$.
■ For $x_1$ there aren't many options...
■ For $x_2$ this is already a nonstandard optimization problem.
■ In general, the mapping $x_i \mapsto y_i$ is not necessarily linear.
Online PCA, Possible Problem Definitions
■ Stochastic model: bounds $\|X - \Phi Y\|_F^2$ assuming the $x_t$ are i.i.d. from an unknown distribution. [OK85, ACS13, MCJ13, BDF13]
■ Regret minimization: minimizes $\sum_t \|x_t - P_{t-1} x_t\|^2$; commits to $P_{t-1}$ before observing $x_t$. [WK06, NKW13]
■ Random projection: can guarantee online that $\|X - (XY^+)Y\|_F^2$ is small. [Sar06, CW09]
Online PCA Problem Definitions
Definition of a $(c, \varepsilon)$-approximation algorithm for Online PCA
Given $X \in \mathbb{R}^{d \times n}$ as vectors $[x_1, x_2, \ldots]$ and $k < d$, produce $Y = [y_1, y_2, \ldots]$ such that
■ $y_t$ is produced before observing $x_{t+1}$.
■ $y_t \in \mathbb{R}^{\ell}$ and $\ell \le c \cdot k$.
■ $\|X - \Phi Y\|_F^2 \le \|X - X_k\|_F^2 + \varepsilon \|X\|_F^2$ for some isometry $\Phi$, where $X_k$ is the best rank-$k$ approximation of $X$.
Main Contribution [BGKL15]
There exists an $(\tilde{O}(\varepsilon^{-2}), \varepsilon)$-approximation algorithm for online PCA.
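To make the guarantee concrete, a small checker for the Frobenius definition; the function name is mine, not from [BGKL15]:

```python
import numpy as np

def satisfies_frobenius_guarantee(X, Phi, Y, k, eps):
    """Check ||X - Phi Y||_F^2 <= ||X - X_k||_F^2 + eps * ||X||_F^2,
    where X_k is the best rank-k approximation of X."""
    sigma = np.linalg.svd(X, compute_uv=False)
    opt = np.sum(sigma[k:] ** 2)      # ||X - X_k||_F^2: tail of the spectrum
    total = np.sum(sigma ** 2)        # ||X||_F^2
    err = np.linalg.norm(X - Phi @ Y, ord="fro") ** 2
    return err <= opt + eps * total
```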
Noisy Data Spectra
Setting $Y = 0$ gives a $(0, \varepsilon)$-approximation...
Noisy Data Spectra
Sometimes, "poor" reconstruction error is algorithmically required.
Online PCA Problem Definitions
Setting $Y = U_k^T X$ and $\Phi = U_k$ minimizes $\|X - \Phi Y\|_2^2$.
Definition of a $(c, \varepsilon)$-approximation algorithm for Spectral Online PCA
Given $X \in \mathbb{R}^{d \times n}$ as vectors $[x_1, x_2, \ldots]$ and $k < d$, produce $Y = [y_1, y_2, \ldots]$ such that
■ $y_t$ is produced before observing $x_{t+1}$.
■ $y_t \in \mathbb{R}^{\ell}$ and $\ell \le c \cdot k$.
■ $\|X - \Phi Y\|_2^2 \le \|X - X_k\|_2^2 + \varepsilon \|X\|_2^2$ for some isometry $\Phi$.
Main Contribution [KL15]
There exists an $(\tilde{O}(\varepsilon^{-2}), \varepsilon)$-approximation algorithm for Spectral Online PCA.
Some Intuition
■ The covariance matrix $X^T X$ visualized as an ellipse.
■ The optimal residual is $R = X - X_k$.
■ Any residual $R = X - \Phi Y$ such that $\|R^T R\| \le \sigma_{k+1}^2 + \varepsilon \sigma_1^2$ would work.
Bad Algorithm, Big Step Forward

    Δ = σ²_{k+1} + ε σ²_1
    U ← all-zeros matrix
    for x_t ∈ X do
        if ‖(I − UUᵀ) X_{1:t}‖² ≥ Δ:
            add the top left singular vector of (I − UUᵀ) X_{1:t} to U
        yield y_t = Uᵀ x_t

Obvious problems with this algorithm (will be fixed later):
■ It must "guess" $\Delta = \sigma_{k+1}^2 + \varepsilon \sigma_1^2$.
■ It stores the entire history $X_{1:t}$.
■ It computes the top singular value of $(I - UU^T) X_{1:t}$ at every round.
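A direct, deliberately inefficient numpy rendering of the pseudocode above; it exhibits all three problems, and $\Delta$ is assumed to be given:

```python
import numpy as np

def bad_online_pca(xs, delta):
    """Sketch of the 'bad algorithm': keeps the full history and
    recomputes the residual's top singular value every round."""
    d = len(xs[0])
    U = np.zeros((d, 0))                  # directions committed so far
    history, ys = [], []
    for x in xs:
        history.append(x)
        H = np.column_stack(history)
        R = H - U @ (U.T @ H)             # residual (I - U U^T) X_{1:t}
        u, s, _ = np.linalg.svd(R, full_matrices=False)
        if s[0] ** 2 >= delta:            # ||(I - U U^T) X_{1:t}||^2 >= Delta
            U = np.column_stack([U, u[:, 0]])
        ys.append(U.T @ x)                # y_t grows as directions are added
    return U, ys
```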
Algorithm Intuition
■ Assume we know $\Delta = \sigma_{k+1}^2 + \varepsilon \sigma_1^2$.
■ We start by mapping $x_t \mapsto 0$, so $R_{[1:t]} = X_{[1:t]}$.
■ This continues as long as $\|R^T R\| \le \Delta$.
■ When $\|R^T R\| > \Delta$ we commit to a new online PCA direction $u_i$.
■ This prevents $R^T R$ from growing further in the direction $u_i$.
Algorithm Properties
Theorems 2, 5 and 6 in [KL15]
$$\|X - UY\|_2^2 \le \|R\|_2^2 \le \sigma_{k+1}^2 + \varepsilon \sigma_1^2 + o(\sigma_1^2).$$
The "proof by drawing" above is deceptively simple. This is the main difficulty!
Theorem 1 in [KL15]
The number of directions added by the algorithm is $\ell \le k/\varepsilon$.
Proof: we sum the inequality $\Delta \le \|u_i^T X\|^2$ over all added directions $u_1, \ldots, u_\ell$:
$$\ell \Delta \le \sum_{i=1}^{\ell} \|u_i^T X\|^2 = \|U^T X\|_F^2 \le \sum_{i=1}^{\ell} \sigma_i^2 \le k \sigma_1^2 + (\ell - k)\sigma_{k+1}^2.$$
By rearranging we get $\ell \le (k\sigma_1^2 - k\sigma_{k+1}^2)/(\Delta - \sigma_{k+1}^2)$. Substituting $\Delta = \sigma_{k+1}^2 + \varepsilon \sigma_1^2$ gives $\ell \le k/\varepsilon$.
Fixing the Algorithm
■ Exponentially search for the right $\Delta$: if we added more than $k/\varepsilon$ directions to $U$, we can conclude that $\Delta < \sigma_{k+1}^2 + \varepsilon \sigma_1^2$.
■ Instead of keeping $X_{1:t}$, use covariance sketching: keep $B$ such that $XX^T \approx BB^T$ and $B$ requires $o(d^2)$ space to store (see the sketch below).
■ Only compute the top singular value of $(I - UU^T) X_{1:t}$ "once in a while".
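One way to realize the covariance-sketching step is Frequent Directions [Lib13]; this minimal shrink-on-every-insert variant (assuming $\ell < d$) is a sketch, not necessarily the exact routine used in [KL15]. Here $B$ is $\ell \times d$, so the slide's $BB^T$ corresponds to $B^T B$ up to transposing $B$:

```python
import numpy as np

def frequent_directions(xs, ell):
    """Frequent Directions [Lib13], shrink-on-every-insert variant.
    Maintains B (ell x d) with ||sum_t x_t x_t^T - B^T B|| <= sum_t ||x_t||^2 / ell
    using only O(ell * d) space (assumes ell < d)."""
    d = len(xs[0])
    B = np.zeros((ell, d))
    for x in xs:
        B[-1] = x                                 # the last row is always zero
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        s2 = np.maximum(s ** 2 - s[-1] ** 2, 0.0)  # shrink by smallest sigma^2
        B = np.sqrt(s2)[:, None] * Vt              # zeroes the last row again
    return B
```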
Visual Illustration and Open Problems
■ Can we reduce the target dimension while keeping the approximation guarantee?
■ Would allowing scaled isometric registration help reduce the target dimension?
■ Can we avoid the exponential search for $\Delta$?
■ Is there a simple way to update $U$ that is more accurate than only adding columns?
■ Can we reduce the running time of online PCA? Currently the bottleneck is covariance sketching.
Thank you