CS 498ABD: Algorithms for Big Data
Subspace Embeddings for Regression
Lecture 12, October 1, 2020
Chandra (UIUC), Fall 2020
Subspace Embedding

Question: Suppose we have a linear subspace $E$ of $\mathbb{R}^n$ of dimension $d$. Can we find a projection $\Pi : \mathbb{R}^n \to \mathbb{R}^k$ such that for every $x \in E$, $\|\Pi x\|_2 = (1 \pm \epsilon)\|x\|_2$?
- Not possible if $k < d$.
- Possible if $k = d$: pick the rows of $\Pi$ to be an orthonormal basis for $E$ (a small illustration follows this slide). Disadvantage: this requires knowing $E$ and computing an orthonormal basis, which is slow.
- What we really want: an oblivious subspace embedding, à la JL, based on random projections.
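A minimal numerical illustration of the $k = d$ case (not from the slides; the dimensions and the use of numpy's QR factorization for the orthonormal basis are my choices): with $\Pi$ built from an orthonormal basis of $E$, norms of vectors in $E$ are preserved exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20

# A random d-dimensional subspace E of R^n, given by a basis B (its columns).
B = rng.standard_normal((n, d))

# Orthonormal basis Q for E via QR; Pi = Q^T maps R^n -> R^d.
Q, _ = np.linalg.qr(B)   # Q is n x d with orthonormal columns
Pi = Q.T                 # d x n

# Any x in E is B @ c for some coefficient vector c.
x = B @ rng.standard_normal(d)
print(np.linalg.norm(Pi @ x), np.linalg.norm(x))  # equal up to roundoff
```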
Oblivious Subspace Embedding

Theorem. Suppose $E$ is a linear subspace of $\mathbb{R}^n$ of dimension $d$. Let $\Pi \in \mathbb{R}^{k \times n}$ be a DJL matrix with $k = O(\frac{d}{\epsilon^2}\log(1/\delta))$ rows. Then with probability $(1 - \delta)$, for every $x \in E$, $\|\frac{1}{\sqrt{k}}\Pi x\|_2 = (1 \pm \epsilon)\|x\|_2$.

In other words, the JL Lemma extends from one dimension to an arbitrary number of dimensions in a graceful way.
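A small empirical check of the theorem's flavor (a sketch, not the lecture's construction: I assume a dense Gaussian matrix as the DJL distribution and pick arbitrary concrete values of $n$, $d$, $k$; it only tests random vectors in $E$ rather than all of them).

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 2000, 10, 0.25
k = 4 * int(d / eps**2)           # k = O(d / eps^2); constant chosen ad hoc

B = rng.standard_normal((n, d))   # basis of a d-dimensional subspace E
Pi = rng.standard_normal((k, n))  # oblivious: Pi is drawn independently of E

# Distortion of (1/sqrt(k)) * Pi on random vectors x in E.
distortions = []
for _ in range(200):
    x = B @ rng.standard_normal(d)
    distortions.append(np.linalg.norm(Pi @ x) / (np.sqrt(k) * np.linalg.norm(x)))
print(min(distortions), max(distortions))  # should lie roughly within 1 +/- eps
```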
Part I
Faster algorithms via subspace embeddings
Linear model fitting

An important problem in data analysis.
- $n$ data points. Each data point is $a_i \in \mathbb{R}^d$ together with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$. An interesting special case is $d = 1$.
- What model should one use to explain the data?
- Simplest model? Affine fitting: $b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j a_{i,j}$ for some real numbers $\alpha_0, \alpha_1, \ldots, \alpha_d$. We can restrict to $\alpha_0 = 0$ by lifting to $d+1$ dimensions (see the small example after this slide), and hence assume a linear model.
- But the data is noisy, so we will not be able to satisfy all data points exactly even if the true model is linear. How do we find a good linear model?
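A tiny illustration of the lifting trick (my own made-up data): appending a constant coordinate of 1 to each $a_i$ turns the affine fit $b_i = \alpha_0 + \sum_j \alpha_j a_{i,j}$ into a purely linear fit in $d+1$ dimensions.

```python
import numpy as np

# Made-up 1-dimensional data (d = 1): b_i is roughly 2 + 3 * a_i.
a = np.array([[0.0], [1.0], [2.0], [3.0]])
b = np.array([2.1, 4.9, 8.2, 10.9])

# Lift: prepend a column of ones so the intercept alpha_0 becomes an
# ordinary linear coefficient.
A_lifted = np.hstack([np.ones((a.shape[0], 1)), a])   # shape (n, d+1)

coef, *_ = np.linalg.lstsq(A_lifted, b, rcond=None)
print(coef)   # approximately [alpha_0, alpha_1] = [2, 3]
```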
Regression

- $n$ data points. Each data point is $a_i \in \mathbb{R}^d$ together with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$.
- Linear model fitting: find real numbers $\alpha_1, \ldots, \alpha_d$ such that $b_i \approx \sum_{j=1}^{d} \alpha_j a_{i,j}$ for all points.
- Let $A$ be the $n \times d$ matrix with one row per data point $a_i$. We write $x_1, x_2, \ldots, x_d$ as variables standing for $\alpha_1, \ldots, \alpha_d$.
- Ideally: find $x \in \mathbb{R}^d$ such that $Ax = b$.
- Best fit: find $x \in \mathbb{R}^d$ to minimize $Ax - b$ under some norm: $\|Ax - b\|_1$, $\|Ax - b\|_2$, or $\|Ax - b\|_\infty$.
Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- It is the optimal estimator under certain noise models.
- Interesting when $n \gg d$: the over-constrained case, where there is no solution to $Ax = b$ and we want the best fit.

(Figure from Wikipedia.)
Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- Interesting when $n \gg d$: the over-constrained case, where there is no solution to $Ax = b$ and we want the best fit.
- Geometrically, $Ax$ is a linear combination of the columns of $A$. Hence we are asking: what is the vector $z$ in the column space of $A$ that is closest to $b$ in $\ell_2$ norm?
- The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so the answer is "obvious" geometrically. How do we find it?
Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- Geometrically, $Ax$ is a linear combination of the columns of $A$, so we are asking for the vector $z$ in the column space of $A$ that is closest to $b$ in $\ell_2$ norm. The closest vector is the projection of $b$ onto the column space of $A$. How do we find it?
- Find an orthonormal basis $z_1, z_2, \ldots, z_r$ for the columns of $A$. Compute the projection $c$ of $b$ onto the column space of $A$ as $c = \sum_{j=1}^{r} \langle b, z_j \rangle\, z_j$ and output the answer $\|b - c\|_2$.
- What is $x$? We know that $Ax = c$; solve this linear system. Both steps can be combined via the SVD and other methods (see the sketch after this slide).
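A minimal sketch of this procedure in numpy (the slides only say "find an orthonormal basis"; using QR for that, and assuming $A$ has full column rank so that $r = d$, are my choices):

```python
import numpy as np

def least_squares_via_projection(A, b):
    """Solve min_x ||Ax - b||_2 by projecting b onto the column space of A."""
    Q, R = np.linalg.qr(A)             # columns of Q: orthonormal basis z_1,...,z_d
    c = Q @ (Q.T @ b)                  # projection of b onto the column space of A
    x = np.linalg.solve(R, Q.T @ b)    # solve Ax = c: QRx = QQ^Tb  =>  Rx = Q^T b
    return x, np.linalg.norm(b - c)

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 8))
b = rng.standard_normal(500)
x, err = least_squares_via_projection(A, b)
print(err, np.linalg.norm(A @ x - b))  # the two values agree
```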
Linear least squares: Optimization perspective

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- Optimization: find $x \in \mathbb{R}^d$ to minimize $\|Ax - b\|_2^2 = x^T A^T A x - 2 b^T A x + b^T b$.
- The quadratic function $f(x) = x^T A^T A x - 2 b^T A x + b^T b$ is convex since the matrix $A^T A$ is positive semidefinite.
- $\nabla f(x) = 2 A^T A x - 2 A^T b$, and hence (assuming $A^T A$ is invertible) the optimum solution $x^*$ is given by $x^* = (A^T A)^{-1} A^T b$ (see the sketch after this slide).
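The closed form as code (a sketch; in practice one would not explicitly form $A^T A$ or invert it, and I assume $A$ has full column rank so that $A^T A$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((300, 5))
b = rng.standard_normal(300)

# Normal equations: x* = (A^T A)^{-1} A^T b, computed via a linear solve.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against numpy's built-in least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_ref))   # True
```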
Computational perspective

- $n$ is large (the number of data points) and $d$ is smaller, so $A$ is tall and skinny.
- An exact solution requires the SVD or other methods. Worst-case time $O(nd^2)$.
- Can we speed up the computation by allowing some approximation?
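A back-of-the-envelope cost comparison (a sketch; the fast and sparse embedding constructions mentioned below are standard background that is not on this slide, so treat those bounds as assumptions):

```latex
\underbrace{O(nd^2)}_{\text{exact (SVD/QR)}}
\quad\text{vs.}\quad
\underbrace{T_{\Pi}}_{\text{compute }\Pi A,\ \Pi b}
\;+\;
\underbrace{O(kd^2)}_{\text{solve the }k\times d\text{ problem}},
\qquad k = O(d/\epsilon^2).
```

For a dense Gaussian $\Pi$ the sketching term $T_\Pi$ is $O(ndk) = O(nd^2/\epsilon^2)$, which by itself is no cheaper than the exact $O(nd^2)$; the payoff comes from structured or sparse embeddings (fast JL transforms, CountSketch-style maps), which reduce $T_\Pi$ to roughly $O(nd \log n)$ or $O(\mathrm{nnz}(A))$.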
Linear least squares via subspace embeddings

- Let $A^{(1)}, A^{(2)}, \ldots, A^{(d)}$ be the columns of $A$ and let $E$ be the subspace spanned by $\{A^{(1)}, A^{(2)}, \ldots, A^{(d)}, b\}$. Note that the columns are vectors in $\mathbb{R}^n$, corresponding to the $n$ data points.
- $E$ has dimension at most $d + 1$.
- Use a subspace embedding on $E$. Applying a JL matrix $\Pi$ with $k = O(d/\epsilon^2)$ rows, we reduce $\{A^{(1)}, A^{(2)}, \ldots, A^{(d)}, b\}$ to $\{A'^{(1)}, A'^{(2)}, \ldots, A'^{(d)}, b'\}$, which are vectors in $\mathbb{R}^k$.
- Solve $\min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2$, where $A'$ is the $k \times d$ matrix with columns $A'^{(j)} = \Pi A^{(j)}$ (a sketch implementation follows this slide).
Analysis

Lemma. With probability $(1 - \delta)$,
$(1 - \epsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2 \;\le\; \min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2 \;\le\; (1 + \epsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2.$

With probability $(1 - \delta)$, via the subspace embedding guarantee, for all $z \in E$:
$(1 - \epsilon) \|z\|_2 \;\le\; \|\Pi z\|_2 \;\le\; (1 + \epsilon) \|z\|_2.$
Now prove the two inequalities in the lemma separately using the above.
Analysis

- Suppose $x^*$ is an optimum solution to $\min_x \|Ax - b\|_2$. Let $z = Ax^* - b$. We have $\|\Pi z\|_2 \le (1 + \epsilon) \|z\|_2$ since $z \in E$.
- Since $x^*$ is a feasible solution to $\min_{x'} \|A' x' - b'\|_2$,
$\min_{x'} \|A' x' - b'\|_2 \;\le\; \|A' x^* - b'\|_2 \;=\; \|\Pi (Ax^* - b)\|_2 \;\le\; (1 + \epsilon) \|Ax^* - b\|_2.$
Analysis

- For any $y \in \mathbb{R}^d$, $\|\Pi A y - \Pi b\|_2 \ge (1 - \epsilon) \|Ay - b\|_2$ because $Ay - b$ is a vector in $E$ and $\Pi$ preserves the norms of all vectors in $E$.
- Let $y^*$ be an optimum solution to $\min_{x'} \|A' x' - b'\|_2$. Then
$\|\Pi (Ay^* - b)\|_2 \;\ge\; (1 - \epsilon) \|Ay^* - b\|_2 \;\ge\; (1 - \epsilon) \|Ax^* - b\|_2.$
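Putting the two directions together in one chain (this is just the argument above restated; $x^*$ minimizes the original objective and $y^*$ minimizes the sketched one):

```latex
(1-\epsilon)\,\|Ay^{*}-b\|_2
\;\le\; \|\Pi(Ay^{*}-b)\|_2
\;=\; \min_{x'\in\mathbb{R}^d}\|A'x'-b'\|_2
\;\le\; \|\Pi(Ax^{*}-b)\|_2
\;\le\; (1+\epsilon)\,\|Ax^{*}-b\|_2 .
```

In particular, the sketched optimum value is within a $(1 \pm \epsilon)$ factor of $\min_x \|Ax - b\|_2$, and returning $y^*$ as a solution to the original problem gives $\|Ay^* - b\|_2 \le \frac{1+\epsilon}{1-\epsilon} \|Ax^* - b\|_2$.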