CS 498ABD: Algorithms for Big Data
Subspace Embeddings for Regression
Lecture 12, October 1, 2020
Chandra (UIUC), Fall 2020
Subspace Embedding

Question: Suppose we have a linear subspace $E$ of $\mathbb{R}^n$ of dimension $d$. Can we find a projection $\Pi : \mathbb{R}^n \to \mathbb{R}^k$ such that for every $x \in E$, $\|\Pi x\|_2 = (1 \pm \epsilon)\|x\|_2$?
- Not possible if $k < d$.
- Possible if $k = d$: pick the rows of $\Pi$ to be an orthonormal basis for $E$ (a small illustration follows this slide). Disadvantage: this requires knowing $E$ and computing an orthonormal basis, which is slow.
- What we really want: an oblivious subspace embedding, à la JL, based on random projections.
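A minimal numerical illustration of the $k = d$ case (not from the slides; the dimensions and the use of numpy's QR factorization for the orthonormal basis are my choices): with $\Pi$ built from an orthonormal basis of $E$, norms of vectors in $E$ are preserved exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20

# A random d-dimensional subspace E of R^n, given by a basis B (its columns).
B = rng.standard_normal((n, d))

# Orthonormal basis Q for E via QR; Pi = Q^T maps R^n -> R^d.
Q, _ = np.linalg.qr(B)   # Q is n x d with orthonormal columns
Pi = Q.T                 # d x n

# Any x in E is B @ c for some coefficient vector c.
x = B @ rng.standard_normal(d)
print(np.linalg.norm(Pi @ x), np.linalg.norm(x))  # equal up to roundoff
```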
Oblivious Subspace Embedding

Theorem. Suppose $E$ is a linear subspace of $\mathbb{R}^n$ of dimension $d$. Let $\Pi \in \mathbb{R}^{k \times n}$ be a DJL matrix with $k = O(\frac{d}{\epsilon^2}\log(1/\delta))$ rows. Then with probability $(1 - \delta)$, for every $x \in E$, $\|\frac{1}{\sqrt{k}}\Pi x\|_2 = (1 \pm \epsilon)\|x\|_2$.

In other words, the JL Lemma extends from one dimension to an arbitrary number of dimensions in a graceful way.
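A small empirical check of the theorem's flavor (a sketch, not the lecture's construction: I assume a dense Gaussian matrix as the DJL distribution and pick arbitrary concrete values of $n$, $d$, $k$; it only tests random vectors in $E$ rather than all of them).

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 2000, 10, 0.25
k = 4 * int(d / eps**2)           # k = O(d / eps^2); constant chosen ad hoc

B = rng.standard_normal((n, d))   # basis of a d-dimensional subspace E
Pi = rng.standard_normal((k, n))  # oblivious: Pi is drawn independently of E

# Distortion of (1/sqrt(k)) * Pi on random vectors x in E.
distortions = []
for _ in range(200):
    x = B @ rng.standard_normal(d)
    distortions.append(np.linalg.norm(Pi @ x) / (np.sqrt(k) * np.linalg.norm(x)))
print(min(distortions), max(distortions))  # should lie roughly within 1 +/- eps
```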
Part I
Faster algorithms via subspace embeddings
Linear model fitting

An important problem in data analysis.
- $n$ data points. Each data point is $a_i \in \mathbb{R}^d$ together with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$. An interesting special case is $d = 1$.
- What model should one use to explain the data?
- Simplest model? Affine fitting: $b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j a_{i,j}$ for some real numbers $\alpha_0, \alpha_1, \ldots, \alpha_d$. We can restrict to $\alpha_0 = 0$ by lifting to $d+1$ dimensions (see the small example after this slide), and hence assume a linear model.
- But the data is noisy, so we will not be able to satisfy all data points exactly even if the true model is linear. How do we find a good linear model?
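A tiny illustration of the lifting trick (my own made-up data): appending a constant coordinate of 1 to each $a_i$ turns the affine fit $b_i = \alpha_0 + \sum_j \alpha_j a_{i,j}$ into a purely linear fit in $d+1$ dimensions.

```python
import numpy as np

# Made-up 1-dimensional data (d = 1): b_i is roughly 2 + 3 * a_i.
a = np.array([[0.0], [1.0], [2.0], [3.0]])
b = np.array([2.1, 4.9, 8.2, 10.9])

# Lift: prepend a column of ones so the intercept alpha_0 becomes an
# ordinary linear coefficient.
A_lifted = np.hstack([np.ones((a.shape[0], 1)), a])   # shape (n, d+1)

coef, *_ = np.linalg.lstsq(A_lifted, b, rcond=None)
print(coef)   # approximately [alpha_0, alpha_1] = [2, 3]
```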
Regression

- $n$ data points. Each data point is $a_i \in \mathbb{R}^d$ together with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$.
- Linear model fitting: find real numbers $\alpha_1, \ldots, \alpha_d$ such that $b_i \approx \sum_{j=1}^{d} \alpha_j a_{i,j}$ for all points.
- Let $A$ be the $n \times d$ matrix with one row per data point $a_i$. We write $x_1, x_2, \ldots, x_d$ as variables standing for $\alpha_1, \ldots, \alpha_d$.
- Ideally: find $x \in \mathbb{R}^d$ such that $Ax = b$.
- Best fit: find $x \in \mathbb{R}^d$ to minimize $Ax - b$ under some norm: $\|Ax - b\|_1$, $\|Ax - b\|_2$, or $\|Ax - b\|_\infty$.
Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- It is the optimal estimator under certain noise models.
- Interesting when $n \gg d$: the over-constrained case, where there is no solution to $Ax = b$ and we want the best fit.

(Figure from Wikipedia.)
Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- Interesting when $n \gg d$: the over-constrained case, where there is no solution to $Ax = b$ and we want the best fit.
- Geometrically, $Ax$ is a linear combination of the columns of $A$. Hence we are asking: what is the vector $z$ in the column space of $A$ that is closest to $b$ in $\ell_2$ norm?
- The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so the answer is "obvious" geometrically. How do we find it?
Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- Geometrically, $Ax$ is a linear combination of the columns of $A$, so we are asking for the vector $z$ in the column space of $A$ that is closest to $b$ in $\ell_2$ norm. The closest vector is the projection of $b$ onto the column space of $A$. How do we find it?
- Find an orthonormal basis $z_1, z_2, \ldots, z_r$ for the columns of $A$. Compute the projection $c$ of $b$ onto the column space of $A$ as $c = \sum_{j=1}^{r} \langle b, z_j \rangle\, z_j$ and output the answer $\|b - c\|_2$.
- What is $x$? We know that $Ax = c$; solve this linear system. Both steps can be combined via the SVD and other methods (see the sketch after this slide).
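A minimal sketch of this procedure in numpy (the slides only say "find an orthonormal basis"; using QR for that, and assuming $A$ has full column rank so that $r = d$, are my choices):

```python
import numpy as np

def least_squares_via_projection(A, b):
    """Solve min_x ||Ax - b||_2 by projecting b onto the column space of A."""
    Q, R = np.linalg.qr(A)             # columns of Q: orthonormal basis z_1,...,z_d
    c = Q @ (Q.T @ b)                  # projection of b onto the column space of A
    x = np.linalg.solve(R, Q.T @ b)    # solve Ax = c: QRx = QQ^Tb  =>  Rx = Q^T b
    return x, np.linalg.norm(b - c)

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 8))
b = rng.standard_normal(500)
x, err = least_squares_via_projection(A, b)
print(err, np.linalg.norm(A @ x - b))  # the two values agree
```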
Linear least squares: Optimization perspective

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.
- Optimization: find $x \in \mathbb{R}^d$ to minimize $\|Ax - b\|_2^2 = x^T A^T A x - 2 b^T A x + b^T b$.
- The quadratic function $f(x) = x^T A^T A x - 2 b^T A x + b^T b$ is convex since the matrix $A^T A$ is positive semidefinite.
- $\nabla f(x) = 2 A^T A x - 2 A^T b$, and hence (assuming $A^T A$ is invertible) the optimum solution $x^*$ is given by $x^* = (A^T A)^{-1} A^T b$ (see the sketch after this slide).
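The closed form as code (a sketch; in practice one would not explicitly form $A^T A$ or invert it, and I assume $A$ has full column rank so that $A^T A$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((300, 5))
b = rng.standard_normal(300)

# Normal equations: x* = (A^T A)^{-1} A^T b, computed via a linear solve.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against numpy's built-in least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_ref))   # True
```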
Computational perspective

- $n$ is large (the number of data points) and $d$ is smaller, so $A$ is tall and skinny.
- An exact solution requires the SVD or other methods. Worst-case time $O(nd^2)$.
- Can we speed up the computation by allowing some approximation?
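A back-of-the-envelope cost comparison (a sketch; the fast and sparse embedding constructions mentioned below are standard background that is not on this slide, so treat those bounds as assumptions):

```latex
\underbrace{O(nd^2)}_{\text{exact (SVD/QR)}}
\quad\text{vs.}\quad
\underbrace{T_{\Pi}}_{\text{compute }\Pi A,\ \Pi b}
\;+\;
\underbrace{O(kd^2)}_{\text{solve the }k\times d\text{ problem}},
\qquad k = O(d/\epsilon^2).
```

For a dense Gaussian $\Pi$ the sketching term $T_\Pi$ is $O(ndk) = O(nd^2/\epsilon^2)$, which by itself is no cheaper than the exact $O(nd^2)$; the payoff comes from structured or sparse embeddings (fast JL transforms, CountSketch-style maps), which reduce $T_\Pi$ to roughly $O(nd \log n)$ or $O(\mathrm{nnz}(A))$.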
Linear least squares via subspace embeddings

- Let $A^{(1)}, A^{(2)}, \ldots, A^{(d)}$ be the columns of $A$ and let $E$ be the subspace spanned by $\{A^{(1)}, A^{(2)}, \ldots, A^{(d)}, b\}$. Note that the columns are vectors in $\mathbb{R}^n$, corresponding to the $n$ data points.
- $E$ has dimension at most $d + 1$.
- Use a subspace embedding on $E$. Applying a JL matrix $\Pi$ with $k = O(d/\epsilon^2)$ rows, we reduce $\{A^{(1)}, A^{(2)}, \ldots, A^{(d)}, b\}$ to $\{A'^{(1)}, A'^{(2)}, \ldots, A'^{(d)}, b'\}$, which are vectors in $\mathbb{R}^k$.
- Solve $\min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2$, where $A'$ is the $k \times d$ matrix with columns $A'^{(j)} = \Pi A^{(j)}$ (a sketch implementation follows this slide).
Analysis

Lemma. With probability $(1 - \delta)$,
$(1 - \epsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2 \;\le\; \min_{x' \in \mathbb{R}^d} \|A' x' - b'\|_2 \;\le\; (1 + \epsilon) \min_{x \in \mathbb{R}^d} \|Ax - b\|_2.$

With probability $(1 - \delta)$, via the subspace embedding guarantee, for all $z \in E$:
$(1 - \epsilon) \|z\|_2 \;\le\; \|\Pi z\|_2 \;\le\; (1 + \epsilon) \|z\|_2.$
Now prove the two inequalities in the lemma separately using the above.
Analysis

- Suppose $x^*$ is an optimum solution to $\min_x \|Ax - b\|_2$. Let $z = Ax^* - b$. We have $\|\Pi z\|_2 \le (1 + \epsilon) \|z\|_2$ since $z \in E$.
- Since $x^*$ is a feasible solution to $\min_{x'} \|A' x' - b'\|_2$,
$\min_{x'} \|A' x' - b'\|_2 \;\le\; \|A' x^* - b'\|_2 \;=\; \|\Pi (Ax^* - b)\|_2 \;\le\; (1 + \epsilon) \|Ax^* - b\|_2.$
Analysis

- For any $y \in \mathbb{R}^d$, $\|\Pi A y - \Pi b\|_2 \ge (1 - \epsilon) \|Ay - b\|_2$ because $Ay - b$ is a vector in $E$ and $\Pi$ preserves the norms of all vectors in $E$.
- Let $y^*$ be an optimum solution to $\min_{x'} \|A' x' - b'\|_2$. Then
$\|\Pi (Ay^* - b)\|_2 \;\ge\; (1 - \epsilon) \|Ay^* - b\|_2 \;\ge\; (1 - \epsilon) \|Ax^* - b\|_2.$
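Putting the two directions together in one chain (this is just the argument above restated; $x^*$ minimizes the original objective and $y^*$ minimizes the sketched one):

```latex
(1-\epsilon)\,\|Ay^{*}-b\|_2
\;\le\; \|\Pi(Ay^{*}-b)\|_2
\;=\; \min_{x'\in\mathbb{R}^d}\|A'x'-b'\|_2
\;\le\; \|\Pi(Ax^{*}-b)\|_2
\;\le\; (1+\epsilon)\,\|Ax^{*}-b\|_2 .
```

In particular, the sketched optimum value is within a $(1 \pm \epsilon)$ factor of $\min_x \|Ax - b\|_2$, and returning $y^*$ as a solution to the original problem gives $\|Ay^* - b\|_2 \le \frac{1+\epsilon}{1-\epsilon} \|Ax^* - b\|_2$.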