JL Lemma, Dimensionality Reduction, and Subspace Embeddings

  1. CS 498ABD: Algorithms for Big Data. JL Lemma, Dimensionality Reduction, and Subspace Embeddings. Lecture 11, September 29, 2020. Chandra (UIUC).

  2. $F_2$ estimation in the turnstile setting.
     AMS-$\ell_2$-Estimate: let $Y_1, Y_2, \ldots, Y_n$ be $\{-1,+1\}$ random variables that are 4-wise independent.
       $z \leftarrow 0$
       While (stream is not empty) do
         $a_j = (i_j, \Delta_j)$ is the current update
         $z \leftarrow z + \Delta_j Y_{i_j}$
       endWhile
       Output $z^2$
     Claim: the output estimates $\|x\|_2^2$, where $x$ is the frequency vector at the end of the stream of updates.
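
A minimal Python sketch of this estimator (not from the lecture; class and parameter names are illustrative). It assumes items are nonnegative integer indices and draws the 4-wise independent signs from the standard construction: a random degree-3 polynomial over a prime field, whose low bit gives a nearly unbiased $\pm 1$ sign.

```python
import random

P = 2**61 - 1  # a Mersenne prime for the polynomial hash

class AMSEstimator:
    def __init__(self, seed=None):
        rng = random.Random(seed)
        # degree-3 polynomial => 4-wise independent hash values
        self.coeffs = [rng.randrange(P) for _ in range(4)]
        self.z = 0

    def _sign(self, i):
        h = 0
        for c in self.coeffs:  # Horner evaluation mod P
            h = (h * i + c) % P
        return 1 if h & 1 else -1  # low bit as a (nearly unbiased) sign

    def update(self, i, delta):
        # turnstile update a_j = (i_j, Delta_j)
        self.z += delta * self._sign(i)

    def estimate(self):
        # a single unbiased (but high-variance) estimate of ||x||_2^2
        return self.z ** 2

est = AMSEstimator(seed=42)
for i, delta in [(3, 5), (7, -2), (3, -1)]:
    est.update(i, delta)
print(est.estimate())  # estimates ||x||_2^2 = 4^2 + (-2)^2 = 20
```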

  3. Analysis.
     $Z = \sum_{i=1}^{n} x_i Y_i$ and the output is $Z^2$. Expanding,
     $Z^2 = \sum_i x_i^2 Y_i^2 + 2 \sum_{i \neq j} x_i x_j Y_i Y_j$,
     and hence $\mathbf{E}[Z^2] = \sum_i x_i^2 = \|x\|_2^2$. One can show that $\mathrm{Var}(Z^2) \le 2\,(\mathbf{E}[Z^2])^2$.
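
For completeness, a one-line sketch of the expectation step (it uses only that $Y_i^2 = 1$ always, and pairwise independence for the cross terms):

$$\mathbf{E}[Z^2] = \sum_i x_i^2\,\mathbf{E}[Y_i^2] + 2 \sum_{i \neq j} x_i x_j\,\mathbf{E}[Y_i]\,\mathbf{E}[Y_j] = \sum_i x_i^2 = \|x\|_2^2.$$

The variance bound is where 4-wise independence enters: expanding $Z^4$ yields monomials involving at most four distinct $Y_i$'s, and every monomial containing an odd power of some $Y_i$ has expectation zero.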

  4. Linear Sketching View.
     Recall that we take averages of independent estimators and then take a median to reduce the error. Can we view all this as a sketch?
     AMS-$\ell_2$-Sketch: $k = c \log(1/\delta)/\epsilon^2$. Let $M$ be a $k \times n$ matrix with entries in $\{-1, 1\}$ such that (i) the rows are independent and (ii) within each row the entries are 4-wise independent.
       $z$ is a $k \times 1$ vector initialized to $0$
       While (stream is not empty) do
         $a_j = (i_j, \Delta_j)$ is the current update
         $z \leftarrow z + \Delta_j M e_{i_j}$
       endWhile
       Output vector $z$ as the sketch.
     $M$ is compactly represented via $k$ hash functions, one per row, chosen independently from a 4-wise independent hash family.
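
A hedged sketch of this data structure in Python (illustrative names; it reuses the degree-3 polynomial hash from the estimator above, one independently seeded polynomial per row, so the $k \times n$ matrix $M$ is never stored explicitly):

```python
import random

P = 2**61 - 1  # prime field for the per-row polynomial hashes

class AMSSketch:
    def __init__(self, k, seed=None):
        rng = random.Random(seed)
        # one degree-3 polynomial (4 coefficients) per row of M
        self.rows = [[rng.randrange(P) for _ in range(4)] for _ in range(k)]
        self.z = [0] * k  # the sketch vector

    @staticmethod
    def _sign(coeffs, i):
        h = 0
        for c in coeffs:  # Horner evaluation mod P
            h = (h * i + c) % P
        return 1 if h & 1 else -1

    def update(self, i, delta):
        # z <- z + Delta_j * M e_{i_j}, one coordinate at a time
        for r, coeffs in enumerate(self.rows):
            self.z[r] += delta * self._sign(coeffs, i)
```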

  5. Geometric Interpretation.
     Given a vector $x \in \mathbb{R}^n$, the random map $z = Mx$ has the following features:
     $\mathbf{E}[z_i] = 0$ and $\mathbf{E}[z_i^2] = \|x\|_2^2$ for each $1 \le i \le k$, where $k$ is the number of rows of $M$.
     Thus each $z_i^2$ is an estimate of the squared Euclidean length of $x$.
     When $k = \Theta(\frac{1}{\epsilon^2}\log(1/\delta))$ one can obtain a $(1 \pm \epsilon)$ estimate of $\|x\|_2$ by the averaging and median ideas.
     Thus we are able to compress $x$ into a $k$-dimensional vector $z$ such that $z$ contains enough information to estimate $\|x\|_2$ accurately.
     Question: Do we need the median trick? Will averaging do? (A code sketch of the median-of-means recovery follows.)
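
A sketch of the recovery step in Python (a hypothetical helper, not from the lecture): given the sketch vector $z$ (for example the `z` field of the `AMSSketch` above), average the $z_i^2$ within groups to reduce variance, then take the median across groups to boost the success probability.

```python
import statistics

def estimate_norm_sq(z, num_groups):
    """Median-of-means estimate of ||x||_2^2 from a sketch vector z."""
    g = len(z) // num_groups  # rows per group
    # each z_i^2 is an unbiased estimate; group averages have low variance
    means = [sum(v * v for v in z[r * g:(r + 1) * g]) / g
             for r in range(num_groups)]
    # the median of the group averages is correct with high probability
    return statistics.median(means)
```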

  6. Distributional JL Lemma.
     Lemma (Distributional JL Lemma): Fix a vector $x \in \mathbb{R}^d$ and let $\Pi \in \mathbb{R}^{k \times d}$ be a matrix where each entry $\Pi_{ij}$ is chosen independently according to the standard normal distribution $N(0,1)$. If $k = \Omega(\frac{1}{\epsilon^2}\log(1/\delta))$, then with probability $(1-\delta)$,
     $\|\tfrac{1}{\sqrt{k}}\, \Pi x\|_2 = (1 \pm \epsilon)\, \|x\|_2$.
     One can choose the entries from $\{-1, 1\}$ as well. Note: unlike $\ell_2$ estimation, the entries of $\Pi$ are fully independent.
     Letting $z = \tfrac{1}{\sqrt{k}} \Pi x$, we have projected $x$ from $d$ dimensions down to $k = O(\frac{1}{\epsilon^2}\log(1/\delta))$ dimensions while preserving the length to within a $(1 \pm \epsilon)$ factor.
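
An empirical check of the lemma in Python (a sketch; the dimensions and accuracy parameters below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, delta = 10_000, 0.1, 0.01
k = int(np.ceil(np.log(1 / delta) / eps**2))  # k = O(eps^-2 log(1/delta))

x = rng.standard_normal(d)                 # an arbitrary fixed vector
Pi = rng.standard_normal((k, d))           # i.i.d. N(0,1) entries
z = Pi @ x / np.sqrt(k)                    # z = (1/sqrt(k)) Pi x

print(np.linalg.norm(z) / np.linalg.norm(x))  # typically within 1 +- eps
```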

  7. Dimensionality reduction.
     Theorem (Metric JL Lemma): Let $v_1, v_2, \ldots, v_n$ be any $n$ points/vectors in $\mathbb{R}^d$. For any $\epsilon \in (0, 1/2)$, there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ where $k \le 8 \ln n/\epsilon^2$ such that for all $1 \le i < j \le n$,
     $(1-\epsilon)\,\|v_i - v_j\|_2 \le \|f(v_i) - f(v_j)\|_2 \le (1+\epsilon)\,\|v_i - v_j\|_2$.
     Moreover, $f$ can be obtained in randomized polynomial time. The linear map $f$ is simply given by a random matrix $\Pi$: $f(v) = \Pi v$.
     Proof: Apply DJL with $\delta = 1/n^2$ and apply a union bound to the $\binom{n}{2}$ vectors $(v_i - v_j)$, $i \neq j$.
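
The theorem is easy to test empirically; a sketch in Python with illustrative parameters (note that $k$ here lands well below $d$, and all $\binom{n}{2}$ pairwise distances come out nearly preserved):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, d, eps = 50, 1_000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))

V = rng.standard_normal((n, d))                 # n arbitrary points
Pi = rng.standard_normal((k, d)) / np.sqrt(k)   # f(v) = Pi v
W = V @ Pi.T                                    # projected points

ratios = [np.linalg.norm(W[i] - W[j]) / np.linalg.norm(V[i] - V[j])
          for i, j in combinations(range(n), 2)]
print(f"k = {k}, distortion range: [{min(ratios):.3f}, {max(ratios):.3f}]")
```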

  8. DJL and Metric JL.
     Key advantage: the mapping is oblivious to the data!

  9. Normal Distribution.
     Density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
     Standard normal: $N(0,1)$ is the case $\mu = 0$, $\sigma = 1$.

  10. Normal Distribution.
     Cumulative distribution function for the standard normal (no closed form): $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\, dx$.

  11. Sum of independent normally distributed variables.
     Lemma: Let $X$ and $Y$ be independent random variables with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, and let $Z = X + Y$. Then $Z \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
     Corollary: Let $X$ and $Y$ be independent random variables with $X \sim N(0,1)$ and $Y \sim N(0,1)$, and let $Z = aX + bY$. Then $Z \sim N(0, a^2 + b^2)$.
     The normal distribution is a stable distribution: adding two independent random variables within the class gives a distribution inside the class. Other stable distributions exist and are useful in $F_p$ estimation for $p \in (0, 2)$.
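
A quick empirical check of the corollary (2-stability) in Python, with illustrative constants:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, trials = 3.0, 4.0, 1_000_000
# Z = aX + bY for X, Y ~ N(0,1) i.i.d. should match N(0, a^2 + b^2)
Z = a * rng.standard_normal(trials) + b * rng.standard_normal(trials)
print(Z.mean(), Z.std(), np.sqrt(a**2 + b**2))  # ~0.0, ~5.0, 5.0
```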

  12. Concentration of the sum of squares of normally distributed variables.
     $\chi^2(k)$ distribution: the distribution of the sum of squares of $k$ independent standard normally distributed variables, $Y = \sum_{i=1}^{k} Z_i^2$ where each $Z_i \sim N(0,1)$.
     $\mathbf{E}[Z_i^2] = 1$, hence $\mathbf{E}[Y] = k$.
     Lemma: Let $Z_1, Z_2, \ldots, Z_k$ be independent $N(0,1)$ random variables and let $Y = \sum_i Z_i^2$. Then, for $\epsilon \in (0, 1/2)$, there is a constant $c$ such that
     $\Pr[(1-\epsilon)^2 k \le Y \le (1+\epsilon)^2 k] \ge 1 - 2e^{-c\epsilon^2 k}$.
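
A small simulation of the lemma in Python (a sketch with illustrative parameters; with $k = 200$ and $\epsilon = 0.2$ the empirical failure probability is already essentially zero):

```python
import numpy as np

rng = np.random.default_rng(3)
k, eps, trials = 200, 0.2, 20_000
# each row gives one chi^2(k) sample: the sum of k squared N(0,1) draws
Y = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)
fail = np.mean((Y < (1 - eps) ** 2 * k) | (Y > (1 + eps) ** 2 * k))
print(f"empirical failure probability: {fail:.5f}")
```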

  13. $\chi^2$ distribution: density function (figure).

  14. $\chi^2$ distribution: cumulative distribution function (figure).

  15. Concentration of the sum of squares of normally distributed variables (continued).
     Lemma (restated): Let $Z_1, Z_2, \ldots, Z_k$ be independent $N(0,1)$ random variables and let $Y = \sum_i Z_i^2$. Then, for $\epsilon \in (0, 1/2)$, there is a constant $c$ such that $\Pr[(1-\epsilon)^2 k \le Y \le (1+\epsilon)^2 k] \ge 1 - 2e^{-c\epsilon^2 k}$.
     Recall the Chernoff-Hoeffding bound for bounded independent non-negative random variables. $Z_i^2$ is not bounded; however, Chernoff-Hoeffding bounds extend to sums of random variables with exponentially decaying tails.

  16. Proof of DJL Lemma.
     Without loss of generality assume $\|x\|_2 = 1$ (a unit vector). For each row $i$ of $\Pi$, let $Z_i = \sum_{j=1}^{d} \Pi_{ij} x_j$. Since the $\Pi_{ij}$ are independent $N(0,1)$ variables, the corollary on sums of normals gives $Z_i \sim N(0, \sum_j x_j^2) = N(0,1)$.
     Hence $\|\Pi x\|_2^2 = \sum_{i=1}^{k} Z_i^2$ is $\chi^2(k)$-distributed, and the concentration lemma gives $\|\tfrac{1}{\sqrt{k}} \Pi x\|_2 = (1 \pm \epsilon)\,\|x\|_2$ with probability at least $1 - 2e^{-c\epsilon^2 k}$, which is at least $1 - \delta$ once $k = \Omega(\frac{1}{\epsilon^2}\log(1/\delta))$.
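
A quick sanity check of the first step in Python (illustrative parameters): for a fixed unit vector $x$ and fresh Gaussian rows, the coordinate $\sum_j \Pi_{1j} x_j$ is indeed standard normal.

```python
import numpy as np

rng = np.random.default_rng(4)
d, trials = 100, 50_000
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                      # fix a unit vector
Z1 = rng.standard_normal((trials, d)) @ x   # 'trials' draws of Z_1
print(Z1.mean(), Z1.var())                  # ~0.0 and ~1.0
```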
