Dimensionality Reduction and JL Lemma
Lecture 12, February 21, 2019


  1. CS 498ABD: Algorithms for Big Data, Spring 2019. Dimensionality Reduction and JL Lemma. Lecture 12, February 21, 2019. Chandra (UIUC).

  2. F_2 estimation in the turnstile setting.
  AMS-ℓ2-Estimate: Let $Y_1, Y_2, \ldots, Y_n$ be $\{-1, +1\}$ random variables that are 4-wise independent.
      z ← 0
      While (stream is not empty) do
          a_j = (i_j, ∆_j) is the current update
          z ← z + ∆_j · Y_{i_j}
      endWhile
      Output z²
  Claim: The output estimates $\|x\|_2^2$ where x is the vector at the end of the stream of updates.
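
  A minimal Python sketch of this estimator (an illustration, not code from the lecture): the stream is assumed to be a list of (index, delta) pairs, and the 4-wise independent signs come from a random degree-3 polynomial over a prime field, one standard construction (the sign extraction here has only negligible bias).

```python
import random

PRIME = (1 << 61) - 1  # a large prime; the polynomial hash is evaluated modulo this

class FourWiseSign:
    """4-wise independent {-1, +1} variables via a random degree-3 polynomial mod PRIME."""
    def __init__(self, rng):
        self.coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def __call__(self, i):
        h = 0
        for c in self.coeffs:          # Horner evaluation of the polynomial at i
            h = (h * i + c) % PRIME
        return 1 if h & 1 else -1      # fold the hash value down to a +/-1 sign

def ams_f2_estimate(stream, rng=random):
    """One AMS estimator: returns z^2 where z = sum over updates of delta_j * Y_{i_j}."""
    sign = FourWiseSign(rng)
    z = 0
    for i_j, delta_j in stream:        # turnstile update a_j = (i_j, delta_j)
        z += delta_j * sign(i_j)
    return z * z

# Example: the updates build x = (3, -1, 2, 0, ...), so E[estimate] = ||x||_2^2 = 14.
stream = [(0, 3), (1, 2), (2, 2), (1, -3)]
print(ams_f2_estimate(stream, random.Random(0)))
```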

  3. Analysis. $Z = \sum_{i=1}^{n} x_i Y_i$ and the output is $Z^2$.
  $Z^2 = \sum_i x_i^2 Y_i^2 + 2\sum_{i \ne j} x_i x_j Y_i Y_j$. Since $Y_i^2 = 1$ and $\mathbf{E}[Y_i Y_j] = 0$ for $i \ne j$ (4-wise independence implies pairwise independence), we get $\mathbf{E}[Z^2] = \sum_i x_i^2 = \|x\|_2^2$.
  One can show that $\mathrm{Var}(Z^2) \le 2\,(\mathbf{E}[Z^2])^2$.
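
  A sketch of the calculation behind the stated variance bound, filling in steps the slide omits (it uses only the 4-wise independence of the Y_i):

```latex
% Expand Z^4; by 4-wise independence, expectations of products of up to four Y's factor,
% and any term containing some Y_i with odd multiplicity vanishes because E[Y_i] = 0.
\begin{align*}
\mathbf{E}[Z^4] &= \sum_i x_i^4\,\mathbf{E}[Y_i^4]
   + 3\sum_{i \ne j} x_i^2 x_j^2\,\mathbf{E}[Y_i^2]\,\mathbf{E}[Y_j^2]
   \;=\; \sum_i x_i^4 + 3\sum_{i \ne j} x_i^2 x_j^2,\\
\big(\mathbf{E}[Z^2]\big)^2 &= \Big(\sum_i x_i^2\Big)^2
   \;=\; \sum_i x_i^4 + \sum_{i \ne j} x_i^2 x_j^2,\\
\mathrm{Var}(Z^2) &= \mathbf{E}[Z^4] - \big(\mathbf{E}[Z^2]\big)^2
   \;=\; 2\sum_{i \ne j} x_i^2 x_j^2 \;\le\; 2\,\big(\mathbf{E}[Z^2]\big)^2.
\end{align*}
```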

  4. Linear Sketching View. Recall that we average independent estimators and then take a median to reduce error. Can we view all of this as a sketch?
  AMS-ℓ2-Sketch: Let $k = c \log(1/\delta)/\epsilon^2$. Let M be a $k \times n$ matrix with entries in $\{-1, +1\}$ such that (i) the rows are independent and (ii) within each row the entries are 4-wise independent.
      z is a k × 1 vector initialized to 0
      While (stream is not empty) do
          a_j = (i_j, ∆_j) is the current update
          z ← z + ∆_j · M e_{i_j}
      endWhile
      Output the vector z as the sketch.
  M is compactly represented via k hash functions, one per row, chosen independently from a 4-wise independent hash family.
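
  A minimal Python sketch of maintaining z = Mx under turnstile updates, reusing the same polynomial-hash idea for each row; the averaging in estimate_f2 is one simple way to combine the rows (the class name, parameters, and averaging choice are illustrative, not from the course):

```python
import random
import statistics

P = (1 << 61) - 1  # large prime for the per-row polynomial hash

def four_wise_sign(coeffs, i):
    """Degree-3 polynomial hash mod P, mapped to a +/-1 sign (4-wise independent per row)."""
    h = 0
    for c in coeffs:
        h = (h * i + c) % P
    return 1 if h & 1 else -1

class AMSSketch:
    """Maintains z = Mx, where row r of M is given implicitly by its own sign hash."""
    def __init__(self, k, rng=random):
        self.rows = [[rng.randrange(P) for _ in range(4)] for _ in range(k)]
        self.z = [0.0] * k

    def update(self, i, delta):                  # stream update a_j = (i_j, delta_j)
        for r, coeffs in enumerate(self.rows):
            self.z[r] += delta * four_wise_sign(coeffs, i)

    def estimate_f2(self):
        # each z_r^2 is an unbiased estimate of ||x||_2^2; averaging reduces the variance
        return statistics.mean(zr * zr for zr in self.z)

# Usage: same stream as the earlier example, with 200 rows.
sk = AMSSketch(200, random.Random(1))
for i, d in [(0, 3), (1, 2), (2, 2), (1, -3)]:
    sk.update(i, d)
print(sk.estimate_f2())   # should be close to ||x||_2^2 = 14
```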

  5-6. Geometric Interpretation. Given a vector $x \in \mathbb{R}^n$ and the random sketch matrix M, the map $z = Mx$ has the following features:
  $\mathbf{E}[z_i] = 0$ and $\mathbf{E}[z_i^2] = \|x\|_2^2$ for each $1 \le i \le k$, where k is the number of rows of M.
  Thus each $z_i^2$ is an estimate of $\|x\|_2^2$, the squared Euclidean length of x.
  When $k = \Theta(\frac{1}{\epsilon^2}\log(1/\delta))$ one can obtain a $(1 \pm \epsilon)$ estimate of $\|x\|_2$ by the averaging and median ideas.
  Thus we are able to compress x into a k-dimensional vector z such that z contains enough information to estimate $\|x\|_2$ accurately.
  Question: Do we need the median trick? Will averaging do?

  7. Distributional JL Lemma.
  Lemma (Distributional JL Lemma). Fix a vector $x \in \mathbb{R}^d$ and let $\Pi \in \mathbb{R}^{k \times d}$ be a matrix where each entry $\Pi_{ij}$ is chosen independently according to the standard normal distribution N(0, 1). If $k = \Omega(\frac{1}{\epsilon^2}\log(1/\delta))$, then with probability $(1 - \delta)$,
      $\big\|\tfrac{1}{\sqrt{k}} \Pi x\big\|_2 = (1 \pm \epsilon)\,\|x\|_2$.
  One can choose the entries from $\{-1, +1\}$ as well. Note: unlike in ℓ2 estimation, the entries of Π are fully independent.
  Letting $z = \tfrac{1}{\sqrt{k}} \Pi x$, we have projected x from d dimensions down to $k = O(\frac{1}{\epsilon^2}\log(1/\delta))$ dimensions while preserving its length to within a $(1 \pm \epsilon)$ factor.
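
  A quick numerical illustration of the lemma, assuming NumPy; the dimensions and seed below are arbitrary demo choices:

```python
import numpy as np

def djl_project(x, k, rng):
    """Project x from R^d to R^k with a random Gaussian matrix, scaled by 1/sqrt(k)."""
    d = x.shape[0]
    Pi = rng.standard_normal((k, d))      # entries i.i.d. N(0, 1)
    return Pi @ x / np.sqrt(k)

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
z = djl_project(x, k=2_000, rng=rng)      # k of order (1/eps^2) log(1/delta)
print(np.linalg.norm(z) / np.linalg.norm(x))   # should be close to 1
```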

  8. Dimensionality reduction.
  Theorem (Metric JL Lemma). Let $v_1, v_2, \ldots, v_n$ be any n points/vectors in $\mathbb{R}^d$. For any $\epsilon \in (0, 1/2)$, there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ where $k \le 8 \ln n / \epsilon^2$ such that for all $1 \le i < j \le n$,
      $(1 - \epsilon)\,\|v_i - v_j\|_2 \le \|f(v_i) - f(v_j)\|_2 \le \|v_i - v_j\|_2$.
  Moreover, f can be obtained in randomized polynomial time.
  The linear map f is simply given by a random matrix Π: $f(v) = \Pi v$. It follows from the Distributional JL Lemma applied to the at most $\binom{n}{2}$ difference vectors $v_i - v_j$ with failure probability of order $1/n^2$, together with a union bound.
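
  A small NumPy check that pairwise distances are roughly preserved for a random point set (the parameters are arbitrary demo choices, not the k from the theorem):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, k = 50, 2_000, 800                   # generous k for a clean-looking demo
V = rng.standard_normal((n, d))            # n points in R^d
Pi = rng.standard_normal((k, d)) / np.sqrt(k)
W = V @ Pi.T                               # projected points in R^k

# ratio of projected distance to original distance, over all pairs
ratios = [np.linalg.norm(W[i] - W[j]) / np.linalg.norm(V[i] - V[j])
          for i, j in combinations(range(n), 2)]
print(min(ratios), max(ratios))            # both should be reasonably close to 1
```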

  9. Normal Distribution.
  Density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
  Standard normal: N(0, 1) is the case µ = 0, σ = 1.

  10. Normal Distribution.
  Cumulative distribution function for the standard normal (no closed form):
      $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$.
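
  Although Φ has no closed form, it is easy to evaluate numerically; a small example using the error function from Python's standard library:

```python
import math

def std_normal_cdf(x):
    """Phi(x) for the standard normal: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(std_normal_cdf(0.0))    # 0.5
print(std_normal_cdf(1.96))   # about 0.975
```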

  11-12. Sum of independent normally distributed variables.
  Lemma. Let X and Y be independent random variables with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Let Z = X + Y. Then $Z \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$.
  Corollary. Let X and Y be independent random variables with $X \sim N(0, 1)$ and $Y \sim N(0, 1)$. Let Z = aX + bY. Then $Z \sim N(0,\ a^2 + b^2)$.
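
  A quick simulation sanity-check of the corollary (NumPy assumed; the constants a and b are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, 4.0
X = rng.standard_normal(1_000_000)
Y = rng.standard_normal(1_000_000)
Z = a * X + b * Y
print(Z.mean(), Z.std())   # mean ~ 0, standard deviation ~ sqrt(a^2 + b^2) = 5
```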

  13. Concentration of the sum of squares of normally distributed variables.
  Lemma. Let $Z_1, Z_2, \ldots, Z_k$ be independent N(0, 1) random variables and let $Y = \sum_i Z_i^2$. Then, for $\epsilon \in (0, 1/2)$, there is a constant c such that
      $\Pr[(1-\epsilon)^2 k \le Y \le (1+\epsilon)^2 k] \ge 1 - 2 e^{-c\epsilon^2 k}$.
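
  An empirical look at this concentration bound (NumPy assumed; k, ε, and the number of trials are arbitrary demo values):

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps, trials = 400, 0.1, 10_000
Y = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)   # chi-square with k degrees of freedom
inside = np.mean(((1 - eps) ** 2 * k <= Y) & (Y <= (1 + eps) ** 2 * k))
print(inside)   # empirical probability of landing in the window; very close to 1 here
```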

  14. χ² distribution: density function (figure).

  15. χ² distribution: cumulative distribution function (figure).
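
  The plots themselves are not reproduced here. For reference, the standard density of the χ² distribution with k degrees of freedom (a well-known formula, stated for completeness rather than taken from the slides) is

```latex
f_k(y) \;=\; \frac{1}{2^{k/2}\,\Gamma(k/2)}\; y^{k/2 - 1}\, e^{-y/2}, \qquad y > 0.
```

  Its cumulative distribution function has no elementary closed form; it is given by the regularized lower incomplete gamma function.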

  16-20. Proof of DJL Lemma.
  Without loss of generality assume $\|x\|_2 = 1$ (a unit vector).
  Let $Z_i = \sum_{j=1}^{d} \Pi_{ij} x_j$ for $1 \le i \le k$. By the corollary on sums of independent normals, $Z_i \sim N(0, \sum_j x_j^2) = N(0, 1)$.
  Let $Y = \sum_{i=1}^{k} Z_i^2$. The distribution of Y is χ² with k degrees of freedom since $Z_1, \ldots, Z_k$ are iid N(0, 1).
  Hence $\Pr[(1-\epsilon)^2 k \le Y \le (1+\epsilon)^2 k] \ge 1 - 2 e^{-c\epsilon^2 k}$.
  Since $k = \Omega(\frac{1}{\epsilon^2}\log(1/\delta))$ we have $\Pr[(1-\epsilon)^2 k \le Y \le (1+\epsilon)^2 k] \ge 1 - \delta$.
  Therefore $\|z\|_2 = \sqrt{Y/k}$ has the property that with probability $(1 - \delta)$, $\|z\|_2 = (1 \pm \epsilon)\,\|x\|_2$.

  21. JL lower bounds.
  Question: Are the bounds achieved by the lemmas tight, or can we do better? What about non-linear maps?
  Answer: They are essentially optimal up to constant factors for worst-case point sets.

  22-23. Fast JL and Sparse JL.
  The projection matrix Π is dense, hence computing Πx takes Θ(kd) time.
  Question: Can we choose Π to improve the time bound? Two scenarios: x is dense; x is sparse.
  Main ideas:
  Choose $\Pi_{ij}$ to be −1, 0, +1 with probabilities 1/6, 2/3, 1/6 (suitably rescaled). This also works, and roughly 2/3 of the entries are 0.
  Fast JL: Choose Π in a dependent, structured way so that Πx can be computed in $O(d \log d)$ time.
  Sparse JL: Choose Π so that each column is s-sparse. The best known bound is $s = O(\frac{1}{\epsilon}\log(1/\delta))$.
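
  A small NumPy illustration of the {−1, 0, +1} construction; the √(3/k) scaling is the usual normalization for this entry distribution, and the sizes below are arbitrary demo choices:

```python
import numpy as np

def sparse_pm1_matrix(k, d, rng):
    """Entries are +1, 0, -1 with probabilities 1/6, 2/3, 1/6; the sqrt(3/k) factor
    makes each row's inner product with x an unbiased estimator of ||x||_2^2 / k."""
    vals = rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / k) * vals

rng = np.random.default_rng(0)
x = rng.standard_normal(5_000)
Pi = sparse_pm1_matrix(1_000, 5_000, rng)
print(np.linalg.norm(Pi @ x) / np.linalg.norm(x))   # should be close to 1
```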

  24-26. Subspace Embedding.
  Question: Suppose we have a linear subspace E of $\mathbb{R}^d$ of dimension ℓ. Can we find a projection $\Pi : \mathbb{R}^d \to \mathbb{R}^k$ such that for every $x \in E$, $\|\Pi x\|_2 = (1 \pm \epsilon)\,\|x\|_2$?
  Not possible if k < ℓ. Why? Π maps E into a space of lower dimension, so by rank-nullity some non-zero vector $x \in E$ is mapped to 0 (illustrated below).
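
  A small NumPy illustration of why k < ℓ fails: restricted to E, the map Π is a k × ℓ linear map, so it has a nontrivial kernel (all matrices and dimensions below are arbitrary demo choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, ell, k = 50, 10, 8                        # k < ell on purpose
B = rng.standard_normal((d, ell))            # columns of B span the subspace E
Pi = rng.standard_normal((k, d)) / np.sqrt(k)

# Find a nonzero c with (Pi B) c = 0: the last right singular vector of the
# k x ell matrix Pi B lies in its null space since k < ell.
_, _, Vt = np.linalg.svd(Pi @ B)
c = Vt[-1]
x = B @ c                                    # a nonzero vector in E
print(np.linalg.norm(x), np.linalg.norm(Pi @ x))   # first is > 0, second is ~ 0
```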
