compsci 514: algorithms for data science

  1. compsci 514: algorithms for data science
Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 10.

  2. logistics
• Problem Set 2 is due next Friday 10/11, although we will allow submissions until Sunday 10/13 at midnight with no penalty.
• Midterm on Thursday 10/17. Will cover material through today.

  3. summary
Last Class: Dimensionality Reduction
• Applications and examples of dimensionality reduction in data science.
• Low-distortion embeddings (MinHash as an example).
• Low-distortion embeddings for Euclidean space and the Johnson-Lindenstrauss Lemma.
This Class: Finish the JL Lemma.
• Prove the Johnson-Lindenstrauss Lemma.
• Discuss algorithmic considerations, connections to other methods, etc.

  4. embeddings for euclidean space
Low Distortion Embedding for Euclidean Space: Given x_1, …, x_n ∈ ℝ^d and error parameter ϵ ≥ 0, find x̃_1, …, x̃_n ∈ ℝ^{d′} (where d′ ≪ d) such that for all i, j ∈ [n]:
(1 − ϵ)∥x_i − x_j∥_2 ≤ ∥x̃_i − x̃_j∥_2 ≤ (1 + ϵ)∥x_i − x_j∥_2.
• If x_1, …, x_n lie in a k-dimensional subspace of ℝ^d, we can project to d′ = k dimensions with no distortion.
• If they lie close to a k-dimensional space, we can project to k dimensions without much distortion (the idea behind PCA).

  5. the johnson-lindenstrauss lemma
Johnson-Lindenstrauss Lemma: Let Π ∈ ℝ^{d′×d} have each entry chosen i.i.d. as (1/√d′)·N(0, 1). For any set of points x_1, …, x_n ∈ ℝ^d and ϵ, δ > 0, if d′ = O(log(n/δ)/ϵ²) then, letting x̃_i = Π x_i, with probability ≥ 1 − δ we have:
For all i, j: (1 − ϵ)∥x_i − x_j∥_2 ≤ ∥x̃_i − x̃_j∥_2 ≤ (1 + ϵ)∥x_i − x_j∥_2.
Surprising and powerful result.
• Construction of Π is simple, random, and data oblivious.
x_1, …, x_n: original data points (d dimensions), x̃_1, …, x̃_n: compressed data points (d′ < d dimensions), Π ∈ ℝ^{d′×d}: random projection matrix (embedding function), ϵ: error of embedding, δ: failure probability.
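A minimal NumPy sketch of this construction (the name jl_project and the constant c = 4 in the choice d′ ≈ c·log(n/δ)/ϵ² are illustrative; the lemma only fixes d′ up to a constant):

    import numpy as np

    def jl_project(X, eps=0.1, delta=0.01, c=4, seed=0):
        """Project the rows of X (n x d) down to d' dimensions with a random Pi.

        Each entry of Pi is drawn i.i.d. from (1/sqrt(d')) * N(0, 1), as in the
        JL Lemma; c is an illustrative constant hiding the O(.) factor.
        """
        n, d = X.shape
        d_prime = int(np.ceil(c * np.log(n / delta) / eps ** 2))
        rng = np.random.default_rng(seed)
        Pi = rng.normal(size=(d_prime, d)) / np.sqrt(d_prime)
        return X @ Pi.T  # row i is the compressed point Pi x_i

    # Empirical check: a pairwise distance is preserved up to roughly (1 ± eps).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 10_000))
    X_tilde = jl_project(X, eps=0.3)
    i, j = 3, 17
    print(np.linalg.norm(X_tilde[i] - X_tilde[j]) / np.linalg.norm(X[i] - X[j]))  # close to 1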

  6. random projection
Π ∈ ℝ^{d′×d} is a random matrix, i.e., a random function mapping length-d vectors to length-d′ vectors.
x_1, …, x_n: original points (d dims.), x̃_1, …, x̃_n: compressed points (d′ < d dims.), Π: random projection (embedding function), ϵ: error of embedding.

  7. connection to simhash
Compression operation is x̃_i = Π x_i, so for any j,
x̃_i(j) = ⟨Π(j), x_i⟩ = Σ_{k=1}^{d} Π(j, k) · x_i(k).
Π(j) is a vector with independent random Gaussian entries.
x_1, …, x_n: original points (d dims.), x̃_1, …, x̃_n: compressed points (d′ < d dims.), Π ∈ ℝ^{d′×d}: random projection (embedding function).
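A one-line numeric check of this coordinate formula (Π and x below are just randomly generated stand-ins): coordinate j of x̃ is the inner product of row Π(j) with x.

    import numpy as np

    rng = np.random.default_rng(0)
    d, d_prime = 1_000, 100
    Pi = rng.normal(size=(d_prime, d)) / np.sqrt(d_prime)  # random Gaussian rows Pi(j)
    x = rng.normal(size=d)

    x_tilde = Pi @ x                          # x̃ = Pi x
    j = 7
    print(np.isclose(x_tilde[j], Pi[j] @ x))  # True: x̃(j) = <Pi(j), x>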

  13. connection to simhash (continued)
Points with high cosine similarity have similar random projections.
Computing a length-d′ SimHash signature SH_1(x_i), …, SH_{d′}(x_i) is identical to computing x̃_i = Π x_i and then taking sign(x̃_i).
x_1, …, x_n: original points (d dims.), x̃_1, …, x̃_n: compressed points (d′ < d dims.), Π ∈ ℝ^{d′×d}: random projection (embedding function).
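A small sketch of this correspondence (simhash_signature is a hypothetical helper; the 1/√d′ scaling of Π does not change any signs and is kept only for consistency with the JL construction):

    import numpy as np

    def simhash_signature(x, Pi):
        """Length-d' SimHash signature of x: the signs of its random projection.

        Bit j is SH_j(x) = sign(<Pi(j), x>) = sign((Pi x)_j).
        """
        return np.sign(Pi @ x)  # compute x̃ = Pi x, then keep only the signs

    rng = np.random.default_rng(0)
    d, d_prime = 10_000, 64
    Pi = rng.normal(size=(d_prime, d)) / np.sqrt(d_prime)  # same Pi as in the JL construction
    x = rng.normal(size=d)
    sig = simhash_signature(x, Pi)  # d' values in {-1, +1} (0 only if a dot product is exactly 0)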

  14. distributional jl
The Johnson-Lindenstrauss Lemma is a direct consequence of a closely related lemma:
Distributional JL Lemma: Let Π ∈ ℝ^{m×d} have each entry chosen i.i.d. as (1/√m)·N(0, 1). If we set m = O(log(1/δ)/ϵ²), then for any y ∈ ℝ^d, with probability ≥ 1 − δ:
(1 − ϵ)∥y∥_2 ≤ ∥Π y∥_2 ≤ (1 + ϵ)∥y∥_2.
Applying a random matrix Π to any vector y preserves y's norm with high probability.
• Like a low-distortion embedding, but for the length of a single compressed vector rather than distances between vectors.
• Can be proven from first principles. Will see next.
Π ∈ ℝ^{m×d}: random projection matrix, d: original dimension, m: compressed dimension (analogous to d′), ϵ: embedding error, δ: embedding failure prob.
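A quick Monte Carlo sanity check of the statement (the constant c = 4 in m ≈ c·log(1/δ)/ϵ² is illustrative; the lemma only fixes m up to a constant):

    import numpy as np

    def dist_jl_success_rate(d=1_000, eps=0.2, delta=0.05, c=4, trials=200, seed=0):
        """Fraction of fresh random Pi for which ||Pi y||_2 lands in (1 ± eps)||y||_2."""
        rng = np.random.default_rng(seed)
        m = int(np.ceil(c * np.log(1 / delta) / eps ** 2))
        y = rng.normal(size=d)      # any fixed vector works; Pi is data oblivious
        norm_y = np.linalg.norm(y)
        hits = 0
        for _ in range(trials):
            Pi = rng.normal(size=(m, d)) / np.sqrt(m)   # fresh random Pi each trial
            hits += (1 - eps) * norm_y <= np.linalg.norm(Pi @ y) <= (1 + eps) * norm_y
        return hits / trials

    print(dist_jl_success_rate())  # should be at least 1 - delta = 0.95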

  15.–18. distributional jl ⟹ jl
We show that the Distributional JL Lemma implies the JL Lemma. Distributional JL says a random projection Π preserves the norm of any single vector y. The main JL Lemma says that Π preserves distances between vectors. Since Π is linear, these are the same thing!
Proof: Given x_1, …, x_n, define the (n choose 2) vectors y_ij where y_ij = x_i − x_j.
• If we choose Π with m = O(log(1/δ)/ϵ²), then for each y_ij, with probability ≥ 1 − δ we have:
(1 − ϵ)∥x_i − x_j∥_2 ≤ ∥Π y_ij∥_2 = ∥Π(x_i − x_j)∥_2 = ∥x̃_i − x̃_j∥_2 ≤ (1 + ϵ)∥x_i − x_j∥_2.
x_1, …, x_n: original points, x̃_1, …, x̃_n: compressed points, Π ∈ ℝ^{m×d}: random projection matrix, d: original dimension, m: compressed dimension (analogous to d′), ϵ: embedding error, δ: embedding failure prob.
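A tiny numeric check of the linearity step this argument relies on (all values below are randomly generated stand-ins): since Π is linear, Π y_ij = Π(x_i − x_j) = Π x_i − Π x_j = x̃_i − x̃_j, so a norm bound on Π y_ij is a bound on ∥x̃_i − x̃_j∥_2.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 1_000, 200
    Pi = rng.normal(size=(m, d)) / np.sqrt(m)
    x_i, x_j = rng.normal(size=d), rng.normal(size=d)

    lhs = Pi @ (x_i - x_j)        # Pi applied to the difference vector y_ij
    rhs = Pi @ x_i - Pi @ x_j     # x̃_i - x̃_j
    print(np.allclose(lhs, rhs))  # True: projecting the difference = difference of projections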
