compsci 514 algorithms for data science


compsci 514: algorithms for data science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 22.


  1. compsci 514: algorithms for data science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 22.

  2. logistics
     • Problem Set 4 released last night. Due Sunday 12/15 at 8pm.
     • Final Exam Thursday 12/19 at 10:30am in Thompson 104.
     • Exam prep materials (list of topics, practice problems) coming in next couple of days.

  3. summary
     Before Break:
     • Finished discussion of SGD.
     • Gradient descent and SGD as applied to least squares regression.
     This Class:
     • A quick tour of the counterintuitive properties of high-dimensional space.
     • Many connections to concentration inequalities.
     • Implications for working with high-dimensional data (curse of dimensionality).

  5. high-dimensional data
     Modern data analysis often involves very high-dimensional data points.
     • Websites record (tens of) thousands of measurements per user: who they follow, when they visit the site, timestamps for specific interactions, etc.
     • A 3 minute, 500 × 500 pixel video clip at 15 FPS has ≥ 2 billion pixel values.
     • The human genome has 3 billion+ base pairs.
     Typically when discussing algorithm design we imagine data in much lower (usually 3) dimensional space.
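The video figure above can be checked with a few lines of arithmetic. This is only a sketch: the slide does not say how the ≥ 2 billion values are counted, so the variable `channels = 3` (one value per RGB color channel) is an assumption made here to reach that total.

```python
# Worked arithmetic for the video example on the slide.
minutes = 3
fps = 15
width = height = 500
channels = 3  # ASSUMPTION: RGB color, not stated on the slide

frames = minutes * 60 * fps            # total number of frames
pixels_per_frame = width * height      # pixels in one frame
values = frames * pixels_per_frame * channels

print(f"{frames} frames, {pixels_per_frame} pixels per frame")
print(f"{values:,} values in total")
```

Without the color-channel assumption the clip has 675 million pixels, so each data point (one clip) would still live in a space with hundreds of millions of dimensions.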

  7. low-dimensional intuition
     This can be a bit dangerous, as in reality high-dimensional space is very different from low-dimensional space.

  9. orthogonal vectors
     What is the largest set of mutually orthogonal unit vectors in $d$-dimensional space?
     Answer: $d$.
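A small numerical sanity check of the answer, as a sketch rather than a proof: the standard basis gives $d$ mutually orthogonal unit vectors, and any $d + 1$ vectors in $d$ dimensions have rank at most $d$, so they cannot all be nonzero and mutually orthogonal (such vectors would be linearly independent). The value `d = 5` and the random test matrix `V` are illustrative choices, not from the slides.

```python
import numpy as np

d = 5  # illustrative dimension

# Rows of the identity matrix are the standard basis vectors e_1, ..., e_d.
E = np.eye(d)
gram = E @ E.T  # entry (i, j) is the dot product <e_i, e_j>
is_orthonormal = np.allclose(gram, np.eye(d))

# Any d + 1 vectors in R^d are linearly dependent: their rank is at most d.
rng = np.random.default_rng(0)
V = rng.standard_normal((d + 1, d))
rank = np.linalg.matrix_rank(V)

print("standard basis is orthonormal:", is_orthonormal)
print("rank of", d + 1, "vectors in dimension", d, "is", rank)
```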

  11. nearly orthogonal vectors
      What is the largest set of unit vectors in $d$-dimensional space that have all pairwise dot products $|\langle \vec{x}, \vec{y} \rangle| \le \epsilon$? (think $\epsilon = .01$)
      1. $d$   2. $\Theta(d)$   3. $\Theta(d^2)$   4. $2^{\Theta(d)}$
      In fact, an exponentially large set of random vectors will be nearly pairwise orthogonal with high probability!
      Proof: Let $x_1, \ldots, x_t$ each have independent random entries set to $\pm 1/\sqrt{d}$.
      • $x_i$ is always a unit vector.
      • $\mathbb{E}[\langle x_i, x_j \rangle] = 0$.
      • By a Chernoff bound, $\Pr[|\langle x_i, x_j \rangle| \ge \epsilon] \le 2e^{-\epsilon^2 d/3}$.
      • If we choose $t = \frac{1}{2} e^{\epsilon^2 d/6}$, then using a union bound over all $\le t^2 = \frac{1}{4} e^{\epsilon^2 d/3}$ possible pairs, with probability $\ge 1/2$ all will be nearly orthogonal.
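The construction in the proof can be tried numerically. The following is a minimal sketch, not part of the lecture: the helper names `random_sign_vectors` and `max_abs_pairwise_dot`, and the parameters `d = 1000`, `t = 2000`, `eps = 0.25`, are assumptions chosen so the experiment runs quickly. A larger tolerance than the slide's $\epsilon = .01$ is used because the bound only becomes meaningful at that tolerance for much larger $d$.

```python
import numpy as np

def random_sign_vectors(t, d, rng):
    """Return a (t, d) array whose rows have i.i.d. entries +1/sqrt(d) or -1/sqrt(d)."""
    signs = rng.choice([-1.0, 1.0], size=(t, d))
    return signs / np.sqrt(d)

def max_abs_pairwise_dot(X):
    """Largest |<x_i, x_j>| over all pairs i != j of rows of X."""
    G = X @ X.T                # Gram matrix of all pairwise dot products
    np.fill_diagonal(G, 0.0)   # ignore <x_i, x_i> = 1
    return float(np.abs(G).max())

rng = np.random.default_rng(0)
d, t, eps = 1000, 2000, 0.25   # illustrative values; note t > d

X = random_sign_vectors(t, d, rng)
norms = np.linalg.norm(X, axis=1)
worst = max_abs_pairwise_dot(X)

# Set size from the proof: t = (1/2) * exp(eps^2 * d / 6).
t_guaranteed = 0.5 * np.exp(eps**2 * d / 6)

print("all unit vectors:", bool(np.allclose(norms, 1.0)))
print("largest |dot product| over all pairs:", worst)
print("t from the proof for this eps and d:", t_guaranteed)
```

Here there are twice as many vectors as dimensions, which would be impossible for exactly orthogonal vectors, yet every pairwise dot product stays within the tolerance.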
