Algorithms for Big Data (VI)


  1. Algorithms for Big Data (VI). Chihao Zhang, Shanghai Jiao Tong University. Oct. 25, 2019.

  2. Review. We learnt the AMS algorithm to estimate ∥f∥_k^k for k ≥ 2 using O(k n^{1−1/k} (log m + log n)) bits. An ad-hoc algorithm for ∥f∥_2² costs O(log m + log n) bits:
     ▶ Pick h : [n] → {−1, 1} from a 4-universal family;
     ▶ On input (j, ∆), x ← x + ∆ · h(j);
     ▶ Output x².
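
A minimal Python sketch of the ad-hoc algorithm above (an illustration, not from the slides). For simplicity h is stored as a fully random ±1 table, which is 4-wise independent but takes O(n) space; the slide's algorithm instead draws h from a 4-universal family to stay within O(log m + log n) bits.

    import random

    class TugOfWarSketch:
        """One estimator for ||f||_2^2 over a stream of (j, delta) updates."""

        def __init__(self, n, rng):
            # For simplicity h is a fully random table; see the note above.
            self.h = [rng.choice((-1, 1)) for _ in range(n)]   # h : [n] -> {-1, 1}
            self.x = 0                                         # the single counter

        def update(self, j, delta):
            self.x += delta * self.h[j]                        # x <- x + delta * h(j)

        def estimate(self):
            return self.x ** 2                                 # E[x^2] = ||f||_2^2

    rng = random.Random(42)
    n = 50
    stream = [(rng.randrange(n), rng.choice((-1, 1, 2))) for _ in range(2000)]

    f = [0] * n                       # exact frequency vector, for comparison
    for j, d in stream:
        f[j] += d

    k = 100                           # averaging trick: mean of k independent copies
    sketches = [TugOfWarSketch(n, rng) for _ in range(k)]
    for j, d in stream:
        for s in sketches:
            s.update(j, d)

    print(sum(v * v for v in f))                    # exact ||f||_2^2
    print(sum(s.estimate() for s in sketches) / k)  # averaged sketch estimate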

  3. An Algebraic View. It is instructive to view the Tug-of-War algorithm from linear algebra. Assume that we run the algorithm k times (to apply the averaging trick), each time with function h_i. Consider the matrix A = (a_ij)_{i∈[k], j∈[n]} where a_ij = h_i(j). Let x = Af; we know that E[x_i²] = ∥f∥_2². Our algorithm outputs (∑_{i=1}^k x_i²)/k = ∥x∥_2²/k. The 2-norm of the vector x/√k is close to that of f!
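
A quick numerical check of this algebraic view (assuming Python with numpy; i.i.d. uniform ±1 entries stand in for the hash values h_i(j)):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 200

    A = rng.choice([-1.0, 1.0], size=(k, n))   # row i plays the role of h_i: a_ij = h_i(j)
    f = rng.normal(size=n)                     # an arbitrary frequency vector
    x = A @ f                                  # x = Af, so E[x_i^2] = ||f||_2^2

    print(f @ f)                               # ||f||_2^2
    print((x @ x) / k)                         # the algorithm's output ||x||_2^2 / k
    print(np.linalg.norm(x / np.sqrt(k)), np.linalg.norm(f))   # the two 2-norms are close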

  4. Dimension Reduction. Suppose k ≪ n; what the matrix A does is map a vector in ℝ^n to a vector in ℝ^k without changing its norm much. This operation is often referred to as dimension reduction or metric embedding. The algorithm we met is similar to one important dimension reduction technique, the Johnson-Lindenstrauss transformation.

  5. Johnson-Lindenstrauss transformation.
     Theorem. For any 0 < ε < 1/2 and any positive integer m, consider a set of m points S ⊆ ℝ^n. There exists a matrix A ∈ ℝ^{k×n} with k = O(ε^{−2} log m) satisfying
     ∀ x, y ∈ S, (1 − ε)∥x − y∥ ≤ ∥Ax − Ay∥ ≤ (1 + ε)∥x − y∥.
     We construct A by drawing each of its entries from N(0, 1/k) independently.
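
A sanity check of the theorem by simulation (a numpy sketch; the theorem only promises k = O(ε^{−2} log m), so the constant 24 used below is an illustrative choice, not from the slides):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(1)
    n, m, eps = 2000, 30, 0.4
    k = int(np.ceil(24 * eps ** -2 * np.log(m)))         # k = O(eps^-2 log m)

    S = rng.normal(size=(m, n))                          # m points in R^n
    A = rng.normal(scale=np.sqrt(1.0 / k), size=(k, n))  # entries i.i.d. N(0, 1/k)
    P = S @ A.T                                          # the projected points Ax, one per row

    ratios = [np.linalg.norm(P[i] - P[j]) / np.linalg.norm(S[i] - S[j])
              for i, j in combinations(range(m), 2)]
    print(min(ratios), max(ratios))   # w.h.p. every pairwise distortion lies in [1 - eps, 1 + eps]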

  6. Gaussian Distribution. Recall that the density function of a random variable X ∼ N(µ, σ²) is
     f_X(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}.
     The distribution function is
     F_X(x) = ∫_{−∞}^{x} (1/(√(2π) σ)) e^{−(t−µ)²/(2σ²)} dt.
     Assume X_1 ∼ N(µ_1, σ_1²) and X_2 ∼ N(µ_2, σ_2²); then aX_1 + bX_2 ∼ N(aµ_1 + bµ_2, a²σ_1² + b²σ_2²).
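
The closure property under linear combinations can be verified by sampling (a numpy sketch with arbitrarily chosen parameters):

    import numpy as np

    rng = np.random.default_rng(2)
    mu1, s1, mu2, s2, a, b = 1.0, 2.0, -3.0, 0.5, 0.7, -1.2

    X1 = rng.normal(mu1, s1, size=1_000_000)
    X2 = rng.normal(mu2, s2, size=1_000_000)
    Y = a * X1 + b * X2

    print(Y.mean(), a * mu1 + b * mu2)              # empirical vs. a*mu1 + b*mu2
    print(Y.var(), a**2 * s1**2 + b**2 * s2**2)     # empirical vs. a^2 s1^2 + b^2 s2^2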

  7. Proof of JL. The statement is equivalent to: for all x, y ∈ S,
     1 − ε ≤ ∥A(x − y)∥ / ∥x − y∥ ≤ 1 + ε.
     We only need to show that for every unit-length vector f,
     Pr[ |∥Af∥ − 1| > ε ] ≤ δ,
     with δ small enough for a union bound over all pairs of points in S. Assume x = Af; then x_i = ∑_{j∈[n]} a_ij · f_j ∼ N(0, 1/k). Since ∥Af∥² = ∑_{i=1}^k x_i², and |∥Af∥² − 1| ≤ ε forces ∥Af∥ ∈ [√(1−ε), √(1+ε)] ⊆ [1−ε, 1+ε], it suffices to have a concentration inequality for the squared sum of Gaussians:
     Pr[ |∑_{i=1}^k x_i² − 1| ≥ ε ] ≤ δ.
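
Both facts, x_i ∼ N(0, 1/k) for unit-length f and the concentration of ∥Af∥ around 1, can be checked empirically (a numpy sketch with illustrative sizes):

    import numpy as np

    rng = np.random.default_rng(3)
    n, k, trials = 200, 400, 300

    f = rng.normal(size=n)
    f /= np.linalg.norm(f)                # a unit-length vector f

    norms, coords = [], []
    for _ in range(trials):               # a fresh A each trial, entries i.i.d. N(0, 1/k)
        A = rng.normal(scale=np.sqrt(1.0 / k), size=(k, n))
        x = A @ f
        norms.append(np.linalg.norm(x))
        coords.extend(x)

    print(np.var(coords), 1.0 / k)        # each x_i = sum_j a_ij f_j ~ N(0, 1/k)
    print(np.mean(norms))                 # ||Af|| concentrates around 1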

  8. Concentration.
     Theorem. Assume X_1, X_2, …, X_k are i.i.d. N(0, 1); then for 0 < ε < 1,
     Pr[ |∑_{i=1}^k X_i² − k| ≥ εk ] < 2e^{−ε²k/8}.
     The proof is similar to the proof of the Chernoff bound we met before.
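
An empirical comparison of this tail with the stated bound (a numpy sketch; parameters are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    k, eps, trials = 200, 0.3, 100_000

    X = rng.normal(size=(trials, k))          # X_i i.i.d. N(0, 1)
    S = (X ** 2).sum(axis=1)                  # sum of squares, E[S] = k

    empirical = np.mean(np.abs(S - k) >= eps * k)
    bound = 2 * np.exp(-eps ** 2 * k / 8)
    print(empirical, bound)                   # the empirical tail sits well below 2 e^{-eps^2 k / 8}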
