  1. Algorithms for Big Data (II)

  Chihao Zhang, Shanghai Jiao Tong University, Sept. 27, 2019

  2. Review of Last Lecture

  Last time, we met the streaming model. We studied Morris' algorithm for counting the number of elements in a data stream, and we used the averaging trick and the median trick to boost its quality. Today we will take a closer look at the mathematical tools needed in the course.
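
To make the review concrete, here is a minimal Python sketch of Morris' counter together with the two boosting tricks. The function names and the parameters s and t are illustrative choices, not from the lecture.

```python
import random
import statistics

def morris_counter(stream_length):
    """One run of Morris' counter: keep a register X, increment it
    with probability 2^-X per element, and estimate the count
    as 2^X - 1."""
    x = 0
    for _ in range(stream_length):
        if random.random() < 2 ** (-x):
            x += 1
    return 2 ** x - 1

def boosted_estimate(stream_length, s=10, t=7):
    """Median of t averages of s independent copies: the averaging
    trick reduces the variance, the median trick boosts the
    success probability."""
    averages = [
        statistics.mean(morris_counter(stream_length) for _ in range(s))
        for _ in range(t)
    ]
    return statistics.median(averages)

print(boosted_estimate(10_000))
```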

  3. Markov's Inequality

  Markov's inequality: for every nonnegative random variable $X$ and every $a > 0$, it holds that
  $$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}.$$

  Proof. Let $\mathbf{1}_{X \ge a}$ be the indicator random variable such that
  $$\mathbf{1}_{X \ge a}(x) = \begin{cases} 1, & \text{if } x \ge a, \\ 0, & \text{otherwise.} \end{cases}$$
  Then it holds that $X \ge a \cdot \mathbf{1}_{X \ge a}$. Taking the expectation on both sides, we obtain
  $$\mathbb{E}[X] \ge a \cdot \mathbb{E}[\mathbf{1}_{X \ge a}] = a \cdot \Pr[X \ge a]. \qquad \square$$
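
As a quick sanity check, here is a small Python experiment; the choice of an exponential variable and of $a$ is illustrative, not from the slide.

```python
import random

# Empirical check of Markov's inequality for a nonnegative
# variable: X ~ Exponential(1), so E[X] = 1 (illustrative choice).
a = 3.0
samples = [random.expovariate(1.0) for _ in range(100_000)]
tail = sum(x >= a for x in samples) / len(samples)
print(f"Pr[X >= {a}] ~ {tail:.4f}  vs  Markov bound E[X]/a = {1.0 / a:.4f}")
```

The true tail here is $e^{-3} \approx 0.05$, comfortably below the bound $1/3$; Markov's inequality is coarse but needs nothing beyond nonnegativity.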

  4. Chebyshev's Inequality

  Chebyshev's inequality: for every random variable $X$ and every $a > 0$, it holds that
  $$\Pr\big[|X - \mathbb{E}[X]| \ge a\big] \le \frac{\mathrm{Var}[X]}{a^2}.$$

  Proof.
  $$\Pr\big[|X - \mathbb{E}[X]| \ge a\big] = \Pr\big[(X - \mathbb{E}[X])^2 \ge a^2\big] \le \frac{\mathbb{E}\big[(X - \mathbb{E}[X])^2\big]}{a^2} = \frac{\mathrm{Var}[X]}{a^2},$$
  where the middle step is Markov's inequality applied to the nonnegative random variable $(X - \mathbb{E}[X])^2$. $\square$
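
The same kind of experiment for Chebyshev's inequality, with an illustrative $X$: a sum of 100 fair coin flips, so $\mathbb{E}[X] = 50$ and $\mathrm{Var}[X] = 25$.

```python
import random

# Empirical check of Chebyshev's inequality for X = number of
# heads in 100 fair coin flips: E[X] = 50, Var[X] = 100/4 = 25.
a = 15.0
samples = [sum(random.random() < 0.5 for _ in range(100))
           for _ in range(50_000)]
tail = sum(abs(x - 50) >= a for x in samples) / len(samples)
print(f"Pr[|X - E[X]| >= {a}] ~ {tail:.4f}  vs  Var[X]/a^2 = {25 / a**2:.4f}")
```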

  5. Chernoff Bound

  Let $X_1, \ldots, X_n$ be independent Bernoulli trials with $\mathbb{E}[X_i] = p_i$ for every $i = 1, \ldots, n$, and let $X = \sum_{i=1}^{n} X_i$. Then for every $0 < \varepsilon < 1$, it holds that
  $$\Pr\big[|X - \mathbb{E}[X]| > \varepsilon \cdot \mathbb{E}[X]\big] \le 2\exp\left(-\frac{\varepsilon^2\,\mathbb{E}[X]}{3}\right).$$

  The main tool to prove the Chernoff bound is the moment generating function $e^{tX}$ of the random variable $X$:
  $$\mathbb{E}\big[e^{tX}\big] = \mathbb{E}\Big[e^{t\sum_{i=1}^{n} X_i}\Big] = \prod_{i=1}^{n} \mathbb{E}\big[e^{tX_i}\big] = \prod_{i=1}^{n} \big((1 - p_i) + p_i e^t\big) = \prod_{i=1}^{n} \big(1 - (1 - e^t)p_i\big) \le \prod_{i=1}^{n} e^{-(1 - e^t)p_i} = e^{-(1 - e^t)\,\mathbb{E}[X]},$$
  where the inequality uses $1 + x \le e^x$.
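
A numerical comparison of the bound with the actual tail, for an illustrative choice of parameters (fair coins, so $\mathbb{E}[X] = n/2$):

```python
import math
import random

# Empirical tail vs the Chernoff bound for a sum of n fair
# Bernoulli trials (illustrative parameters).
n, eps, trials = 1_000, 0.1, 2_000
mean = n / 2  # E[X] for fair coins

deviations = sum(
    abs(sum(random.random() < 0.5 for _ in range(n)) - mean) > eps * mean
    for _ in range(trials)
)
bound = 2 * math.exp(-eps ** 2 * mean / 3)
print(f"empirical tail {deviations / trials:.4f}  <=  bound {bound:.4f}")
```

The exponential decay in $\mathbb{E}[X]$ is what separates the Chernoff bound from the polynomial decay of Chebyshev's inequality.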

  6. Proof of Chernoff Bound

  For every $t > 0$, we have
  $$\Pr[X \ge (1 + \varepsilon)\mathbb{E}[X]] = \Pr\big[e^{tX} \ge e^{t(1+\varepsilon)\mathbb{E}[X]}\big] \le \frac{\mathbb{E}\big[e^{tX}\big]}{e^{t(1+\varepsilon)\mathbb{E}[X]}} \le \frac{e^{-(1 - e^t)\mathbb{E}[X]}}{e^{t(1+\varepsilon)\mathbb{E}[X]}}.$$
  To find an optimal $t$, we calculate the derivative of the bound above and obtain, for $t = \log(1 + \varepsilon)$,
  $$\Pr[X \ge (1 + \varepsilon)\mathbb{E}[X]] \le \left(\frac{e^{\varepsilon}}{(1 + \varepsilon)^{1+\varepsilon}}\right)^{\mathbb{E}[X]} \le e^{-\varepsilon^2 \mathbb{E}[X]/3}.$$
  We can similarly prove that
  $$\Pr[X \le (1 - \varepsilon)\mathbb{E}[X]] \le e^{-\varepsilon^2 \mathbb{E}[X]/2}.$$
  Combining the bounds for both the lower and upper tails, we finish the proof. $\square$
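
The optimization over $t$ is compressed on the slide; here it is spelled out, a routine calculus step rather than anything from the original.

```latex
\[
\frac{e^{-(1-e^t)\mathbb{E}[X]}}{e^{t(1+\varepsilon)\mathbb{E}[X]}}
  = \exp\!\Big( \big( (e^t - 1) - t(1+\varepsilon) \big)\,\mathbb{E}[X] \Big),
\qquad
\frac{d}{dt}\Big( (e^t - 1) - t(1+\varepsilon) \Big) = e^t - (1+\varepsilon),
\]
% which vanishes at t = log(1 + eps); substituting back gives
\[
\exp\!\Big( \big( \varepsilon - (1+\varepsilon)\log(1+\varepsilon) \big)\,\mathbb{E}[X] \Big)
  = \left( \frac{e^{\varepsilon}}{(1+\varepsilon)^{1+\varepsilon}} \right)^{\mathbb{E}[X]},
\]
% and the elementary inequality
%   eps - (1 + eps) log(1 + eps) <= -eps^2 / 3   for 0 < eps < 1
% yields the stated bound.
```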

  7. Balls-into-Bins

  Balls-into-bins is a simple yet important probabilistic model. Suppose we throw $m$ balls into $n$ bins uniformly and independently; what is the (expected) maximum load of the bins? When $m = n$, the answer is $\Theta\left(\frac{\log n}{\log \log n}\right)$. The model captures an important object: hash functions.
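
A simulation of the $m = n$ case (the bin count $n$ is an illustrative choice); the $\Theta$ hides constants, so the two printed numbers should only agree up to a small factor.

```python
import math
import random
from collections import Counter

# Throw n balls into n bins uniformly at random and compare the
# maximum load to log n / log log n.
n = 100_000
loads = Counter(random.randrange(n) for _ in range(n))  # one entry per ball
print("max load:", max(loads.values()))
print("log n / log log n:", round(math.log(n) / math.log(math.log(n)), 2))
```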

  8. Independence

  A set of random variables $X_1, \ldots, X_n$ is mutually independent if for every index set $I \subseteq [n]$ and all values $\{x_i\}_{i \in I}$,
  $$\Pr\left[\bigwedge_{i \in I} X_i = x_i\right] = \prod_{i \in I} \Pr[X_i = x_i].$$

  9. $k$-wise Independence

  A weaker notion of independence is $k$-wise independence. A set of random variables $X_1, \ldots, X_n$ is $k$-wise independent if for every index set $I \subseteq [n]$ with $|I| \le k$ and all values $\{x_i\}_{i \in I}$,
  $$\Pr\left[\bigwedge_{i \in I} X_i = x_i\right] = \prod_{i \in I} \Pr[X_i = x_i].$$
  We call $X_1, \ldots, X_n$ pairwise independent if they are $2$-wise independent.

  10. Examples

  Suppose we have $n$ independent uniform bits $X_1, \ldots, X_n \in \{0, 1\}$. For every nonempty $I \subseteq [n]$, define $Y_I = \left(\sum_{j \in I} X_j\right) \bmod 2$. The random bits $\{Y_I\}_I$ are pairwise independent. But they are not mutually independent!
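
For $n = 3$ the claim can be verified exhaustively. The check below is a direct computation over all $2^n$ equally likely outcomes, with helper names of my own choosing.

```python
from itertools import combinations, product

n = 3
subsets = [I for r in range(1, n + 1) for I in combinations(range(n), r)]
outcomes = list(product([0, 1], repeat=n))  # all 2^n equally likely

def parity(bits, I):
    return sum(bits[j] for j in I) % 2

# Pairwise independence: Pr[Y_I = a, Y_J = b] = 1/4 for all I != J.
for I, J in combinations(subsets, 2):
    for a, b in product([0, 1], repeat=2):
        p = sum(parity(x, I) == a and parity(x, J) == b
                for x in outcomes) / len(outcomes)
        assert p == 0.25, (I, J, a, b, p)

# Not mutually independent: Y_{0} and Y_{1} determine Y_{0,1}.
p = sum(parity(x, (0,)) == 0 and parity(x, (1,)) == 0
        and parity(x, (0, 1)) == 1 for x in outcomes) / len(outcomes)
print("Pr[Y_{0}=0, Y_{1}=0, Y_{0,1}=1] =", p, "(1/8 under mutual independence)")
```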

  11. Property of Pairwise Independence

  Theorem. For pairwise independent $X_1, \ldots, X_n$, we have
  $$\mathrm{Var}[X_1 + \cdots + X_n] = \mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n].$$

  Proof.
  $$\mathrm{Var}[X_1 + \cdots + X_n] = \mathbb{E}\big[(X_1 + \cdots + X_n)^2\big] - \big(\mathbb{E}[X_1 + \cdots + X_n]\big)^2$$
  $$= \sum_{i=1}^{n} \mathbb{E}\big[X_i^2\big] + 2\sum_{1 \le i < j \le n} \mathbb{E}[X_i X_j] - \sum_{i=1}^{n} \mathbb{E}[X_i]^2 - 2\sum_{1 \le i < j \le n} \mathbb{E}[X_i]\,\mathbb{E}[X_j]$$
  $$= \sum_{i=1}^{n} \Big(\mathbb{E}\big[X_i^2\big] - \mathbb{E}[X_i]^2\Big) = \sum_{i=1}^{n} \mathrm{Var}[X_i],$$
  where the cross terms cancel because pairwise independence gives $\mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\,\mathbb{E}[X_j]$ for $i \ne j$. $\square$
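
A quick exact check of the theorem on the pairwise independent parity bits from the previous slide: both printed numbers equal $7/4$, since each of the seven $Y_I$ is a uniform bit with variance $1/4$.

```python
from itertools import combinations, product

n = 3
subsets = [I for r in range(1, n + 1) for I in combinations(range(n), r)]
outcomes = list(product([0, 1], repeat=n))

def parity(bits, I):
    return sum(bits[j] for j in I) % 2

def var(values):  # exact variance over equally likely outcomes
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

total = [sum(parity(x, I) for I in subsets) for x in outcomes]
print(var(total))                                                   # Var of sum
print(sum(var([parity(x, I) for x in outcomes]) for I in subsets))  # sum of Vars
```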

  12. Hash Functions

  In balls-into-bins, we distribute balls uniformly and independently. This can be implemented using hash functions. Hash functions are important data structures that are widely used in computer science. We will construct hash functions with theoretical guarantees.

  13. Universal Hash Function Families

  Let $\mathcal{H}$ be a family of functions from $[m]$ to $[n]$ where $m \ge n$. We call $\mathcal{H}$ $k$-universal if for every distinct $x_1, \ldots, x_k \in [m]$, we have
  $$\Pr_{h \in \mathcal{H}}\big[h(x_1) = h(x_2) = \cdots = h(x_k)\big] \le \frac{1}{n^{k-1}}.$$
  We call $\mathcal{H}$ strongly $k$-universal if for every distinct $x_1, \ldots, x_k \in [m]$ and every $y_1, \ldots, y_k \in [n]$, we have
  $$\Pr_{h \in \mathcal{H}}\left[\bigwedge_{i=1}^{k} h(x_i) = y_i\right] = \frac{1}{n^k}.$$

  14. Balls-into-Bins with a 2-Universal Hash Family

  Let $X_{ij}$ be the indicator of the event that the $i$-th ball and the $j$-th ball fall into the same bin, and let $X = \sum_{1 \le i < j \le m} X_{ij}$ be the total number of collisions. Then
  $$\mathbb{E}[X] = \sum_{1 \le i < j \le m} \mathbb{E}[X_{ij}] \le \binom{m}{2} \cdot \frac{1}{n} < \frac{m^2}{2n}.$$
  If the maximum load is $Y$, then it causes at least $\binom{Y}{2}$ collisions, so $\binom{Y}{2} \le X$. By Markov's inequality,
  $$\Pr\left[\binom{Y}{2} \ge \frac{m^2}{n}\right] \le \Pr\left[X \ge \frac{m^2}{n}\right] \le \frac{1}{2}.$$
  Since $\binom{Y}{2} \ge (Y - 1)^2/2$, this gives
  $$\Pr\left[Y - 1 \ge \sqrt{2}\,m\big/\sqrt{n}\right] \le \frac{1}{2}.$$
  In particular, when $m = n$ the maximum load is at most $1 + \sqrt{2n}$ with probability at least $1/2$.

  15. Construction of a 2-Universal Family

  Now we explicitly construct a universal family of hash functions from $[m]$ to $[n]$. Let $p \ge m$ be a prime and let
  $$h_{a,b}(x) = \big((ax + b) \bmod p\big) \bmod n.$$
  The family is
  $$\mathcal{H} = \{h_{a,b} : 1 \le a \le p - 1,\ 0 \le b \le p - 1\}.$$
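
A minimal sketch of this family in Python; the parameters $m$, $n$ and the prime $p$ are illustrative choices. The empirical collision probability for a fixed pair should be at most about $1/n$.

```python
import random

# The family h_{a,b}(x) = ((a*x + b) mod p) mod n from the slide,
# with illustrative parameters (1009 is a prime >= m).
m, n, p = 1_000, 50, 1_009

def sample_hash():
    a = random.randrange(1, p)  # 1 <= a <= p - 1
    b = random.randrange(0, p)  # 0 <= b <= p - 1
    return lambda x: ((a * x + b) % p) % n

x, y = 3, 7  # any fixed pair with x != y
trials = 50_000
collisions = sum(1 for _ in range(trials)
                 if (h := sample_hash())(x) == h(y))
print(f"Pr[h(x) = h(y)] ~ {collisions / trials:.4f}  (1/n = {1 / n:.4f})")
```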

  16. Proof

  We show that $\mathcal{H}$ constructed above is indeed $2$-universal by computing the collision probability $\Pr_{h_{a,b} \in \mathcal{H}}[h_{a,b}(x) = h_{a,b}(y)]$ for $x \ne y$. First, if $x \ne y$, then $ax + b \not\equiv ay + b \pmod{p}$. Moreover, $(a, b) \mapsto (ax + b \bmod p,\ ay + b \bmod p)$ is a bijection from $\{1, \ldots, p-1\} \times \{0, \ldots, p-1\}$ to $\{(u, v) : 0 \le u, v \le p - 1,\ u \ne v\}$. This is because the system
  $$ax + b \equiv u \pmod{p}, \qquad ay + b \equiv v \pmod{p}$$
  has the unique solution
  $$a \equiv \frac{v - u}{y - x} \pmod{p}, \qquad b \equiv u - ax \pmod{p}.$$

  17. Proof (cont'd)

  Therefore,
  $$\Pr_{h_{a,b} \in \mathcal{H}}\big[h_{a,b}(x) = h_{a,b}(y)\big] = \Pr_{(u,v) \in \mathbb{F}_p^2 :\, u \ne v}\big[u \equiv v \pmod{n}\big].$$
  The number of pairs $(u, v)$ with $u \ne v$ is $p(p-1)$. For each $u$, the number of values $v \ne u$ with $u \equiv v \pmod{n}$ is at most $\lceil p/n \rceil - 1$. The probability is therefore at most
  $$\frac{p\,(\lceil p/n \rceil - 1)}{p(p-1)} \le \frac{1}{n}. \qquad \square$$
