Estimating Frequency Moments



  1. Estimating Frequency Moments
     Anil Maheshwari, anil@scs.carleton.ca
     School of Computer Science, Carleton University, Canada

  2. Outline
     (1) Frequency Moments; (2) Estimating $F_0$; (3) Algorithm; (4) Correctness; (5) Further Improvements; (6) Estimating $F_2$; (7) Correctness; (8) Improving Variance; (9) Complexity

  3. Frequency Moments
     Definition: Let $A = (a_1, a_2, \ldots, a_n)$ be a stream, where elements are from the universe $U = \{1, \ldots, u\}$. Let $m_i$ be the number of elements in $A$ that are equal to $i$. The $k$-th frequency moment is
     $$F_k = \sum_{i=1}^{u} m_i^k, \quad \text{where } 0^0 = 0.$$

  4. Example: $F_k = \sum_{i=1}^{u} m_i^k$
     $A = (3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2)$ and $m_1 = m_3 = 3$, $m_2 = 10$, $m_4 = 2$, $m_7 = 1$, $m_5 = m_6 = 0$.
     $$F_0 = \sum_{i=1}^{7} m_i^0 = 3^0 + 10^0 + 3^0 + 2^0 + 0^0 + 0^0 + 1^0 = 5 \quad \text{(number of distinct elements in } A\text{)}$$
     $$F_1 = \sum_{i=1}^{7} m_i^1 = 3^1 + 10^1 + 3^1 + 2^1 + 0^1 + 0^1 + 1^1 = 19 \quad \text{(number of elements in } A\text{)}$$
     $$F_2 = \sum_{i=1}^{7} m_i^2 = 3^2 + 10^2 + 3^2 + 2^2 + 0^2 + 0^2 + 1^2 = 123 \quad \text{(surprise number)}$$
     ...
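
The example can be checked with a short exact computation. This is a minimal sketch that stores every count, so it uses linear space and is only a reference point for the streaming estimators; the function name `frequency_moment` is hypothetical and not from the slides.

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact F_k: the sum of m_i**k over the values i that occur in the stream."""
    counts = Counter(stream)
    # Values with m_i = 0 never appear in `counts`, which matches the
    # convention 0**0 = 0 used in the definition.
    return sum(m ** k for m in counts.values())

A = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print([frequency_moment(A, k) for k in (0, 1, 2)])  # [5, 19, 123]
```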

  5. Streaming Problem
     Find frequency moments in a stream.
     Input: A stream $A$ consisting of $n$ elements from the universe $U = \{1, \ldots, u\}$.
     Output: Estimates of the frequency moments $F_k$ for different values of $k$.
     Our task: Estimate $F_0$ and $F_2$ using sublinear space.
     Reference: Noga Alon, Yossi Matias, and Mario Szegedy, "The space complexity of approximating the frequency moments", Journal of Computer and System Sciences, 1999.

  6. Estimating $F_0$
     Computation of $F_0$
     Input: Stream $A = (a_1, a_2, \ldots, a_n)$, where each $a_i \in U = \{1, \ldots, u\}$.
     Output: An estimate $\hat{F}_0$ of the number of distinct elements in $A$ such that
     $$\Pr\left(\frac{1}{c} \le \frac{\hat{F}_0}{F_0} \le c\right) \ge 1 - \frac{2}{c}$$
     for some constant $c$, using sublinear space.

  7. Algorithm for Estimating $F_0$
     Input: Stream $A$ and a hash function $h : U \to U$
     Output: Estimate $\hat{F}_0$
     Step 1: Initialize $R := 0$
     Step 2: For each element $a_i \in A$ do:
       1. Compute the binary representation of $h(a_i)$
       2. Let $r$ be the location of the rightmost 1 in the binary representation
       3. If $r > R$, set $R := r$
     Step 3: Return $\hat{F}_0 = 2^R$
     Space requirements: $O(\log u)$ bits
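
The three steps above can be sketched in Python as follows. Several details are assumptions not fixed by the slides: the hash is drawn from an illustrative family $h(x) = (ax + b) \bmod p$ with a Mersenne prime $p$ (so it maps into $\{0, \ldots, p-1\}$ rather than into $U$), the location of the rightmost 1 is taken to be zero-based (the number of trailing zero bits), and a hash value of 0 is treated as location 0. The names `make_hash`, `rightmost_one`, and `estimate_f0` are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # modulus of the illustrative hash family (an assumption)

def make_hash(seed):
    """Draw h(x) = (a*x + b) mod PRIME with a, b chosen by a seeded generator."""
    rng = random.Random(seed)
    a = rng.randrange(1, PRIME)
    b = rng.randrange(PRIME)
    return lambda x: (a * x + b) % PRIME

def rightmost_one(v):
    """Zero-based location of the lowest set bit of v; 0 when v == 0 (an assumption)."""
    if v == 0:
        return 0
    return (v & -v).bit_length() - 1

def estimate_f0(stream, h):
    R = 0                          # Step 1
    for a in stream:               # Step 2
        r = rightmost_one(h(a))
        if r > R:
            R = r
    return 2 ** R                  # Step 3

A = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print(estimate_f0(A, make_hash(seed=1)))
```

Only `R` is kept between stream elements, and repeated elements hash to the same value, so duplicates never change the result. A single run is a coarse estimate: it is always a power of two and is only guaranteed to be within a factor $c$ of $F_0$ with probability at least $1 - 2/c$.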

  8. Observation 1
     Let $d$ be the smallest integer such that $2^d \ge u$ ($d$ bits are sufficient to represent numbers in $U$).
     Observation 1: $\Pr(\text{rightmost 1 in } h(a_i) \text{ is at location} \ge r + 1) = \frac{1}{2^r}$
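
Observation 1 can be checked by exhaustive enumeration, assuming $h(a_i)$ is uniformly distributed over all $d$-bit strings. In this sketch locations are counted from 1 at the least significant bit, and the all-zero string is treated as having its rightmost 1 beyond location $d$; both conventions are assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

def lowest_bit_location(v, d):
    """1-based location of the rightmost 1 in v; d + 1 if v has no 1 among its d bits."""
    if v == 0:
        return d + 1
    return (v & -v).bit_length()

def prob_location_at_least(d, r):
    """Exact Pr(rightmost 1 is at location >= r + 1) over uniform d-bit values."""
    hits = sum(1 for v in range(2 ** d) if lowest_bit_location(v, d) >= r + 1)
    return Fraction(hits, 2 ** d)

d = 8
for r in range(d + 1):
    print(r, prob_location_at_least(d, r))  # each probability equals 1 / 2**r
```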

  9. Observation 2
     Observation 2: For $a_i \ne a_j$,
     $$\Pr(\text{rightmost 1 in } h(a_i) \ge r + 1 \text{ and rightmost 1 in } h(a_j) \ge r + 1) = \frac{1}{2^{2r}}$$

  10. Observation 3
      Fix $r \in \{1, \ldots, d\}$. For all $x \in A$, define the indicator random variable
      $$I_x^r = \begin{cases} 1, & \text{if the rightmost 1 is at location} \ge r + 1 \text{ in } h(x) \\ 0, & \text{otherwise} \end{cases}$$
      Let $Z^r = \sum_x I_x^r$ (the sum is over distinct elements of $A$).
      Observation 3: The following holds:
        1. $E[I_x^r] = \frac{1}{2^r}$
        2. $Var[I_x^r] = \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right)$
        3. $E[Z^r] = \frac{F_0}{2^r}$
        4. $Var[Z^r] \le E[Z^r]$

  11. Observation 3.1
      $$E[I_x^r] = \frac{1}{2^r}$$

  12. Observation 3.2
      $$Var[I_x^r] = E[(I_x^r)^2] - E[I_x^r]^2 = \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right)$$

  13. Observation 3.3
      $$E[Z^r] = \frac{F_0}{2^r}$$

  14. Observation 3.4
      $$Var[Z^r] = F_0 \cdot \frac{1}{2^r}\left(1 - \frac{1}{2^r}\right) \le \frac{F_0}{2^r} = E[Z^r]$$

  15. Observation 4
      If $2^r > cF_0$, then $\Pr(Z^r > 0) < \frac{1}{c}$.
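
One standard route to Observation 4 is Markov's inequality combined with the expectation of $Z^r$ from Observation 3. The slide states only the result, so the chain below is a reconstruction and not necessarily the author's own argument. It uses the fact that $Z^r$ takes non-negative integer values.

```latex
\Pr(Z^r > 0) \;=\; \Pr(Z^r \ge 1) \;\le\; E[Z^r] \;=\; \frac{F_0}{2^r} \;<\; \frac{1}{c}
\qquad \text{whenever } 2^r > cF_0 .
```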

  16. Chebyshev's Inequality
      $$\Pr(|X - E[X]| \ge \alpha) \le \frac{Var[X]}{\alpha^2}$$

  17. Observation 5
      If $c2^r < F_0$, then $\Pr(Z^r = 0) < \frac{1}{c}$.
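
Observation 5 follows from Chebyshev's inequality with $\alpha = E[Z^r]$, together with the bound $Var[Z^r] \le E[Z^r]$ from Observation 3. The slide gives only the statement, so this derivation is a reconstruction under those two facts.

```latex
\Pr(Z^r = 0)
\;\le\; \Pr\bigl(\,|Z^r - E[Z^r]| \ge E[Z^r]\,\bigr)
\;\le\; \frac{Var[Z^r]}{E[Z^r]^2}
\;\le\; \frac{1}{E[Z^r]}
\;=\; \frac{2^r}{F_0}
\;<\; \frac{1}{c}
\qquad \text{whenever } c\,2^r < F_0 .
```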

  18. Observation 6
      Claim: Set $\hat{F}_0 = 2^R$. We have
      $$\Pr\left(\frac{1}{c} \le \frac{\hat{F}_0}{F_0} \le c\right) \ge 1 - \frac{2}{c}$$
      Observation 4: If $2^r > cF_0$, then $\Pr(Z^r > 0) < \frac{1}{c}$.
      Observation 5: If $c2^r < F_0$, then $\Pr(Z^r = 0) < \frac{1}{c}$.

  19. Improving success probability
      Execute the algorithm $s$ times in parallel (with independent hash functions).
      Let $R$ be the median value among these runs.
      Return $\hat{F}_0 = 2^R$.
      Note: The algorithm uses $O(s \log u)$ bits.
      Claim: For $c > 4$, there exists $s = O(\log \frac{1}{\epsilon})$, $\epsilon > 0$, such that
      $$\Pr\left(\frac{1}{c} \le \frac{\hat{F}_0}{F_0} \le c\right) \ge 1 - \epsilon.$$
      Technique: Median + Chernoff bounds

  20. Improving success probability (contd.)
      $i$-th run of the algorithm (with its own hash function $h_i$):
      Step 1: Initialize $R_i := 0$
      Step 2: For each element $a_j \in A$ do:
        1. Compute the binary representation of $h_i(a_j)$
        2. Let $r$ be the location of the rightmost 1 in the binary representation
        3. If $r > R_i$, set $R_i := r$
      Step 3: Return $R_i$
      Let $R = \text{Median}(R_1, R_2, \ldots, R_s)$
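
The median-of-$s$-runs scheme can be sketched as a single pass that maintains $s$ registers. As assumptions not taken from the slides, the independent hash functions come from an illustrative family $(ax + b) \bmod p$, locations are zero-based (trailing zero count), and `statistics.median_low` is used so that the median stays an integer when $s$ is even. All function names are hypothetical.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # modulus of the illustrative hash family (an assumption)

def make_hash(rng):
    """Draw h(x) = (a*x + b) mod PRIME using the supplied random generator."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(PRIME)
    return lambda x: (a * x + b) % PRIME

def trailing_zeros(v):
    """Zero-based location of the rightmost 1 in v; 0 when v == 0 (an assumption)."""
    return (v & -v).bit_length() - 1 if v else 0

def median_estimate(stream, s, seed=0):
    rng = random.Random(seed)
    hashes = [make_hash(rng) for _ in range(s)]   # one hash function per run
    R = [0] * s                                   # R_1, ..., R_s
    for a in stream:                              # one pass updates every run
        for i, h in enumerate(hashes):
            R[i] = max(R[i], trailing_zeros(h(a)))
    return 2 ** statistics.median_low(R)          # 2**R with R the median

A = [3, 2, 4, 7, 2, 2, 3, 2, 2, 1, 4, 2, 2, 2, 1, 1, 2, 3, 2]
print(median_estimate(A, s=9, seed=1))
```

The state is the list of $s$ registers plus the $s$ hash descriptions, in line with the $O(s \log u)$ space bound stated for the repeated algorithm.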

  21. Indicator Random Variables
      Define $X_1, \ldots, X_s$ to be indicator random variables:
      $$X_i = \begin{cases} 0, & \text{if success, i.e. } \frac{1}{c} \le \frac{2^{R_i}}{F_0} \le c \\ 1, & \text{otherwise} \end{cases}$$
        1. $E[X_i] = \Pr(X_i = 1) \le \frac{2}{c} = \beta < \frac{1}{2}$ (since $c > 4$)
        2. Let $X = \sum_{i=1}^{s} X_i$ = number of failures in $s$ runs
        3. $E[X] \le s\beta < \frac{s}{2}$
        4. If $X < \frac{s}{2}$, then $\frac{1}{c} \le \frac{2^R}{F_0} \le c$ (where $R = \text{Median}(R_1, R_2, \ldots, R_s)$)

  22. Chernoff Bounds
      If the random variable $X$ is a sum of independent identical indicator random variables and $0 < \delta < 1$, then
      $$\Pr(X \ge (1 + \delta)E[X]) \le e^{-\frac{\delta^2 E[X]}{3}}$$
      Proof: See my notes

  23. Main Result
      Claim: For any $\epsilon > 0$, if $s = O(\log \frac{1}{\epsilon})$, then $\Pr(X < \frac{s}{2}) \ge 1 - \epsilon$.
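
To get a feel for how quickly the failure probability shrinks with $s$, the tail $\Pr(X \ge s/2)$ can be computed exactly in a simplified model. The sketch assumes each run fails independently with probability exactly $\beta$, so that $X$ is binomial; the slides only give the upper bound $\Pr(X_i = 1) \le \beta$, and the value $\beta = 0.4$ (from $c = 5$) is chosen purely for illustration. The function name `failure_tail` is hypothetical.

```python
from math import comb

def failure_tail(s, beta):
    """Pr(X >= s/2) for X ~ Binomial(s, beta): the median of s runs may fail."""
    k_min = (s + 1) // 2  # smallest integer k with k >= s/2
    return sum(comb(s, k) * beta ** k * (1 - beta) ** (s - k)
               for k in range(k_min, s + 1))

beta = 0.4  # 2/c with c = 5, an illustrative choice
for s in (1, 3, 11, 101):
    print(s, failure_tail(s, beta))
```

In this model the tail falls as $s$ grows through odd values, which is the behaviour the Chernoff bound quantifies when it yields $s = O(\log \frac{1}{\epsilon})$.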
