  1. CS 498ABD: Algorithms for Big Data. Frequency Moments and Counting Distinct Elements. Lecture 05, September 8, 2020. Chandra (UIUC), Fall 2020.

  2. Part I: Frequency Moments

  3. Streaming model
     The input consists of m objects/items/tokens e_1, e_2, ..., e_m that are seen one by one by the algorithm. The algorithm has "limited" memory, say for B tokens where B < m (often B ≪ m), and hence cannot store all of the input. We want to compute interesting functions over the input.
     Examples:
     - Each token is a number from [n]
     - High-speed network switch: tokens are packets with source and destination IP addresses and message contents
     - Each token is an edge in a graph (graph streams)
     - Each token is a point in some feature space
     - Each token is a row/column of a matrix
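The model above can be sketched in a few lines of Python. This is an illustrative skeleton only (the function name and the statistics tracked are mine, not from the lecture): tokens arrive one at a time, and the algorithm keeps O(1) working memory rather than the stream itself.

```python
def stream_process(stream):
    """Consume tokens one by one with O(1) working memory.

    Tracks two easy statistics: the stream length m and the
    largest token seen so far. The stream itself is never stored.
    """
    m = 0
    largest = None
    for token in stream:        # each token is seen exactly once
        m += 1
        if largest is None or token > largest:
            largest = token
    return m, largest
```

On the lecture's running example stream 4, 2, 4, 1, 1, 1, 4, 5 this returns (8, 5).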


  5. Frequency Moment Problem(s)
     A fundamental class of problems, formally introduced in the seminal paper of Alon, Matias, and Szegedy, "The Space Complexity of Approximating the Frequency Moments" (1999).
     The stream consists of e_1, e_2, ..., e_m where each e_i is an integer in [n]. We know n in advance (or an upper bound).
     Example: n = 5 and the stream is 4, 2, 4, 1, 1, 1, 4, 5.

  6. Frequency Moments
     The stream consists of e_1, e_2, ..., e_m where each e_i is an integer in [n]. We know n in advance (or an upper bound). Given a stream, let f_i denote the frequency of i, i.e., the number of times i is seen in the stream. Consider the vector f = (f_1, f_2, ..., f_n). For k ≥ 0, the k-th frequency moment is F_k = Σ_i f_i^k. We can also consider the ℓ_k norm of f, which is (F_k)^{1/k}.
     Example: n = 5 and the stream is 4, 2, 4, 1, 1, 1, 4, 5, so f_1 = 3, f_2 = 1, f_3 = 0, f_4 = 3, f_5 = 1, and m = 8.
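With Θ(n) memory the frequency moments can be computed exactly by storing f explicitly; this is the baseline the streaming algorithms will try to beat. A minimal Python sketch (the function name is mine):

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact F_k = sum_i f_i^k, using Theta(n) memory to store f.

    For k = 0 this counts distinct elements: only items that
    actually appear are in the Counter, and each contributes
    f_i^0 = 1.
    """
    f = Counter(stream)              # f[i] = number of times i appears
    return sum(c ** k for c in f.values())
```

On the example stream 4, 2, 4, 1, 1, 1, 4, 5: F_0 = 4, F_1 = 8, and F_2 = 3² + 1² + 3² + 1² = 20.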


  13. Frequency Moments
     The stream consists of e_1, e_2, ..., e_m where each e_i is an integer in [n]. We know n in advance (or an upper bound). Given a stream, let f_i denote the frequency of i, i.e., the number of times i is seen in the stream. Consider the vector f = (f_1, f_2, ..., f_n). For k ≥ 0, the k-th frequency moment is F_k = Σ_i f_i^k.
     Important cases/regimes:
     - k = 0: F_0 is simply the number of distinct elements in the stream
     - k = 1: F_1 is the length of the stream, which is easy
     - k = 2: F_2 is fundamental in many ways, as we will see
     - k = ∞: F_∞ is the maximum frequency (heavy hitters problem)
     - 0 < k < 1 and 1 < k < 2
     - 2 < k < ∞
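As a sanity check on the k = ∞ case in the list above, the maximum frequency can also be computed exactly with Θ(n) memory (the helper name below is mine, not from the lecture):

```python
from collections import Counter

def f_infinity(stream):
    """F_infinity: the maximum frequency of any item, computed
    exactly by storing the whole frequency vector."""
    f = Counter(stream)
    return max(f.values(), default=0)   # 0 for an empty stream
```

On the example stream 4, 2, 4, 1, 1, 1, 4, 5, both 1 and 4 appear three times, so F_∞ = 3.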


  17. Frequency Moments: Questions
     Estimation: Given a stream and k, can we estimate F_k exactly/approximately with small memory?
     Sampling: Given a stream and k, can we sample an item i in proportion to f_i^k?
     Sketching: Given a stream and k, can we create a sketch/summary of small size?
     These questions are easy if we have Ω(n) memory: store f explicitly. They are interesting when memory is ≪ n. Ideally we want to do it with log^c n memory for some fixed c ≥ 1 (polylog(n)). Note that log n is roughly the memory required to store one token/number.


  19. Need for approximation and randomization
     For most of the interesting problems there is an Ω(n) lower bound on memory if one wants an exact answer or a deterministic algorithm. Hence we focus on (1 ± ε)-approximation or constant-factor approximation, and on randomized algorithms.


  21. Relative approximation
     Let g(σ) be a real-valued non-negative function over streams σ.
     Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) relative approximation for a real-valued function g if for all σ:
     Pr[ |A(σ)/g(σ) − 1| > α ] ≤ δ.
     Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).

  22. Additive approximation
     Let g(σ) be a real-valued function over streams σ. If g(σ) can be negative, we focus on additive approximation.
     Definition: Let A(σ) be the real-valued output of a randomized streaming algorithm on stream σ. We say that A provides an (α, δ) additive approximation for a real-valued function g if for all σ:
     Pr[ |A(σ) − g(σ)| > α ] ≤ δ.
     When working with additive approximations, some normalization/scaling is typically necessary. Our ideal goal is to obtain an (ε, δ)-approximation for any given ε, δ ∈ (0, 1).
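An (α, δ) guarantee of either kind can be checked empirically for a given randomized estimator by running it many times and measuring the failure fraction. A minimal Python sketch of the relative version (the function name and the toy estimator are mine, purely for illustration):

```python
import random

def meets_relative_approx(estimator, g_value, alpha, delta, trials=1000):
    """Empirically test Pr[ |A/g - 1| > alpha ] <= delta.

    `estimator` is a zero-argument randomized function A();
    `g_value` is the true value g(sigma) on a fixed stream sigma.
    """
    failures = sum(
        1 for _ in range(trials)
        if abs(estimator() / g_value - 1) > alpha
    )
    return failures / trials <= delta

# Toy estimator: always within 5% of the truth, so it certainly
# satisfies a (0.1, 0.01) relative approximation.
true_value = 100.0
noisy = lambda: true_value * (1 + random.uniform(-0.05, 0.05))
```

For example, `meets_relative_approx(noisy, true_value, 0.1, 0.01)` is True, while an estimator that always returns 2 * true_value fails the same test.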

  23. Part II: Estimating Distinct Elements
