stream statistics over sliding window

Stream Statistics Over Sliding Window


  1. Stream Statistics Over Sliding Window
     Anil Maheshwari, School of Computer Science, Carleton University, Canada

  2. Outline
     1. Introduction
     2. Algorithm
     3. Sum Problem
     4. Trends
     5. References

  3. Problem Setting
     Main Problem: The input is an endless stream of binary bits. At any time, among the last $N$ bits received, we are interested in queries that seek an approximate count of the number of 1's in the stream among the last $k$ bits, where $k \le N$.
     Result: A data structure of size $O(\frac{1}{\epsilon} \log^2 N)$ that can approximate the count of the number of 1's within a factor of $1 \pm \epsilon$.
     Reference: Maintaining stream statistics over sliding windows, by Datar, Gionis, Indyk, and Motwani, SIAM Jl. Computing 2002.

  4. Variants
     1. A stream of positive numbers. The query consists of a value $k \in \{1, \dots, N\}$, and we want to know the (approximate) sum of the last $k$ numbers in the stream. (Uses sublinear space.)
     2. A stream consisting of numbers from the set $\{-1, 0, +1\}$. We want to maintain the sum of the last $N$ numbers of the stream. (Requires $\Omega(N)$ bits of storage to approximate the sum within a constant factor of the exact sum.)
     3. What are the most popular movies in the last week?
     4. What is trending in the last week?

  5. Main Problem
     Report an approximate count of the number of 1's in the stream of binary bits among the last $k$ bits, where $k \le N$.
     What about Exact Count?

  6. Algorithm for Approximate Count
     The algorithm uses two structures:
     Time Stamps: To track the most recent $N$ bits.
     Buckets: With the following features:
     - $O(\log N)$ buckets maintain the 1's among the latest $N$ bits
     - The number of 1's in a bucket is a power of 2
     - Each 1-bit is assigned to exactly one bucket (a 0-bit may or may not be assigned to any bucket)
     - At most two buckets of a given size (size = number of 1's)
     - Each bucket stores the time stamp of its most recent bit
     - The most recent bit of any bucket is a 1-bit

  7. Algorithm contd.
     On receiving a new bit in the data stream:
     0-bit: Increment the time stamp of each of the buckets by 1, and if any bucket's time stamp exceeds $N$, discard that bucket.
     1-bit: The following updates are done:
     1. Create a bucket $B_0$ consisting of the newest 1-bit, with a time stamp of 1.
     2. Scan the list of buckets in order of increasing size.
        Case 1: Two buckets of size 1. Increment the time stamp of each bucket (and possibly discard buckets whose time stamps exceed $N$).
        Case 2: Three buckets of type $B_0$.
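The update rule can be sketched in Python roughly as follows. This is a minimal sketch under assumptions that are not taken from the slides: the names `DGIMCounter` and `add` are hypothetical; each bucket stores the absolute arrival time of its most recent 1-bit instead of a relative time stamp that is incremented on every bit; and Case 2 is assumed to merge the two oldest buckets of the overfull size into one bucket of the next size, cascading upward (the merge rule stated on the "Refining the Analysis" slide).

```python
class DGIMCounter:
    """Hypothetical sketch of the bucket list for a window of the last N bits."""

    def __init__(self, window_size):
        self.N = window_size
        self.time = 0        # number of bits received so far
        # Buckets are [arrival time of most recent 1-bit, exponent i] pairs,
        # where the bucket holds 2**i ones. Newest bucket first.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # A bit that arrived at time t has relative time stamp time - t + 1
        # (the newest bit has stamp 1). Drop buckets whose stamp exceeds N.
        while self.buckets and self.time - self.buckets[-1][0] + 1 > self.N:
            self.buckets.pop()
        if bit == 0:
            return
        self.buckets.insert(0, [self.time, 0])   # new bucket B_0
        start, exponent = 0, 0
        while True:
            # Buckets of equal size are adjacent; find the run starting at `start`.
            end = start
            while end < len(self.buckets) and self.buckets[end][1] == exponent:
                end += 1
            if end - start <= 2:
                break        # Case 1: at most two buckets of this size
            # Assumed Case 2: merge the two oldest buckets of this size into one
            # bucket of the next size, keeping the more recent time stamp.
            merged = [self.buckets[end - 2][0], exponent + 1]
            self.buckets[end - 2:end] = [merged]
            start, exponent = end - 2, exponent + 1
```

Under these assumptions, feeding seven consecutive 1-bits into `DGIMCounter(1000)` leaves three buckets of sizes 1, 2 and 4 (newest first), which together account for all seven 1-bits.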

  8. Illustration
     [Figure: five snapshots, labelled A to E, of the bit stream 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 0 with its bucket list. The window runs from time stamp $N$ to time stamp 1, followed by the unseen part of the stream. Bucket lists shown: A: $B_2 B_1 B_1 B_0 B_0$; B: $B_2 B_1 B_1 B_0 B_0$; C: $B_2 B_2 B_1 B_0$; D: $B_2 B_2 B_1 B_0 B_0$; E: $B_2 B_1 B_0 B_0$.]

  9. Space Complexity
     We have:
     - $O(\log N)$ buckets, as the size of the window is $N$
     - Bucket $B_i$ stores $2^i$ 1-bits
     - For each bucket we store its time stamp and its size
     - A time stamp requires $O(\log N)$ bits
     - Storing $i$ with bucket $B_i$ is sufficient for its size
     - As $0 \le i \le \log N$, $i$ can be represented using $O(\log \log N)$ bits
     Total space required: $O(\log N (\log N + \log \log N)) = O(\log^2 N)$ bits

  10. Time Complexity
      On receiving a 0-bit:
      - We update the time stamps of each of the $O(\log N)$ buckets
      - Requires $O(\log N)$ time
      On receiving a 1-bit:
      - We update the time stamps of each bucket
      - Potentially merge and cascade buckets
      - Time (merge and cascade) ≈ number of buckets
      - Can be performed in $O(\log N)$ time
      Total time (per update): $O(\log N)$

  11. Answering Query
      Query Problem: For any query value $k \in \{1, \dots, N\}$, report an approximate count of the number of 1's among the latest $k$ bits of the stream.
      1. Initialize count := 0
      2. Traverse buckets from right to left. For each bucket of type $B_i$ that is encountered in the traversal:
         1. $B_i$ is completely contained in the window: increment count by $2^i$
         2. $B_i$ is completely outside the window: count remains unchanged
         3. $B_i$ partially overlaps the window: increment count by $2^i / 2$
      3. Report count as an approximate count.
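The query traversal might look like the following Python sketch. The function name `estimate_ones` and the bucket representation (a list of `(arrival time of most recent 1-bit, exponent i)` pairs, newest first) are assumptions made for illustration. The slides do not say how to decide whether a bucket is fully contained, so this sketch assumes one plausible test: a bucket is fully inside the window if the next older bucket's time stamp is not before the window boundary, and a bucket of size 1 is never partial because its only bit is the time-stamped one.

```python
def estimate_ones(buckets, now, k):
    """Approximate the number of 1s among the last k bits.

    buckets: list of (arrival_time, exponent) pairs, newest first;
             a bucket with exponent i holds 2**i ones.
    now:     arrival time of the most recent bit.
    """
    cutoff = now - k      # bits with arrival time <= cutoff are outside the window
    count = 0
    for pos, (stamp, exponent) in enumerate(buckets):
        if stamp <= cutoff:
            break         # this bucket and all older ones are completely outside
        older = buckets[pos + 1][0] if pos + 1 < len(buckets) else None
        if exponent == 0 or (older is not None and older >= cutoff):
            count += 2 ** exponent          # completely contained
        else:
            count += 2 ** exponent // 2     # partial overlap: count half
    return count
```

For the bucket list `[(7, 0), (6, 1), (4, 2)]` at time 7 (one possible state after seven consecutive 1-bits), this sketch returns 3 for `k = 3`, which is exact, and 5 for `k = 7`, where the true count is 7.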

  12. Analysis of Approximation Factor
      Observation: Except for one bucket, say $B_j$, that is partially in the window of size $k$, all buckets of type $B_0, B_1, \dots, B_{j-1}$ are completely within the window.
      - For those buckets, the number of 1-bits is $\sum_{i=0}^{j-1} 2^i \ge 2^j - 1$
      - The true count (and the approximate count) is at least $2^j$ (as the last bit of $B_j$ is in the window of interest)
      - For the bucket $B_j$ that overlaps partially with the window, the number of its bits in the true count can be anywhere from 0 up to $2^j - 1$, but we only took a contribution of $2^{j-1}$ in the reported count
      - The ratio of the true count to the reported count is within a factor of $(\frac{1}{2}, 2)$.

  13. Refining the Analysis
      - Let $r \ge 2$ be an integer parameter.
      - Maintain $r - 1$ or $r$ copies of $B_i$ for each $i \ge 1$ (there may be fewer than $r - 1$ buckets of type $B_0$).
      - Whenever we exceed $r$ copies of any type of bucket, we take the oldest two buckets and merge them to form a new bucket of the next size.
      - For the query, assume that the bucket labelled $B_j$ is only partially overlapping the query window.
      - At least $1 + \sum_{i=1}^{j-1} (r - 1) 2^i$ 1-bits are in the query window.
      - The true count and the reported value are within a factor of $1 \pm \frac{1}{r-1}$.
      ⇒ By setting $r = 1 + \frac{1}{\epsilon}$, we obtain a data structure of size $O(\frac{1}{\epsilon} \log^2 N)$ that approximates the count of the number of 1's within a factor of $1 \pm \epsilon$.
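The final step can be unpacked as a short calculation. This is a sketch: the bucket count $O(r \log N)$ is inferred from keeping at most $r$ buckets for each of the $O(\log N)$ sizes, and the $O(\log N)$ bits per bucket come from the space-complexity slide.

```latex
\frac{1}{r-1} = \epsilon \iff r = 1 + \frac{1}{\epsilon},
\qquad
\underbrace{O(r \log N)}_{\text{buckets}} \times
\underbrace{O(\log N)}_{\text{bits per bucket}}
= O\!\left(\frac{1}{\epsilon} \log^2 N\right) \text{ bits}.
```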

  14. Computation of Sum
      The Sum Problem: A stream of positive numbers. The query consists of a value $k \in \{1, \dots, N\}$, and we want to know the (approximate) sum of the last $k$ numbers in the stream.
      Example stream: 5 7 2 3 9 4 1 6 11 2 4 3

  15. Approach I: Computation of Sum
      Assume $d$-bit numbers. For each bit position, maintain a stream, and approximate the number of 1's in each stream.
      Report the approximate sum as $\sum_{i=0}^{d-1} \mathrm{count}(i) \, 2^i$.
      Example stream: 5 7 2 3 9 4 1 6 11 2 4 3
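The decomposition behind Approach I can be checked with a few lines of Python. This is only an illustrative sketch: the function name `sum_via_bit_streams` is hypothetical, and it uses exact per-position counts where the approach would plug in an approximate sliding-window counter for each of the $d$ bit streams.

```python
def sum_via_bit_streams(numbers, k, d):
    """Sum of the last k numbers, rebuilt from per-bit-position counts of 1s."""
    window = numbers[-k:]          # assumes 1 <= k <= len(numbers)
    total = 0
    for i in range(d):
        # count(i): number of 1s in the bit-position-i stream within the window.
        # Exact here; an approximate counter would be used in the streaming setting.
        count_i = sum((x >> i) & 1 for x in window)
        total += count_i * (1 << i)
    return total


stream = [5, 7, 2, 3, 9, 4, 1, 6, 11, 2, 4, 3]   # example stream from the slide
print(sum_via_bit_streams(stream, k=5, d=4))      # same value as sum(stream[-5:])
```

With exact counts the reconstruction equals the true window sum, so any error in the reported sum comes only from the per-stream approximations.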

  16. Approach II: Computation of Sum
      If the next number in the stream is $x$, insert $x$ 1's into the stream.
      Example stream: 5 7 2 3 9 4 1 6 11 2 4 3

  17. What is Trending?
      Among the last $10^{12}$ movie tickets sold, list all popular movies.
      Let $c := 10^{-3}$. Maintain (decaying) scores for movies whose score is at least a threshold $\tau \in (0, 1)$. For each new ticket sale (say for movie $M$):
      1. For each movie whose score is being maintained, multiply its score by $(1 - c)$.
      2. If we have a score for $M$, add 1 to that score. Otherwise, create a new score for $M$ and initialize it to 1.
      3. Remove any score that falls below $\tau$.
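The three steps can be sketched in Python as below. The function name `record_sale`, the dictionary representation of the scores, and the default `tau=0.5` are assumptions for illustration; only $c = 10^{-3}$ and the three steps come from the slide.

```python
def record_sale(scores, movie, c=1e-3, tau=0.5):
    """Update decaying scores in place for one ticket sale of `movie`."""
    # Step 1: decay every maintained score by the factor (1 - c).
    for m in scores:
        scores[m] *= 1.0 - c
    # Step 2: add 1 to the movie's score, creating it at 1 if it is new.
    scores[movie] = scores.get(movie, 0.0) + 1.0
    # Step 3: drop any score that has fallen below the threshold tau.
    for m in [m for m, s in scores.items() if s < tau]:
        del scores[m]
    return scores


scores = {}
for sold in ["A", "B", "B", "A", "C"]:   # hypothetical ticket sales
    record_sale(scores, sold)
```

Each update touches every maintained score, so the cost per sale is proportional to the number of scores currently kept.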

  18. Questions
      1. How many scores are maintained at any given time?
      2. What is the sum of all scores at any point in time?
      3. Answer the above questions for $\tau = \frac{1}{2}$ and $\tau = \frac{1}{3}$.

  19. Conclusions
      Main References:
      1. Maintaining stream statistics over sliding windows, by Datar, Gionis, Indyk, and Motwani, SIAM Jl. Computing 2002.
      2. Chapter in MMDS book (mmds.org)
      3. Chapter on Data Streams in My Notes on Topics in Algorithm Design
