Stream Statistics Over Sliding Window
Anil Maheshwari
anil@scs.carleton.ca
School of Computer Science, Carleton University, Canada
Outline
1. Introduction
2. Algorithm
3. Sum Problem
4. Trends
5. References
Problem Setting
Main Problem: The input is an endless stream of binary bits. At any time, among the last N bits received, we are interested in queries that seek an (approximate) count of the number of 1's among the last k bits of the stream, where k ≤ N.
Result: A data structure of size $O(\frac{1}{\epsilon}\log^2 N)$ bits that can approximate the count of the number of 1's within a factor of $1 \pm \epsilon$.
Reference: Maintaining stream statistics over sliding windows, by Datar, Gionis, Indyk, and Motwani, SIAM Journal on Computing, 2002.
Variants
1. A stream of positive numbers. The query consists of a value k ∈ {1, ..., N}, and we want to know the (approximate) sum of the last k numbers in the stream.
2. A stream consisting of numbers from the set {−1, 0, +1}. We want to maintain the sum of the last N numbers of the stream. (Approximating this sum within a constant factor of the exact sum requires Ω(N) bits of storage.)
3. What are the most popular movies in the last week?
4. What is trending in the last week?
5. ...
Main Problem
Report an approximate count of the number of 1's among the last k bits of a binary stream, where k ≤ N.
What about an exact count?
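For comparison with the approximate structure, here is a minimal sketch of the obvious exact approach: keep the last N bits in a ring buffer (a Python deque with maxlen=N) and count the 1's among the newest k bits on demand. It stores all N bits and spends O(k) time per query. The class name ExactWindowCounter is hypothetical.

```python
from collections import deque
from itertools import islice


class ExactWindowCounter:
    """Exact count of 1's among the last k <= N bits (hypothetical baseline)."""

    def __init__(self, n):
        self.n = n
        self.window = deque(maxlen=n)  # appending past N drops the oldest bit

    def add(self, bit):
        self.window.append(bit)

    def count(self, k):
        if not 1 <= k <= self.n:
            raise ValueError("k must satisfy 1 <= k <= N")
        # Sum the newest k bits (all of them if fewer than k have arrived).
        return sum(islice(reversed(self.window), k))


counter = ExactWindowCounter(n=8)
for b in [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0]:
    counter.add(b)
print(counter.count(4), counter.count(8))  # 2 4
```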
Data Structure
The algorithm uses two data structures:
- Time stamps: to track the most recent N bits.
- Buckets: O(log N) buckets maintain the 1's among the latest N bits.
Update
[Figure: the example stream 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 0, shown again after each of the next bits 0, 1, 1 arrives.]
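The update step can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the exact procedure in the lecture: timestamps start at 1; each bucket is a tuple (start, end, i) recording the timestamps of its oldest and newest 1 and its size 2^i; and at most two buckets of each size are kept, with a third triggering a merge of the two oldest (assumed to be the r = 2 case of the rule on the Improvements slide). The function name add_bit is hypothetical. The usage replays the example stream 1101101101011100 followed by the bits 0, 1, 1.

```python
def add_bit(buckets, now, bit, n):
    """Process the bit that arrives at time `now` (hypothetical helper).

    buckets: list of (start, end, i) tuples ordered oldest to newest; a bucket
    holds 2**i ones, and start/end are the timestamps of its oldest/newest 1.
    """
    # Drop buckets whose newest 1 has fallen out of the last n positions.
    while buckets and buckets[0][1] <= now - n:
        buckets.pop(0)
    if bit == 0:
        return  # a 0 only advances time
    buckets.append((now, now, 0))  # a new bucket of size 2**0 = 1
    i = 0
    while True:
        same = [j for j, b in enumerate(buckets) if b[2] == i]
        if len(same) <= 2:
            break  # at most two buckets of size 2**i: invariant restored
        # Bucket sizes are non-increasing from oldest to newest, so the two
        # oldest buckets of size 2**i sit next to each other in the list.
        j = same[0]
        older, newer = buckets[j], buckets[j + 1]
        buckets[j:j + 2] = [(older[0], newer[1], i + 1)]
        i += 1  # the merge may create a third bucket of the next size


buckets = []
for t, bit in enumerate([1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0], start=1):
    add_bit(buckets, t, bit, n=16)
print(buckets)
# [(1, 5, 2), (7, 8, 1), (10, 12, 1), (13, 13, 0), (14, 14, 0)]
for t, bit in zip((17, 18, 19), (0, 1, 1)):
    add_bit(buckets, t, bit, n=16)
print(buckets)
# [(1, 5, 2), (7, 12, 2), (13, 14, 1), (18, 18, 0), (19, 19, 0)]
```

The 1 arriving at time 18 causes a cascade: three buckets of size 1 leave the two oldest to merge into a bucket of size 2, which makes three buckets of size 2, so the two oldest of those merge into a bucket of size 4.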
Complexity Analysis
1. Space: O(log² N) bits
2. Total time (per update): O(log N)
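A short accounting behind both bounds, assuming (as in the paper by Datar et al.) that timestamps are stored modulo N, that each bucket records its size by the exponent i rather than by 2^i, and that there are O(1) buckets of each of the O(log N) sizes. A new 1 can trigger at most one merge per size.

```latex
\begin{aligned}
\text{space} &= \underbrace{O(\log N)}_{\#\text{buckets}} \cdot \Big( \underbrace{O(\log N)}_{\text{timestamp} \bmod N} + \underbrace{O(\log\log N)}_{\text{exponent } i} \Big) = O(\log^2 N) \text{ bits},\\
\text{time per update} &= \underbrace{O(1)}_{\text{merges per size}} \cdot \underbrace{O(\log N)}_{\text{sizes}} = O(\log N).
\end{aligned}
```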
Answering Query
Query Problem: For any query value k ∈ {1, ..., N}, report an approximate count of the number of 1's among the latest k bits of the stream.
1. Initialize C := 0.
2. Traverse the buckets from right to left. For each bucket of type B_i encountered in the traversal:
   (a) B_i is completely contained in the window: C := C + 2^i.
   (b) B_i is completely outside the window: C remains unchanged.
   (c) B_i partially overlaps the window: C := C + 2^i/2.
3. Report C as the approximate count.
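A minimal sketch of the query procedure, assuming buckets are stored as (start, end, i) tuples ordered oldest to newest, where start and end are the timestamps of the bucket's oldest and newest 1. Storing both ends is an assumption of this sketch; it makes the three cases easy to tell apart. The function name approx_count is hypothetical. The bucket list in the usage is the one produced for the stream 1101101101011100 at time 16 when at most two buckets of each size are kept.

```python
def approx_count(buckets, now, k):
    """Approximate number of 1's among the bits with timestamps now-k+1 .. now.

    buckets: list of (start, end, i) tuples ordered oldest to newest, where
    bucket B_i holds 2**i ones with its oldest/newest 1 at start/end.
    """
    boundary = now - k  # timestamps <= boundary lie outside the window
    c = 0.0
    for start, end, i in reversed(buckets):  # right to left = newest first
        if start > boundary:
            c += 2 ** i  # completely contained in the window
        elif end <= boundary:
            break  # completely outside; every older bucket is outside too
        else:
            c += 2 ** i / 2  # partial overlap: count half the bucket
    return c


# Buckets for the stream 1101101101011100 (timestamps 1..16).
buckets = [(1, 5, 2), (7, 8, 1), (10, 12, 1), (13, 13, 0), (14, 14, 0)]
for k in (4, 6, 12, 16):
    print(k, approx_count(buckets, now=16, k=k))
```

For k = 12 the bucket B_2 covering timestamps 1 to 5 straddles the window boundary, so it contributes 2 although only one of its 1's is in the window: the estimate is 8 against a true count of 7.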
Analysis of Approximation Factor
2-factor approximation: Let C* be the true number of 1's in the query window of size k. Then
$\frac{1}{2} \le \frac{C}{C^*} \le 2$.
Improvements
Let r > 2 be an integer parameter.
- Maintain r − 1 or r copies of B_i for each i ≥ 1 (B_0 and the largest bucket may have fewer).
- Whenever there are more than r copies of some bucket type, take the oldest two and merge them to form a new bucket of the next size.
- Answer queries as before.
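Improvements (contd.)
Claim: For this setting, $1 - \frac{1}{r-1} \le \frac{C}{C^*} \le 1 + \frac{1}{r-1}$.
If $r = 1 + \frac{1}{\epsilon}$, we obtain a data structure of size $O(\frac{1}{\epsilon}\log^2 N)$ that approximates the count of the number of 1's within a factor of $1 \pm \epsilon$.
Suppose the query window partially overlaps a bucket of type B_j. Its newest 1 lies in the window, and at least r − 1 copies of each B_i with 1 ≤ i ≤ j − 1 are newer than B_j and lie entirely inside the window, so
True count $\ge 1 + (r-1)\sum_{i=1}^{j-1} 2^i$.
Only B_j contributes error: between 1 and $2^j - 1$ of its 1's lie in the window, and the query adds $2^{j-1}$ for it, so
Error $\le 2^{j-1} - 1$.
Since $\sum_{i=1}^{j-1} 2^i = 2^j - 2 = 2(2^{j-1} - 1)$, the true count is at least $2(r-1)(2^{j-1}-1)$. Therefore,
$\frac{\text{Error}}{\text{True count}} \le \frac{2^{j-1}-1}{1+(r-1)\sum_{i=1}^{j-1} 2^i} \le \frac{1}{r-1}$.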
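To check the claim empirically, the sketch below combines the update and query steps with a general parameter r and compares the estimate against exact counts for every window size k ≤ N on a seeded random bit stream. It is a sketch under assumptions: the class and function names are hypothetical, buckets store both end timestamps as (start, end, i), merging follows the rule on the Improvements slide (more than r buckets of one size triggers a merge of the two oldest), and r = 2 is taken to be the basic scheme.

```python
import random


class SlidingWindowCounter:
    """DGIM-style counter keeping r - 1 or r buckets of each size (a sketch).

    Buckets are (start, end, i) tuples ordered oldest to newest: 2**i ones whose
    oldest and newest 1 arrived at timestamps start and end.
    """

    def __init__(self, n, r):
        self.n, self.r = n, r  # window size N and copies parameter r
        self.now = 0
        self.buckets = []

    def add(self, bit):
        self.now += 1
        # Expire the oldest bucket once its newest 1 leaves the last N bits.
        while self.buckets and self.buckets[0][1] <= self.now - self.n:
            self.buckets.pop(0)
        if not bit:
            return
        self.buckets.append((self.now, self.now, 0))
        i = 0
        while True:
            same = [j for j, b in enumerate(self.buckets) if b[2] == i]
            if len(same) <= self.r:
                break
            # Sizes are non-increasing from oldest to newest, so the two
            # oldest buckets of size 2**i are adjacent; merge them.
            j = same[0]
            older, newer = self.buckets[j], self.buckets[j + 1]
            self.buckets[j:j + 2] = [(older[0], newer[1], i + 1)]
            i += 1

    def query(self, k):
        boundary = self.now - k  # timestamps <= boundary are outside
        c = 0.0
        for start, end, i in reversed(self.buckets):
            if start > boundary:
                c += 2 ** i  # fully inside
            elif end <= boundary:
                break  # this and all older buckets are outside
            else:
                c += 2 ** i / 2  # partial overlap
        return c


def worst_relative_error(r, n=64, steps=1000, seed=0):
    """Largest |C/C* - 1| seen over all k <= n after every arrival."""
    rng = random.Random(seed)
    counter, bits, worst = SlidingWindowCounter(n, r), [], 0.0
    for _ in range(steps):
        bit = rng.randint(0, 1)
        counter.add(bit)
        bits.append(bit)
        true = 0
        for k in range(1, n + 1):
            if k <= len(bits):
                true += bits[-k]  # exact count of 1's in the last k bits
            est = counter.query(k)
            if true == 0:
                assert est == 0  # no 1's in the window: estimate is exact
            else:
                worst = max(worst, abs(est / true - 1))
    return worst


for r in (2, 3, 5, 11):
    print(f"r = {r}: worst error {worst_relative_error(r):.3f}, bound {1 / (r - 1):.3f}")
```

With r = 11, that is ε = 0.1, the observed worst relative error should not exceed 0.1. The list scans keep the code short; they do not achieve the O(log N) update time of a careful implementation.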
Computation of Sum
The Sum Problem: A stream of positive numbers. The query consists of a value k ∈ {1, ..., N}, and we want to know the (approximate) sum of the last k numbers in the stream.
Example stream: 5 7 2 3 9 4 1 6 11 2 4 3
Approach I: Computation of Sum
If the next number in the stream is x, insert x 1's into the bit stream.
Example stream: 5 7 2 3 9 4 1 6 11 2 4 3
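A minimal sketch of Approach I, assuming the window is measured in numbers: each arriving number x becomes x 1's that all carry the arrival time of x, so the sum of the last k numbers equals the number of 1's whose timestamp is among the last k. To keep the sketch short, the 1's are counted exactly with a deque; in the approach itself they would be fed to the approximate 1's counter. Note the O(x) work per number. The class name ExpandedSum is hypothetical.

```python
from collections import deque


class ExpandedSum:
    """Approach I sketch: number x arrives as x 1's sharing one timestamp."""

    def __init__(self, n):
        self.n = n  # window size N, measured in numbers
        self.now = 0
        self.ones = deque()  # arrival time of every stored 1, oldest first

    def add(self, x):
        self.now += 1
        self.ones.extend([self.now] * x)  # O(x) work for the number x
        # Forget 1's whose number is no longer among the last N numbers.
        while self.ones and self.ones[0] <= self.now - self.n:
            self.ones.popleft()

    def query(self, k):
        # The 1's belonging to the last k numbers add up to their sum.
        boundary = self.now - k
        return sum(1 for t in self.ones if t > boundary)


s = ExpandedSum(n=12)
for x in [5, 7, 2, 3, 9, 4, 1, 6, 11, 2, 4, 3]:
    s.add(x)
print(s.query(3), s.query(5), s.query(12))  # 9 26 57
```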
Approach II: Computation of Sum
Assume d-bit numbers. For each bit position i, maintain a separate stream consisting of the i-th bits. Let C_i be the approximate number of 1's in stream i. Report the approximate sum as $\sum_{i=0}^{d-1} 2^i C_i$.
Example (d = 4; row 2^i lists bit i of each number x):
x    5  7  2  3  9  4  1  6 11  2  4  3
2^3  0  0  0  0  1  0  0  0  1  0  0  0
2^2  1  1  0  0  0  1  0  1  0  0  1  0
2^1  0  1  1  1  0  0  0  1  1  1  0  1
2^0  1  1  0  1  1  0  1  0  1  0  0  1
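Approach II can be sketched as follows, for illustration only: each number is split into d bit streams, and the answer is assembled as the weighted sum of the per-stream counts C_i. Here each C_i is an exact count over the last k entries; in the approach itself it would come from the approximate 1's counter run on stream i. The function names are hypothetical. The printout lists the rows of the example bit table.

```python
def bit_streams(numbers, d):
    """Split each d-bit number into d bit streams; stream i holds bit i."""
    return [[(x >> i) & 1 for x in numbers] for i in range(d)]


def sum_last_k(streams, k):
    """Return the sum over i of 2**i * C_i, with C_i the number of 1's among
    the last k entries of stream i (exact here; approximate in the slide)."""
    return sum((1 << i) * sum(stream[-k:]) for i, stream in enumerate(streams))


numbers = [5, 7, 2, 3, 9, 4, 1, 6, 11, 2, 4, 3]
streams = bit_streams(numbers, d=4)
for i in reversed(range(4)):
    print(f"2^{i}:", *streams[i])
print(sum_last_k(streams, 12), sum(numbers))  # 57 57
```

Because every term 2^i C_i is nonnegative, if each C_i is within a factor 1 ± ε of the true count of stream i, the reported sum is also within a factor 1 ± ε of the true sum.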
What is Trending?
Among the last $10^{12}$ movie tickets sold, list all popular movies.
Let $c := 10^{-3}$. Maintain (decaying) scores for movies whose score is at least $\tau \in (0, 1)$. For each new ticket sale (say for movie M):
1. Multiply the score of each movie whose score is being maintained by (1 − c).
2. If we have a score for M, add 1 to that score. Otherwise, create a new score for M and initialize it to 1.
3. Remove any score that falls below τ.
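A minimal sketch of the decaying-score rule, assuming scores are kept in a Python dict keyed by movie. The class name DecayingScores is hypothetical, and the example uses c = 0.5 rather than 10^-3 only so that the decay is visible after a few sales.

```python
class DecayingScores:
    """Decaying scores with decay constant c and threshold tau (a sketch)."""

    def __init__(self, c, tau):
        self.c, self.tau = c, tau
        self.scores = {}

    def sale(self, movie):
        # Step 1: every maintained score is multiplied by (1 - c).
        for m in self.scores:
            self.scores[m] *= 1 - self.c
        # Step 2: add 1 to the movie's score, creating it with value 1 if absent.
        self.scores[movie] = self.scores.get(movie, 0.0) + 1
        # Step 3: drop every score that fell below the threshold tau.
        self.scores = {m: s for m, s in self.scores.items() if s >= self.tau}


tracker = DecayingScores(c=0.5, tau=0.3)
for movie in ["A", "B", "A", "C"]:
    tracker.sale(movie)
print(tracker.scores)  # {'A': 0.625, 'C': 1.0}
```

The fourth sale decays B's score from 0.5 to 0.25, below τ = 0.3, so B is dropped. Each sale touches every stored score, mirroring steps 1 to 3 literally.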
Questions
1. How many scores are maintained at any given time?
2. What is the sum of all scores at any point in time?
3. Answer the above questions for τ = 1/2 and τ = 1/3.
Variants
1. Min/Max
2. Streams with ± numbers
3. Lower bounds: the results are more or less optimal up to constant factors.
4. ...
Conclusions
Main references:
1. Datar, Gionis, Indyk, and Motwani, Maintaining stream statistics over sliding windows, SIAM Journal on Computing, 2002.
2. Chapter in the MMDS book (mmds.org).
3. Chapter on Data Streams in My Notes on Topics in Algorithm Design.