A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
Costas Busch, Rensselaer Polytechnic Institute
Srikanta Tirthapura, Iowa State University
Outline of Talk: Introduction, Algorithm, Analysis
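Data stream: each element has a value $v_i$ and a time stamp $t_i$, on a timeline from 1 to the current time $C$. In the example, the elements arrive in the order $t_1, t_2, t_3, t_5, t_4$. For simplicity, assume unit-valued elements.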
Most recent time window of duration $W$, ending at the current time $C$. Goal: compute the sum of elements with time stamps in the time window $[C - W, C]$: $\sum_{C - W \le t_i \le C} v_i$
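The goal above can be pinned down with an exact baseline that stores every element. This is only a reference sketch, not the talk's algorithm: the function name `window_sum` and the `(time stamp, value)` tuple layout are assumptions made for illustration, and the linear storage is exactly what a small-space summary avoids.

```python
def window_sum(stream, current_time, window):
    """Exact sum of values whose time stamps lie in [current_time - window, current_time]."""
    lo = current_time - window
    return sum(v for (t, v) in stream if lo <= t <= current_time)


# Unit-valued elements arriving out of time-stamp order: t = 1, 2, 3, 5, 4.
stream = [(1, 1), (2, 1), (3, 1), (5, 1), (4, 1)]
print(window_sum(stream, current_time=5, window=2))  # counts t = 3, 5, 4
```

Because the filter looks only at time stamps, the arrival order of the elements does not affect the result.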
Example I: For all packets on a network link, maintain the number of different IP sources in the last hour.
Example II: In a large database, continuously maintain averages and frequency moments.
Data stream of elements with values $v_i$ and time stamps $t_i$.
Synchronous stream: the $t_i$ arrive in ascending order.
Asynchronous stream: no order of the $t_i$ is guaranteed.
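The distinction can be checked mechanically on a list of time stamps in arrival order. A small sketch, assuming that "ascending" allows equal time stamps; the function name `is_synchronous` is hypothetical.

```python
def is_synchronous(timestamps):
    """True if the time stamps, listed in arrival order, never decrease."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


print(is_synchronous([1, 2, 3, 4, 5]))  # arrival order matches time order
print(is_synchronous([1, 2, 3, 5, 4]))  # element 4 arrives after element 5
```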
Why Asynchronous Data Streams?
• Network delay and multi-path routing: a synchronous stream entering the network can leave it as an asynchronous stream.
• Merging synchronous streams without control yields an asynchronous stream.
Processing Requirements:
• One-pass processing
• Small workspace: poly-logarithmic in the size of the data
• Fast processing time per element
• Approximate answers are OK
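Our results: a deterministic data aggregation algorithm
• Time: $O\left(\log B \cdot \frac{\log W}{\epsilon}\right)$
• Space: $O\left(\frac{\log W \log B}{\epsilon} (\log B + \log W)\right)$
• Relative error: $\frac{|X - S|}{S} \le \epsilon$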
Previous Work:
• [Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002]: deterministic, synchronous; merging buckets
• [Tirthapura, Xu, Busch. PODC, 2006]: randomized, asynchronous; random sampling
Outline of Talk: Introduction, Algorithm, Analysis
Data stream with time stamps $t_1, t_2, t_3, t_5, t_6, t_4$ (in arrival order) on a timeline from 1 to the current time $C$. For simplicity, assume unit-valued elements.
Most recent time window of duration $W$. Goal: compute the sum of elements with time stamps in the time window $[C - W, C]$.
Divide time into periods of duration $W$, with boundaries at $1, W, 2W, 3W, 4W, \dots$
The sliding window of duration $W$, from $T$ to the current time $C$, may span at most two time periods.
The sum can be written as two sub-sums, one in each of the two time periods: $S = S_1 + S_2$, with $S_1 = S_{left}$ in the left time period and $S_2 = S_{right}$ in the right time period.
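The decomposition into a left and a right part can be sketched as follows. The indexing is an assumption made for illustration: time stamps are integers starting at 1 and period $p$ (counted from 0) covers $[pW + 1, (p + 1)W]$. The names `period_of` and `split_window` are hypothetical.

```python
def period_of(t, W):
    """0-based index of the period of duration W that contains time stamp t."""
    return (t - 1) // W


def split_window(C, W):
    """Split the window [C - W, C] into its parts inside the left and right periods."""
    start = max(C - W, 1)                 # the timeline starts at 1
    p_left, p_right = period_of(start, W), period_of(C, W)
    assert p_right - p_left <= 1          # the window spans at most two periods
    if p_left == p_right:
        return [(start, C)]
    boundary = p_right * W                # last time stamp of the left period
    return [(start, boundary), (boundary + 1, C)]


print(split_window(C=10, W=4))  # window [6, 10] over periods [5, 8] and [9, 12]
```

Each returned range lies inside a single period, so its sum can be estimated by the data structure of that period alone.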
Data structure $D_{left}$ maintains an estimate of $S_{left}$ in the left time period; $D_{right}$ is the corresponding data structure for the right time period.
Without loss of generality, consider data structure $D_{left}$ in time period $[1, W]$.
Data structure $D_{left}$ consists of various levels $D_1, D_2, \dots, D_L$, where $2^L$ is an upper bound of the sum in a period.
Consider level $i$, with data structure $D_i$. The initial bucket (level 0) covers the time period $[1, W]$ and counts up to $2^{i+1}$ elements.
Each arriving element with time stamp in $[1, W]$ increases the counter value of the bucket: the counter is 1 after $t_1$, 2 after $t_2$, 3 after $t_3$, and so on, up to $2^{i+1} - 1$ after $t_{2^{i+1} - 1}$.
With element $t_{2^{i+1}}$, the counter threshold of $2^{i+1}$ is reached. Split the bucket $[1, W]$ into two buckets, $[1, W/2]$ and $[W/2 + 1, W]$, each with counter $2^i$.
The new buckets also have threshold $2^{i+1}$.
Each later element increases the counter of the appropriate bucket, that is, the bucket whose time range contains the element's time stamp. In the example, the counters of the two buckets grow from $2^i$ to $2^i + 1$, $2^i + 2$, and so on.
Splitting continues in the same way. After elements $t_1, \dots, t_m$, bucket $[1, W/2]$ has counter $x_1$ and bucket $[W/2 + 1, W]$ reaches the threshold $2^{i+1}$: split it into $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$, each with counter $2^i$.
Element $t_{m+1}$ with $W/2 + 1 \le t_{m+1} \le 3W/4$ increases the appropriate bucket to $2^i + 1$.
When bucket $[W/2 + 1, 3W/4]$ in turn reaches the threshold $2^{i+1}$, split it into $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$, each with counter $2^i$. The other buckets keep their counters.
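Splitting Tree: the root is the initial bucket $[1, W]$ with threshold $2^{i+1}$. Its children are $[1, W/2]$ and $[W/2 + 1, W]$; the children of $[W/2 + 1, W]$ are $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$; the children of $[W/2 + 1, 3W/4]$ are $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$. The leaves hold the current counters.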
Max depth of the splitting tree = $\log W$. Leaf buckets of duration 1 are not split any further.
The initial bucket $[1, W]$ may be split into many leaf buckets.
Due to space limitations, we only keep the last $a = \frac{2}{\epsilon} \log W$ leaf buckets.
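The insert-and-split behaviour of a single level can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the class name `Level` and the list layout are hypothetical, elements are unit-valued, a bucket that reaches $2^{i+1}$ hands $2^i$ to each half, buckets of duration 1 simply keep counting, and the discarding of old leaf buckets is not modeled.

```python
class Level:
    """One level of the structure: leaf buckets over [1, W] sharing a split threshold."""

    def __init__(self, i, W):
        self.threshold = 2 ** (i + 1)   # split threshold 2^(i+1)
        self.buckets = [[1, W, 0]]      # leaf buckets [left, right, count], ordered by time

    def insert(self, t):
        """Count one unit-valued element with time stamp t, splitting its bucket if needed."""
        for idx, (left, right, count) in enumerate(self.buckets):
            if left <= t <= right:
                count += 1
                if count == self.threshold and left < right:
                    mid = (left + right) // 2
                    half = count // 2   # each child bucket starts at 2^i
                    self.buckets[idx:idx + 1] = [[left, mid, half], [mid + 1, right, half]]
                else:
                    self.buckets[idx][2] = count
                return
        raise ValueError(f"time stamp {t} is outside the period")


level = Level(i=1, W=8)                 # threshold 2^(1+1) = 4
for t in [1, 1, 1, 1, 6, 7]:
    level.insert(t)
print(level.buckets)
```

In this run all four elements with time stamp 1 are counted before the first split, yet the split gives two of those counts to bucket $[5, 8]$. That is the kind of misplacement that the analysis has to bound.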
Suppose we want to find the sum $S$ of elements in time period $[T, W]$.
Consider the various levels of splitting threshold $2^1, 2^2, \dots, 2^k, 2^{k+1}$, each keeping its last $a$ leaf buckets.
Use the first level with a leaf bucket that intersects the timeline at $T$, OR the first level with a leaf bucket on the right of the timeline.
Estimate of $S$ at that level: $X = x_1 + x_2 + \dots + x_z$, the sum of the counters of the $z$ buckets on the right of the timeline.
Outline of Talk: Introduction, Algorithm, Analysis
Suppose that we use the level with splitting threshold $2^{i+1}$ in order to compute the estimate of $S$ in $[T, W]$.
Consider splitting threshold level $2^{i+1}$. A data element with time stamp $t_k$ is counted in the appropriate bucket $b$ with time range $[t_l, t_r]$: its counter goes from $x$ to $x + 1$.
We can assume that the element is placed in the respective bucket: $t_l \le t_k \le t_r$.
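We can assume that when the bucket $[t_l, t_r]$ splits at threshold $2^{i+1}$ into the child buckets $\left[t_l, \frac{t_l + t_r}{2}\right]$ and $\left[\frac{t_l + t_r}{2} + 1, t_r\right]$, each with counter $2^i$, the element is placed in an arbitrary child bucket.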
Suppose the element is placed in the left child bucket.
If $t_l \le t_k \le \frac{t_l + t_r}{2}$: GOOD! Element counted in correct bucket.
If $\frac{t_l + t_r}{2} + 1 \le t_k \le t_r$: BAD! Element counted in wrong bucket.
Consider leaf buckets and an element $t_k$ counted in a bucket on the right of $T$.
If $T \le t_k \le W$: GOOD!
If $t_k < T$: BAD! Element counted in wrong bucket.
Consider leaf buckets: $X = S + |Z_1| - |Z_2|$
$Z_1$: elements of left part counted on right
$Z_2$: elements of right part counted on left
$t_k \in Z_1$: elements of left part counted on right. Each must have been initially inserted in one of the buckets highlighted in the figure on the time period $[1, W]$.
Since the tree depth is at most $\log W$: $|Z_1| = O(2^i \log W)$
Similarly, we can prove $|Z_2| = O(2^i \log W)$
Therefore: $|X - S| = \bigl| |Z_1| - |Z_2| \bigr| = O(2^i \log W)$
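The steps of the absolute-error bound can be written out as one chain. This is a sketch of the argument as the slides present it, with the intermediate inequality spelled out. The comment in the block is one plausible reading of why the tree depth enters the bound, not something the slides state explicitly.

```latex
% Assumed reading: an element can end up on the wrong side of T only through
% a split of a bucket whose time range contains T; there is at most one such
% bucket per tree depth, and each split hands 2^i counts to each child.
\begin{align*}
X &= S + |Z_1| - |Z_2| \\
|X - S| &= \bigl|\, |Z_1| - |Z_2| \,\bigr| \le \max\bigl(|Z_1|, |Z_2|\bigr) \\
|Z_1| &= O(2^i \log W), \qquad |Z_2| = O(2^i \log W) \\
\Rightarrow \quad |X - S| &= O(2^i \log W)
\end{align*}
```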
Since $a = \frac{2}{\epsilon} \log W$, it can be proven that $S = \Omega\left(\frac{2^i \log W}{\epsilon}\right)$
Combined with $|X - S| = O(2^i \log W)$, we obtain relative error: $\frac{|X - S|}{S} \le \epsilon$