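A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window: Costas Busch (Rensselaer Polytechnic Institute), Srikanta Tirthapura (Iowa State University). PowerPoint presentation.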

  1. A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window. Costas Busch, Rensselaer Polytechnic Institute; Srikanta Tirthapura, Iowa State University

  2. Outline of Talk: Introduction, Algorithm, Analysis

  3. Data stream: elements $v_1, v_2, v_3, v_4, v_5$ with time stamps $t_1, \ldots, t_5$, on a time axis running from $1$ to the current time $C$. For simplicity, assume unit-valued elements.

  4. Goal: for the most recent time window of duration $W$, i.e. $[C - W, C]$ where $C$ is the current time, compute the sum of the elements whose time stamps fall in the window: $\sum_{i \,:\, C - W \le t_i \le C} v_i$.
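
As a reference point, the quantity in the goal can be computed exactly by storing every (time stamp, value) pair and scanning them at query time; this needs space linear in the stream, which is exactly what a small summary avoids. A minimal Python sketch (the function name `exact_window_sum` is hypothetical, not from the talk):

```python
from typing import Iterable, Tuple


def exact_window_sum(stream: Iterable[Tuple[int, int]], C: int, W: int) -> int:
    """Exact sum of v_i over all elements with C - W <= t_i <= C.

    Scans every stored (t_i, v_i) pair, so it uses space linear in the stream;
    it is only the ground truth that a small summary approximates.
    """
    return sum(v for t, v in stream if C - W <= t <= C)


# Unit-valued elements; arrival order need not follow time-stamp order.
stream = [(1, 1), (2, 1), (3, 1), (5, 1), (4, 1)]
print(exact_window_sum(stream, C=5, W=2))  # prints 3: time stamps 3, 5, 4 lie in [3, 5]
```

Because the sum only inspects time stamps, the same function works for synchronous and asynchronous arrival orders.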

  5. Example I: for all packets on a network link, maintain the number of distinct IP sources seen in the last hour. Example II: in a large database, continuously maintain averages and frequency moments.

  6. Synchronous stream: the time stamps $t_i$ arrive in ascending order. Asynchronous stream: no order of the $t_i$ is guaranteed.

  7. Why asynchronous data streams? A synchronous stream sent through a network can arrive as an asynchronous stream because of network delay and multi-path routing; likewise, merging synchronous streams without control produces an asynchronous stream.

  8. Processing requirements:
     • One-pass processing
     • Small workspace: polylogarithmic in the size of the data
     • Fast processing time per element
     • Approximate answers are acceptable

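  9. Our results: a deterministic data aggregation algorithm with
     • Time per element: $O\left(\log B \cdot \log\frac{\log W}{\epsilon}\right)$
     • Space: $O\left(\frac{\log W \log B}{\epsilon}(\log W + \log B)\right)$
     • Relative error: $\frac{|X - S|}{S} \le \epsilon$
     where $X$ is the estimate, $S$ the exact sum, and $B$ an upper bound on the sum in a period.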

  10. Previous work: [Datar, Gionis, Indyk, Motwani, SIAM Journal on Computing, 2002]: deterministic, synchronous streams, merging buckets. [Tirthapura, Xu, Busch, PODC 2006]: randomized, asynchronous streams, random sampling.

  11. Outline of Talk: Introduction, Algorithm, Analysis

  12. Data stream: elements with time stamps $t_1, \ldots, t_6$ up to the current time $C$. For simplicity, assume unit-valued elements.

  13. Goal: compute the sum of the elements with time stamps in the most recent time window of duration $W$, namely $[C - W, C]$, where $C$ is the current time.

  14. Divide time into consecutive periods of duration $W$, ending at $W, 2W, 3W, 4W, \ldots$

  15. The sliding window of duration $W$ ending at the current time $C$ may span at most two time periods.

  16. The sum can therefore be written as two sub-sums over the two time periods: $S = S_{\mathrm{left}} + S_{\mathrm{right}}$.
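
The decomposition can be sketched as follows, assuming time stamps start at 1 and period $p$ covers $[(p-1)W + 1, pW]$; the helper names `period_of` and `split_window` are hypothetical. Because $C$ and $C - W$ are exactly $W$ apart, they fall in adjacent periods unless the window is clipped at time 1.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]


def period_of(t: int, W: int) -> int:
    """1-based index p of the period [(p - 1) * W + 1, p * W] that contains t >= 1."""
    return (t - 1) // W + 1


def split_window(C: int, W: int) -> Tuple[Interval, Optional[Interval]]:
    """Split the window [C - W, C] into its parts in the left and right periods."""
    lo = max(1, C - W)                      # time starts at 1 in this sketch
    p_left, p_right = period_of(lo, W), period_of(C, W)
    assert p_right - p_left <= 1            # at most two periods are spanned
    if p_left == p_right:
        return (lo, C), None
    boundary = p_left * W                   # last time stamp of the left period
    return (lo, boundary), (boundary + 1, C)


print(split_window(C=10, W=4))  # ((6, 8), (9, 10))
```

Shifted into the left period's own coordinates (subtracting $(p-1)W$), the left part $(6, 8)$ becomes $[2, 4]$, i.e. an interval of the form $[T, W]$ inside $[1, W]$.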

  17. Data structure $D_{\mathrm{left}}$ maintains an estimate of $S_{\mathrm{left}}$ in the left time period, and $D_{\mathrm{right}}$ does the same for $S_{\mathrm{right}}$ in the right time period.

  18. Without loss of generality, consider data structure $D_{\mathrm{left}}$ in time period $[1, W]$.

  19. Data structure $D_{\mathrm{left}}$ consists of various levels $D_1, D_2, \ldots, D_L$, where $2^L$ is an upper bound on the sum in a period.

  20. Consider level $D_i$. It starts with a single bucket (at level 0 of the splitting tree) covering the whole time period $[1, W]$; the bucket counts up to $2^{i+1}$ elements.

  21. t 1 t  W  Stream: 1 1 1 1 W Increase counter value 21

  22. t t 1 t  W  Stream: 1 2 2 2 1 W Increase counter value 22

  23. t t t 1 t  W  Stream: 1 2 3 3 3 1 W Increase counter value 23

  24. Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}$. Keep increasing the counter value (now $2^{i+1} - 1$).

  25. Stream: $t_1, \ldots, t_{2^{i+1}}$. The counter threshold of $2^{i+1}$ is reached: split the bucket into $[1, \frac{W}{2}]$ and $[\frac{W}{2} + 1, W]$, each with counter $2^i$.

  26. The new buckets also have threshold $2^{i+1}$.
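
The counting-and-splitting rule of a single level can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it assumes $W$ is a power of two, splits a bucket $[t_l, t_r]$ at $\frac{t_l + t_r - 1}{2}$, and hands each child half of the counter (the names `Bucket` and `insert` are hypothetical). Space limits are ignored here, so the list of buckets can grow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    lo: int         # first time stamp covered by the bucket
    hi: int         # last time stamp covered by the bucket
    count: int = 0  # counter value


def insert(buckets: List[Bucket], t: int, threshold: int) -> None:
    """Count time stamp t in the bucket whose interval contains it, and split
    that bucket into two halves once its counter reaches the threshold."""
    for idx, b in enumerate(buckets):
        if b.lo <= t <= b.hi:
            b.count += 1
            if b.count >= threshold and b.hi > b.lo:   # duration-1 leaves never split
                mid = (b.lo + b.hi - 1) // 2           # children [lo, mid], [mid + 1, hi]
                half = b.count // 2                    # 2**i each when threshold = 2**(i+1)
                buckets[idx:idx + 1] = [Bucket(b.lo, mid, half),
                                        Bucket(mid + 1, b.hi, b.count - half)]
            return
    raise ValueError(f"time stamp {t} lies outside the period")


W = 8
buckets = [Bucket(1, W)]
for t in [7, 8, 7, 8, 6, 6]:
    insert(buckets, t, threshold=4)     # level i = 1, threshold 2**(i + 1)
print([(b.lo, b.hi, b.count) for b in buckets])
# [(1, 4, 2), (5, 6, 2), (7, 8, 2)]
```

The total of 6 is preserved, but $[1, 4]$ is credited with 2 elements although none of the time stamps lies in it, while $[7, 8]$ is short by 2. This misattribution at split time is what the analysis later bounds.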

  27. Stream: $t_1, \ldots, t_{2^{i+1}}, t_{2^{i+1}+1}$. Increase the counter of the appropriate bucket, i.e. the one whose interval contains the new time stamp.

  28. Stream: $t_1, \ldots, t_{2^{i+1}+2}$. Increase the counter of the appropriate bucket.

  29. Stream: $t_1, \ldots, t_{2^{i+1}+3}$. Increase the counter of the appropriate bucket.

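  30. Stream: $t_1, \ldots, t_m$. Bucket $[\frac{W}{2} + 1, W]$ reaches the threshold $2^{i+1}$ and splits into $[\frac{W}{2} + 1, \frac{3W}{4}]$ and $[\frac{3W}{4} + 1, W]$, each with counter $2^i$, while bucket $[1, \frac{W}{2}]$ holds counter $x_1$.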

  31. Buckets after the split: $[1, \frac{W}{2}]$ with $x_1$, $[\frac{W}{2} + 1, \frac{3W}{4}]$ with $2^i$, and $[\frac{3W}{4} + 1, W]$ with $2^i$.

  32. Stream: $t_1, \ldots, t_m, t_{m+1}, \ldots$ Increase the counter of the appropriate bucket.

  33. As the stream continues, bucket $[\frac{W}{2} + 1, \frac{3W}{4}]$ reaches the threshold and splits into $[\frac{W}{2} + 1, \frac{5W}{8}]$ and $[\frac{5W}{8} + 1, \frac{3W}{4}]$, each with counter $2^i$.

  34. Buckets after the split: $[1, \frac{W}{2}]$ with $x_1$; $[\frac{W}{2} + 1, \frac{5W}{8}]$ and $[\frac{5W}{8} + 1, \frac{3W}{4}]$ with $2^i$ each; and $[\frac{3W}{4} + 1, W]$ with its current counter.

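  35. Splitting tree: the root $[1, W]$ (threshold $2^{i+1}$) has children $[1, \frac{W}{2}]$ and $[\frac{W}{2} + 1, W]$; the latter has children $[\frac{W}{2} + 1, \frac{3W}{4}]$ and $[\frac{3W}{4} + 1, W]$; and $[\frac{W}{2} + 1, \frac{3W}{4}]$ has children $[\frac{W}{2} + 1, \frac{5W}{8}]$ and $[\frac{5W}{8} + 1, \frac{3W}{4}]$. The leaves are the current buckets, each with its counter.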

  36. The maximum depth of the tree is $\log W$: leaf buckets of duration 1, such as $[t, t]$, are not split any further.
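
Why the depth is $\log W$: each split halves a bucket's duration, so the buckets that contain a given time stamp form a chain of nested intervals from $[1, W]$ down to a duration-1 leaf. A small sketch (hypothetical helper `path_to_leaf`, $W$ assumed to be a power of two, same split point $\frac{t_l + t_r - 1}{2}$ as above):

```python
from typing import List, Tuple


def path_to_leaf(t: int, W: int) -> List[Tuple[int, int]]:
    """Intervals of the full splitting tree that contain t, from the root [1, W]
    down to the leaf [t, t]. Assumes W is a power of two and 1 <= t <= W."""
    lo, hi = 1, W
    path = [(lo, hi)]
    while lo < hi:                          # stop at a duration-1 leaf
        mid = (lo + hi - 1) // 2            # children [lo, mid] and [mid + 1, hi]
        lo, hi = (lo, mid) if t <= mid else (mid + 1, hi)
        path.append((lo, hi))
    return path


print(path_to_leaf(11, 16))
# [(1, 16), (9, 16), (9, 12), (11, 12), (11, 11)]: 4 splits = log2(16)
```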

  37. The initial bucket $[1, W]$ may be split into many leaf buckets.

  38. Due to space limitations, we only keep the last (most recent) $\alpha = \frac{2}{\epsilon}\log W$ leaf buckets.
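
A sketch of the retention rule, taking $\alpha = \frac{2}{\epsilon}\log W$ rounded up and $\log$ as base 2 (both assumptions of this sketch); `alpha_for` and `keep_last` are hypothetical names. Buckets are kept ordered by time, so the most recent ones sit at the end of the list.

```python
import math
from typing import List, Tuple

Bucket = Tuple[int, int, int]   # (lo, hi, count)


def alpha_for(W: int, eps: float) -> int:
    """Number of buckets kept per level: (2 / eps) * log2(W), rounded up."""
    return math.ceil(2 * math.log2(W) / eps)


def keep_last(buckets: List[Bucket], alpha: int) -> List[Bucket]:
    """Discard the oldest (leftmost) buckets so that at most alpha remain."""
    return buckets[max(0, len(buckets) - alpha):]


print(alpha_for(W=1024, eps=0.5))                        # 40
print(keep_last([(1, 4, 2), (5, 6, 2), (7, 8, 2)], 2))   # [(5, 6, 2), (7, 8, 2)]
```

Elements whose time stamps fall to the left of the retained buckets can no longer be counted at that level, which is why a query may have to use a level with a larger threshold.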

  39. Suppose we want to find the sum $S$ of the elements in time period $[T, W]$.

  40. Consider the various levels, with splitting thresholds $2^1, 2^2, \ldots, 2^k, 2^{k+1}, \ldots$, each of which keeps its last $\alpha$ buckets.

  41. Find the first level (smallest threshold) with a leaf bucket that intersects the timeline at $T$.

  42. Estimate of $S$: $X = x_1 + x_2 + \cdots + x_z$, the sum of the counters of the buckets to the right of the timeline, where $z \le \alpha$.

  43. Alternatively (OR): take the first level with a leaf bucket on the right of the timeline.
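
Putting the pieces together, the following sketch keeps one level per threshold $2^{i+1}$ for $i = 0, \ldots, L$ with $2^L \ge B$, and answers a query for $[T, W]$ from the first level whose retained buckets still cover $T$, summing the buckets that start at or after $T$. It is a simplified illustration under our own assumptions ($W$ a power of two, unit-valued elements, $\alpha = \lceil \frac{2}{\epsilon}\log_2 W \rceil$, elements in discarded regions ignored, the bucket straddling $T$ left out of the sum), not the authors' implementation; `Level` and `PeriodSummary` are hypothetical names.

```python
import math
from typing import List


class Level:
    """One level of the summary over a single period [1, W] (illustrative sketch)."""

    def __init__(self, W: int, threshold: int, alpha: int):
        self.threshold = threshold                    # splitting threshold 2**(i + 1)
        self.alpha = alpha                            # number of most recent buckets kept
        self.buckets: List[List[int]] = [[1, W, 0]]   # [lo, hi, count], left to right

    def insert(self, t: int) -> None:
        for idx, (lo, hi, cnt) in enumerate(self.buckets):
            if lo <= t <= hi:
                cnt += 1
                if cnt >= self.threshold and hi > lo:
                    mid = (lo + hi - 1) // 2
                    self.buckets[idx:idx + 1] = [[lo, mid, cnt // 2],
                                                 [mid + 1, hi, cnt - cnt // 2]]
                    # Discard the oldest (leftmost) buckets beyond alpha.
                    del self.buckets[:max(0, len(self.buckets) - self.alpha)]
                else:
                    self.buckets[idx][2] = cnt
                return
        # No retained bucket contains t: its region was already discarded, so t is ignored.


class PeriodSummary:
    """Levels i = 0..L with thresholds 2**(i + 1), where 2**L >= B."""

    def __init__(self, W: int, B: int, eps: float):
        alpha = math.ceil(2 * math.log2(W) / eps)
        L = math.ceil(math.log2(B))
        self.levels = [Level(W, 2 ** (i + 1), alpha) for i in range(L + 1)]

    def insert(self, t: int) -> None:
        for level in self.levels:
            level.insert(t)

    def estimate(self, T: int) -> int:
        """Estimate the number of elements with time stamps in [T, W]."""
        for level in self.levels:                     # smallest threshold first
            if level.buckets[0][0] <= T:              # a retained bucket intersects T
                return sum(c for lo, _, c in level.buckets if lo >= T)
        raise ValueError("no level covers T; B is too small")


summary = PeriodSummary(W=16, B=8, eps=1.0)
for t in [16, 9, 15, 10, 14, 11, 13, 12]:             # asynchronous arrival order
    summary.insert(t)
print(summary.estimate(9))                            # 7, while the true count is 8
```

With a coarser $\epsilon$ (fewer retained buckets), the lowest level loses the region around $T$ and the query falls back to a level with a larger threshold, trading accuracy for space.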

  44. Outline of Talk: Introduction, Algorithm, Analysis

  45. S 1 W T 2  i 1 Suppose that we use level in order to compute the estimate 45

  46. Consider the level with splitting threshold $2^{i+1}$. A data element with time stamp $t_k$ is counted in the appropriate bucket $b = [t_l, t_r]$, the one with $t_l \le t_k \le t_r$.

  47. We can assume that the element is placed in the respective bucket $[t_l, t_r]$, where $t_l \le t_k \le t_r$.

  48. t Stream: k 2  i 1 t t l r 2 i t 2 i k t t t  t t t  r l r 1 l  l r 2 2 We can assume that when bucket splits the element is placed in an arbitrary child bucket 48

  49. t Stream: k 2  i 1 t t l r t 2 i 2 i k t t t  t t t  r l r 1 l  l r 2 2 t t  t t r   l If: GOOD! k l 2 Element counted in correct bucket 49

  50. t Stream: k 2  i 1 t t l r t 2 i 2 i k t t t  t t t  r l r 1 l  l r 2 2 t t  r 1 t t l    If: BAD! r k 2 Element counted in wrong bucket 50

  51. Consider the leaf buckets to the right of the timeline $T$. If an element counted there has $T \le t_k \le W$: GOOD!

  52. If an element counted to the right of the timeline has $t_k < T$: BAD! The element is counted in the wrong bucket.

  53. Consider the leaf buckets. Let $Z_1$ be the elements of the left part counted on the right, and $Z_2$ the elements of the right part counted on the left. Then $X - S = |Z_1| - |Z_2|$.
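
The identity $X - S = |Z_1| - |Z_2|$ follows by counting: every element that truly lies right of the timeline and is also counted there contributes to both $X$ and $S$, so only the misplaced elements make a difference. A toy check in Python, modeling each unit-valued element by its true time stamp and the time position of the leaf bucket it ends up counted in (this abstraction and the name `error_decomposition` are ours, for illustration):

```python
from typing import Iterable, Tuple


def error_decomposition(elements: Iterable[Tuple[int, int]],
                        T: int) -> Tuple[int, int, int, int]:
    """elements: (true time stamp, position where it is counted) pairs, unit-valued.
    Returns (X, S, |Z1|, |Z2|) for the query [T, W]."""
    elements = list(elements)
    X = sum(1 for _, pos in elements if pos >= T)         # counted right of T
    S = sum(1 for t, _ in elements if t >= T)             # truly right of T
    Z1 = sum(1 for t, pos in elements if t < T <= pos)    # left part counted right
    Z2 = sum(1 for t, pos in elements if pos < T <= t)    # right part counted left
    return X, S, Z1, Z2


X, S, Z1, Z2 = error_decomposition([(2, 5), (6, 3), (7, 7), (1, 1), (3, 6)], T=4)
print(X, S, Z1, Z2)   # 3 2 2 1
```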

  54. T W 1 t Z  k 1 elements of left part counted on right t k 1 W Must have been initially inserted in one of these buckets 54

  55. log W  Since tree depth | Z | O ( 2 i log W )  1 55

  56. log W  Since tree depth | Z | O ( 2 i log W )  1 Similarly, we can prove | Z | O ( 2 i log W )  2 Therefore: | X S | || Z | | Z || O ( 2 i log W )     1 2 56

  57. 2   a log W  Since  S ( 2 i log W )     It can be proven 57

  58. 2   a log W  Since  S ( 2 i log W ) It can be proven     Combined with | X S | O ( 2 i log W )   | X S |  We obtain relative error :   S 58
