mergeable summaries

Mergeable Summaries, Jeff M. Phillips, University of Utah - PowerPoint PPT Presentation



  1. Mergeable Summaries
     Jeff M. Phillips, University of Utah
     Joint work with Pankaj K. Agarwal (Duke), Graham Cormode (AT&T), Zengfeng Huang (HKUST), Zhewei Wei (HKUST), Ke Yi (HKUST)
     Diagram: data sets P and Q have summaries S(P, ε) and S(Q, ε), which merge into a summary S(P ∪ Q, ε) of P ∪ Q; size of S(X, ε) is always m.

  2. Summaries for MASSIVE Data
     Allows approximate computation with guarantees and small space.
     coreset: small summary, proxy for full data set with approx guarantees:
       • ε-samples of (P, R): approx density
       • ε-kernel: approx convex shape
     sketch: (random) (linear) combination of full data, recover functions with approx guarantees:
       • Euclidean distance: Johnson-Lindenstrauss random projection
       • Count-Min sketch: approx item counts
       • Greenwald-Khanna sketch: approx quantiles
       • Misra-Gries sketch: approx frequent items


  4. Massive Distributed Computation: data centers, sensor networks, multi-core


  13. Massive Distributed Computation: data centers, sensor networks, multi-core
      Diagram: summaries S(P, ε) and S(Q, ε) of data sets P and Q merge into S(P ∪ Q, ε); size of S(X, ε) is always m.
      • similar to: MUD, Dremel; more restrictive, “natural”
      • generalizes streaming
      • archiving summaries

  21. Random Sample
      Each element (val) carries a random value (ran); the lists are sorted by ran, largest first.
      P, as given:
        val  15   17   20   1    8    42   7    10   14   3
        ran  .99  .42  .53  .01  .02  .23  .82  .75  .61  .14
      P, sorted by ran (S(P, ε) is taken from the front):
        val  15   7    10   14   20   17   42   3    8    1
        ran  .99  .82  .75  .61  .53  .42  .23  .14  .02  .01
      Q, sorted by ran (S(Q, ε) is taken from the front):
        val  31   9    16   11   14   7    2    13   21   4
        ran  .90  .85  .80  .57  .50  .37  .31  .12  .10  .08
      S(P ∪ Q, ε):
        val  15   31   9    7    16   10
        ran  .99  .90  .85  .82  .80  .75
      Size of S(X, ε) is always m.
      Also listed: max element, top k elements.
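
The merge on the Random Sample slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `sample` and `merge` are hypothetical, and the sample size k = 6 is an assumption read off the six-entry merged table. The (ran, val) pairs are copied from the slide.

```python
import random

def sample(values, k, rng):
    """Tag each value with an independent uniform random number and keep
    the k pairs with the largest tags, sorted largest tag first."""
    tagged = [(rng.random(), v) for v in values]
    return sorted(tagged, reverse=True)[:k]

def merge(s1, s2, k):
    """Merge two tagged samples: keep the k largest tags of their union."""
    return sorted(s1 + s2, reverse=True)[:k]

# (ran, val) pairs from the slide, already sorted by ran.
P = [(.99, 15), (.82, 7), (.75, 10), (.61, 14), (.53, 20),
     (.42, 17), (.23, 42), (.14, 3), (.02, 8), (.01, 1)]
Q = [(.90, 31), (.85, 9), (.80, 16), (.57, 11), (.50, 14),
     (.37, 7), (.31, 2), (.12, 13), (.10, 21), (.08, 4)]

k = 6                      # assumed sample size
S_P, S_Q = P[:k], Q[:k]    # each site keeps only its own top-k tags
merged = merge(S_P, S_Q, k)
merged_vals = [v for _, v in merged]
print(merged_vals)         # [15, 31, 9, 7, 16, 10], as in the S(P ∪ Q, ε) table
```

Because the k largest tags of P ∪ Q must each be among the k largest tags of P or of Q, merging the two samples gives the same result as sampling the union directly, and the summary never grows beyond k entries.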

  24. Linear Sketches
      Count-Min sketch of vector P[1...U]:
      • Linear sketch as array CM[i, j] of size w × d
      • Use d hash functions h_j to map x to [1...w]
      • Estimate P[i] = min_j CM[h_j(i), j]
      Mergeable: CM(P + Q) = CM(P) + CM(Q)
      Diagram: S(P, ε) and S(Q, ε) merge into S(P ∪ Q, ε); size of S(X, ε) is always m.
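
A minimal Python sketch of the Count-Min bullets above, under stated assumptions: the class name `CountMin`, the seeded hash family h_j(x) = ((a·x + b) mod p) mod w, and the integer item universe are illustrative choices that are not given on the slide. The table is stored row-major as `table[j][h_j(x)]`, which corresponds to CM[h_j(x), j] in the slide's notation.

```python
import random

class CountMin:
    """Count-Min sketch over integer items (illustrative, not the authors' code)."""
    PRIME = 2_147_483_647  # 2^31 - 1, modulus for the assumed hash family

    def __init__(self, w, d, seed=0):
        self.w, self.d = w, d
        rng = random.Random(seed)
        # One (a, b) pair per row; sketches must share these to be mergeable.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _h(self, j, x):
        a, b = self.hashes[j]
        return ((a * x + b) % self.PRIME) % self.w

    def add(self, x, count=1):
        for j in range(self.d):
            self.table[j][self._h(j, x)] += count

    def estimate(self, x):
        # min over the d rows; never underestimates the true count
        return min(self.table[j][self._h(j, x)] for j in range(self.d))

    def merge(self, other):
        if (self.w, self.d, self.hashes) != (other.w, other.d, other.hashes):
            raise ValueError("sketches must share dimensions and hash functions")
        out = CountMin(self.w, self.d)
        out.hashes = self.hashes
        out.table = [[a + b for a, b in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out

# Hypothetical item streams for the two sites.
P_items = [1, 1, 3, 3, 3, 14]
Q_items = [3, 9, 9, 14]

cm_p, cm_q, cm_all = CountMin(8, 3), CountMin(8, 3), CountMin(8, 3)
for x in P_items:
    cm_p.add(x)
for x in Q_items:
    cm_q.add(x)
for x in P_items + Q_items:
    cm_all.add(x)

merged = cm_p.merge(cm_q)
same_as_direct = merged.table == cm_all.table  # CM(P + Q) = CM(P) + CM(Q)
```

The merge is exact because every update is an addition into fixed cells, so adding two tables cell by cell is the same as feeding both streams into one sketch.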

  30. Heavy Hitters Summaries
      Misra-Gries (MG) sketch of P[1...U]:
      • Keep k (index, count) pairs
      • If existing index arrives, update count
      • If new index arrives, make new pair, or decrement all counts
      Mergeable: Stack MG(P) + MG(Q), decrement all counts by C_{k+1}
      Error: |P[i] − MG[i]| ≤ ε = m̂/(k + 1)
      Example, S(P, ε):
        start:                                    (1,5) (3,6) (8,1) (11,1) (14,3)
        existing index 11 arrives:                (1,5) (3,6) (8,1) (11,2) (14,3)
        new index arrives, decrement all counts:  (1,4) (3,5) (11,1) (14,2)
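
The update rule in the bullets above can be sketched as follows. The function name `mg_update` and the use of a Python dict are illustrative assumptions; k = 5 is inferred from the five pairs on the slide, and the new index 20 is a hypothetical item, since the slide does not say which new index arrives.

```python
def mg_update(counters, item, k):
    """Apply one Misra-Gries update to a dict of index -> count, in place."""
    if item in counters:
        counters[item] += 1            # existing index: update count
    elif len(counters) < k:
        counters[item] = 1             # room left: make new pair
    else:
        # Summary is full: decrement all counts and drop those that reach 0.
        for key in list(counters):
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]
    return counters

k = 5                                            # assumed from the slide's five pairs
mg = {1: 5, 3: 6, 8: 1, 11: 1, 14: 3}            # starting state from the slide
mg_update(mg, 11, k)
after_existing = dict(mg)                        # (11,1) becomes (11,2)
mg_update(mg, 20, k)                             # 20 is a hypothetical new index
after_new = dict(mg)                             # {1: 4, 3: 5, 11: 1, 14: 2}
```

The two recorded states match the second and third rows of the slide's example: the count for index 11 rises to 2, and the full summary then loses index 8 when every count drops by one.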

  34. Heavy Hitters Summaries
      Misra-Gries (MG) merge example.
      Mergeable: Stack MG(P) + MG(Q), decrement all counts by C_{k+1}, the (k+1)-th largest stacked count.
      Counters shown for S(Q, ε) and S(P, ε): (1,2) (1,3) (3,2) (3,4) (5,1) (9,5) (11,1) (14,4) (14,2)
      Stacked counts:                         (1,6) (3,6) (5,2) (9,5) (11,1) (14,6)
      S(P ∪ Q, ε) after decrementing:         (1,5) (3,5) (5,1) (9,4) (14,5)
      Size of S(X, ε) is always m.
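
The merge step can be sketched in Python as below. The function name `mg_merge` is hypothetical, and the two input summaries are illustrative values chosen so that their stacked counts equal the stacked row (1,6) (3,6) (5,2) (9,5) (11,1) (14,6) on the slide; they are not a transcription of the slide's S(Q, ε) and S(P, ε) columns. k = 5 is assumed from the five pairs in the final summary.

```python
def mg_merge(a, b, k):
    """Merge two Misra-Gries summaries (dicts of index -> count) into one
    with at most k counters."""
    # Stack: add counts index by index.
    stacked = dict(a)
    for key, count in b.items():
        stacked[key] = stacked.get(key, 0) + count
    if len(stacked) <= k:
        return stacked
    # C_{k+1}: the (k+1)-th largest stacked count.
    c_k1 = sorted(stacked.values(), reverse=True)[k]
    # Decrement all counts by C_{k+1}; keep only strictly positive ones.
    return {key: count - c_k1 for key, count in stacked.items() if count > c_k1}

k = 5                                        # assumed summary size
mg_q = {1: 2, 3: 2, 5: 2, 9: 5, 14: 4}       # illustrative input
mg_p = {1: 4, 3: 4, 11: 1, 14: 2}            # illustrative input
merged = mg_merge(mg_q, mg_p, k)
print(merged)                                # {1: 5, 3: 5, 5: 1, 9: 4, 14: 5}
```

Subtracting C_{k+1} leaves at most k positive counters, so the merged summary has the same size bound as its inputs, which is the property the slides call mergeable.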
