Mergeable Summaries
Jeff M. Phillips, University of Utah
Joint work with Pankaj K. Agarwal (Duke), Graham Cormode (AT&T), Zengfeng Huang (HKUST), Zhewei Wei (HKUST), and Ke Yi (HKUST)
Summaries for MASSIVE Data
Allows approximate computation with guarantees and small space.

coreset: small summary, proxy for full data set with approximation guarantees:
• ε-samples of (P, R): approximate density
• ε-kernel: approximate convex shape

sketch: (random) (linear) combination of full data, recover functions with approximation guarantees:
• Euclidean distance: Johnson-Lindenstrauss random projection
• Count-Min sketch: approximate item counts
• Greenwald-Khanna sketch: approximate quantiles
• Misra-Gries sketch: approximate frequent items
Massive Distributed Computation
Settings: data centers, sensor networks, multi-core.

Data sets P and Q have summaries S(P, ε) and S(Q, ε); these are merged into a summary S(P ∪ Q, ε) of P ∪ Q. The size of S(X, ε) is always m.
• similar to: MUD, Dremel (more restrictive, “natural”)
• generalizes streaming
• archiving summaries
Random Sample
P, with a random value (ran) attached to each element:
  val   15   17   20    1    8   42    7   10   14    3
  ran  .99  .42  .53  .01  .02  .23  .82  .75  .61  .14

P sorted by decreasing ran, from which S(P, ε) is taken:
  val   15    7   10   14   20   17   42    3    8    1
  ran  .99  .82  .75  .61  .53  .42  .23  .14  .02  .01

Q sorted by decreasing ran, from which S(Q, ε) is taken:
  val   31    9   16   11   14    7    2   13   21    4
  ran  .90  .85  .80  .57  .50  .37  .31  .12  .10  .08

S(P ∪ Q, ε), the elements of P ∪ Q with the largest ran values:
  val   15   31    9    7   16   10
  ran  .99  .90  .85  .82  .80  .75

The size of S(X, ε) is always m.
Cf. max element, top k elements.
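The merge in the Random Sample tables can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `summarize` and `merge` are hypothetical, and m = 6 is an assumption chosen to match the six elements shown for S(P ∪ Q, ε).

```python
import heapq
import random

def summarize(values, m, rng=random):
    """Tag each value with a uniform random number and keep the m largest tags."""
    tagged = [(rng.random(), v) for v in values]
    return heapq.nlargest(m, tagged)

def merge(s1, s2, m):
    """Merge two samples by keeping the m largest (ran, val) pairs overall."""
    return heapq.nlargest(m, s1 + s2)

# Tables from the slide, stored as (ran, val) pairs.
P = list(zip([.99, .82, .75, .61, .53, .42, .23, .14, .02, .01],
             [15, 7, 10, 14, 20, 17, 42, 3, 8, 1]))
Q = list(zip([.90, .85, .80, .57, .50, .37, .31, .12, .10, .08],
             [31, 9, 16, 11, 14, 7, 2, 13, 21, 4]))

m = 6  # assumption: sample size matching the slide's merged table
sample_p = heapq.nlargest(m, P)
sample_q = heapq.nlargest(m, Q)
merged = merge(sample_p, sample_q, m)
print([val for _, val in merged])  # [15, 31, 9, 7, 16, 10]
```

Because any element among the m largest tags of P ∪ Q is also among the m largest tags of its own set, merging the two samples gives the same result as sampling the union directly.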
Linear Sketches
Count-Min sketch of vector P[1...U]:
• Linear sketch as an array CM[i, j] of size w × d
• Use d hash functions h_j to map x to [1...w]
• Estimate P[i] = min_j CM[h_j(i), j]
Mergeable: CM(P + Q) = CM(P) + CM(Q)
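The Count-Min bullets above can be illustrated with a small Python sketch. It is a minimal version under stated assumptions: the class name `CountMin`, the seed parameter, and the hash family ((a·x + b) mod p) mod w are choices made here for illustration, not details from the slides. The table is stored as d rows of w counters, so the slide's CM[h_j(i), j] corresponds to `table[j][h_j(i)]`.

```python
import random

class CountMin:
    """Count-Min sketch over non-negative integer indices (illustrative)."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the assumed hash family

    def __init__(self, w, d, seed=0):
        self.w, self.d = w, d
        rng = random.Random(seed)
        # One (a, b) pair per row j defines h_j(x) = ((a*x + b) mod p) mod w.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _h(self, j, x):
        a, b = self.hashes[j]
        return ((a * x + b) % self.PRIME) % self.w

    def add(self, x, count=1):
        """Add count occurrences of index x to every row."""
        for j in range(self.d):
            self.table[j][self._h(j, x)] += count

    def estimate(self, x):
        """Return min_j CM[h_j(x), j], which never underestimates the count."""
        return min(self.table[j][self._h(j, x)] for j in range(self.d))

    def merge(self, other):
        """Return CM(P + Q) as the entrywise sum CM(P) + CM(Q)."""
        if (self.w, self.d, self.hashes) != (other.w, other.d, other.hashes):
            raise ValueError("sketches must share w, d, and hash functions")
        out = CountMin(self.w, self.d)
        out.hashes = list(self.hashes)
        out.table = [[x + y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out

# Hypothetical streams; both sketches use the same seed, hence the same hashes.
stream_p, stream_q = [1, 2, 2, 3], [2, 3, 3, 9]
cm_p, cm_q, cm_all = CountMin(16, 4, seed=1), CountMin(16, 4, seed=1), CountMin(16, 4, seed=1)
for x in stream_p:
    cm_p.add(x)
for x in stream_q:
    cm_q.add(x)
for x in stream_p + stream_q:
    cm_all.add(x)
cm_merged = cm_p.merge(cm_q)
print(cm_merged.table == cm_all.table)  # True
```

The merge only works when both sketches were built with identical dimensions and hash functions, which is why the example shares one seed across all three sketches.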
Heavy Hitters Summaries
Misra-Gries (MG) sketch of P[1...U]:
• Keep k (index, count) pairs
• If existing index arrives, update count
• If new index arrives, make new pair, or decrement all counts
Guarantee: |P[i] − MG[i]| ≤ ε = m̂/(k + 1)
Mergeable: Stack MG(P) + MG(Q), decrement all counts by C_{k+1} (the (k+1)-th largest count)

Example (update):
  current pairs:           (1,5) (3,6) (8,1) (11,1) (14,3)
  index 11 arrives:        (1,5) (3,6) (8,1) (11,2) (14,3)
  a new index arrives:     (1,4) (3,5) (11,1) (14,2)

Example (merge):
  S(Q, ε):                 (1,2) (3,2) (5,1) (9,5) (14,4)
  S(P, ε):                 (1,3) (3,4) (11,1) (14,2)
  stacked:                 (1,6) (3,6) (5,2) (9,5) (11,1) (14,6)
  S(P ∪ Q, ε):             (1,5) (3,5) (5,1) (9,4) (14,5)

The size of S(X, ε) is always m.
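The Misra-Gries update and merge rules can be sketched as follows. This is a minimal illustration, assuming the summary is a plain dictionary from index to count; the function names `mg_update`, `mg_prune`, and `mg_merge` are hypothetical, and the arriving index 7 in the usage lines is an arbitrary stand-in for "a new index".

```python
from collections import Counter

def mg_update(counts, x, k):
    """Process one arrival of index x in a summary holding at most k pairs."""
    if x in counts:
        counts[x] += 1
    elif len(counts) < k:
        counts[x] = 1
    else:
        # No free slot: decrement every count and drop those that reach zero.
        for key in list(counts):
            counts[key] -= 1
            if counts[key] == 0:
                del counts[key]

def mg_prune(counts, k):
    """Subtract the (k+1)-th largest count from all counts; keep positive ones."""
    if len(counts) <= k:
        return dict(counts)
    c = sorted(counts.values(), reverse=True)[k]  # C_{k+1}
    return {key: v - c for key, v in counts.items() if v > c}

def mg_merge(a, b, k):
    """Stack two summaries (add counts of shared indices), then prune to k pairs."""
    stacked = Counter(a) + Counter(b)
    return mg_prune(stacked, k)

# Update example from the slide, with k = 5 (inferred from the five pairs shown).
summary = {1: 5, 3: 6, 8: 1, 11: 1, 14: 3}
mg_update(summary, 11, 5)   # existing index: (11,1) becomes (11,2)
mg_update(summary, 7, 5)    # hypothetical new index 7: all counts decremented
print(summary)              # {1: 4, 3: 5, 11: 1, 14: 2}

# Prune step applied to the slide's stacked pairs.
stacked = {1: 6, 3: 6, 5: 2, 9: 5, 11: 1, 14: 6}
print(mg_prune(stacked, 5))  # {1: 5, 3: 5, 5: 1, 9: 4, 14: 5}
```

At most k counts can exceed the (k+1)-th largest, so the pruned result always fits in k pairs.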