
Declaring Independence via the Sketching of Sketches Piotr Indyk - PowerPoint PPT Presentation


  1. Declaring Independence via the Sketching of Sketches. Piotr Indyk (Massachusetts Institute of Technology), Andrew McGregor (University of California, San Diego). Until August ’08 -- Hire Me!

  5. The Problem
     The Centers for Disease Control (CDC) has massive amounts of data on disease occurrences and their locations. “How correlated is your zip code to the diseases you’ll catch this year?”
     • Sample (sub-linear time): How many samples are required to distinguish independence from “ε-far” from independence? [Batu et al. ’01], [Alon et al. ’07], [Valiant ’08]
     • Stream (sub-linear space): Access pairs sequentially or “online” with limited memory.
     Image from http://www.cdc.gov/flu/weekly/weeklyarchives2006-2007/images/usmap02.jpg

  11. Formulation
      • Stream of $m$ pairs in $[n] \times [n]$: (3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...
      • Define “empirical” distributions:
        • Marginals: $(p_1, \dots, p_n)$, $(q_1, \dots, q_n)$
        • Joint: $(r_{11}, r_{12}, \dots, r_{nn})$
        • Product: $(s_{11}, s_{12}, \dots, s_{nn})$ where $s_{ij} = p_i q_j$
      • Question: How correlated are first and second terms?
      • E.g.,
        $L_1(s - r) = \sum_{i,j} |s_{ij} - r_{ij}|$
        $L_2(s - r) = \sqrt{\sum_{i,j} (s_{ij} - r_{ij})^2}$
        $I(s, r) = H(p) - H(p \mid q)$
      • Previous work: Can estimate $L_1$ and $L_2$ between marginals. [Alon, Matias, Szegedy ’96], [Feigenbaum et al. ’99], [Indyk ’00], [Guha, Indyk, McGregor ’07], [Ganguly, Cormode ’07]
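The definitions on this slide can be made concrete with a small exact (non-streaming) computation. The following is a minimal sketch, not the authors' method: it stores the full $n \times n$ joint, which is exactly what a sub-linear space algorithm has to avoid. The function names are hypothetical, and base-2 logarithms are an assumption, since the slide does not fix a base.

```python
import math

def empirical_distributions(stream, n):
    """Return marginals p, q, joint r and product s for pairs in [n] x [n]."""
    m = len(stream)
    row_counts = [0] * n
    col_counts = [0] * n
    joint_counts = [[0] * n for _ in range(n)]
    for i, j in stream:              # pairs are 1-based, as on the slide
        row_counts[i - 1] += 1
        col_counts[j - 1] += 1
        joint_counts[i - 1][j - 1] += 1
    p = [c / m for c in row_counts]
    q = [c / m for c in col_counts]
    r = [[c / m for c in row] for row in joint_counts]
    s = [[p[i] * q[j] for j in range(n)] for i in range(n)]  # s_ij = p_i q_j
    return p, q, r, s

def l1_distance(s, r):
    """L1(s - r): sum of absolute entrywise differences."""
    return sum(abs(a - b) for rs, rr in zip(s, r) for a, b in zip(rs, rr))

def l2_distance(s, r):
    """L2(s - r): square root of the sum of squared entrywise differences."""
    return math.sqrt(sum((a - b) ** 2 for rs, rr in zip(s, r) for a, b in zip(rs, rr)))

def entropy(dist):
    """Shannon entropy in bits (base 2 is an assumption)."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def mutual_information(p, q, r):
    """H(p) - H(p|q), using the identity H(p|q) = H(r) - H(q)."""
    joint_entropy = entropy([v for row in r for v in row])
    return entropy(p) - (joint_entropy - entropy(q))
```

For a stream in which the two coordinates are empirically independent, all three quantities are zero; for a fully correlated stream such as (1,1), (2,2) they are strictly positive.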

  15. Our Results
      • Estimating $L_2(s - r)$:
        • $(1 + \varepsilon)$-factor approx. in $\tilde{O}(\varepsilon^{-2} \ln \delta^{-1})$ space.
        • “Neat” result extending AMS sketches
      • Estimating $L_1(s - r)$:
        • $O(\ln n)$-factor approx. in $\tilde{O}(\ln \delta^{-1})$ space.
        • Sketches of sketches and sketches/embeddings
      • Other Results:
        • $L_1(s - r)$: Additive approximations
        • Mutual Information: Additive but not $(1 + \varepsilon)$-factor approx.
        • Distributed Model: Pairs are observed by different parties.

  17. a) Neat Result for $L_2$ b) Sketching Sketches c) Other Results

  25. First Attempt
      • Random Projection: Let $z \in \{-1, 1\}^{n \times n}$ where the $z_{ij}$ are unbiased and 4-wise independent. [Alon, Matias, Szegedy ’96]
      • Estimator: Suppose we can compute the estimator $T = (z.r - z.s)^2$.
      • Correct in expectation and has small variance (with $a_{ij} = r_{ij} - s_{ij}$):
        $E[T] = \sum_{i_1, j_1, i_2, j_2} E[z_{i_1 j_1} z_{i_2 j_2}] \, a_{i_1 j_1} a_{i_2 j_2} = (L_2(r - s))^2$
        $\mathrm{Var}[T] \le E[T^2] = \sum_{i_1, j_1, i_2, j_2, i_3, j_3, i_4, j_4} E[z_{i_1 j_1} z_{i_2 j_2} z_{i_3 j_3} z_{i_4 j_4}] \, a_{i_1 j_1} a_{i_2 j_2} a_{i_3 j_3} a_{i_4 j_4} \le O(E[T]^2)$
      • Repeat $O(\varepsilon^{-2} \ln \delta^{-1})$ times and take the mean.
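The estimator on this slide can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the difference $a = r - s$ is given explicitly as a flat list (which a streaming algorithm would not have), and the signs are drawn fully at random from Python's `random` module rather than from a 4-wise independent family. Fully random signs are in particular 4-wise independent, so the expectation argument still applies. The function name is hypothetical.

```python
import random

def l2_squared_estimate(a, trials, rng):
    """Average T = (z.a)^2 over `trials` independent random sign vectors z."""
    total = 0.0
    for _ in range(trials):
        # One random projection: z.a with each z_k drawn uniformly from {-1, +1}.
        dot = sum(rng.choice((-1, 1)) * a_k for a_k in a)
        total += dot * dot
    return total / trials

# Flattened a = r - s for the fully correlated stream (1,1), (2,2):
# r = diag(0.5, 0.5), s = 0.25 everywhere, so sum of a_k^2 is 0.25.
a = [0.25, -0.25, -0.25, 0.25]
estimate = l2_squared_estimate(a, trials=20000, rng=random.Random(0))
```

With enough trials the average concentrates around $\sum_k a_k^2 = (L_2(r - s))^2$; taking the square root gives an estimate of $L_2(r - s)$ itself.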

  31. Computing Estimator
      • Need to compute: $z.r$ and $z.s$
      • Good News: First term is easy
        1) Let $A = 0$
        2) For each stream element:
           2.1) If stream element = $(i, j)$ then $A \leftarrow A + z_{ij}/m$
      • Bad News: Can’t compute second term!
      • Good News: Use bilinear sketch: If $z_{ij} = x_i y_j$ for $x, y \in \{-1, 1\}^n$, then
        $z.s = \sum_{i,j} z_{ij} s_{ij} = (x.p)(y.q)$
        • i.e., product of sketches is sketch of product.
      • Bad News: $z$ is no longer 4-wise independent even if $x$ and $y$ are fully random, e.g.,
        $z_{11} z_{12} z_{21} z_{22} = (x_1)^2 (x_2)^2 (y_1)^2 (y_2)^2 = 1$
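The bilinear sketch can be tried out in one pass over the stream. The following is a minimal sketch, with hypothetical names, under two assumptions that are not on the slide: the sign vectors `x` and `y` are stored explicitly (a space-efficient version would generate them from a short seed), and the running sums are divided by $m$ at the end rather than per element, so the stream length need not be known in advance. The stream is assumed to be non-empty.

```python
import random

def bilinear_sketch_estimate(stream, n, rng):
    """One-pass computation of z.r, z.s and T = (z.r - z.s)^2 with z_ij = x_i * y_j."""
    x = [rng.choice((-1, 1)) for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]
    m = 0
    zr_sum = 0   # accumulates z_ij over stream elements (i, j)
    xp_sum = 0   # accumulates x_i, giving m * (x.p)
    yq_sum = 0   # accumulates y_j, giving m * (y.q)
    for i, j in stream:              # pairs are 1-based, as on the slide
        m += 1
        zr_sum += x[i - 1] * y[j - 1]
        xp_sum += x[i - 1]
        yq_sum += y[j - 1]
    z_dot_r = zr_sum / m
    z_dot_s = (xp_sum / m) * (yq_sum / m)   # product of sketches = sketch of product
    return x, y, z_dot_r, z_dot_s, (z_dot_r - z_dot_s) ** 2

stream = [(3, 5), (5, 3), (2, 7), (3, 4), (7, 1), (1, 2), (3, 9), (6, 6)]
x, y, z_dot_r, z_dot_s, t = bilinear_sketch_estimate(stream, 9, random.Random(1))
```

Only three counters are updated per stream element, which is the point of choosing $z_{ij} = x_i y_j$: the product distribution $s$ is never materialised.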

  32. Still Get Low Variance
