when fourier siirvs: fourier-based testing for families of distributions
Clément Canonne¹, Ilias Diakonikolas², and Alistair Stewart²
March 19, 2018
¹ Stanford University, ² University of Southern California

background, context, and


background
Over the past 15 years, many results on many properties:
∙ Uniformity [GR00, BFR+00, Pan08, DGPP16]
∙ Identity [BFF+01, VV17, BCG17]
∙ Equivalence [BFR+00, Val11, CDVV14]
∙ Independence [BFF+01, LRR13, DK16, ADK15]
∙ Monotonicity [BKR04, BFRV11, CDGR16, ADK15]
∙ Poisson Binomial Distributions [AD15, CDGR16]
∙ Log-concavity [CDGR16, ADK15]
∙ and more… [Rub12, Can15]
Much has been done; and yet…

one ring to rule them all?
Techniques: Most algorithms and results are somewhat ad hoc, and property-specific.
Can we… design general algorithms and approaches that apply to many testing problems at once?

and in the darkness test them
General Trend
In learning: [CDSS13, CDSS14, CDSX14, ADLS17]
and recently… In testing: [Val11, VV11, CDGR16, ADK15, DK16, BCG17]

outline of the talk
∙ Notation, Preliminaries
∙ Overall Goal, Restated
∙ The shape restrictions approach [CDGR16]
∙ The Fourier approach [CDS17]

some notation

glossary
∙ Probability distributions over [n] := {1, …, n}: ∆([n]) = { p : [n] → [0, 1] : ∑_{i=1}^n p(i) = 1 }
∙ Property (or class) of distributions over [n]: P ⊆ ∆([n])
∙ Total variation distance (statistical distance, ℓ₁ distance): d_TV(p, q) = sup_{S ⊆ Ω} (p(S) − q(S)) = (1/2) ∑_{x ∈ Ω} |p(x) − q(x)| ∈ [0, 1]
Domain size n ∈ ℕ is big (“goes to ∞”). Proximity parameter ε ∈ (0, 1] is small. Lowercase Greek letters are in (0, 1]. Asymptotics Õ, Ω̃, Θ̃ hide logarithmic factors.*
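The two expressions for d_TV above are easy to check numerically. A minimal sketch, assuming Python with numpy (the two pmfs are arbitrary toy examples, not from the talk):

```python
import numpy as np

# Two example distributions over [n] = {1, ..., 5}.
p = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
q = np.array([0.3, 0.1, 0.1, 0.3, 0.2])

# Half the l1 distance: (1/2) * sum_x |p(x) - q(x)|.
tv_l1 = 0.5 * np.abs(p - q).sum()

# Supremum form: max over S of p(S) - q(S), attained at S = {x : p(x) > q(x)}.
tv_sup = (p - q)[p > q].sum()

print(tv_l1, tv_sup)  # both equal 0.3
```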

round up the usual suspects
∙ Poisson Binomial Distribution (PBD): X = ∑_{j=1}^n X_j, with X_1, …, X_n ∈ {0, 1} independent.
∙ k-Sum of Independent Integer Random Variables (k-SIIRV): X = ∑_{j=1}^n X_j, with X_1, …, X_n ∈ {0, 1, …, k − 1} independent.
∙ Poisson Multinomial Distribution (PMD): X = ∑_{j=1}^n X_j, with X_1, …, X_n ∈ {e_1, …, e_k} independent.
∙ (Discrete) Log-Concave: p(k)² ≥ p(k − 1) p(k + 1), and supported on an interval.
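For concreteness, here is a minimal sketch of how one could sample from a PBD and from a k-SIIRV, and check the discrete log-concavity condition, assuming Python with numpy (the parameter choices and the Dirichlet-random summands are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3

# PBD: X = sum of n independent Bernoulli(p_j) variables.
probs = rng.uniform(0, 1, size=n)
pbd_sample = (rng.uniform(size=n) < probs).sum()

# k-SIIRV: X = sum of n independent variables, each supported on {0, ..., k-1}
# with its own (arbitrary) distribution.
mixing = rng.dirichlet(np.ones(k), size=n)                 # one pmf per summand
siirv_sample = sum(rng.choice(k, p=mixing[j]) for j in range(n))

# Discrete log-concavity check for an explicit pmf p over {0, ..., m-1}.
def is_log_concave(p):
    support = np.flatnonzero(p > 0)
    if np.any(np.diff(support) > 1):                        # support must be an interval
        return False
    return all(p[i] ** 2 >= p[i - 1] * p[i + 1] for i in range(1, len(p) - 1))

print(pbd_sample, siirv_sample, is_log_concave(np.array([0.1, 0.2, 0.4, 0.2, 0.1])))
```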

but… will we ever learn?

testing by learning
Trivial baseline in property testing: “you can learn, so you can test.”
(i) Learn p without assumptions, using a learner for ∆([n])
(ii) Check if d_TV(p̂, P) ≤ ε/3 (Computational)
Yes, but… (i) has sample complexity Θ(n/ε²).
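Step (i) is simply the empirical distribution: with Θ(n/ε²) samples, the empirical pmf is O(ε)-close to p in total variation with high probability. A minimal sketch, assuming Python with numpy:

```python
import numpy as np

def empirical_pmf(samples, n):
    """Unrestricted learner for Delta([n]): the empirical distribution of the samples
    (with Theta(n / eps^2) samples it is O(eps)-close to p in total variation, w.h.p.)."""
    counts = np.bincount(np.asarray(samples) - 1, minlength=n)   # domain [n] = {1, ..., n}
    return counts / counts.sum()

rng = np.random.default_rng(0)
p_hat = empirical_pmf(rng.integers(1, 101, size=40000), n=100)
```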

testing by learning
“Folklore” baseline in property testing: “if you can learn, you can test.”
(i) Learn p as if p ∈ P, using a learner for P
(ii) Test d_TV(p̂, p) ≤ ε/3 vs. d_TV(p̂, p) ≥ 2ε/3
(iii) Check if d_TV(p̂, P) ≤ ε/3 (Computational)
The triangle inequality does the rest.
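A minimal sketch of this reduction, assuming Python; learn_P, test_tv_closeness, and project_P are hypothetical placeholders for the proper learner of step (i), the closeness test of step (ii), and the computational projection of step (iii):

```python
def test_by_learning(samples, eps, learn_P, test_tv_closeness, project_P):
    """Folklore learn-then-test reduction (sketch).

    learn_P:            learns a hypothesis p_hat, assuming the samples come from some p in P
    test_tv_closeness:  distinguishes d_TV(p_hat, p) <= eps/3 from >= 2*eps/3 on fresh samples
    project_P:          (computational) checks whether d_TV(p_hat, P) <= eps/3
    """
    half = len(samples) // 2
    p_hat = learn_P(samples[:half], eps / 3)                 # step (i)
    if not test_tv_closeness(p_hat, samples[half:], eps):    # step (ii)
        return "reject"
    if not project_P(p_hat, eps / 3):                        # step (iii)
        return "reject"
    return "accept"   # triangle inequality: p must be eps-close to P
```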

testing by learning?
“Folklore” baseline in property testing: “if you can learn, you can test.”
(i) Learn p as if p ∈ P, using a learner for P
(ii) Test if d_TV(p̂, p) ≤ ε/3 vs. d_TV(p̂, p) ≥ 2ε/3
(iii) Check if d_TV(p̂, P) ≤ ε/3 (Computational)
Not quite. (ii) is fine for functions. But for distributions? It requires Ω(n / log n) samples [VV11, JYW17].

unified approaches: leveraging structure

swiss army knives
What we want: General algorithms applying to all (or many) distribution testing problems.
Theorem (Wishful). Let P be a class of distributions that all exhibit some “nice structure.” If P can be tested with q queries, algorithm T can too, with “roughly” q queries as well.
More formally, we want:
Goal: Design general-purpose testing algorithms that, when applied to a property P, have (tight, or at least reasonable) sample complexity q(ε, τ) as long as P satisfies some structural assumption S_τ parameterized by τ.

swiss army knives: shape restrictions
Structural assumption S_τ: every distribution in P is well-approximated (in a specific ℓ₂-type sense) by a piecewise-constant distribution with L_P(τ) pieces.
Theorem ([CDGR16]). There exists an algorithm which, given sampling access to an unknown distribution p over [n] and parameter ε ∈ (0, 1], can distinguish with probability 2/3 between (a) p ∈ P versus (b) d_TV(p, P) > ε, with Õ(√(n L_P(ε))/ε³ + L_P(ε)/ε²) samples.

swiss army knives: shape restrictions
Outline (abstracting ideas from [BKR04], for monotonicity):
1. decomposition step: recursively build a partition Π of [n] into O(L_P(ε)) intervals s.t. p is roughly uniform on each piece. If successful, then p will be close to its “flattening” q on Π (sketched below); if not, we have proof that p ∉ P and we can reject.
2. approximation step: learn q. Can be done with few samples since Π has few intervals.
3. projection step: (computational) verify that d_TV(q, P) < O(ε).
Applications: ∙ monotonicity ∙ log-concavity ∙ unimodality ∙ Poisson Binomial ∙ k-modality ∙ Monotone Hazard Rate ∙ k-histograms …
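The object manipulated in the decomposition step is the “flattening” of p on a partition Π: replace p by its average on each interval of Π. A minimal sketch of that operation, assuming Python with numpy (the partition and pmf are toy examples, not the recursive construction of [BKR04]):

```python
import numpy as np

def flattening(p, breakpoints):
    """Average p on each interval of the partition Pi given by breakpoints.

    p:           pmf over [n], as a length-n array
    breakpoints: sorted indices 0 = b_0 < b_1 < ... < b_L = n; the intervals
                 of Pi are [b_i, b_{i+1}).
    """
    q = np.empty_like(p, dtype=float)
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        q[lo:hi] = p[lo:hi].sum() / (hi - lo)   # uniform on the interval, same total mass
    return q

p = np.array([0.05, 0.05, 0.10, 0.20, 0.20, 0.15, 0.15, 0.10])
q = flattening(p, [0, 2, 5, 8])
# If p is close to its flattening q, a few intervals suffice to learn p;
# otherwise the decomposition step fails and the tester rejects.
print(q, 0.5 * np.abs(p - q).sum())
```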

that’s great! but…
Figure: A 3-SIIRV (for n = 100). Like all of us, it has ups and downs.

swiss army knives: fourier sparsity
Structural assumption S_τ: every distribution in P has sparse Fourier and effective support: ∃ M_P(τ), S_P(τ) s.t. ∀ p ∈ P, ∃ I_p ⊆ [n] with |I_p| ≤ M_P(τ), ∥p̂ · 1_{S̄_P(ε)}∥₂ ≤ O(ε), and ∥p · 1_{Ī_p}∥₁ ≤ O(ε).
Theorem ([CDS17]). There exists an algorithm which, given sampling access to an unknown distribution p over [n] and parameter ε ∈ (0, 1], can distinguish with probability 2/3 between (a) p ∈ P versus (b) d_TV(p, P) > ε, with Õ(√(|S_P(ε)| M_P(ε))/ε² + |S_P(ε)|/ε²) samples.

swiss army knives: fourier sparsity
Outline:
1. effective support test: take samples to identify a candidate I_p, and check |I_p| ≤ M(ε)
2. Fourier effective support test: invoke a Fourier sparsity subroutine to check that ∥p̂ · 1_{S̄_P(ε)}∥₂ ≤ O(ε) (if so, learn q, the inverse Fourier transform of p̂ · 1_{S_P(ε)}; a small sketch follows below)
3. projection step: (computational) verify that d_TV(q, P) < O(ε).
Applications: ∙ k-SIIRVs ∙ Poisson Multinomial ∙ Poisson Binomial ∙ log-concavity
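To make steps 1 and 2 concrete, here is an idealized sketch on an explicitly known pmf, assuming Python with numpy (the greedy effective-support heuristic and the function name are illustrative; with sample access only, one estimates these quantities instead, as in TestFourierSupport later in the talk):

```python
import numpy as np

def fourier_sparsity_check(p, eps, M, S):
    """Idealized version of the two tests, on an explicitly given pmf p over [len(p)].

    M: bound on the effective support size; S: candidate set of Fourier frequencies.
    """
    n = len(p)

    # 1. effective support: greedily keep the heaviest points until mass >= 1 - eps.
    order = np.argsort(p)[::-1]
    support_size = int(np.searchsorted(np.cumsum(p[order]), 1 - eps) + 1)
    if support_size > M:
        return "reject"

    # 2. Fourier effective support: l2 mass of the DFT outside S should be O(eps).
    p_hat = np.fft.fft(p)                                   # discrete Fourier transform of p
    outside = np.setdiff1d(np.arange(n), S)
    # divide by sqrt(n): Plancherel normalization, so the threshold is on the scale of ||p||_2
    if np.linalg.norm(p_hat[outside]) / np.sqrt(n) > eps:
        return "reject"

    # learn q: inverse Fourier transform of p_hat restricted to S (then project q onto P).
    q = np.real(np.fft.ifft(np.where(np.isin(np.arange(n), S), p_hat, 0)))
    return q
```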

in more detail

fourier sparsity: the guiding example
Theorem (Testing SIIRVs). There exists an algorithm that, given k, n ∈ ℕ, ε ∈ (0, 1], and sample access to p ∈ ∆(ℕ), tests the class of k-SIIRVs with
O( k n^{1/4} log^{1/4}(1/ε) / ε² + k² log²(k/ε) / ε² )
samples from p, and runs in time n · (k/ε)^{O(k log(k/ε))}.
First non-trivial tester for SIIRVs. Near-optimal for constant k: lower bound of Ω(k^{1/2} n^{1/4} / ε²) [CDGR16].

fourier sparsity: the guiding example
k-SIIRVs…
∙ are very badly approximated by histograms
∙ have sparse effective support
∙ have nicely bounded ℓ₂ norm
∙ have very nice Fourier spectrum
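The last point is the one the Fourier approach exploits, and it is easy to observe numerically: the pmf of a k-SIIRV is an n-fold convolution, and after reducing modulo a small M its discrete Fourier spectrum carries almost all of its ℓ₂ mass on a handful of frequencies. A minimal sketch, assuming Python with numpy (the random 3-SIIRV and the choice M = 64 are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, M = 100, 3, 64          # M: modulus, a small multiple of the standard deviation

# Exact pmf of X = X_1 + ... + X_n, each X_j on {0, ..., k-1}: n-fold convolution.
pmf = np.array([1.0])
for _ in range(n):
    pmf = np.convolve(pmf, rng.dirichlet(np.ones(k)))

# Reduce modulo M, as the tester does, then look at the discrete Fourier spectrum.
folded = np.bincount(np.arange(len(pmf)) % M, weights=pmf, minlength=M)
spectrum = np.abs(np.fft.fft(folded)) ** 2

top = np.sort(spectrum)[::-1]
print(top[:16].sum() / spectrum.sum())   # ~1: almost all l2 mass on a few frequencies
```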

fourier sparsity (the fine print)
Theorem (General Testing Statement). Let P ⊆ ∆(ℕ) be a property satisfying the following: ∃ S : (0, 1] → 2^ℕ, M : (0, 1] → ℕ, and q_I : (0, 1] → ℕ s.t. for all ε ∈ (0, 1],
1. Fourier sparsity: ∀ p ∈ P, the Fourier transform (modulo M(ε)) of p is concentrated on S(ε): namely, ∥p̂ · 1_{S̄(ε)}∥₂² ≤ O(ε²).
2. Support sparsity: ∀ p ∈ P, ∃ an interval I ⊆ ℕ with |I| ≤ M(ε) such that (i) p is concentrated on I: p(I) ≥ 1 − O(ε), and (ii) I can be identified w.h.p. with q_I(ε) samples.
3. Projection: there is a procedure Project_P which, on input ε and the explicit description of h ∈ ∆(ℕ), runs in time T(ε) and distinguishes between d_TV(h, P) ≤ 2ε/5 and d_TV(h, P) > ε/2.
4. (Optional) L2-norm bound: ∃ b ∈ (0, 1] s.t. ∥p∥₂² ≤ b ∀ p ∈ P.
Then, ∃ a tester for P with sample complexity m = O( √(|S(ε)| M(ε))/ε² + |S(ε)|/ε² + q_I(ε) ) (if (iv) holds, this can be replaced by O( √b · M(ε)/ε² + |S(ε)|/ε² + q_I(ε) )), running in time O(m |S| + T(ε)).
Further, when the algorithm accepts, it also learns p: i.e., it outputs a hypothesis h s.t. d_TV(p, h) ≤ ε.

Require: sample access to a distribution p ∈ ∆(ℕ), parameter ε ∈ (0, 1], b ∈ (0, 1], functions S : (0, 1] → 2^ℕ, M : (0, 1] → ℕ, q_I : (0, 1] → ℕ, and procedure Project_P
1: Effective Support
2: Take q_I(ε) samples to identify a “candidate set” I. ▷ Works w.h.p. if p ∈ P.
3: Take O(1/ε) samples to distinguish b/w p(I) ≥ 1 − ε/5 and p(I) < 1 − ε/4. ▷ Correct w.h.p.
4: if |I| > M(ε) or we detected that p(I) < 1 − ε/4 then
5:   return reject
6: end if
7:
8: Fourier Effective Support
9: Simulating sample access to p′ = p mod M(ε), call TestFourierSupport on p′ with parameters M(ε), ε/(5√M(ε)), b, and S(ε).
10: if TestFourierSupport returned reject then
11:   return reject
12: end if
13: Let ĥ = (ĥ(ξ))_{ξ ∈ S(ε)} be the Fourier coefficients it outputs, and h their inverse Fourier transform (modulo M(ε)). ▷ Do not actually compute h here.
14:
15: Projection Step
16: Call Project_P on parameters ε and h, and return accept if it does, reject otherwise.

fourier sparsity: the guiding example
With this in hand… The testing result for k-SIIRVs immediately follows. (Modulo one little lie.)
Other results… For PBDs (k = 2) and PMDs (multidimensional) as well, the second with the suitable generalization of the discrete Fourier transform.

fourier sparsity (the main tool)
Theorem (Testing Fourier Sparsity). Given parameters M ≥ 1, ε, b ∈ (0, 1], a subset S ⊆ [M], and sample access to q ∈ ∆([M]), TestFourierSupport either rejects or outputs Fourier coefficients ĥ′ = (ĥ′(ξ))_{ξ ∈ S} s.t., w.h.p., all of the following hold:
1. if ∥q∥₂² > 2b, then it rejects;
2. if ∥q∥₂² ≤ 2b and, for every q* : [M] → ℝ with q̂* supported entirely on S, ∥q − q*∥₂ > ε, then it rejects;
3. if ∥q∥₂² ≤ b and ∃ q* : [M] → ℝ with q̂* supported entirely on S s.t. ∥q − q*∥₂ ≤ ε/2, then it does not reject;
4. if it does not reject, then ∥q̂ · 1_S − ĥ′∥₂ ≤ O(ε√M) and the inverse Fourier transform (modulo M) h′ of the Fourier coefficients it outputs satisfies ∥q − h′∥₂ ≤ O(ε).
Moreover, it takes m = O( √b · M/ε² + √(|S| M)/ε² ) samples from q, and runs in time O(m |S|).

fourier sparsity (the main tool)
Idea: Consider the Fourier coefficients of the empirical distribution (from few samples).
Second idea: Do not consider these coefficients directly (timewise, expensive). Instead, rely on (the analysis of) an ℓ₂ identity tester [CDVV14] + Plancherel to get guarantees on the Fourier coefficients.
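The first idea is just the empirical characteristic function: each Fourier coefficient (modulo M) of p is an expectation of a complex exponential, estimated by a sample average. A minimal sketch, assuming Python with numpy (this only illustrates the estimator, not the ℓ₂-tester analysis of [CDVV14]):

```python
import numpy as np

def empirical_fourier(samples, M, S):
    """Estimate the Fourier coefficients (modulo M) of p on the frequencies in S:
    p_hat(xi) = E[exp(-2*pi*i*xi*X/M)], estimated by the sample average."""
    x = np.asarray(samples) % M
    xi = np.asarray(S)[:, None]                            # shape (|S|, 1)
    return np.exp(-2j * np.pi * xi * x[None, :] / M).mean(axis=1)

rng = np.random.default_rng(2)
samples = rng.binomial(50, 0.3, size=5000)                 # e.g. a PBD with identical summands
coeffs = empirical_fourier(samples, M=64, S=np.arange(8))
print(np.round(np.abs(coeffs), 3))                         # magnitudes decay with the frequency
```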

open questions, and questions.

open questions
∙ More applications: what is your favorite property?
∙ Uncertainty Principle: what about this √(|S(ε)| M(ε)) term?
∙ Fourier works: what about other bases?

Thank You.

Jayadev Acharya and Constantinos Daskalakis. Testing Poisson Binomial Distributions. In Proceedings of SODA, pages 1829–1840, 2015.
Jayadev Acharya, Constantinos Daskalakis, and Gautam C. Kamath. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems 28, pages 3577–3598. Curran Associates, Inc., 2015.
Jayadev Acharya, Ilias Diakonikolas, Jerry Zheng Li, and Ludwig Schmidt. Sample-optimal density estimation in nearly-linear time. In Proceedings of SODA, pages 1278–1289. SIAM, 2017.
Eric Blais, Clément L. Canonne, and Tom Gur. Distribution testing lower bounds via reductions from communication complexity. In Computational Complexity Conference, volume 79 of LIPIcs, pages 28:1–28:40. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.
Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. In Proceedings of FOCS, pages 442–451, 2001.
Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In Proceedings of FOCS, pages 189–197, 2000.
Arnab Bhattacharyya, Eldar Fischer, Ronitt Rubinfeld, and Paul Valiant. Testing monotonicity of distributions over general partial orders. In Proceedings of ITCS, pages 239–252, 2011.
Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381–390, New York, NY, USA, 2004. ACM.
Arnab Bhattacharyya and Yuichi Yoshida. Property Testing. Forthcoming, 2017.
Clément L. Canonne. A Survey on Distribution Testing: Your data is Big. But is it Blue? Electronic Colloquium on Computational Complexity (ECCC), 22:63, April 2015.
Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions. In Proceedings of STACS, 2016. See also the full version [CDGR17]:
Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing shape restrictions of discrete distributions. Theory of Computing Systems, pages 1–59, 2017.
Yu Cheng, Ilias Diakonikolas, and Alistair Stewart. Playing anonymous games using simple strategies. In Proceedings of SODA, pages 616–631, Philadelphia, PA, USA, 2017. Society for Industrial and Applied Mathematics.
Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Learning mixtures of structured distributions over discrete domains. In Proceedings of SODA, pages 1380–1394, 2013.
Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Efficient density estimation via piecewise polynomial approximation. In Proceedings of STOC, pages 604–613. ACM, 2014.
Siu-on Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Near-optimal density estimation in near-linear time using variable-width histograms. In Advances in Neural Information Processing Systems 27, pages 1844–1852, 2014.
Siu-on Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193–1203, 2014.
Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Collision-based testers are optimal for uniformity and closeness. Electronic Colloquium on Computational Complexity (ECCC), 23:178, 2016.
Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. In Proceedings of FOCS. IEEE Computer Society, 2016.
Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, July 1998.
Oded Goldreich, editor. Property Testing: Current Research and Surveys. Springer, 2010. LNCS 6390.
Oded Goldreich. Introduction to Property Testing. Forthcoming, 2017.
Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Technical Report TR00-020, Electronic Colloquium on Computational Complexity (ECCC), 2000.
Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax Estimation of the L1 Distance. ArXiv e-prints, May 2017.
Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295–347, 2013.
Liam Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008.
Dana Ron. Property Testing: A Learning Theory Perspective.
