
Frontiers in Distribution Testing: A Sample of What to Expect - PowerPoint PPT Presentation

Frontiers in Distribution Testing: A Sample of What to Expect. Too Early for Puns? Clément Canonne, October 14, 2017. Columbia University / Stanford University. Opening slides: Background, Context, and Motivation; Expensive access: pricey data; Model.


  1. Background. Over the past 15+ years, many results on many properties:
  • Uniformity: Θ(√n/ε²) [GR00, BFR+00, Pan08, DGPP16]
  • Identity: Θ(√n/ε²), Φ(p, Θ(ε)) [BFF+01, VV14, DKN15, BCG17]
  • Equivalence: Θ(n^{2/3}/ε^{4/3}) [BFR+00, Val11, CDVV14, DK16]
  • Independence: Θ(m^{2/3} n^{1/3}/ε^{4/3}) [BFF+01, LRR13, DK16]
  • Monotonicity: Θ(√n/ε²) [BKR04, BFRV11, ADK15]
  • Poisson Binomial Distributions: Θ̃(n^{1/4}/ε²) [AD15, CDGR16, CDS17]
  • histograms, MHR, log-concavity, k-wise independence, SIIRV, PMD, clusterability, juntas… and it goes on [Rub12, Can15]
  So much has been done; and yet so much remains…
  Caveat: the above is not entirely accurate, as only the (usually) dominant term is included. For instance, the sample complexity of equivalence is actually Θ(max(n^{2/3}/ε^{4/3}, √n/ε²)); for monotonicity, the current best upper bound has an additional 1/ε⁴ term; and for PBDs, the lower bound of Ω(n^{1/4}/ε²) is almost matched by an O(n^{1/4}/ε² + log²(1/ε)/ε²) upper bound. Don’t sue me.
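To make the uniformity line concrete, here is a minimal Python sketch (mine, not from the talk) of the classic collision-based uniformity tester behind the Θ(√n/ε²) bound; the sample size and the acceptance threshold are illustrative placeholders, not the tuned constants of [GR00, Pan08, DGPP16].

```python
import random
from itertools import combinations

def collision_uniformity_test(samples, n, eps):
    """Accept iff the empirical collision rate is close to 1/n.
    Under the uniform distribution the expected rate is exactly 1/n;
    if p is eps-far from uniform in TV, then ||p||_2^2 >= (1 + 4*eps^2)/n,
    so the rate is noticeably larger. Threshold set halfway (illustrative)."""
    m = len(samples)
    pairs = m * (m - 1) / 2
    collisions = sum(1 for x, y in combinations(samples, 2) if x == y)
    return collisions / pairs <= (1 + 2 * eps ** 2) / n

# Illustrative usage: a fair n-sided die should be accepted (w.h.p.).
n, eps = 1000, 0.25
samples = [random.randrange(n) for _ in range(2000)]  # ~ sqrt(n)/eps^2 scale
print(collision_uniformity_test(samples, n, eps))
```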

  2. Many questions remain.
  Techniques: most algorithms and results are somewhat ad hoc, and property-specific.
  Hardness: most properties are depressingly hard to test: Ω(√n) samples are required.
  Tolerance and estimation: testing is good; but what about tolerant testing and functional estimation?
  Beyond? Only a preliminary step! What if…

  3. Some Notation

  4. Glossary.
  • Probability distributions over a discrete domain Ω (e.g., Ω = [n] := {1, …, n}):
    Δ(Ω) = { p : Ω → [0,1] : Σ_{i∈Ω} p(i) = 1 }
  • Property (or class) of distributions over Ω: P ⊆ Δ(Ω)
  • Total variation distance (statistical distance, ℓ1 distance):
    d_TV(p, q) = sup_{S⊆Ω} (p(S) − q(S)) = (1/2) Σ_{x∈Ω} |p(x) − q(x)| ∈ [0,1]
  The domain size/parameter n ∈ ℕ is big (“goes to ∞”). The proximity parameter ε ∈ (0,1] is small. Lowercase Greek letters are in (0,1]. The asymptotics Õ, Ω̃, Θ̃ hide logarithmic factors.*
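Since every guarantee in this deck is stated in total variation, a tiny reference implementation of the definition above may help (plain Python, nothing talk-specific):

```python
def total_variation(p, q):
    """d_TV(p, q) = (1/2) * sum_x |p(x) - q(x)|, for two distributions
    over the same finite domain, given as probability vectors."""
    assert len(p) == len(q)
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# sup_S (p(S) - q(S)) is attained by S = {x : p(x) > q(x)}:
print(total_variation([0.5, 0.5], [0.9, 0.1]))  # 0.4
```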

  5. General Approaches, Unified Paradigms, and Many-Birded Stones

  6. Testing By Learning. Trivial baseline in property testing: “you can learn, so you can test.”
  (i) Learn p without assumptions, using a learner for Δ(Ω)
  (ii) Check if d_TV(p̂, P) ≤ ε/2. (Computational.)
  Yes, but… (i) has sample complexity Θ(n/ε²).
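A sketch of this baseline, with a hypothetical distance oracle `dist_to_property` standing in for the purely computational step (ii):

```python
import random
from collections import Counter

def empirical(samples, n):
    """Step (i): the plug-in learner, which learns an arbitrary p over [n]
    to small TV distance using Theta(n / eps^2) samples."""
    m, counts = len(samples), Counter(samples)
    return [counts[i] / m for i in range(n)]

def test_by_learning(samples, n, eps, dist_to_property):
    """Trivial baseline: learn p_hat, then check d_TV(p_hat, P) <= eps/2.
    `dist_to_property` is a hypothetical oracle computing the TV distance
    from a probability vector to the class P."""
    return dist_to_property(empirical(samples, n)) <= eps / 2

# Example with P = {uniform over [n]}:
n = 100
dist_to_uniform = lambda p: 0.5 * sum(abs(pi - 1 / n) for pi in p)
samples = [random.randrange(n) for _ in range(20000)]
print(test_by_learning(samples, n, 0.2, dist_to_uniform))
```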

  7. Testing By Learning. “Folklore” baseline in property testing: “if you can learn, you can test.”
  (i) Learn p as if p ∈ P, using a learner for P
  (ii) Test d_TV(p̂, p) ≤ ε/3 vs. d_TV(p̂, p) ≥ 2ε/3
  (iii) Check if d_TV(p̂, P) ≤ ε/3. (Computational.)
  The triangle inequality does the rest: if p ∈ P, the learner succeeds and both (ii) and (iii) pass; and whenever both pass, d_TV(p, P) ≤ d_TV(p, p̂) + d_TV(p̂, P) ≤ 2ε/3 + ε/3 = ε.

  8. Testing By Learning? “Folklore” baseline in property testing: “if you can learn, you can test.”
  (i) Learn p as if p ∈ P, using a learner for P
  (ii) Test if d_TV(p̂, p) ≤ ε/3 vs. d_TV(p̂, p) ≥ 2ε/3
  (iii) Check if d_TV(p̂, P) ≤ ε/3. (Computational.)
  Not quite: (ii) is fine for functions. But for distributions? It requires Ω(n/log n) samples [VV11a, JYW17].

  9. Testing By Learning! All is doomed, there is no hope, and every dream ends up shattered on this unforgiving Earth. Although…
  Acharya, Daskalakis, and Kamath [ADK15]: now (i) is harder, but (ii) becomes cheap!
  (i) Learn p as if p ∈ P, using a learner for P in χ² distance
  (ii) Test if χ²(p̂ ‖ p) ≤ ε²/3 vs. d_TV(p̂, p) ≥ 2ε/3
  (iii) Check if d_TV(p̂, P) ≤ ε/3. (Computational.)
  Success.
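For intuition on why step (ii) becomes cheap, here is a sketch of the kind of χ²-type statistic used in this line of work (in the shape of the statistics of [CDVV14, ADK15], against a known hypothesis q, with a placeholder threshold):

```python
import random
from collections import Counter

def chi2_statistic(samples, q):
    """Z = sum_i ((N_i - m*q_i)^2 - N_i) / (m*q_i), with N_i the count of i.
    Under Poissonized sampling, E[Z] = m * chi^2(p || q): Z concentrates
    near 0 when p = q, and grows linearly in m with the chi^2 divergence."""
    m, counts = len(samples), Counter(samples)
    return sum(((counts[i] - m * qi) ** 2 - counts[i]) / (m * qi)
               for i, qi in enumerate(q))

# Illustrative run against the uniform hypothesis; the threshold below is
# a placeholder, not the tuned constant from the cited papers.
n, m, eps = 500, 4000, 0.25
q = [1 / n] * n
samples = [random.randrange(n) for _ in range(m)]
print(chi2_statistic(samples, q) <= m * eps ** 2 / 3)
```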

  10. Testing By Learning! All is not doomed, there is some hope, and not every dream ends up shattered on this unforgiving Earth. And…
  Canonne, Diakonikolas, Gouleakis, and Rubinfeld [CDGR16]: now d_TV(p̂, p) ≤ O(ε) comes for free!
  (i) Test that p satisfies a strong structural guarantee of P: succinct approximation by histograms (“shape restrictions”)
  (ii) Learn p efficiently (in a weird KL/ℓ2 sense) using this structure
  (iii) Check if d_TV(p̂, P) ≤ ε. (Computational.)
  Success.
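One concrete instance of such a “shape restriction” is the classic oblivious (Birgé) decomposition for monotone distributions: O(log(n)/ε) intervals of geometrically growing length, on which p may be replaced by its average. A minimal sketch under the assumption that p is monotone non-increasing; this is my illustration of the structure, not the actual [CDGR16] tester.

```python
def birge_intervals(n, eps):
    """Oblivious partition of {0, ..., n-1} into intervals whose lengths
    grow roughly like (1 + eps)^k; there are O(log(n) / eps) of them."""
    intervals, start, length = [], 0, 1.0
    while start < n:
        end = min(n, start + max(1, int(length)))
        intervals.append((start, end))
        start, length = end, length * (1 + eps)
    return intervals

def flatten(p, intervals):
    """Replace p by its average on each interval; if p is monotone,
    the result is O(eps)-close to p in TV (Birge's theorem)."""
    q = list(p)
    for a, b in intervals:
        avg = sum(p[a:b]) / (b - a)
        q[a:b] = [avg] * (b - a)
    return q

n, eps = 1024, 0.1
p = [2 * (n - i) / (n * (n + 1)) for i in range(n)]  # a monotone example
q = flatten(p, birge_intervals(n, eps))
print(0.5 * sum(abs(x - y) for x, y in zip(p, q)))   # small TV error
```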

  11. Testing By Learning! All is hope, there is no doom, and every dream ends up bright and shiny on this wonderful Earth. And…
  Canonne, Diakonikolas, and Stewart [CDS17]: “all your (Fourier) basis are belong to…”
  (i) Test that p satisfies a strong structural guarantee of P: nice discrete Fourier transform (Fourier sparsity)
  (ii) Learn p efficiently (in an ℓ2 sense) using this structure
  (iii) Check if d_TV(p̂, P) ≤ ε. (Computational.)
  Success.
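The flavor of step (ii), sketched with numpy (my illustration, with a hypothetical sparsity budget k): classes such as PBDs have a DFT concentrated on a few low frequencies, so truncating the empirical DFT there denoises the estimate in ℓ2.

```python
import numpy as np

def fourier_project(p_hat, k):
    """Keep only the k lowest-|frequency| DFT coefficients of p_hat,
    invert, and renormalize. For a Fourier-sparse class this keeps the
    signal while discarding most of the sampling noise (in l2)."""
    F = np.fft.fft(p_hat)
    keep = np.argsort(np.abs(np.fft.fftfreq(len(p_hat))))[:k]
    G = np.zeros_like(F)
    G[keep] = F[keep]
    q = np.clip(np.real(np.fft.ifft(G)), 0, None)
    return q / q.sum()  # the DC term is kept, so the sum is positive

# Illustrative: denoise an empirical Binomial(200, 1/2), a canonical PBD.
rng = np.random.default_rng(0)
n, m, k = 200, 2000, 15
p_hat = np.bincount(rng.binomial(n, 0.5, size=m), minlength=n + 1) / m
print(fourier_project(p_hat, k).round(4))
```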

  12. Testing in TV via ℓ2. Testing in ℓ2 distance is well understood [CDVV14]; testing in TV (ℓ1) is trickier. Can we reduce one to the other?
  (i) Map p ∈ Δ([n]) to a “nicer, smoother” p′ ∈ Δ([O(n)])
  (ii) Test p′ using an ℓ2 tester
  (iii) That’s all.
  Diakonikolas and Kane [DK16]: “It works.” Success.
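The map in step (i) can be sketched via the “split” operation of [DK16]: draw m auxiliary samples, and split domain element i into a_i + 1 equal pieces, where a_i is i’s count among those samples. Heavy elements get split the most, so p′ is nearly flat: E[‖p′‖₂²] ≤ 1/(m+1). A toy illustration below; note that the real reduction simulates samples from p′ rather than writing it down, since p is unknown.

```python
import random
from collections import Counter

def split_distribution(p, aux_samples):
    """[DK16]-style flattening: element i of p becomes a_i + 1 equal
    pieces, where a_i = count of i among aux_samples (drawn from p).
    The result p' lives on n + m elements and has small l2 norm."""
    counts = Counter(aux_samples)
    p_prime = []
    for i, pi in enumerate(p):
        pieces = counts[i] + 1
        p_prime.extend([pi / pieces] * pieces)
    return p_prime

n, m = 1000, 500
p = [2 * (i + 1) / (n * (n + 1)) for i in range(n)]  # a skewed example p
aux = random.choices(range(n), weights=p, k=m)       # samples from p itself
p_prime = split_distribution(p, aux)
print(len(p_prime), sum(x * x for x in p_prime))     # n + m, ~1/m
```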

  13. Tolerant Testing and Estimation.
  Theorem (Everything is n/log n). Pretty much every tolerant testing question or functional estimation task (entropy, support size, …) has sample complexity Θ_ε(n/log n).
  Technically, and as Jiantao’s talk will describe, a more accurate statement is: whatever estimation the plug-in empirical estimator can perform with k log k samples, the optimal scheme does with k. “Enlarge your sample,” if you will.
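For concreteness, the plug-in estimator the note refers to, instantiated for entropy; the content of the cited results is that the optimal estimators achieve the same accuracy as this naive one with a log n factor fewer samples (Θ(n/log n) instead of roughly Θ(n), for constant accuracy).

```python
import math, random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in estimator: empirical frequencies into H(p) = -sum p_i ln p_i.
    Its bias scales like (support size) / m, so it needs on the order of n
    samples; the optimal estimators [VV11a, JVHW15, WY16] match its error
    with a log n factor fewer."""
    m = len(samples)
    counts = Counter(samples)
    return -sum((c / m) * math.log(c / m) for c in counts.values())

n = 1000
samples = [random.randrange(n) for _ in range(5000)]
print(plugin_entropy(samples), "vs ln(n) =", math.log(n))  # underestimates
```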

  14. Tolerant Testing and Estimation.
  • Paul Valiant [Val11]: the canonical tester for symmetric properties (not quite, but near-optimal)
  • Valiant–Valiant [VV11a]: learn the histogram with O(n/(ε² log n)) samples, then plug in – and we’re done
  • Valiant–Valiant [VV11b]: actually, can even do it with a linear estimator
  • Jiao et al. [JVHW15], Wu and Yang [WY16]: actually, best polynomial approximation is the tool for the job
  • Acharya, Das, Orlitsky, Suresh [ADOS17]: actually, the (Profile) Maximum Likelihood Estimator (PMLE) does it
  • Han, Jiao, and Weissman [HJW17]: actually, moment-matching is also the tool for the job

  15. General Approaches To Sadness, Too. Unified algorithms and techniques for upper bounds are nice, but what about this feeling of despair in the face of impossibility?

  16. General Approaches To Sadness, Too.
  • Paul Valiant [Val11]: lower bounds for symmetric properties via moment-matching: the “Wishful Thinking Theorem.”
  • Valiant–Valiant [VV14]: a blackbox statement for Le Cam’s two-point method
  • Diakonikolas and Kane [DK16]: an information-theoretic framework for proving lower bounds via mutual information
  • Canonne, Diakonikolas, Gouleakis, and Rubinfeld [CDGR16]: lower bounds by reductions from (distribution testing + agnostic learning): “if you can learn, you can’t test.”
  • Blais, Canonne, and Gur [BCG17]: lower bounds by reductions from communication complexity: “Alice and Bob say I can’t test.”
  • Valiant–Valiant, Jiao et al., Wu and Yang: lower bounds for tolerant testing via best polynomial approximation (the dual of the upper bounds).

  17. For More and Better on This…

  18. Ilias Diakonikolas (USC): Optimal Distribution Testing via Reductions.
  Jiantao Jiao (Stanford University): Three Approaches towards Optimal Property Estimation and Testing.
  Alon Orlitsky (UCSD): A Unified Maximum Likelihood Approach for Estimating Symmetric Distribution Properties.
  Gautam Kamath (MIT): Testing with Alternative Distances.

  19. The Curse of Dimensionality, and How to Deal with It

  20. Costis Daskalakis (MIT): High-Dimensional Distribution Testing

  21. Now, Make It Quantum.

  22. Ryan O’Donnell (CMU): Distribution testing in the 21½th century

  23. “Correct Me If I’m Wrong”
