Alice and Bob Show Distribution Testing Lower Bounds (They don't talk to each other anymore.)
Clément Canonne (Columbia University), July 9, 2017.
Joint work with Eric Blais (UWaterloo) and Tom Gur (Weizmann Institute / UC Berkeley).
“distribution testing?”
why?
Property testing of probability distributions: sublinear, approximate, randomized algorithms that take random samples.
∙ Big Dataset: too big
∙ Expensive access: pricey data
∙ “Model selection”: many options
Need to infer information – one bit – from the data: fast, or with very few samples.
how?
(Property) Distribution Testing, in an (egg)shell.
how?
∙ Known domain (here [n] = {1, …, n})
∙ Property P ⊆ ∆([n])
∙ Independent samples from unknown p ∈ ∆([n])
∙ Distance parameter ε ∈ (0, 1]
Must decide: p ∈ P, or ℓ₁(p, P) > ε? (And be correct on any p with probability at least 2/3.)
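To make this framework concrete, here is a sketch of perhaps the simplest tester in this model: the classic collision-based uniformity tester. A distribution that is ε-far from uniform in ℓ₁ has collision probability ‖p‖₂² ≥ (1 + ε²)/n, versus exactly 1/n for uniform, so we threshold halfway between. (The function name, sample size, and constants below are illustrative choices, not the tuned ones, and no failure-probability analysis is done here.)

```python
import random
from collections import Counter

def collision_uniformity_test(sample, n, eps):
    """Accept (True) iff the empirical collision rate looks like 1/n,
    the rate of the uniform distribution on [n].  An eps-far distribution
    has collision rate at least (1 + eps^2)/n, so threshold halfway."""
    m = len(sample)
    counts = Counter(sample)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    rate = collisions / (m * (m - 1) // 2)
    return rate <= (1 + eps ** 2 / 2) / n

rng = random.Random(0)
n, eps = 100, 0.5
# Uniform over [n]: should be accepted.
uniform_sample = [rng.randrange(n) for _ in range(5000)]
# All mass on the first n//4 elements: far from uniform, should be rejected.
far_sample = [rng.randrange(n // 4) for _ in range(5000)]
print(collision_uniformity_test(uniform_sample, n, eps))
print(collision_uniformity_test(far_sample, n, eps))
```

With enough samples the two collision rates (≈ 1/100 versus ≈ 4/100) separate cleanly around the threshold.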
and?
Many results on many properties:
∙ Uniformity [GR00, BFR+00, Pan08]
∙ Identity* [BFF+01, VV14]
∙ Equivalence [BFR+00, Val11, CDVV14]
∙ Independence [BFF+01, LRR13]
∙ Monotonicity [BKR04]
∙ Poisson Binomial Distributions [AD14]
∙ Generic approaches for classes [CDGR15, ADK15]
∙ and more…
but?
Lower bounds… are quite tricky.
We want more methods: generic if possible, applying to many problems at once.
“communication complexity?”
what now?
f(x, y)
what now?
But communicating is hard.
was that a toilet?
∙ f known by all parties
∙ Alice gets x, Bob gets y
∙ Private randomness
Goal: minimize communication (worst case over x, y, and the randomness) to compute f(x, y).
also…
…in our setting, Alice and Bob do not get to communicate.
∙ f known by all parties
∙ Alice gets x, Bob gets y
∙ Both send one-way messages to a referee
∙ Private randomness
SMP: the Simultaneous Message Passing model.
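Schematically, an SMP protocol is just two message functions and a referee, sharing no state: each player sees only its own input plus private coins. A minimal skeleton (function names here are made up for illustration, not from the talk):

```python
import random

def smp_round(referee, alice_msg, bob_msg, x, y):
    """One round of Simultaneous Message Passing: Alice and Bob each see
    only their own input (plus private randomness) and send a single
    message to the referee, who outputs a guess for f(x, y)."""
    ra, rb = random.Random(), random.Random()  # private, unshared coins
    return referee(alice_msg(x, ra), bob_msg(y, rb))

# Trivial (maximally expensive) protocol for Equality:
# each player sends its whole input; the referee compares.
out = smp_round(lambda ma, mb: ma == mb,
                lambda x, r: x,
                lambda y, r: y,
                5, 5)
print(out)  # True
```

The whole game, of course, is doing better than sending the whole input.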
referee model (smp).
Upshot: SMP(Eq_n) = Ω(√n). (Only O(log n) with one-way communication!)
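The O(log n) one-way upper bound is the classic fingerprinting protocol: Alice picks a random prime p of O(log n) bits and sends (p, x mod p); a false accept requires p to divide |x − y|, which has fewer than n prime factors, so a random prime from a large enough range is unlikely to be fooled. A sketch (parameter choices are rough and the error analysis is omitted; names are mine):

```python
import random

def is_prime(k):
    """Naive trial-division primality check (fine for small k)."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def fingerprint(x, n, rng):
    """Alice's one-way message for an n-bit input x: a random prime
    p = O(poly(n)) together with x mod p -- O(log n) bits in total."""
    T = max(4, n * n)  # range large enough to contain many primes
    p = rng.randrange(2, T)
    while not is_prime(p):
        p = rng.randrange(2, T)
    return p, x % p

def bob_checks(msg, y):
    """Bob accepts iff his input matches Alice's fingerprint."""
    p, fx = msg
    return y % p == fx

rng = random.Random(1)
n = 64
x = rng.getrandbits(n)
print(bob_checks(fingerprint(x, n, rng), x))  # equal inputs: always True
y = x ^ 1  # flip one bit; |x - y| = 1 has no prime factors, so always caught
print(bob_checks(fingerprint(x, n, rng), y))
```

The contrast is the point of the slide: with a referee and no shared randomness, equality jumps from O(log n) to Θ(√n).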