Communication Complexity of Learning Discrete Distributions Krzysztof Onak IBM T.J. Watson Research Center Joint work with Ilias Diakonikolas , Elena Grigorescu , and Abhiram Natarajan . Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 1 / 20
Distribution Learning and Testing Input: Stream of independent samples from an unknown distribution D x 1 , x 2 , x 3 , x 4 , . . . Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 2 / 20
Distribution Learning and Testing Input: Stream of independent samples from an unknown distribution D x 1 , x 2 , x 3 , x 4 , . . . Goal: Learn the distribution or test a property or estimate a parameter Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 2 / 20
Distribution Learning and Testing Input: Stream of independent samples from an unknown distribution D x 1 , x 2 , x 3 , x 4 , . . . Goal: Learn the distribution or test a property or estimate a parameter • Small total variation distance error acceptable • Traditional focus: sample complexity Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 2 / 20
Learning Discrete Distributions D = probability distribution on { 1 , . . . , n } Input: Independent samples from D x 1 , x 2 , x 3 , x 4 , . . . Goal: Output a distribution D ′ such that �D − D ′ � 1 < ǫ Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 3 / 20
Learning Discrete Distributions D = probability distribution on { 1 , . . . , n } Input: Independent samples from D x 1 , x 2 , x 3 , x 4 , . . . Goal: Output a distribution D ′ such that �D − D ′ � 1 < ǫ Sample complexity: Θ( n /ǫ 2 ) Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 3 / 20
Communication Complexity Distributed data: samples held by different players Example: Samples in different data centers Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 4 / 20
Communication Complexity Distributed data: samples held by different players Example: Samples in different data centers How much do players have to communicate to solve the problem? Is sublinear communication possible? Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 4 / 20
“Survey” Complexity This talk will focus on the simplest setting: • Each player has one sample and sends a single message to a referee • The referee outputs solution sample Player 1 sample Player 2 sample Player 3 Referee output sample Player p Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 5 / 20
“Survey” Complexity This talk will focus on the simplest setting: • Each player has one sample and sends a single message to a referee • The referee outputs solution sample Player 1 sample Player 2 sample Player 3 Referee output sample Player p • Each sample is Θ( log n ) bits • Can average communication be made o ( log n ) ? Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 5 / 20
Related Work A lot of recent interest in communication-efficient learning: DAW12, ZDW13, ZX15, GMN14, KVW14, LBKW14, SSZ14, DJWZ14, LSLT15, BGMNW15 • Both upper and lower bounds. • Usually more continuous problems. • Sample problem: estimating the mean of a Gaussian distribution. Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 6 / 20
Related Work A lot of recent interest in communication-efficient learning: DAW12, ZDW13, ZX15, GMN14, KVW14, LBKW14, SSZ14, DJWZ14, LSLT15, BGMNW15 • Both upper and lower bounds. • Usually more continuous problems. • Sample problem: estimating the mean of a Gaussian distribution. See Mark Braverman’s talk tomorrow Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 6 / 20
Outline O ( n /ǫ 2 ) Sample Complexity Review 1 Communication Complexity Lower Bound 2 Quick Distribution Testing Example 3 Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 7 / 20
Outline O ( n /ǫ 2 ) Sample Complexity Review 1 Communication Complexity Lower Bound 2 Quick Distribution Testing Example 3 Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 8 / 20
Upper Bound Review Solution: D ′ = empirical distribution of O ( n /ǫ 2 ) samples Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 9 / 20
Upper Bound Review Solution: D ′ = empirical distribution of O ( n /ǫ 2 ) samples Why this works: • For every subset of { 1 , . . . , n } the probabilities under D and D ′ within ǫ/ 2 with probability 1 − 2 − 2 n Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 9 / 20
Upper Bound Review Solution: D ′ = empirical distribution of O ( n /ǫ 2 ) samples Why this works: • For every subset of { 1 , . . . , n } the probabilities under D and D ′ within ǫ/ 2 with probability 1 − 2 − 2 n • Union bound: �D − D ′ � 1 ≤ ǫ with probability 1 − o ( 1 ) Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 9 / 20
Lower Bound Review Fact: Hoeffding’s inequality is optimal • ǫ -biased coin, determine direction of the bias • Ω( ǫ − 2 ) coin tosses needed Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 10 / 20
Lower Bound Review Fact: Hoeffding’s inequality is optimal • ǫ -biased coin, determine direction of the bias • Ω( ǫ − 2 ) coin tosses needed Construction: δ 1 = 1 δ 2 = − 1 δ 3 = 1 δ 4 = 1 10 δ 1 ǫ − 10 δ 1 ǫ 1 2 3 4 5 6 7 8 Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 10 / 20
Lower Bound Review Fact: Hoeffding’s inequality is optimal • ǫ -biased coin, determine direction of the bias • Ω( ǫ − 2 ) coin tosses needed Construction: δ 1 = 1 δ 2 = − 1 δ 3 = 1 δ 4 = 1 10 δ 1 ǫ − 10 δ 1 ǫ 1 2 3 4 5 6 7 8 • Each pair randomly biased by 10 ǫ Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 10 / 20
Lower Bound Review Fact: Hoeffding’s inequality is optimal • ǫ -biased coin, determine direction of the bias • Ω( ǫ − 2 ) coin tosses needed Construction: δ 1 = 1 δ 2 = − 1 δ 3 = 1 δ 4 = 1 10 δ 1 ǫ − 10 δ 1 ǫ 1 2 3 4 5 6 7 8 • Each pair randomly biased by 10 ǫ 9 • Need to predict bias of more than 10 pairs (via averaging/Markov’s bound) Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 10 / 20
Lower Bound Review Fact: Hoeffding’s inequality is optimal • ǫ -biased coin, determine direction of the bias • Ω( ǫ − 2 ) coin tosses needed Construction: δ 1 = 1 δ 2 = − 1 δ 3 = 1 δ 4 = 1 10 δ 1 ǫ − 10 δ 1 ǫ 1 2 3 4 5 6 7 8 • Each pair randomly biased by 10 ǫ 9 • Need to predict bias of more than 10 pairs (via averaging/Markov’s bound) • This requires Ω( n /ǫ 2 ) samples Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 10 / 20
Outline O ( n /ǫ 2 ) Sample Complexity Review 1 Communication Complexity Lower Bound 2 Quick Distribution Testing Example 3 Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 11 / 20
Our Claim � � n No protocol with o ǫ 2 log n communication on average that succeeds learning the distribution with probability 99 / 100. Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 12 / 20
Our Claim � � n No protocol with o ǫ 2 log n communication on average that succeeds learning the distribution with probability 99 / 100. n /ǫ 2 log n � � (Can assume at most O players in the proof) Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 12 / 20
Hard Distribution Reuse the hard distribution for sampling: δ 1 = 1 δ 2 = − 1 δ 3 = 1 δ 4 = 1 10 δ 1 ǫ − 10 δ 1 ǫ 1 2 3 4 5 6 7 8 Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 13 / 20
Hard Distribution Reuse the hard distribution for sampling: δ 1 = 1 δ 2 = − 1 δ 3 = 1 δ 4 = 1 10 δ 1 ǫ − 10 δ 1 ǫ 1 2 3 4 5 6 7 8 Can assume the protocol is deterministic: • Slight loss in the probability of success • Expected communication goes up by constant factor Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 13 / 20
The Proof Plan • Assume o ( n ǫ − 2 log n ) communication protocol Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 14 / 20
The Proof Plan • Assume o ( n ǫ − 2 log n ) communication protocol • For random i , show that: • Messages reveal very little about δ i (even if the referee knows all other δ i ’s) • The referee can predict δ i with probability 1 2 + o ( 1 ) Krzysztof Onak (IBM Research) Communication Complexity of Learning Discrete Distributions 14 / 20
Recommend
More recommend