Information Complexity and Applications
Mark Braverman, Princeton University and IAS
FoCM'17, July 17, 2017
Coding vs. complexity: a tale of two theories
Coding:
• Goal: data transmission
• Different channels
• "Big" questions are answered with theorems: "BSC_{1/3} can transmit ≈ 0.052 trits per application"
Computational complexity:
• Goal: computation
• Models of computation
• "Big" questions are conjectures: "One day, we'll prove EXP requires > n^3 NAND gates"
A key difference
• Information theory is a very effective language: it fits many coding situations perfectly.
• Shannon's channel coding theory is "continuous":
  – Turn the channel into a continuous resource;
  – Separate the communication channel from how it is used.
Theory of computation is "discrete"
• Von Neumann (~1948): "…Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics. The theory of automata, … will have to share this unattractive property of formal logic. It will have to be, from the mathematical point of view, combinatorial rather than analytical."
Overview
• Today: we will discuss extending the language of information theory to problems in complexity theory.
Background: Shannon's entropy
• Assume a lossless binary channel.
• A message X is distributed according to some prior μ.
• The inherent number of bits it takes to transmit X is given by its entropy
  H(X) = Σ_x μ[X = x] · log₂(1/μ[X = x]).
[Diagram: A sends X ∼ μ to B over a communication channel.]
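As a quick illustration (not from the slides), a few lines of Python computing H(X) from a probability vector; the helper name `entropy` is ours:

```python
import math

def entropy(dist):
    # H(X) = sum over x of mu[X = x] * log2(1 / mu[X = x]), in bits.
    return sum(p * math.log2(1.0 / p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))        # a fair coin: 1.0 bit
print(entropy([1/3, 1/3, 1/3]))   # a uniform trit: log2(3) ~ 1.585 bits
```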
Shannon's Noiseless Coding Theorem
• The cost of communicating many copies of X scales as H(X).
• Shannon's source coding theorem:
  – Let Cₙ(X) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} Cₙ(X)/n = H(X).
• Operationalizes H(X).
H(X) is nicer than Cₙ(X)
• Sending a uniform trit T in {1,2,3}.
• Using the prefix-free encoding {0, 10, 11}, sending one trit T₁ costs C₁ = 5/3 ≈ 1.667 bits.
• Sending two trits (T₁T₂) costs C₂ = 29/9 bits using the encoding {000, 001, 010, 011, 100, 101, 110, 1110, 1111}. The cost per trit is 29/18 ≈ 1.611 < C₁.
• C₁ + C₁ ≠ C₂.
H(X) is nicer than Cₙ(X)
• C₁ = 15/9, C₂ = 29/9.
• C₁ + C₁ ≠ C₂.
• The entropy H(T) = log₂ 3 ≈ 1.585.
• We have H(T₁T₂) = log₂ 9 = H(T₁) + H(T₂).
• H(T) is additive over independent variables.
• Cₙ = n · log₂ 3 ± o(n).
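A small sanity check of these numbers (illustrative Python, not from the talk): the expected code lengths of the two prefix-free encodings above, compared with the entropy log₂ 3.

```python
import math

# One-trit code {0, 10, 11}: lengths 1, 2, 2, each trit value equally likely.
C1 = (1 + 2 + 2) / 3                # 5/3 ~ 1.667 bits per trit

# Two-trit code: seven 3-bit codewords and two 4-bit codewords
# for the nine equally likely trit pairs.
C2 = (7 * 3 + 2 * 4) / 9            # 29/9 ~ 3.222 bits per pair

print(C1, C2 / 2, math.log2(3))     # 1.667  1.611  1.585
```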
Today
• We will discuss generalizing information and coding theory to interactive computation scenarios: "using interaction over a channel to solve a computational problem".
• In computer science, the amount of communication needed to solve a problem is studied by the area of communication complexity.
Communication complexity [Yao'79]
• Considers functionalities requiring interactive computation.
• Focus on the two-party setting first.
• A & B implement a functionality F(X,Y), e.g. F(X,Y) = "X = Y?".
[Diagram: Alice holds X, Bob holds Y; together they compute F(X,Y).]
Communication complexity
• Goal: implement a functionality F(X,Y).
• A protocol π(X,Y) computing F(X,Y): using shared randomness R, Alice (holding X) and Bob (holding Y) exchange messages m₁(X,R), m₂(Y,m₁,R), m₃(X,m₁,m₂,R), …, and output F(X,Y).
• Communication cost CC(π) = # of bits exchanged.
Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ:
  CC(F, μ, ε) ≔ min_{π: μ[π(X,Y) ≠ F(X,Y)] ≤ ε} CC(π).
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).
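To make the model concrete, here is a sketch (ours, not from the slides) of a standard randomized protocol for F(X,Y) = "X = Y?" in this setting: using the shared random string R, Alice sends k random parities of X and Bob replies with the one-bit answer, so the communication is k + 1 bits and the error over R is at most 2⁻ᵏ when X ≠ Y.

```python
import random

def equality_protocol(x, y, k=20, seed=None):
    """Randomized protocol for "X = Y?" with shared randomness.

    Alice sends k bits: parities of X against k shared random 0/1 vectors.
    Bob compares them with his own parities and outputs the 1-bit answer.
    Always correct when X = Y; errs with probability <= 2**-k otherwise.
    """
    n = len(x)
    shared = random.Random(seed)          # stands in for the shared string R
    R = [[shared.randrange(2) for _ in range(n)] for _ in range(k)]
    alice_bits = [sum(r[i] & x[i] for i in range(n)) % 2 for r in R]
    bob_bits   = [sum(r[i] & y[i] for i in range(n)) % 2 for r in R]
    return 1 if alice_bits == bob_bits else 0

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(equality_protocol(x, x), equality_protocol(x, x[:-1] + [1]))  # 1 0
```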
A tool for unconditional lower bounds on computation
• Streaming;
• Data structures;
• Distributed computing;
• VLSI design lower bounds;
• Circuit complexity;
• One of the two main tools for unconditional lower bounds.
• Connections to other problems in complexity theory (e.g. hardness amplification).
Set disjointness and intersection
Alice and Bob are each given a set: X ⊆ {1, …, n}, Y ⊆ {1, …, n} (the sets can be viewed as vectors in {0,1}ⁿ).
• Intersection: Intₙ(X,Y) = X ∩ Y.
• Disjointness: Disjₙ(X,Y) = 1 if X ∩ Y = ∅, and 0 otherwise.
• A non-trivial theorem [Kalyanasundaram-Schnitger'87, Razborov'92]: CC(Disjₙ, 1/4) = Ω(n).
• Exercise: Solve Disjₙ with error → 0 (say, 1/n) in 0.9n bits of communication. Can you do 0.6n? 0.4n?
Direct sum
• Intₙ is just n copies of the 2-bit AND.
• ¬Disjₙ is a disjunction of n 2-bit ANDs.
• What is the connection between the communication cost of one AND and the communication cost of n ANDs?
• Understanding the connection between the hardness of a problem and the hardness of its pieces.
• A natural approach to lower bounds.
How does CC scale with copies?
• CC(Fⁿ, μⁿ, ε)/n →? CC(F, μ, ε)?
• Recall: lim_{n→∞} Cₙ(X)/n = H(X).
• Information complexity is the corresponding scaling limit for CC(Fⁿ, μⁿ, ε)/n.
• Helps understand problems composed of smaller problems.
Interactive information complexity
• Information complexity is to communication complexity as Shannon's entropy is to transmission cost.
Information theory in two slides
• For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y:
  H(X|Y) ≔ E_{y∼Y} H(X | Y = y).
• One can show H(XY) = H(Y) + H(X|Y).
• This important fact is known as the chain rule.
• If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).
Mutual information
• The mutual information is defined as
  I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
• "How much does knowing X reduce the uncertainty of Y?"
• Conditional mutual information: I(X;Y|Z) ≔ H(X|Z) − H(X|YZ).
• Simple intuitive interpretation.
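A small numerical illustration (ours): for two correlated bits, compute the entropies, check the chain rule, and evaluate I(X;Y) both ways.

```python
import math
from collections import defaultdict

def H(dist):
    # Entropy of a distribution given as {outcome: probability}.
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[idx]] += p
    return m

# X is a uniform bit; Y equals X with probability 3/4.
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}

H_XY = H(joint)
H_X, H_Y = H(marginal(joint, 0)), H(marginal(joint, 1))
H_X_given_Y = H_XY - H_Y                   # chain rule: H(XY) = H(Y) + H(X|Y)
print(H_X - H_X_given_Y)                   # I(X;Y) ~ 0.189 bits
print(H_Y - (H_XY - H_X))                  # same value via H(Y) - H(Y|X)
```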
The information cost of a protocol
• Prior distribution: (X,Y) ∼ μ.
• The protocol π produces a transcript Π, which depends on both inputs (and the randomness).
  IC(π, μ) = I(Π; Y|X) + I(Π; X|Y)
           = what Alice learns about Y + what Bob learns about X.
• Note: IC depends on both π and μ.
Example
• F is "X = Y?".
• μ is a distribution where X = Y w.p. ½ and (X,Y) are random w.p. ½.
• Protocol: Alice sends SHA-256(X) [256 bits]; Bob replies with "X = Y?" [1 bit].
  IC(π, μ) = I(Π; Y|X) + I(Π; X|Y) ≈ 1 + 129 = 130 bits
           = what Alice learns about Y + what Bob learns about X.
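A rough accounting of the ≈ 1 + 129 (our back-of-the-envelope reading of the slide, assuming the inputs are long random strings and SHA-256 behaves like an ideal hash): Alice only ever sees Bob's one-bit reply, so I(Π; Y|X) ≈ 1 bit. Bob can compute SHA-256(Y) himself; when X = Y (probability ½) the received hash tells him essentially nothing new about X, while when (X,Y) are independent (probability ½) it reveals ≈ 256 fresh bits about X. Adding the ≈ 1 bit he learns about which of the two cases occurred, I(Π; X|Y) ≈ ½ · 256 + 1 = 129 bits.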
The information complexity of a problem
• Communication complexity: CC(F, μ, ε) ≔ min_{π computes F with error ≤ ε} CC(π).
• Analogously: IC(F, μ, ε) ≔ inf_{π computes F with error ≤ ε} IC(π, μ). (The infimum here is needed!)
• (Easy) fact: IC(F, μ, ε) ≤ CC(F, μ, ε).
Information = amortized communication
• Recall: lim_{n→∞} Cₙ(X)/n = H(X).
• Theorem [B.-Rao'11]: lim_{n→∞} CC(Fⁿ, μⁿ, ε)/n = IC(F, μ, ε).
• Corollary: lim_{n→∞} CC(Intₙ, 0⁺)/n = IC(AND, 0).
The two-bit AND
• Alice and Bob each have a bit: X, Y ∈ {0,1}, distributed according to some μ on {0,1}².
• They want to compute X ∧ Y while revealing to each other as little as possible about their inputs (w.r.t. the worst μ).
• The answer IC(AND, 0) is a number between 1 and 2.
The two-bit AND
Results [B.-Garg-Pankratov-Weinstein'13]:
• IC(AND, 0) ≈ 1.4922 bits.
• We find the value of IC(AND, μ, 0) for all priors μ and exhibit the information-theoretically optimal protocol for computing the AND of two bits.
• Studying IC(AND, μ, 0) as a function ℝ₊⁴/ℝ₊ → ℝ₊ is a functional minimization problem subject to a family of constraints (cf. the construction of harmonic functions).
The two-bit AND
• We adopt a "guess and verify" strategy, although the general question of computing the information complexity of a function from its truth table is a very interesting one.
The optimal protocol for AND
• Alice (X ∈ {0,1}) and Bob (Y ∈ {0,1}) each pick a value in [0,1]: if X = 1, A = 1; if X = 0, A ∼ U[0,1]; similarly, if Y = 1, B = 1; if Y = 0, B ∼ U[0,1].
[Diagram: the values A and B are placed on a scale running from 0 to 1.]
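A minimal Python sketch (ours) of one run of this protocol: each party samples its value as on the slide, and the output is 1 exactly when neither value lies strictly below 1, i.e. when X = Y = 1. This only reproduces the input/output behaviour; the information-cost optimality relies on the continuous "rising clock" implementation, which this discretized sketch does not capture.

```python
import random

def and_protocol(x, y, rng=random):
    # Sample the private values as on the slide.
    a = 1.0 if x == 1 else rng.random()   # X = 0: A ~ U[0, 1); X = 1: A = 1
    b = 1.0 if y == 1 else rng.random()   # Y = 0: B ~ U[0, 1); Y = 1: B = 1
    # A common clock rises from 0 to 1; the first value it reaches is
    # announced and the protocol stops with output 0.  If the clock
    # reaches 1 with no announcement, the output is 1.
    return 1 if min(a, b) >= 1.0 else 0

for x in (0, 1):
    for y in (0, 1):
        assert and_protocol(x, y) == (x & y)
print("matches AND(X, Y) on all inputs")
```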