Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization
Ellen Vitercik
Northwestern Quarterly Theory Workshop
Joint work with Nina Balcan and Travis Dick
Many problems have fast, optimal algorithms
• E.g., sorting, shortest paths
Many problems don't
• E.g., integer programming, subset selection
• Many approximation and heuristic techniques
• Best method depends on the application
• Which to use?
Practitioners repeatedly solve problems that:
• Maintain the same structure
• Differ in the underlying data
There should be an algorithm that's good across all of these instances. Use information about prior instances, and use ML to automate algorithm design: choose the algorithm for future instances.
Automated algorithm design
Use ML to automate algorithm design.
Large body of empirical work:
• Comp bio [DeBlasio and Kececioglu, '18]
• AI [Xu, Hutter, Hoos, and Leyton-Brown, '08]
This work: formal guarantees for this approach.
Simple example: knapsack
Problem instance:
• n items; item i has value v_i and size s_i
• Knapsack with capacity K
Goal: find the most valuable set of items that fits.
Algorithm (parameterized by ρ ≥ 0): add items in decreasing order of v_i / s_i^ρ. How to set ρ? [Gupta and Roughgarden, '17]
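To make the parameterized heuristic concrete, here is a minimal Python sketch of the greedy rule above (the function name and input format are illustrative, not from the talk):

```python
def greedy_knapsack(values, sizes, capacity, rho):
    """Add items in decreasing order of value / size**rho,
    skipping any item that no longer fits."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i] ** rho,
                   reverse=True)
    total_value, remaining = 0.0, capacity
    for i in order:
        if sizes[i] <= remaining:
            remaining -= sizes[i]
            total_value += values[i]
    return total_value

# rho = 0 ranks purely by value; rho = 1 ranks by value density v_i / s_i.
print(greedy_knapsack([10, 6, 5], [8, 4, 3], capacity=10, rho=1.0))
```

On this toy instance, ρ = 1 packs the two small items (total value 11), while ρ = 0 grabs the single high-value item (total value 10): the best ρ depends on the instance distribution.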
Application domain: stealing jewelry
Online algorithm configuration
• Day 1: run the knapsack algorithm with parameter ρ = 0.95
• Day 2: parameter ρ = 0.45
• Day 3: parameter ρ = 0.45; u_3(ρ) is the algorithm's utility (value of items in the knapsack) on the 3rd instance
• Day 4: parameter ρ = 0.75; u_4(ρ) is the algorithm's utility on the 4th instance
[Plots: u_t(ρ), the value of items in the knapsack, as a function of the parameter ρ.]
Online algorithm configuration
Goal: Compete with the best fixed parameter in hindsight. Minimize regret.
Optimizing piecewise Lipschitz functions
Configuration ⇔ optimizing sums of piecewise Lipschitz functions.
Worst-case impossible to optimize online!
[Plot: algorithm utility on the t-th instance as a function of the parameter ρ.]
Our contributions
The structural property dispersion implies strong guarantees for:
• Online optimization of piecewise Lipschitz (PWL) functions
• Uniform convergence in statistical settings
• Differentially private optimization
Dispersion is satisfied in real problems under very mild assumptions.
Outline 1. Online learning setup 2. Dispersion 3. Regret bounds 4. Examples of dispersion 5. Other applications of dispersion 6. Conclusion
Online piecewise Lipschitz optimization
For each round t ∈ {1, …, T}:
1. Learner chooses ρ_t ∈ ℝ^d
2. Adversary chooses a piecewise L-Lipschitz function u_t : ℝ^d → ℝ
3. Learner gets reward u_t(ρ_t)
4. Full information: learner observes the function u_t. Bandit feedback: learner only observes u_t(ρ_t).
Here u_t(ρ) is the algorithm's utility on the t-th instance.
Goal: minimize regret = max_{ρ ∈ ℝ^d} Σ_{t=1}^T u_t(ρ) − Σ_{t=1}^T u_t(ρ_t). We want regret sublinear in T, so that the average regret goes to 0.
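A minimal sketch of this interaction protocol and the regret computation (the learner/adversary interfaces and the finite grid used to approximate the best fixed parameter in hindsight are illustrative assumptions, not part of the talk):

```python
import numpy as np

def run_protocol(learner, adversary, T, grid):
    """Full-information protocol with regret measured against the best
    fixed parameter in hindsight (approximated over a finite grid)."""
    utilities, reward = [], 0.0
    for t in range(T):
        rho_t = learner.choose()          # step 1: learner picks rho_t
        u_t = adversary.choose(t)         # step 2: adversary picks u_t
        reward += u_t(rho_t)              # step 3: learner earns u_t(rho_t)
        learner.observe(u_t)              # step 4: full-information feedback
        utilities.append(u_t)
    best_fixed = max(sum(u(rho) for u in utilities) for rho in grid)
    return best_fixed - reward            # regret

class FixedLearner:                       # toy learner: always plays 0.3
    def choose(self): return 0.3
    def observe(self, u): pass

class StepAdversary:                      # toy adversary: u_t = 1 on [0, 0.5)
    def choose(self, t): return lambda rho: 1.0 if rho < 0.5 else 0.0

print(run_protocol(FixedLearner(), StepAdversary(), T=10,
                   grid=np.linspace(0, 1, 101)))
```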
Prior work on PWL online optimization
• Gupta and Roughgarden ['17]: max-weight independent set algorithm configuration
• Cohen-Addad and Kanade ['17]: 1D piecewise constant functions
Mean adversary
There exists an adversary choosing piecewise constant functions such that every full-information online algorithm has linear regret.
Round 1: the adversary chooses one of two step functions with equal probability. Each subsequent round, it repeatedly halves the optimal region, again choosing the surviving half at random.
Learner's expected reward: T/2. Reward of the best point in hindsight: T. Expected regret = T/2.
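A small simulation of this halving construction, as a sketch only (the function names and the choice of [0, 1] as the parameter space are illustrative assumptions):

```python
import random

def mean_adversary_game(T, learner_choice, seed=0):
    """The halving construction: each round, the adversary keeps a nested
    'good' interval where utility is 1 (and 0 elsewhere), choosing which
    half survives uniformly at random. Any point in the final interval
    earns 1 every round (reward T), while no learner can earn more than
    T/2 in expectation, since it must commit before the coin flip."""
    random.seed(seed)
    lo, hi = 0.0, 1.0
    learner_reward = 0
    for t in range(T):
        rho = learner_choice(t)             # learner commits first
        mid = (lo + hi) / 2
        if random.random() < 0.5:           # adversary halves the good region
            hi = mid
        else:
            lo = mid
        learner_reward += 1 if lo <= rho < hi else 0
    return learner_reward, T                # learner vs. best in hindsight

# A learner playing uniformly at random does even worse than T/2 here.
print(mean_adversary_game(1000, lambda t: random.random()))
```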
Outline 1. Online learning setup 2. Dispersion 3. Regret bounds 4. Examples of dispersion 5. Other applications of dispersion 6. Conclusion
Dispersion
The mean adversary concentrates discontinuities near the maximizer ρ*: even points very close to ρ* have low utility!
Definition: u_1, …, u_T are (w, k)-dispersed at a point ρ if the ℓ2-ball B(ρ, w) contains discontinuities for at most k of u_1, …, u_T.
Example: if the ball of radius w about ρ contains discontinuities of only 2 of the functions, then u_1, …, u_T are (w, 2)-dispersed at ρ.
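To make the definition concrete, here is a hedged sketch that checks (w, k)-dispersion at a point, assuming each function's discontinuity locations are known and given as arrays (this representation is an illustration, not from the talk):

```python
import numpy as np

def is_dispersed(discontinuities, rho, w, k):
    """u_1, ..., u_T are (w, k)-dispersed at rho if the l2-ball of radius w
    about rho contains discontinuities of at most k of the functions.
    discontinuities[t] lists the discontinuity points of u_t, shape (m, d)."""
    rho = np.asarray(rho, dtype=float)
    count = sum(
        1 for pts in discontinuities
        if len(pts) and np.any(np.linalg.norm(np.asarray(pts) - rho, axis=1) <= w)
    )
    return count <= k

# Two of three 1-d functions have a discontinuity within radius 0.1 of 0.5,
# so they are (0.1, 2)-dispersed but not (0.1, 1)-dispersed at 0.5.
disc = [[[0.52]], [[0.45]], [[0.9]]]
print(is_dispersed(disc, [0.5], w=0.1, k=2),
      is_dispersed(disc, [0.5], w=0.1, k=1))
```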
Sums of dispersed functions
Given u_1, …, u_T, plot the sum Σ_{t=1}^T u_t as a function of ρ:
• Not dispersed: many discontinuities in an interval
• Dispersed: few discontinuities in an interval
Key property of dispersed functions
If u_1, …, u_T : ℝ^d → [0,1] are
1. piecewise L-Lipschitz and
2. (w, k)-dispersed at the maximizer ρ*,
then for every ρ ∈ B(ρ*, w): Σ_{t=1}^T u_t(ρ) ≥ Σ_{t=1}^T u_t(ρ*) − TLw − k.
Proof idea: for each u_t, ask: is u_t L-Lipschitz on B(ρ*, w)?
• No (at most k functions, by dispersion): bound u_t(ρ*) − u_t(ρ) ≤ 1.
• Yes (at most T functions): bound u_t(ρ*) − u_t(ρ) ≤ Lw.
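The case split above can be written as one displayed inequality (a restatement of the slide's proof idea, splitting the sum by whether u_t is L-Lipschitz on the ball):

```latex
\[
\sum_{t=1}^{T} \bigl(u_t(\rho^*) - u_t(\rho)\bigr)
  = \underbrace{\sum_{t \,:\, u_t \text{ not Lipschitz on } B(\rho^*,\, w)}
      \bigl(u_t(\rho^*) - u_t(\rho)\bigr)}_{\le\, k \,\cdot\, 1 \text{ (dispersion)}}
  + \underbrace{\sum_{t \,:\, u_t \text{ Lipschitz on } B(\rho^*,\, w)}
      \bigl(u_t(\rho^*) - u_t(\rho)\bigr)}_{\le\, T \,\cdot\, L w}
  \;\le\; k + T L w .
\]
```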
Outline 1. Online learning setup 2. Dispersion 3. Regret bounds 1. Full information 2. Bandit feedback 4. Examples of dispersion 5. Other applications of dispersion 6. Conclusion
Full information online learning
Exponentially Weighted Forecaster (EWF) [Cesa-Bianchi & Lugosi '06]: at round t, sample ρ_t from the distribution with PDF f_t(ρ) ∝ exp(λ Σ_{s=1}^{t−1} u_s(ρ)).
[Plot: u_t(ρ) as a function of ρ.]
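A minimal sketch of this sampler (the talk's EWF samples from a continuous density; approximating it on a finite grid, as done here, is an illustrative simplification, not the algorithm as stated):

```python
import numpy as np

def ewf_choose(past_utilities, grid, lam):
    """Sample rho_t from the grid-approximated EWF distribution
    f_t(rho) proportional to exp(lam * sum_{s < t} u_s(rho))."""
    scores = np.array([sum(u(rho) for u in past_utilities) for rho in grid])
    weights = np.exp(lam * (scores - scores.max()))  # stabilized exponentiation
    probs = weights / weights.sum()
    return grid[np.random.choice(len(grid), p=probs)]

# Example: after one step function, mass concentrates where its utility was 1.
grid = np.linspace(0, 1, 101)
u1 = lambda rho: 1.0 if rho < 0.5 else 0.0
print(ewf_choose([u1], grid, lam=5.0))
```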
Full information online learning
Theorem: If u_1, …, u_T : B(0, 1) ⊂ ℝ^d → [0,1] are
1. piecewise L-Lipschitz and
2. (w, k)-dispersed at ρ*,
then EWF has regret O(√(Td log(1/w)) + TLw + k).
Intuition: every ρ ∈ B(ρ*, w) has utility ≥ OPT − TLw − k, so EWF can compete with B(ρ*, w) up to the O(√(Td log(1/w))) factor.
When is this a good bound? For w = 1/(L√T) and k = Õ(√T), the regret is Õ(√(Td)).