Refined bounds for algorithm configuration: The knife-edge of dual class approximability
Nina Balcan, Tuomas Sandholm, Ellen Vitercik


  1. Refined bounds for algorithm configuration: The knife-edge of dual class approximability Nina Balcan, Tuomas Sandholm, Ellen Vitercik

  2. Algorithms typically come with many tunable parameters, with significant impact on runtime, solution quality, … Hand-tuning is time-consuming, tedious, and error-prone.

  3. Automated algorithm configuration. Goal: automate algorithm configuration via machine learning, algorithmically finding good parameter settings using a set of “typical” inputs (a training set) from the application at hand.

  4. Automated configuration procedure: 1. Fix a parameterized algorithm (e.g., CPLEX). 2. Receive a set 𝒯 of “typical” inputs (problem instances 1, 2, 3, 4) from an unknown distribution. 3. Return a parameter setting with good average performance over 𝒯 (runtime, solution quality, memory usage, etc.).
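The procedure above can be sketched in a few lines. The `utility` function below is a hypothetical stand-in for a real performance measure (e.g., a negated, normalized solver runtime), and the grid of candidate parameters is purely illustrative:

```python
# Minimal sketch of the configuration procedure: evaluate candidate
# parameter settings on "typical" training inputs and return the one
# with the best average performance. utility() is a hypothetical
# stand-in; values lie in [-1, 1] as assumed in the talk.

def utility(s, instance):
    # Toy utility: higher when the parameter s matches the instance.
    return max(-1.0, min(1.0, 1.0 - abs(s - instance)))

def configure(training_set, candidate_params):
    """Return the candidate parameter with the best average utility
    over the training set, together with that average."""
    best_s, best_avg = None, float("-inf")
    for s in candidate_params:
        avg = sum(utility(s, y) for y in training_set) / len(training_set)
        if avg > best_avg:
            best_s, best_avg = s, avg
    return best_s, best_avg

training_set = [0.4, 0.5, 0.6]            # toy "typical" inputs
grid = [0.0, 0.25, 0.5, 0.75, 1.0]        # candidate parameter settings
s_hat, avg = configure(training_set, grid)
print(s_hat)  # 0.5 maximizes average utility on this toy data
```

The key question of the talk is precisely whether the `s_hat` returned here also performs well on unseen instances drawn from the same distribution.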

  5. Automated configuration procedure. Seen: problem instances 1–4 (the training set). Unseen: a new problem instance. Key question (focus of talk): Will those parameters have good expected performance?

  6. Overview of main result. Key question (focus of talk): Will those parameters have good expected performance? “Yes,” when algorithmic performance, as a function of the parameters, can be approximated by a simple function. [Figure: algorithmic performance g*(𝒔) and a simple approximating function h*(𝒔), plotted over the parameter s.]

  7. Overview of main result. We observe this structure in, e.g., integer programming algorithm configuration.

  8. Overview of main result: a dichotomy. If the approximation holds under the L∞-norm, i.e., sup_𝒔 |g*(𝒔) − h*(𝒔)| is small, we provide strong guarantees.

  9. Overview of main result: a dichotomy. If the approximation holds under the L∞-norm, i.e., sup_𝒔 |g*(𝒔) − h*(𝒔)| is small, we provide strong guarantees. If the approximation only holds under the L_q-norm for q < ∞, i.e., only ∫ |g*(𝒔) − h*(𝒔)|^q d𝒔 is small, it is not possible to provide strong guarantees in the worst case.

  10. Model

  11. Model. 𝒴: set of all inputs (e.g., integer programs). ℝ^d: set of all parameter settings (e.g., CPLEX parameters). Standard assumption: unknown distribution 𝒟 over inputs, e.g., representing the scheduling problem an airline solves day-to-day.

  12. “Algorithmic performance.” g_𝒔(y) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input y; e.g., runtime, solution quality, memory usage, … Assume g_𝒔(y) ∈ [−1, 1]; this can be generalized to g_𝒔(y) ∈ [−H, H].

  13. Generalization bounds

  14. Generalization bounds. Key question: for any parameter setting 𝒔, does good average utility on the training set imply good expected utility? Formally: given samples y_1, …, y_N ~ 𝒟, for any 𝒔, is |(1/N) Σ_{i=1}^N g_𝒔(y_i) − E_{y~𝒟}[g_𝒔(y)]| ≤ ? The first term is the empirical average utility. Typically, one answers by bounding the intrinsic complexity of ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d}.

  15. Generalization bounds. Key question: for any parameter setting 𝒔, does good average utility on the training set imply good expected utility? Formally: given samples y_1, …, y_N ~ 𝒟, for any 𝒔, is |(1/N) Σ_{i=1}^N g_𝒔(y_i) − E_{y~𝒟}[g_𝒔(y)]| ≤ ? The second term is the expected utility. Typically, one answers by bounding the intrinsic complexity of ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d}.

  16. Generalization bounds. Key question: for any parameter setting 𝒔, does good average utility on the training set imply good expected utility? Formally: given samples y_1, …, y_N ~ 𝒟, for any 𝒔, is |(1/N) Σ_{i=1}^N g_𝒔(y_i) − E_{y~𝒟}[g_𝒔(y)]| ≤ ? Typically, one answers by bounding the intrinsic complexity of ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d}.
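The gap the bound controls can be made concrete with a small Monte Carlo sketch. The utility function and the uniform input distribution below are illustrative stand-ins, not the setting of the talk:

```python
import random

# Sketch of the quantity a generalization bound controls:
# |average utility on the training set - expected utility|.
# The expectation is approximated with a much larger sample.

def utility(s, y):
    # Hypothetical utility in [-1, 1], standing in for e.g. solver
    # performance of parameter s on instance y.
    return max(-1.0, min(1.0, 1.0 - abs(s - y)))

random.seed(0)
draw = lambda: random.uniform(0.0, 1.0)   # toy distribution D over inputs

s = 0.5
training_set = [draw() for _ in range(50)]
empirical = sum(utility(s, y) for y in training_set) / len(training_set)

big_sample = [draw() for _ in range(200_000)]
expected = sum(utility(s, y) for y in big_sample) / len(big_sample)

gap = abs(empirical - expected)
print(gap)  # small, and shrinks as the training set grows
```

A generalization bound guarantees this gap is small simultaneously for every parameter setting 𝒔, not just the one checked here.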

  17. Generalization bounds. Challenge: the class ℱ = {g_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^d} is gnarly. E.g., in integer programming algorithm configuration: • Each domain element is an IP. • It is unclear how to plot or visualize the functions g_𝒔. • There are no obvious notions of Lipschitzness or smoothness to rely on.

  18. Dual functions

  19. Dual classes. g_𝒔(y) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input y. “Primal” function class: ℱ = {g_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^d}. g*_y(𝒔) = g_𝒔(y): utility as a function of the parameters. “Dual” function class: ℱ* = {g*_y : ℝ^d → ℝ | y ∈ 𝒴}. • Dual functions have a simple, Euclidean domain. • They often have ample structure we can use to bound the complexity of ℱ.
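The primal/dual re-indexing is just two ways of currying the same utility function; `utility` below is a hypothetical stand-in:

```python
# Sketch of the primal vs. dual views. The only point is the
# re-indexing: the primal g_s fixes the parameter and varies the
# input, while the dual g*_y fixes the input and varies the parameter.

def utility(s, y):
    # Hypothetical utility of parameter s on input y, in [-1, 1].
    return max(-1.0, min(1.0, 1.0 - abs(s - y)))

def primal(s):
    """g_s : inputs -> R (fix the parameter, vary the input)."""
    return lambda y: utility(s, y)

def dual(y):
    """g*_y : parameters -> R (fix the input, vary the parameter)."""
    return lambda s: utility(s, y)

g_s = primal(0.5)
g_star_y = dual(0.3)
# Both views agree pointwise: g_s(y) = g*_y(s) = utility(s, y).
print(g_s(0.3) == g_star_y(0.5) == utility(0.5, 0.3))  # True
```

The dual functions live on ℝ^d rather than on a space of integer programs, which is what makes their structure tractable to analyze.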

  20. Dual function approximability. ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and 𝒣 = {h_𝒔 : 𝒔 ∈ ℝ^d}: sets of functions mapping 𝒴 to ℝ. The dual class 𝒣* (δ, q)-approximates ℱ* if for all y ∈ 𝒴, ‖g*_y − h*_y‖_q = (∫_{ℝ^d} |g*_y(𝒔) − h*_y(𝒔)|^q d𝒔)^{1/q} ≤ δ. [Figure: g*_y and h*_y plotted over the parameter s.]
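The difference between the two norms can be checked numerically on a discretized one-dimensional parameter range. The dual function g*_y and its approximator h*_y below are toy choices, not from the paper:

```python
# Numeric sketch of (delta, q)-approximation: compare a toy dual
# function g*_y to a simple approximator h*_y under the L-infinity
# norm (sup of |g - h|) and a discretized L_q norm (a Riemann sum
# of |g - h|^q over the parameter range [0, 1]).

def g_star(s):
    # Toy dual function: a piecewise-linear "tent" peaking at s = 0.5.
    return max(0.0, 1.0 - 2.0 * abs(s - 0.5))

def h_star(s):
    # Simple approximating function: a constant.
    return 0.5

grid = [i / 1000 for i in range(1001)]   # discretized parameters in [0, 1]
diffs = [abs(g_star(s) - h_star(s)) for s in grid]

l_inf = max(diffs)                                        # sup norm
q = 2
l_q = (sum(d**q for d in diffs) / len(diffs)) ** (1 / q)  # discretized L_q

print(l_inf, l_q)  # the L_q distance is smaller than the L_inf distance
```

On a bounded range the L_q distance never exceeds the L∞ distance, so an L∞ guarantee is the stronger assumption, and it is exactly the one the talk's upper bound requires.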

  21. Main result: Upper bound

  22. Generalization upper bound. ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and 𝒣 = {h_𝒔 : 𝒔 ∈ ℝ^d}: sets of functions mapping 𝒴 to ℝ. With high probability over the draw of 𝒯 ~ 𝒟^N, for any 𝒔, |(1/N) Σ_{y∈𝒯} g_𝒔(y) − E_{y~𝒟}[g_𝒔(y)]| ≤ O(sup_{y∈𝒴} ‖g*_y − h*_y‖_∞ + R̂_𝒯(𝒣) + √(1/N)), where R̂_𝒯(𝒣) is the empirical Rademacher complexity of 𝒣. The first term on the left is the average utility over the training set.

  23. Generalization upper bound. ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and 𝒣 = {h_𝒔 : 𝒔 ∈ ℝ^d}: sets of functions mapping 𝒴 to ℝ. With high probability over the draw of 𝒯 ~ 𝒟^N, for any 𝒔, |(1/N) Σ_{y∈𝒯} g_𝒔(y) − E_{y~𝒟}[g_𝒔(y)]| ≤ O(sup_{y∈𝒴} ‖g*_y − h*_y‖_∞ + R̂_𝒯(𝒣) + √(1/N)), where R̂_𝒯(𝒣) is the empirical Rademacher complexity of 𝒣. The second term on the left is the expected utility.

  24. Generalization upper bound. ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and 𝒣 = {h_𝒔 : 𝒔 ∈ ℝ^d}: sets of functions mapping 𝒴 to ℝ. With high probability over the draw of 𝒯 ~ 𝒟^N, for any 𝒔, |(1/N) Σ_{y∈𝒯} g_𝒔(y) − E_{y~𝒟}[g_𝒔(y)]| ≤ O(sup_{y∈𝒴} ‖g*_y − h*_y‖_∞ + R̂_𝒯(𝒣) + √(1/N)). If 𝒣 is not too complex and 𝒣* (δ, ∞)-approximates ℱ*, the bound approaches O(δ) as N → ∞.

  25. Main result: Lower bound

  26. Lower bound. For any δ and q < ∞, there exist function classes ℱ, 𝒣 such that: • the dual class 𝒣* (δ, q)-approximates ℱ*; • 𝒣 is very simple (Rademacher complexity is 0); • ℱ is very complex (Rademacher complexity is 1); • it is not possible to provide generalization bounds in the worst case.
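Empirical Rademacher complexity, the measure contrasted in this lower bound, can be computed exactly for tiny finite classes by enumerating all sign vectors. The two classes below are illustrative, not the paper's construction:

```python
from itertools import product

# Empirical Rademacher complexity of a finite function class on a
# small sample, computed exactly by enumerating all sign vectors:
#   R_hat(F) = E_sigma [ sup_{f in F} (1/N) * sum_i sigma_i * f(y_i) ].

def rademacher(function_values):
    """function_values: one tuple per function, each giving that
    function's values on the N sample points."""
    n = len(function_values[0])
    total = 0.0
    for sigma in product([-1, 1], repeat=n):
        total += max(sum(s * v for s, v in zip(sigma, f)) / n
                     for f in function_values)
    return total / 2**n

# A "simple" class with a single constant function: complexity 0.
simple = [(1.0, 1.0, 1.0)]
# A "complex" class realizing every sign pattern on 3 points: complexity 1.
complex_class = [tuple(p) for p in product([-1.0, 1.0], repeat=3)]

print(rademacher(simple), rademacher(complex_class))
```

The lower bound says these two extremes can coexist: 𝒣 with complexity 0 whose dual class (δ, q)-approximates the dual of an ℱ with complexity 1, so L_q approximability alone transfers no learnability.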

  27. Experiments

  28. Experiments: integer programming. Tune integer programming solver parameters; also studied by Balcan, Dick, Sandholm, Vitercik [ICML'18]. Distributions over auction IPs [Leyton-Brown, Pearson, Shoham, EC'00]. [Figure: generalization error vs. number of training instances, comparing our bound with the bound of BDSV'18.]

  29. Conclusion

  30. Conclusion. • Provided generalization bounds for algorithm configuration. • They apply whenever utility, as a function of the parameters, is “approximately simple.” • The connection between learnability and approximability is balanced on a knife-edge: if the approximation holds under the L∞-norm, we can provide strong bounds; if it holds only under the L_q-norm for q < ∞, it is not possible to provide bounds. • Experiments demonstrate the strength of these bounds.
