Refined bounds for algorithm configuration: The knife-edge of dual class approximability
Nina Balcan, Tuomas Sandholm, Ellen Vitercik
Algorithms typically come with many tunable parameters
• Significant impact on runtime, solution quality, …
• Hand-tuning is time-consuming, tedious, and error-prone
Automated algorithm configuration
Goal: Automate algorithm configuration via machine learning
• Algorithmically find good parameter settings using a set of “typical” inputs from the application at hand (the training set)
Automated configuration procedure
1. Fix a parameterized algorithm (e.g., CPLEX)
2. Receive a set 𝒯 of “typical” inputs from an unknown distribution [Figure: problem instances 1–4]
3. Return a parameter setting with good average performance over 𝒯 (runtime, solution quality, memory usage, etc.)
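Step 3 of the procedure can be sketched as a simple grid search over candidate parameter settings. Everything below is a hypothetical illustration, not from the talk: `algorithm(params, y)` is assumed to return the utility (higher is better) of the parameterization on input `y`.

```python
def configure(algorithm, parameter_grid, training_set):
    """Return the parameter setting with the best average utility over
    the training set (a minimal grid-search sketch of step 3)."""
    best_params, best_avg = None, float("-inf")
    for params in parameter_grid:
        # Average performance of this parameter setting over the "typical" inputs
        avg = sum(algorithm(params, y) for y in training_set) / len(training_set)
        if avg > best_avg:
            best_params, best_avg = params, avg
    return best_params
```

For example, with a toy utility −(𝑠 − 𝑦)² and training instances clustered near 1, the procedure returns the grid point closest to 1.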
Automated configuration procedure
[Figure: seen problem instances 1–4 vs. an unseen problem instance]
Key question (focus of talk): Will those parameters have good expected performance?
Overview of main result
Key question (focus of talk): Will those parameters have good expected performance?
“Yes” when algorithmic performance, as a function of the parameters, can be approximated by a simple function
[Figure: algorithmic performance 𝑔*(𝑠) and a simple approximating function ℎ*(𝑠), plotted against the parameter 𝑠]
Overview of main result
We observe this structure, e.g., in integer programming algorithm configuration
[Figure: algorithmic performance 𝑔*(𝑠) and a simple approximating function ℎ*(𝑠), plotted against the parameter 𝑠]
Overview of main result: a dichotomy
If the approximation holds under the ℓ∞-norm, i.e., sup_𝑠 |𝑔*(𝑠) − ℎ*(𝑠)| is small:
• We provide strong guarantees
If the approximation only holds under the ℓ_𝑞-norm for 𝑞 < ∞, i.e., only (∫ |𝑔*(𝑠) − ℎ*(𝑠)|^𝑞 𝑑𝑠)^(1/𝑞) is small:
• Not possible to provide strong guarantees in the worst case
[Figure: algorithmic performance 𝑔*(𝑠) and a simple approximating function ℎ*(𝑠), plotted against the parameter 𝑠]
Model
Model
• 𝒴: set of all inputs (e.g., integer programs)
• ℝ^𝑑: set of all parameter settings (e.g., CPLEX parameters)
• Standard assumption: unknown distribution 𝒟 over inputs (e.g., represents the scheduling problem an airline solves day-to-day)
“Algorithmic performance”
𝑔_𝒔(𝑦) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^𝑑 on input 𝑦
• E.g., runtime, solution quality, memory usage, …
• Assume 𝑔_𝒔(𝑦) ∈ [−1, 1] (can be generalized to 𝑔_𝒔(𝑦) ∈ [−𝐻, 𝐻])
Generalization bounds
Generalization bounds
Key question: For any parameter setting 𝒔, does good average utility on the training set imply good expected utility?
Formally: Given samples 𝑦_1, …, 𝑦_𝑁 ~ 𝒟, for any 𝒔, how small is
|(1/𝑁) ∑_{𝑖=1}^{𝑁} 𝑔_𝒔(𝑦_𝑖) − 𝔼_{𝑦~𝒟}[𝑔_𝒔(𝑦)]| ?
(The first term is the empirical average utility; the second is the expected utility.)
Typically, answered by bounding the intrinsic complexity of ℱ = {𝑔_𝒔 : 𝒔 ∈ ℝ^𝑑}
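The key question can be felt with a quick Monte Carlo check: for a fixed toy utility (invented for illustration, not the talk's 𝑔), the gap between empirical average and expected utility shrinks as the training set grows.

```python
import random

random.seed(0)

def g(s, y):
    # Toy bounded utility in [-1, 1]: reward 1 when parameter s "fits" input y.
    return 1.0 if abs(s - y) < 0.25 else -1.0

def avg_gap(s, n, trials=200):
    """Average |empirical mean - expected utility| over repeated draws of a
    size-n training set, with inputs y ~ Uniform[0, 1]."""
    # For s = 0.5: P(|s - y| < 0.25) = 0.5, so the expected utility is 0.
    expected = 0.0
    total = 0.0
    for _ in range(trials):
        empirical = sum(g(s, random.random()) for _ in range(n)) / n
        total += abs(empirical - expected)
    return total / trials

# The gap decays roughly like 1/sqrt(n):
gap_small, gap_large = avg_gap(0.5, 10), avg_gap(0.5, 1000)
```

This only shows convergence for one fixed 𝒔; the generalization bounds in the talk control the gap uniformly over all parameter settings at once.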
Generalization bounds
Challenge: The class ℱ = {𝑔_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^𝑑} is gnarly
E.g., in integer programming algorithm configuration:
• Each domain element is an IP
• Unclear how to plot or visualize the functions 𝑔_𝒔
• No obvious notions of Lipschitzness or smoothness to rely on
Dual functions
Dual classes
𝑔_𝒔(𝑦) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^𝑑 on input 𝑦
“Primal” function class: ℱ = {𝑔_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^𝑑}
𝑔*_𝑦(𝒔) = 𝑔_𝒔(𝑦) = utility as a function of the parameters
“Dual” function class: ℱ* = {𝑔*_𝑦 : ℝ^𝑑 → ℝ | 𝑦 ∈ 𝒴}
• Dual functions have a simple, Euclidean domain
• Often have ample structure we can use to bound the complexity of ℱ
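The primal/dual flip is just currying: fix the parameter to get a function of the input, or fix the input to get a function of the parameter. The utility below is a made-up toy, chosen so that every dual function is a simple threshold in 𝑠.

```python
def g(s, y):
    # Hypothetical utility: the algorithm succeeds iff its parameter s
    # is at least the "hardness" y of the input.
    return 1.0 if s >= y else -1.0

def primal(s):
    """g_s : inputs -> utility (a member of the primal class F)."""
    return lambda y: g(s, y)

def dual(y):
    """g*_y : parameters -> utility (a member of the dual class F*).
    For this toy g, each dual function is a one-dimensional threshold:
    a very simple structure even when the input domain is gnarly."""
    return lambda s: g(s, y)
```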
Dual function approximability
ℱ = {𝑔_𝒔 : 𝒔 ∈ ℝ^𝑑} and ℋ = {ℎ_𝒔 : 𝒔 ∈ ℝ^𝑑}: sets of functions mapping 𝒴 to ℝ
The dual class ℋ* (𝜹, 𝒒)-approximates ℱ* if for all 𝑦 ∈ 𝒴,
‖𝑔*_𝑦 − ℎ*_𝑦‖_𝑞 = (∫_{ℝ^𝑑} |𝑔*_𝑦(𝒔) − ℎ*_𝑦(𝒔)|^𝑞 𝑑𝒔)^(1/𝑞) ≤ 𝛿.
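Why the choice of 𝑞 matters can be checked numerically: a dual function with a narrow spike is far from a simple approximator in the sup-norm but close in the ℓ₁-norm. The functions below are invented for illustration.

```python
import numpy as np

# Toy dual function g*_y on [0, 1]: zero everywhere except a spike of
# width 0.001, and a simple approximator h*_y that ignores the spike.
s = np.linspace(0.0, 1.0, 100_001)
ds = s[1] - s[0]
g_star = np.where((s > 0.5) & (s < 0.501), 1.0, 0.0)
h_star = np.zeros_like(s)

diff = np.abs(g_star - h_star)
sup_error = diff.max()          # ell-infinity error: stays 1.0
l1_error = (diff * ds).sum()    # ell-1 error: about the spike's width
```

So ℎ* (𝛿, 1)-approximates this 𝑔* for 𝛿 ≈ 0.001 while offering no sup-norm guarantee at all; this is exactly the kind of gap the lower bound exploits.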
Main result: Upper bound
Generalization upper bound
ℱ = {𝑔_𝒔 : 𝒔 ∈ ℝ^𝑑} and ℋ = {ℎ_𝒔 : 𝒔 ∈ ℝ^𝑑}: sets of functions mapping 𝒴 to ℝ
With high probability over the draw of 𝒯 ~ 𝒟^𝑁, for any 𝒔,
|(1/𝑁) ∑_{𝑦∈𝒯} 𝑔_𝒔(𝑦) − 𝔼_{𝑦~𝒟}[𝑔_𝒔(𝑦)]| ≤ 𝑂(sup_𝑦 ‖𝑔*_𝑦 − ℎ*_𝑦‖_∞) + 𝑂(ℜ_𝒯(ℋ)) + 𝑂̃(1/√𝑁)
(The first term on the left is the average utility over the training set; the second is the expected utility. ℜ_𝒯(ℋ) is the empirical Rademacher complexity of ℋ.)
If ℋ is not too complex and ℋ* (𝛿, ∞)-approximates ℱ*, the bound approaches 𝑂(𝛿) as 𝑁 → ∞.
Main result: Lower bound
Lower bound
For any 𝛿 and 𝑞 < ∞, there exist function classes ℱ, ℋ such that:
• The dual class ℋ* (𝛿, 𝑞)-approximates ℱ*
• ℋ is very simple: its Rademacher complexity is 0
• ℱ is very complex: its Rademacher complexity does not vanish
• Not possible to provide generalization bounds in the worst case
Experiments
Experiments: Integer programming
Tune integer programming solver parameters; also studied by Balcan, Dick, Sandholm, Vitercik [ICML’18]
Distributions over auction IPs [Leyton-Brown, Pearson, Shoham, EC’00]
[Figure: generalization error vs. number of training instances, comparing our bound with the bound by BDSV’18]
Conclusion
Conclusion
• Provided generalization bounds for algorithm configuration
• Apply whenever utility, as a function of the parameters, is “approximately simple”
• The connection between learnability and approximability is balanced on a knife-edge:
• If the approximation holds under the ℓ∞-norm, we can provide strong bounds
• If it holds only under the ℓ_𝑞-norm for 𝑞 < ∞, it is not possible to provide bounds in the worst case
• Experiments demonstrate the strength of these bounds