SLIDE 1

Refined bounds for algorithm configuration: The knife-edge of dual class approximability

Nina Balcan, Tuomas Sandholm, Ellen Vitercik

SLIDE 2

  • Algorithms typically come with many tunable parameters
  • Significant impact on runtime, solution quality, …
  • Hand-tuning is time-consuming, tedious, and error-prone

SLIDE 3

Automated algorithm configuration

Goal: Automate algorithm configuration via machine learning.
Algorithmically find good parameter settings using a set of “typical” inputs from the application at hand.

Training set

SLIDE 4

Automated configuration procedure

  • 1. Fix parameterized algorithm (e.g., CPLEX)
  • 2. Receive set 𝒯 of “typical” inputs from unknown distribution
  • 3. Return parameter setting with good avg performance over 𝒯 (a toy sketch follows below)

[Slide figure: training instances (Problem instance 1–4), each evaluated for runtime, solution quality, memory usage, etc.]
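A minimal sketch of the three-step procedure above; the parameterized “algorithm”, its utility function, and the training instances are hypothetical toy stand-ins (not CPLEX), assuming a one-dimensional parameter grid:

```python
# Toy sketch of automated configuration (hypothetical stand-ins, not CPLEX):
# 1. fix a parameterized "algorithm" (here just a utility function of one parameter),
# 2. receive a training set T of "typical" instances,
# 3. return the parameter setting with the best average performance over T.
import numpy as np

def utility(s: float, instance: np.ndarray) -> float:
    """Hypothetical performance (e.g., negated runtime) of parameter s on one instance."""
    return float(-abs(instance.mean() - s))  # toy: best when s matches the instance profile

def configure(train_set: list[np.ndarray], candidate_params: np.ndarray) -> float:
    """Step 3: grid search for the candidate with the best average utility over T."""
    avg_utility = lambda s: np.mean([utility(s, inst) for inst in train_set])
    return float(max(candidate_params, key=avg_utility))

rng = np.random.default_rng(0)
train_set = [rng.normal(loc=0.8, scale=0.2, size=50) for _ in range(30)]  # step 2: T
candidates = np.linspace(0.0, 2.0, 101)                                   # step 1: parameter grid
print("chosen parameter:", configure(train_set, candidates))
```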

SLIDE 5

Automated configuration procedure

[Slide figure: seen training instances (Problem instance 1–4) vs. an unseen instance from the same distribution]

Key question (focus of talk): Will those parameters have good expected performance?

SLIDE 6

Overview of main result

Key question (focus of talk): Will those parameters have good expected performance?
Answer: “Yes” when algorithmic performance as a function of the parameters can be approximated by a simple function.

[Slide figure: algorithmic performance 𝑔∗(𝑠) vs. parameter 𝑠, with a simple approximating function ℎ∗(𝑠)]

SLIDE 7

Overview of main result

Observe this structure, e.g., in integer programming algorithm configuration

[Slide figure: algorithmic performance 𝑔∗(𝑠) vs. parameter 𝑠, with a simple approximating function ℎ∗(𝑠)]

SLIDE 8

Overview of main result: a dichotomy

If the approximation holds under the L∞-norm: we provide strong guarantees.

$$\sup_{\boldsymbol{s}} \left| g^*(\boldsymbol{s}) - h^*(\boldsymbol{s}) \right| \text{ is small}$$

[Slide figure: algorithmic performance 𝑔∗(𝑠) vs. parameter 𝑠, with the simple approximating function ℎ∗(𝑠)]

SLIDE 9

Overview of main result: a dichotomy

If the approximation holds under the L∞-norm: we provide strong guarantees.
If the approximation only holds under the L_q-norm for q < ∞: not possible to provide strong guarantees in the worst case.

$$\left( \int \left| g^*(\boldsymbol{s}) - h^*(\boldsymbol{s}) \right|^q \, d\boldsymbol{s} \right)^{1/q} \text{ is small}$$

[Slide figure: same plot of 𝑔∗(𝑠) and ℎ∗(𝑠) as on the previous slide]

SLIDE 10

Model

SLIDE 11

𝒴: set of all inputs (e.g., integer programs)
ℝ^d: set of all parameter settings (e.g., CPLEX parameters)
Standard assumption: unknown distribution 𝒟 over inputs

E.g., 𝒟 represents the scheduling problem an airline solves day-to-day


SLIDE 12

$g_{\boldsymbol{s}}(y)$ = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input 𝑦 (“algorithmic performance”)

E.g., runtime, solution quality, memory usage, …

Assume $g_{\boldsymbol{s}}(y) \in [-1, 1]$. Can be generalized to $g_{\boldsymbol{s}}(y) \in [-H, H]$.
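A short worked remark on why the [−1, 1] assumption is essentially without loss of generality, assuming the utilities are bounded by some known constant H: rescale, and the generalization gap scales linearly,

$$\tilde g_{\boldsymbol{s}}(y) = \frac{g_{\boldsymbol{s}}(y)}{H} \in [-1, 1],
\qquad
\left| \frac{1}{N}\sum_{i} g_{\boldsymbol{s}}(y_i) - \mathbb{E}\!\left[g_{\boldsymbol{s}}(y)\right] \right|
= H \left| \frac{1}{N}\sum_{i} \tilde g_{\boldsymbol{s}}(y_i) - \mathbb{E}\!\left[\tilde g_{\boldsymbol{s}}(y)\right] \right|.$$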

SLIDE 13

Generalization bounds

SLIDE 14

Generalization bounds

Key question: For any parameter setting 𝒔, does good average utility on the training set imply good expected utility?

Formally: Given samples $y_1, \dots, y_N \sim \mathcal{D}$, for any 𝒔,

$$\left| \underbrace{\frac{1}{N} \sum_{i=1}^{N} g_{\boldsymbol{s}}(y_i)}_{\text{empirical average utility}} \;-\; \underbrace{\mathbb{E}_{y \sim \mathcal{D}}\!\left[ g_{\boldsymbol{s}}(y) \right]}_{\text{expected utility}} \right| \;\le\; ?$$

Typically, answer by bounding the intrinsic complexity of $\mathcal{F} = \{ g_{\boldsymbol{s}} : \boldsymbol{s} \in \mathbb{R}^d \}$.
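A hedged simulation of this question: configure on a small training set, then compare the chosen parameter’s empirical average utility to a large-sample estimate of its expected utility. The utility g_s(y) below is a hypothetical toy, not a setting from the talk.

```python
# Hedged simulation of the key question with a toy utility g_s(y) in [-1, 1]:
# pick the parameter with the best average utility on a small training set,
# then compare that empirical average to a large-sample proxy for E_{y~D}[g_s(y)].
import numpy as np

rng = np.random.default_rng(0)

def g(s: float, y: float) -> float:
    """Hypothetical utility of parameter s on instance y, clipped to [-1, 1]."""
    return float(np.clip(np.cos(s - y) - 0.05 * s * s, -1.0, 1.0))

def avg_utility(s: float, ys: np.ndarray) -> float:
    return float(np.mean([g(s, y) for y in ys]))

params = np.linspace(-3.0, 3.0, 61)                    # candidate parameter settings
train = rng.normal(loc=0.5, scale=1.0, size=20)        # small training set y_1..y_N ~ D
test = rng.normal(loc=0.5, scale=1.0, size=20_000)     # proxy for the expectation over D

s_hat = max(params, key=lambda s: avg_utility(s, train))
train_avg = avg_utility(s_hat, train)                  # empirical average utility
test_avg = avg_utility(s_hat, test)                    # estimate of expected utility
print(f"s = {s_hat:.2f}, train avg = {train_avg:.3f}, "
      f"expected (estimated) = {test_avg:.3f}, gap = {train_avg - test_avg:.3f}")
```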


SLIDE 17

Generalization bounds

Challenge: The class $\mathcal{F} = \{ g_{\boldsymbol{s}} : \mathcal{Y} \to \mathbb{R} \mid \boldsymbol{s} \in \mathbb{R}^d \}$ is gnarly.

E.g., in integer programming algorithm configuration:

  • Each domain element is an IP
  • Unclear how to plot or visualize the functions $g_{\boldsymbol{s}}$
  • No obvious notions of Lipschitzness or smoothness to rely on
SLIDE 18

Dual functions

SLIDE 19

Dual classes

$g_{\boldsymbol{s}}(y)$ = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input 𝑦

“Primal” function class: $\mathcal{F} = \{ g_{\boldsymbol{s}} : \mathcal{Y} \to \mathbb{R} \mid \boldsymbol{s} \in \mathbb{R}^d \}$

$g_y^*(\boldsymbol{s})$ = utility as a function of the parameters: $g_y^*(\boldsymbol{s}) = g_{\boldsymbol{s}}(y)$

“Dual” function class: $\mathcal{F}^* = \{ g_y^* : \mathbb{R}^d \to \mathbb{R} \mid y \in \mathcal{Y} \}$

  • Dual functions have a simple, Euclidean domain
  • They often have ample structure we can use to bound the complexity of ℱ (see the sketch below)
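A minimal sketch of the primal vs. dual views; the utility function below is a hypothetical toy, not one from the talk. The same quantity g_s(y) is read either as a function of the input y (primal, fixed 𝒔) or as a function of the parameter s (dual, fixed y).

```python
# Toy illustration of primal vs. dual views (hypothetical utility, not from the talk):
# the same value g_s(y) is a primal function of the input y for fixed parameters s,
# and a dual function g*_y of the parameter s for a fixed input y.
import numpy as np

def utility(s: float, y: np.ndarray) -> float:
    """Hypothetical utility of parameter setting s on a toy 'instance' y, in [-1, 1]."""
    return float(np.clip(np.sin(s * y.mean()) - 0.1 * abs(s), -1.0, 1.0))

def primal(s: float):
    """g_s : input -> utility, for a fixed parameter setting s (domain: instances)."""
    return lambda y: utility(s, y)

def dual(y: np.ndarray):
    """g*_y : parameter -> utility, for a fixed input y (domain: Euclidean parameters)."""
    return lambda s: utility(s, y)

rng = np.random.default_rng(0)
y0 = rng.uniform(size=5)        # one toy "instance"
g_s = primal(0.7)               # primal view: vary the input
g_star_y0 = dual(y0)            # dual view: vary the parameter

print(g_s(y0), g_star_y0(0.7))  # equal by definition: g_s(y0) == g*_{y0}(0.7)
```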
SLIDE 20

Dual function approximability

$\mathcal{F} = \{ g_{\boldsymbol{s}} \mid \boldsymbol{s} \in \mathbb{R}^d \}$ and $\mathcal{H} = \{ h_{\boldsymbol{s}} \mid \boldsymbol{s} \in \mathbb{R}^d \}$: sets of functions mapping 𝒴 to ℝ.

Definition: The dual class 𝒣∗ (δ, q)-approximates ℱ∗ if for all 𝑦 ∈ 𝒴,

$$\left\| g_y^* - h_y^* \right\|_q = \left( \int_{\mathbb{R}^d} \left| g_y^*(\boldsymbol{s}) - h_y^*(\boldsymbol{s}) \right|^q \, d\boldsymbol{s} \right)^{1/q} \le \delta.$$

[Slide figure: dual functions 𝑔∗_𝑦(𝑠) and ℎ∗_𝑦(𝑠) plotted against the parameter 𝑠]
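A hedged numerical illustration of this definition: for a fixed toy input y, estimate ‖g*_y − h*_y‖_q over a bounded parameter range with a Riemann sum. Both functions below are hypothetical stand-ins, and the integral is restricted to [0, 1] rather than all of ℝ^d for simplicity.

```python
# Hedged numerical check of the (delta, q)-approximation condition for one toy input y:
# estimate || g*_y - h*_y ||_q over a bounded parameter interval with a Riemann sum.
import numpy as np

def g_star_y(s: np.ndarray) -> np.ndarray:
    """Toy dual utility: jagged response of performance to the parameter s."""
    return np.clip(np.sign(np.sin(8.0 * s)) * 0.3 + 0.5 * s, -1.0, 1.0)

def h_star_y(s: np.ndarray) -> np.ndarray:
    """Toy simple approximating function (here: linear in s)."""
    return np.clip(0.5 * s, -1.0, 1.0)

def lq_distance(q: float, lo: float = 0.0, hi: float = 1.0, n: int = 100_000) -> float:
    """( integral over [lo, hi] of |g*_y(s) - h*_y(s)|^q ds )^(1/q), or the sup for q = inf."""
    s = np.linspace(lo, hi, n)
    diff = np.abs(g_star_y(s) - h_star_y(s))
    if np.isinf(q):
        return float(diff.max())                      # L_inf: sup_s |g*_y(s) - h*_y(s)|
    return float(((hi - lo) * np.mean(diff ** q)) ** (1.0 / q))

print("L_1  :", lq_distance(1))
print("L_2  :", lq_distance(2))
print("L_inf:", lq_distance(np.inf))
```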

SLIDE 21

Main result: Upper bound

SLIDE 22

Generalization upper bound

$\mathcal{F} = \{ g_{\boldsymbol{s}} \mid \boldsymbol{s} \in \mathbb{R}^d \}$ and $\mathcal{H} = \{ h_{\boldsymbol{s}} \mid \boldsymbol{s} \in \mathbb{R}^d \}$: sets of functions mapping 𝒴 to ℝ.

With high probability over the draw of $\mathcal{T} \sim \mathcal{D}^N$, for any 𝒔,

$$\left| \underbrace{\frac{1}{N} \sum_{y \in \mathcal{T}} g_{\boldsymbol{s}}(y)}_{\text{average utility over the training set}} \;-\; \underbrace{\mathbb{E}_{y \sim \mathcal{D}}\!\left[ g_{\boldsymbol{s}}(y) \right]}_{\text{expected utility}} \right| \;\le\; O\!\left( \frac{1}{N} \sum_{y \in \mathcal{T}} \left\| g_y^* - h_y^* \right\|_\infty + \widehat{\mathfrak{R}}_{\mathcal{T}}(\mathcal{H}) + \sqrt{\tfrac{1}{N}} \right)$$

where $\widehat{\mathfrak{R}}_{\mathcal{T}}(\mathcal{H})$ is the empirical Rademacher complexity of 𝒣.

If 𝒣 is not too complex and 𝒣∗ (δ, ∞)-approximates ℱ∗, the bound approaches O(δ) as N → ∞.

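To make the middle term of the bound concrete, here is a hedged Monte-Carlo sketch of the empirical Rademacher complexity ℜ̂_𝒯(𝒣) for a toy finite class of constant functions; the class is a hypothetical stand-in, not the class 𝒣 used in the talk.

```python
# Hedged Monte-Carlo sketch of the empirical Rademacher complexity
#   R_hat_T(H) = E_sigma[ sup_{h in H} (1/N) * sum_i sigma_i * h(y_i) ]
# for a toy finite class H of constant functions h_c(y) = c, c in [-1, 1].
import numpy as np

def empirical_rademacher(values: np.ndarray, n_draws: int = 2000, seed: int = 0) -> float:
    """values[k, i] = h_k(y_i) for functions h_k (rows) and training inputs y_i (columns)."""
    rng = np.random.default_rng(seed)
    _, n_samples = values.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n_samples)   # Rademacher signs
        correlations = values @ sigma / n_samples          # (1/N) sum_i sigma_i h_k(y_i)
        total += correlations.max()                        # sup over the (finite) class
    return total / n_draws

N = 100
cs = np.linspace(-1.0, 1.0, 41)            # the constants c indexing the toy class
values = np.tile(cs[:, None], (1, N))      # h_c(y_i) = c for every training input y_i

# For constant functions the sup equals |mean(sigma)|, so the estimate is roughly
# sqrt(2 / (pi * N)) -- small, as expected for a very simple class.
print(empirical_rademacher(values))
```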

SLIDE 25

Main result: Lower bound

SLIDE 26

Lower bound

For any δ and q < ∞, there exist function classes ℱ, 𝒣 such that:

  • Dual class 𝒣∗ (δ, q)-approximates ℱ∗
  • 𝒣 is very simple: its Rademacher complexity is 0
  • ℱ is very complex: its Rademacher complexity is 1
  • Not possible to provide generalization bounds in the worst case
SLIDE 27

Experiments

SLIDE 28

Experiments: Integer programming

  • Tune integer programming solver parameters
  • Also studied by Balcan, Dick, Sandholm, Vitercik [ICML’18]
  • Distributions over auction IPs [Leyton-Brown, Pearson, Shoham, EC’00]

[Slide figure: generalization error vs. number of training instances, comparing our bound with the bound of BDSV’18]

SLIDE 29

Conclusion

SLIDE 30

Conclusion

  • Provided generalization bounds for algorithm configuration
  • They apply whenever utility as a function of the parameters is “approximately simple”
  • The connection between learnability and approximability is balanced on a knife-edge:
  • If the approximation holds under the L∞-norm, we can provide strong bounds
  • If it only holds under the L_q-norm for q < ∞, it is not possible to provide bounds in the worst case
  • Experiments demonstrate the strength of these bounds