Refined bounds for algorithm configuration: The knife-edge of dual class approximability
Nina Balcan, Tuomas Sandholm, Ellen Vitercik
Algorithms typically come with many tunable parameters
- Significant impact on runtime, solution quality, …
- Hand-tuning is time-consuming, tedious, and error-prone
Automated algorithm configuration
Goal: Automate algorithm configuration via machine learning.
Algorithmically find good parameter settings using a set of "typical" inputs from the application at hand.
Automated configuration procedure
- 1. Fix parameterized algorithm (e.g., CPLEX)
- 2. Receive set 𝒯 of "typical" inputs from an unknown distribution 𝒟
- 3. Return the parameter setting with good average performance over 𝒯 (see the sketch below)
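A minimal sketch of this procedure, assuming a finite grid of candidate parameter settings and a hypothetical utility(s, y) callback that runs the parameterized algorithm on one instance and scores the result; neither assumption comes from the talk, which places no restriction on how the search is done:

# Sketch of the automated configuration procedure over a finite candidate grid.
# `utility` is a hypothetical callback: utility(s, y) runs the algorithm with
# parameter setting s on instance y and returns its measured performance.
from typing import Callable, Iterable, Sequence

def configure(candidates: Iterable[float],
              training_set: Sequence[object],
              utility: Callable[[float, object], float]) -> float:
    """Return the candidate with the best average utility over the training set 𝒯."""
    def avg_utility(s: float) -> float:
        return sum(utility(s, y) for y in training_set) / len(training_set)
    return max(candidates, key=avg_utility)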
[Diagram: a training set of problem instances (problem instance 1–4) fed to the configuration procedure, which measures runtime, solution quality, memory usage, etc.]
Automated configuration procedure
[Diagram: the training instances are "seen"; a new problem instance drawn from the same distribution is "unseen"]
Key question (focus of talk): Will those parameters have good expected performance?
Overview of main result
Key question (focus of talk): Will those parameters have good expected performance?
"Yes" when algorithmic performance as a function of the parameters can be approximated by a simple function.
[Plot: algorithmic performance g*(s) as a function of the parameter s, with a simple approximating function h*(s)]
Overview of main result
We observe this structure, e.g., in integer programming algorithm configuration.
[Plot: algorithmic performance g*(s) vs. the parameter s, with a simple approximating function h*(s)]
Overview of main result: a dichotomy
- If the approximation holds under the L∞-norm, i.e., sup_s |g*(s) − h*(s)| is small: we provide strong guarantees.
- If the approximation only holds under the L_q-norm for some q < ∞, i.e., (∫ |g*(s) − h*(s)|^q ds)^{1/q} is small: it is not possible to provide strong guarantees in the worst case.
[Plot: algorithmic performance g*(s) vs. the parameter s, with a simple approximating function h*(s)]
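A quick way to see why the two norms sit on a knife-edge (this worked example is ours, not from the talk): suppose the duals agree everywhere except on a spike of width ε,

\[
g^*(s) - h^*(s) =
\begin{cases}
1 & s \in [0, \varepsilon], \\
0 & \text{otherwise,}
\end{cases}
\qquad\text{so}\qquad
\|g^* - h^*\|_q = \Bigl(\int |g^*(s) - h^*(s)|^q \, ds\Bigr)^{1/q} = \varepsilon^{1/q},
\qquad
\|g^* - h^*\|_\infty = 1.
\]

For any fixed q < ∞ the L_q distance can be made arbitrarily small while the L∞ distance stays 1: the L_q norms ignore spikes of small measure, and that is exactly where worst-case guarantees can break.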
Model
𝒴: the set of all inputs (e.g., integer programs). ℝ^d: the set of all parameter settings (e.g., CPLEX parameters).
Standard assumption: there is an unknown distribution 𝒟 over inputs.
E.g., 𝒟 represents the scheduling problems an airline solves day-to-day.
Model
g_𝒔(y) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input y ("algorithmic performance")
E.g., runtime, solution quality, memory usage, …
Assume g_𝒔(y) ∈ [−1, 1]; this can be generalized to any bounded range [−H, H].
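As a concrete stand-in for g_𝒔(y), here is a toy utility rescaled into [−1, 1]; the formula, the instance fields, and the timeout are purely illustrative assumptions, not anything from the talk:

# Toy utility g_s(y): negative (normalized) runtime of a fake solver, in [-1, 1].
MAX_SECONDS = 60.0  # assumed timeout used for normalization

def fake_runtime(s: float, y: dict) -> float:
    """Stand-in for running a solver with parameter s on instance y."""
    return min(MAX_SECONDS, y["size"] * abs(s - y["best_param"]) + 0.1)

def g(s: float, y: dict) -> float:
    """Utility in [-1, 1]: fast runs score near +1, timeouts near -1."""
    return 1.0 - 2.0 * fake_runtime(s, y) / MAX_SECONDS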
Generalization bounds
Key question: For any parameter setting 𝒔, does good average utility on the training set imply good expected utility?
Formally: given samples y_1, …, y_N ~ 𝒟, how large can
| (1/N) Σ_{i=1}^{N} g_𝒔(y_i) − 𝔼_{y~𝒟}[g_𝒔(y)] |
be for any 𝒔? (The first term is the empirical average utility; the second is the expected utility.)
Typically, one answers this by bounding the intrinsic complexity of the class ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d}.
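To make this quantity concrete, a small Monte-Carlo sketch (the instance sampler and the utility are hypothetical, and the expectation is only approximated with a large held-out sample):

# Estimate |empirical average utility - expected utility| for one parameter s.
import random

def sample_instance(rng: random.Random) -> dict:
    return {"size": rng.uniform(1, 10), "best_param": rng.gauss(0.5, 0.2)}

def g(s: float, y: dict) -> float:
    """Toy utility in [-1, 1], in the same spirit as the earlier sketch."""
    runtime = min(60.0, y["size"] * abs(s - y["best_param"]) + 0.1)
    return 1.0 - 2.0 * runtime / 60.0

def generalization_gap(s: float, n_train: int = 100, n_test: int = 100_000,
                       seed: int = 0) -> float:
    rng = random.Random(seed)
    train = [sample_instance(rng) for _ in range(n_train)]
    test = [sample_instance(rng) for _ in range(n_test)]
    emp = sum(g(s, y) for y in train) / n_train  # average utility over the samples
    exp = sum(g(s, y) for y in test) / n_test    # ~ expected utility under 𝒟
    return abs(emp - exp)

print(generalization_gap(s=0.5))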
Generalization bounds
Challenge: the class ℱ = {g_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^d} is gnarly.
E.g., in integer programming algorithm configuration:
- Each domain element is an IP
- It is unclear how to plot or visualize the functions g_𝒔
- There are no obvious notions of Lipschitzness or smoothness to rely on
Dual classes
g_𝒔(y) = utility of the algorithm parameterized by 𝒔 ∈ ℝ^d on input y
"Primal" function class: ℱ = {g_𝒔 : 𝒴 → ℝ | 𝒔 ∈ ℝ^d}
g_y*(𝒔) = utility as a function of the parameters: g_y*(𝒔) = g_𝒔(y)
"Dual" function class: ℱ* = {g_y* : ℝ^d → ℝ | y ∈ 𝒴}
- Dual functions have a simple, Euclidean domain
- They often have ample structure we can use to bound the complexity of ℱ
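The primal/dual switch is just a matter of which argument is held fixed; a minimal sketch with the toy utility from above (all names hypothetical):

# Primal view: fix a parameter s, obtain a function of instances y (a member of ℱ).
# Dual view:   fix an instance y,  obtain a function of the parameter s (a member of ℱ*).
from functools import partial

def g(s: float, y: dict) -> float:
    """Toy utility g_s(y), bounded in [-1, 1]."""
    runtime = min(60.0, y["size"] * abs(s - y["best_param"]) + 0.1)
    return 1.0 - 2.0 * runtime / 60.0

def primal(s: float):
    """g_s : 𝒴 -> R."""
    return lambda y: g(s, y)

def dual(y: dict):
    """g_y* : R -> R with g_y*(s) = g_s(y)."""
    return partial(g, y=y)

y0 = {"size": 3.0, "best_param": 0.4}
g_y0_star = dual(y0)              # a one-dimensional function of the parameter
print(g_y0_star(0.4), g_y0_star(0.9))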
Dual function approximability
ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and ℋ = {h_𝒔 : 𝒔 ∈ ℝ^d}: sets of functions mapping 𝒴 to ℝ.
The dual class ℋ* (𝛿, q)-approximates ℱ* if for all y ∈ 𝒴,
‖g_y* − h_y*‖_q = (∫_{ℝ^d} |g_y*(𝒔) − h_y*(𝒔)|^q d𝒔)^{1/q} ≤ 𝛿
(for q = ∞, this is sup_𝒔 |g_y*(𝒔) − h_y*(𝒔)| ≤ 𝛿).
[Plot: the dual functions g_y*(s) and h_y*(s) over the parameter s]
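A numerical sketch of checking the (𝛿, q)-approximation for one instance y, discretizing a one-dimensional parameter range; the range, the grid, and the example functions are illustrative assumptions:

# Approximate ||g_y* - h_y*||_q over [lo, hi] by a Riemann sum (max over the grid for q = inf).
from typing import Callable

def lq_distance(g_dual: Callable[[float], float],
                h_dual: Callable[[float], float],
                q: float, lo: float = 0.0, hi: float = 1.0,
                n_grid: int = 10_000) -> float:
    step = (hi - lo) / n_grid
    grid = [lo + (i + 0.5) * step for i in range(n_grid)]
    if q == float("inf"):
        return max(abs(g_dual(s) - h_dual(s)) for s in grid)
    total = sum(abs(g_dual(s) - h_dual(s)) ** q * step for s in grid)
    return total ** (1.0 / q)

# A dual with a narrow spike is close to h in every L_q norm but far in L_inf:
h = lambda s: 0.0
g_spiky = lambda s: 1.0 if 0.50 <= s <= 0.51 else 0.0
print(lq_distance(g_spiky, h, q=1.0))           # ~0.01
print(lq_distance(g_spiky, h, q=float("inf")))  # 1.0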
Main result: Upper bound
Generalization upper bound
ℱ = {g_𝒔 : 𝒔 ∈ ℝ^d} and ℋ = {h_𝒔 : 𝒔 ∈ ℝ^d}: sets of functions mapping 𝒴 to ℝ.
With high probability over the draw of 𝒯 ~ 𝒟^N, for any 𝒔,
| (1/N) Σ_{y∈𝒯} g_𝒔(y) − 𝔼_{y~𝒟}[g_𝒔(y)] | ≤ (2/N) Σ_{y∈𝒯} ‖g_y* − h_y*‖_∞ + 2·R̂_𝒯(ℋ) + Õ(√(1/N)),
where the first term on the left is the average utility over the training set, the second is the expected utility, and R̂_𝒯(ℋ) is the empirical Rademacher complexity of ℋ.
If ℋ is not too complex and ℋ* (𝛿, ∞)-approximates ℱ*, the bound approaches 2𝛿 as N → ∞.
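A sketch of how one might evaluate the right-hand side of this bound on a training set, estimating the empirical Rademacher complexity of ℋ by sampling random signs and maximizing over a finite grid of parameter settings; the grid, the constants, and the concrete form of the Õ term are our assumptions, not details from the talk:

# Assemble an estimate of the bound's right-hand side:
#   (2/N) * sum_y ||g_y* - h_y*||_inf  +  2 * RademacherHat_T(H)  +  sqrt(1/N).
import math
import random
from typing import Sequence

def sup_dual_distance(g, h, y, param_grid: Sequence[float]) -> float:
    """max over the grid of |g_s(y) - h_s(y)|, i.e., ||g_y* - h_y*||_inf (approximately)."""
    return max(abs(g(s, y) - h(s, y)) for s in param_grid)

def empirical_rademacher(h, train: Sequence, param_grid: Sequence[float],
                         n_draws: int = 200, seed: int = 0) -> float:
    """Monte-Carlo estimate of the empirical Rademacher complexity of {h_s} on the training set."""
    rng = random.Random(seed)
    n, total = len(train), 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += max(sum(sig * h(s, y) for sig, y in zip(sigma, train)) / n
                     for s in param_grid)
    return total / n_draws

def bound_rhs(g, h, train: Sequence, param_grid: Sequence[float]) -> float:
    n = len(train)
    approx = 2.0 / n * sum(sup_dual_distance(g, h, y, param_grid) for y in train)
    return approx + 2.0 * empirical_rademacher(h, train, param_grid) + math.sqrt(1.0 / n)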
Main result: Lower bound
Lower bound
For any 𝛿 and q < ∞, there exist function classes ℱ and ℋ such that:
- The dual class ℋ* (𝛿, q)-approximates ℱ*
- ℋ is very simple: its Rademacher complexity is 0
- ℱ is very complex: its Rademacher complexity is 1 (the maximum possible for [−1, 1]-valued functions)
- So it is not possible to provide generalization bounds in the worst case
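This is not the paper's construction, but a toy version of the same knife-edge intuition: each dual function below differs from the constant −1 only on a set of tiny measure, so it is L_q-close to a trivial function for every q < ∞, yet the induced primal class realizes every ±1 labeling of the instances, so its Rademacher complexity is maximal:

# Reserve a tiny parameter interval for each of the 2^N possible labelings of
# N instances; g_s(y) = +1 iff s falls in an interval whose labeling marks y as +1.
from itertools import product

N = 4                                   # number of training instances
labelings = list(product([-1, 1], repeat=N))
WIDTH = 1e-9                            # width of the interval reserved per labeling

def g(s: float, y: int) -> float:
    idx = int(s // WIDTH)
    if 0 <= idx < len(labelings) and labelings[idx][y] == 1:
        return 1.0
    return -1.0

# Every labeling is realized by some parameter setting, even though each dual
# g_y* deviates from the constant -1 on measure at most 2**(N-1) * WIDTH.
for i, b in enumerate(labelings):
    s = (i + 0.5) * WIDTH
    assert tuple(g(s, y) for y in range(N)) == tuple(float(v) for v in b)
print("all", len(labelings), "labelings realized")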
Experiments
Tune integer programming solver parameters
Also studied by Balcan, Dick, Sandholm, Vitercik [ICML’18]
Distributions over auction IPs
[Leyton-Brown, Pearson, Shoham, EC’00]
Experiments: Integer programming
[Plot: generalization error vs. number of training instances, comparing our bound with the bound of BDSV'18]
Conclusion
- Provided generalization bounds for algorithm configuration
- They apply whenever utility as a function of the parameters is "approximately simple"
- The connection between learnability and approximability is balanced on a knife-edge:
  - If the approximation holds under the L∞-norm, we can provide strong bounds
  - If it only holds under the L_q-norm for q < ∞, it is not possible to provide such bounds in the worst case
- Experiments demonstrate the strength of these bounds