AI-Augmented Algorithms: How I Learned to Stop Worrying and Love Choice
Lars Kotthoff, University of Wyoming
larsko@uwyo.edu
Boulder, 16 January 2019
Outline
▷ Big Picture
▷ Motivation
▷ Choosing Algorithms
▷ Tuning Algorithms
▷ (NCAR-relevant) Applications
▷ Outlook and Resources
Big Picture
▷ advance the state of the art through meta-algorithmic techniques
▷ rather than inventing new things, use existing things more intelligently – automatically
▷ invent new things through combinations of existing things
https://xkcd.com/720/
Motivation – What Difference Does It Make?
Prominent Application
Fréchette, Alexandre, Neil Newman, and Kevin Leyton-Brown. “Solving the Station Packing Problem.” In Association for the Advancement of Artificial Intelligence (AAAI), 2016.
Performance Differences
Hurley, Barry, Lars Kotthoff, Yuri Malitsky, and Barry O’Sullivan. “Proteus: A Hierarchical Portfolio of Solvers and Transformations.” In CPAIOR, 2014.
[Scatter plot: per-instance runtimes of the Virtual Best SAT solver vs. the Virtual Best CSP solver, log scale, 0.1–1000 s.]
Leveraging the Differences
Xu, Lin, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. “SATzilla: Portfolio-Based Algorithm Selection for SAT.” J. Artif. Intell. Res. (JAIR) 32 (2008): 565–606.
Performance Improvements
Hutter, Frank, Domagoj Babic, Holger H. Hoos, and Alan J. Hu. “Boosting Verification by Automatic Tuning of Decision Procedures.” In FMCAD ’07: Proceedings of the Formal Methods in Computer Aided Design, 27–34. Washington, DC, USA: IEEE Computer Society, 2007.
[Scatter plot: runtimes of SPEAR optimized for SWV vs. SPEAR with original defaults, log scale, 10⁻² to 10⁴ s.]
Common Theme
Performance models of black-box processes
▷ also called surrogate models
▷ substitute expensive underlying process with cheap approximate model based on results of evaluations of the underlying process
▷ build approximate model using machine learning techniques
▷ no knowledge of what the underlying process is required (but can be helpful)
▷ may facilitate better understanding of the underlying process through interrogation of the model
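A minimal sketch of the surrogate-model idea in Python with scikit-learn. The toy black-box function, the sample sizes, and the choice of a random forest are illustrative assumptions, not something the slides prescribe:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical expensive black-box process (e.g. an algorithm run);
# in practice each call could take minutes or hours.
def expensive_process(x):
    return np.sin(3 * x) + 0.1 * x ** 2

# Evaluate the underlying process at a few points only.
X_train = np.random.uniform(-2, 2, size=(20, 1))
y_train = np.array([expensive_process(x[0]) for x in X_train])

# Build the cheap approximate (surrogate) model with machine learning.
surrogate = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

# Interrogate the cheap surrogate instead of the expensive process.
X_query = np.linspace(-2, 2, 200).reshape(-1, 1)
y_pred = surrogate.predict(X_query)
print("predicted optimum near x =", X_query[np.argmin(y_pred)][0])
```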
Choosing Algorithms
Algorithm Selection
Given a problem, choose the best algorithm to solve it.
Rice, John R. “The Algorithm Selection Problem.” Advances in Computers 15 (1976): 65–118.
Algorithm Selection
[Diagram: training instances 1–3 and a portfolio (Algorithms 1–3) feed feature extraction and a performance model; for new instances 4–6, feature extraction and algorithm selection yield Instance 4: Algorithm 2, Instance 5: Algorithm 3, Instance 6: Algorithm 3.]
Algorithm Portfolios
▷ instead of a single algorithm, use several complementary algorithms
▷ idea from Economics – minimise risk by spreading it out across several securities
▷ same for computational problems – minimise risk of algorithm performing poorly
▷ in practice often constructed from competition winners or other algorithms known to have good performance
Huberman, Bernardo A., Rajan M. Lukose, and Tad Hogg. “An Economics Approach to Hard Computational Problems.” Science 275, no. 5296 (1997): 51–54. doi:10.1126/science.275.5296.51.
Algorithms
“algorithm” used in a very loose sense
▷ algorithms
▷ heuristics
▷ machine learning models
▷ software systems
▷ machines
▷ …
Parallel Portfolios
Why not simply run all algorithms in parallel?
▷ not enough resources may be available/waste of resources
▷ algorithms may be parallelized themselves
▷ memory/cache contention
Building an Algorithm Selection System
▷ requires algorithms with complementary performance
▷ most approaches rely on machine learning
▷ train with representative data, i.e. performance of all algorithms in portfolio on a number of instances
▷ evaluate performance on separate set of instances
▷ potentially large amount of prep work
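A sketch of this training pipeline, assuming scikit-learn; the feature matrix, runtime matrix, and the random-forest learner are all hypothetical stand-ins (the slides do not prescribe a specific model):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical training data: instance features and measured runtimes
# of every portfolio algorithm on every training instance.
features = rng.random((200, 5))        # 200 instances, 5 features each
runtimes = rng.random((200, 3)) * 100  # runtimes of 3 portfolio algorithms

# Label each instance with its best (fastest) algorithm.
best_algorithm = runtimes.argmin(axis=1)

# Hold out a separate set of instances for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    features, best_algorithm, test_size=0.25, random_state=0)

selector = RandomForestClassifier().fit(X_train, y_train)
print("selection accuracy:", selector.score(X_test, y_test))
```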
Key Components of an Algorithm Selection System
▷ feature extraction
▷ performance model
▷ prediction-based selector/scheduler
optional:
▷ presolver
▷ secondary/hierarchical models and predictors (e.g. for feature extraction time)
Types of Performance Models
[Diagram: four model types – per-algorithm regression models predicting performance (e.g. A1: 1.2, A2: 4.5, A3: 3.9); a single classification model predicting the best algorithm per instance; pairwise classification models aggregated by voting (e.g. A1: 1 vote, A2: 0 votes, A3: 2 votes); pairwise regression models predicting performance differences (A1 − A2, A1 − A3, …). Each yields a selection such as Instance 1: Algorithm 2, Instance 2: Algorithm 1, Instance 3: Algorithm 3.]
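For contrast with the single classification model sketched above, here is the per-algorithm regression branch of the diagram: predict each algorithm's performance and pick the one predicted best. The data layout and learner are again illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
features = rng.random((200, 5))
runtimes = rng.random((200, 3)) * 100  # hypothetical runtimes of 3 algorithms

# One regression model per portfolio algorithm.
models = [RandomForestRegressor().fit(features, runtimes[:, a])
          for a in range(3)]

def select(instance_features):
    # Predict each algorithm's runtime; choose the one predicted fastest.
    preds = [m.predict(instance_features.reshape(1, -1))[0] for m in models]
    return int(np.argmin(preds))

print("chosen algorithm:", select(rng.random(5)))
```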
Tuning Algorithms
Algorithm Configuration
Given a (set of) problem(s), find the best parameter configuration.
Parameters?
▷ anything you can change that makes sense to change
▷ e.g. search heuristic, optimization level, computational resolution
▷ not random seed, whether to enable debugging, etc.
▷ some will affect performance, others will have no effect at all
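To make the notion concrete, a hypothetical configuration space for a solver (all parameter names are made up for illustration):

```python
# Hypothetical parameter space: categorical, integer, and continuous
# parameters all make sense to tune ...
param_space = {
    "search_heuristic": ["dfs", "bfs", "restarting"],  # categorical choice
    "optimization_level": (0, 3),                      # integer range
    "restart_factor": (1.1, 4.0),                      # continuous range
}
# ... whereas e.g. the random seed or a debugging flag would not belong here.
```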
Automated Algorithm Configuration
▷ no background knowledge on parameters or algorithm – black-box process
▷ as little manual intervention as possible
▷ failures are handled appropriately
▷ resources are not wasted
▷ can run unattended on large-scale compute infrastructure
Algorithm Configuration
Frank Hutter and Marius Lindauer, “Algorithm Configuration: A Hands-on Tutorial”, AAAI 2016
General Approach
▷ evaluate algorithm as black-box function
▷ observe effect of parameters without knowing the inner workings, build surrogate model based on this data
▷ decide where to evaluate next, based on surrogate model
▷ repeat
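A compact sketch of this loop, assuming a toy target function and a random-forest surrogate (both illustrative; real configurators also balance exploration against exploitation, e.g. via the expected improvement shown on the model-based search slides below):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def algorithm_runtime(config):  # hypothetical black-box target to minimise
    return (config - 0.3) ** 2 + 0.1 * np.sin(20 * config)

# Initial design: a few random configurations and their observed performance.
X = list(np.random.uniform(0, 1, 5))
y = [algorithm_runtime(x) for x in X]

for _ in range(20):
    # Build surrogate model of the parameter-performance surface.
    model = RandomForestRegressor().fit(np.array(X).reshape(-1, 1), y)
    # Decide where to evaluate next: candidate with best predicted value.
    candidates = np.random.uniform(0, 1, 100)
    nxt = candidates[model.predict(candidates.reshape(-1, 1)).argmin()]
    # Evaluate the expensive black box there and repeat.
    X.append(nxt)
    y.append(algorithm_runtime(nxt))

print("best configuration found:", X[int(np.argmin(y))])
```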
When are we done?
▷ most approaches incomplete, i.e. do not exhaustively explore parameter space
▷ cannot prove optimality, not guaranteed to find optimal solution (with finite time)
▷ performance highly dependent on configuration space
⇒ How do we know when to stop?
Time Budget
How much time/how many function evaluations?
▷ too much → wasted resources
▷ too little → suboptimal result
▷ use statistical tests
▷ evaluate on parts of the instance set
▷ for runtime: adaptive capping
▷ in general: whatever resources you can reasonably invest
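Adaptive capping made concrete, as a sketch: never run a candidate configuration longer than the best time seen so far. The run_config interface is a hypothetical placeholder, not a real tool's API:

```python
# Sketch of adaptive capping for runtime minimisation.
# run_config(config, timeout) is a hypothetical interface that returns
# the measured runtime, or None if the timeout was hit.
def evaluate_with_capping(configs, run_config, initial_cap=3600.0):
    best_time, best_config = initial_cap, None
    for config in configs:
        t = run_config(config, timeout=best_time)
        if t is not None and t < best_time:  # new incumbent
            best_time, best_config = t, config
        # runs hitting the cap are cut off early, saving resources
    return best_config, best_time
```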
Grid and Random Search
▷ evaluate certain points in parameter space
Bergstra, James, and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13, no. 1 (February 2012): 281–305.
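Random search itself is only a few lines; the objective and parameter names here are toy assumptions:

```python
import random

def performance(config):  # hypothetical black-box objective to minimise
    return (config["lr"] - 0.1) ** 2 + config["depth"] / 100

# Sample random points in the parameter space and keep the best.
best = min(
    ({"lr": random.uniform(0.001, 1.0), "depth": random.randint(1, 20)}
     for _ in range(100)),
    key=performance,
)
print("best configuration:", best)
```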
Model-Based Search
▷ evaluate small number of configurations
▷ build model of parameter-performance surface based on the results
▷ use model to predict where to evaluate next
▷ repeat
▷ allows targeted exploration of new configurations
▷ can take instance features into account like algorithm selection
Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. “Sequential Model-Based Optimization for General Algorithm Configuration.” In LION 5, 507–23, 2011.
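A sketch of one proposal step using expected improvement, the acquisition criterion (ei) shown in the example plots on the following slides. The Gaussian-process surrogate and the observed values are assumptions for illustration, not the exact method of any particular tool:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(model, candidates, y_best):
    # EI trades off exploring uncertain regions (large sigma) against
    # exploiting good predictions (small mu); minimisation convention.
    mu, sigma = model.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Configurations evaluated so far and their performance (illustrative).
X = np.array([[-0.9], [0.1], [0.8]])
y = np.array([0.7, 0.2, 0.5])

gp = GaussianProcessRegressor().fit(X, y)
candidates = np.linspace(-1, 1, 200).reshape(-1, 1)
ei = expected_improvement(gp, candidates, y.min())
print("next configuration to evaluate:", candidates[ei.argmax()][0])
```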
Model-Based Search Example
[Plot sequence: nine iterations of model-based search on a one-dimensional function. Each panel shows the true function y, the surrogate prediction yhat, the expected improvement ei, and the evaluated points (init/prop/seq); the gap value converges from 1.9909e−01 at iteration 1 to 2.0000e−01 by iteration 7.]