A Hands-On Introduction to Automatic Machine Learning
Lars Kotthoff, University of Wyoming, larsko@uwyo.edu
AutoML Workshop, 28 August 2018, Nanjing
Machine Learning
[Diagram: Data → Machine Learning → Predictions]
Automatic Machine Learning
[Diagram: Data → Machine Learning → Predictions, with Hyperparameter Tuning feeding into Machine Learning]
Grid and Random Search
▷ evaluate certain points in parameter space
Bergstra, James, and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13, no. 1 (February 2012): 281–305.
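To make this concrete, here is a minimal random search sketch. The SVM-on-iris setup and the 13 × 13 grid over C and gamma are taken from the mbo.py example later in the deck; the budget of 10 random samples is an illustrative assumption.

import random
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# same grid as the mbo.py example: 13 x 13 values of C and gamma
params = { 'C': np.logspace(-2, 10, 13), 'gamma': np.logspace(-9, 3, 13) }
param_grid = [ { 'C': c, 'gamma': g } for c in params['C']
                                      for g in params['gamma'] ]

def est_acc(pars):
    # median 10-fold cross-validated accuracy for one configuration
    return np.median(cross_val_score(svm.SVC(**pars),
                                     iris.data, iris.target, cv = 10))

# random search: evaluate a fixed number of randomly chosen grid points
random.seed(1)
evaluated = [ (pars, est_acc(pars)) for pars in random.sample(param_grid, 10) ]
best_pars, best_acc = max(evaluated, key = lambda e: e[1])
print("best accuracy {} for {}".format(round(best_acc, 2), best_pars))

Grid search would simply replace random.sample(param_grid, 10) with the full param_grid (169 evaluations here).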
Local Search
▷ start with random configuration
▷ change a single parameter (local search step)
▷ if better, keep the change, else revert
▷ repeat, stop when resources exhausted or desired solution quality achieved
▷ restart occasionally with new random configurations
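A minimal sketch of this loop, assuming the params, param_grid, and est_acc() from the random search sketch above; the step budget and the neighbour move are illustrative choices, not from the slides.

import random

def local_search(steps = 20):
    current = random.choice(param_grid)        # start with random configuration
    current_acc = est_acc(current)
    for _ in range(steps):                     # stop when budget exhausted
        neighbour = dict(current)
        name = random.choice(list(neighbour))  # change a single parameter
        neighbour[name] = random.choice(params[name])
        acc = est_acc(neighbour)
        if acc > current_acc:                  # if better, keep the change;
            current, current_acc = neighbour, acc  # otherwise it is discarded
    return current, current_acc

Restarts would wrap this function in an outer loop that keeps the best result across several random initial configurations.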
Local Search Example (graphics by Holger Hoos)
[Sequence of slides animating one run: Initialisation, Local Search steps, Perturbation, further Local Search, and Selection (using Acceptance Criterion).]
Model-Based Search
▷ evaluate small number of configurations
▷ build model of parameter-performance surface based on the results
▷ use model to predict where to evaluate next
▷ repeat, stop when resources exhausted or desired solution quality achieved
▷ allows targeted exploration of promising configurations
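The mlrMBO example on the following slides picks the next point to evaluate by expected improvement (the "ei" curve in the plots). A standard definition (not stated on the slides) for minimisation is

\mathrm{EI}(x) = \mathbb{E}\left[\max\left(y_{\min} - Y(x),\, 0\right)\right]

where Y(x) is the surrogate model's (random) prediction at x and y_min is the best value observed so far; points with high EI are either predicted to be good or are highly uncertain, which balances exploitation and exploration.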
Model-Based Search Example
Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas, and Michel Lang. “mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions,” March 9, 2017. http://arxiv.org/abs/1703.03373.
[Sequence of plots, iterations 1 through 10 on a one-dimensional test function: each panel is titled “Iter = n, Gap = …” and shows the true function y, the surrogate prediction yhat, and the expected improvement ei over x ∈ [−1, 1], with evaluated points marked as initial design (init), proposed (prop), or sequential (seq).]
Problems
▷ How good are we really?
▷ How much of it is just random chance?
▷ Can we do better?
Underlying Issues
▷ true performance landscape unknown
▷ resources allow exploring only a tiny part of the hyperparameter space
▷ results inherently stochastic
Potential Solutions
▷ better-understood benchmarks
▷ more comparisons
▷ more runs with different random seeds
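As a sketch of the last point, assuming the local_search() helper from the earlier sketch; the number of repetitions is an illustrative choice.

import random
import numpy as np

# repeat the whole search with different seeds and report the spread,
# to separate genuine improvements from random chance
accs = []
for seed in range(10):
    random.seed(seed)
    best_pars, best_acc = local_search()
    accs.append(best_acc)
# a robust summary instead of a single lucky run
print("median {}, IQR {}".format(np.median(accs),
      np.percentile(accs, 75) - np.percentile(accs, 25)))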
Two-Slide MBO ML
# http://www.cs.uwyo.edu/~larsko/mbo.py
import random
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

iris = datasets.load_iris()

random.seed(1)
evals = 10
initial_samples = 3

params = { 'C': np.logspace(-2, 10, 13),
           'gamma': np.logspace(-9, 3, 13) }
param_grid = [ { 'C': x, 'gamma': y } for x in params['C']
                                      for y in params['gamma'] ]
# [{'C': 0.01, 'gamma': 1e-09}, {'C': 0.01, 'gamma': 1e-08}...]

def est_acc(pars):
    # median 10-fold cross-validated accuracy of an SVM with these parameters
    clf = svm.SVC(**pars)
    return np.median(cross_val_score(clf, iris.data, iris.target, cv = 10))

# initial design: evaluate a few random configurations
data = []
for pars in random.sample(param_grid, initial_samples):
    acc = est_acc(pars)
    data += [ list(pars.values()) + [ acc ] ]
# [[1.0, 0.1, 1.0],
#  [1000000000.0, 1e-07, 1.0],
#  [0.1, 1e-06, 0.9333333333333333]]
Two-Slide MBO ML
# surrogate model mapping (C, gamma) to estimated accuracy
regr = RandomForestRegressor(random_state = 0)
for it in range(evals):
    df = np.array(data)
    regr.fit(df[:,0:2], df[:,2])
    # predict accuracy for every grid point, evaluate the most promising one
    preds = regr.predict([ list(pars.values()) for pars in param_grid ])
    i = preds.argmax()
    acc = est_acc(param_grid[i])
    data += [ list(param_grid[i].values()) + [ acc ] ]
    print("{}: best predicted {} for {}, actual {}"
          .format(it, round(preds[i], 2), param_grid[i], round(acc, 2)))

i = np.array(data)[:,2].argmax()
print("Best accuracy ({}) for parameters {}".format(data[i][2], data[i][0:2]))
Two-Slide MBO ML
0: best predicted 0.99 for {'C': 1.0, 'gamma': 1e-09}, actual 0.93
1: best predicted 0.99 for {'C': 1000000000.0, 'gamma': 1e-09}, actual 0.93
2: best predicted 0.99 for {'C': 1000000000.0, 'gamma': 0.1}, actual 0.93
3: best predicted 0.97 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
4: best predicted 0.99 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
5: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
6: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
7: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
8: best predicted 1.0 for {'C': 0.01, 'gamma': 0.1}, actual 0.93
9: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
Best accuracy (1.0) for parameters [1.0, 0.1]
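Note a design choice in this two-slide version: it always evaluates the point with the highest predicted accuracy, i.e. pure exploitation. Full MBO tools such as mlrMBO or SMAC instead maximise an acquisition function like the expected improvement shown earlier, which also rewards uncertain regions and so avoids getting stuck on an overconfident surrogate.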
Tools and Resources
TPOT https://github.com/EpistasisLab/tpot
mlrMBO https://github.com/mlr-org/mlrMBO
SMAC http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
Spearmint https://github.com/HIPS/Spearmint
TPE https://jaberg.github.io/hyperopt/
iRace http://iridia.ulb.ac.be/irace/
Auto-WEKA http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
Auto-sklearn https://github.com/automl/auto-sklearn
Available soon: edited book on automatic machine learning (Frank Hutter, Lars Kotthoff, Joaquin Vanschoren) https://www.automl.org/book/
I’m hiring! Several funded graduate/postdoc positions available.