A quantile-based approach for hyperparameter transfer learning
David Salinas (2), Huibin Shen (1), Valerio Perrone (1)
(1) Amazon Research; (2) NAVER LABS Europe, work done at Amazon
Amazon Berlin, December 11, 2019
Transfer learning setting
Assume many HP evaluations $\{x_i^l, y_i^l\}_{i=0}^{n_l}$ available for $n_l$ datasets
$x_i^l \in \mathbb{R}^d$: hyperparameter, $y_i^l \in \mathbb{R}$: objective to be minimized
Can we use them to speed up the tuning of a new dataset?
Transfer learning
Difficulties:
  Scales of the objectives $y_i^l$ may vary significantly across tasks
  Noise may not be Gaussian
  Many observations: hard to apply an (approximate) GP
[Figure: objective value (log scale) against log number of gradient updates, one curve per dataset: electricity, exchange-rate, m4-Daily, m4-Hourly, m4-Monthly, m4-Quarterly, m4-Weekly, m4-Yearly, solar, traffic, wiki-rolling]
Gaussian Copula transform
If only every $y^l$ were Gaussian...
Apply the change of variable $\psi = \Phi^{-1} \circ F$
$\Phi$ is the Gaussian CDF, $F$ is the marginal CDF of $y^l$ (approximated with the empirical CDF)
$z^l = \psi(y^l)$
All $z^l$ become centered Gaussian: $z^l \sim \mathcal{N}(0, 1)$
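A minimal sketch of the transform $\psi = \Phi^{-1} \circ F$ for a single task, using only the Python standard library. The function name `gaussian_copula_transform` and the `(rank + 0.5) / n` form of the empirical CDF are assumptions made for illustration (the offset keeps the quantiles strictly inside (0, 1) so that $\Phi^{-1}$ stays finite); the slides do not specify how the empirical CDF is bounded.

```python
from statistics import NormalDist


def gaussian_copula_transform(ys):
    """Map one task's objective values to approximately N(0, 1) scores."""
    n = len(ys)
    std_normal = NormalDist()  # standard Gaussian, mean 0 and stddev 1

    # Rank every observation (0 = smallest). sorted() is stable, so ties
    # are broken by position, which is a simplification.
    order = sorted(range(n), key=lambda i: ys[i])
    ranks = [0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank

    # Empirical CDF F, shifted into (0, 1) (assumed convention), followed
    # by the inverse Gaussian CDF Phi^{-1}.
    return [std_normal.inv_cdf((r + 0.5) / n) for r in ranks]


# Objectives on very different scales end up on the same scale.
z = gaussian_copula_transform([10.0, 1000.0, 0.1])
```

The transform only depends on the ranks of the observations, so any monotone rescaling of a task's objective leaves its $z$ values unchanged.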
Transfer learning: parametric prior
Regress $z(x) \sim \mathcal{N}(\mu_\theta(x), \sigma_\theta(x))$
Parameters $\theta$ are learned by maximum likelihood (MLE) on the evaluations
Joint learning: $\theta$ is tied across tasks (only possible because $z$ has comparable scales across tasks $l$)
Two HPO strategies:
  Thompson sampling with $\mathcal{N}(\mu_\theta(x), \sigma_\theta(x))$
  Gaussian Copula Process with the prior $\mathcal{N}(\mu_\theta(x), \sigma_\theta(x))$
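The two ingredients of the parametric prior, the Gaussian likelihood that MLE maximizes and Thompson sampling from the fitted prior, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `mu_fn` and `sigma_fn` are hypothetical callables standing in for $\mu_\theta$ and $\sigma_\theta$, $\sigma_\theta(x)$ is treated as a standard deviation, and the candidate set is a plain list.

```python
import math
import random


def gaussian_nll(z, mu, sigma):
    """Negative log-likelihood of z under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (z - mu) ** 2 / (2 * sigma ** 2)


def total_nll(evaluations, mu_fn, sigma_fn):
    """Sum the NLL over (x, z) pairs pooled from all tasks.

    Pooling is meaningful because the z values share a common scale;
    minimizing this quantity over theta is the MLE step.
    """
    return sum(gaussian_nll(z, mu_fn(x), sigma_fn(x)) for x, z in evaluations)


def thompson_sample(candidates, mu_fn, sigma_fn, rng=random):
    """Draw one sample per candidate from the prior and return the argmin."""
    draws = [rng.gauss(mu_fn(x), sigma_fn(x)) for x in candidates]
    best = min(range(len(candidates)), key=lambda i: draws[i])
    return candidates[best]


# Hypothetical fitted prior: lowest predicted z near x = 0.3.
next_x = thompson_sample(
    [0.0, 0.3, 1.0],
    mu_fn=lambda x: (x - 0.3) ** 2,
    sigma_fn=lambda x: 0.1,
    rng=random.Random(0),
)
```

Because a fresh sample is drawn for every candidate, configurations with a larger predicted $\sigma_\theta(x)$ are picked more often than their mean alone would suggest, which is what provides exploration.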
Results
Evaluate on 3 blackboxes with precomputed evaluations (MLP [Klein 18], DeepAR [Salinas 17], XGBoost)

blackbox   # datasets   # hyperparameters   # evaluations   objectives
DeepAR     11           6                   ∼220            quantile loss, time
FCNET      4            9                   62208           MSE, time
XGBoost    9            9                   5000            1-AUC
Results
[Figure: three panels (fcnet, DeepAR, xgboost) showing the normalized distance to the minimum (log scale) against the iteration (20 to 100) for RS, GP, ABLR, WS-best, auto-range-gp, CTS and GCP]
Results
Because every objective is centered Gaussian after the transform, we can easily combine them!
Multi-objective: optimize the accuracy/time trade-off with $z_{\text{error}}(x) + z_{\text{runtime}}(x)$
More at our poster!
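A small sketch of the combined objective $z_{\text{error}}(x) + z_{\text{runtime}}(x)$, again with the standard library only. The helper names, the `(rank + 0.5) / n` empirical CDF and the toy numbers are assumptions made for illustration; they are not taken from the experiments.

```python
from statistics import NormalDist


def to_gaussian_scores(values):
    """Rank-based Gaussian copula transform of one objective."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # Assumed empirical CDF (rank + 0.5) / n, then inverse Gaussian CDF.
        scores[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores


def combined_objective(errors, runtimes):
    """Return z_error(x) + z_runtime(x) for every evaluated configuration."""
    z_error = to_gaussian_scores(errors)
    z_runtime = to_gaussian_scores(runtimes)
    return [ze + zr for ze, zr in zip(z_error, z_runtime)]


# Toy data: error in [0, 1], runtime in seconds (made-up values).
errors = [0.30, 0.10, 0.20, 0.15]
runtimes = [50.0, 4000.0, 300.0, 100.0]
scores = combined_objective(errors, runtimes)
best = scores.index(min(scores))
```

In this toy example the most accurate configuration is also the slowest, so the combined score favors the fourth configuration (index 3), which is second best on both objectives. Adding the raw values instead would let the runtime, measured in seconds, dominate the error.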