Improving Model Selection by Employing the Test Data
Max Westphal, Werner Brannath
University of Bremen, Germany
Institute for Statistics, Research Training Group π³
mwestphal@uni-bremen.de
https://github.com/maxwestphal/
ICML 2019, Long Beach, June 11, 2019
Train-Validation-Test Split
[Figure: Learning: algorithms A_1, …, A_M are fit on the training data, yielding models f̂_1, …, f̂_M with predictions Ŷ = f̂_m(X). Validation: the performance ϑ̂^V_m of each model is estimated on the validation data. The model with the largest validation estimate is selected (argmax).]
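The default pipeline in the figure (fit M models, estimate each model's performance on the validation data, keep the argmax) can be sketched in a few lines. This is a minimal Python illustration, assuming accuracy as the performance measure ϑ and that the models' validation predictions are already computed; the function names and toy data are hypothetical, not the authors' code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Proportion of correct predictions (assumed performance measure)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_argmax(y_val: Sequence[int], val_preds: Sequence[Sequence[int]]) -> int:
    """Index m of the model with the highest validation accuracy (default rule)."""
    scores = [accuracy(y_val, preds) for preds in val_preds]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical validation labels and predictions of M = 3 models
y_val = [1, 0, 1, 1, 0, 1]
val_preds = [
    [1, 0, 0, 1, 0, 0],  # 4 of 6 correct
    [1, 0, 1, 1, 0, 1],  # 6 of 6 correct
    [0, 0, 1, 1, 1, 1],  # 4 of 6 correct
]
print(select_argmax(y_val, val_preds))  # → 1
```

Only the selected index is carried forward; the other M − 1 models are discarded before the test data are touched.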
[Figure: Evaluation: only the selected model (here f̂_4) is evaluated on the test data, yielding a performance estimate ϑ̂^E_4, a confidence interval CI_4 and a test decision ϕ_4.]
Particularly in regulated environments, a reliable performance assessment is needed before a prediction model is implemented in practice.
- Example: disease diagnosis or prognosis based on clinical data.
- Usually recommended strategy: evaluate a single final model on independent test data.
- This strategy is easy to use and allows a reliable performance assessment and simple inference.
- However, a bad model selection cannot be fixed after the test data have been observed.
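For the single selected model, test evaluation reduces to one estimate, one confidence interval and one test decision. A minimal sketch, assuming accuracy as the measure, a normal-approximation (Wald) interval and a hypothetical performance benchmark `theta0` for the null hypothesis ϑ ≤ theta0; these choices are illustrative and not prescribed by the slides.

```python
import math
from statistics import NormalDist


def evaluate_single(y_test, y_pred, theta0, alpha=0.05):
    """Accuracy, Wald (1 - alpha) CI and decision for H0: theta <= theta0.

    theta0 is a hypothetical benchmark (e.g. a minimum acceptable accuracy).
    """
    n = len(y_test)
    est = sum(t == p for t, p in zip(y_test, y_pred)) / n
    se = math.sqrt(est * (1 - est) / n)      # normal approximation
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    ci = (est - z * se, est + z * se)
    # phi = True (reject H0) if the whole CI lies above theta0,
    # i.e. a one-sided test at level alpha / 2
    phi = ci[0] > theta0
    return est, ci, phi


y_test = [1] * 100
y_pred = [1] * 90 + [0] * 10  # 90 of 100 test cases correct
est, ci, phi = evaluate_single(y_test, y_pred, theta0=0.8)
print(est, ci, phi)  # est = 0.9, CI ≈ (0.841, 0.959), phi = True
```

The Wald interval degenerates when the estimate is exactly 0 or 1; a Wilson or Clopper-Pearson interval is a common alternative in that case.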
Simultaneous Model Evaluation
[Figure: Learning: algorithms A_1, …, A_M yield models f̂_1, …, f̂_M. Validation: performance estimates ϑ̂^V_1, …, ϑ̂^V_M. Instead of the argmax, a selection rule chooses the models that are passed on to the test data.]
[Figure: Evaluation: every model passed on by the selection rule (here f̂_2, f̂_4, …, f̂_M) is evaluated on the test data, yielding estimates ϑ̂^E_m, confidence intervals CI_m and test decisions ϕ_m. The final model (here f̂_2) is the argmax over the test estimates.]
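Under simultaneous evaluation, every model passed on by the selection rule is scored on the test data, and the final model is the argmax of the test estimates. A minimal sketch, assuming accuracy as the measure and a dict keyed by hypothetical model names (mirroring the figure, where f̂_2 is chosen); it covers only the estimates and the final choice and does not adjust the confidence intervals for multiplicity.

```python
def evaluate_shortlist(y_test, test_preds):
    """Test accuracy of every shortlisted model and the name of the best one."""
    n = len(y_test)
    est = {
        name: sum(t == p for t, p in zip(y_test, preds)) / n
        for name, preds in test_preds.items()
    }
    # Final model: argmax of the test estimates
    best = max(est, key=est.get)
    return est, best


y_test = [1, 0, 1, 1, 0, 1, 0, 1]
test_preds = {  # hypothetical shortlisted models
    "f2": [1, 0, 1, 1, 0, 1, 1, 1],  # 7 of 8 correct
    "f4": [1, 1, 1, 0, 0, 1, 0, 1],  # 6 of 8 correct
}
est, best = evaluate_shortlist(y_test, test_preds)
print(est, best)  # → {'f2': 0.875, 'f4': 0.75} f2
```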
Simulation study
[Figure: the complete pipeline: learning, validation, selection rule, and test evaluation with argmax selection of the final model.]
Idea: simulate data, then train, select and evaluate binary classifiers in different scenarios.
- 24 artificial classification tasks
- 72,000 replications of the complete ML pipeline
- 28,800,000 distinct models (EN, CART, SVM, XGB)
Goal: comparison of different evaluation strategies:
- default: evaluate only the best validation model
- within 1 SE: evaluate all models within one standard error (SE) of the best validation model
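The two selection rules compared in the study can be written as functions of the validation accuracies. A minimal sketch, assuming accuracy as the measure and taking "SE" to be the binomial standard error sqrt(ϑ̂(1 − ϑ̂)/n) of the best validation model; the slides do not say which SE is used, so treat this as an assumption, and the accuracies below are hypothetical.

```python
import math


def default_rule(val_acc):
    """Default: pass only the best validation model on to the test data."""
    return [max(range(len(val_acc)), key=val_acc.__getitem__)]


def within_1se_rule(val_acc, n_val):
    """Pass on every model within one SE of the best validation accuracy.

    Assumption: SE is the binomial standard error of the best model's
    accuracy, estimated from n_val validation observations.
    """
    best = max(val_acc)
    se = math.sqrt(best * (1 - best) / n_val)
    return [m for m, acc in enumerate(val_acc) if acc >= best - se]


val_acc = [0.80, 0.86, 0.84, 0.70, 0.85]  # hypothetical validation accuracies
print(default_rule(val_acc))          # → [1]
print(within_1se_rule(val_acc, 100))  # → [1, 2, 4]
```

For scale, 28,800,000 models over 72,000 replications amount to 400 candidate models per pipeline run, so the within-1-SE rule may pass on considerably more than one model.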
Simulation Results
Conclusions
- When in doubt, defer the final model choice to the test data.
- In all investigated scenarios, this improved both model performance and the probability of correctly identifying a good model.
- Multiple comparisons are adjusted for via an approximate parametric procedure that takes model similarity into account (maxT approach).
Questions and feedback welcome!
mwestphal@uni-bremen.de
https://github.com/maxwestphal/
POSTER #123 (Pacific Ballroom)
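The idea behind a maxT-type adjustment can be sketched as follows: treat the test estimates of the evaluated models as approximately multivariate normal, estimate their correlation from how often the models are correct on the same observations (similar models are highly correlated), and use the (1 − α) quantile of the maximum of the standardized statistics as a common critical value. This is a minimal plain-Python sketch under those assumptions, using accuracy, one-sided lower confidence bounds and a Monte Carlo quantile; it is not the authors' implementation, and all names and data are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (plain Python, small matrices only)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # Floor guards against a (near-)singular R, e.g. identical models
                L[i][i] = math.sqrt(max(a[i][i] - s, 1e-12))
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def maxt_critical_value(R, alpha=0.05, n_sim=20000, seed=1):
    """Monte Carlo (1 - alpha) quantile of max_m Z_m with Z ~ N(0, R)."""
    rng = random.Random(seed)
    L = cholesky(R)
    k = len(R)
    maxima = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        maxima.append(max(sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)))
    maxima.sort()
    return maxima[math.ceil((1 - alpha) * n_sim) - 1]


def simultaneous_lower_bounds(y_test, test_preds, alpha=0.05, n_sim=20000, seed=1):
    """Test accuracies and maxT-adjusted one-sided lower confidence bounds.

    Assumes every accuracy lies strictly between 0 and 1 (otherwise SE = 0).
    """
    n = len(y_test)
    # Correctness indicators: 1.0 if the model classifies observation i correctly
    C = [[1.0 if p == t else 0.0 for p, t in zip(preds, y_test)] for preds in test_preds]
    est = [sum(c) / n for c in C]
    k = len(C)
    # Covariance of correctness across observations (model similarity)
    cov = [[sum(u * v for u, v in zip(C[a], C[b])) / n - est[a] * est[b]
            for b in range(k)] for a in range(k)]
    se = [math.sqrt(cov[m][m] / n) for m in range(k)]
    R = [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(k)]
         for a in range(k)]
    crit = maxt_critical_value(R, alpha, n_sim, seed)
    return est, [e - crit * s for e, s in zip(est, se)], crit


def with_errors(y, err_idx):
    """Hypothetical model predictions: y with the labels at err_idx flipped."""
    return [1 - v if i in err_idx else v for i, v in enumerate(y)]


y_test = [i % 2 for i in range(40)]
test_preds = [
    with_errors(y_test, {0, 5, 10, 15}),      # 36 of 40 correct
    with_errors(y_test, {0, 5, 11, 20, 30}),  # 35 of 40 correct
    with_errors(y_test, {1, 2, 3, 4, 6, 7}),  # 34 of 40 correct
]
est, lower, crit = simultaneous_lower_bounds(y_test, test_preds)
print(est, lower, crit)
```

Because the critical value is a quantile of a maximum over correlated statistics, it is never larger than the Bonferroni value (about 2.13 for three one-sided comparisons at α = 0.05, up to Monte Carlo error) and is clearly smaller when the evaluated models are strongly correlated.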