Improving Model Selection by Employing the Test Data

Max Westphal, Werner Brannath
University of Bremen, Germany
Institute for Statistics, Research Training Group π³
mwestphal@uni-bremen.de
https://github.com/maxwestphal/
ICML 2019, Long Beach, June 11, 2019

Train-Validation-Test Split
[Diagram: learning algorithms A_1, ..., A_M are fit on the training data to give candidate models f̂_1, ..., f̂_M. Each model's predictions Ŷ_m on the validation data yield a performance estimate ϑ̂^V_m, and the model with the best validation performance (argmax) is selected.]

Train-Validation-Test Split
Particularly in regulated environments, we need a reliable performance assessment before implementing a prediction model in practice.
Example: disease diagnosis or prognosis based on clinical data.
Usually recommended strategy: evaluate a single final model on independent test data.

Train-Validation-Test Split
[Diagram: of the models f̂_1, ..., f̂_M, only the selected one (here f̂_4) is evaluated on the test data, giving the test performance estimate ϑ̂^E_4 together with CI_4 and ϕ_4.]

Train-Validation-Test Split
Evaluating a single final model on independent test data is an easy-to-use strategy that allows for a reliable performance assessment and simple inference.
However, it leaves no way to fix a bad model selection after the test data have been observed.

Simultaneous Model Evaluation
[Diagram: models f̂_1, ..., f̂_M are trained by A_1, ..., A_M and their validation performances ϑ̂^V_1, ..., ϑ̂^V_M are estimated. A selection rule, rather than a plain argmax, decides which models proceed to the test stage.]

[Diagram: every model that passed the selection rule (here f̂_2, f̂_4, ..., f̂_M) is evaluated on the test data, each with its own ϑ̂^E_m, CI_m and ϕ_m. The final model (here f̂_2) is the argmax of the test performance.]
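The test stage of simultaneous evaluation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes accuracy as the performance measure, and the function name `evaluate_and_choose` and the toy data are hypothetical.

```python
import numpy as np

def evaluate_and_choose(correct_test, candidates):
    """Estimate test accuracy for each candidate model and pick the best one.

    correct_test: (n_test, M) boolean array; entry [i, m] is True when
                  model m classifies test observation i correctly.
    candidates:   indices of the models that passed the selection rule.
    """
    candidates = np.asarray(candidates)
    # Test accuracy of each candidate (assumed performance measure).
    test_acc = correct_test[:, candidates].mean(axis=0)
    # Final model: argmax of the test performance among the candidates.
    final = int(candidates[np.argmax(test_acc)])
    return final, dict(zip(candidates.tolist(), test_acc.tolist()))

# Toy example: 4 test observations, 3 models, models 0 and 2 were selected.
correct = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 1, 1],
                    [1, 1, 1]], dtype=bool)
final, acc = evaluate_and_choose(correct, [0, 2])
```

Because the final choice is made on the test data, the naive test accuracy of the chosen model is optimistically biased, which is why the evaluation needs an adjustment for multiple comparisons.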

Simulation study
[Diagram: the complete pipeline. Learning (training and validation), a selection rule, then evaluation of the selected models on the test data with a final argmax.]
Idea: simulate data and train, select and evaluate binary classifiers in different scenarios.
  24 artificial classification tasks
  72,000 replications of the complete ML pipeline
  28,800,000 distinct models (EN, CART, SVM, XGB)
Goal: comparison of different evaluation strategies.
  default: best validation model only
  within 1 SE: all models within 1 SE of the best validation model
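The two selection rules compared in the study can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: performance is taken to be validation accuracy, the standard error is taken to be the binomial standard error of the best model's accuracy, and the names `select_default` and `select_within_one_se` are hypothetical.

```python
import numpy as np

def select_default(val_acc):
    """Default rule: keep only the model with the best validation accuracy."""
    return np.array([int(np.argmax(val_acc))])

def select_within_one_se(val_acc, n_val):
    """Within-1-SE rule: keep all models whose validation accuracy lies
    within one standard error of the best validation accuracy."""
    val_acc = np.asarray(val_acc, dtype=float)
    best = int(np.argmax(val_acc))
    # Assumption: binomial standard error of the best model's accuracy.
    se_best = np.sqrt(val_acc[best] * (1.0 - val_acc[best]) / n_val)
    return np.flatnonzero(val_acc >= val_acc[best] - se_best)

# Toy example: four models validated on 400 observations.
val_acc = [0.80, 0.84, 0.83, 0.70]
default_choice = select_default(val_acc)
candidates = select_within_one_se(val_acc, n_val=400)
```

In this toy example the standard error is about 0.018, so the second and third models both pass the within-1-SE rule while the default rule keeps only the second.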

Simulation Results

Conclusions
When in doubt, delay the final model choice to the test data.
Improvements in model performance and in the probability of correctly identifying a good model in all investigated scenarios.
Adjustment for multiple comparisons via an approximate parametric procedure that takes model similarity into account (maxT approach).
Questions & feedback welcome!
mwestphal@uni-bremen.de
https://github.com/maxwestphal/
POSTER #123 (Pacific Ballroom)
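One way a maxT-style adjustment could look is sketched below. It is an illustration only, not the authors' procedure: it assumes accuracy as the performance measure, uses a normal approximation, and approximates the critical value by Monte Carlo simulation from the estimated correlation of the models' test-set correctness indicators. The function name `maxt_lower_bounds` and all parameters are hypothetical.

```python
import numpy as np

def maxt_lower_bounds(correct_test, alpha=0.05, n_sim=100_000, seed=0):
    """Simultaneous one-sided lower confidence bounds for test accuracies.

    correct_test: (n_test, m) boolean array of per-observation correctness
                  for the m models evaluated on the test data.
    """
    x = np.asarray(correct_test, dtype=float)
    n, m = x.shape
    acc = x.mean(axis=0)
    # Covariance of the correctness indicators captures model similarity.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    sd = np.sqrt(np.diag(cov))
    if np.any(sd == 0):
        raise ValueError("each model needs both correct and wrong predictions")
    corr = cov / np.outer(sd, sd)
    # Monte Carlo approximation of the (1 - alpha) quantile of the maximum
    # of m correlated standard normal statistics (the maxT critical value).
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(m), corr, size=n_sim)
    crit = float(np.quantile(z.max(axis=1), 1.0 - alpha))
    lower = acc - crit * sd / np.sqrt(n)
    return acc, lower, crit

# Toy example: 3 models with independent errors on 200 test observations.
toy_rng = np.random.default_rng(1)
toy = toy_rng.random((200, 3)) < np.array([0.80, 0.85, 0.75])
acc, lower, crit = maxt_lower_bounds(toy)
```

The more similar the models' predictions are, the closer the critical value moves to the unadjusted single-model quantile, so highly correlated candidates cost little in terms of interval width.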
