Cross-validated AUC in Stata: CVAUROC
Miguel Angel Luque Fernandez
Biomedical Research Institute of Granada (ibs.GRANADA), Noncommunicable Disease and Cancer Epidemiology
https://maluque.netlify.com
2018 Spanish Stata Conference, 24 October 2018

Contents
1. Cross-validation
2. Cross-validation justification
3. Cross-validation methods
4. cvauroc
5. References

Cross-validation
Definition: Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice (note: performance = model assessment).
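The definition above can be made concrete with a minimal sketch of how k-fold cross-validation partitions a data set. The function names `kfold_indices` and `train_test_splits` are hypothetical and chosen only for illustration; this is not taken from the slides or from any Stata command.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(n, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = kfold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(train_test_splits(n=10, k=5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Every observation is held out once, so the model is always evaluated on data it was not fitted to.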

Cross-validation
Applications: Cross-validation can also be used to compare the performance of different modeling specifications (e.g. models with and without interactions, inclusion or exclusion of polynomial terms, number of knots in restricted cubic splines). Furthermore, it can be used for variable selection and to select a suitable level of flexibility in the model (note: flexibility = model selection).

Cross-validation
Applications:
MODEL ASSESSMENT: To compare the performance of different modeling specifications.
MODEL SELECTION: To select the suitable level of flexibility in the model.

MSE
Regression model:
$f(x) = f(x_1 + x_2 + x_3)$
$Y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$
$Y = f(x) + \epsilon$

MSE
Expectation: $E(Y \mid X_1 = x_1, X_2 = x_2, X_3 = x_3)$
MSE: $E[(Y - \hat{f}(X))^2 \mid X = x]$

Bias-variance trade-off
Error decomposition: $\mathrm{MSE} = E[(Y - \hat{f}(X))^2 \mid X = x_0] = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\epsilon)$
Trade-off: As the flexibility of $\hat{f}$ increases, its variance increases and its bias decreases.
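The decomposition can be checked numerically with a small simulation. The setup below is an assumption made only for illustration: the true value at $x_0$ is `mu`, the estimator is a shrunken sample mean `c * mean(training outcomes)`, and the shrinkage factor `c` plays the role of flexibility (`c = 1` is unbiased with the highest variance, `c = 0` has no variance but the largest bias).

```python
import random

def simulate_mse(c, mu=0.5, sigma=1.0, n=5, reps=20000, seed=1):
    """Monte Carlo estimate of E[(Y - f_hat)^2] for f_hat = c * sample mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        train = [mu + rng.gauss(0, sigma) for _ in range(n)]
        f_hat = c * sum(train) / n          # estimator fitted on training data
        y_new = mu + rng.gauss(0, sigma)    # independent new outcome
        total += (y_new - f_hat) ** 2
    return total / reps

def theoretical_mse(c, mu=0.5, sigma=1.0, n=5):
    """Var(f_hat) + Bias(f_hat)^2 + Var(epsilon) for the same estimator."""
    variance = c ** 2 * sigma ** 2 / n
    bias = (c - 1) * mu
    return variance + bias ** 2 + sigma ** 2

for c in (0.0, 0.5, 1.0):
    print(c, round(simulate_mse(c), 3), round(theoretical_mse(c), 4))
```

Under these assumed values the intermediate choice `c = 0.5` has the lowest theoretical error, which is the trade-off the slide describes: some bias is accepted in exchange for a larger reduction in variance.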

Bias-variance trade-off
Choosing the model flexibility based on the average test error, $E[(Y - \hat{f}(X))^2 \mid X = x]$, amounts to a bias-variance trade-off.
Rule: More flexibility increases variance but decreases bias. Less flexibility decreases variance but increases bias.

Bias-variance trade-off
[Figure: regression function]

Overparameterization
George E. P. Box (1919-2013): "All models are wrong but some are useful."
Quote, 1976: "Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration (...). Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity."

Justification
AIC and BIC: Both AIC and BIC are based on the maximized likelihood and penalize free parameters to combat overfitting, but they do so in ways that result in significantly different behavior.
AIC = -2*ln(likelihood) + 2*k, where k = model degrees of freedom.
BIC = -2*ln(likelihood) + ln(N)*k, where k = model degrees of freedom and N = number of observations.
There is some disagreement over the use of AIC and BIC with non-nested models.
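The two formulas translate directly into code. The sketch below uses made-up inputs (a log-likelihood of -100, k = 3, N = 200) purely to show the arithmetic; the function names `aic` and `bic` are illustrative.

```python
import math

def aic(loglik, k):
    """AIC = -2*ln(likelihood) + 2*k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2*ln(likelihood) + ln(N)*k."""
    return -2.0 * loglik + math.log(n) * k

# Hypothetical fitted model: log-likelihood -100, 3 parameters, 200 observations.
print("AIC:", aic(-100.0, 3))
print("BIC:", round(bic(-100.0, 3, 200), 3))
```

Because ln(N) exceeds 2 once N is 8 or more, BIC penalizes each additional parameter more heavily than AIC in all but very small samples, which is one source of the different behavior noted above.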

Justification
Fewer assumptions:
Compared with AIC, BIC and adjusted $R^2$, cross-validation provides a direct estimate of the error.
Cross-validation makes fewer assumptions about the true underlying model.
Cross-validation can be used in a wider range of model selection tasks, even in cases where it is hard to pinpoint the number of predictors in the model.
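As an illustration of a direct, cross-validated performance estimate for a binary outcome, the following sketch computes a k-fold cross-validated AUC on simulated data. It is not the cvauroc implementation, whose internals are not described in this section. Several assumptions are made for brevity: the scoring rule is a simple mean-difference linear score standing in for a logistic regression, the data are simulated, the fold AUCs are averaged (one common convention), and all function names are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_mean_difference(X, y):
    """Stand-in model: weights = mean of cases minus mean of non-cases."""
    d = len(X[0])
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    m1 = mean([x for x, t in zip(X, y) if t == 1])
    m0 = mean([x for x, t in zip(X, y) if t == 0])
    return [a - b for a, b in zip(m1, m0)]

def predict(w, X):
    """Linear score; higher values point towards the case class."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def cv_auc(X, y, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and average the fold AUCs."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    aucs = []
    for i, test in enumerate(folds):
        train = [j for m, f in enumerate(folds) if m != i for j in f]
        w = fit_mean_difference([X[j] for j in train], [y[j] for j in train])
        scores = predict(w, [X[j] for j in test])
        aucs.append(auc(scores, [y[j] for j in test]))
    return sum(aucs) / k

# Simulated data: x1 is informative, x2 is pure noise.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(1.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]

apparent = auc(predict(fit_mean_difference(X, y), X), y)
print("apparent AUC:", round(apparent, 3))
print("5-fold cross-validated AUC:", round(cv_auc(X, y), 3))
```

The apparent AUC reuses the same observations for fitting and evaluation, whereas the cross-validated AUC scores each observation with a model that never saw it, so it makes no use of a parameter count or a likelihood.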
