Assessing the Calibration of Dichotomous Outcome Models with the Calibration Belt

Giovanni Nattino
The Ohio Colleges of Medicine Government Resource Center
The Ohio State University

Stata Conference - July 19, 2018
Background: Logistic Regression

Most popular family of models for binary outcomes (Y = 1 or Y = 0).
Models Pr(Y = 1), the probability of "success" or "event".
Given predictors X1, ..., Xp, the model is

  logit{Pr(Y = 1)} = β0 + β1 X1 + ... + βp Xp,

where logit(π) = log(π / (1 − π)).

Does my model fit the data well?
Goodness of Fit of Logistic Regression Models

Let π̂ be the model's estimate of Pr(Y = 1) for a given subject. Two measures of goodness of fit:

Discrimination
 ◮ Do subjects with Y = 1 have higher π̂ than subjects with Y = 0?
 ◮ Evaluated with the area under the ROC curve.

Calibration
 ◮ Does π̂ estimate Pr(Y = 1) accurately?
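Discrimination is straightforward to quantify in Stata right after the fit; a minimal sketch (y, x1, and x2 are placeholder names, not variables from the ICU example):

    * Sketch: discrimination check after a logistic fit (placeholder variables)
    logit y x1 x2
    lroc              // area under the ROC curve (discrimination)

Calibration is the focus of the rest of the talk.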
An Example: ICU Data

. logit sta age can sysgp_4 typ locd

Iteration 0:   log likelihood = -100.08048
Iteration 1:   log likelihood = -70.385527
Iteration 2:   log likelihood = -67.395341
Iteration 3:   log likelihood = -66.763511
Iteration 4:   log likelihood = -66.758491
Iteration 5:   log likelihood = -66.758489

Logistic regression                             Number of obs =    200
                                                LR chi2(5)    =  66.64
                                                Prob > chi2   = 0.0000
Log likelihood = -66.758489                     Pseudo R2     = 0.3330

------------------------------------------------------------------------------
         sta |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .040628   .0128617     3.16   0.002     .0154196    .0658364
         can |   2.078751   .8295749     2.51   0.012     .4528141    3.704688
     sysgp_4 |   -1.51115   .7204683    -2.10   0.036    -2.923242   -.0990585
         typ |   2.906679   .9257469     3.14   0.002     1.092248     4.72111
        locd |   3.965535   .9820316     4.04   0.000     2.040788    5.890281
       _cons |  -6.680532   1.320663    -5.06   0.000    -9.268984    -4.09208
------------------------------------------------------------------------------
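The estimated probabilities π̂ plotted on the next slides can be obtained with predict; a minimal sketch (phat is an assumed variable name):

    predict phat, pr      // phat = estimated Pr(sta = 1) for each subject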
An Example: ICU Data

[Figure: Outcome (0/1) plotted against Predicted Probability.]
An Example: ICU Data

[Figure: Observed Proportion plotted against Predicted Probability.]
The Hosmer-Lemeshow Test

Divide the data into G groups (usually, G = 10). For each group, define:
 ◮ O1g and E1g: number of observed and expected events (Y = 1).
 ◮ O0g and E0g: number of observed and expected non-events (Y = 0).

The Hosmer-Lemeshow statistic is

  Ĉ = Σ_{g=1}^{G} [ (O1g − E1g)² / E1g + (O0g − E0g)² / E0g ].

Under the hypothesis of perfect fit, Ĉ ∼ χ²_{G−2}.

Problems:
 ◮ How many groups?
 ◮ Different G, different results.

Hosmer Jr, D. W., Lemeshow, S., Sturdivant, R. X. (2013). Applied Logistic Regression.
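The dependence on G is easy to see in Stata, where estat gof after a logit fit implements this test; a sketch with the ICU model above:

    estat gof, group(10)     // Hosmer-Lemeshow test with G = 10
    estat gof, group(5)      // same model, different grouping, possibly a different conclusion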
The Calibration Curve

Let ĝ = logit(π̂). What about fitting a new model:

  logit{P(Y = 1)} = α0 + α1 ĝ.

If α0 = 0 and α1 = 1,

  logit{P(Y = 1)} = 0 + 1 × ĝ = ĝ
  ⇓
  logit{P(Y = 1)} = logit(π̂)
  ⇓
  P(Y = 1) = π̂

If the fit is perfect, α̂0 = 0 and α̂1 = 1.

Problems:
 ◮ Only for external validation of the model.
 ◮ Why a linear relationship?

Cox, D. (1958). Two further applications of a model for a method of binary regression. Biometrika.
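A sketch of this check in Stata, for an external validation sample (y and phat are assumed names for the observed outcome and the externally developed probability; the joint test shown is a Wald version of the idea):

    gen double g = logit(phat)     // logit of the predicted probability
    logit y g                      // recalibration model: logit{P(Y=1)} = a0 + a1*g
    test (g = 1) (_cons = 0)       // joint test of a1 = 1 and a0 = 0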
The Calibration Curve

We assume a general polynomial relationship:

  logit{P(Y = 1)} = α0 + α1 ĝ + α2 ĝ² + ... + αm ĝᵐ.

How to choose m?
 ◮ Fixed too low ⇒ too simplistic.
 ◮ Fixed too high ⇒ estimation of useless parameters.

Solution: forward selection (see the sketch below).
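One way to mimic this selection with standard commands is a sequence of likelihood-ratio tests, adding one power of ĝ at a time. A rough sketch only (y and phat assumed as before; the calibrationbelt command performs the selection internally, with the appropriate adjustments):

    gen double g  = logit(phat)
    gen double g2 = g^2
    gen double g3 = g^3
    logit y g
    estimates store m1
    logit y g g2
    estimates store m2
    lrtest m1 m2          // is the quadratic term worth adding?
    logit y g g2 g3
    estimates store m3
    lrtest m2 m3          // is the cubic term worth adding?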
Example: ICU Data

The selected polynomial degree is m = 2:

  logit{P(Y = 1)} = 0.117 + 0.917 ĝ − 0.076 ĝ².

This defines the calibration curve

  P(Y = 1) = exp{0.117 + 0.917 logit(π̂) − 0.076 [logit(π̂)]²} / (1 + exp{0.117 + 0.917 logit(π̂) − 0.076 [logit(π̂)]²}).
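The curve can be evaluated at each subject's predicted probability with Stata's built-in logit() and invlogit() functions; a sketch (phat assumed to hold π̂):

    gen double g     = logit(phat)
    gen double curve = invlogit(0.117 + 0.917*g - 0.076*g^2)   // value of the calibration curve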
Example: ICU Data

[Figure: Observed Proportion plotted against Predicted Probability.]
A Goodness of Fit Test

Once m is selected, we can design a goodness of fit test on

  logit{P(Y = 1)} = α0 + α1 ĝ + α2 ĝ² + ... + αm ĝᵐ.

If the fit is perfect: α1 = 1 and α0 = α2 = ... = αm = 0.

A likelihood ratio test can be used to test the hypothesis

  H0: α1 = 1, α0 = α2 = ... = αm = 0.

The distribution of the statistic must account for the forward selection performed on the same data.

Inverting the test makes it possible to generate a confidence region around the calibration curve: the calibration belt.

Nattino, G., Finazzi, S., Bertolini, G. (2016). A new test and graphical tool to assess the goodness of fit of logistic regression models. Statistics in Medicine.
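Under H0 the constrained model reduces to the original one, whose fitted probabilities are π̂ itself, so the raw likelihood-ratio statistic is simple to compute. A sketch for m = 2 (y, g, g2, and phat assumed as above; note that calibrationbelt uses a null distribution adjusted for the forward selection, so a naive χ² reference does not apply):

    quietly logit y g g2                               // unconstrained model
    scalar ll1 = e(ll)
    gen double ll0_i = y*ln(phat) + (1 - y)*ln(1 - phat)
    quietly summarize ll0_i
    scalar ll0 = r(sum)                                // log likelihood under H0
    scalar LR  = 2*(ll1 - ll0)
    display "LR statistic = " LR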
Example: ICU Data

. calibrationbelt

-----------------------------------------------------------
 GiViTI Calibration Belt

 Calibration belt and test for internal validation:
 the calibration is evaluated on the training sample.

 Sample size:          200
 Polynomial degree:      2
 Test statistic:      1.08
 p-value:           0.2994
-----------------------------------------------------------

. estat gof, group(10)

Logistic model for sta, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

      number of observations =       200
            number of groups =        10
     Hosmer-Lemeshow chi2(8) =      4.00
                 Prob > chi2 =    0.8570

Nattino, G., Lemeshow, S., Phillips, G., Finazzi, S., Bertolini, G. (2017). Assessing the calibration of dichotomous outcome models with the calibration belt. Stata Journal.
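The output above corresponds to running the command right after the fit; a minimal sketch of the internal-validation sequence (no arguments are needed because, as the output states, the calibration is evaluated on the training sample of the model just estimated):

    logit sta age can sysgp_4 typ locd
    calibrationbelt               // calibration belt and test, internal validation
    estat gof, group(10)          // Hosmer-Lemeshow test, for comparison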
Example: ICU Data

[Figure: calibration belt, Observed versus Expected probability.
 Type of evaluation: internal; polynomial degree: 2; test statistic: 1.08; p-value: 0.299; n: 200.

 Confidence level | Under the bisector | Over the bisector
 95%              | NEVER              | NEVER             ]
Example 2: Poorly Fitting Model

[Figure: calibration belt, Observed versus Expected probability.
 Type of evaluation: internal; polynomial degree: 2; test statistic: 8.06; p-value: 0.005; n: 200.

 Confidence level | Under the bisector | Over the bisector
 80%              | 0.44 - 0.59        | 0.02 - 0.20, 0.84 - 0.97
 95%              | NEVER              | 0.02 - 0.13, 0.90 - 0.97 ]
Example 3: External Validation

. calibrationbelt y phat, devel("external")

[Figure: calibration belt, Observed versus Expected probability.
 Type of evaluation: external; polynomial degree: 1; test statistic: 11.75; p-value: 0.003; n: 200.

 Confidence level | Under the bisector | Over the bisector
 80%              | 0.55 - 1.00        | 0.00 - 0.12
 95%              | 0.63 - 1.00        | 0.00 - 0.02      ]
Example 3: External Validation

. calibrationbelt y phat, cLevel1(.99) cLevel2(.6) devel("external")

[Figure: calibration belt, Observed versus Expected probability.
 Type of evaluation: external; polynomial degree: 1; test statistic: 11.75; p-value: 0.003; n: 200.

 Confidence level | Under the bisector | Over the bisector
 60%              | 0.50 - 1.00        | 0.00 - 0.19
 99%              | 0.73 - 1.00        | NEVER            ]
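A sketch of the external-validation workflow behind these calls (y, phat, x1, x2, and devsample are assumed names: the model is developed on one sample and its predictions are evaluated on the other):

    logit y x1 x2 if devsample == 1              // fit on the development sample only
    predict double phat, pr                      // predicted probabilities for all observations
    preserve
    keep if devsample == 0                       // restrict to the validation sample
    calibrationbelt y phat, devel("external")    // external evaluation, as above
    restore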
Example 4: Goodness of Fit and Large Samples

. calibrationbelt

[Figure: calibration belt, Observed versus Expected probability.
 Type of evaluation: internal; polynomial degree: 2; test statistic: 17.32; p-value: <0.001; n: 336266.

 Confidence level | Under the bisector | Over the bisector
 80%              | 0.09 - 0.32        | 0.02 - 0.06, 0.49 - 0.96
 95%              | 0.10 - 0.27        | 0.02 - 0.06, 0.55 - 0.96 ]
Discussion

The calibrationbelt command implements the calibration belt and the related test in Stata.

Limitation:
 ◮ Assumed polynomial relationship.

Advantages:
 ◮ No need for data grouping.
 ◮ Informative tool to spot where deviations are significant.

Future work: goodness of fit in very large samples.