Predictive Modeling for Suicide Risk
Robert Bossarte, PhD
VISN 2 Center of Excellence for Suicide Prevention
West Virginia University
March 10, 2015
Why Should We Invest in Modeling Risk?
• Evidence that VA’s approach to suicide prevention is effective comes from findings that suicide rates in VA male patients have, in general, decreased relative to rates in the rest of American men.
• Nevertheless, the finding that rates remain high represents a call to supplement VA’s current strategy with new approaches to identifying patients at risk, and new methods for enhancing their care.
VETERANS HEALTH ADMINISTRATION
Excess Risk for Suicide among Males Who Used VHA Services, 2001 – 2011
Basic Strategy – A Foundation for Risk Stratified Care
• Going beyond intercepting people on the trajectory towards suicide
• Identifying people whose care should be enhanced
  – One target group may be those at highest predicted risk
  – Another includes those at more moderate risk, who account for a substantial proportion of the total burden of suicide
How?
• Generate a database of patient-months using data for FY09-11
  – Include all VHA users who died from suicide, by month, and 1% of VHA users who survived the month
  – Create split samples for model development and validation
  – Consider demographics and variables known to be risk factors for suicide in VA and/or other populations
  – Include specific events as lag variables
  – Include interactions known to be important
• Develop a logistic regression model using the development sample
  – Sort and rank patients by tiers of model-predicted risk
• Evaluate the model using a separate validation sample
• Test the extent to which it predicts suicide during a single subsequent year for all VHA patients who were alive at the start of the year
• Characterize patients in high risk strata, to inform intervention development
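The sampling step above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual code: the field names (`patient_id`, `month`, `died_by_suicide`), the 50/50 development/validation split, and the use of inverse sampling weights are all assumptions made for the example. The slides only state that all suicide patient-months and 1% of surviving patient-months were kept, and that split samples were created.

```python
import random


def build_patient_month_sample(patient_months, control_fraction=0.01, seed=0):
    """Keep every suicide patient-month and a random fraction of the rest.

    Each kept row gets a sampling weight (1 for cases, 1/control_fraction for
    sampled survivors) so that weighted totals approximate the full population.
    """
    rng = random.Random(seed)
    sample = []
    for row in patient_months:
        if row["died_by_suicide"]:
            sample.append({**row, "weight": 1.0})
        elif rng.random() < control_fraction:
            sample.append({**row, "weight": 1.0 / control_fraction})
    return sample


def split_development_validation(sample, seed=0):
    """Randomly split the sample into two halves (assumed 50/50 here)."""
    rng = random.Random(seed)
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Synthetic toy population: 20,000 hypothetical patients observed for 3 months,
# with 4 patient-months flagged as suicide deaths.
population = [
    {"patient_id": i, "month": m, "died_by_suicide": (i % 5000 == 0 and m == 3)}
    for i in range(20000)
    for m in range(1, 4)
]

sample = build_patient_month_sample(population, control_fraction=0.01, seed=1)
development, validation = split_development_validation(sample, seed=1)
print(len(sample), len(development), len(validation))
```

Keeping all cases while subsampling survivors keeps the data set manageable for a rare outcome; the weights are one common way to let a weighted logistic regression recover population-scale probabilities.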
Prediction Sample: Percentage of Suicide Deaths

Tier of Predicted                        Months
Probability, %      Patients       1       3       6       9      12
  0.01                   596     1.4     0.9     0.5     0.4     0.3
  0.10                 5,969     4.3     2.9     2.0     1.7     1.6
  1.00                59,696    10.4     9.4     9.0     8.1     8.2
  5.00               298,493    23.2    23.9    25.0    23.6    23.7
 10.00               596,966    38.4    38.2    37.1    35.8    35.5
 50.00             2,984,831    83.9    83.5    81.5    79.9    80.7
100.00             5,969,662   100.0   100.0   100.0   100.0   100.0
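One way to read this table is as a risk concentration ratio: the share of suicide deaths captured by a tier divided by the share of patients in that tier. The sketch below computes that ratio from the table values; the function and variable names are illustrative, and the ratio itself is an interpretive aid rather than a statistic reported on the slide.

```python
# Share of suicide deaths (%) captured by each top tier of predicted risk,
# copied from the prediction-sample table (columns: months 1, 3, 6, 9, 12).
MONTHS = (1, 3, 6, 9, 12)
CAPTURED = {
    0.01: (1.4, 0.9, 0.5, 0.4, 0.3),
    0.10: (4.3, 2.9, 2.0, 1.7, 1.6),
    1.00: (10.4, 9.4, 9.0, 8.1, 8.2),
    5.00: (23.2, 23.9, 25.0, 23.6, 23.7),
    10.00: (38.4, 38.2, 37.1, 35.8, 35.5),
    50.00: (83.9, 83.5, 81.5, 79.9, 80.7),
    100.00: (100.0, 100.0, 100.0, 100.0, 100.0),
}


def concentration_ratio(tier_pct, captured_pct):
    """Deaths captured per unit of population share (1.0 means no concentration)."""
    return captured_pct / tier_pct


def ratios_for_month(month):
    """Concentration ratio for every tier at the given follow-up month."""
    col = MONTHS.index(month)
    return {tier: concentration_ratio(tier, row[col]) for tier, row in CAPTURED.items()}


for tier, ratio in ratios_for_month(1).items():
    print(f"top {tier:6.2f}% of patients: {ratio:6.1f}x the average risk at 1 month")
```

For example, the top 5% tier captures 23.2% of deaths at one month, which is 23.2 / 5 = 4.64 times what an uninformative ranking would capture.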
Prediction: Trajectories
Who are the high risk patients?
In general, most patients in the high risk strata are Veterans with known mental health conditions and ongoing use of mental health services.
Why Should We Worry About New Models?
• The original project sought to identify VHA patients with the greatest suicide risk concentration.
• The long-term goal of this project was to provide a foundation for the continued development and evaluation of improved models for predicting suicide risk.
• Evaluation of the preliminary model suggested instability associated with overfitting and variable selection.
• Long-term sustainability and operationalization of predictive models would be enhanced by development of processes requiring fewer data points/system resources.
Observations from Initial Exercise and Next Steps
• Administrative data can be used to predict suicide risk within the next 30 days.
• Increased risk concentration remains over longer periods for high risk groups.
• There are associations between calculated risk strata and high risk flags
  – The proportion of patients with flags increases with calculated risk
  – Only a minority of patients calculated to be at high risk are flagged
• The large number of variables increases the possibility of overfitting.
• Extensive data and analytic requirements restrict opportunities for rapid updates to the analytic model or “real time” calculation of risk scores.
Applying Methods of Machine Learning to VA’s Original Model
• Begin with the same data.
• Use a 3-fold cross-validation strategy to identify the number of variables needed to achieve optimal risk concentration
  – Logistic regression (weighted) with forward selection
  – Identify the (approximate) optimal number of variables
• Apply machine learning algorithms.
  – Glmnet R package: fits a generalized linear model via penalized maximum likelihood.
    • Mixing parameter used to estimate models with penalties varying between lasso <-> elastic
    • Variable ‘stop’ parameter used to set the approximate maximum number of variables determined from the cross-validated logistic regression
• Apply the optimal Glmnet model to the original validation and prediction cohorts to assess model stability/“out of sample performance”.
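Two pieces of this workflow can be sketched in a few lines. The first function is the elastic-net penalty in the form the glmnet documentation uses, where a mixing parameter alpha of 1 gives the lasso penalty and 0 gives the ridge penalty, with intermediate values blending the two. The second produces 3-fold train/held-out index splits. This is a plain-Python illustration with hypothetical names, not the glmnet implementation and not the presenters' code; it does not fit a model.

```python
import random


def elastic_net_penalty(coefs, lam, alpha):
    """Penalty lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    alpha = 1.0 is the lasso penalty, alpha = 0.0 the ridge penalty.
    The intercept is assumed to be excluded from `coefs`.
    """
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) * 0.5 * l2 + alpha * l1)


def k_fold_indices(n_rows, k=3, seed=0):
    """Return k (train_indices, held_out_indices) pairs covering all rows once."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    splits = []
    for held in range(k):
        train = sorted(i for j, fold in enumerate(folds) if j != held for i in fold)
        splits.append((train, sorted(folds[held])))
    return splits


# Example: the same coefficients are penalized differently as alpha moves
# from ridge (0.0) toward lasso (1.0).
coefs = [1.0, -2.0, 0.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, elastic_net_penalty(coefs, lam=0.5, alpha=alpha))

splits = k_fold_indices(10, k=3, seed=0)
```

Because the lasso term can shrink coefficients exactly to zero, moving alpha toward 1 tends to yield models with fewer variables, which matches the slide's stated aim of limiting the variable count.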
Comparison of Model Fit (AUC)
[Figure: AUC (vertical axis, 0 to 0.8) for the POC and RM models in the Development, Validation, and FY11 Prediction samples]
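For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is an illustrative, unweighted version with a hypothetical function name, and its quadratic cost makes it suitable only for small examples, not for the cohort sizes discussed here.

```python
def auc(scores, labels):
    """Pairwise AUC: P(score of a case > score of a non-case), ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Toy example: three of the four case/non-case pairs are ranked correctly.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to a ranking no better than chance, and 1.0 to a ranking that places every case above every non-case.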
Comparison of Risk Concentration (Top 5%)
[Figure: risk concentration in the top 5% of predicted risk (vertical axis, 0 to 40) for the POC and RM models in the Development, Validation, and FY11 Prediction samples]
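Risk concentration in a top tier can be computed directly from predicted scores and observed outcomes: rank patients by predicted risk, take the top fraction, and report the percentage of all events that fall in that group. The sketch below assumes the tier size is rounded down (with a minimum of one patient) and uses made-up scores; the function name and the toy data are hypothetical.

```python
def share_of_events_in_top(scores, events, top_fraction):
    """Percentage of all events among the top `top_fraction` of ranked scores."""
    total_events = sum(events)
    if total_events == 0:
        raise ValueError("no events to capture")
    ranked = sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))  # assumed: round down, at least 1
    captured = sum(event for _, event in ranked[:n_top])
    return 100.0 * captured / total_events


# Toy data: 20 hypothetical patients with increasing scores and 4 events.
scores = [i / 20 for i in range(20)]
events = [0] * 20
for idx in (0, 3, 17, 19):
    events[idx] = 1

print(share_of_events_in_top(scores, events, 0.05))
print(share_of_events_in_top(scores, events, 0.25))
```

Comparing this percentage across development, validation, and prediction samples is one way to check whether a model's ranking holds up out of sample.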
Contact Information
Robert.Bossarte@va.gov
rbossarte@hsc.wvu.edu