Applying Machine Learning Methods to Predicting Reliance on VA Primary Care
Edwin S. Wong, PhD
VA Puget Sound Health Care System
Department of Health Services, University of Washington
2018 AcademyHealth Annual Research Meeting
June 25, 2018
Acknowledgements
• Sources of funding
– VA Career Development Award (Wong, CDA 13-024)
• Disclosure:
– Dr. Wong reports ownership of common stock in UnitedHealth Group Inc. totaling less than $15,000 in market value
Why Machine Learning?
• Increased capabilities not possible with traditional methods
– Broader range of health care applications
– Support for analysis of data of greater size and complexity
– Ability to develop models of greater complexity
– Richer insights
– Improved model performance
Example Machine Learning Applications in Health Services Research
• Predicting health and health care outcomes
• Detecting outliers
– High-cost patients
• Classification
• Subgroup analysis
– Phenotyping, risk stratification
• Measuring heterogeneous treatment effects
Application: Dual Use of VA and Non-VA Health Care
• Veterans Affairs Health Care System (VA)
– Large, nationally integrated health system
– 8.4 million Veteran enrollees in FY2016
• VA enrollees are not precluded from obtaining care through non-VA sources, independent of VA
– ~80% have at least one non-VA health insurance source
– Nearly all Veterans age 65+ are dually enrolled in Medicare
Research Objective
• To examine how best to predict which VA enrollees will be mostly reliant on VA primary care next year, using predictor variables measured in the current year
• Policy relevance:
– VA reliance is an input to projection models used to inform VA health care budget requests submitted to Congress
– Better predictions of reliance may improve the accuracy of these requests
Data Sources
• VA Corporate Data Warehouse
– Comprehensive administrative data on all users of VA health care
• Medicare claims
– Utilization of outpatient services through fee-for-service Medicare
• 2012 VA Survey of Healthcare Experiences of Patients (SHEP)
– Random sample of Veterans receiving care at VA outpatient facilities
• Area Health Resource File
– Characteristics of Veterans' county of residence
Population Studied
• Sample of 83,825 VA patients responding to the 2012 VA SHEP
– Dually enrolled in fee-for-service Medicare in FY2012 and FY2013
– Alive at end of FY2013
– Weighted to a population of 4.6 million VA patients
Definition of VA Reliance
• Counts of face-to-face office visits in primary care
• VA reliance = proportion of all visits occurring in VA [1]
– # visits in VA ÷ (# visits in VA + # visits via Medicare)
• Dichotomous measure denoting whether Veterans were mostly reliant on VA
– VA reliance ≥ 0.5
[1] Burgess JF, et al. (2011). Health Econ 20(2).
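A minimal sketch of this calculation in Python. The function names are illustrative, and the handling of Veterans with zero visits in either system is an assumption; the talk does not say how such cases were treated.

def va_reliance(va_visits, medicare_visits):
    """Proportion of primary care visits occurring in VA."""
    total = va_visits + medicare_visits
    if total == 0:
        return None  # assumption: reliance is undefined with zero total visits
    return va_visits / total

def mostly_va_reliant(va_visits, medicare_visits):
    """Dichotomous outcome: VA reliance >= 0.5."""
    r = va_reliance(va_visits, medicare_visits)
    return r is not None and r >= 0.5

print(mostly_va_reliant(va_visits=3, medicare_visits=2))  # True (0.6 >= 0.5)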
Predictor Variables (Features)
• 59 features in 5 categories:
– Demographics: age, gender, marital status, race/ethnicity
– Access to care: distance to nearest VA, copayment exemption
– Comorbidities: heart failure, hypertension, diabetes, liver disease
– Patient-reported experiences: provider rating, ability to receive immediate care, parking availability, office cleanliness
– Local area factors: poverty rate, unemployment rate, hospital beds per 1,000
Machine Learning Framework for Classification
• Analytic objective: learn a target classifier function C that best assigns input variables X to an output variable y:
– y = C(X)
– Binary classification: y = 0 (not VA reliant), 1 (VA reliant)
– X = matrix of predictor variables, or features
• Policy goal: make accurate predictions of Veterans' future reliance classification given observed features in the present
Machine Learning Objective
• Goal: assess properties of the model "out-of-sample"
– How would the model perform in practice?
– Causality deemphasized; focus on performance and fit
• Use a training sample to estimate the model
• Assess model performance on a separate validation sample (see the sketch after this list)
• Consider multiple algorithms or models
– The "best" model will depend on the research question and the analytical data
– No single model is always superior
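A minimal sketch of this train/validate workflow in scikit-learn. The synthetic data, the logistic regression choice, and the 75/25 split are illustrative assumptions, not the models or settings reported in the talk.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: in the real analysis X would be the 59-feature
# matrix and y the 0/1 mostly-VA-reliant label
X, y = make_classification(n_samples=2000, n_features=59, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Out-of-sample assessment: score the model on data unseen in training
p_valid = model.predict_proba(X_valid)[:, 1]
print("Validation AUROC:", round(roc_auc_score(y_valid, p_valid), 3))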
Road Map for Classifying VA Reliance
• Model set-up
– Pre-process the data (cleaning and transforming)
– Identify a performance metric (loss function)
– Choose resampling methods (how validation sets are generated)
– Identify candidate algorithms
Road Map for Classifying VA Reliance (continued)
• Build models
– Estimate model parameters
– Determine the best values of tuning parameters (see the sketch after this list)
– Assess model fit
– Calculate the performance of the final model
• Identify the "best" of the candidate models
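One common way to choose tuning parameters is a cross-validated grid search; a minimal sketch follows. The random forest learner and the grid values are illustrative assumptions, not the candidate models named in the talk.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=59, random_state=0)

# Each combination of tuning-parameter values is scored by cross-validated
# AUROC; the best combination is then refit on the full training data
grid = {"n_estimators": [100, 300], "max_depth": [4, 8, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid=grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))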
Visual Roadmap
[Flowchart: raw data are gathered and preprocessed, then split into train, validation, and test sets; candidate models are built and tweaked on the train/validation data, with model parameters and predicted values evaluated against performance metrics (loss function); the best model(s) are then compared out-of-sample against the held-out test data.]
Preprocessing
• Data may have irregularities that influence model stability and performance
• Models differ in their assumptions and data requirements
• Common preprocessing tasks (two are sketched below):
– Correcting inconsistent data
– Addressing missing data
– Centering and scaling
– Transforming individual predictors or groups of predictors
– Discretizing continuous predictors
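A minimal sketch of two of these tasks, missing-data imputation and centering/scaling, as a scikit-learn pipeline. The toy data and the median-imputation choice are illustrative assumptions.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows: e.g. (age, miles to nearest VA), with one missing distance
X = np.array([[62.0, 12.5], [75.0, np.nan], [58.0, 40.0]])

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing with column median
    ("scale", StandardScaler()),                   # center to mean 0, scale to sd 1
])
X_clean = prep.fit_transform(X)
print(X_clean)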
Variable Selection
• A more parsimonious model may be preferred
– More complex models may achieve high performance at the cost of overfitting
– Computational limitations
– Easier to interpret
• Weigh the performance gain from a complex model against a simpler, lower-variance model (see the sketch below)
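One common selection technique, offered here as an illustrative assumption rather than the method used in the talk, is to let an L1-penalized logistic regression shrink weak predictors to zero and keep the rest:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=59,
                           n_informative=10, random_state=0)

# The L1 penalty forces uninformative coefficients to exactly zero;
# SelectFromModel keeps only features with nonzero coefficients
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(lasso).fit(X, y)
print("Features kept:", selector.get_support().sum(), "of", X.shape[1])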
Performance Metrics for Classification Models
• Several common metrics to assess performance:
– Accuracy: proportion correctly classified by the model
– Kappa statistic: inter-rater agreement; performance adjusting for agreement due to random chance
– Sensitivity: true positive (TP) rate, TP / (TP + FN)
– Specificity: true negative (TN) rate, TN / (TN + FP)
– Area under the ROC curve (AUROC): average value of sensitivity over all possible values of specificity
(FP = false positive, FN = false negative)
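A minimal sketch computing each listed metric from a model's predictions; the tiny labels and scores here are synthetic, for illustration only.

from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # observed class
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # predicted class
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]   # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Kappa:      ", cohen_kappa_score(y_true, y_pred))
print("Sensitivity:", tp / (tp + fn))
print("Specificity:", tn / (tn + fp))
print("AUROC:      ", roc_auc_score(y_true, y_score))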
Resampling Methods
• Facilitate estimation of model performance on data "unseen" in the training process
• Resampling allows assessment of the variability and stability of the model
• Helps protect against model overfitting
– Overfit models will "memorize" the training data
– Good performance on training data does not necessarily generalize "out-of-sample"
Resampling Methods (continued)
• Repeat for a given number of resampling iterations (see the sketch after this list):
– Construct a validation sample by holding out observations
– Fit the model on the remaining observations (i.e., the training sample)
– Predict on the validation sample
– Calculate performance using the specified metric
• Assess model performance across all iterations
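A minimal sketch of this loop using repeated K-fold cross-validation in scikit-learn; the 5-fold, 10-repeat configuration and the logistic regression learner are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=2000, n_features=59, random_state=0)

# Each iteration holds out one fold, fits on the rest, and scores AUROC
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="roc_auc", cv=cv)
print(f"AUROC across iterations: mean={scores.mean():.3f}, sd={scores.std():.3f}")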
Resampling Methods (continued)
• Several commonly applied methods to define validation samples:
– Simple cross-validation: partition data into a training and a test sample
– K-fold cross-validation: split data into K equally sized blocks
– Repeated K-fold cross-validation: create multiple versions of the K folds
– Leave-group-out: train the model on a random proportion of the data and repeat multiple times
– Bootstrapping: construct a random sample, with replacement, of the same size as the original data set