Ensemble Learning and the Heritage Health Prize


  1. Ensemble Learning and the Heritage Health Prize
  Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers
  iCAMP 2012, University of California, Irvine
  Advisors: Max Welling, Alexander Ihler, Sungjin Ahn, and Qiang Liu
  August 14, 2012

  2. The Heritage Health Prize
  ◮ Goal: identify patients who will be admitted to a hospital within the next year, using historical claims data [1]
  ◮ 1,250 competing teams

  3. Purpose
  ◮ Reduce the annual cost of unnecessary hospital admissions
  ◮ Identify at-risk patients earlier

  4. Kaggle
  ◮ Hosts public online prediction competitions
  ◮ Provides feedback on submitted prediction models

  5. Data
  ◮ Provided through Kaggle
  ◮ Three years of patient claims data
  ◮ Two years include days spent in hospital (training set)

  6. Evaluation
  Root Mean Squared Logarithmic Error (RMSLE):
  \varepsilon = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left[ \log(p_i + 1) - \log(a_i + 1) \right]^2 }
  where p_i is the predicted and a_i the actual number of days in hospital for patient i.
  Threshold: \varepsilon \le 0.4
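
  A minimal NumPy sketch of this metric, for illustration only (the array names `predicted` and `actual` are placeholders, not from the team's code):

  ```python
  import numpy as np

  def rmsle(predicted, actual):
      """Root Mean Squared Logarithmic Error, as defined on the slide."""
      predicted = np.asarray(predicted, dtype=float)
      actual = np.asarray(actual, dtype=float)
      return np.sqrt(np.mean((np.log(predicted + 1) - np.log(actual + 1)) ** 2))

  # Example: a constant prediction scored against a few actual values.
  print(rmsle([0.21, 0.21, 0.21], [0, 1, 0]))
  ```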

  7. The Netflix Prize
  ◮ $1 million prize
  ◮ Leading teams combined predictors to pass the threshold

  8. Blending
  Blend several predictors to create a single, more accurate predictor.

  9. Prediction Models
  ◮ Optimized Constant Value
  ◮ K-Nearest Neighbors
  ◮ Logistic Regression
  ◮ Support Vector Regression
  ◮ Random Forests
  ◮ Gradient Boosting Machines
  ◮ Neural Networks

  10. Feature Selection
  ◮ Used the Market Makers method [2]
  ◮ Reduced each patient's claims history to a vector of 139 features

  11. Optimized Constant Value
  ◮ Predicts the same number of days for every patient
  ◮ Best constant prediction: p = 0.209179 (sketched below)
  RMSLE: 0.486459 (800th place)
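
  One way such a constant can be found: a sketch, assuming the constant is chosen to minimize RMSLE on the training labels (the closed form follows from setting the derivative of the squared-log loss to zero; the data below is made up):

  ```python
  import numpy as np

  def best_constant(actual_days):
      """Constant p minimizing RMSLE against the training labels.

      Setting d/dp sum_i (log(p+1) - log(a_i+1))^2 = 0 gives
      log(p+1) = mean_i log(a_i+1).
      """
      actual_days = np.asarray(actual_days, dtype=float)
      return np.exp(np.mean(np.log(actual_days + 1))) - 1.0

  # Toy labels; the deck reports p = 0.209179 on the real data.
  print(best_constant([0, 0, 1, 0, 2, 0]))
  ```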

  12. K-Nearest Neighbors
  ◮ Prediction is a weighted average of the closest neighbors (see the sketch below)
  ◮ Very slow
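
  A minimal sketch of a distance-weighted nearest-neighbor regressor, using scikit-learn as a stand-in (the deck does not say which implementation was used; the arrays below are random placeholders):

  ```python
  import numpy as np
  from sklearn.neighbors import KNeighborsRegressor

  # Placeholder data: rows are patients, columns are features.
  rng = np.random.default_rng(0)
  X_train, y_train = rng.normal(size=(500, 20)), rng.integers(0, 3, size=500)
  X_test = rng.normal(size=(10, 20))

  # Distance-weighted average of the k closest training patients.
  # The deck used k = 1000 on the full data set; 100 fits the toy data here.
  knn = KNeighborsRegressor(n_neighbors=100, weights="distance")
  knn.fit(X_train, y_train)
  print(knn.predict(X_test))
  ```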

  13. Eigenvalue Decomposition
  Reduces the number of features for each patient:
  X_k = \Lambda_k^{-1/2} U_k^{T} X_c
  where U_k and \Lambda_k hold the leading k eigenvectors and eigenvalues of the feature covariance and X_c is the centered feature matrix.
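
  A sketch of this kind of eigendecomposition-based projection (PCA-style whitening) in NumPy; the centering step and the choice of k here are illustrative assumptions, not taken from the original pipeline:

  ```python
  import numpy as np

  def whiten_project(X, k):
      """Project centered data onto the top-k eigenvectors of its covariance,
      scaling each direction by 1/sqrt(eigenvalue)."""
      X_c = X - X.mean(axis=0)                 # center the features
      cov = np.cov(X_c, rowvar=False)
      eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
      order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
      U_k, lam_k = eigvecs[:, order], eigvals[order]
      return X_c @ U_k / np.sqrt(lam_k)        # row-wise X_k = Lambda^{-1/2} U^T X_c

  X = np.random.default_rng(1).normal(size=(200, 139))
  print(whiten_project(X, k=20).shape)         # (200, 20)
  ```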

  14. K-Nearest Neighbors Results
  Neighbors: k = 1000
  RMSLE: 0.475197 (600th place)

  15. Logistic Regression
  RMSLE: 0.466726 (375th place)
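
  The deck does not say how logistic regression was applied to this regression target; one common setup, shown here only as a hedged sketch, is to model the probability of any admission and submit that probability as the days estimate (all data below is a placeholder):

  ```python
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(2)
  X_train = rng.normal(size=(500, 139))
  days = rng.integers(0, 3, size=500)

  # Binary target: was the patient admitted at all in the following year?
  clf = LogisticRegression(max_iter=1000)
  clf.fit(X_train, days > 0)

  # Use the predicted admission probability as a fractional days estimate.
  print(clf.predict_proba(rng.normal(size=(10, 139)))[:, 1])
  ```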

  16. Support Vector Regression
  ε = 0.02
  RMSLE: 0.467152 (400th place)
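
  A minimal scikit-learn sketch with the ε-insensitive tube set to the value quoted above; the kernel and remaining hyperparameters are assumptions, since the deck does not state them:

  ```python
  import numpy as np
  from sklearn.svm import SVR

  rng = np.random.default_rng(3)
  X_train, y_train = rng.normal(size=(500, 139)), rng.integers(0, 3, size=500)

  # Epsilon-insensitive regression; errors inside the 0.02 tube are not penalized.
  svr = SVR(kernel="rbf", epsilon=0.02)
  svr.fit(X_train, y_train)
  print(svr.predict(rng.normal(size=(5, 139))))
  ```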

  17. Decision Trees

  18. Random Forests
  RMSLE: 0.464918 (315th place)
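
  A minimal random-forest regression sketch using scikit-learn; the tree count and other settings are illustrative assumptions, since the deck reports only the final score:

  ```python
  import numpy as np
  from sklearn.ensemble import RandomForestRegressor

  rng = np.random.default_rng(4)
  X_train, y_train = rng.normal(size=(500, 139)), rng.integers(0, 3, size=500)

  # An ensemble of decision trees, each grown on a bootstrap sample of patients.
  forest = RandomForestRegressor(n_estimators=500, random_state=0)
  forest.fit(X_train, y_train)
  print(forest.predict(rng.normal(size=(5, 139))))
  ```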

  19. Gradient Boosting Machines
  Trees = 8000
  Shrinkage = 0.002
  Depth = 7
  Minimum observations = 100
  RMSLE: 0.462998 (200th place)
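
  These hyperparameters map naturally onto a gradient-boosted tree ensemble; the scikit-learn sketch below assumes "shrinkage" is the learning rate and "minimum observations" is the minimum samples per leaf (the original may have used a different library, such as R's gbm):

  ```python
  import numpy as np
  from sklearn.ensemble import GradientBoostingRegressor

  rng = np.random.default_rng(5)
  X_train, y_train = rng.normal(size=(500, 139)), rng.integers(0, 3, size=500)

  # Many small trees, each fit to the residuals of the current ensemble and
  # added with a small shrinkage (learning rate) of 0.002.
  gbm = GradientBoostingRegressor(
      n_estimators=8000,
      learning_rate=0.002,
      max_depth=7,
      min_samples_leaf=100,
  )
  gbm.fit(X_train, y_train)
  print(gbm.predict(rng.normal(size=(5, 139))))
  ```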

  20. Artificial Neural Networks

  21. Backpropagation in Neural Networks

  22. Neural Network Results
  Hidden neurons = 7
  Training cycles = 3000
  RMSLE: 0.465705 (340th place)
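
  A sketch of a comparably sized network using scikit-learn's MLPRegressor, assuming the 3000 "cycles" correspond to training iterations; the original network and training code are not specified in the deck:

  ```python
  import numpy as np
  from sklearn.neural_network import MLPRegressor

  rng = np.random.default_rng(6)
  X_train, y_train = rng.normal(size=(500, 139)), rng.integers(0, 3, size=500)

  # One hidden layer of 7 neurons, trained by backpropagation.
  net = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000, random_state=0)
  net.fit(X_train, y_train)
  print(net.predict(rng.normal(size=(5, 139))))
  ```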

  23. Individual Predictors (Summary)
  ◮ Optimized Constant Value: 0.486459 (800th place)
  ◮ K-Nearest Neighbors: 0.475197 (600th place)
  ◮ Logistic Regression: 0.466726 (375th place)
  ◮ Support Vector Regression: 0.467152 (400th place)
  ◮ Random Forests: 0.464918 (315th place)
  ◮ Gradient Boosting Machines: 0.462998 (200th place)
  ◮ Neural Networks: 0.465705 (340th place)

  24. Individual Predictors (Summary)

  25. Deriving the Blending Algorithm
  Error (RMSE):
  \varepsilon = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2 }
  n \varepsilon_c^2 = \sum_{i=1}^{n} (X_{ic} - Y_i)^2
  n \varepsilon_0^2 = \sum_{i=1}^{n} Y_i^2
  where X_{ic} is predictor c's prediction for patient i, Y_i the true value, and \varepsilon_0 the error of the all-zero prediction.

  26. Deriving the Blending Algorithm (Continued)
  \tilde{X} as a combination of predictors:
  \tilde{X} = X w, or \tilde{X}_i = \sum_c w_c X_{ic}

  27. Deriving the Blending Algorithm (Continued)
  Minimizing the cost function:
  C = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \tilde{X}_i)^2
  \frac{\partial C}{\partial w_c} = \sum_i \Big( Y_i - \sum_{c'} w_{c'} X_{ic'} \Big)(-X_{ic}) = 0

  28. Deriving the Blending Algorithm (Continued)
  Minimizing the cost function (continued):
  \sum_i Y_i X_{ic} = \sum_{c'} w_{c'} \sum_i X_{ic'} X_{ic}
  Y^T X = w^T X^T X

  29. Deriving the Blending Algorithm (Continued)
  Optimizing the predictors' weights:
  w^T = (Y^T X)(X^T X)^{-1}
  \sum_i Y_i X_{ic} = \frac{1}{2} \Big( \sum_i X_{ic}^2 + \sum_i Y_i^2 - \sum_i (Y_i - X_{ic})^2 \Big)
                    = \frac{1}{2} \Big( \sum_i X_{ic}^2 + n \varepsilon_0^2 - n \varepsilon_c^2 \Big)

  30. Deriving the Blending Algorithm (Continued)
  Error (RMSE):
  \varepsilon = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2 }
  n \varepsilon_c^2 = \sum_{i=1}^{n} (X_{ic} - Y_i)^2
  n \varepsilon_0^2 = \sum_{i=1}^{n} Y_i^2

  31. Deriving the Blending Algorithm (Continued)
  Optimizing the predictors' weights:
  w^T = (Y^T X)(X^T X)^{-1}
  \sum_i Y_i X_{ic} = \frac{1}{2} \Big( \sum_i X_{ic}^2 + \sum_i Y_i^2 - \sum_i (Y_i - X_{ic})^2 \Big)
                    = \frac{1}{2} \Big( \sum_i X_{ic}^2 + n \varepsilon_0^2 - n \varepsilon_c^2 \Big)

  32. Deriving the Blending Algorithm (Continued)
  \tilde{X} as a combination of predictors:
  \tilde{X} = X w, or \tilde{X}_i = \sum_c w_c X_{ic}

  33. Blending Algorithm (Summary)
  1. Submit and record all predictions X and errors \varepsilon_c
  2. Calculate M = (X^T X)^{-1} and v_c = (X^T Y)_c = \frac{1}{2} \Big( \sum_i X_{ic}^2 + n \varepsilon_0^2 - n \varepsilon_c^2 \Big)
  3. Because w^T = (Y^T X)(X^T X)^{-1}, calculate the weights as w = M v
  4. The final blended prediction is \tilde{X} = X w
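
  A NumPy sketch of this four-step summary, written from the slides rather than from the team's code; the variable names and the toy data at the bottom are illustrative. In the actual competition the errors ε_c and ε_0 come from submitted-score feedback rather than from known targets:

  ```python
  import numpy as np

  def blend(X, eps_c, eps_0):
      """Blend predictors using only their predictions and reported errors.

      X     : (n, C) array; column c holds predictor c's predictions X_ic.
      eps_c : length-C array of each predictor's reported error.
      eps_0 : error of the all-zero submission, so n * eps_0**2 = sum_i Y_i**2.
      """
      X = np.asarray(X, dtype=float)
      n = X.shape[0]
      # Step 2: v_c = (X^T Y)_c recovered from the errors alone.
      v = 0.5 * ((X ** 2).sum(axis=0) + n * eps_0 ** 2 - n * np.asarray(eps_c) ** 2)
      # Step 3: w = (X^T X)^{-1} v  (solve instead of forming the inverse).
      w = np.linalg.solve(X.T @ X, v)
      # Step 4: final blended prediction.
      return X @ w, w

  # Toy check with known targets: the recovered v matches X^T Y exactly.
  rng = np.random.default_rng(7)
  Y = rng.integers(0, 3, size=200).astype(float)
  X = np.column_stack([Y + rng.normal(scale=s, size=200) for s in (0.5, 1.0, 1.5)])
  eps_c = np.sqrt(((X - Y[:, None]) ** 2).mean(axis=0))
  eps_0 = np.sqrt((Y ** 2).mean())
  blended, w = blend(X, eps_c, eps_0)
  print(w, np.sqrt(((blended - Y) ** 2).mean()))
  ```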

  34. Blending Results
  RMSLE: 0.461432 (98th place)

  35. Future Work
  ◮ Optimizing the blending equation with a regularization constant: w^T = (Y^T X)(X^T X + \lambda I)^{-1} (see the sketch below)
  ◮ Improved feature selection
  ◮ More predictors
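
  A one-line change to the blending sketch above gives these ridge-regularized weights; the value of λ here is an arbitrary illustrative default:

  ```python
  import numpy as np

  def blend_regularized(X, v, lam=1.0):
      """Ridge-regularized blending weights: w = (X^T X + lam * I)^{-1} v.

      v is the same vector computed in step 2 of the blending sketch above;
      lam shrinks the weights when predictors are strongly correlated.
      """
      X = np.asarray(X, dtype=float)
      return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), v)
  ```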

  36. Questions

  37. References
  [1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp
  [2] David Vogel, Phil Brierley, and Randy Axelrod. Market Makers - Milestone 1 Description. September 2011.
