SLIDE 1

Ensemble Learning and the Heritage Health Prize

Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers

iCAMP 2012, University of California, Irvine

Advisors: Max Welling, Alexander Ihler, Sungjin Ahn, and Qiang Liu

August 14, 2012

SLIDE 2

The Heritage Health Prize

◮ Goal: identify patients who will be admitted to a hospital within the next year, using historical claims data [1]

◮ 1,250 teams

SLIDE 3

Purpose

◮ Reduce the annual cost of unnecessary hospital admissions
◮ Identify at-risk patients earlier

SLIDE 4

Kaggle

◮ Public online competitions
◮ Gives feedback on prediction models

SLIDE 5

Data

◮ Provided through Kaggle
◮ Three years of patient data
◮ Two years include days spent in hospital (training set)

SLIDE 6

Evaluation

Root Mean Squared Logarithmic Error (RMSLE):

$$\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\log(p_i + 1) - \log(a_i + 1)\right]^2}$$

where $p_i$ is the predicted and $a_i$ the actual number of days in hospital.

Threshold: $\varepsilon \le 0.4$
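A direct translation of the metric, as a minimal NumPy sketch (the example arrays are illustrative):

```python
import numpy as np

def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((np.log(predicted + 1) - np.log(actual + 1)) ** 2))

# Example: three patients, predicted vs. actual days in hospital.
print(rmsle([0.2, 1.0, 0.0], [0, 2, 0]))
```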

SLIDE 7

The Netflix Prize

◮ $1 million prize
◮ Leading teams combined predictors to pass the threshold

SLIDE 8

Blending

Blend several predictors to create a more accurate predictor

SLIDE 9

Prediction Models

◮ Optimized Constant Value
◮ K-Nearest Neighbors
◮ Logistic Regression
◮ Support Vector Regression
◮ Random Forests
◮ Gradient Boosting Machines
◮ Neural Networks

SLIDE 10

Feature Selection

◮ Used the Market Makers method [2]
◮ Reduced each patient to a vector of 139 features

SLIDE 11

Optimized Constant Value

◮ Predicts the same number of days for each patient
◮ The best constant prediction is p = 0.209179

RMSLE: 0.486459 (800th place)
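Minimizing the RMSLE over a single constant $p$ gives $\log(p+1) = \frac{1}{n}\sum_i \log(a_i+1)$, i.e. $p$ is the geometric mean of $(a_i+1)$ minus one. A hedged sketch (the target vector here is synthetic; the real labels come from the HHP training set):

```python
import numpy as np

def optimal_constant(actual):
    """Constant p minimizing RMSLE: log(p + 1) equals the mean of
    log(a_i + 1), so p is the geometric mean of (a_i + 1) minus one."""
    return np.expm1(np.mean(np.log1p(np.asarray(actual, dtype=float))))

# Hypothetical days-in-hospital targets, for illustration only.
days = np.array([0, 0, 1, 0, 3, 0, 0, 2])
print(optimal_constant(days))
```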

SLIDE 12

K-Nearest Neighbors

◮ Weighted average of the closest neighbors
◮ Very slow

SLIDE 13

Eigenvalue Decomposition

Reduces the number of features for each patient:

$$X_k = \lambda_k^{-1/2}\, U_k^T X_c$$

where $U_k$ and $\lambda_k$ are the leading $k$ eigenvectors and eigenvalues of the feature covariance and $X_c$ is the centered data.
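A minimal NumPy sketch of this projection, assuming the standard whitened-PCA reading of the formula above (the input matrix here is random placeholder data):

```python
import numpy as np

def whiten(X, k):
    """Project centered data onto the top-k eigenvectors of the covariance,
    scaled by lambda^(-1/2): the whitened features X_k."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance
    lam, U = np.linalg.eigh(cov)               # eigenvalues in ascending order
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]  # keep the k largest eigenpairs
    return (Xc @ U) / np.sqrt(lam)             # whitened k-dimensional features

X = np.random.rand(100, 139)  # 100 patients, 139 Market Makers features
print(whiten(X, 10).shape)    # -> (100, 10)
```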

SLIDE 14

K-Nearest Neighbors Results

Neighbors: k = 1000
RMSLE: 0.475197 (600th place)
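A hedged sketch with scikit-learn's KNeighborsRegressor, using distance weighting for the weighted average; only k = 1000 comes from the slide, everything else is a placeholder:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical stand-ins for the reduced feature vectors and targets.
X_train, y_train = np.random.rand(5000, 10), np.random.rand(5000)
X_test = np.random.rand(100, 10)

knn = KNeighborsRegressor(n_neighbors=1000, weights="distance")
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
```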

SLIDE 15

Logistic Regression

RMSLE: 0.466726 (375th place)
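The slide gives no modeling details. One common pattern on this task, an assumption here rather than the team's documented approach, is to fit the classifier on the binary event "any days in hospital" and use the predicted probability as a small fractional-days prediction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for the 139-feature matrix and day counts.
X_train = np.random.rand(5000, 139)
days = np.random.poisson(0.3, size=5000)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, days > 0)                  # binary target: admitted at all?
p_admit = clf.predict_proba(X_train)[:, 1]  # probability of any admission
prediction = p_admit                        # used directly as a days estimate
```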

SLIDE 16

Support Vector Regression

ε = 0.02
RMSLE: 0.467152 (400th place)
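A minimal scikit-learn sketch using the ε = 0.02 insensitivity tube from the slide; the kernel and all other settings are assumptions:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical stand-ins for the training data.
X_train, y_train = np.random.rand(1000, 10), np.random.rand(1000)

svr = SVR(kernel="rbf", epsilon=0.02)  # epsilon-insensitive loss, per slide
svr.fit(X_train, y_train)
y_pred = svr.predict(X_train)
```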

SLIDE 17

Decision Trees

SLIDE 18

Random Forests

RMSLE: 0.464918 (315th place)
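No settings are given on the slide; a minimal scikit-learn sketch with an assumed tree count:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for the training data.
X_train, y_train = np.random.rand(1000, 139), np.random.rand(1000)

rf = RandomForestRegressor(n_estimators=500)  # tree count is an assumption
rf.fit(X_train, y_train)
```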

SLIDE 19

Gradient Boosting Machines

Trees = 8000
Shrinkage = 0.002
Depth = 7
Minimum Observations = 100
RMSLE: 0.462998 (200th place)
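The slide's hyperparameters map naturally onto scikit-learn's GradientBoostingRegressor; the original work may well have used a different library, so the mapping (especially minimum observations to min_samples_leaf) is an assumption:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical stand-ins for the training data.
X_train, y_train = np.random.rand(1000, 139), np.random.rand(1000)

gbm = GradientBoostingRegressor(
    n_estimators=8000,     # Trees = 8000
    learning_rate=0.002,   # Shrinkage = 0.002
    max_depth=7,           # Depth = 7
    min_samples_leaf=100,  # Minimum Observations = 100 (mapping assumed)
)
gbm.fit(X_train, y_train)  # slow at this tree count; a sketch, not a benchmark
```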

SLIDE 20

Artificial Neural Networks

SLIDE 21

Backpropagation in Neural Networks

SLIDE 22

Neural Network Results

Number of hidden neurons: 7
Number of cycles: 3000
RMSLE: 0.465705 (340th place)
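A hedged modern sketch with scikit-learn's MLPRegressor, mapping the slide's 7 hidden neurons and 3000 training cycles onto hidden_layer_sizes and max_iter; the 2012 implementation was presumably different:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-ins for the training data.
X_train, y_train = np.random.rand(1000, 139), np.random.rand(1000)

nn = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000)
nn.fit(X_train, y_train)
```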

SLIDE 23

Individual Predictors (Summary)

◮ Optimized Constant Value: 0.486459 (800th place)
◮ K-Nearest Neighbors: 0.475197 (600th place)
◮ Logistic Regression: 0.466726 (375th place)
◮ Support Vector Regression: 0.467152 (400th place)
◮ Random Forests: 0.464918 (315th place)
◮ Gradient Boosting Machines: 0.462998 (200th place)
◮ Neural Networks: 0.465705 (340th place)

SLIDE 25

Deriving the Blending Algorithm

Error (RMSE):

$$\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i)^2}$$

so for predictor $c$, and for the all-zero prediction,

$$n\varepsilon_c^2 = \sum_{i=1}^{n}(X_{ic} - Y_i)^2, \qquad n\varepsilon_0^2 = \sum_{i=1}^{n}Y_i^2$$

SLIDE 26

Deriving the Blending Algorithm (Continued)

$\tilde{X}$ as a combination of predictors:

$$\tilde{X} = Xw, \qquad \tilde{X}_i = \sum_{c} w_c X_{ic}$$

SLIDE 27

Deriving the Blending Algorithm (Continued)

Minimizing the cost function:

$$C = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \tilde{X}_i\right)^2$$

$$\frac{\partial C}{\partial w_c} = \sum_{i}\Big(Y_i - \sum_{c'} w_{c'} X_{ic'}\Big)(-X_{ic}) = 0$$

SLIDE 28

Deriving the Blending Algorithm (Continued)

Minimizing the cost function (continued):

$$\sum_{i} Y_i X_{ic} = \sum_{c'} w_{c'} \sum_{i} X_{ic'} X_{ic}$$

or, in matrix form,

$$Y^T X = w^T X^T X$$

SLIDE 29

Deriving the Blending Algorithm (Continued)

Optimizing the predictors' weights:

$$w^T = (Y^T X)(X^T X)^{-1}$$

Expanding $(Y_i - X_{ic})^2$ gives

$$\sum_{i} Y_i X_{ic} = \frac{1}{2}\Big(\sum_{i} X_{ic}^2 + \sum_{i} Y_i^2 - \sum_{i}(Y_i - X_{ic})^2\Big) = \frac{1}{2}\Big(\sum_{i} X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\Big)$$

so $X^T Y$ can be computed from the submitted predictions and the leaderboard errors alone, without knowing $Y$.

SLIDE 33

Blending Algorithm (Summary)

1. Submit and record all predictions $X$ and errors $\varepsilon_c$.
2. Calculate $M = (X^T X)^{-1}$ and $v_c = (X^T Y)_c = \frac{1}{2}\sum_i\left(X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\right)$.
3. Because $w^T = (Y^T X)(X^T X)^{-1}$, calculate the weights $w = Mv$.
4. The final blended prediction is $\tilde{X} = Xw$.
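A NumPy sketch of these four steps, assuming X holds one column of submitted predictions per model and errs the corresponding leaderboard errors; the demo data is synthetic:

```python
import numpy as np

def blend_weights(X, errs, err0):
    """Blending weights from submitted predictions and leaderboard errors.

    X    : (n, c) matrix, one column of predictions per model
    errs : length-c leaderboard errors, one per model
    err0 : error of the all-zero submission (so n * err0**2 = sum of Y_i**2)
    """
    n = X.shape[0]
    M = np.linalg.inv(X.T @ X)  # step 2: M = (X^T X)^-1
    v = 0.5 * ((X ** 2).sum(axis=0) + n * err0 ** 2 - n * np.asarray(errs) ** 2)
    return M @ v                # step 3: w = M v

# Synthetic demo: 3 noisy predictors of a hidden truth Y.
n = 1000
Y = np.log1p(np.random.poisson(0.3, n))
X = np.column_stack([Y + 0.3 * np.random.randn(n) for _ in range(3)])
errs = [np.sqrt(np.mean((X[:, c] - Y) ** 2)) for c in range(3)]
err0 = np.sqrt(np.mean(Y ** 2))

w = blend_weights(X, errs, err0)
blended = X @ w                 # step 4: final blended prediction
# On this data the blend's error is at most the best single model's.
print(np.sqrt(np.mean((blended - Y) ** 2)), min(errs))
```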

SLIDE 34

Blending Results

RMSLE: 0.461432 (98th place)

SLIDE 35

Future Work

◮ Optimizing the blending equation with a regularization constant (see the sketch after this list):

$$w^T = (Y^T X)(X^T X + \lambda I)^{-1}$$

◮ Improved feature selection
◮ More predictors
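A one-function modification of the earlier blending sketch for the regularized weights; the value of λ is a tuning constant and the default here is arbitrary:

```python
import numpy as np

def blend_weights_ridge(X, v, lam=1.0):
    """Regularized blending weights: w = (X^T X + lambda*I)^(-1) v,
    where v = X^T Y is recovered from leaderboard errors as before."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), v)
```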

SLIDE 36

Questions

SLIDE 37

References

[1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp.
[2] David Vogel, Phil Brierley, and Randy Axelrod. Market Makers - Milestone 1 Description. September 2011.
