
Workshop 11: Classification and Regression Trees
Murray Logan



  1. Workshop 11: Classification and Regression Trees
     Murray Logan, 26-11-2013

     Limitations of Linear Models
     • Feature complexity
       – non-linear trends (GAM)
       – complex (multi-way) interactions
       – non-additive interactions
     Single models fail to capture this complexity.

     Limitations of Linear Models
     • Feature complexity
     • Prediction
     • Relative importance
     • (Multi)collinearity

  2. Classification & Regression Trees
     Advantages
     • Feature complexity
     • Prediction
     • Relative importance
     • (Multi)collinearity
     Disadvantages
     • over-fitting (over-learning)

     Classification: categorical response
     Regression: continuous response

     CART: Simple regression trees
     • split (partition) the data into major chunks
       – maximizing the change in explained deviance
       – with Gaussian errors:
         ∗ maximizing the between-group SS
         ∗ minimizing SSerror
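The split criterion on this slide can be sketched in a few lines of R. This is an illustration, not workshop code: `best_split` is a made-up helper that scans candidate cut points on a single predictor and returns the cut that minimises the summed within-group SS (equivalently, maximises the between-group SS) for a Gaussian response.

```r
# Toy sketch of how a regression tree picks one split:
# try every midpoint between ordered x values and keep the cut
# that minimises SSerror across the two resulting groups.
best_split <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- head(xs, -1) + diff(xs) / 2          # candidate midpoints
  sse  <- sapply(cuts, function(cut) {
    left  <- y[x <  cut]
    right <- y[x >= cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cuts[which.min(sse)]
}

set.seed(1)
x <- runif(100, 0, 10)
y <- ifelse(x < 5, 2, 8) + rnorm(100, sd = 0.5)
best_split(x, y)   # recovers a cut near the true breakpoint at 5
```

Real CART software repeats this scan over every predictor at every node and keeps the single best (predictor, cut point) pair, then recurses into the two child groups.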

  3. Figures 1 and 2: Simple regression trees: split (partition) the data into major chunks.

  4. Figure 3: Simple regression trees: split (partition) the data into major chunks.
     Figure 4: Simple regression trees: split these subsets.

  5. Figures 5 and 6: Simple regression trees: split these subsets.

  6. Figure 7: Simple regression trees: split these subsets.
     Figure 8: Recursively partition (split) to build a decision tree.

  7. Simple trees tend to overfit: the error is fitted along with the model (Figure 9).

     Pruning
     • reduce overfitting
       – examine the deviance at each terminal node (leaf)

     Predictions
     • partial plots

     Classification and Regression Trees: R packages
     • simple CART


  9. Simple CART: library(tree)
     An extension that facilitates (some) non-Gaussian errors: library(rpart)

     Classification and Regression Trees: Limitations
     • crude overfitting protection
     • low resolution
     • limited error distributions
     • little scope for random effects

     Boosted Regression Trees: Boosting
     • machine learning meets predictive modelling
     • ensemble models
       – a sequence of simple trees (10,000+ trees)
       – each built to predict the residuals of the previous tree
       – shrinkage
       – produce excellent fit

     Boosted Regression Trees: Overfitting
     • over- vs under-fitting
     • residual error vs precision
     • minimizing squared-error loss
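As a concrete example of the rpart extension mentioned above, the same fit-then-prune workflow can use rpart's built-in cross-validated complexity table. This is an illustrative sketch, not workshop code: the dataset (the built-in iris data) and the object names are placeholders.

```r
library(rpart)

# Fit a classification tree (categorical response), then prune it
# back to the complexity-parameter (cp) value with the lowest
# cross-validated error -- rpart's analogue of prune.tree().
iris.rp <- rpart(Species ~ ., data = iris, method = "class")
printcp(iris.rp)                 # cp table with cross-validated xerror

best.cp  <- iris.rp$cptable[which.min(iris.rp$cptable[, "xerror"]), "CP"]
iris.rp1 <- prune(iris.rp, cp = best.cp)
```

Unlike tree(), rpart performs the cross-validation automatically during fitting (xval = 10 by default), so pruning is a single table lookup.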

 10. Boosted Regression Trees: minimizing squared-error loss
     • test (validation) data
       – 75% train, 25% test
     • out of bag
       – 50% in, 50% out
     • cross-validation
       – 3 folds

     Boosted Regression Trees: Predictions

     Boosted Regression Trees: Variable importance

        var rel.inf
     x1  x1   55.06
     x2  x2   44.94

     Boosted Regression Trees: R^2
     [1] 0.5952
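The 75% train / 25% test idea above can be sketched with base R on toy data (none of these objects come from the workshop): fit the model on the training rows only, then measure squared-error loss on the held-out rows.

```r
# Toy 75/25 train/test split: predictive error is estimated on
# rows the model never saw during fitting.
set.seed(1)
n   <- 100
dat <- data.frame(x = runif(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.2)

train <- sample(n, size = 0.75 * n)               # 75% of rows for training
fit   <- lm(y ~ x, data = dat[train, ])

# squared-error loss on the 25% test set
test_mse <- mean((dat$y[-train] - predict(fit, dat[-train, ]))^2)
test_mse
```

gbm automates exactly this bookkeeping: train.fraction implements the train/test split, bag.fraction the out-of-bag sampling, and cv.folds the cross-validation, which is why all three stopping rules appear in gbm.perf() later.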


 12. Worked Examples

     paruelo <- read.table('../data/paruelo.csv', header=TRUE,
                           sep=',', strip.white=TRUE)
     head(paruelo)

         C3   LAT   LONG MAP  MAT JJAMAP DJFMAP
     1 0.65 46.40 119.55 199 12.4   0.12   0.45
     2 0.65 47.32 114.27 469  7.5   0.24   0.29
     3 0.76 45.78 110.78 536  7.2   0.24   0.20
     4 0.75 43.95 101.87 476  8.2   0.35   0.15
     5 0.33 46.90 102.82 484  4.8   0.40   0.14
     6 0.03 38.87  99.38 623 12.0   0.40   0.11

     library(tree)
     paruelo.tree <- tree(C3 ~ LAT + LONG + MAP + MAT + JJAMAP, data=paruelo)
     plot(residuals(paruelo.tree) ~ predict(paruelo.tree))
     plot(paruelo.tree)
     text(paruelo.tree, cex=0.75)
     plot(prune.tree(paruelo.tree))

     Figures 15-17: output of the plotting commands above (residuals vs fitted values, the tree diagram, and the pruning sequence).

 16. paruelo.tree1 <- prune.tree(paruelo.tree, best=4)
     plot(residuals(paruelo.tree1) ~ predict(paruelo.tree1))
     plot(paruelo.tree1)
     text(paruelo.tree1)
     summary(paruelo.tree1)

     Regression tree:
     snip.tree(tree = paruelo.tree, nodes = c(7L, 5L, 4L, 6L))


 18. Variables actually used in tree construction:
     [1] "LAT" "MAT"
     Number of terminal nodes: 4
     Residual mean deviance: 0.0304 = 2.1 / 69
     Distribution of residuals:
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      -0.4220 -0.1010 -0.0325  0.0000  0.0787  0.4820

     paruelo.tree1$frame

          var  n    dev   yval splits.cutleft splits.cutright
     1    LAT 73 4.9093 0.2714        <42.785         >42.785
     2    MAT 50 1.6364 0.1592          <7.25           >7.25
     4 <leaf> 12 0.4384 0.3425
     5 <leaf> 38 0.6674 0.1013
     3    MAT 23 1.2762 0.5152           <6.9            >6.9
     6 <leaf> 12 0.6912 0.4083
     7 <leaf> 11 0.2984 0.6318

     library(scales)
     #ys <- with(paruelo,
     #           rescale(C3, from=c(min(C3), max(C3)), to=c(0.8, 0)))
     #plot(paruelo$LONG, paruelo$LAT, col=grey(ys),
     #     pch=20, xlab="Longitude", ylab="Latitude")
     #partition.tree(paruelo.tree1, ordvars=c("MAT","LAT"), add=TRUE)
     #partition.tree(paruelo.tree1, add=TRUE)

     # Prediction
     xlat <- seq(min(paruelo$LAT), max(paruelo$LAT), l=100)
     #xlong <- mean(paruelo$LONG)
     #xmat <- mean(paruelo$MAT)
     pred <- predict(paruelo.tree1,
                     newdata=data.frame(LAT=xlat,
                                        LONG=mean(paruelo$LONG),
                                        MAT=mean(paruelo$MAT),
                                        MAP=mean(paruelo$MAP),
                                        JJAMAP=mean(paruelo$JJAMAP)))

 19. par(mfrow=c(1,2))
     plot(C3 ~ LAT, paruelo, type="p", pch=16, cex=0.2)
     points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ LAT,
            paruelo, type="p", pch=16, col="grey")
     lines(pred ~ xlat, col="red", lwd=2)

     xmat <- seq(min(paruelo$MAT), max(paruelo$MAT), l=100)
     #xlat <- mean(paruelo$LAT)
     pred <- predict(paruelo.tree1,
                     newdata=data.frame(LAT=mean(paruelo$LAT),
                                        LONG=mean(paruelo$LONG),
                                        MAT=xmat,
                                        MAP=mean(paruelo$MAP),
                                        JJAMAP=mean(paruelo$JJAMAP)))
     plot(C3 ~ MAT, paruelo, type="p", pch=16, cex=0.2)
     points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ MAT,
            paruelo, type="p", pch=16, col="grey")
     lines(pred ~ xmat, col="red", lwd=2)

     #xlong <- seq(min(paruelo$LONG), max(paruelo$LONG), l=100)
     ##xlat <- mean(paruelo$LAT)
     #pred <- predict(paruelo.tree1,
     #                newdata=data.frame(LAT=mean(paruelo$LAT), LONG=xlong,
     #                                   MAT=mean(paruelo$MAT),
     #                                   MAP=mean(paruelo$MAP),
     #                                   JJAMAP=mean(paruelo$JJAMAP)))
     #plot(C3~LONG, paruelo, type="p", pch=16, cex=0.2)
     #points(I(predict(paruelo.tree1)-resid(paruelo.tree1))~LONG, paruelo,
     #       type="p", pch=16, col="grey")
     #lines(pred~xlong, col="red", lwd=2)

     ## paruelo.tree <- tree(C3 ~ LAT+LONG+MAP+MAT+JJAMAP+DJFMAP, data=paruelo)
     ## plot(paruelo.tree)
     ## text(paruelo.tree, cex=0.75)
     ## xlat <- seq(min(paruelo$LAT), max(paruelo$LAT), l=100)
     ## xlong <- mean(paruelo$LONG)
     ## xmat <- mean(paruelo$MAT)
     ## xmap <- mean(paruelo$MAP)
     ## xjjamap <- mean(paruelo$JJAMAP)
     ## xdjfmap <- mean(paruelo$DJFMAP)
     ## pp <- predict(paruelo.tree1, newdata=data.frame(LAT=xlat, LONG=xlong, MAT=xmat, MAP=xmap,
     ## par(mfrow=c(1,2))
     ## plot(C3~LAT, paruelo, type="p", pch=16, cex=0.2)
     ## points(I(predict(paruelo.tree1)-resid(paruelo.tree1))~LAT, paruelo,
     ##        type="p", pch=16, col="grey")
     ## lines(pp~xlat, col="red", lwd=2)


 21. ## xlat <- mean(paruelo$LAT)
     ## xlong <- mean(paruelo$LONG)
     ## xmat <- seq(min(paruelo$MAT), max(paruelo$MAT), l=100)
     ## xmap <- mean(paruelo$MAP)
     ## xjjamap <- mean(paruelo$JJAMAP)
     ## xdjfmap <- mean(paruelo$DJFMAP)
     ## pp <- predict(paruelo.tree1, newdata=data.frame(LAT=xlat, LONG=xlong, MAT=xmat, MAP=xmap,
     ## par(mfrow=c(1,2))
     ## plot(C3~MAT, paruelo, type="p", pch=16, cex=0.2)
     ## points(I(predict(paruelo.tree1)-resid(paruelo.tree1))~MAT, paruelo,
     ##        type="p", pch=16, col="grey")
     ## lines(pp~xmat, col="red", lwd=2)

     ## Now GBM
     library(gbm)
     paruelo.gbm <- gbm(C3 ~ LAT+LONG+MAP+MAT+JJAMAP+DJFMAP,
                        data=paruelo,
                        distribution="gaussian",
                        n.trees=10000,
                        interaction.depth=3,  # 1: additive model, 2: two-way interactions, etc.
                        cv.folds=3,
                        train.fraction=0.75,
                        bag.fraction=0.5,
                        shrinkage=0.001,
                        n.minobsinnode=2)

     ## Compare methods of determining the optimal number of iterations
     (best.iter <- gbm.perf(paruelo.gbm, method="test"))
     [1] 1533
     (best.iter <- gbm.perf(paruelo.gbm, method="OOB"))
     [1] 1197
     (best.iter <- gbm.perf(paruelo.gbm, method="OOB", oobag.curve=TRUE,
                            overlay=TRUE, plot.it=TRUE))
     [1] 1197
     (best.iter <- gbm.perf(paruelo.gbm, method="cv"))
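Assuming the paruelo.gbm object and best.iter from the chunk above, the relative-influence table and the pseudo-R^2 reported on the earlier slide come from gbm's summary() and predict() methods. This sketch of those calls is not part of the original chunk.

```r
# Relative influence of each predictor -- the source of the
# "var rel.inf" table shown earlier -- evaluated at the chosen iteration.
summary(paruelo.gbm, n.trees = best.iter)

# Partial-dependence plot for one predictor, averaging over the others.
plot(paruelo.gbm, i.var = "LAT", n.trees = best.iter)

# A pseudo-R^2 like the 0.5952 reported earlier can be computed from
# the fitted values at best.iter.
pred <- predict(paruelo.gbm, newdata = paruelo, n.trees = best.iter)
1 - sum((paruelo$C3 - pred)^2) / sum((paruelo$C3 - mean(paruelo$C3))^2)
```

Note that predictions from a gbm must always specify n.trees; using all 10,000 trees instead of best.iter would reintroduce the overfitting the stopping rules were designed to avoid.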


 24. [1] 1844

     par(mfrow=c(1,2))
     best.iter <- gbm.perf(paruelo.gbm, method="cv",
                           oobag.curve=TRUE, overlay=TRUE, plot.it=TRUE)
     best.iter
     [1] 1844

