
Workshop 11: Classification and Regression Trees
Murray Logan



  1. Workshop 11: Classification and Regression Trees
     Murray Logan, 26-11-2013

     Limitations of Linear Models
     • Feature complexity
       – non-linear trends (GAM)
       – complex (multi-way) interactions
       – non-additive interactions
     Single models fail to capture this complexity.

     Limitations of Linear Models
     • Feature complexity
     • Prediction
     • Relative importance
     • (Multi)collinearity

  2. Classification & Regression Trees
     Advantages
     • Feature complexity
     • Prediction
     • Relative importance
     • (Multi)collinearity
     Disadvantages
     • over-fitting (over-learning)

     Classification: categorical response
     Regression: continuous response

     CART: Simple regression trees
     • split (partition) the data into major chunks
       – maximizing the change in explained deviance
       – with Gaussian errors:
         ∗ maximizing the between-group SS
         ∗ minimizing SSerror
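The split criterion on this slide can be sketched in a few lines of R. This is an illustration, not workshop code: `best_split` is a made-up helper that scans candidate cut points on a single predictor and returns the cut that minimises the summed within-group SS (equivalently, maximises the between-group SS) for a Gaussian response.

```r
# Toy sketch of how a regression tree picks one split:
# try every midpoint between ordered x values and keep the cut
# that minimises SSerror across the two resulting groups.
best_split <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- head(xs, -1) + diff(xs) / 2          # candidate midpoints
  sse  <- sapply(cuts, function(cut) {
    left  <- y[x <  cut]
    right <- y[x >= cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cuts[which.min(sse)]
}

set.seed(1)
x <- runif(100, 0, 10)
y <- ifelse(x < 5, 2, 8) + rnorm(100, sd = 0.5)
best_split(x, y)   # recovers a cut near the true breakpoint at 5
```

Real CART software repeats this scan over every predictor at every node and keeps the single best (predictor, cut point) pair, then recurses into the two child groups.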

  3. Figures 1 and 2: Simple regression trees: split (partition) the data into major chunks.

  4. Figure 3: Simple regression trees: split (partition) the data into major chunks.
     Figure 4: Simple regression trees: split these subsets.

  5. Figures 5 and 6: Simple regression trees: split these subsets.

  6. Figure 7: Simple regression trees: split these subsets.
     Figure 8: Recursively partition (split) to build a decision tree.

  7. Simple trees tend to overfit: the error is fitted along with the model (Figure 9).

     Pruning
     • reduce overfitting
       – examine the deviance at each terminal node (leaf)

     Predictions
     • partial plots

     Classification and Regression Trees: R packages
     • simple CART


  9. Simple CART: library(tree)
     An extension that facilitates (some) non-Gaussian errors: library(rpart)

     Classification and Regression Trees: Limitations
     • crude overfitting protection
     • low resolution
     • limited error distributions
     • little scope for random effects

     Boosted Regression Trees: Boosting
     • machine learning meets predictive modelling
     • ensemble models
       – a sequence of simple trees (10,000+ trees)
       – each built to predict the residuals of the previous tree
       – shrinkage
       – produce excellent fit

     Boosted Regression Trees: Overfitting
     • over- vs under-fitting
     • residual error vs precision
     • minimizing squared-error loss
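As a concrete example of the rpart extension mentioned above, the same fit-then-prune workflow can use rpart's built-in cross-validated complexity table. This is an illustrative sketch, not workshop code: the dataset (the built-in iris data) and the object names are placeholders.

```r
library(rpart)

# Fit a classification tree (categorical response), then prune it
# back to the complexity-parameter (cp) value with the lowest
# cross-validated error -- rpart's analogue of prune.tree().
iris.rp <- rpart(Species ~ ., data = iris, method = "class")
printcp(iris.rp)                 # cp table with cross-validated xerror

best.cp  <- iris.rp$cptable[which.min(iris.rp$cptable[, "xerror"]), "CP"]
iris.rp1 <- prune(iris.rp, cp = best.cp)
```

Unlike tree(), rpart performs the cross-validation automatically during fitting (xval = 10 by default), so pruning is a single table lookup.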

 10. Boosted Regression Trees: minimizing squared-error loss
     • test (validation) data
       – 75% train, 25% test
     • out of bag
       – 50% in, 50% out
     • cross-validation
       – 3 folds

     Boosted Regression Trees: Predictions

     Boosted Regression Trees: Variable importance

        var rel.inf
     x1  x1   55.06
     x2  x2   44.94

     Boosted Regression Trees: R^2
     [1] 0.5952
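The 75% train / 25% test idea above can be sketched with base R on toy data (none of these objects come from the workshop): fit the model on the training rows only, then measure squared-error loss on the held-out rows.

```r
# Toy 75/25 train/test split: predictive error is estimated on
# rows the model never saw during fitting.
set.seed(1)
n   <- 100
dat <- data.frame(x = runif(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.2)

train <- sample(n, size = 0.75 * n)               # 75% of rows for training
fit   <- lm(y ~ x, data = dat[train, ])

# squared-error loss on the 25% test set
test_mse <- mean((dat$y[-train] - predict(fit, dat[-train, ]))^2)
test_mse
```

gbm automates exactly this bookkeeping: train.fraction implements the train/test split, bag.fraction the out-of-bag sampling, and cv.folds the cross-validation, which is why all three stopping rules appear in gbm.perf() later.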


 12. Worked Examples

     paruelo <- read.table('../data/paruelo.csv', header=TRUE,
                           sep=',', strip.white=TRUE)
     head(paruelo)

         C3   LAT   LONG MAP  MAT JJAMAP DJFMAP
     1 0.65 46.40 119.55 199 12.4   0.12   0.45
     2 0.65 47.32 114.27 469  7.5   0.24   0.29
     3 0.76 45.78 110.78 536  7.2   0.24   0.20
     4 0.75 43.95 101.87 476  8.2   0.35   0.15
     5 0.33 46.90 102.82 484  4.8   0.40   0.14
     6 0.03 38.87  99.38 623 12.0   0.40   0.11

     library(tree)
     paruelo.tree <- tree(C3 ~ LAT + LONG + MAP + MAT + JJAMAP, data=paruelo)
     plot(residuals(paruelo.tree) ~ predict(paruelo.tree))
     plot(paruelo.tree)
     text(paruelo.tree, cex=0.75)
     plot(prune.tree(paruelo.tree))

     Figures 15-17: output of the plotting commands above (residuals vs fitted values, the tree diagram, and the pruning sequence).

 16. paruelo.tree1 <- prune.tree(paruelo.tree, best=4)
     plot(residuals(paruelo.tree1) ~ predict(paruelo.tree1))
     plot(paruelo.tree1)
     text(paruelo.tree1)
     summary(paruelo.tree1)

     Regression tree:
     snip.tree(tree = paruelo.tree, nodes = c(7L, 5L, 4L, 6L))


 18. Variables actually used in tree construction:
     [1] "LAT" "MAT"
     Number of terminal nodes: 4
     Residual mean deviance: 0.0304 = 2.1 / 69
     Distribution of residuals:
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      -0.4220 -0.1010 -0.0325  0.0000  0.0787  0.4820

     paruelo.tree1$frame

          var  n    dev   yval splits.cutleft splits.cutright
     1    LAT 73 4.9093 0.2714        <42.785         >42.785
     2    MAT 50 1.6364 0.1592          <7.25           >7.25
     4 <leaf> 12 0.4384 0.3425
     5 <leaf> 38 0.6674 0.1013
     3    MAT 23 1.2762 0.5152           <6.9            >6.9
     6 <leaf> 12 0.6912 0.4083
     7 <leaf> 11 0.2984 0.6318

     library(scales)
     #ys <- with(paruelo,
     #           rescale(C3, from=c(min(C3), max(C3)), to=c(0.8, 0)))
     #plot(paruelo$LONG, paruelo$LAT, col=grey(ys),
     #     pch=20, xlab="Longitude", ylab="Latitude")
     #partition.tree(paruelo.tree1, ordvars=c("MAT","LAT"), add=TRUE)
     #partition.tree(paruelo.tree1, add=TRUE)

     # Prediction
     xlat <- seq(min(paruelo$LAT), max(paruelo$LAT), l=100)
     #xlong <- mean(paruelo$LONG)
     #xmat <- mean(paruelo$MAT)
     pred <- predict(paruelo.tree1,
                     newdata=data.frame(LAT=xlat,
                                        LONG=mean(paruelo$LONG),
                                        MAT=mean(paruelo$MAT),
                                        MAP=mean(paruelo$MAP),
                                        JJAMAP=mean(paruelo$JJAMAP)))

 19. par(mfrow=c(1,2))
     plot(C3 ~ LAT, paruelo, type="p", pch=16, cex=0.2)
     points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ LAT,
            paruelo, type="p", pch=16, col="grey")
     lines(pred ~ xlat, col="red", lwd=2)

     xmat <- seq(min(paruelo$MAT), max(paruelo$MAT), l=100)
     #xlat <- mean(paruelo$LAT)
     pred <- predict(paruelo.tree1,
                     newdata=data.frame(LAT=mean(paruelo$LAT),
                                        LONG=mean(paruelo$LONG),
                                        MAT=xmat,
                                        MAP=mean(paruelo$MAP),
                                        JJAMAP=mean(paruelo$JJAMAP)))
     plot(C3 ~ MAT, paruelo, type="p", pch=16, cex=0.2)
     points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ MAT,
            paruelo, type="p", pch=16, col="grey")
     lines(pred ~ xmat, col="red", lwd=2)

     #xlong <- seq(min(paruelo$LONG), max(paruelo$LONG), l=100)
     ##xlat <- mean(paruelo$LAT)
     #pred <- predict(paruelo.tree1,
     #                newdata=data.frame(LAT=mean(paruelo$LAT), LONG=xlong,
     #                                   MAT=mean(paruelo$MAT),
     #                                   MAP=mean(paruelo$MAP),
     #                                   JJAMAP=mean(paruelo$JJAMAP)))
     #plot(C3~LONG, paruelo, type="p", pch=16, cex=0.2)
     #points(I(predict(paruelo.tree1)-resid(paruelo.tree1))~LONG, paruelo,
     #       type="p", pch=16, col="grey")
     #lines(pred~xlong, col="red", lwd=2)

     ## paruelo.tree <- tree(C3 ~ LAT+LONG+MAP+MAT+JJAMAP+DJFMAP, data=paruelo)
     ## plot(paruelo.tree)
     ## text(paruelo.tree, cex=0.75)
     ## xlat <- seq(min(paruelo$LAT), max(paruelo$LAT), l=100)
     ## xlong <- mean(paruelo$LONG)
     ## xmat <- mean(paruelo$MAT)
     ## xmap <- mean(paruelo$MAP)
     ## xjjamap <- mean(paruelo$JJAMAP)
     ## xdjfmap <- mean(paruelo$DJFMAP)
     ## pp <- predict(paruelo.tree1, newdata=data.frame(LAT=xlat, LONG=xlong, MAT=xmat, MAP=xmap,
     ## par(mfrow=c(1,2))
     ## plot(C3~LAT, paruelo, type="p", pch=16, cex=0.2)
     ## points(I(predict(paruelo.tree1)-resid(paruelo.tree1))~LAT, paruelo,
     ##        type="p", pch=16, col="grey")
     ## lines(pp~xlat, col="red", lwd=2)


 21. ## xlat <- mean(paruelo$LAT)
     ## xlong <- mean(paruelo$LONG)
     ## xmat <- seq(min(paruelo$MAT), max(paruelo$MAT), l=100)
     ## xmap <- mean(paruelo$MAP)
     ## xjjamap <- mean(paruelo$JJAMAP)
     ## xdjfmap <- mean(paruelo$DJFMAP)
     ## pp <- predict(paruelo.tree1, newdata=data.frame(LAT=xlat, LONG=xlong, MAT=xmat, MAP=xmap,
     ## par(mfrow=c(1,2))
     ## plot(C3~MAT, paruelo, type="p", pch=16, cex=0.2)
     ## points(I(predict(paruelo.tree1)-resid(paruelo.tree1))~MAT, paruelo,
     ##        type="p", pch=16, col="grey")
     ## lines(pp~xmat, col="red", lwd=2)

     ## Now GBM
     library(gbm)
     paruelo.gbm <- gbm(C3 ~ LAT+LONG+MAP+MAT+JJAMAP+DJFMAP,
                        data=paruelo,
                        distribution="gaussian",
                        n.trees=10000,
                        interaction.depth=3,  # 1: additive model, 2: two-way interactions, etc.
                        cv.folds=3,
                        train.fraction=0.75,
                        bag.fraction=0.5,
                        shrinkage=0.001,
                        n.minobsinnode=2)

     ## Compare methods of determining the optimal number of iterations
     (best.iter <- gbm.perf(paruelo.gbm, method="test"))
     [1] 1533
     (best.iter <- gbm.perf(paruelo.gbm, method="OOB"))
     [1] 1197
     (best.iter <- gbm.perf(paruelo.gbm, method="OOB", oobag.curve=TRUE,
                            overlay=TRUE, plot.it=TRUE))
     [1] 1197
     (best.iter <- gbm.perf(paruelo.gbm, method="cv"))
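Assuming the paruelo.gbm object and best.iter from the chunk above, the relative-influence table and the pseudo-R^2 reported on the earlier slide come from gbm's summary() and predict() methods. This sketch of those calls is not part of the original chunk.

```r
# Relative influence of each predictor -- the source of the
# "var rel.inf" table shown earlier -- evaluated at the chosen iteration.
summary(paruelo.gbm, n.trees = best.iter)

# Partial-dependence plot for one predictor, averaging over the others.
plot(paruelo.gbm, i.var = "LAT", n.trees = best.iter)

# A pseudo-R^2 like the 0.5952 reported earlier can be computed from
# the fitted values at best.iter.
pred <- predict(paruelo.gbm, newdata = paruelo, n.trees = best.iter)
1 - sum((paruelo$C3 - pred)^2) / sum((paruelo$C3 - mean(paruelo$C3))^2)
```

Note that predictions from a gbm must always specify n.trees; using all 10,000 trees instead of best.iter would reintroduce the overfitting the stopping rules were designed to avoid.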


 24. [1] 1844

     par(mfrow=c(1,2))
     best.iter <- gbm.perf(paruelo.gbm, method="cv",
                           oobag.curve=TRUE, overlay=TRUE, plot.it=TRUE)
     best.iter
     [1] 1844

