Introduction to Random Forest

Introduction to Random Forest TREE-BASED MODELS IN R - PowerPoint PPT Presentation


  1. Introduction to Random Forest TREE-BASED MODELS IN R Erin LeDell Instructor

  2. Random Forest: better performance; samples a subset of the features at each split; an improved version of bagging with reduced correlation between the sampled trees TREE-BASED MODELS IN R

  3. Random Forest in R library(randomForest) ?randomForest TREE-BASED MODELS IN R

  4. randomForest Example library(randomForest) # Train a default RF model (500 trees) model <- randomForest(formula = response ~ ., data = train) TREE-BASED MODELS IN R

  5. Let's practice! TREE-BASED MODELS IN R

  6. Understanding the Random Forest model output TREE-BASED MODELS IN R Erin LeDell Instructor

  7. Random Forest output # Print the credit_model output print(credit_model) Call: randomForest(formula = default ~ ., data = credit_train) Type of random forest: classification Number of trees: 500 No. of variables tried at each split: 4 OOB estimate of error rate: 24.12% Confusion matrix: no yes class.error no 516 46 0.08185053 yes 147 91 0.61764706 TREE-BASED MODELS IN R
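To see where the numbers in the printed output come from, here is a small self-contained sketch that reproduces the class.error column and the overall OOB error rate from the confusion-matrix counts on the slide (the counts 516, 46, 147, 91 are copied from the output above; everything else is illustrative):

```r
# Confusion-matrix counts copied from the slide's print(credit_model) output
conf <- matrix(c(516,  46,
                 147,  91),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("no", "yes"), c("no", "yes")))

# Per-class error = misclassified in that row / row total
class_error <- 1 - diag(conf) / rowSums(conf)
print(class_error)

# Overall OOB error = all misclassified / all observations
oob <- (conf["no", "yes"] + conf["yes", "no"]) / sum(conf)
print(oob)
```

Note that (46 + 147) / 800 = 0.24125, which matches the "OOB estimate of error rate: 24.12%" line in the printed model summary.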

  8. Out-of-bag error matrix # Grab OOB error matrix & take a look err <- credit_model$err.rate head(err) OOB no yes [1,] 0.3414634 0.2657005 0.5375000 [2,] 0.3311966 0.2462908 0.5496183 [3,] 0.3232831 0.2476636 0.5147929 [4,] 0.3164933 0.2180294 0.5561224 [5,] 0.3197756 0.2095808 0.5801887 [6,] 0.3176944 0.2115385 0.5619469 TREE-BASED MODELS IN R

  9. Out-of-bag error estimate # Look at final OOB error rate oob_err <- err[nrow(err), "OOB"] print(oob_err) 0.24125 print(credit_model) Call: randomForest(formula = default ~ ., data = credit_train) Type of random forest: classification Number of trees: 500 No. of variables tried at each split: 4 OOB estimate of error rate: 24.12% TREE-BASED MODELS IN R
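The indexing on this slide works because err.rate has one row per tree and a named "OOB" column, so the last row holds the error after all trees. A runnable sketch using a small stand-in matrix (only the final OOB value of 0.24125 is taken from the slide; the other numbers are illustrative):

```r
# Stand-in for credit_model$err.rate: one row per tree,
# columns OOB (overall) plus per-class error rates
err <- matrix(c(0.3414634, 0.2657005, 0.5375000,
                0.3311966, 0.2462908, 0.5496183,
                0.2412500, 0.2100000, 0.5600000),  # illustrative final row
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("OOB", "no", "yes")))

# Final OOB error = overall error after the last tree was added
oob_err <- err[nrow(err), "OOB"]
print(oob_err)  # 0.24125
```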

  10. Plot the OOB error rates TREE-BASED MODELS IN R
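The plot on this slide is not reproduced in the transcript; a common way to draw it is one line per column of err.rate (OOB plus each class). A self-contained sketch with illustrative values:

```r
# Illustrative stand-in for credit_model$err.rate
err <- matrix(c(0.34, 0.27, 0.54,
                0.30, 0.24, 0.55,
                0.26, 0.22, 0.56,
                0.24, 0.21, 0.56),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("OOB", "no", "yes")))

# One line per error-rate column, indexed by number of trees
matplot(err, type = "l", lty = 1, col = 1:3,
        xlab = "ntree", ylab = "Error rate")
legend("topright", legend = colnames(err), lty = 1, col = 1:3)
```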

  11. Let's practice! TREE-BASED MODELS IN R

  12. OOB error vs. test set error TREE-BASED MODELS IN R Erin LeDell Instructor

  13. Advantages & Disadvantages of OOB estimates Advantages: can evaluate your model without a separate test set; computed automatically by the randomForest() function. Disadvantages: OOB error only estimates error (not AUC, log-loss, etc.); can't compare Random Forest performance to other types of models TREE-BASED MODELS IN R

  14. Let's practice! TREE-BASED MODELS IN R

  15. Tuning a Random Forest model TREE-BASED MODELS IN R Erin LeDell Instructor

  16. Random Forest Hyperparameters ntree: number of trees mtry: number of variables randomly sampled as candidates at each split sampsize: number of samples to train on nodesize: minimum size (number of samples) of the terminal nodes maxnodes: maximum number of terminal nodes TREE-BASED MODELS IN R
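These are the argument names accepted by randomForest() itself (see ?randomForest). A sketch of passing each one explicitly, using synthetic data so it is self-contained; the call is skipped when the package is not installed, and the specific values shown are illustrative, not recommendations:

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  # Synthetic stand-in for the training data used in the slides
  train <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                      response = factor(sample(c("no", "yes"), 100,
                                               replace = TRUE)))
  model <- randomForest::randomForest(
    response ~ ., data = train,
    ntree    = 500,          # number of trees
    mtry     = 2,            # variables sampled as candidates at each split
    sampsize = nrow(train),  # samples drawn to grow each tree
    nodesize = 1,            # minimum terminal-node size (classification default)
    maxnodes = NULL          # NULL = no cap on the number of terminal nodes
  )
}
```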

  17. Tuning mtry with tuneRF() # Execute the tuning process set.seed(1) res <- tuneRF(x = train_predictor_df, y = train_response_vector, ntreeTry = 500) # Look at results print(res) mtry OOBError 2.OOB 2 0.2475 4.OOB 4 0.2475 8.OOB 8 0.2425 TREE-BASED MODELS IN R
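tuneRF() returns a matrix of mtry values and their OOB errors; the best candidate is the row with the lowest OOBError. A sketch of reading that result, with res copied from the table on the slide:

```r
# Result table copied from the slide's tuneRF() output
res <- matrix(c(2, 0.2475,
                4, 0.2475,
                8, 0.2425),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("2.OOB", "4.OOB", "8.OOB"),
                              c("mtry", "OOBError")))

# Pick the mtry with the lowest OOB error
best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
print(best_mtry)  # 8
```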

  18. Let's practice! TREE-BASED MODELS IN R
