Random Forests
COMPSCI 371D — Machine Learning
Outline
1 Motivation
2 Bagging
3 Randomizing Split Dimension
4 Training
5 Inference
6 Out-of-Bag Statistical Risk Estimate
Motivation
From Trees to Forests
• Trees are flexible → good expressiveness
• Trees are flexible → poor generalization
• Pruning is an option, but messy and heuristic
• Random Decision Forests let several trees vote
• Use the bootstrap to give different trees different views of the data
• Randomize split rules to make trees even more independent
Bagging
Random Forests
• M trees instead of one
• Train trees to completion (perfectly pure leaves) or to near completion (few samples per leaf)
• Give tree m a training bag B_m
  • Training samples drawn independently at random with replacement out of T
  • |B_m| = |T|
  • About 63% of the samples from T are in B_m (see the sampling sketch after this slide)
• Make trees more independent by randomizing the split dimension:
  • Original trees: for $j = 1, \ldots, d$: for $t = t_j^{(1)}, \ldots, t_j^{(u_j)}$
  • Forest trees: $j$ = random out of $1, \ldots, d$; for $t = t_j^{(1)}, \ldots, t_j^{(u_j)}$
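As a quick illustration of the bagging step, here is a minimal NumPy sketch (not from the slides) that draws one bag of |T| indices uniformly at random with replacement and checks that roughly 63% of the distinct training samples land in it. The function name draw_bag and the sample count N are made up for the example.

```python
import numpy as np

def draw_bag(num_samples, rng):
    # Draw |T| indices uniformly at random with replacement (one bag B_m).
    return rng.integers(0, num_samples, size=num_samples)

rng = np.random.default_rng(0)
N = 10_000                                   # hypothetical |T|
bag = draw_bag(N, rng)
in_bag_fraction = len(np.unique(bag)) / N    # distinct samples that made it into the bag
print(f"fraction of T appearing in the bag: {in_bag_fraction:.3f}")  # about 0.63
```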
Randomizing Split Dimension
$j$ = random out of $1, \ldots, d$; for $t = t_j^{(1)}, \ldots, t_j^{(u_j)}$
• Still search for the optimal threshold (see the split sketch after this slide)
• Give up optimality for independence
• Dimensions are revisited anyway in a tree
• Tree may get deeper, but still achieves zero training loss
• Independent splits and different data views lead to good generalization when voting
• Bonus: training a single tree is now d times faster
• Can be easily parallelized
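The sketch below illustrates one forest-style split under the rule above: the split dimension j is chosen uniformly at random, and the threshold t is still chosen optimally on that dimension (here by minimizing weighted Gini impurity for classification). The helper names gini and random_dimension_split are hypothetical, and the slides do not prescribe a particular impurity measure.

```python
import numpy as np

def gini(y):
    # Gini impurity of a label vector.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def random_dimension_split(X, y, rng):
    # Forest-style split: pick the dimension at random,
    # then still search for the best threshold on that dimension.
    n, d = X.shape
    j = rng.integers(d)                           # random split dimension
    values = np.unique(X[:, j])
    candidates = (values[:-1] + values[1:]) / 2   # midpoints between sorted distinct values
    best_t, best_score = None, np.inf
    for t in candidates:
        left, right = y[X[:, j] <= t], y[X[:, j] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return j, best_t                              # best_t is None if the dimension is constant
```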
Training

function φ ← trainForest(T, M)    ⊲ M is the desired number of trees
    φ ← ∅    ⊲ The initial forest has no trees
    for m = 1, ..., M do
        S ← |T| samples unif. at random out of T with replacement
        φ ← φ ∪ {trainTree(S, 0)}    ⊲ Slightly modified trainTree
    end for
end function
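For concreteness, a minimal Python rendering of trainForest, assuming a train_tree callable that plays the role of the slightly modified trainTree from the slides (i.e., it randomizes the split dimension internally). The array-based interface and the NumPy random generator are assumptions of this sketch.

```python
import numpy as np

def train_forest(X, y, M, train_tree, rng):
    # Train M trees, each on its own bootstrap bag of the training set T.
    forest = []
    N = len(y)
    for _ in range(M):
        bag = rng.integers(0, N, size=N)          # |B_m| = |T|, drawn with replacement
        forest.append(train_tree(X[bag], y[bag]))  # placeholder for the modified trainTree
    return forest
```

Typical usage would be something like train_forest(X, y, M=100, train_tree=my_tree_trainer, rng=np.random.default_rng(0)), with my_tree_trainer supplied by the caller.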
Inference

function y ← forestPredict(x, φ, summary)
    V ← {}    ⊲ A set of values, one per tree, initially empty
    for τ ∈ φ do
        y ← predict(x, τ, summary)    ⊲ The predict function for trees
        V ← V ∪ {y}
    end for
    return summary(V)
end function
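A corresponding Python sketch of forestPredict, with tree_predict standing in for the per-tree predict function on the slide; that name, like the two example summaries, is an assumption. The votes are kept in a list rather than a set so that repeated predictions keep their multiplicity when the majority summary counts them.

```python
from collections import Counter
import numpy as np

def forest_predict(x, forest, summary, tree_predict):
    # One prediction per tree, then a summary of the votes.
    votes = [tree_predict(x, tree) for tree in forest]
    return summary(votes)

def majority(votes):
    # Classification summary: most common vote.
    return Counter(votes).most_common(1)[0][0]

def mean(votes):
    # Regression summary (the median works as well).
    return float(np.mean(votes))
```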
Out-of-Bag Statistical Risk Estimate
• Random forests have "built-in" test splits
• Tree m: B_m for training, V_m = T \ B_m for testing
• h_oob is a predictor that works only for (x_n, y_n) ∈ T:
  • Let tree m vote for y only if x_n ∉ B_m
  • h_oob(x_n) is the summary of the votes over participating trees
  • Summary: majority (classification); mean, median (regression)
• Out-of-bag risk estimate:
  • T′ = {t ∈ T | ∃ m such that t ∉ B_m} (samples that were left out of some bag)
  • Statistical risk estimate: empirical risk over T′ (a computation sketch follows this slide):
    $e_{\mathrm{oob}}(h, T') = \frac{1}{|T'|} \sum_{(x, y) \in T'} \ell(y, h_{\mathrm{oob}}(x))$
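The sketch below computes the out-of-bag estimate defined above, assuming that bags[m] holds the index array used to train forest[m], and that tree_predict, summary, and loss are supplied by the caller; all of these names are placeholders, not part of the slides.

```python
import numpy as np

def oob_risk(X, y, forest, bags, tree_predict, summary, loss):
    # forest[m] was trained on X[bags[m]], y[bags[m]] (indices drawn with replacement).
    # A tree may vote for sample n only if n is out of its bag.
    N, M = len(y), len(forest)
    in_bag = np.zeros((M, N), dtype=bool)
    for m, bag in enumerate(bags):
        in_bag[m, bag] = True
    total, count = 0.0, 0
    for n in range(N):
        votes = [tree_predict(X[n], forest[m]) for m in range(M) if not in_bag[m, n]]
        if votes:                              # n is in T' iff at least one tree left it out
            total += loss(y[n], summary(votes))
            count += 1
    return total / count                       # empirical risk e_oob over T'
```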
Out-of-Bag Statistical Risk Estimate
T′ ≈ T
• e_oob(h, T′) can be shown to be an unbiased estimate of the statistical risk
• No separate test set needed if T′ is large enough
• How big is T′? (a numerical check follows this slide)
  • $|T'|$ has a binomial distribution over the $N$ training points with $p = 1 - (1 - 0.37)^M \approx 1$ as soon as $M > 20$
  • Mean $\mu \approx pN$, variance $\sigma^2 \approx p(1-p)N$
  • $\sigma/\mu \approx \sqrt{(1-p)/(pN)} \to 0$ quite rapidly with growing M and N
• For reasonably large N, the size of T′ is very predictably about N: practically all samples in T are also in T′
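A quick numerical check of this claim (a sketch, using the rounded per-bag out-of-bag probability 0.37 from the slides and a hypothetical training-set size N): p approaches 1 and σ/µ shrinks rapidly as M and N grow.

```python
import numpy as np

N = 10_000                                       # assumed training-set size
for M in (5, 10, 20, 50):
    p = 1 - (1 - 0.37) ** M                      # P(a sample is out of at least one bag)
    sigma_over_mu = np.sqrt((1 - p) / (p * N))   # for |T'| ~ Binomial(N, p)
    print(f"M = {M:2d}: p = {p:.6f}, sigma/mu = {sigma_over_mu:.1e}")
```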
Summary of Random Forests
• Random views of the training data by bagging
• Independent decisions by randomizing split dimensions
• Ensemble voting leads to good generalization
• Number M of trees tuned by cross-validation
• OOB estimate can replace final testing
  • (In practice, that won't fly for papers)
• More efficient to train than a single tree if M < d
  • Still rather efficient otherwise, and parallelizable
• Conceptually simple, easy to adapt to different problems
  • Lots of freedom about the split rule
  • Example: hybrid regression/classification problems