Photo by Unsplash user @nathananderson. BBM406 Fundamentals of Machine Learning. Lecture 19: What is Ensemble Learning? Bagging, Random Forests. Aykut Erdem // Hacettepe University // Fall 2019
Last time… Decision Trees slide by David Sontag 2
Last time… Information Gain • Decrease in entropy (uncertainty) after splitting. In our running example, with rows (X1, X2, Y) = (T,T,T), (T,F,T), (T,T,T), (T,F,T), (F,T,T), (F,F,F): IG(X1) = H(Y) − H(Y|X1) = 0.65 − 0.33. Since IG(X1) > 0, we prefer the split! slide by David Sontag 3
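As a quick check of those numbers, here is a minimal Python sketch (the helper names entropy and information_gain are mine, not from the slides) that reproduces IG(X1) on the running example:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    """IG(X) = H(Y) - H(Y|X): entropy minus the weighted entropy after splitting on X."""
    n = len(ys)
    h_cond = sum(
        (sum(1 for x in xs if x == v) / n)
        * entropy([y for x, y in zip(xs, ys) if x == v])
        for v in set(xs)
    )
    return entropy(ys) - h_cond

# The running example from the slide, encoded as 1 = T, 0 = F.
X1 = [1, 1, 1, 1, 0, 0]
X2 = [1, 0, 1, 0, 1, 0]
Y  = [1, 1, 1, 1, 1, 0]
print(information_gain(X1, Y))  # ≈ 0.65 - 0.33 ≈ 0.32, so splitting on X1 helps
print(information_gain(X2, Y))  # smaller gain, so X1 is the preferred split
```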
Last time… Continuous features • Binary tree, split on attribute X - One branch: X < t - Other branch: X ≥ t • Search through possible values of t - Seems hard!!! • But only a finite number of t's are important: • Sort data according to X into {x1, ..., xm} • Consider split points of the form xi + (xi+1 − xi)/2 • Moreover, only splits between examples from different classes matter! (figure: feature axis Xj with class regions c1, c2 and candidate thresholds t1, t2) slide by David Sontag 4
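A minimal sketch of that split-point search (the function name candidate_splits is mine; it assumes NumPy arrays of feature values and class labels):

```python
import numpy as np

def candidate_splits(x, y):
    """Thresholds of the form x_i + (x_{i+1} - x_i)/2, restricted to
    midpoints between consecutive (sorted) examples with different labels."""
    order = np.argsort(x)
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    return [
        xs[i] + (xs[i + 1] - xs[i]) / 2
        for i in range(len(xs) - 1)
        if ys[i] != ys[i + 1] and xs[i] != xs[i + 1]
    ]

# Example: only boundaries where the class changes are kept.
print(candidate_splits([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))  # -> [2.5]
```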
Last time… Decision trees will overfit • Standard decision trees have no learning bias - Training set error is always zero! (if there is no label noise) - Lots of variance - Must introduce some bias towards simpler trees • Many strategies for picking simpler trees - Fixed depth - Fixed number of leaves - Random forests slide by David Sontag 5
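As a concrete illustration of the first two strategies, a minimal scikit-learn sketch (max_depth and max_leaf_nodes are scikit-learn parameter names, assumed here rather than taken from the slides):

```python
from sklearn.tree import DecisionTreeClassifier

# Two ways to bias toward simpler trees:
shallow_tree = DecisionTreeClassifier(max_depth=3)       # fixed depth
small_tree = DecisionTreeClassifier(max_leaf_nodes=8)    # fixed number of leaves
# Both stop growing before training error hits zero, trading a bit of
# bias for a (usually much larger) reduction in variance.
```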
Today • Ensemble Methods - Bagging - Random Forests 6
Ensemble Methods • High-level idea – Generate multiple hypotheses – Combine them into a single classifier • Two important questions – How do we generate multiple hypotheses? • we have only one training sample – How do we combine the multiple hypotheses? • Majority vote, AdaBoost, ... slide by Yishay Mansour 7
Bias/Variance Tradeoff slide by David Sontag [Hastie, Tibshirani, Friedman, “Elements of Statistical Learning”, 2001] 8
Bias/Variance Tradeoff slide by David Sontag Graphical illustration of bias and variance. http://scott.fortmann-roe.com/docs/BiasVariance.html 9
Fighting the bias-variance tradeoff • Simple (a.k.a. weak) learners are good - e.g., naïve Bayes, logistic regression, decision stumps (or shallow decision trees) - Low variance, don’t usually overfit • Simple (a.k.a. weak) learners are bad – High bias, can’t solve hard learning problems slide by Aarti Singh 10
Reduce Variance Without Increasing Bias • Averaging reduces variance (when predictions are independent; see the formula below) • Average models to reduce model variance • One problem: - Only one training set - Where do multiple models come from? slide by David Sontag 11
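The formula dropped from this slide is the standard variance-of-an-average identity; a reconstruction, assuming N independent predictions $Z_1, \dots, Z_N$ with common variance $\sigma^2$:

$$\mathrm{Var}\!\left[\frac{1}{N}\sum_{i=1}^{N} Z_i\right] \;=\; \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}[Z_i] \;=\; \frac{\sigma^2}{N}.$$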
Bagging (Bootstrap Aggregating) • Leo Breiman (1994) • Take repeated bootstrap samples from training set D. • Bootstrap sampling: Given set D containing N training examples, create D’ by drawing N examples at random with replacement from D. • Bagging: - Create k bootstrap samples D1 ... Dk. - Train a distinct classifier on each Di. - Classify new instances by majority vote / average. slide by David Sontag 12
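A minimal sketch of that procedure in Python (assuming scikit-learn-style base learners with fit/predict; the helper names are mine):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, make_learner, k):
    """Train k classifiers, each on a bootstrap sample of (X, y)."""
    n = len(y)
    models = []
    for _ in range(k):
        idx = np.random.choice(n, size=n, replace=True)  # D_i: N draws with replacement
        models.append(make_learner().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Classify a new instance by majority vote over the ensemble."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]

# Usage (hypothetical data): models = bagging_fit(X_train, y_train, DecisionTreeClassifier, k=100)
#                            y_hat  = bagging_predict(models, x_new)
```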
Bagging • Best case: variance drops by a factor of N (see the formulas below) • In practice: - models are correlated, so the reduction is smaller than 1/N - variance of models trained on fewer unique training cases is usually somewhat larger slide by David Sontag 13
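The two formulas missing from this slide can be reconstructed from the standard result for averaging N identically distributed predictions with variance $\sigma^2$ and pairwise correlation $\rho$ (as in Hastie et al.):

$$\mathrm{Var}\!\left[\frac{1}{N}\sum_{i=1}^{N} Z_i\right] \;=\; \rho\,\sigma^2 \;+\; \frac{1-\rho}{N}\,\sigma^2,$$

so the best case $\rho = 0$ recovers the $\sigma^2/N$ reduction, while correlated bootstrap models ($\rho > 0$) leave an irreducible $\rho\,\sigma^2$ term.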
Bagging Example slide by David Sontag 14
CART* decision boundary slide by David Sontag * A decision tree learning algorithm; very similar to ID3 15
100 bagged trees • Shades of blue/red indicate strength of vote for a particular classification slide by David Sontag 16
Random Forests 17
Random Forests • Ensemble method specifically designed for decision tree classifiers • Introduce two sources of randomness: “Bagging” and “Random input vectors” - Bagging method: each tree is grown using a bootstrap sample of training data - Random vector method: at each node, the best split is chosen from a random sample of m attributes instead of all attributes slide by David Sontag 18
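Both sources of randomness map directly onto scikit-learn parameters; a minimal usage sketch (the parameter values are illustrative, not from the slides):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    bootstrap=True,        # each tree is trained on a bootstrap sample ("bagging")
    max_features="sqrt",   # each split searches a random subset of m ≈ sqrt(d) features
)
# rf.fit(X_train, y_train); y_hat = rf.predict(X_test)   # hypothetical data
```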
Classification tree (figure: training data in feature space, with query points marked “?”) slide by Nando de Freitas [Criminisi et al., 2011] 19
Use information gain to decide splits (figure panels: Before split, Split 1, Split 2) slide by Nando de Freitas [Criminisi et al., 2011] 20
Advanced: Gaussian information gain to decide splits (figure panels: Before split, Split 1, Split 2) slide by Nando de Freitas [Criminisi et al., 2011] 21
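For continuous targets the same recipe applies with differential entropy in place of Shannon entropy. Assuming a Gaussian is fitted at each node (as in Criminisi et al.), the entropy of a node with sample set $S$ and fitted covariance $\Sigma(S)$ is

$$H(S) = \tfrac{1}{2}\log\!\big((2\pi e)^d\,\lvert\Sigma(S)\rvert\big),$$

and the gain of a split into children $S_L, S_R$ is $I = H(S) - \sum_{i \in \{L,R\}} \frac{|S_i|}{|S|}\, H(S_i)$.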
(figure: split nodes, each applying a weak learner at train and test time; leaf nodes storing a probabilistic leaf model) slide by Nando de Freitas [Criminisi et al., 2011] 22
Alternative node decisions: examples of weak learners include axis-aligned splits, oriented lines, and conic sections slide by Nando de Freitas 23
Building a random tree slide by Nando de Freitas 24
Random Forests algorithm [from the book of Hastie, Tibshirani, and Friedman] slide by Nando de Freitas 25
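The piece of that algorithm not covered by the bagging sketch above is the per-node feature subsampling; a rough illustration, reusing the candidate_splits and information_gain helpers sketched earlier (all names are mine, not from the book):

```python
import numpy as np

def choose_split(X_node, y_node, m):
    """At each node, draw m feature indices at random (without replacement)
    and search for the best threshold only among those features."""
    d = X_node.shape[1]
    features = np.random.choice(d, size=m, replace=False)
    best_feature, best_threshold, best_gain = None, None, -np.inf
    for j in features:
        for t in candidate_splits(X_node[:, j], y_node):
            gain = information_gain(X_node[:, j] > t, y_node)  # binary split X_j >= t vs. X_j < t
            if gain > best_gain:
                best_feature, best_threshold, best_gain = j, t, gain
    return best_feature, best_threshold
```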
Randomization slide by Nando de Freitas 26
Building a forest (ensemble) (figure: component trees t = 1, 2, 3, ...) slide by Nando de Freitas 27
Effect of forest size slide by Nando de Freitas 28
Effect of forest size slide by Nando de Freitas 29
Effect of more classes and noise slide by Nando de Freitas [Criminisi et al., 2011] 30
Effect of more classes and noise slide by Nando de Freitas 31
Effect of tree depth (D). Training points: 4-class mixed. D=3 (underfitting), D=6, D=15 (overfitting) slide by Nando de Freitas 32
Effect of bagging (no bagging => max-margin behavior) slide by Nando de Freitas 33
Random Forests and the Kinect slide by Nando de Freitas 34
Random Forests and the Kinect (pipeline: depth image → body parts → 3D joint proposals) adapted from Nando de Freitas [Jamie Shotton et al., 2011] 35
Random Forests and the Kinect • Use computer graphics to generate plenty of training data: synthetic (train & test) and real (test) adapted from Nando de Freitas [Jamie Shotton et al., 2011] 36
Reduce Bias² and Decrease Variance? • Bagging reduces variance by averaging • Bagging has little effect on bias • Can we average and reduce bias? • Yes: Boosting slide by David Sontag 37
Next Lecture: Boosting 38