Ensemble Learning 4/10/17
Ensemble Learning

Hypothesis Space:
• Supervised learning (data has labels)
• Classification (labels are discrete)
• Also regression, but the algorithms differ.
• The type of mapping that can be learned depends on the base classifiers.

Key idea: Train lots of classifiers and have them vote.

Base-classifier requirements:
• Must be better than random guessing.
• Must be (relatively) uncorrelated.
A first try at ensemble learning…

We’ve learned lots of methods for classification:
• Neural networks
• Decision trees
• Naïve Bayes
• K-nearest neighbors
• Support vector machines

We could train one of each and let them vote.

Problems:
• We’d like to vote over more models.
• Some of these are quite slow to train.
• Errors may be correlated.
A better approach…

Train lots of variations on the same model, and pick a simple one, like decision trees.

Note: we’ll use decision trees in all our examples, and they’re the most popular, but the same ideas apply with other base-learners.

Problem: re-running the decision tree algorithm on the same data set will give the same classifier.

Solutions:
1. Change the data set. (Bagging)
2. Change the learning algorithm. (Boosting)
Bagging (Bootstrap Aggregating)

Key idea: change the data set by sampling with replacement.

[Diagram: a data set of size N is turned into Resample #1, Resample #2, …, each consisting of N samples drawn with replacement.]

• Train a strong classifier on each sample.
  • For example: a deep decision tree.
• Voting reduces over-fitting.
  • Different trees will over-fit in different ways.
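
A minimal sketch of this procedure in Python, assuming NumPy arrays, integer class labels, and scikit-learn decision trees as the strong base classifier; the names bagging_fit and bagging_predict are our own illustration, not from the slides:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, num_classifiers=25, seed=0):
        """Train one deep (unrestricted) tree per bootstrap resample."""
        rng = np.random.default_rng(seed)
        n = len(X)
        trees = []
        for _ in range(num_classifiers):
            idx = rng.integers(0, n, size=n)   # N samples drawn with replacement
            trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return trees

    def bagging_predict(trees, X):
        """Plurality vote over the trees (assumes non-negative integer labels)."""
        votes = np.stack([t.predict(X) for t in trees])   # (num_trees, num_points)
        return np.array([np.bincount(col).argmax() for col in votes.T])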
Boosting

Key idea: change the algorithm by restricting its complexity and/or randomizing.

• Train lots of weak classifiers.
  • For example: shallow decision trees (stumps).
• Randomize some part of the algorithm.
  • For example: the sequence of features to split on.
• Voting increases accuracy.
  • Different stumps will make different errors.
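
For a concrete picture of a weak learner, here is a hedged sketch of a one-split decision stump on a single randomly chosen feature; the names train_stump/stump_predict and the -1/+1 label convention are assumptions for illustration, not part of the slides:

    import numpy as np

    def train_stump(X, y, rng):
        """Pick one random feature, then the threshold and sign on that feature
        with the lowest training error (labels assumed to be -1/+1)."""
        feature = rng.integers(X.shape[1])
        best = None
        for threshold in np.unique(X[:, feature]):
            for sign in (1, -1):   # which side of the split gets label +1
                pred = np.where(sign * (X[:, feature] - threshold) > 0, 1, -1)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
        return best[1:]            # (feature, threshold, sign)

    def stump_predict(stump, X):
        feature, threshold, sign = stump
        return np.where(sign * (X[:, feature] - threshold) > 0, 1, -1)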
What is this accomplishing?

Simple models often have high bias.
• They can’t fit the data precisely.
• They may under-fit the data.

Complex models often have high variance.
• Small perturbations in the data can drastically change the model.
• They may over-fit the data.

Boosting and bagging are trying to find a sweet spot in the bias/variance tradeoff.
Ensembles and Bias/Variance

Bagging fits complex models to resamples of the data set.
• Each model will be over-fit to its sample.
• The models will have high variance.
• Taking lots of samples and voting reduces the overall variance.

Boosting fits simple models to the whole data set.
• Each model will be under-fit to the data set.
• The models will have high bias.
• As long as the biases are uncorrelated, voting reduces the overall bias.
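
A toy simulation (ours, not from the slides) of why voting over high-variance models helps: if M models make uncorrelated errors with variance sigma^2, their average has variance roughly sigma^2 / M.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, M, trials = 1.0, 25, 100_000

    single   = rng.normal(0.0, sigma, size=trials)                   # one high-variance model
    ensemble = rng.normal(0.0, sigma, size=(trials, M)).mean(axis=1) # average of M uncorrelated models

    print(single.var())     # ~ sigma**2
    print(ensemble.var())   # ~ sigma**2 / M  -- averaging shrinks the variance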
Ada-Boost Algorithm

Training:
  assign equal weight to all data points
  repeat num_classifiers times:
    train a classifier on the weighted data set
    assign a weight to the new classifier to minimize (weighted) error
    compute weighted error of the ensemble
    increase weight of misclassified points
    decrease weight of correctly classified points

Prediction:
  for each classifier in the ensemble:
    predict(classifier, test_point)
  return plurality label according to weighted vote
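
A hedged Python sketch of the loops above, filling in the details the slide leaves abstract with the usual AdaBoost weight formulas; scikit-learn stumps serve as the weighted base classifier, the names adaboost_fit/adaboost_predict are ours, and labels are assumed to be -1/+1:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, num_classifiers=50):
        n = len(X)
        w = np.full(n, 1.0 / n)                   # equal weight to all data points
        classifiers, alphas = [], []
        for _ in range(num_classifiers):
            clf = DecisionTreeClassifier(max_depth=1)
            clf.fit(X, y, sample_weight=w)        # train on the weighted data set
            pred = clf.predict(X)
            err = np.sum(w * (pred != y))         # weighted error of the new classifier
            alpha = 0.5 * np.log((1.0 - err) / (err + 1e-10))   # classifier weight
            w *= np.exp(-alpha * y * pred)        # raise weight of mistakes, lower correct ones
            w /= w.sum()
            classifiers.append(clf)
            alphas.append(alpha)
        return classifiers, alphas

    def adaboost_predict(classifiers, alphas, X):
        scores = sum(a * clf.predict(X) for clf, a in zip(classifiers, alphas))
        return np.sign(scores)                    # weighted vote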
Random Forest Algorithm

(Note: different from the reading.)

Training:
  repeat num_classifiers times:
    resample = bootstrap(data set)
    for max_depth iterations:
      choose a random feature
      choose the best split on that feature
    add tree to ensemble

Prediction:
  for each tree in the ensemble:
    predict(tree, test_point)
  return plurality vote over the predictions
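
For comparison, scikit-learn's RandomForestClassifier can be configured to behave roughly like the algorithm above (bootstrap resamples, one randomly chosen feature considered per split); the toy data set and parameter values here are our own, not from the slides:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=100,   # num_classifiers
        bootstrap=True,     # each tree gets its own resample of the data
        max_features=1,     # each split considers one randomly chosen feature
        max_depth=8,        # cap the number of split levels
        random_state=0,
    )
    forest.fit(X, y)
    print(forest.predict(X[:5]))   # plurality vote over the 100 trees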
Discussion: Extending to Regression

How can we extend ensemble learning to regression?

1. Suppose we had a base-learner like linear regression. How can we do boosting or bagging?
2. Suppose we used decision trees as in our earlier examples. How can we extend decision trees to do regression?

Hint: think about how we extended K-nearest neighbors to do a type of regression.
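
One possible answer sketch (ours, not the slides'): bag regression trees and average their outputs, just as KNN regression averages the neighbors' values instead of taking a vote.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def bagged_regression_fit(X, y, num_models=25, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(num_models):
            idx = rng.integers(0, n, size=n)   # bootstrap resample
            models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
        return models

    def bagged_regression_predict(models, X):
        # average the trees' predictions instead of taking a plurality vote
        return np.mean([m.predict(X) for m in models], axis=0)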