Machine Learning: Ensembles of Classifiers
Madhavan Mukund, Chennai Mathematical Institute
http://www.cmi.ac.in/~madhavan
AlgoLabs Certification Course on Machine Learning, 24 February 2015
Bottlenecks in building a classifier
Noise: uncertainty in the classification function
Bias: systematic inability to predict a particular value
Variance: variation in the model based on the sample of training data
Models with high variance are unstable
Decision trees: the choice of attributes is influenced by the entropy of the training data
Overfitting: the model is tied too closely to the training set
Is there an alternative to pruning?
Multiple models
Build many models (an ensemble) and "average" them
How do we build different models from the same data?
The strategy for building the model is fixed: the same data will produce the same model
Choose different samples of the training data
Bootstrap Aggregating = Bagging
Training data has N items: TD = {d_1, d_2, ..., d_N}
Pick a random sample with replacement:
  pick an item at random (probability 1/N)
  put it back into the set
  repeat K times
Some items in the sample will be repeated
If the sample size equals the data size (K = N), the expected number of distinct items is (1 − 1/e) · N, approximately 63.2%
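Why (1 − 1/e)? Each draw misses a fixed item with probability (1 − 1/N), so the chance that an item never appears in N draws is (1 − 1/N)^N ≈ 1/e. A minimal Python sketch to check the fraction empirically (the value of N is an arbitrary choice for this experiment):

```python
import math
import random

def bootstrap_sample(data):
    """Draw len(data) items uniformly at random, with replacement."""
    return [random.choice(data) for _ in range(len(data))]

# Empirical check of the (1 - 1/e) ~ 63.2% distinct-item fraction.
N = 10_000                      # size of the training data (arbitrary here)
data = list(range(N))
sample = bootstrap_sample(data)
print(f"distinct fraction: {len(set(sample)) / N:.3f}")
print(f"1 - 1/e          : {1 - 1 / math.e:.3f}")
```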
Bootstrap Aggregating = Bagging
A sample with replacement of size N is a bootstrap sample
Contains approximately 63% of the full training data
Take K such samples
Build a model for each sample
Models will vary because each uses different training data
Final classifier: report the majority answer
Assumptions: binary classifier, K odd
Provably reduces variance
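A minimal sketch of bagging with majority voting, under the assumptions above (binary labels, odd K). The base learner here is scikit-learn's DecisionTreeClassifier purely for illustration; any unstable model would do, and K = 11 is an arbitrary default:

```python
import random
from collections import Counter

from sklearn.tree import DecisionTreeClassifier  # illustrative unstable base learner

def bagging_fit(X, y, K=11):
    """Train K models, each on its own bootstrap sample of (X, y)."""
    models = []
    n = len(X)
    for _ in range(K):
        idx = [random.randrange(n) for _ in range(n)]   # sample with replacement
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        models.append(DecisionTreeClassifier().fit(Xs, ys))
    return models

def bagging_predict(models, x):
    """Majority vote over the ensemble (assumes binary labels and odd K)."""
    votes = [m.predict([x])[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```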
Bagging with decision trees
[Sequence of figure slides illustrating bagging with decision trees]
Random Forest
Applying bagging to decision trees, with a further twist
Each data item has M attributes
Normally, decision tree building chooses one among M attributes, then one among the remaining M − 1, ...
Instead, fix a small limit m < M
At each level, choose m of the available attributes at random, and only examine these for the next split
No pruning
Seems to improve on bagging in practice
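A minimal sketch of the extra randomisation step, assuming a hypothetical helper score(X, y, a) that returns the usual split criterion (e.g. information gain) for attribute a; only the restriction to a random subset of m attributes is the point here:

```python
import random

def choose_split_attribute(X, y, available_attrs, m, score):
    """Random Forest twist on split selection.

    available_attrs : list of attribute indices still available at this level
    m               : how many attributes to examine at this level (m < M)
    score(X, y, a)  : assumed helper returning e.g. the information gain
                      of splitting the data on attribute a

    Instead of scoring every available attribute, score only a random
    subset of m of them and split on the best of those.
    """
    candidates = random.sample(available_attrs, min(m, len(available_attrs)))
    return max(candidates, key=lambda a: score(X, y, a))
```

Library implementations expose this limit directly, e.g. the max_features parameter of scikit-learn's RandomForestClassifier.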
Boosting
Looking at a few attributes gives a "rule of thumb" heuristic:
  if Amla does well, South Africa usually wins
  if the opening bowlers take at least 2 wickets within 5 overs, India usually wins
  ...
Each heuristic is a weak classifier
Can we combine such weak classifiers to boost performance and build a strong classifier?
Adaptively boosting a weak classifier (AdaBoost)
Weak binary classifier: output is {−1, +1}
Initially, all training inputs have equal weight: distribution D_1
Build a weak classifier C_1 for D_1
Compute its error rate e_1 (details suppressed)
Increase the weightage of all incorrectly classified inputs, giving D_2
Build a weak classifier C_2 for D_2
Compute its error rate e_2
Increase the weightage of all incorrectly classified inputs, giving D_3
...
Combine the outputs o_1, o_2, ..., o_k of C_1, C_2, ..., C_k as w_1 o_1 + w_2 o_2 + ... + w_k o_k
Each weight w_j depends on the error rate e_j
Report the sign (negative ↦ −1, positive ↦ +1)
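A minimal sketch of the AdaBoost loop, filling in the suppressed details with the standard choices (classifier weight w_j = ½ ln((1 − e_j)/e_j) and exponential reweighting of the inputs). The base_learner argument is an assumption: it is expected to fit a weak classifier on the weighted inputs and return a prediction function h with h(x) in {−1, +1}:

```python
import math

def adaboost(X, y, base_learner, rounds=10):
    """Labels y[i] are in {-1, +1}.  base_learner(X, y, D) is assumed to
    train a weak classifier on inputs weighted by D and return a function
    h with h(x) in {-1, +1} (e.g. a one-attribute "rule of thumb")."""
    n = len(X)
    D = [1.0 / n] * n                        # D_1: all inputs weighted equally
    classifiers, weights = [], []
    for _ in range(rounds):
        h = base_learner(X, y, D)
        # Weighted error rate e_j of this weak classifier.
        e = sum(D[i] for i in range(n) if h(X[i]) != y[i])
        if e == 0 or e >= 0.5:               # perfect, or no better than chance
            break
        w = 0.5 * math.log((1 - e) / e)      # weight w_j of classifier C_j
        # Increase the weightage of misclassified inputs, then renormalise.
        D = [D[i] * math.exp(-w * y[i] * h(X[i])) for i in range(n)]
        Z = sum(D)
        D = [d / Z for d in D]
        classifiers.append(h)
        weights.append(w)

    def strong_classifier(x):
        total = sum(w * h(x) for w, h in zip(weights, classifiers))
        return 1 if total >= 0 else -1       # report the sign
    return strong_classifier
```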
Boosting
[Sequence of figure slides illustrating successive boosting rounds]
Summary
Variance in unstable models (e.g., decision trees) can be reduced using an ensemble: bagging
A further refinement for decision-tree bagging: choose a random small subset of attributes to explore at each level (Random Forest)
Combining weak classifiers ("rules of thumb") into a strong one: boosting
References
Bagging Predictors, Leo Breiman, http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf
Random Forests, Leo Breiman and Adele Cutler, https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
A Short Introduction to Boosting, Yoav Freund and Robert E. Schapire, http://www.site.uottawa.ca/~stan/csi5387/boost-tut-ppr.pdf
AdaBoost and the Super Bowl of Classifiers: A Tutorial Introduction to Adaptive Boosting, Raúl Rojas, http://www.inf.fu-berlin.de/inst/ag-ki/adaboost4.pdf