CS570 Data Mining Classification: Ensemble Methods


  1. CS570 Data Mining Classification: Ensemble Methods
  Cengiz Günay, Dept. Math & CS, Emory University, Fall 2013
  Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong

  2. Today
  • Due today midnight: Homework #2 – Frequent itemsets
  • Given today: Homework #3 – Classification
  • Today’s menu: Classification: Ensemble Methods

  3. Ensemble Methods
  • Given a data set, generate multiple models and combine their results
  • Bagging
  • Random forests
  • Boosting – significance for PAC learning

  4. General Idea

  5. Why does it work?
  • Suppose there are 25 base classifiers, each with error rate ε = 0.35
  • Assume the classifiers are independent
  • The majority vote errs only when 13 or more of the 25 base classifiers err, so the probability that the ensemble makes a wrong prediction is
    $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1 - \varepsilon)^{25-i} \approx 0.06$
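
  A minimal Python check of that number (just the binomial tail sum above; the variable names are mine):

```python
# Probability that a majority vote of 25 independent base classifiers errs:
# the ensemble is wrong only when 13 or more of them are wrong at once.
from math import comb

eps, n = 0.35, 25
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"ensemble error = {p_wrong:.3f}")  # ~0.06, vs. 0.35 for a single classifier
```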

  6–9. Types of Ensemble Methods. Ensembles can be obtained by manipulating:
  1. Training set: bagging, boosting
  2. Input features: random forests, multi-objective evolutionary algorithms, forward/backward elimination?
  3. Class labels: multi-classes, active learning
  4. Learning algorithm: ANNs, decision trees

  10. Bagging
  • Create a training set by sampling data points with replacement (a bootstrap sample)
  • Build a model on that sample
  • Generate more bootstrap samples and models the same way
  • Predict by combining the models’ votes:
    – Classification: majority vote
    – Numeric prediction: average

  11. Bagging: sampling with replacement

    Original data      1   2   3   4   5   6   7   8   9   10
    Bagging (Round 1)  7   8   10  8   2   5   10  10  5   9
    Bagging (Round 2)  1   4   9   1   2   3   2   7   3   2
    Bagging (Round 3)  1   8   5   10  5   5   9   6   3   7

  • Build a classifier on each bootstrap sample
  • Each data point has probability 1 − (1 − 1/n)^n of being selected for a given bootstrap sample (about 0.632 for large n); (1 − 1/n)^n is the probability of being left out
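
  A minimal sketch of this loop, assuming X and y are numpy arrays with integer class labels and using scikit-learn decision trees as the base learner; bagging_fit and bagging_predict are hypothetical names, not from the slides:

```python
# Bagging sketch: one classifier per bootstrap sample, majority vote to predict.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Sample n indices WITH replacement; on average about 63.2% of the
        # distinct points appear in each sample: 1 - (1 - 1/n)**n.
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    # Majority vote down each column; assumes small non-negative integer labels.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```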

  12. Bagging
  Advantages:
  • Less overfitting
  • Helps when the base classifier is unstable (has high variance)
  Disadvantages:
  • Not useful when the base classifier is stable and has large bias

  13. PAC learning
  • A model of learning that achieves a given accuracy and confidence using polynomial sample complexity
  • References:
    – L. Valiant, “A theory of the learnable”: http://web.mit.edu/6.435/www/Valiant84.pdf
    – D. Haussler, “Overview of the Probably Approximately Correct (PAC) Learning Framework”: http://www.cs.iastate.edu/~honavar/pac.pdf

  14. Boosting
  • Combines weak learners to form a strong learner, in the PAC-learning sense
  • Learn using a weak learner
  • Boost accuracy by reweighting the examples misclassified by the previous weak learner, forcing the next weak learner to focus on these “hard” examples
  • Predict using a weighted combination of the weak learners:
    – Each learner’s weight is determined by its accuracy

  15. Boosting  An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records Initially, all N records are assigned equal weights  Unlike bagging, weights may change at the end of boosting  round

  16. Boosting  Records that are wrongly classified will have their weights increased  Records that are classified correctly will have their weights decreased Original Data 1 2 3 4 5 6 7 8 9 10 Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3 Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2 Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4 • Example 4 is hard to classify • Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

  17. Boosting
  Advantages:
  • Focuses on samples that are hard to classify
  • Sample weights can be used:
    1. As sampling probabilities
    2. By the classifier, to value those samples more
  • AdaBoost: calculates classifier importance instead of simple voting, using exponential weight-update rules
  Disadvantage:
  • Susceptible to overfitting

  18. Example: AdaBoost
  • Base classifiers: $C_1, C_2, \ldots, C_T$
  • Error rate of classifier $C_i$:
    $\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta(C_i(x_j) \neq y_j)$
  • Importance of a classifier:
    $\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$

  19. Example: AdaBoost
  • Weight update:
    $w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} e^{-\alpha_j} & \text{if } C_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } C_j(x_i) \neq y_i \end{cases}$
    where $Z_j$ is the normalization factor
  • If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated
  • Classification:
    $C^*(x) = \arg\max_y \sum_{j=1}^{T} \alpha_j \, \delta(C_j(x) = y)$
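
  A minimal AdaBoost sketch following the formulas on slides 18–19, assuming labels y in {−1, +1} and scikit-learn decision stumps as the weak learners (sample weights passed to fit stand in for explicit resampling); the function names are mine:

```python
# AdaBoost sketch: weighted error eps, importance alpha, exponential
# weight updates, and a weighted vote for the final classification.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # initially equal weights
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        eps = w @ miss                         # weighted error rate
        if eps >= 0.5:                         # slide 19: revert weights to 1/n
            w = np.full(n, 1.0 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # importance
        w = w * np.exp(np.where(miss, alpha, -alpha))      # up-/down-weight
        w = w / w.sum()                        # divide by normalizer Z_j
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # C*(x) = sign(sum_j alpha_j * C_j(x)), the weighted vote for {-1,+1} labels
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))
```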

  20. Illustrating AdaBoost: initial (equal) weights for each training data point. (Figure © Vipin Kumar, Parallel Issues in Data Mining)

  21. Illustrating AdaBoost, continued. (Figure © Vipin Kumar, Parallel Issues in Data Mining)

  22. Random Forests
  • Sample a data set with replacement
  • Select m variables at random from the p variables
  • Create a tree
  • Similarly create more trees
  • Combine the results
  • Reference: Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Chapter 15
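
  A minimal sketch using scikit-learn’s RandomForestClassifier on synthetic data (the dataset and parameter values are illustrative); max_features caps the m features considered at each split, mirroring the random variable selection above:

```python
# Random forest sketch: many trees on bootstrap samples, each split
# restricted to a random subset of features, predictions combined by vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees, each grown on a bootstrap sample
    max_features="sqrt",   # m = sqrt(p) features per split; slide 23 uses log2(d) + 1
    random_state=0,
).fit(X, y)
print(forest.score(X, y))  # training accuracy of the combined vote
```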

  23. Random Forests
  • Applies only to decision trees
  Advantages:
  • Lowers generalization error
  • Uses randomization in tree construction: number of candidate features m = log2(d) + 1
  • Accuracy comparable to AdaBoost, but faster
  • See the table in Tan et al., p. 294, for a comparison of ensemble methods
