http://poloclub.gatech.edu/cse6242
CSE6242 / CX4242: Data & Visual Analytics
Ensemble Methods (Model Combination)
Duen Horng (Polo) Chau
Assistant Professor; Associate Director, MS Analytics
Georgia Tech
Parishit Ram (GT PhD alum; SkyTree)
Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Parishit Ram (GT PhD alum; SkyTree), Alex Gray
Numerous Possible Classifiers!

Classifier               Training time   Cross validation   Testing time   Accuracy
kNN classifier           None            Can be slow        Slow           ??
Decision trees           Slow            Very slow          Very fast      ??
Naive Bayes classifier   Fast            None               Fast           ??
…                        …               …                  …              …
Which Classifier/Model to Choose?

Possible strategies:
• Go from the simplest model to more complex models until you obtain the desired accuracy
• Discover a new model if the existing ones do not work for you
• Combine all (simple) models
Common Strategy: Bagging (Bootstrap Aggregating)

Consider the data set S = {(x_i, y_i)}_{i=1,...,n}
• Pick a sample S* of size n, drawn with replacement from S
• Train on S* to get a classifier f*
• Repeat the above steps B times to get f_1, f_2, ..., f_B
• Final classifier: f(x) = majority{ f_b(x) }_{b=1,...,B}

http://statistics.about.com/od/Applications/a/What-Is-Bootstrapping.htm
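As a rough illustration of the procedure above, here is a minimal bagging sketch in Python. It assumes scikit-learn and NumPy are available; the decision-tree base learner, the synthetic dataset, and B = 25 are placeholder choices, not part of the slides.

```python
# Minimal bagging sketch: bootstrap B samples, train one classifier per sample,
# predict by majority vote. Base learner and data are illustrative assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n, B = len(X), 25
rng = np.random.default_rng(0)

classifiers = []
for b in range(B):
    idx = rng.integers(0, n, size=n)                    # bootstrap sample S*: n points with replacement
    f_b = clone(DecisionTreeClassifier()).fit(X[idx], y[idx])  # train f_b on S*
    classifiers.append(f_b)

# Final classifier: majority vote over f_1, ..., f_B
votes = np.stack([f_b.predict(X) for f_b in classifiers])      # shape (B, n)
y_hat = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("Ensemble accuracy on the training data:", (y_hat == y).mean())
```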
Common Strategy: Bagging

Why would bagging work?
• Combining multiple classifiers reduces the variance of the final classifier

When would this be useful?
• When we have a classifier with high variance
Bagging decision trees

Consider the data set S
• Pick a sample S* of size n, drawn with replacement from S
• Greedily grow a decision tree T_b on S*
• Repeat B times to get T_1, ..., T_B
• The final classifier is the majority vote: T(x) = majority{ T_b(x) }_{b=1,...,B}
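The same recipe is available off the shelf; the sketch below uses scikit-learn's BaggingClassifier around decision trees (an assumed library choice, with a synthetic dataset standing in for S).

```python
# Sketch: bagged decision trees via scikit-learn's BaggingClassifier.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(),  # grow a tree T_b on each bootstrap sample
                                 n_estimators=25,           # B = 25 trees
                                 random_state=0).fit(X, y)
print(bagged_trees.predict(X[:5]))  # aggregated prediction of T_1, ..., T_B (a majority vote for fully grown trees)
```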
Random Forests

Almost identical to bagging decision trees, except we introduce some randomness:
• At each split, randomly pick m of the d available attributes
• Choose the split using only those m attributes

Bagged random decision trees = Random forests
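In scikit-learn (an assumption, not something the slides prescribe), the per-split restriction to m random attributes is exposed as the max_features parameter of RandomForestClassifier; a minimal sketch:

```python
# Sketch: a random forest = bagged trees + m randomly chosen attributes per split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=16, random_state=0)
forest = RandomForestClassifier(n_estimators=100,  # B bagged trees
                                max_features=4,    # m = 4 of the d = 16 attributes per split
                                random_state=0).fit(X, y)
print(forest.predict(X[:5]))                       # majority vote over the 100 trees
```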
Points about random forests

Algorithm parameters
• Usual values for m: roughly sqrt(d) for classification, d/3 for regression
• Usual value for B: keep increasing B until the error stabilizes (the out-of-bag error on the next slide is a better guide than the training error)
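A hedged sketch of how these two knobs map onto scikit-learn names (m → max_features, B → n_estimators); the dataset, the held-out split, and the specific values of B are illustrative assumptions only.

```python
# Sketch: grow B until the held-out error stops changing; m is set via max_features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for B in (10, 50, 100, 200, 400):                   # keep increasing B ...
    rf = RandomForestClassifier(n_estimators=B,
                                max_features="sqrt",  # m ≈ sqrt(d) for classification
                                random_state=0).fit(X_tr, y_tr)
    print(B, 1 - rf.score(X_te, y_te))              # ... until this error stabilizes
```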
Explicit CV not necessary

• An unbiased estimate of the test error can be obtained from the out-of-bag data points (the OOB error estimate)
• You can still do CV explicitly, but it is not necessary: research shows the OOB estimate is as accurate as using a test set of the same size as the training set

https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr
http://stackoverflow.com/questions/18541923/what-is-out-of-bag-error-in-random-forests
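To make the point concrete, here is a small sketch (again assuming scikit-learn and synthetic data) that computes the free OOB estimate and an explicit 5-fold CV estimate; the two numbers should land in the same neighborhood.

```python
# Sketch: compare the OOB accuracy estimate with explicit 5-fold cross-validation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:      ", rf.oob_score_)        # comes for free from the bootstrap samples
print("5-fold CV accuracy:", cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5).mean())
```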
Final words

Advantages
• Efficient and simple training
• Allows you to work with simple classifiers
• Random forests are generally useful and accurate in practice (one of the best classifiers)
• Embarrassingly parallelizable

Caveats
• Needs low-bias classifiers
• Can make a not-good-enough classifier worse
Final words

Reading material
• Bagging: ESL Chapter 8.7
• Random forests: ESL Chapter 15

http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf