
When Semi-Supervised Learning Meets Ensemble Learning
Zhi-Hua Zhou (presentation transcript)

  1. When Semi-Supervised Learning Meets Ensemble Learning
Zhi-Hua Zhou
http://cs.nju.edu.cn/zhouzh/  Email: zhouzh@nju.edu.cn
LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University, China
http://lamda.nju.edu.cn

  2. The presentation involves some joint work with: Ming Li, Wei Wang, Qiang Yang, Min-Ling Zhang, De-Chuan Zhan, …

  3. One Goal, Two Paradigms
[diagram] One goal: generalization. Two routes to it: semi-supervised learning (using unlabeled data) and ensemble learning (using multiple learners). This presentation: what happens when the two meet.

  4. Outline
• Ensemble Learning
• Semi-Supervised Learning
• Classifier Combination vs. Unlabeled Data

  5. What’s ensemble learning?
Ensemble learning is a machine learning paradigm in which multiple (homogeneous or heterogeneous) individual learners are trained for the same problem, e.g., neural network ensembles, decision tree ensembles, etc.
[diagram: several learners are trained on the same problem and their outputs are combined into a single prediction]
A minimal sketch of the idea follows.
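As an illustration only (scikit-learn implementations and a synthetic placeholder dataset; the particular models and settings are not from the talk), here are heterogeneous learners combined by majority vote:

```python
# Minimal sketch: heterogeneous learners trained on the same problem,
# combined by majority vote. Dataset and model choices are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("nn", MLPClassifier(max_iter=1000, random_state=0)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # majority vote over the individual predictions
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```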

  6. Many ensemble methods
Parallel methods:
• Bagging [L. Breiman, MLJ96]
• Random Subspace [T. K. Ho, TPAMI98]
• Random Forests [L. Breiman, MLJ01]
• …
Sequential methods:
• AdaBoost [Y. Freund & R. Schapire, JCSS97]
• Arc-x4 [L. Breiman, AnnStat98]
• LPBoost [A. Demiriz et al., MLJ06]
• …
(off-the-shelf versions of several of these are sketched below)
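A hedged side-by-side of one parallel and one sequential family, using scikit-learn's off-the-shelf implementations rather than the cited papers' exact algorithms; data and settings are illustrative:

```python
# Sketch: compare parallel (Bagging, Random Forest) and sequential (AdaBoost)
# ensembles on a synthetic placeholder task with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

methods = {
    "Bagging (parallel)": BaggingClassifier(n_estimators=50, random_state=0),
    "Random Forest (parallel)": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost (sequential)": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, clf in methods.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```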

  7. Selective ensemble
Many Could Be Better Than All: when a number of base learners are available, ensembling many of the base learners may be better than ensembling all of them [Z.-H. Zhou et al., IJCAI'01 & AIJ02].
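The cited work (GASEN) selects the subset via a genetic algorithm over combination weights; the sketch below substitutes a much simpler greedy forward search on a validation set, purely to illustrate that a well-chosen subset can outvote the full ensemble. Function and variable names here are hypothetical.

```python
# Simplified selective-ensemble sketch: greedily add the base learner that
# most improves majority-vote accuracy on held-out data, and stop when no
# addition helps. (The original GASEN method uses a genetic algorithm.)
import numpy as np

def greedy_select(val_preds, y_val):
    """val_preds: (n_learners, n_samples) array of 0/1 validation predictions."""
    chosen, best_acc = [], -1.0
    remaining = list(range(len(val_preds)))
    while remaining:
        # Accuracy of the current subset plus each remaining candidate.
        accs = [np.mean((np.mean(val_preds[chosen + [i]], axis=0) >= 0.5) == y_val)
                for i in remaining]
        if max(accs) <= best_acc:   # ensembling more is no longer better
            break
        best_acc = max(accs)
        best = remaining[int(np.argmax(accs))]
        chosen.append(best)
        remaining.remove(best)
    return chosen, best_acc
```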

  8. Theoretical foundations
There are abundant studies on the theoretical properties of ensemble methods, which have appeared, and continue to appear, in leading statistical journals, e.g., the Annals of Statistics.
Point of agreement: different ensemble methods may rest on different theoretical foundations.

  9. Many mysteries
Diversity among the base learners is (possibly) the key to ensembles. The ambiguity decomposition [A. Krogh & J. Vedelsby, NIPS'94] for a convex-weighted regression ensemble states

  E = Ē − Ā

where E is the ensemble's squared error, Ē the weighted average of the individual learners' errors, and Ā the weighted average "ambiguity" (the spread of the individuals around the ensemble output). So the more accurate and the more diverse the base learners, the better the ensemble.
But what exactly is "diversity"? [L.I. Kuncheva & C.J. Whitaker, MLJ03]
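The decomposition is an exact algebraic identity and easy to verify numerically; a minimal check, with random numbers standing in for learner outputs:

```python
# Numeric check of the Krogh & Vedelsby ambiguity decomposition E = Ē − Ā
# for a convex-weighted regression ensemble. Values are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal()                          # target value
preds = rng.normal(size=5)                # individual learners' outputs
w = rng.random(5); w /= w.sum()           # convex combination weights

f_bar = np.dot(w, preds)                  # ensemble output
E     = (f_bar - y) ** 2                  # ensemble error
E_bar = np.dot(w, (preds - y) ** 2)       # avg. individual error
A_bar = np.dot(w, (preds - f_bar) ** 2)   # ambiguity (diversity term)

assert np.isclose(E, E_bar - A_bar)       # the identity holds exactly
```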

  10. Many mysteries (con't)
Even for theoretically well-studied methods, mysteries remain. E.g., why does AdaBoost often not overfit?
• Margin! [R.E. Schapire et al., AnnStat98]
• No! [L. Breiman, NCJ99] (contrary evidence based on the minimum margin)
• Wait … [L. Reyzin & R.E. Schapire, ICML'06 best paper] (minimum margin vs. the whole margin distribution)
• One more piece of support [L. Wang et al., COLT'08]
For the whole story see: Z.-H. Zhou & Y. Yu, AdaBoost. In: X. Wu and V. Kumar, eds., The Top Ten Algorithms in Data Mining, Boca Raton, FL: Chapman & Hall, 2009.
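For intuition about what the debate measures, here is a sketch that trains scikit-learn's AdaBoostClassifier and inspects both Breiman's minimum-margin statistic and the fuller margin distribution. It assumes that decision_function is normalized by the total estimator weight (true of this implementation, but worth checking against your scikit-learn version); data are synthetic placeholders.

```python
# Sketch: the normalized voting margin y * f(x), the quantity at the heart of
# the AdaBoost margin debate. With labels coded as +/-1 and a normalized
# decision function, margins lie in [-1, 1].
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

y_signed = 2 * y - 1                           # {0,1} -> {-1,+1}
margins = y_signed * clf.decision_function(X)  # normalized voting margins
print("minimum margin:", margins.min())        # Breiman's statistic
print("lower quantiles:", np.quantile(margins, [0.05, 0.25, 0.5]))
```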

  11. Great success of ensemble methods
• KDDCup'05: all awards ("Precision Award", "Performance Award", "Creativity Award") for "An ensemble search based method …"
• KDDCup'06: 1st place of Task 1 for "Modifying Boosted Trees to …"; 1st place of Task 2 & 2nd place of Task 1 for "Voting … by means of a Classifier Committee"
• KDD Time-series Classification Challenge 2007: 1st place for "… Decision Forests and …"

  12. Great success of ensemble methods (con't)
• KDDCup'08: 1st place of Challenge 1 for a method using Bagging; 1st place of Challenge 2 for "… Using an Ensemble Method"
• KDDCup'09: 1st place of Fast Track for "Ensemble …"; 2nd place of Fast Track for "… bagging … boosting tree models …"; 1st place of Slow Track for "Boosting with classification trees and shrinkage"; 2nd place of Slow Track for "Stochastic Gradient Boosting"
• …

  13. Great success of ensemble methods (con't)
• Netflix Prize:
  • 2007 Progress Prize winner: an ensemble
  • 2008 Progress Prize winner: an ensemble
  • 2009 $1 Million Grand Prize winner: an ensemble!
• "Top 10 Data Mining Algorithms" (ICDM'06): AdaBoost
• Applications in almost all areas
• …

  14. Yet recently, very few ensemble-learning papers appear in top machine learning conferences. Why? The easier tasks have been finished; new challenges are needed.

  15. Outline
• Ensemble Learning
• Semi-Supervised Learning
• Classifier Combination vs. Unlabeled Data

  16. Labeled vs. Unlabeled
In many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain, because labeling requires human effort. Example: there is an (almost) infinite number of web pages on the Internet, but deciding whether a given page belongs to, say, class "war" takes a human judgment.

  17. SSL: Why can unlabeled data be helpful?
Suppose the data are well modeled by a mixture density

  f(x) = \sum_{l=1}^{L} \alpha_l f(x \mid \theta_l), where \sum_{l=1}^{L} \alpha_l = 1 and \theta = \{\theta_l\}.

The class labels are viewed as random quantities, assumed to be chosen conditioned on the selected mixture component m_i \in \{1, 2, \ldots, L\} and possibly on the feature value, i.e., according to the probabilities P[c_i \mid x_i, m_i].
The optimal classification rule for this model is then the MAP rule

  S(x_i) = \arg\max_k \sum_j P[c_i = k \mid m_i = j, x_i] \, P[m_i = j \mid x_i],

  where P[m_i = j \mid x_i] = \frac{\alpha_j f(x_i \mid \theta_j)}{\sum_{l=1}^{L} \alpha_l f(x_i \mid \theta_l)}.

Unlabeled examples can be used to help estimate this last term. [D.J. Miller & H.S. Uyar, NIPS'96]
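A sketch of this recipe under a Gaussian mixture assumption, with the simplification that P[c | x, m] = P[c | m] (independent of the feature value); function and variable names are hypothetical. The mixture posterior P[m = j | x] is fit on labeled and unlabeled points together, which is exactly where the unlabeled data contribute:

```python
# Sketch of the Miller & Uyar idea with a Gaussian mixture: unlabeled data
# help estimate P[m=j|x]; the few labels estimate P[c=k|m=j]; classification
# follows the MAP rule above.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_ssl_mixture(X_lab, y_lab, X_unlab, n_components, n_classes):
    # The mixture is fit on ALL points -- labeled and unlabeled alike.
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(np.vstack([X_lab, X_unlab]))

    # Estimate P[c=k | m=j] from the labeled sample only (soft counts).
    resp = gmm.predict_proba(X_lab)            # (n_lab, n_components)
    p_c_given_m = np.zeros((n_components, n_classes))
    for k in range(n_classes):
        p_c_given_m[:, k] = resp[y_lab == k].sum(axis=0)
    p_c_given_m += 1e-12                       # smooth empty components
    p_c_given_m /= p_c_given_m.sum(axis=1, keepdims=True)
    return gmm, p_c_given_m

def predict(gmm, p_c_given_m, X):
    # MAP rule: argmax_k sum_j P[c=k|m=j] P[m=j|x]
    return np.argmax(gmm.predict_proba(X) @ p_c_given_m, axis=1)

# Hypothetical usage:
# gmm, p = fit_ssl_mixture(X_lab, y_lab, X_unlab, n_components=4, n_classes=2)
# y_hat = predict(gmm, p, X_test)
```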

  18. SSL: Why can unlabeled data be helpful? (con't)
[figure] Given only the labeled points, should the test point be classified blue or red? Intuitively?

  19. SSL: Why can unlabeled data be helpful? (con't)
[figure] Once the unlabeled points are shown, the answer is intuitive: blue!

  20. SSL: Representative approaches
• Generative methods: use a generative model for the classifier and employ EM to model the label-estimation or parameter-estimation process [Miller & Uyar, NIPS'96; Nigam et al., MLJ00; Fujino et al., AAAI'05; etc.]
• S3VMs (Semi-Supervised SVMs): use the unlabeled data to adjust the decision boundary so that it goes through a less dense region [Joachims, ICML'99; Chapelle & Zien, AISTATS'05; Collobert et al., ICML'06; etc.]
• Graph-based methods
• Disagreement-based methods
(a sketch of two off-the-shelf variants follows after this list)

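As a hedged, off-the-shelf illustration of two of these families, here is a graph-based method (label propagation) alongside classic single-learner self-training as a simple stand-in, using scikit-learn's conventions, where -1 marks unlabeled points; data and settings are placeholders:

```python
# Sketch: graph-based label propagation and self-training on a synthetic
# task where 90% of the labels are hidden (marked -1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelPropagation, SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
y_semi = y.copy()
rng = np.random.default_rng(0)
y_semi[rng.random(len(y)) < 0.9] = -1   # hide 90% of the labels

graph_clf = LabelPropagation().fit(X, y_semi)
self_clf = SelfTrainingClassifier(SVC(probability=True)).fit(X, y_semi)
print(graph_clf.score(X, y), self_clf.score(X, y))
```

Disagreement-based methods such as co-training instead let multiple learners label data for one another, which is where ensemble learning enters the picture and where this talk is headed.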
