CSI5180. Machine Learning for Bioinformatics Applications Ensemble Learning by Marcel Turcotte Version December 5, 2019
Preamble Preamble 2/50
Preamble Ensemble Learning In this lecture, we consider several meta-learning algorithms, all based on the principle that the combined opinion of a large group of individuals is often more accurate than the opinion of a single expert; this principle is often referred to as the wisdom of the crowd. Today, we distinguish the following meta-algorithms: bagging, pasting, random patches, random subspaces, boosting, and stacking. General objective: Compare the specific features of various ensemble learning meta-algorithms Preamble 3/50
Learning objectives Discuss the intuition behind bagging and pasting methods Explain the difference between random patches and random subspaces Describe boosting methods Contrast the stacking meta-algorithm with bagging Reading: Jaswinder Singh, Jack Hanson, Kuldip Paliwal, and Yaoqi Zhou. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nature Communications 10(1):5407, 2019. Preamble 4/50
www.mims.ai bioinformatics.ca/job-postings Preamble 5/50
Plan 1. Preamble 2. Introduction 3. Justification 4. Meta-algorithms 5. Prologue Preamble 6/50
Introduction Introduction 7/50
Ensemble Learning - What is it? “Ensemble learning is a learning paradigm that, instead of trying to learn one super-accurate model, focuses on training a large number of low-accuracy models and then combining the predictions given by those weak models to obtain a high-accuracy meta-model.” [Burkov, 2019] §7.5 Weak learners (low-accuracy models) are simple and fast, both for training and prediction. The general idea is that each learner has a vote, and these votes are combined to establish the final decision. Decision trees are the most commonly used weak learners. Ensemble learning is in fact an umbrella term for a large family of meta-algorithms, including bagging, pasting, random patches, random subspaces, boosting, and stacking. Introduction 8/50
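As a minimal illustration of the voting idea (a sketch added here, not from the original slides), the following combines the class predictions of three hypothetical weak learners by majority vote; the prediction values are placeholders.

import numpy as np

# Hypothetical class predictions of three weak learners on five examples
# (placeholder values, for illustration only).
preds = np.array([
    [0, 1, 1, 0, 1],   # weak learner 1
    [0, 1, 0, 0, 1],   # weak learner 2
    [1, 1, 1, 0, 0],   # weak learner 3
])

# Hard voting: each learner casts one vote, and the majority class wins.
votes_for_class_1 = preds.sum(axis=0)
majority = (votes_for_class_1 > preds.shape[0] / 2).astype(int)
print(majority)   # [0 1 1 0 1]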
Justification Justification 9/50
Weak learners/high accuracy 10 experiments Each experiment consists of tossing a loaded coin 51% heads, 49% tails As the number of tosses increases, the proportion of heads approaches 51% See: [Géron, 2019] §7 Justification 10/50
Source code

import numpy as np
import matplotlib.pyplot as plt

tosses = (np.random.rand(10000, 10) < 0.51).astype(np.int8)
cumsum = np.cumsum(tosses, axis=0) / np.arange(1, 10001).reshape(-1, 1)

with plt.xkcd():
    plt.figure(figsize=(8, 3.5))
    plt.plot(cumsum)
    plt.plot([0, 10000], [0.51, 0.51], "k--", linewidth=2, label="51%")
    plt.plot([0, 10000], [0.5, 0.5], "k-", label="50%")
    plt.xlabel("Number of coin tosses")
    plt.ylabel("Heads ratio")
    plt.legend(loc="lower right")
    plt.axis([0, 10000, 0.42, 0.58])
    plt.tight_layout()
    plt.savefig("weak_learner.pdf", format="pdf", dpi=264)

See: [Géron, 2019] §7 Justification 11/50
Weak learners/high accuracy Adapted from [Géron, 2019] §7 Justification 12/50
Independent learners Clearly, if the learners all use the same input, they are not independent. Ensemble learning works best when the learners are as independent from one another as possible. This can be encouraged by using: Different algorithms Different sets of features Different data sets (see the sketch below for this last idea) Justification 13/50
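A minimal sketch of the last idea: training copies of the same weak learner on different bootstrap samples of the data (bagging). The use of the moons data and the hyperparameter values here are assumptions, not taken from the slides.

from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.15)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 decision trees, each trained on a different bootstrap sample of the
# training set (bootstrap=True); their predictions are aggregated by voting.
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=True)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))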
Data set - moons

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, noise=0.15)

with plt.xkcd():
    plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "bs")
    plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "g^")
    plt.axis([-1.5, 2.5, -1, 1.5])
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
    plt.tight_layout()
    plt.savefig("make_moons.pdf", format="pdf", dpi=264)

Adapted from: [Géron, 2019] §5 Justification 14/50
Data set - moons Adapted from [Géron, 2019] §5 Justification 15/50
Source code - VotingClassifier - hard

from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

estimators = [('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)]

voting_clf = VotingClassifier(estimators=estimators, voting='hard')
voting_clf.fit(X_train, y_train)

Source: [Géron, 2019] §7 Justification 16/50
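For reference (this is not on the original slides), VotingClassifier also supports soft voting, which averages the predicted class probabilities instead of counting class votes. Continuing from the code above, SVC must then be created with probability=True so that it exposes predict_proba; the sketch below shows this variant.

# Soft voting averages the predicted class probabilities of the estimators.
svm_clf_soft = SVC(probability=True)
voting_clf_soft = VotingClassifier(
    estimators=[('lr', LogisticRegression()),
                ('rf', RandomForestClassifier()),
                ('svc', svm_clf_soft)],
    voting='soft')
voting_clf_soft.fit(X_train, y_train)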
Source code - accuracy

from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

Output:
LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.888
VotingClassifier 0.904

Justification 17/50