The effectiveness of gradual learning
Ensemble Methods in Python
Román de las Heras, Data Scientist, SAP / Agile Solutions
Collective vs gradual learning

Collective Learning                        | Gradual Learning
-------------------------------------------|--------------------------------------------
Principle: wisdom of the crowd             | Principle: iterative learning
Independent estimators                     | Dependent estimators
Learning the same task for the same goal   | Learning different tasks for the same goal
Parallel building                          | Sequential building
Gradual learning

Possible steps in gradual learning:
1. First attempt (initial model)
2. Feedback (model evaluation)
3. Correct errors (subsequent model)
Fitting to noise

White noise:
- Uncorrelated errors
- Unbiased errors with constant variance

Improvement tolerance:
- If the performance difference < improvement threshold: stop training
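As a concrete illustration (not from the slides), here is a minimal sketch of the improvement-tolerance rule; the validation_errors sequence and the threshold value are hypothetical:

    # Hypothetical per-round validation errors and an illustrative threshold.
    validation_errors = [0.40, 0.31, 0.27, 0.259, 0.2588, 0.2587]
    tolerance = 1e-3

    prev_error = float('inf')
    for round_num, error in enumerate(validation_errors, start=1):
        if prev_error - error < tolerance:   # performance difference too small
            print(f'Stop training at round {round_num}')
            break
        prev_error = error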
It's your turn!
Adaptive boosting: award winning model
Award winning model

About AdaBoost:
- Proposed by Yoav Freund and Robert Schapire (1997)
- Winner of the Gödel Prize (2003)
- The first practical boosting algorithm
- A highly used and well-known ensemble method
AdaBoost properties

1. Instances are drawn using a sample distribution (see the sketch below)
   - Difficult instances have higher weights
   - Initialized to be uniform
2. Estimators are combined with weighted majority voting
   - Good estimators are given higher weights
3. Guaranteed to improve: the weighted training error decreases on each round, as long as each weak learner does better than chance
4. Works for both classification and regression
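To make property 1 concrete, below is a toy sketch of one round of the standard discrete AdaBoost weight update; the five instances and the misclassification pattern are made up:

    import numpy as np

    weights = np.full(5, 1 / 5)     # sample distribution, initialized uniform
    misclassified = np.array([False, True, False, False, True])

    err = weights[misclassified].sum()        # weighted error of this estimator
    alpha = 0.5 * np.log((1 - err) / err)     # the estimator's voting weight
    weights *= np.exp(alpha * np.where(misclassified, 1, -1))
    weights /= weights.sum()                  # renormalize to a distribution

    print(weights)   # the two difficult (misclassified) instances now weigh more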
AdaBoost classifier with scikit-learn

Parameters of AdaBoostClassifier:
- base_estimator (default: a decision tree with max_depth=1)
- n_estimators (default: 50)
- learning_rate (default: 1.0)
There is a trade-off between n_estimators and learning_rate.

    from sklearn.ensemble import AdaBoostClassifier

    # base_estimator defaults to a depth-1 decision tree (a "stump");
    # newer scikit-learn versions rename this parameter to `estimator`.
    clf_ada = AdaBoostClassifier(
        n_estimators=50,      # default
        learning_rate=1.0     # default; trades off against n_estimators
    )
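A minimal end-to-end sketch (not from the slides), using scikit-learn's built-in breast cancer dataset purely for illustration:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf_ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
    clf_ada.fit(X_train, y_train)
    print(accuracy_score(y_test, clf_ada.predict(X_test)))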
AdaBoost regressor with scikit-learn

Parameters of AdaBoostRegressor:
- base_estimator (default: a decision tree with max_depth=3)
- loss: 'linear' (default), 'square', or 'exponential'

    from sklearn.ensemble import AdaBoostRegressor

    # base_estimator defaults to a depth-3 decision tree;
    # newer scikit-learn versions rename this parameter to `estimator`.
    reg_ada = AdaBoostRegressor(
        n_estimators=50,
        learning_rate=1.0,
        loss='linear'         # alternatives: 'square', 'exponential'
    )
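The same pattern works for regression; a sketch using scikit-learn's built-in diabetes dataset for illustration:

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import AdaBoostRegressor
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    reg_ada = AdaBoostRegressor(n_estimators=50, loss='linear')
    reg_ada.fit(X_train, y_train)
    print(reg_ada.score(X_test, y_test))   # R^2 on the held-out split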
Let's practice!
Gradient boosting
Intro to gradient boosting machine

1. Initial model (weak estimator): y ~ f1(x)
2. New model fits to the residuals: y - f1(x) ~ h1(x)
3. New additive model: y ~ f1(x) + h1(x)
4. Repeat n times, or until the error is small enough
5. Final additive model: y ~ f1(x) + h1(x) + ... + hn(x)
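A toy from-scratch sketch of this loop (not the slides' code), assuming squared-error loss so the residuals are plain differences; the data is synthetic:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

    learning_rate = 0.1
    prediction = np.zeros_like(y)     # step 1: trivial initial model
    for _ in range(100):
        residuals = y - prediction    # step 2: what is still unexplained
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)        # new model fits to the residuals
        prediction += learning_rate * tree.predict(X)   # step 3: additive update

    print('training MSE:', np.mean((y - prediction) ** 2))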
Equivalence to gradient descent

For the squared-error loss L(y, f(x)) = (1/2) * (y - f(x))^2, the negative gradient of the loss with respect to the prediction f(x) is exactly y - f(x), the residual. Fitting each new estimator to the residuals is therefore a gradient-descent step in function space:

Residuals = negative gradient
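A quick numerical check of this equivalence (the values are illustrative):

    # For L(f) = 0.5 * (y - f)^2, the gradient dL/df should equal -(y - f).
    y_true, pred = 3.0, 2.2
    loss = lambda f: 0.5 * (y_true - f) ** 2

    eps = 1e-6
    grad = (loss(pred + eps) - loss(pred - eps)) / (2 * eps)   # numerical dL/df
    print(grad, -(y_true - pred))   # both ~ -0.8: residual = negative gradient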
Gradient boosting classifier

Parameters of GradientBoostingClassifier:
- n_estimators (default: 100)
- learning_rate (default: 0.1)
- max_depth (default: 3)
- min_samples_split
- min_samples_leaf
- max_features

    from sklearn.ensemble import GradientBoostingClassifier

    clf_gbm = GradientBoostingClassifier(
        n_estimators=100,       # default
        learning_rate=0.1,      # default
        max_depth=3,            # default
        min_samples_split=2,    # default
        min_samples_leaf=1,     # default
        max_features=None       # default: use all features
    )
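An end-to-end sketch (not from the slides), using scikit-learn's built-in wine dataset for illustration:

    from sklearn.datasets import load_wine
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf_gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
    clf_gbm.fit(X_train, y_train)
    print(clf_gbm.score(X_test, y_test))   # mean accuracy on the held-out split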
Gradient boosting regressor

GradientBoostingRegressor accepts the same boosting and tree parameters as the classifier.

    from sklearn.ensemble import GradientBoostingRegressor

    reg_gbm = GradientBoostingRegressor(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=3,
        min_samples_split=2,
        min_samples_leaf=1,
        max_features=None
    )
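scikit-learn's staged_predict yields the ensemble's prediction after each boosting round, which makes it easy to watch the validation error and apply an improvement tolerance like the one described earlier; a sketch using the built-in diabetes dataset for illustration:

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

    reg_gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3)
    reg_gbm.fit(X_train, y_train)

    # Validation MSE after each boosting round
    errors = [np.mean((y_val - pred) ** 2) for pred in reg_gbm.staged_predict(X_val)]
    print('best round:', int(np.argmin(errors)) + 1)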
Time to boost!
Gradient boosting flavors
Variations of gradient boosting

Gradient Boosting Algorithm       | Implementation
----------------------------------|---------------
Extreme Gradient Boosting         | XGBoost
Light Gradient Boosting Machine   | LightGBM
Categorical Boosting              | CatBoost
Extreme gradient boosting (XGBoost)

Some properties:
- Optimized for distributed computing
- Parallel training by nature
- Scalable, portable, and accurate

    import xgboost as xgb

    clf_xgb = xgb.XGBClassifier(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=3,
        random_state=42    # any fixed seed for reproducibility
    )
    clf_xgb.fit(X_train, y_train)
    pred = clf_xgb.predict(X_test)
Light gradient boosting machine (LightGBM)

Some properties:
- Released by Microsoft (2017)
- Faster training and more efficient
- Lighter in terms of space
- Optimized for parallel and GPU processing
- Useful for problems with big datasets and constraints on speed or memory

    import lightgbm as lgb

    clf_lgb = lgb.LGBMClassifier(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=-1,      # -1 means no depth limit
        random_state=42    # any fixed seed for reproducibility
    )
    clf_lgb.fit(X_train, y_train)
    pred = clf_lgb.predict(X_test)
Categorical boosting (CatBoost)

Some properties:
- Open sourced by Yandex (April 2017)
- Built-in handling of categorical features
- Accurate and robust
- Fast and scalable
- User-friendly API

    import catboost as cb

    clf_cat = cb.CatBoostClassifier(
        n_estimators=1000,
        learning_rate=0.03,
        max_depth=6,
        random_state=42    # any fixed seed for reproducibility
    )
    clf_cat.fit(X_train, y_train)
    pred = clf_cat.predict(X_test)
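The built-in categorical handling is CatBoost's signature feature: raw string columns can be passed via the cat_features argument of fit, with no manual encoding. A minimal sketch (the toy data is made up):

    import pandas as pd
    import catboost as cb

    # Toy data: one categorical column, one numeric column.
    X = pd.DataFrame({
        'color': ['red', 'blue', 'red', 'green', 'blue', 'green'],
        'size':  [1.0, 2.0, 1.5, 3.0, 2.5, 3.5],
    })
    y = [0, 1, 0, 1, 1, 1]

    clf_cat = cb.CatBoostClassifier(n_estimators=50, verbose=False)
    clf_cat.fit(X, y, cat_features=['color'])   # CatBoost encodes 'color' itself
    print(clf_cat.predict(X))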
It's your turn!