XGBoost: A Scalable Tree Boosting System
Advisor: Jia-Ling Koh
Speaker: Yin-Hsiang Liao
2018/04/17, from KDD 2016
Outline: Introduction, Method, Experiment, Conclusion
Introduction
Regression tree: CART (splits chosen by an impurity criterion such as Gini).
Boosting: an ensemble method, an iterative procedure that adaptively changes the distribution of the training examples, e.g., AdaBoost.
Introduction
The most important factor behind XGBoost is scalability: it scales to billions of examples.
Introduction
A practical choice: 17 out of the 29 winning solutions on Kaggle in 2015 used XGBoost; every top-10 team in KDDCup 2015 used it; on T-Brain it was used by the top-3 teams.
Applications: ad click-through-rate prediction, malware classification, customer behavior prediction, etc.
Method
Tree ensemble model: the prediction is the sum of the leaf weights the example receives from each tree,
$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$, $f_k \in \mathcal{F}$,
where each $f_k$ maps an example to a leaf and returns that leaf's weight.
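A minimal sketch of this idea in Python (the stump functions and feature names below are made up for illustration; this is not the library's code): the prediction is just the sum of the leaf weights over all trees.

def predict(trees, x):
    """trees: a list of functions, each mapping a feature dict to its leaf weight."""
    return sum(tree(x) for tree in trees)

# Two hypothetical "trees" (decision stumps) and one example:
stump1 = lambda x: 0.8 if x["age"] < 30 else -0.2
stump2 = lambda x: 0.5 if x["uses_computer_daily"] else -0.1
print(predict([stump1, stump2], {"age": 25, "uses_computer_daily": True}))  # 1.3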
Method: Objective function
Regularized objective function:
$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$, with $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$,
where $l$ is a differentiable convex loss and the model-complexity term $\Omega$ penalizes the number of leaves $T$ plus the (squared) leaf weights $w$.
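As a hedged sketch (squared-error loss assumed, helper names invented here), the objective translates directly into code:

import numpy as np

def regularized_objective(y, y_hat, leaf_weights_per_tree, gamma=1.0, lam=1.0):
    """L(phi) = sum_i l(y_hat_i, y_i) + sum_k [gamma*T_k + 0.5*lam*||w_k||^2]."""
    loss = np.sum((y - y_hat) ** 2)                     # example convex loss l
    complexity = sum(gamma * len(w) + 0.5 * lam * np.sum(np.square(w))
                     for w in leaf_weights_per_tree)    # Omega summed over trees
    return loss + complexity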
Method: Objective function
As in the usual gradient tree boosting, the model is trained in an additive manner: at round $t$ the prediction becomes $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$.
Method: Objective function
Additive training (boosting): at each round we greedily add the tree $f_t$ that most improves
$\mathcal{L}^{(t)} = \sum_i l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$.
Method: Objective function
Second-order Taylor expansion of the loss:
$\mathcal{L}^{(t)} \simeq \sum_i \big[\, l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \big] + \Omega(f_t)$,
with $g_i = \partial_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$.
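For concreteness, here is a sketch of the gradients $g_i$ and hessians $h_i$ for two common losses (standard formulas, not taken from the paper's code):

import numpy as np

def grad_hess_squared_error(y, y_hat_prev):
    """l = 0.5 * (y - y_hat)^2  ->  g = y_hat - y, h = 1."""
    return y_hat_prev - y, np.ones_like(y)

def grad_hess_logistic(y, y_hat_prev):
    """Binary logistic loss; y in {0,1}, y_hat_prev is a raw score (logit)."""
    p = 1.0 / (1.0 + np.exp(-y_hat_prev))
    return p - y, p * (1.0 - p)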
Method: Objective function
Let $I_j = \{\, i \mid q(x_i) = j \,\}$ be the instance set of leaf $j$ (the $x_i$ that fall into leaf $j$) and $T$ the number of leaves; grouping the expansion by leaf gives
$\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Big[ \big(\sum_{i \in I_j} g_i\big) w_j + \tfrac{1}{2}\big(\sum_{i \in I_j} h_i + \lambda\big) w_j^2 \Big] + \gamma T$.
Method: Objective function
For a fixed tree structure $q$, the optimal weight of leaf $j$ is
$w_j^* = -\dfrac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$,
and the corresponding optimal objective value is
$\tilde{\mathcal{L}}^{(t)}(q) = -\tfrac{1}{2} \sum_{j=1}^{T} \dfrac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$.
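These two formulas translate directly into code; the sketch below (invented helper names) scores a fixed tree structure from the per-leaf gradient statistics.

import numpy as np

def leaf_optimal_weight(g, h, lam=1.0):
    """Optimal weight w_j* of one leaf, given the g_i, h_i of its instances."""
    return -np.sum(g) / (np.sum(h) + lam)

def structure_score(leaves_g, leaves_h, lam=1.0, gamma=1.0):
    """Optimal objective value of a fixed tree: one (g, h) array pair per leaf."""
    score = -0.5 * sum(np.sum(g) ** 2 / (np.sum(h) + lam)
                       for g, h in zip(leaves_g, leaves_h))
    return score + gamma * len(leaves_g)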
Method: Objective function
So once the tree structure is known, we have its optimal value; the problem becomes "which tree is best?" Since all structures cannot be enumerated, a greedy strategy grows the tree one split at a time, choosing the split with the largest loss reduction
$\mathcal{L}_{split} = \tfrac{1}{2}\Big[ \dfrac{(\sum_{i \in I_L} g_i)^2}{\sum_{i \in I_L} h_i + \lambda} + \dfrac{(\sum_{i \in I_R} g_i)^2}{\sum_{i \in I_R} h_i + \lambda} - \dfrac{(\sum_{i \in I} g_i)^2}{\sum_{i \in I} h_i + \lambda} \Big] - \gamma$
(left subtree + right subtree minus parent). The larger the better; it can be negative.
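A sketch of the gain of one candidate split (again with invented helper names), which is the scoring function used by the split-finding algorithms that follow:

import numpy as np

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=1.0):
    """Loss reduction of splitting a node into the given left/right instance sets."""
    def term(g_sum, h_sum):
        return g_sum ** 2 / (h_sum + lam)
    GL, HL = np.sum(g_left), np.sum(h_left)
    GR, HR = np.sum(g_right), np.sum(h_right)
    return 0.5 * (term(GL, HL) + term(GR, HR) - term(GL + GR, HL + HR)) - gamma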
Method: Objective function
Two further techniques prevent overfitting: shrinkage (newly added leaf weights are scaled by a factor $\eta$) and column (feature) subsampling.
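In the open-source package these knobs correspond to the parameters below (the parameter names are the library's; the values are only illustrative):

import xgboost as xgb

params = {
    "eta": 0.1,               # shrinkage: scale newly added leaf weights
    "colsample_bytree": 0.8,  # column (feature) subsampling per tree
    "subsample": 0.8,         # row subsampling is also available
    "max_depth": 6,
}
# booster = xgb.train(params, dtrain, num_boost_round=100)   # dtrain: an xgb.DMatrix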
Method: Split finding
Two families of algorithms: the basic exact greedy algorithm, and the approximate algorithm with a global or a local proposal variant.
Method: Split finding
Basic exact greedy algorithm: for every feature, sort the instances by feature value and scan all possible split points, keeping the one with the largest gain. When to stop? For example, at a maximum depth, or by pruning splits whose gain is negative.
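A minimal sketch of the exact greedy scan for one feature (simplified, with my own variable names): sort once, then keep running sums of g and h while scanning the possible thresholds.

import numpy as np

def best_split_one_feature(x, g, h, lam=1.0, gamma=1.0):
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()
    GL = HL = 0.0
    best_gain, best_thr = -np.inf, None
    for i in range(len(x) - 1):
        GL += g[i]; HL += h[i]
        if x[i] == x[i + 1]:               # cannot split between equal values
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i] + x[i + 1]) / 2
    return best_thr, best_gain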
Method: Split finding
The exact greedy algorithm is appealing because it enumerates all possible splits, but when the data cannot fit in memory, thrashing slows the system down. Hence the approximate algorithm: propose a small set of candidate split points and aggregate the gradient statistics into the buckets they define.
Method: Split finding
Global vs. local proposals: the global variant proposes candidates once per tree (fewer proposal steps, but more candidate points are needed per step); the local variant re-proposes after every split.
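A toy illustration of the proposal step (plain unweighted percentiles here; the paper's weighted version is on the next slide):

import numpy as np

def candidate_splits(x, n_candidates=33):
    """Propose candidate split points from percentiles of the feature values."""
    return np.unique(np.percentile(x, np.linspace(0, 100, n_candidates)))

# Global variant: call this once per tree and reuse the candidates at every depth.
# Local variant: call it again for the instances that reach each node.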
Method: Split finding
Weighted quantile sketch: candidate points are chosen so that each interval carries approximately the same total second-order gradient weight $h_i$, i.e., the same "impact" on the objective function.
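A simplified, in-memory stand-in for this idea (the real data structure is a mergeable and prunable streaming summary; this sketch is not it): pick candidates so that each interval holds roughly the same total hessian weight.

import numpy as np

def weighted_candidates(x, h, eps=0.1):
    order = np.argsort(x)
    x, h = x[order], h[order]
    rank = np.cumsum(h) / np.sum(h)            # h-weighted rank of each sorted value
    targets = np.arange(eps, 1.0, eps)         # roughly 1/eps candidate points
    idx = np.minimum(np.searchsorted(rank, targets), len(x) - 1)
    return np.unique(x[idx])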
Method: Split finding
Sparsity-aware split finding. Possible reasons for sparsity: missing values, frequent zero entries, and artifacts of feature engineering (such as one-hot encoding). Solution: give each split a default direction for missing entries.
Method: Split finding
Learn the best default direction for each feature's split from the data: enumerate candidate splits over the present values only, trying both default directions. Sort criterion: missing values last. A sketch follows below.
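A sketch of this enumeration for one feature (NaN marks a missing value; the helper names are mine): only present values are scanned, and both default directions are tried for each threshold.

import numpy as np

def best_split_with_default(x, g, h, lam=1.0, gamma=1.0):
    present = ~np.isnan(x)
    g_miss, h_miss = g[~present].sum(), h[~present].sum()
    xp, gp, hp = x[present], g[present], h[present]
    order = np.argsort(xp)
    xp, gp, hp = xp[order], gp[order], hp[order]
    G, H = g.sum(), h.sum()
    GL = HL = 0.0
    best = (None, -np.inf, None)               # (threshold, gain, default direction)
    for i in range(len(xp) - 1):
        GL += gp[i]; HL += hp[i]
        if xp[i] == xp[i + 1]:
            continue
        for default in ("left", "right"):      # where do the missing values go?
            Gl = GL + (g_miss if default == "left" else 0.0)
            Hl = HL + (h_miss if default == "left" else 0.0)
            Gr, Hr = G - Gl, H - Hl
            gain = 0.5 * (Gl**2/(Hl + lam) + Gr**2/(Hr + lam) - G**2/(H + lam)) - gamma
            if gain > best[1]:
                best = ((xp[i] + xp[i + 1]) / 2, gain, default)
    return best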
Method: Split finding
Non-present entries are treated as missing values, and the algorithm only iterates over present entries. On the Allstate dataset this is about 50x faster than the naive version.
Method: System design
The most time-consuming part of tree learning is sorting the data, so sort just once: store the data in an in-memory unit called a block, with each column kept in sorted order.
Method: System design
Each block stores the data in compressed sparse column (CSC) format. Different blocks can be distributed across machines, or stored on disk in the out-of-core setting.
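A small illustration of CSC storage, and of feeding a sparse matrix to the library (xgb.DMatrix accepts SciPy sparse matrices; unstored entries are handled by the sparsity-aware algorithm):

import numpy as np
import scipy.sparse as sp
import xgboost as xgb

dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0]])
X = sp.csc_matrix(dense)                 # stored column by column
print(X.data, X.indices, X.indptr)       # values, row indices, column pointers
dtrain = xgb.DMatrix(X, label=np.array([0, 1, 0]))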
Method: System design
The block structure helps split finding, but reading gradient statistics in sorted feature order causes non-contiguous memory access. Solution (cache-aware access): allocate an internal buffer in each thread, fetch the gradient statistics into it, and accumulate in a mini-batch manner.
Method: System design
Block size (the maximum number of examples per block) matters: small blocks give each thread too little work and hurt parallelization, while overly large blocks lead to cache misses. The size must balance the two.
Method: System design
Out-of-core computation uses two techniques: block compression (each block is compressed by column and decompressed on the fly by an independent thread while it is read from disk) and block sharding (the data is sharded across multiple disks, with a prefetch thread assigned to each disk).
Experiment
The open-source package: github.com/dmlc/xgboost
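A minimal end-to-end usage sketch of the package (synthetic data; parameter values are only illustrative):

import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.rand(1000, 10)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "eta": 0.3, "max_depth": 4,
          "tree_method": "exact"}        # "approx"/"hist" select other split finders
booster = xgb.train(params, dtrain, num_boost_round=50)
preds = booster.predict(dtrain)          # predicted probabilities for the training rows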
Experiment
Classification: R's GBM expands only one branch of a tree per iteration, while the other two systems (scikit-learn and XGBoost) expand the full tree; this makes GBM faster but less accurate.
Experiment
Learning to rank: compared against pGBRT, the best previously published system for this task; note that pGBRT only supports the approximate algorithm.
Experiment
Out-of-core experiment: block compression gives roughly a 3x speedup, and sharding onto two disks gives a further 2x speedup.
Conclusion
The most important feature: scalability!
Lessons from building XGBoost: sparsity-aware split finding, the weighted quantile sketch, cache-aware access, and parallelization.
Fin.