Intelligible Models for Classification and Regression

Yin Lou¹, Rich Caruana², Johannes Gehrke¹
¹ Department of Computer Science, Cornell University
² Microsoft Research, Microsoft Corporation

Aug. 13, 2012
Motivation: Simple Models

- Linear regression, logistic regression
- Regression: y = β_0 + β_1 x_1 + ... + β_n x_n
- Classification: logit(y) = β_0 + β_1 x_1 + ... + β_n x_n
- Intelligible but usually less accurate
Motivation: Complex Models

- Random forests, SVMs with RBF kernel, etc.
- y = f(x_1, ..., x_n)
- Unintelligible but usually more accurate
Motivation: The Tradeoff

[Chart: complexity vs. intelligibility. High complexity, low intelligibility: SVMs with RBF kernel, random forests. Low complexity, high intelligibility: linear regression, logistic regression. A "?" marks the open region: models that are both accurate and intelligible.]

Intelligibility is important:
- Medical applications
- Domains where we want scientific understanding
- Efficient model engineering
- Impact of features in a ranker
Outline

1. Motivation
2. Towards More Accurate Models
3. Algorithms
4. Experiments
5. Discussion
6. Conclusion
Towards More Accurate Models
Generalized Additive Models

- Developed by Hastie and Tibshirani
- Regression: y = f_1(x_1) + ... + f_n(x_n)
- Classification: logit(y) = f_1(x_1) + ... + f_n(x_n)
- Each feature is "shaped" by a shape function f_i
- Intelligible and accurate

T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall/CRC, 1990.
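The additive form is simple enough to sketch directly. Below is a minimal, illustrative Python sketch (the names `gam_predict` and `shape_functions` are my own, not the authors' code): a GAM prediction is just the sum of per-feature shape functions.

```python
# Minimal sketch of the GAM form: prediction = f_1(x_1) + ... + f_n(x_n).
# Illustrative only; names and example shapes are assumptions.
import numpy as np

def gam_predict(X, shape_functions):
    """X: (N, n) data matrix; shape_functions: list of n callables f_i."""
    return sum(f(X[:, i]) for i, f in enumerate(shape_functions))

# Example: a hand-built additive model with two shaped features.
X = np.random.default_rng(0).uniform(1, 4, size=(5, 2))
print(gam_predict(X, [np.sqrt, np.sin]))
```

Because each f_i depends on a single feature, plotting f_i against x_i shows exactly how that feature contributes to the prediction, which is where the intelligibility comes from.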
Example

y = x_1 + x_2^2 + sqrt(x_3) + log(x_4) + e^{x_5} + 2 sin(x_6) + ε

Figure: Shape functions f_1(x_1), ..., f_6(x_6) recovered for the synthetic dataset.
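For concreteness, here is one way to generate such a synthetic dataset in Python. The feature ranges are inferred from the figure's axes and the noise is assumed standard normal, so treat this as a sketch rather than the authors' exact data generator.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x1 = rng.uniform(0, 4, N)
x2 = rng.uniform(0, 2, N)
x3 = rng.uniform(0, 15, N)
x4 = rng.uniform(1, 50, N)         # log term: keep x4 positive
x5 = rng.uniform(0, 1.4, N)
x6 = rng.uniform(0, 2 * np.pi, N)
eps = rng.normal(0, 1, N)          # assumed noise scale

y = x1 + x2**2 + np.sqrt(x3) + np.log(x4) + np.exp(x5) + 2 * np.sin(x6) + eps
```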
Model Space

| Model | Form | Intelligibility | Accuracy |
|---|---|---|---|
| Linear Model | y = β_0 + β_1 x_1 + ... + β_n x_n | +++ | + |
| Generalized Linear Model | g(y) = β_0 + β_1 x_1 + ... + β_n x_n | +++ | + |
| Additive Model | y = f_1(x_1) + ... + f_n(x_n) | ++ | ++ |
| Generalized Additive Model | g(y) = f_1(x_1) + ... + f_n(x_n) | ++ | ++ |
| Full Complexity Model | y = f(x_1, ..., x_n) | + | +++ |

Table: From Linear to Additive Models.
Algorithms
Fitting GAMs: g(y) = f_1(x_1) + ... + f_n(x_n)

Shape Functions:
- Splines (SP)
- Single Tree (TR)
- Bagged Trees (bagTR)
- Boosted Trees (bstTR)
- Boosted Bagged Trees (bbTR)

Learning Methods:
- Penalized Least Squares (P-LS/P-IRLS)
- Backfitting (BF)
- Gradient Boosting (BST)
Shape Function: Splines (SP)

f_i(x_i) = Σ_{k=1}^{d} β_k b_k(x_i)

Figure: an example spline shape function.
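A spline shape function is a fixed basis expansion with learned coefficients. The sketch below uses scikit-learn's `SplineTransformer` (available in scikit-learn >= 1.0) for the basis functions b_k; the knot count is arbitrary and the random coefficients stand in for ones that would be learned.

```python
# Sketch: f_i(x_i) = sum_{k=1}^{d} beta_k * b_k(x_i) on a B-spline basis.
import numpy as np
from sklearn.preprocessing import SplineTransformer  # scikit-learn >= 1.0

x = np.linspace(100, 500, 200).reshape(-1, 1)
basis = SplineTransformer(n_knots=8, degree=3).fit(x)
B = basis.transform(x)                        # (200, d) basis matrix b_k(x_i)
beta = np.random.default_rng(0).normal(size=B.shape[1])  # learned in practice
f_x = B @ beta                                # the shape function on a grid
```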
Shape Function: Single Tree (TR)

f_i(x_i) = RegressionTree(x_i, response)
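The single-tree shape function is a size-limited regression tree fit on one feature. A minimal sketch (the leaf count and toy data are arbitrary choices):

```python
# Sketch: f_i(x_i) = RegressionTree(x_i, response); one feature in, one value out.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, 1000)

tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(x, y)
f_i = lambda xi: tree.predict(xi.reshape(-1, 1))   # piecewise-constant shape
```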
Shape Function: Bagged Trees (bagTR)

f_i(x_i) = (1/B) Σ_{j=1}^{B} RegressionTree(x_i, bootstrap sample j)
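A sketch of the bagged version, assuming scikit-learn; averaging over bootstrap resamples smooths the piecewise-constant steps of a single tree.

```python
# Sketch: average B trees, each fit to a bootstrap resample of (x_i, response).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_shape(x, y, B=100, seed=0):
    rng = np.random.default_rng(seed)
    n, trees = len(x), []
    for _ in range(B):
        idx = rng.integers(0, n, n)                    # bootstrap sample j
        trees.append(DecisionTreeRegressor(max_leaf_nodes=8)
                     .fit(x[idx].reshape(-1, 1), y[idx]))
    # f_i(x_i) = (1/B) * sum_j tree_j(x_i)
    return lambda xi: np.mean([t.predict(xi.reshape(-1, 1)) for t in trees],
                              axis=0)
```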
Shape Function: Boosted Trees (bstTR)

f_i(x_i) = Σ_{j=1}^{B} RegressionTree(x_i, residual_j)
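A sketch of the boosted version: each stage fits the residual left by the previous stages. The shrinkage factor `lr` is my assumption, not something stated on the slide.

```python
# Sketch: stagewise boosting on one feature; each tree fits the current residual.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_shape(x, y, B=100, lr=0.1):
    trees, pred = [], np.zeros_like(y, dtype=float)
    for _ in range(B):
        t = DecisionTreeRegressor(max_leaf_nodes=4)
        t.fit(x.reshape(-1, 1), y - pred)              # fit residual_j
        trees.append(t)
        pred += lr * t.predict(x.reshape(-1, 1))
    # f_i(x_i) = sum_j lr * tree_j(x_i)
    return lambda xi: lr * sum(t.predict(xi.reshape(-1, 1)) for t in trees)
```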
Shape Function: Boosted Bagged Trees (bbTR)

f_i(x_i) = Σ_{j=1}^{B} BaggedRegressionTree(x_i, residual_j)
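This combines the two previous sketches: each boosting stage is itself a small bagged ensemble fit on the current residual. All sizes below (bags per stage, stages, leaves) are illustrative assumptions.

```python
# Sketch: boosting where each stage is a bagged ensemble of trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_stage(x, y, bags=10, seed=0):
    rng = np.random.default_rng(seed)
    n, trees = len(x), []
    for _ in range(bags):
        idx = rng.integers(0, n, n)                    # bootstrap resample
        trees.append(DecisionTreeRegressor(max_leaf_nodes=4)
                     .fit(x[idx].reshape(-1, 1), y[idx]))
    return lambda xi: np.mean([t.predict(xi.reshape(-1, 1)) for t in trees],
                              axis=0)

def fit_bb_shape(x, y, B=50):
    stages, pred = [], np.zeros_like(y, dtype=float)
    for j in range(B):
        stage = fit_bagged_stage(x, y - pred, seed=j)  # bag fit on residual_j
        stages.append(stage)
        pred += stage(x)
    return lambda xi: sum(s(xi) for s in stages)
```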
Learning Method: Penalized Least Squares (P-LS/P-IRLS)

- Works only on splines (f_i(x_i) = Σ_{k=1}^{d} β_k b_k(x_i))
- Converts the optimization problem into fitting a linear/logistic regression on a different basis

S. Wood. Generalized Additive Models: An Introduction with R. CRC Press, 2006.
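A sketch of the idea: expand every feature into a spline basis, then fit one penalized linear model over the stacked bases. Here Ridge stands in for the spline penalty developed in Wood (2006); this illustrates the basis-conversion idea, not the exact P-LS/P-IRLS algorithm. For classification (P-IRLS), one would swap Ridge for a penalized logistic regression.

```python
# Sketch: spline basis expansion + penalized linear fit = an additive model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1, size=(500, 3))
y = np.sqrt(X[:, 0]) + np.sin(6 * X[:, 1]) + X[:, 2] + rng.normal(0, 0.1, 500)

# SplineTransformer expands each column independently, so the linear model
# underneath decomposes into one shape function per feature: a GAM.
gam = make_pipeline(SplineTransformer(n_knots=10, degree=3), Ridge(alpha=1.0))
gam.fit(X, y)
```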
Learning Method: Backfitting (BF)

1: f_j ← 0
2: for m = 1 to M do
3:   for j = 1 to n do
4:     R ← {(x_ij, y_i − Σ_{k≠j} f_k(x_ik))}_{i=1}^{N}
5:     Learn shaping function S: x_j → y using R as the training dataset
6:     f_j ← S
7:   end for
8: end for

T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall/CRC, 1990.
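A compact sketch of this algorithm with trees as the shaper S (any one-dimensional regressor works in its place). Note line 6 of the pseudocode: each pass re-fits f_j to the partial residual and replaces it.

```python
# Sketch of backfitting: cycle through features, refit each f_j to the
# partial residual with all other shape functions held fixed, and REPLACE it.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def backfit(X, y, M=10, n_leaves=8):
    N, n = X.shape
    F = np.zeros((N, n))                        # F[:, j] = current f_j on data
    models = [None] * n
    for m in range(M):
        for j in range(n):
            r = y - (F.sum(axis=1) - F[:, j])   # y_i - sum_{k != j} f_k
            t = DecisionTreeRegressor(max_leaf_nodes=n_leaves).fit(X[:, [j]], r)
            models[j] = t                       # f_j <- S (replace)
            F[:, j] = t.predict(X[:, [j]])
    return models
```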
Learning Method: Gradient Boosting (BST)

1: f_j ← 0
2: for m = 1 to M do
3:   for j = 1 to n do
4:     R ← {(x_ij, y_i − Σ_k f_k(x_ik))}_{i=1}^{N}
5:     Learn shaping function S: x_j → y using R as the training dataset
6:     f_j ← f_j + S
7:   end for
8: end for

J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29:1189–1232, 2001.
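The difference from backfitting is in lines 4 and 6: the residual is now taken over all f_k (including f_j itself), and the newly learned S is added to f_j rather than replacing it. A sketch under the same assumptions as the backfitting example:

```python
# Sketch of the gradient-boosting variant: full residual, additive update.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_fit(X, y, M=100, n_leaves=4):
    N, n = X.shape
    F = np.zeros((N, n))
    ensembles = [[] for _ in range(n)]          # each f_j is a sum of trees
    for m in range(M):
        for j in range(n):
            r = y - F.sum(axis=1)               # y_i - sum_k f_k
            t = DecisionTreeRegressor(max_leaf_nodes=n_leaves).fit(X[:, [j]], r)
            ensembles[j].append(t)              # f_j <- f_j + S
            F[:, j] += t.predict(X[:, [j]])
    return ensembles
```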
Contributions

- First large-scale study that uses trees as shape functions for GAMs
- Novel methods for using trees as shape functions
- Largest empirical study of fitting GAMs
Experiments
Datasets

| Task | Dataset | Size | Attributes | %Pos |
|---|---|---|---|---|
| Regression | Concrete | 1030 | 9 | - |
| Regression | Wine | 4898 | 12 | - |
| Regression | Delta | 7192 | 6 | - |
| Regression | CompAct | 8192 | 22 | - |
| Regression | Music | 50000 | 90 | - |
| Regression | Synthetic | 10000 | 6 | - |
| Classification | Spambase | 4601 | 58 | 39.40 |
| Classification | Insurance | 9823 | 86 | 5.97 |
| Classification | Magic | 19020 | 11 | 64.84 |
| Classification | Letter | 20000 | 17 | 49.70 |
| Classification | Adult | 46033 | 9/43 | 16.62 |
| Classification | Physics | 50000 | 79 | 49.72 |
Methods

| Shape Function | Least Squares | Gradient Boosting | Backfitting |
|---|---|---|---|
| Splines | P-LS/P-IRLS | BST-SP | BF-SP |
| Single Tree | N/A | BST-TRx | BF-TR |
| Bagged Trees | N/A | BST-bagTRx | BF-bagTR |
| Boosted Trees | N/A | BST-TRx | BF-bstTRx |
| Boosted Bagged Trees | N/A | BST-bagTRx | BF-bbTRx |

Table: Notation for learning methods and shape functions.

- 9 different methods
- 5-fold cross-validation for each method
Results

| Model | Regression | Classification | Mean |
|---|---|---|---|
| Linear/Logistic | 1.68 | 1.22 | 1.45 |
| P-LS/P-IRLS | 1.00 | 1.00 | 1.00 |
| BST-SP | 1.04 | 1.00 | 1.02 |
| BF-SP | 1.00 | 1.00 | 1.00 |
| BST-bagTR2 | | | |
| BST-bagTR3 | | | |
| BST-bagTR4 | | | |
| BST-bagTRX | | | |
| Random Forest | 0.88 | 0.80 | 0.84 |

(Errors normalized to the P-LS/P-IRLS spline baseline of 1.00; lower is better. The BST-bagTR rows are filled in on later slides.)