Prediction, Estimation, and Attribution
Bradley Efron  brad@stat.stanford.edu
Department of Statistics, Stanford University
Regression  Gauss (1809), Galton (1877)
› Prediction: random forests, boosting, support vector machines, neural nets, deep learning
› Estimation: OLS, logistic regression, GLM: MLE
› Attribution (significance): ANOVA, lasso, Neyman–Pearson
Estimation: Normal Linear Regression
› Observe $y_i = \mu_i + \epsilon_i$ for $i = 1, \dots, n$, where $\mu_i = x_i^t \beta$ and $x_i$ is a $p$-dimensional covariate; in matrix form, $y_n = X_{n \times p}\, \beta_p + \epsilon_n$, with $\epsilon_i \sim N(0, \sigma^2)$ and $\beta$ unknown
› Surface plus noise: $y = \mu(x) + \epsilon$
› The surface $\{\mu(x),\ x \in \mathcal{X}\}$ codes the scientific truth (hidden by noise)
› Newton's second law: acceleration = force / mass
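To make the model concrete, here is a minimal R sketch of the surface-plus-noise setup; the design matrix, coefficient vector, and noise level are all invented for illustration, with $\beta$ recovered by OLS:

    # Simulate y = X beta + epsilon, then recover beta by OLS.
    set.seed(1)
    n <- 100; p <- 4
    X    <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # n x p design, intercept first
    beta <- c(2, -1, 0.5, 0)                                # "true" surface coefficients (invented)
    y    <- drop(X %*% beta) + rnorm(n, sd = 2)             # add N(0, sigma^2) noise, sigma = 2
    fit  <- lm(y ~ X - 1)                                   # OLS; X already contains the intercept
    coef(fit)                                               # estimates of beta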
[Figure: Newton's second law, acceleration = force/mass, drawn as a noiseless surface over the (force, mass) plane]
[Figure: "If Newton had done the experiment": the same acceleration surface, now observed with noise]
Example: The Cholesterol Data
› n = 164 men took cholestyramine
› Observe $(c_i, y_i)$: $c_i$ = normalized compliance (how much taken), $y_i$ = reduction in cholesterol
› Model: $y_i = x_i^t \beta + \epsilon_i$, with $x_i^t = (1, c_i, c_i^2, c_i^3)$ and $\epsilon_i \sim N(0, \sigma^2)$
› n = 164, p = 4
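A sketch of the cubic fit in R; the data frame chol and its columns c and y are hypothetical stand-ins for the real cholesterol data:

    # Cubic OLS regression of cholesterol decrease on normalized compliance.
    fit <- lm(y ~ c + I(c^2) + I(c^3), data = chol)   # x_i^t = (1, c_i, c_i^2, c_i^3)
    summary(fit)                                      # sigma-hat, adjusted R^2, coefficient tests
    plot(chol$c, chol$y, xlab = "normalized compliance", ylab = "cholesterol decrease")
    o <- order(chol$c)
    lines(chol$c[o], fitted(fit)[o])                  # the fitted cubic curve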
[Figure: OLS cubic regression of cholesterol decrease on normalized compliance; bars show 95% confidence intervals for the fitted curve. Adjusted R-squared = .481; $\hat\sigma$ = 21.9; only the intercept and linear coefficients are significant]
Neonate Example
› n = 800 babies in an African facility; 600 lived, 200 died
› 11 covariates: apgar score, body weight, ...
› Logistic regression, n = 800, p = 11: glm($y \sim X_{800 \times 11}$, binomial)
› $y_i = 1$ or $0$ as baby dies or lives; $x_i$ = $i$th row of $X$ (vector of 11 covariates)
› Linear logistic surface, Bernoulli noise
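The corresponding R call is a one-liner; the data frame neo and its column layout are hypothetical:

    # Logistic regression of the 0/1 outcome on all 11 covariates.
    fit <- glm(y ~ ., family = binomial, data = neo)  # neo: y plus gest, ap, bwei, ...
    summary(fit)$coefficients                         # estimate, st.error, z-value, p-value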
Output of the logistic regression program (predictive error 15%)

             estimate   st.error   z-value   p-value
    gest       -.474      .163      -2.91     .004**
    ap         -.583      .110      -5.27     .000***
    bwei       -.488      .163      -2.99     .003**
    resp        .784      .140       5.60     .000***
    cpap        .271      .122       2.21     .027*
    ment       1.105      .271       4.07     .000***
    rate       -.089      .176      -.507     .612
    hr          .013      .108       .120     .905
    head        .103      .111       .926     .355
    gen        -.001      .109      -.008     .994
    temp        .015      .124       .120     .905
Prediction Algorithms
Random forests, boosting, deep learning, ...
› Data: $d = \{(x_i, y_i),\ i = 1, 2, \dots, n\}$, $y_i$ = response, $x_i$ = vector of $p$ predictors (Neonate: n = 800, p = 11, y = 0 or 1)
› Prediction rule $f(x, d)$: a new case $(x, ?)$ gives $\hat y = f(x, d)$
› Strategy: go directly for high predictive accuracy; forget (mostly) about surface + noise
› "Machine learning"
Classification Using Regression Trees
› n cases: $n_0$ labeled "0" and $n_1$ labeled "1"
› p predictors (features) (Neonate: n = 800, $n_0$ = 600, $n_1$ = 200, p = 11)
› Split the cases into two groups, with the predictor and split value chosen to maximize the difference in observed rates
› Then split the splits, etc. (under some stopping rule)
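A single tree of this kind can be grown with the rpart package; the sketch assumes the same hypothetical neo data frame as before:

    # Grow one classification tree; rpart picks each split (predictor and cutpoint)
    # so the two daughter groups' outcome rates differ as much as possible.
    library(rpart)
    tree <- rpart(factor(y) ~ ., data = neo, method = "class")
    plot(tree); text(tree)    # e.g. a first split like cpap < 0.665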
[Figure: classification tree for the 800 neonates, 200 died (lived to the left, died to the right). The first split is cpap < 0.665, with further splits on gest, ap, and resp; leaf counts (lived/died) run from 544/73 down to the "worst bin" at 13/32]
Random Forests  (Breiman 2001)
1. Draw a bootstrap sample of the original n cases
2. Make a classification tree from the bootstrap data set, except at each split use only a random subset of the p predictors
3. Do all this lots of times (≈ 1000)
4. Prediction rule: for any new x, predict $\hat y$ = majority vote of the 1000 tree predictions
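Steps 1-4 are what the randomForest package automates; a sketch, with X_train, y_train, and X_test assumed given:

    # Random forest: 1000 bootstrapped trees, a random subset of the p predictors
    # tried at each split, and majority vote over trees for a new x.
    library(randomForest)
    rf <- randomForest(x = X_train, y = factor(y_train), ntree = 1000,
                       mtry = floor(sqrt(ncol(X_train))))  # predictors tried per split
    y_hat <- predict(rf, X_test)                           # majority-vote predictions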
The Prostate Cancer Microarray Study
› n = 100 men: 50 prostate cancer, 50 normal controls
› For each man measure activity of p = 6033 genes
› Data set d is a $100 \times 6033$ matrix ("wide")
› Wanted: a prediction rule $f(x, d)$ that inputs a new 6033-vector $x$ and outputs $\hat y$ correctly predicting cancer/normal
Random Forests for Prostate Cancer Prediction
› Randomly divide the 100 subjects into a "training set" of 50 subjects (25 + 25) and a "test set" of the other 50 (25 + 25)
› Run the R program randomForest on the training set
› Use its rule $f(x, d_{\mathrm{train}})$ on the test set and see how many errors it makes
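A sketch of the whole experiment, assuming an expression matrix X (100 x 6033) and a 0/1 label vector y standing in for the real data:

    set.seed(2)
    train <- c(sample(which(y == 1), 25), sample(which(y == 0), 25))  # 25 + 25 training set
    rf    <- randomForest(x = X[train, ], y = factor(y[train]), ntree = 500)
    y_hat <- predict(rf, X[-train, ])                                 # rule applied to the test set
    mean(y_hat != y[-train])                                          # test error rate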
[Figure: prostate cancer prediction using random forests, error rate vs number of trees. Black: cross-validated training error, 5.9%; red: test error, 2.0%]
Now with the boosting algorithm "gbm"
[Figure: error rates vs number of trees; training error 0%, test error 4%]
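A comparable boosting fit with the gbm package; X, y, and train are the same stand-ins as before, and the tuning values are illustrative, not the ones behind the figure:

    library(gbm)
    df_train <- data.frame(X[train, ]); df_train$y <- y[train]
    fit   <- gbm(y ~ ., data = df_train, distribution = "bernoulli",
                 n.trees = 400, interaction.depth = 2, shrinkage = 0.05)
    p_hat <- predict(fit, data.frame(X[-train, ]), n.trees = 400, type = "response")
    mean((p_hat > 0.5) != y[-train])                  # test error rate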
Now using deep learning ("Keras")
› Number of parameters = 780,738
[Figure: training and validation accuracy vs epoch]
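The slide gives only the parameter count, not the architecture. One dense network consistent with it is 6033 -> 128 -> 64 -> 2 (772,352 + 8,256 + 130 = 780,738 weights), sketched here with the R keras interface as a guess at what was fit:

    library(keras)
    model <- keras_model_sequential() %>%
      layer_dense(units = 128, activation = "relu", input_shape = 6033) %>%
      layer_dense(units = 64,  activation = "relu") %>%
      layer_dense(units = 2,   activation = "softmax")   # cancer vs normal
    model %>% compile(optimizer = "adam",
                      loss = "sparse_categorical_crossentropy",
                      metrics = "accuracy")
    model %>% fit(X[train, ], y[train], epochs = 500, validation_split = 0.2)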
Prediction is Easier than Estimation
› Observe $x_1, x_2, x_3, \dots, x_{25} \stackrel{ind}{\sim} N(\mu, 1)$; let $\bar x$ = mean, $\check x$ = median
› Estimation of $\mu$: $E\{(\mu - \check x)^2\} \big/ E\{(\mu - \bar x)^2\} = 1.57$
› Wish to predict a new $X_0 \sim N(\mu, 1)$
› Prediction: $E\{(X_0 - \check x)^2\} \big/ E\{(X_0 - \bar x)^2\} = 1.02$
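Both ratios are easy to check by Monte Carlo (taking $\mu = 0$ without loss of generality):

    set.seed(3)
    B <- 1e5                                     # Monte Carlo replications
    x    <- matrix(rnorm(B * 25), B, 25)         # each row: x_1, ..., x_25 with mu = 0
    xbar <- rowMeans(x)
    xmed <- apply(x, 1, median)
    mean(xmed^2) / mean(xbar^2)                  # estimation ratio, close to 1.57
    x0 <- rnorm(B)                               # new draws X_0 ~ N(mu, 1)
    mean((x0 - xmed)^2) / mean((x0 - xbar)^2)    # prediction ratio, close to 1.02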
Prediction is Easier than Attribution
› Microarray study, N genes: $z_j \stackrel{ind}{\sim} N(\delta_j, 1)$, $j = 1, 2, \dots, N$; $N_0$ null genes with $\delta_j = 0$, $N_1$ non-null with $\delta_j > 0$
› New subject's microarray: $x_j \sim N(\pm \delta_j, 1)$, $+$ if sick, $-$ if healthy
› Prediction: possible if $N_1 = O\big(N_0^{1/2}\big)$
› Attribution: requires $N_1 = O(N_0)$
› Prediction allows accrual of "weak learners"
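An idealized simulation of the accrual effect, using oracle weights (the true $\delta_j$), so it is a best-case sketch rather than a realizable rule; the constants N0, N1, and delta are invented:

    set.seed(4)
    N0 <- 10000; N1 <- 100; delta <- 0.3         # N1 = sqrt(N0) weak signals
    d  <- c(rep(delta, N1), rep(0, N0))
    B  <- 1000                                   # simulated new subjects of each type
    x_sick    <- matrix(rnorm(B * (N0 + N1), mean =  rep(d, each = B)), B)  # x_j ~ N(+delta_j, 1)
    x_healthy <- matrix(rnorm(B * (N0 + N1), mean = -rep(d, each = B)), B)  # x_j ~ N(-delta_j, 1)
    mean(x_sick    %*% d > 0)                    # about 0.999: accrued weak learners predict well
    mean(x_healthy %*% d < 0)
    z <- rnorm(N0 + N1, mean = d)                # one study's z-values
    sum(z > qnorm(1 - 0.05 / (N0 + N1)))         # genes passing Bonferroni: essentially 0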
Prediction and Medical Science
› The random forest test-set predictions made only 1 error out of 50!
› Promising for diagnosis
› Not so much for scientific understanding
› Next: "importance measures" for the predictor genes
[Figure: importance measures for the genes in the randomForest prostate analysis, plotted in decreasing order; the top two genes, #1022 and #5569, stand well above the rest]
Were the Test Sets Really a Good Test?
› Prediction can be highly context-dependent and fragile
› Before: subjects randomly divided into "training" and "test" sets
› Next: the 50 earliest subjects for training, the 50 latest for test (both 25 + 25)
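The split itself is trivial, assuming the rows of X and y are stored in order of subject entry:

    train <- 1:50                                  # the 50 earliest subjects
    test  <- 51:100                                # the 50 latest
    rf    <- randomForest(x = X[train, ], y = factor(y[train]), ntree = 500)
    mean(predict(rf, X[test, ]) != y[test])        # on the real data this error was 24%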
Random Forests: Train on the 50 Earliest, Test on the 50 Latest Subjects
[Figure: error rate vs number of trees. Training error 0%; test error 24%, versus the 2% obtained with the random split]