
Bias-Variance Tradeoff, David Dalpiaz, STAT 430, Fall 2017



  1. Bias-Variance Tradeoff, David Dalpiaz, STAT 430, Fall 2017

  2. Announcements
     • Homework 03 released
     • Regrade policy
     • Style policy?

  3. Statistical Learning
     • Supervised Learning
       • Regression
         • Parametric
         • Non-Parametric
       • Classification
     • Unsupervised Learning

  4. Regression Setup
     Given a random pair (X, Y) ∈ R^p × R. We would like to “predict” Y with some function of X, say, f(X).
     Define the squared error loss of estimating Y using f(X) as
         L(Y, f(X)) ≜ (Y − f(X))²
     We call the expected loss the risk of estimating Y using f(X):
         R(Y, f(X)) ≜ E[L(Y, f(X))] = E_{X,Y}[(Y − f(X))²]

  5. Minimizing Risk
     After conditioning on X,
         E_{X,Y}[(Y − f(X))²] = E_X E_{Y|X}[(Y − f(X))² | X = x]
     We see that the risk is minimized by the conditional mean
         f(x) = E(Y | X = x)
     We call this the regression function.
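The minimization step can be filled in by adding and subtracting E(Y | X = x) inside the square; a sketch of the standard argument (not spelled out on the slides):

```latex
\begin{aligned}
\mathbb{E}_{Y \mid X}\!\left[(Y - f(x))^2 \mid X = x\right]
&= \mathbb{E}\!\left[\big(Y - \mathbb{E}[Y \mid X = x] + \mathbb{E}[Y \mid X = x] - f(x)\big)^2 \,\middle|\, X = x\right] \\
&= \mathbb{V}[Y \mid X = x] + \big(\mathbb{E}[Y \mid X = x] - f(x)\big)^2
\end{aligned}
```

The cross term vanishes because E[Y − E[Y | X = x] | X = x] = 0. The first term does not depend on f, and the second is minimized (at zero) by taking f(x) = E(Y | X = x).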

  6. Estimating f
     Given data D = {(x_i, y_i)}, with (x_i, y_i) ∈ R^p × R, our goal is to find some f̂ that is a good estimate of the regression function f.

  7. Expected Prediction Error
         EPE(Y, f̂(X)) ≜ E_{X,Y,D}[(Y − f̂(X))²]

  8. Reducible and Irreducible Error
         EPE(Y, f̂(x)) = E_{Y|X,D}[(Y − f̂(X))² | X = x]
                       = E_D[(f(x) − f̂(x))²] + V_{Y|X}[Y | X = x]
     The first term is the reducible error; the second is the irreducible error.

  9. Bias and Variance
     Recall the definition of the bias of an estimator:
         bias(θ̂) ≜ E[θ̂] − θ
     Also recall the definition of the variance of an estimator:
         V(θ̂) = var(θ̂) ≜ E[(θ̂ − E[θ̂])²]
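These definitions can be checked by simulation. A small sketch in R (my own example, not from the slides): the “divide by n” variance estimator is biased for σ², while its variability across repeated samples estimates var(θ̂).

```r
set.seed(42)
n      = 10     # sample size
sigma2 = 4      # true population variance
n_sims = 50000  # number of simulated datasets

# biased estimator of sigma2: mean squared deviation, dividing by n
var_n = function(x) { mean((x - mean(x)) ^ 2) }

estimates = replicate(n_sims, var_n(rnorm(n, mean = 0, sd = sqrt(sigma2))))

mean(estimates) - sigma2 # estimated bias, close to -sigma2 / n = -0.4
var(estimates)           # estimated variance of the estimator
```

Here the theoretical bias is exactly −σ²/n, so the simulated bias should land near −0.4.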

  10. Bias and Variance
      [Figure 1: Dartboard Analogy of Bias and Variance]

  11. Bias-Variance Decomposition
          MSE(f(x), f̂(x)) ≜ E_D[(f(x) − f̂(x))²]
                           = (f(x) − E[f̂(x)])² + E[(f̂(x) − E[f̂(x)])²]
      The first term is bias²(f̂(x)) and the second is var(f̂(x)).

  12. Bias-Variance Decomposition
          MSE(f(x), f̂(x)) = bias²(f̂(x)) + var(f̂(x))
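The decomposition follows by adding and subtracting E[f̂(x)] inside the square; a sketch of the standard argument (not spelled out on the slides):

```latex
\begin{aligned}
\mathbb{E}_{\mathcal{D}}\!\left[\big(f(x) - \hat{f}(x)\big)^2\right]
&= \mathbb{E}\!\left[\big(f(x) - \mathbb{E}[\hat{f}(x)] + \mathbb{E}[\hat{f}(x)] - \hat{f}(x)\big)^2\right] \\
&= \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2
 + \mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]
 + 2\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)\,\mathbb{E}\!\left[\mathbb{E}[\hat{f}(x)] - \hat{f}(x)\right] \\
&= \operatorname{bias}^2\!\big(\hat{f}(x)\big) + \operatorname{var}\!\big(\hat{f}(x)\big)
\end{aligned}
```

The cross term is zero because E[E[f̂(x)] − f̂(x)] = 0.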

  13. Bias-Variance Decomposition
      [Figure: Decomposition of Prediction Error. Three panels plot squared bias, variance, and Bayes error (EPE) against model complexity; bias is more dominant at low complexity, variance at high complexity.]

  14. Expected Test Error
      [Figure: (Expected) Test and Train error versus model complexity. Low → High complexity corresponds to High → Low bias and Low → High variance.]

  15. Simulation Study, Regression Function
      We will illustrate these decompositions, most importantly the bias-variance tradeoff, through simulation. Suppose we would like to train a model to learn the true regression function f(x) = x².
      f = function(x) { x ^ 2 }

  16. Simulation Study, Regression Function
      More specifically, we’d like to predict an observation, Y, given that X = x by using f̂(x), where
          E[Y | X = x] = f(x) = x²  and  V[Y | X = x] = σ².

  17. Simulation Study, Data Generating Process
      To carry out a concrete simulation example, we need to fully specify the data generating process. We do so with the following R code.
      get_sim_data = function(f, sample_size = 100) {
        x = runif(n = sample_size, min = 0, max = 1)
        y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
        data.frame(x, y)
      }

  18. Simulation Study, Models
      Using this setup, we will generate datasets, D, with a sample size n = 100 and fit four models.
          predict(fit_0, x) = f̂_0(x) = β̂_0
          predict(fit_1, x) = f̂_1(x) = β̂_0 + β̂_1 x
          predict(fit_2, x) = f̂_2(x) = β̂_0 + β̂_1 x + β̂_2 x²
          predict(fit_9, x) = f̂_9(x) = β̂_0 + β̂_1 x + β̂_2 x² + … + β̂_9 x⁹

  19. Simulation Study, Trained Models
      [Figure: Four polynomial models (y ~ 1, y ~ poly(x, 1), y ~ poly(x, 2), y ~ poly(x, 9)) fit to a simulated dataset, plotted against the truth.]

  20. Simulation Study, Repeated Training
      [Figure: The intercept-only model (y ~ 1) and the degree-9 model (y ~ poly(x, 9)) fit to three simulated datasets.]

  21. Simulation Study, KNN
      [Figure: KNN fits with k = 5 and k = 100 on three simulated datasets.]
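The slides do not show the KNN code. A minimal sketch of how such fits could be produced, assuming the FNN package (the choice of FNN::knn.reg is mine, not from the slides); the data generating process is copied from the earlier slide:

```r
library(FNN)
set.seed(1)

# data generating process from the earlier slides
f = function(x) { x ^ 2 }
get_sim_data = function(f, sample_size = 100) {
  x = runif(n = sample_size, min = 0, max = 1)
  y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
  data.frame(x, y)
}

sim_data = get_sim_data(f)
x_grid   = data.frame(x = seq(0, 1, by = 0.01))

# small k: flexible, high-variance fit; large k: rigid, high-bias fit
pred_k5   = knn.reg(train = sim_data["x"], test = x_grid, y = sim_data$y, k = 5)$pred
pred_k100 = knn.reg(train = sim_data["x"], test = x_grid, y = sim_data$y, k = 100)$pred
```

With k = 100 (every neighbor, since n = 100), each prediction collapses to the overall mean of y, which is the KNN analogue of the intercept-only model.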

  22. Simulation Study, Setup
      set.seed(1)
      n_sims = 250
      n_models = 4
      x = data.frame(x = 0.90)
      predictions = matrix(0, nrow = n_sims, ncol = n_models)

  23. Simulation Study, Running Simulations
      for (sim in 1:n_sims) {
        sim_data = get_sim_data(f)
        # fit models
        fit_0 = lm(y ~ 1, data = sim_data)
        fit_1 = lm(y ~ poly(x, degree = 1), data = sim_data)
        fit_2 = lm(y ~ poly(x, degree = 2), data = sim_data)
        fit_9 = lm(y ~ poly(x, degree = 9), data = sim_data)
        # get predictions
        predictions[sim, 1] = predict(fit_0, x)
        predictions[sim, 2] = predict(fit_1, x)
        predictions[sim, 3] = predict(fit_2, x)
        predictions[sim, 4] = predict(fit_9, x)
      }

  24. Simulation Study, Results
      [Figure: Simulated predictions at x = 0.90, plotted by polynomial degree (0, 1, 2, 9).]

  25. Bias-Variance Tradeoff
      • As complexity increases, bias decreases.
      • As complexity increases, variance increases.

  26. Simulation Study, Quantities of Interest
          MSE(f(0.90), f̂_k(0.90)) = (E[f̂_k(0.90)] − f(0.90))² + E[(f̂_k(0.90) − E[f̂_k(0.90)])²]
      The first term is bias²(f̂_k(0.90)) and the second is var(f̂_k(0.90)).

  27. Estimation Using Simulation
          MSE-hat(f(0.90), f̂_k(0.90)) = (1 / n_sims) Σ_{i=1}^{n_sims} ( f(0.90) − f̂_k^[i](0.90) )²
          bias-hat(f̂_k(0.90)) = (1 / n_sims) Σ_{i=1}^{n_sims} f̂_k^[i](0.90) − f(0.90)
          var-hat(f̂_k(0.90)) = (1 / n_sims) Σ_{i=1}^{n_sims} ( f̂_k^[i](0.90) − (1 / n_sims) Σ_{j=1}^{n_sims} f̂_k^[j](0.90) )²
      where f̂_k^[i] denotes the degree-k model trained on the i-th simulated dataset.
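These estimators can be computed directly from the predictions matrix filled in on the previous slides. A self-contained sketch, combining the slides' simulation code with helper names (get_mse, get_bias, get_var) that are my own, not from the slides:

```r
set.seed(1)

# true regression function and data generating process (from earlier slides)
f = function(x) { x ^ 2 }
get_sim_data = function(f, sample_size = 100) {
  x = runif(n = sample_size, min = 0, max = 1)
  y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
  data.frame(x, y)
}

n_sims = 250
x = data.frame(x = 0.90)
predictions = matrix(0, nrow = n_sims, ncol = 4)

for (sim in 1:n_sims) {
  sim_data = get_sim_data(f)
  fit_0 = lm(y ~ 1, data = sim_data)
  fit_1 = lm(y ~ poly(x, degree = 1), data = sim_data)
  fit_2 = lm(y ~ poly(x, degree = 2), data = sim_data)
  fit_9 = lm(y ~ poly(x, degree = 9), data = sim_data)
  predictions[sim, ] = c(predict(fit_0, x), predict(fit_1, x),
                         predict(fit_2, x), predict(fit_9, x))
}

# hypothetical helper names; each implements an estimator from this slide
get_mse  = function(est, truth) { mean((truth - est) ^ 2) }
get_bias = function(est, truth) { mean(est) - truth }
get_var  = function(est)        { mean((est - mean(est)) ^ 2) }

mse      = apply(predictions, 2, get_mse,  truth = f(0.90))
bias     = apply(predictions, 2, get_bias, truth = f(0.90))
variance = apply(predictions, 2, get_var)

# sanity check: the decomposition MSE = bias^2 + var holds exactly
# (up to floating point) for these plug-in estimators
all.equal(mse, bias ^ 2 + variance)
```

Note that the identity MSE-hat = bias-hat² + var-hat is exact for these plug-in estimators, not just approximate.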

  28. Simulation Study, Results
      Degree   Mean Squared Error   Bias Squared   Variance
      0        0.22643              0.22476        0.00167
      1        0.00829              0.00508        0.00322
      2        0.00387              0.00005        0.00381
      9        0.01019              0.00002        0.01017

  29. If Time
      • Note that f̂_9(x) is unbiased
      • Some live coding
