

  1. Diagnostics Gad Kimmel

  2. Outline ● Introduction. ● Bootstrap method. ● Cross validation. ● ROC plot.

  3. Introduction

  4. Motivation ● Estimating properties of an estimator (an estimator is a function of input points). − Given data samples $x_1, x_2, \ldots, x_N$, evaluate some estimator, say the average: $\frac{1}{N}\sum_i x_i$. − How can we estimate its properties (e.g., its variance)? $\mathrm{var}\!\left(\frac{\sum_i x_i}{N}\right) = \frac{1}{N^2}\,\mathrm{var}\!\left(\sum_i x_i\right)$. ● Model selection. − How many parameters should we use?

  5. Bootstrap Method

  6. Evaluating Accuracy ● A simple approach for accuracy estimation is to provide the bias or variance of the estimator. ● Example: suppose the samples are independent and identically distributed (i.i.d.), with finite variance. − We know, by the central limit theorem, that $\sqrt{n}\,(\bar{x}_n - \mu)/\sigma \to Z \sim N(0, 1)$. − Roughly speaking, $\bar{x}_n$ is normally distributed with expectation $\mu$ and variance $\sigma^2 / n$.
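The CLT statement above is easy to check numerically. A minimal sketch (our own code, using Uniform(0, 1) draws, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
import random

gen = random.Random(0)

def sample_mean(n):
    # Mean of n i.i.d. Uniform(0, 1) draws: mu = 1/2, sigma^2 = 1/12.
    return sum(gen.random() for _ in range(n)) / n

# Draw many sample means and check the CLT prediction:
# x_bar_n is roughly N(mu, sigma^2 / n).
n, m = 100, 2000
means = [sample_mean(n) for _ in range(m)]
avg = sum(means) / m
var = sum((x - avg) ** 2 for x in means) / m
print(avg, var)  # avg near 0.5, var near (1/12)/100
```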

  7. Assumptions Do Not Hold ● What if the r.v. are not i.i.d.? ● What if we want to evaluate another estimator (and not $\bar{x}_n$)? ● It would be nice to have many different samples of samples. ● In that case, one could calculate the estimator for each sample of samples, and infer its distribution. ● But... we don't have them.

  8. Solution - Bootstrap ● Estimating the sampling distribution of an estimator by resampling with replacement from the original sample. ● Efron, The Annals of Statistics , '79.

  9. Bootstrap - Illustration ● Goal: Sampling from P.

  10. Bootstrap - Illustration ● Goal: Sampling from P: $x_1, x_2, x_3, x_4, \ldots, x_n$.

  11. Bootstrap - Illustration ● Goal: Sampling from P: $x_1, x_2, x_3, x_4, \ldots, x_n$ ... in order to estimate the variance of an estimator.

  12. Bootstrap - Illustration ● Draw m samples from P and compute the estimator on each. Samples → Estimator: $x_{1,1}, x_{1,2}, \ldots, x_{1,n} \to e_1$; $x_{2,1}, \ldots, x_{2,n} \to e_2$; $\ldots$; $x_{m,1}, \ldots, x_{m,n} \to e_m$.

  13. Bootstrap - Illustration ● Samples → Estimator: $x_{1,1}, \ldots, x_{1,n} \to e_1$; $x_{2,1}, \ldots, x_{2,n} \to e_2$; $\ldots$; $x_{m,1}, \ldots, x_{m,n} \to e_m$. ● What is the variance of $e$?

  14. Bootstrap - Illustration ● Samples → Estimator: $x_{1,1}, \ldots, x_{1,n} \to e_1$; $x_{2,1}, \ldots, x_{2,n} \to e_2$; $\ldots$; $x_{m,1}, \ldots, x_{m,n} \to e_m$. ● Estimate the variance by $\widehat{\mathrm{var}}(e) = \frac{1}{m} \sum_{i=1}^{m} (e_i - \bar{e})^2$.

  15. Bootstrap - Illustration ● We only have 1 sample from P: $x_1, x_2, x_3, x_4, \ldots, x_n$.

  16. Bootstrap - Illustration ● Sampling is done from the empirical distribution of the single sample $x_1, x_2, \ldots, x_n$. Samples → Estimator: $z_{1,1}, \ldots, z_{1,n} \to e_1$; $z_{2,1}, \ldots, z_{2,n} \to e_2$; $\ldots$; $z_{m,1}, \ldots, z_{m,n} \to e_m$.

  17. Formalization ● The data is $x_1, x_2, \ldots, x_n \sim P$. Note that the distribution function P is unknown. ● We sample m samples $Y_1, Y_2, \ldots, Y_m$. Each $Y_i = (z_{i,1}, z_{i,2}, \ldots, z_{i,n})$ contains n samples drawn from the empirical distribution of the data: $\Pr[z_{j,k} = x_i] = \frac{\#x_i}{n}$, where $\#x_i$ is the number of times $x_i$ appears in the original data.
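The resampling scheme above fits in a few lines. A minimal, hypothetical sketch (function names are ours), estimating the variance of the sample mean by resampling with replacement:

```python
import random
import statistics

def bootstrap_variance(data, estimator, m=1000, rng=None):
    """Draw m bootstrap samples Y_i from the empirical distribution
    (resampling with replacement) and return the variance of the
    estimator values e_1, ..., e_m."""
    rng = rng or random.Random(0)
    n = len(data)
    estimates = [estimator([rng.choice(data) for _ in range(n)])
                 for _ in range(m)]
    mean_e = sum(estimates) / m
    return sum((e - mean_e) ** 2 for e in estimates) / m

gen = random.Random(1)
data = [gen.gauss(0, 1) for _ in range(50)]  # one observed sample, n = 50
var_mean = bootstrap_variance(data, statistics.mean)
# For the mean, this should be close to the theoretical sigma^2 / n = 1/50.
```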

  18. The Main Idea ● $Y_i \sim \hat{P}$. ● We wish that $P = \hat{P}$. Is it (always) true? NO. ● Rather, $\hat{P}$ is an approximation of $P$.

  19. Example 1 ● The yield of the Dow Jones Index over the past two years is ~12%. ● You are considering a broker that had a yield of 25%, by picking specific stocks from the Dow Jones. ● Let x be a r.v. that represents the yield of randomly selected stocks. ● Do we know the distribution of x ?

  20. Example 1 ● Prepare a sample $x_1, x_2, \ldots, x_{10{,}000}$, where each $x_i$ is the yield of randomly selected stocks. ● Approximate the distribution of x using this sample.

  21. Evaluation of Estimators ● Using the approximate distribution, we can evaluate estimators. E.g.: − Variance of the mean. − Confidence intervals.

  22. Example 1 ● What is the probability to obtain yield larger than 25% (p-value)?

  23. Example 1 ● What is the probability to obtain yield larger than 25% (p-value)? 30%
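The p-value in Example 1 is just an empirical tail frequency over the sampled portfolio yields. A minimal sketch with synthetic stand-in yields (the real 10,000 yields are not given, so the distribution here is our own illustrative assumption, not the slide's 30% answer):

```python
import random

def tail_pvalue(yields, threshold=0.25):
    """Empirical probability of a yield at least as large as the
    broker's 25%: the fraction of sampled portfolios at/above it."""
    return sum(1 for y in yields if y >= threshold) / len(yields)

# Hypothetical stand-in for x_1, ..., x_10000: random-portfolio yields
# centered on the index's ~12%.
gen = random.Random(0)
yields = [gen.gauss(0.12, 0.15) for _ in range(10_000)]
p = tail_pvalue(yields)
```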

  24. Example 2 - Decision tree ● Decision tree - short introduction.

  25. Example 2 ● Building a decision tree.

  26. Example 2 ● Many other trees can be built, using different algorithms. ● For a specific tree one can calculate prediction accuracy: (# of elements classified correctly) / (total # of elements).

  27. Example 2 ● Many other trees can be built, using different algorithms. ● For a specific tree one can calculate prediction accuracy: (# of elements classified correctly) / (total # of elements). ● To calculate error bars for this value, we need to sample more, apply the algorithm many times, and evaluate the prediction each time.

  28. Example 2 - Applying Bootstrap Build decision tree for each sample. Calculate prediction for each tree. Evaluate error bars based on predictions.

  29. Example 2 - Applying Bootstrap ● Build decision tree $T_1, T_2, \ldots, T_n$ for each sample. ● Calculate prediction $p_1, p_2, \ldots, p_n$ for each tree. ● Evaluate error bars $\pm 1.96\,\mathrm{STD}(p_1, p_2, \ldots, p_n)$ based on predictions.

  30. Example 2 - Applying Bootstrap ● But we have only one data set! ● Build decision tree for each sample. Calculate prediction for each tree. Evaluate error bars based on predictions.

  31. Example 2 - Applying Bootstrap ● Use bootstrap to prepare many samples. ● Build decision tree for each sample. Calculate prediction for each tree. Evaluate error bars based on predictions.
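The pipeline above (resample, refit, collect accuracies, ±1.96·STD) can be sketched end to end. A real decision-tree learner would need more code, so this hypothetical sketch substitutes a one-threshold "stump" classifier; the bootstrap logic is the part being illustrated:

```python
import random

def accuracy(classify, sample):
    """# of elements classified correctly / total # of elements."""
    return sum(1 for x, label in sample if classify(x) == label) / len(sample)

def fit_stump(sample):
    # Hypothetical stand-in learner: threshold at the sample's mean x.
    t = sum(x for x, _ in sample) / len(sample)
    return lambda x: int(x > t)

def bootstrap_error_bars(data, fit, m=500, rng=None):
    """Resample the single data set m times, refit on each bootstrap
    sample, and return (mean accuracy, 1.96 * STD) as error bars."""
    rng = rng or random.Random(0)
    n = len(data)
    accs = []
    for _ in range(m):
        boot = [rng.choice(data) for _ in range(n)]
        accs.append(accuracy(fit(boot), boot))
    mean = sum(accs) / m
    std = (sum((a - mean) ** 2 for a in accs) / m) ** 0.5
    return mean, 1.96 * std

gen = random.Random(1)
data = ([(gen.gauss(1, 1), 1) for _ in range(40)] +
        [(gen.gauss(-1, 1), 0) for _ in range(40)])
mean_acc, bar = bootstrap_error_bars(data, fit_stump)
```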

  32. Cross Validation

  33. Objective ● Model selection.

  34. Formalization ● Let $(x, y)$ be drawn from distribution P, where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$. ● Let $f_\theta : \mathbb{R}^n \to \mathbb{R}$ be a learning algorithm, with parameter(s) $\theta$.

  35. Example ● Regression model.

  36. What Do We Want? ● We want the method that is going to predict future data most accurately, assuming they are drawn from the distribution P .

  37. What Do We Want? ● We want the method that is going to predict future data most accurately, assuming they are drawn from the distribution P . ● Niels Bohr: " It is very difficult to make an accurate prediction, especially about the future. "

  38. Choosing the Best Model ● For a sample $(x, y)$ which is drawn from the distribution function P, measure the prediction error $(f_\theta(x) - y)^2$ or $|f_\theta(x) - y|$. ● Since $(x, y)$ is a r.v., we are usually interested in $E[(f_\theta(x) - y)^2]$.

  39. Choosing the Best Model (cont.) ● Choose the parameter(s) $\theta$: $\hat{\theta} = \operatorname{argmin}_\theta E[(f_\theta(x) - y)^2]$. ● The problem is that we don't know how to sample from P.

  40. Regression − Order of 1 (Linear) [plot]

  41. Regression − Order of 2 [plot]

  42. Regression − Order of 3 [plot]

  43. Regression − Order of 4 [plot]

  44. Regression − Order of 5 [plot]

  45. Regression − Join the Dots [plot]

  46. Solution - Cross Validation ● Partition the data into 2 sets: − Training set T. − Test set S. ● Calculate $\hat{\theta}$ using only the training set T. ● Given $\hat{\theta}$, calculate $\frac{1}{|S|} \sum_{(x_i, y_i) \in S} (f_{\hat{\theta}}(x_i) - y_i)^2$.

  47. Back to the Example ● In our case, we should try different orders for the regression (or different # of params). ● Each time, apply the regression only on the training set, and calculate the estimation error on the test set. ● The chosen # of parameters is the one minimizing the test error.

  48. Variants of Cross Validation ● Test - set. ● Leave one out. ● k-fold cross validation.

  49. K-fold Cross Validation [illustration: the data split into K folds − Train | Train | Test | Train | Train]

  50. K-fold Cross Validation ● We want to find a parameter $\theta$ that minimizes the cross validation estimate of prediction error: $CV(\theta) = \frac{1}{N} \sum_{i=1}^{N} L\!\left(y_i, f^{-k(i)}(x_i, \theta)\right)$, where $f^{-k(i)}$ is the model fitted with the fold containing point i removed.

  51. K-fold Cross Validation ● How to choose K? ● K=N (= leave one out): CV is unbiased for the true prediction error, but can have high variance. ● As K decreases, CV has lower variance, but bias could be a problem (depending on how the performance of the learning method varies with the size of the training set).
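The CV(θ) estimate above is straightforward to implement directly. A minimal sketch (function names are ours), with a trivial constant-predictor "learner" and squared loss standing in for f and L:

```python
def k_fold_cv(data, fit, loss, k=5):
    """CV estimate of prediction error:
    CV = (1/N) * sum_i L(y_i, f^{-k(i)}(x_i)),
    where f^{-k(i)} is trained with the fold containing point i held out."""
    n = len(data)
    folds = [data[j::k] for j in range(k)]
    total = 0.0
    for j in range(k):
        train = [p for i, fold in enumerate(folds) if i != j for p in fold]
        model = fit(train)
        total += sum(loss(y, model(x)) for x, y in folds[j])
    return total / n

def fit_mean(points):
    # Trivial stand-in learner: always predict the training-set mean of y.
    m = sum(y for _, y in points) / len(points)
    return lambda x: m

def squared_loss(y, y_hat):
    return (y - y_hat) ** 2

data = [(x, float(x % 3)) for x in range(30)]
cv = k_fold_cv(data, fit_mean, squared_loss)
```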

  52. ROC Plot (Receiver Operating Characteristic)
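A ROC plot sweeps a decision threshold over a classifier's scores and plots true-positive rate against false-positive rate. A minimal sketch of computing the curve's points (the scores and labels are our own toy example):

```python
def roc_points(scores, labels):
    """Sweep a threshold over the classifier scores and return
    (false-positive-rate, true-positive-rate) pairs for the ROC plot."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score; lowering the threshold admits one
    # more point at a time.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
```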
