Carnegie Mellon University
10-701 Machine Learning, Spring 2013
Recitation 1: Statistics Intro (2/12/13)

Bias-Variance Trade-off
- Intuition:
  - If the model is too simple, the solution is biased and does not fit the data.
  - If the model is too complex, it is very sensitive to small changes in the data.
Bias
- If you sample a dataset D multiple times, you expect to learn a different h(x).
- The expected hypothesis is E_D[h(x)].
- Bias: the difference between the truth t(x) and what you expect to learn:

  bias^2 = \int_x \{ E_D[h(x)] - t(x) \}^2 \, p(x) \, dx

- Decreases with more complex models.
Variance
- Variance: the difference between what you learn from a particular dataset and what you expect to learn:

  variance = \int_x E_D[ (h(x) - \bar{h}(x))^2 ] \, p(x) \, dx,   where \bar{h}(x) = E_D[h(x)]

- Decreases with simpler models.
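Both quantities can be estimated by simulation when the truth is known: resample D many times, fit h on each sample, and average. A minimal sketch, assuming a hypothetical true function t(x) = sin(2*pi*x) and polynomial fits (none of these choices come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def t(x):
    # Hypothetical "true" function (an assumption for illustration).
    return np.sin(2 * np.pi * x)

def sample_dataset(n=30, noise=0.3):
    x = rng.uniform(0.0, 1.0, n)
    return x, t(x) + noise * rng.standard_normal(n)

def predictions(degree, x_grid, n_repeats=200):
    """Fit h on many resampled datasets D; return h(x_grid) for each fit."""
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        x, y = sample_dataset()
        coeffs = np.polyfit(x, y, degree)       # least-squares polynomial fit
        preds[r] = np.polyval(coeffs, x_grid)
    return preds

x_grid = np.linspace(0.0, 1.0, 200)             # stands in for the integral over p(x)
for degree in (1, 3, 9):
    preds = predictions(degree, x_grid)
    h_bar = preds.mean(axis=0)                            # E_D[h(x)]
    bias2 = np.mean((h_bar - t(x_grid)) ** 2)             # avg of {E_D[h(x)] - t(x)}^2
    variance = np.mean(preds.var(axis=0))                 # avg of E_D[(h(x) - h_bar(x))^2]
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {variance:.3f}")
```

Running this typically shows bias^2 falling and variance rising as the degree grows, matching the two bullets above.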
Bias-Variance Tradeoff
- The choice of hypothesis class introduces a learning bias.
- A more complex class: less bias, more variance.
Training error
- Given a dataset, choose a loss function (e.g., L2 for regression).
- Training set error (0/1 loss for classification, squared loss for linear regression):

  error_train = \frac{1}{N_{train}} \sum_{i=1}^{N_{train}} I(y_i \neq h(x_i))

  error_train = \frac{1}{N_{train}} \sum_{i=1}^{N_{train}} (y_i - w \cdot x_i)^2
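Both formulas translate directly to code. A minimal sketch in NumPy (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def train_error_01(y, y_hat):
    """0/1 training error: fraction of training points where y_i != h(x_i)."""
    return np.mean(y != y_hat)

def train_error_l2(y, X, w):
    """Squared-loss training error for a linear model h(x_i) = w . x_i."""
    return np.mean((y - X @ w) ** 2)
```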
Training error as a function of complexity
[Figure omitted: training error vs. model complexity]
Prediction error
- Training error is not necessarily a good measure.
- We care about the error over all input points:

  error_true = E_x[ I(y \neq h(x)) ]
Prediction error as a function of complexity
[Figure omitted: prediction error vs. model complexity]
Prediction error (continued)
- Training error is an optimistically biased estimate of the prediction error, because the parameters were optimized with respect to the training set.
Train-test
- In practice:
  - Randomly divide the dataset into train and test sets.
  - Use the training data to optimize parameters.
- Test error:

  error_test = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} I(y_i \neq h(x_i))
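A minimal sketch of this recipe, assuming a hypothetical synthetic linear-regression dataset (the data, split fraction, and least-squares fit are illustrative choices, not prescribed by the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_test_split(X, y, test_frac=0.2):
    """Randomly divide the dataset into train and test sets."""
    idx = rng.permutation(len(y))
    n_test = int(test_frac * len(y))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

# Hypothetical dataset: y = X w_true + noise.
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(200)

X_tr, y_tr, X_te, y_te = train_test_split(X, y)
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # optimize parameters on train only
error_train = np.mean((y_tr - X_tr @ w) ** 2)
error_test = np.mean((y_te - X_te @ w) ** 2)      # (1/N_test) * sum of test losses
print(error_train, error_test)
```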
Test error as a function of complexity
[Figure omitted: test error vs. model complexity]
Overfitting
- Overfitting happens when we obtain a model h while there exists another solution h' such that:

  [ error_train(h) < error_train(h') ] \wedge [ error_true(h) > error_true(h') ]
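One way to see this condition concretely is to compare polynomial fits of increasing degree on a small dataset. A rough sketch under assumed data (the true function, noise level, and split sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D regression data.
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(40)
x_tr, y_tr = x[:25], y[:25]
x_te, y_te = x[25:], y[25:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    err_train = np.mean((y_tr - np.polyval(coeffs, x_tr)) ** 2)
    err_test = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
    print(f"degree {degree}: train {err_train:.3f}  test {err_test:.3f}")

# Typically the high-degree fit h has lower training error but higher test error
# than a moderate-degree fit h' -- exactly the pair in the overfitting definition.
```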
Error as a function of data size for fixed complexity
[Figure omitted: error vs. training-set size at fixed model complexity]
Careful
- The test set gives an unbiased error estimate only if you never do any learning on it (including parameter selection!).
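One common way to respect this rule (not spelled out on the slide) is a three-way split: fit on the training set, choose hyperparameters on a validation set, and touch the test set only once at the end. A minimal sketch with hypothetical data and polynomial-degree selection:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 90)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(90)

# Train for fitting, validation for choosing the degree, test used only once.
x_tr, y_tr = x[:50], y[:50]
x_va, y_va = x[50:70], y[50:70]
x_te, y_te = x[70:], y[70:]

def mse(coeffs, x, y):
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

fits = {d: np.polyfit(x_tr, y_tr, d) for d in (1, 2, 3, 5, 9)}
best_d = min(fits, key=lambda d: mse(fits[d], x_va, y_va))   # select on validation
print("chosen degree:", best_d, "test error:", mse(fits[best_d], x_te, y_te))
```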