About this class

The problem of overfitting and how to deal with it.

Modifying logistic regression training to avoid overfitting.

Evaluating classifiers: accuracy, precision, recall, ROC curves.

Overfitting

Many hypotheses are consistent with, or close to, the data.

With enough features and a rich enough hypothesis space, it becomes easy to find meaningless regularity in the data. Day/Month/Rain may give you a function that exactly matches the outcomes of dice rolls, but would that function be a good predictor of future dice rolls?

Linear regression vs. polynomial example: how do you decide on a particular preference? Simpler functions vs. more complex ones. But how do we define complexity in all cases?
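To make the linear-vs-polynomial comparison concrete, here is a small sketch (not from the slides) that fits a degree-1 and a degree-9 polynomial to the same noisy sample; the data-generating function, the noise level, and the degrees are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from an assumed simple linear trend (for illustration only).
x_train = np.linspace(0, 1, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)   # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The degree-9 polynomial can pass through all ten training points, so its training error is essentially zero, but its test error is typically worse than the straight line's: it has fit the noise, not the regularity.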
Regularization

Prevent overfitting by building an explicit penalty for model complexity into the objective you optimize on the training data.

But what counts as simpler? Simpler polynomial: lower degree. Simpler decision tree: less depth. Simpler linear function: lower weights?

We have to make a tradeoff between fit to the training data and complexity. Sometimes we will see that we can make this tradeoff mathematically explicit.

Other possibilities: statistical significance, simulating unseen test data.

We'll see a number of different examples, but let's examine regularization in the context of logistic regression (Section 3.3 of the Mitchell chapter); a sketch follows below.
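Below is a minimal sketch of what this looks like for logistic regression, assuming the L2-penalized conditional log-likelihood of Mitchell's Section 3.3: maximize the log-likelihood minus (lambda/2) times the squared weight norm, which turns gradient ascent into gradient ascent with a weight-shrinkage term. The learning rate, lambda, iteration count, and the synthetic data are illustrative assumptions, not prescribed values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_l2(X, y, lam=1.0, lr=0.5, n_iters=2000):
    """Gradient ascent on the L2-penalized log-likelihood.

    Objective: sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ] - (lam / 2) * ||w||^2
    X is an (n, d) feature matrix, y holds labels in {0, 1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (y - p) - lam * w   # the penalty term shrinks weights toward zero
        grad_b = np.sum(y - p)             # the bias is conventionally left unpenalized
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b

# Tiny synthetic check (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)
print(train_logistic_l2(X, y, lam=1.0))
```

Larger values of lam push the weights closer to zero (a simpler linear function in the sense above), trading training fit for lower complexity.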
Evaluating Accuracy: Random Training/Test Splits

Divide the dataset into a training set and a test set.

Apply the learning algorithm to the training set, generating hypothesis h.

Classify the examples in the test set using h, and measure the percentage of correct predictions made by h (accuracy).

Repeat a set number of times.

Repeat the whole thing for differently sized training and test sets, if you want to construct a learning curve...

How do you compute confidence intervals for test accuracy?

What is the central limit theorem? The sum of independent random variables with finite mean and variance tends to normality in the limit. Notice: there is no condition on the distribution from which they are drawn! Therefore, the mean of the independent r.v.s tends to normality as well.

The sampling distribution of the mean then tells us that a 95% confidence interval is given by mean ± 1.96 σ̂/√n.
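As a concrete sketch of that interval, the snippet below takes the accuracies measured on repeated random train/test splits and applies mean ± 1.96 σ̂/√n; the accuracy values are made up for illustration, and the same formula applies to any collection of independent accuracy estimates.

```python
import numpy as np

def accuracy_confidence_interval(accuracies, z=1.96):
    """Approximate 95% confidence interval for mean accuracy, via the CLT."""
    accuracies = np.asarray(accuracies, dtype=float)
    n = accuracies.size
    mean = accuracies.mean()
    sigma_hat = accuracies.std(ddof=1)        # sample standard deviation
    half_width = z * sigma_hat / np.sqrt(n)   # 1.96 * sigma_hat / sqrt(n)
    return mean - half_width, mean + half_width

# Accuracies from, say, 10 random splits (made-up numbers).
accs = [0.81, 0.78, 0.84, 0.80, 0.79, 0.83, 0.82, 0.77, 0.85, 0.80]
print(accuracy_confidence_interval(accs))
```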
Cross-Validation

Another possible method if you have two different candidate models: attempt to estimate the accuracy of the models on simulated "test" data.

Standard approach: n-fold cross-validation (very typical: n = 10). Divide the data into n equally sized sets. Train on n − 1 of them and test on the remaining one. Repeat for all n folds.

Is the accuracy of the better one then a good estimate of expected accuracy on unseen test data? If you tune your parameters in any way on training data (including for model selection), you must test on fresh test data to get a good estimate!

Leave-One-Out Cross-Validation

Just what it sounds like: train on all examples except one, and then test on that one example. Repeat for all examples in the training data.

Most efficient use of the available data in terms of getting an estimate of accuracy.

Can be horribly computationally inefficient, unless you can figure out a smart way to retrain without throwing away everything when swapping in one example for another.
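A minimal sketch of plain n-fold cross-validation is below. The fit and predict callables are assumed interfaces for the sketch (any classifier that can be trained and then queried for labels will do); setting n_folds equal to the number of examples gives leave-one-out.

```python
import numpy as np

def cross_val_accuracy(X, y, fit, predict, n_folds=10, seed=0):
    """n-fold cross-validation: train on n - 1 folds, test on the held-out fold.

    fit(X_train, y_train) -> model and predict(model, X_test) -> labels
    are assumed interfaces, not a fixed API.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))            # shuffle before splitting
    folds = np.array_split(indices, n_folds)     # n (roughly) equal-sized sets
    accuracies = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        preds = predict(model, X[test_idx])
        accuracies.append(np.mean(preds == y[test_idx]))
    return float(np.mean(accuracies))
```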
Confusion Matrices, Precision, and Recall

Two kinds of errors: false positives and false negatives (this generalizes to k classes as "predicted class i, actually class j").

                  Pred. Negative   Pred. Positive
  Act. Negative        TN               FP
  Act. Positive        FN               TP

Precision: the percentage of predicted positives that were actually positive – TP / (TP + FP) (also known as positive predictive value).

Recall: the percentage of actual positives that were predicted positive – TP / (TP + FN) (also known as Sensitivity, or the true positive rate).

Spam example: if Spam messages are considered "positives", then recall is how much of the Spam you catch, and precision is how often a message you label as Spam really is Spam; low precision means many of the flagged messages are legitimate.

Precision and recall are usually traded off against each other. 100% recall can be achieved by predicting everything to be positive. Extremely high precision can be achieved by predicting only the examples you are most confident about to be positive.

These measures are important in information retrieval. What would precision and recall be in terms of searching for information on the web?
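This is a short sketch of computing those quantities from binary predictions; the toy labels at the end are made up.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Entries of the binary confusion matrix (labels in {0, 1})."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    return tn, fp, fn, tp

def precision_recall(y_true, y_pred):
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0   # TP / (TP + FP)
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0      # TP / (TP + FN)
    return precision, recall

# Toy example: 1 = Spam, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))   # (0.75, 0.75)
```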
ROC Curves

Let's generalize. For any predictor, for a given false positive rate, what will the true positive rate be?

How do we do this? Just think about ranking the test examples by confidence and then taking cutoffs wherever we want to test.

If one classifier dominates another at every point on the ROC curve, it is better.

People often use the area under the curve (AUC) as a single summary statistic to measure the performance of classifiers.
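The ranking idea translates directly into code. Below is a minimal sketch (tie handling among equal scores is simplified) that sweeps a cutoff down the confidence ranking, records the true and false positive rates at each step, and computes AUC with the trapezoid rule; the scores at the end are made-up illustrative values.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """ROC points from ranking examples by confidence and sweeping the cutoff."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores))       # most confident positives first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted)                     # true positives above each cutoff
    fps = np.cumsum(1 - y_sorted)                 # false positives above each cutoff
    tpr = tps / max(int(y_true.sum()), 1)         # true positive rate
    fpr = fps / max(int((1 - y_true).sum()), 1)   # false positive rate
    # Prepend (0, 0): the cutoff at which nothing is predicted positive.
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoid rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up confidence scores for eight test examples.
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])
fpr, tpr = roc_curve_points(y_true, scores)
print("AUC:", auc(fpr, tpr))
```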