

  1. Overfitting and Regularization • March 31, 2020 • Data Science CSCI 1951A, Brown University • Instructor: Ellie Pavlick • HTAs: Josh Levin, Diane Mutako, Sol Zitter

  2. Announcements • Office Hours—watch calendar • ML assignment out later today • Analysis project deliverable out soon

  3. Today • Overfitting and Regularization

  4. Train/Test Splits • By definition, trained models are minimizing their objective for the data they see, but not for the data they don’t see • What we really care about is how the model does on data we don’t see • So we split our data into disjoint sets (a train set and a test set) and assess performance on the test set, with parameters fit using the train set.
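
For concreteness, here is a minimal sketch of that split (not from the slides; it assumes scikit-learn and a feature matrix X with labels y):

    # Minimal sketch, assuming scikit-learn; X, y are a feature matrix and labels.
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = LinearRegression().fit(X_train, y_train)              # parameters set using train
    print(mean_squared_error(y_train, model.predict(X_train)))    # train MSE
    print(mean_squared_error(y_test, model.predict(X_test)))      # test MSE: what we really care about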

  5. Train/Test Splits [figure]

  6. Train/Test Splits [figure: Train]

  7. Train/Test Splits [figure: Train, MSE = 6]

  8. Train/Test Splits [figure: Test]

  9. Train/Test Splits [figure: Test, MSE = 12]

  10. Train/Test Splits Problem gets worse as models get more powerful/flexible [figure: Train, MSE = 4]

  11. Train/Test Splits Problem gets worse as models get more powerful/flexible [figure: MSE = 14]
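
The gap in the slides’ example (train MSE much lower than test MSE, and worse for the more flexible model) can be reproduced in spirit with a sketch like the following (illustrative only; the synthetic data and polynomial degrees are made up, and it assumes numpy and scikit-learn):

    # Sketch: as model flexibility grows, train MSE keeps falling but test MSE turns around.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(60, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    for degree in (1, 3, 15):                                    # increasingly powerful/flexible models
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
        print(degree,
              mean_squared_error(y_tr, model.predict(X_tr)),     # shrinks as degree grows
              mean_squared_error(y_te, model.predict(X_te)))     # eventually grows as degree grows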

  12. Cross Validation • Some train/test splits are harder than others • To get a more stable estimate of test performance, we can use cross validation (pseudocode from the slide; a runnable sketch follows below):

      accs = []
      for i in range(num_folds):
          train, test = random.split(data)
          clf.fit(train)
          accs.append(clf.score(test))
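
A runnable version of that pseudocode, sketched with scikit-learn's KFold (an assumption; the slide does not name a library, and clf, X, y, num_folds are placeholders):

    # Sketch of k-fold cross validation; each example serves as test data exactly once.
    from sklearn.model_selection import KFold

    accs = []
    for train_idx, test_idx in KFold(n_splits=num_folds, shuffle=True, random_state=0).split(X):
        clf.fit(X[train_idx], y[train_idx])
        accs.append(clf.score(X[test_idx], y[test_idx]))
    print(sum(accs) / len(accs))    # a more stable estimate of test performance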

  13. Overfitting • Models are likely to overfit when the model is more “complex” than is needed to explain the variation we care about • “Complex” generally means the number of parameters (i.e. features) is high • When the number of parameters is >= the number of observations, you can trivially memorize your training data, often without learning anything generalizable to test time
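
A small sketch of the last point (illustrative, not from the slides; it assumes numpy): with as many parameters as observations, ordinary least squares can fit even pure-noise labels exactly, so perfect training error tells you nothing about test error.

    # p = n: least squares interpolates random labels exactly but does not generalize.
    import numpy as np

    rng = np.random.RandomState(0)
    n, p = 20, 20
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                        # labels are pure noise: nothing to learn

    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.abs(X @ w - y).max())                # ~0: training data is memorized
    X_new, y_new = rng.normal(size=(n, p)), rng.normal(size=n)
    print(np.mean((X_new @ w - y_new) ** 2))      # large: no generalization to new data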

  14. Regularization • Incur a cost for including more features (more non-zero weights), or for assuming features are very important (larger weights) • Or “early stopping”: for iterative training procedures (e.g. gradient descent), stop before the model has fully converged (on the assumption that the final steps are spent memorizing noise) • By definition, regularization will make your model worse during training… • But hopefully better at test (which is what you really care about)
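
A hedged sketch of the early-stopping idea (train_one_epoch, dev_loss, model, and the data variables are placeholders, not from the course): track loss on held-out data after each pass and stop once it stops improving.

    # Sketch: stop training when held-out loss stops improving (assume later steps memorize noise).
    best_dev_loss, bad_epochs, patience = float("inf"), 0, 3
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)        # one pass of gradient descent over train
        loss = dev_loss(model, dev_data)          # loss on data the model is not being fit to
        if loss < best_dev_loss:
            best_dev_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:            # no improvement for `patience` epochs: stop early
                break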

  15. Regularization min_θ Σ_x loss(x; θ) + λ · cost(θ) • Adds an extra “hyperparameter” λ which controls how much you penalize model complexity relative to fitting the training data
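
As one concrete instance (an assumption, not necessarily what the course uses), scikit-learn's Ridge implements this objective with cost(θ) = Σθ², and its alpha parameter plays the role of λ; reusing the X_train/X_test split sketched earlier:

    # Sketch: larger alpha (lambda) hurts train MSE by design, and hopefully helps test MSE.
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error

    for alpha in (0.0, 0.1, 1.0, 10.0):           # 0.0 = no regularization
        model = Ridge(alpha=alpha).fit(X_train, y_train)
        print(alpha,
              mean_squared_error(y_train, model.predict(X_train)),   # rises as alpha grows
              mean_squared_error(y_test, model.predict(X_test)))     # what you really care about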

  16. Dev/Validation Sets • Often you need to make meta-decisions (not just set the parameters), e.g. • Which model is better (i.e. generalizes better to held-out data)? • What regularization to use? • How many training iterations? • To do this, you have to split into train/dev/test, not just train/test. If you use test to set these parameters, you are “peeking” at unseen data in order to fit the model, and thus test performance is no longer actually representative of how you would do in the real world
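
A sketch of that recipe (illustrative; assumes scikit-learn and the Ridge example above): carve out train/dev/test, make the meta-decision on dev only, and touch test exactly once at the end.

    # Sketch: 60/20/20 train/dev/test; the hyperparameter is chosen on dev, never on test.
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import Ridge

    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    best_alpha = max((Ridge(alpha=a).fit(X_train, y_train).score(X_dev, y_dev), a)
                     for a in (0.01, 0.1, 1.0, 10.0))[1]     # meta-decision made on dev
    final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
    print(final_model.score(X_test, y_test))                 # reported once; never used for tuning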
