Holdout and Cross-Validation Methods for Overfitting Avoidance


SLIDE 1

Holdout and Cross-Validation Methods for Overfitting Avoidance

  • Decision Trees
    – Reduce-error pruning
    – Cost-complexity pruning
  • Neural Networks
    – Early stopping
    – Adjusting regularizers via cross-validation
  • Nearest Neighbor
    – Choose number of neighbors
  • Support Vector Machines
    – Choose C
    – Choose σ for Gaussian kernels

SLIDE 2

Reduce-Error Pruning

Given a data set S:

  – Subdivide S into S_train and S_dev
  – Build a tree using S_train
  – Pass all of the S_dev examples through the tree and estimate the error rate of each node using S_dev
  – Convert a node to a leaf if it would have lower estimated error than the sum of the errors of its children
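
The per-node bookkeeping above fits in a few lines. Below is a minimal Python sketch, assuming a hypothetical Node structure with a children list and a leaf_errors count (the number of S_dev examples the node would misclassify if it were a leaf); it illustrates the rule itself, not any particular library's implementation.

```python
def prune(node):
    """Bottom-up reduce-error pruning.

    Returns the number of S_dev errors made by the (possibly pruned) subtree.
    """
    if not node.children:                   # already a leaf
        return node.leaf_errors
    child_errors = sum(prune(c) for c in node.children)
    if node.leaf_errors < child_errors:     # leaf beats its children on S_dev
        node.children = []                  # convert the node to a leaf
        return node.leaf_errors
    return child_errors
```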

SLIDE 3

Reduce-Error Pruning Example

[Figure]

SLIDE 4

Cost-Complexity Pruning

The CART system (Breiman et al., 1984) employs cost-complexity pruning:

    J(Tree, S) = ErrorRate(Tree, S) + α |Tree|

where |Tree| is the number of nodes in the tree and α is a parameter that controls the tradeoff between the error rate and the penalty. α is set by cross-validation.
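
Read as code, the objective is just a weighted sum; the helper below is a sketch with the error rate and node count passed in directly, so no particular tree representation is assumed.

```python
def cost_complexity(error_rate, tree_size, alpha):
    """J(Tree, S) = ErrorRate(Tree, S) + alpha * |Tree|."""
    return error_rate + alpha * tree_size

# e.g., a 15-node tree with 12% error under alpha = 0.005:
print(cost_complexity(0.12, 15, 0.005))   # ~0.195
```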

SLIDE 5

Determining Important Values of α

Goal: Identify a finite set of candidate values for α, then evaluate them via cross-validation.

  • Set α = α_0 = 0; k = 0
  • Train on S to produce tree T
  • Repeat until T is completely pruned:
    – determine the next larger value of α = α_{k+1} that would cause a node to be pruned from T
    – prune this node
    – k := k + 1
  • This can be done efficiently
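
The efficiency comes from a closed-form "weakest link" computation, as in CART. For an internal node t, pruning its subtree T_t stops hurting J once α ≥ (errors(t as leaf) − errors(T_t)) / (|T_t| − 1), so α_{k+1} is the minimum of this quantity over all internal nodes. The sketch below reuses the hypothetical Node from the reduce-error sketch, with leaf_errors now counted on the training set.

```python
def num_nodes(node):
    return 1 + sum(num_nodes(c) for c in node.children)

def subtree_errors(node):
    if not node.children:
        return node.leaf_errors
    return sum(subtree_errors(c) for c in node.children)

def internal_nodes(node):
    if node.children:
        yield node
        for c in node.children:
            yield from internal_nodes(c)

def next_alpha(tree):
    """Return (alpha_{k+1}, weakest node). Error counts can be divided by
    |S| to match the error-rate form of J; the argmin is unaffected."""
    def g(t):   # critical alpha at which pruning t becomes worthwhile
        return (t.leaf_errors - subtree_errors(t)) / (num_nodes(t) - 1)
    weakest = min(internal_nodes(tree), key=g)
    return g(weakest), weakest
```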

SLIDE 6

Choosing α by Cross-Validation

  • Divide S into 10 subsets S_0, …, S_9
  • In fold v:
    – Train a tree on ∪_{i≠v} S_i
    – For each α_k, prune the tree to that level and measure the error rate on S_v
  • Compute ε_k to be the average error rate over the 10 folds when α = α_k
  • Choose the α_k that minimizes ε_k; call it α* and let ε* be the corresponding error rate
  • Prune the original tree according to α*
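
scikit-learn packages exactly this loop: cost_complexity_pruning_path enumerates the critical α values and cross_val_score estimates each ε_k. A sketch follows (scikit-learn's pruning penalizes total leaf impurity rather than the node count used above, but the procedure has the same shape):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def choose_alpha(X, y):
    # Candidate alphas: the values at which the pruned tree actually changes
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    errors = []
    for a in path.ccp_alphas:
        clf = DecisionTreeClassifier(random_state=0, ccp_alpha=a)
        errors.append(1.0 - cross_val_score(clf, X, y, cv=10).mean())  # eps_k
    best = int(np.argmin(errors))
    return path.ccp_alphas[best], errors[best]   # alpha*, eps*
```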

SLIDE 7

The 1-SE Rule for Setting α

  • Compute a confidence interval on ε* and let U be the upper bound of this interval
  • Choose the largest α_k whose ε_k ≤ U
  • If we use Z = 1 for the confidence interval computation, this is called the 1-SE rule, because the bound is one "standard error" above ε*
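
One common choice for U is the binomial standard error of ε* over the n held-out predictions. Below is a sketch consistent with the candidate-α loop above; the SE formula is one standard option, not the only one.

```python
import numpy as np

def one_se_alpha(alphas, errors, n):
    """1-SE rule: pick the largest alpha (most pruned tree) whose
    cross-validated error is within one standard error of the minimum.
    Assumes alphas is sorted in increasing order."""
    errors = np.asarray(errors)
    eps_star = errors.min()
    se = np.sqrt(eps_star * (1.0 - eps_star) / n)   # binomial SE with Z = 1
    within = np.where(errors <= eps_star + se)[0]
    return alphas[within.max()]
```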

SLIDE 8

Notes on Decision Tree Pruning

  • Cost-complexity pruning usually gives the best results in experimental studies
  • Pessimistic pruning is the most efficient (it does not require a holdout set or cross-validation), and it is quite robust
  • Reduce-error pruning is rarely used, because it consumes training data
  • Pruning is more important for regression trees than for classification trees
  • Pruning has relatively little effect for classification trees. There are only a small number of possible prunings of a tree, and usually the serious errors made by the tree-growing process (i.e., splitting on the wrong features) cannot be repaired by pruning
    – Ensemble methods work much better than pruning

SLIDE 9

Holdout Methods for Neural Networks

  • Early stopping using a development set
  • Adjusting regularizers using a development set or via cross-validation:
    – amount of weight decay
    – number of hidden units
    – learning rate
    – number of epochs

SLIDE 10

Early Stopping Using an Evaluation Set

  • Split S into S_train and S_dev
  • Train on S_train; after every epoch, evaluate on S_dev. If the error rate is the best observed so far, save the weights

[Figure: Dev and Test error curves over training epochs]
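
A minimal sketch of the loop, assuming hypothetical helpers train_one_epoch and dev_error and a model whose weights can be deep-copied:

```python
import copy

def early_stopping(model, S_train, S_dev, max_epochs=200):
    """Keep the weights from the epoch with the lowest dev error."""
    best_err, best_weights = float("inf"), copy.deepcopy(model.weights)
    for epoch in range(max_epochs):
        train_one_epoch(model, S_train)     # hypothetical helper
        err = dev_error(model, S_dev)       # hypothetical helper
        if err < best_err:                  # best observed so far: snapshot
            best_err = err
            best_weights = copy.deepcopy(model.weights)
    model.weights = best_weights
    return model, best_err
```

In practice the loop usually also exits after some number of epochs without improvement ("patience"), rather than always running to max_epochs.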

SLIDE 11

Reconstituted Early Stopping

  • Recombine S_train and S_dev to produce S
  • Train on S and stop at the point (number of epochs or mean squared error) identified using S_dev
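
A sketch of the two-stage procedure, reusing the hypothetical helpers from the early-stopping sketch; stage 1 finds the stopping epoch on S_dev, and stage 2 retrains from scratch on the recombined S:

```python
def reconstituted_early_stopping(make_model, S_train, S_dev, max_epochs=200):
    # Stage 1: identify the best epoch count using the development set
    model, best_err, best_epoch = make_model(), float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(model, S_train)
        err = dev_error(model, S_dev)
        if err < best_err:
            best_err, best_epoch = err, epoch
    # Stage 2: recombine and retrain on all of S for that many epochs
    model = make_model()
    S = list(S_train) + list(S_dev)
    for _ in range(best_epoch + 1):
        train_one_epoch(model, S)
    return model
```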

SLIDE 12

Reconstituted Early Stopping

  • We can stop either when the MSE on the training set matches the predicted optimal MSE, or when the number of epochs matches the predicted optimal number of epochs
  • Experimental studies show little or no advantage for reconstituted early stopping. Most people just use simple holdout

[Figure: Dev and Test error curves over training epochs]

SLIDE 13

Nearest Neighbor: Choosing k

k = 9 gives the best performance on the development set and on the test set; k = 13 gives the best performance based on leave-one-out cross-validation.

[Figure: error rate vs. k, with Dev, Test, and LOOCV curves]
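
The LOOCV curve in the figure can be reproduced with scikit-learn. The sketch below scores each candidate k by leave-one-out error (the grid of odd k values is illustrative; the quoted best values come from the slide's figure, not from this code):

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def choose_k_loocv(X, y, ks=range(1, 26, 2)):
    """Return the k with the lowest leave-one-out cross-validation error."""
    errors = {}
    for k in ks:
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                              X, y, cv=LeaveOneOut()).mean()
        errors[k] = 1.0 - acc
    return min(errors, key=errors.get)
```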

SLIDE 14

SVM: Choosing C and σ

[Figure: BR data set; 100 examples; Valentini 2003]

SLIDE 15

20% label noise

[Figure]

SLIDE 16

BR Data Set: Varying σ for Fixed C

[Figure]
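
A standard way to run this kind of sweep is cross-validated grid search. Note that scikit-learn parameterizes the Gaussian (RBF) kernel by gamma = 1/(2σ²) rather than by σ directly; the grid ranges below are illustrative, not the ones from the experiments above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def choose_C_sigma(X, y):
    sigmas = np.logspace(-2, 2, 9)
    param_grid = {"C": np.logspace(-2, 3, 6),
                  "gamma": 1.0 / (2.0 * sigmas ** 2)}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X, y)
    return search.best_params_   # best C and gamma (hence sigma)
```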

SLIDE 17

Summary

  • Holdout methods are the best way to choose a classifier
    – Reduce-error pruning for trees
    – Early stopping for neural networks
  • Cross-validation methods are the best way to set a regularization parameter
    – Cost-complexity pruning parameter α
    – Neural network weight decay setting
    – Number k of nearest neighbors in k-NN
    – C and σ for SVMs