Improving Cross-Validation Classifier Selection Accuracy through Meta-learning
Jesse H. Krijthe, Tin Kam Ho, Marco Loog
Classifier Selection Problem
[Figure: a two-dimensional classification problem (Feature 1 vs. Feature 2) and a set of candidate classifiers: QDA, SVM, LDA, GBM, Random Forest, Fisher, Nearest Mean, Nearest Neighbor, C4.5, ID3]
Which classifier gives the lowest error rate e when evaluated on a large test set?
A Practical Solution
• In practice: we have no large test set to determine e
• Alternative: estimate it through a cross-validation procedure, giving ê_cv
• The procedure is practically unbiased and intuitive
• Use the estimates for each classifier to select the best one
• Used for:
– Classifier selection
– Parameter tuning
– Feature selection
– Performance estimation
Goal
Is it possible to use meta-learning techniques to improve the accuracy (rather than the computational efficiency) of classifier selection using cross-validation?
Cross-validation revisited (1/2)
• C = {c_1, ..., c_m} a set of classifiers, D a dataset
• Calculate the k-fold cross-validation error:
1. Randomly assign the n objects in the dataset to k parts (folds)
2. Use folds 2 to k to train a classifier
3. Use fold 1 to test its accuracy
4. Cycle through, using each fold as the test set once
5. Average the error estimates over all the folds: ê_cv = (1/k) Σ_{i=1}^{k} e_i
[Figure: dataset D split into folds 1-10 of n/k objects each; per-fold errors e_1, e_2, e_3, ... appear as each fold takes its turn as the test set]
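As a concrete rendering of steps 1-5, here is a minimal sketch in Python; the function name `cv_error` is ours, and the classifier is assumed to follow the scikit-learn fit/predict convention:

```python
import numpy as np

def cv_error(classifier, X, y, k=10, rng=None):
    """k-fold cross-validation error: the average of the per-fold error rates e_i."""
    rng = np.random.default_rng(rng)
    # Step 1: randomly assign the n objects to k folds
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):  # step 4: each fold serves as the test set once
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Steps 2-3: train on the other k-1 folds, test on fold i
        classifier.fit(X[train], y[train])
        errors.append(np.mean(classifier.predict(X[folds[i]]) != y[folds[i]]))
    # Step 5: e_cv = (1/k) * sum of the e_i
    return float(np.mean(errors))
```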
Cross-validation revisited (2/2)
• Select the classifier with the lowest ê_cv
• Bias decreases as k increases
– Unbiased as an estimator of the error of a classifier trained on n − n/k objects
– Small bias for reasonable k and large n
• For a particular dataset, we are interested in the difference between e and ê_cv
• Variance
– High as k goes to n
– High as k goes to 2
– Usually lowest around k = 5-10
– Higher than for bootstrap and resubstitution
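The selection step itself is then a one-liner over the candidate set. A sketch reusing the `cv_error` helper above, assuming a dataset `X`, `y` is in scope (the candidate set shown is an arbitrary example, not the one from the experiments):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

candidates = {
    "Nearest Mean": NearestCentroid(),
    "1-NN": KNeighborsClassifier(n_neighbors=1),
    "LDA": LinearDiscriminantAnalysis(),
}
# Select the classifier with the lowest cross-validation error estimate
estimates = {name: cv_error(clf, X, y, k=10) for name, clf in candidates.items()}
best_name = min(estimates, key=estimates.get)
```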
Why would cross-validation fail?
• As Braga-Neto et al. (2004) and others note, if n is small, the variance of the cross-validation error estimate becomes large
• Cross-validation error estimates become unreliable for a given dataset
• Specifically: classifier selection based on these estimates may suffer
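A quick way to see this effect is a small Monte Carlo experiment: draw many training sets of size n from one fixed distribution and look at the spread of the resulting estimates. A sketch, where `make_problem(n)` is a hypothetical sampler returning `X, y` and `cv_error` is the helper from earlier:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Spread of the 10-fold CV estimate over repeated draws, for several sample sizes
for n in (20, 50, 200):
    estimates = [cv_error(KNeighborsClassifier(n_neighbors=1), *make_problem(n), k=10)
                 for _ in range(100)]
    print(n, np.std(estimates))  # the standard deviation grows as n shrinks
```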
Meta-learning (1/2)
• Learning which classifier to select based on characteristics of the dataset
• Classifier selection as just another classification problem (see the sketch below)
– Classes: the most accurate classifier
– Features: statistics on the dataset (meta-features)
• Meta-features are preferably
– Computationally efficient
– Predictive
– Interpretable
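Framed this way, building the meta-problem amounts to constructing an ordinary training set with one row per dataset. A minimal sketch; the `problems` collection, the `candidates` dictionary (as above), and the `test_error` helper are our own illustrative names, and the two meta-features shown are only examples:

```python
import numpy as np

def test_error(clf, X_train, y_train, X_test, y_test):
    """Error rate on a large held-out test set (approximates the true error e)."""
    clf.fit(X_train, y_train)
    return np.mean(clf.predict(X_test) != y_test)

meta_X, meta_y = [], []
for X_tr, y_tr, X_te, y_te in problems:  # hypothetical collection of (train, test) problems
    # Meta-features: cheap statistics of the training set
    meta_X.append([len(y_tr), X_tr.shape[1]])
    # Meta-class: the classifier that is most accurate on the large test set
    errors = {name: test_error(clf, X_tr, y_tr, X_te, y_te)
              for name, clf in candidates.items()}
    meta_y.append(min(errors, key=errors.get))
```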
Meta-learning (2/2)
[Figure: datasets D1, D2, D3 are mapped via measures to a parameterization of dataset space; scatter plot of the 2-fold CV error of the Linear Discriminant (x-axis) against the Quadratic Discriminant (y-axis)]
Cross-validation selection as meta-learning
• Cross-validation errors are measures on the dataset as well
• Idea: treat them as meta-features
• Meta-classifier in this case:
– Select the classifier with the lowest cross-validation error
– A static 'diagonal' rule
Meta-classes: best classifier (m)
Meta-features: cross-validation errors (m)
Meta-classifier: static 'diagonal' rule
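Seen as a meta-classifier, "pick the lowest ê_cv" is a fixed rule whose decision boundary in the meta-feature space of CV errors is the diagonal ê_1 = ê_2 (for two classifiers). A sketch contrasting it with a learned rule; `meta_cv_errors` (one row of CV errors per dataset), `meta_best` (index of the truly best classifier), and `new_cv_errors` (estimates on a fresh dataset) are assumed to come from a meta-problem like the one sketched above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def diagonal_rule(cv_errors):
    """Static 'diagonal' rule: select the classifier with the lowest CV error.
    For two classifiers the decision boundary is the line e_1 = e_2, hence the name."""
    return int(np.argmin(cv_errors))

# Meta-learning alternative: learn the selection rule from past datasets
meta_clf = KNeighborsClassifier().fit(meta_cv_errors, meta_best)
choice = meta_clf.predict(np.asarray(new_cv_errors).reshape(1, -1))[0]
```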
Cross-validation Meta-problem
[Figure: the same mapping as before, now with the 2-fold CV errors of the Linear Discriminant (x-axis) and Quadratic Discriminant (y-axis) as the parameterization of dataset space for D1, D2, D3]
Is this simple, static rule justified?
A Meta-learning Universe (1/3)
• Choice between two simple classifiers:
– Nearest Mean
– 1-Nearest Neighbor
• Two simple problem types
– Each suited to one of the classifiers
– Small training samples (20-100 objects)
– Generate enough data to estimate the real error (~20000 objects)
– Problem types have equal priors
• Slightly contrived, to allow:
– Visualization
– Illustrating the concept
A Meta-learning Universe (2/3)
Problem type B:
• Randomly vary the width (variance)
• Generate 500 problems
• B = {B_1, B_2, ..., B_500}
• Low Bayes error
Problem type G:
• Randomly vary the distance
• Generate 500 problems
• G = {G_1, G_2, ..., G_500}
• High Bayes error
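The slides leave the exact generative models implicit, so the following is only a plausible sketch of such a universe: type G draws two spherical Gaussians at a random distance (favoring Nearest Mean), and type B draws two concentric rings of random width (favoring 1-NN); all distribution parameters here are our assumptions, not the original ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def problem_G(n):
    """Type G: two spherical Gaussians, randomly varied distance between the means."""
    d = rng.uniform(0.5, 3.0)              # assumed range for the class distance
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, 2))
    X[:, 0] += d * y                       # shift class 1 along the first feature
    return X, y

def problem_B(n):
    """Type B: two concentric rings, randomly varied width (variance)."""
    w = rng.uniform(0.05, 0.4)             # assumed range for the ring width
    y = rng.integers(0, 2, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    r = 1.0 + y + rng.normal(0.0, w, n)    # the class determines the ring radius
    return np.c_[r * np.cos(theta), r * np.sin(theta)], y

# 500 problems per type, small training samples of 20-100 objects
G = [problem_G(rng.integers(20, 101)) for _ in range(500)]
B = [problem_B(rng.integers(20, 101)) for _ in range(500)]
```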
Meta-problem
[Figure: scatter plot of the 10-fold CV error of Nearest Mean (x-axis) against 1-NN (y-axis), with points labeled by problem type (B or G) and by which classifier is truly best (NM or 1-NN)]
Error: 0.16 -> 0.06 (learning makes a difference)
Additional meta-features (1/2)
Can characteristics of the data improve classifier selection after we know the cross-validation errors?
• Classifiers: Nearest Mean and Least Squares
• Elongated boundary problem (100 dimensions)
• Randomness:
– Class priors
– Number of objects (20-100)
• Extra features (see the sketch below):
– Number of objects n
– Variance of the cross-validation errors
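A sketch of how the meta-feature vector could be extended with n and the fold-error variance; `fold_errors` mirrors the earlier `cv_error` helper but returns the individual e_i, and all names are ours:

```python
import numpy as np

def fold_errors(clf, X, y, k=10, rng=None):
    """Per-fold error rates e_1, ..., e_k (same fold construction as cv_error above)."""
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        clf.fit(X[train], y[train])
        errs.append(np.mean(clf.predict(X[folds[i]]) != y[folds[i]]))
    return np.array(errs)

def meta_features(X, y, candidates, k=10):
    """CV errors per candidate, plus the extra features: n and the fold-error variances."""
    per_clf = [fold_errors(clf, X, y, k) for clf in candidates.values()]
    return ([e.mean() for e in per_clf]    # the usual CV error estimates
            + [len(y)]                     # extra meta-feature: number of objects n
            + [e.var() for e in per_clf])  # extra meta-feature: variance of the e_i
```

A k-NN or LDA trained on these extended vectors then plays the role of the learned selector compared against plain CV-selection on the next slide.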
Additional meta-features (2/2)

Meta-classifier | CV errors | +n    | +Variance | +n & Variance
CV-selection    | 0.237     | -     | -         | -
k-NN            | 0.238     | 0.151 | 0.221     | 0.127
LDA             | 0.241     | 0.159 | 0.239     | 0.110

[Figure: two histograms of the resulting selection errors (counts vs. error, 0-0.15)]
Pseudo real-world data
[Figure: 'Real-world data meta-problem' — scatter plot of the 10-fold CV error of Fisher (x-axis) against Parzen (y-axis), with points marked by whether Fisher or Parzen is truly best]
Pseudo real-world data

Classifier                          | Best on
Nearest Mean                        | 236
k-Nearest Neighbor                  | 118
Fisher                              | 243
Quadratic Discriminant              | 32
Parzen Density                      | 286
Decision Stump (Purity Criterion)   | 221
Linear Support Vector Machine       | 164
Radial Basis Support Vector Machine | 200

Meta-classifier | CV errors | +Variance
CV-selection    | 0.695     | -
k-NN            | 0.605     | 0.587
LDA             | 0.618     | 0.599
Conclusion
• There are universes where meta-learning can outperform cross-validation based classifier selection
• Additional statistics of the data can aid in classifier selection
• Some indication this works on real-world datasets; more experiments are needed
• Evidence to support meta-learning not just as a time-efficient alternative to cross-validation, but as a potentially more accurate one