


  1. Machine Learning Classification: Introduction
     Hamid R. Rabiee, Jafar Muhammadi, Nima Pourdamghani
     Spring 2015
     http://ce.sharif.edu/courses/93-94/2/ce717-1/

  2. Agenda
      Introduction
      Classification: A Two-Step Process
      Evaluating Classification Methods
      Classifier Performance
      Performance Measures
      Partitioning Methods

  3. Introduction
      Classification
        predicts categorical class labels (discrete or nominal)
        constructs a model from the training set and the associated class labels, and uses it to classify new data
      Typical applications
        Credit approval
        Target marketing
        Medical diagnosis
        Fraud detection

  4. Classification: A Two-Step Process
      Model construction
        Each sample is assumed to belong to a predefined class, as determined by its class label
        The set of samples used for model construction is called the "training set"
        The model may be represented as classification rules, decision trees, a probabilistic model, mathematical formulae, etc.
      Model usage: classifying future or unknown objects
        Estimate the accuracy of the model
          The known label of each test sample is compared with the label predicted by the model
          The accuracy rate is the percentage of test-set samples correctly classified by the model
          The test set must be independent of the training set; otherwise the estimate will be over-optimistic (over-fitting)
        If the accuracy is acceptable, use the model to classify data samples whose class labels are not known
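
As a concrete illustration of the two steps, here is a minimal sketch assuming scikit-learn and a toy dataset (neither is prescribed by the slides): the model is constructed on the training set only, and its accuracy is then estimated on an independent test set.

```python
# Sketch of the two-step process with scikit-learn (library choice is an
# assumption; the slides do not prescribe one).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1: model construction on the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: model usage -- estimate accuracy on the independent test set,
# and, if acceptable, use the model to classify new, unlabeled samples.
y_pred = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))
```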

  5. Evaluating Classification Methods
      Performance
        classifier performance: predicting the class label
        accuracy, {true positive, true negative}, {false positive, false negative}, ...
      Time complexity
        time to construct the model (training time): the model is constructed once, so this can be large
        time to use the model (classification time): must be tolerable; calls for good data structures
      Robustness
        handling noise and missing values
        handling incorrect training data

  6. Evaluating Classification Methods
      Scalability
        efficiency on disk-resident databases
      Interpretability
        understanding and insight provided by the model
      Other measures: goodness or compactness of the classification rules
        rule of thumb: the more compact the model, the better it tends to generalize

  7. Performance Measures
      Accuracy is not always a good measure of classifier performance (why?)
        consider a "cancer detection" problem, where almost all samples belong to the negative class
      Presentation of classifier performance
        use a confusion matrix or a receiver operating characteristic (ROC) curve
        confusion matrix (rows: predicted class, columns: real class):

                         Real P    Real N
           Predicted P     TP        FP
           Predicted N     FN        TN

      Several performance measures can be extracted from this matrix (or from the curve)

  8. Performance Measures
      ROC example: four classifiers in ROC space, each evaluated on 100 positive and 100 negative samples
        A:  TP = 63, FN = 37, FP = 28, TN = 72   (Acc = 0.68)
        B:  TP = 77, FN = 23, FP = 77, TN = 23   (Acc = 0.50)
        C:  TP = 24, FN = 76, FP = 88, TN = 12   (Acc = 0.18)
        C': TP = 76, FN = 24, FP = 12, TN = 88   (Acc = 0.82); C' makes the opposite predictions of C
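
The numbers above translate directly into ROC-space coordinates. A small sketch in plain Python (the slides show no code) that recomputes the true-positive rate (sensitivity), false-positive rate (1 - specificity) and accuracy for each classifier:

```python
# Recompute ROC-space coordinates from the slide's confusion matrices
# (100 positives and 100 negatives per classifier).
cms = {
    "A":  dict(TP=63, FN=37, FP=28, TN=72),
    "B":  dict(TP=77, FN=23, FP=77, TN=23),
    "C":  dict(TP=24, FN=76, FP=88, TN=12),
    "C'": dict(TP=76, FN=24, FP=12, TN=88),
}
for name, m in cms.items():
    tpr = m["TP"] / (m["TP"] + m["FN"])          # sensitivity, y-axis of ROC space
    fpr = m["FP"] / (m["FP"] + m["TN"])          # 1 - specificity, x-axis
    acc = (m["TP"] + m["TN"]) / sum(m.values())  # overall accuracy
    print(f"{name}: TPR={tpr:.2f}  FPR={fpr:.2f}  Acc={acc:.3f}")
```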

  9. Performance Measures
      Accuracy: (TP + TN) / (# data)
      Specificity: TN / (FP + TN)
      Sensitivity: TP / (FN + TP)
      Index of merit: (Specificity + Sensitivity) / 2 = (TP% + TN%) / 2
        also known as "percentage correct classifications"
      Performance is measured using test-set results
        The test set should be distinct from the training (learning) set
        Several methods are available to partition the data into separate training and testing sets, resulting in different estimates of the "true" index of merit
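
As a worked illustration of these formulas, the helper below (a sketch; the function name is illustrative, not from the course) computes all four measures from the confusion-matrix counts:

```python
# Compute the performance measures listed above from the four
# confusion-matrix counts.
def performance_measures(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    accuracy    = (tp + tn) / n
    specificity = tn / (fp + tn)   # true-negative rate
    sensitivity = tp / (fn + tp)   # true-positive rate
    index_of_merit = (specificity + sensitivity) / 2
    return dict(accuracy=accuracy, specificity=specificity,
                sensitivity=sensitivity, index_of_merit=index_of_merit)

# Classifier A from the ROC example on the previous slide:
print(performance_measures(tp=63, fp=28, fn=37, tn=72))
```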

  10. Data Partitioning
      Goal: validating the classifier and its parameters
        choose the best parameter set
      Idea: use a part of the training data as a validation set
        the validation set must be a good representative of the whole data
      How should the training data be partitioned?

  11. Data Partitioning Methods
      Holdout methods: random sampling
        the data is randomly partitioned into two independent sets, a training set and a test set
        typically the training set is about twice the size of the test set
        assumption: the data is uniformly distributed
        when the split is repeated, the true error estimate is obtained as the average of the separate estimates E_i
      Holdout methods: bootstrap
        resample, with replacement, n samples of the original data as the training set
        some samples of the original data may be included several times in the bootstrap sample (about 63.2% of the samples are distinct)
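
A minimal sketch of both holdout variants, assuming NumPy (the slides do not prescribe a library): a single random 2:1 split, and a bootstrap resample that empirically contains roughly 63.2% distinct samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = np.arange(n)

# Random sampling: shuffle once and split, here 2/3 training vs 1/3 test.
perm = rng.permutation(indices)
train_idx, test_idx = perm[: 2 * n // 3], perm[2 * n // 3 :]

# Bootstrap: draw n samples *with replacement* as the training set;
# on average only about 63.2% of the original samples appear in it.
boot_idx = rng.choice(indices, size=n, replace=True)
distinct_fraction = np.unique(boot_idx).size / n
print(f"distinct samples in bootstrap: {distinct_fraction:.1%}")  # around 63%
```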

  12. Data Partitioning Methods
      Holdout methods: multiple train-and-test experiments
        repeat the random split several times (experiment #1, #2, #3, ...), each time with its own test set
      Drawbacks of holdout methods
        with a sparse dataset we may not be able to afford the "luxury" of setting aside a portion of the dataset for testing
        since a single train-and-test experiment is used, the holdout estimate of the error rate will be misleading if we happen to get an "unfortunate" split
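
To make the averaging of the per-split estimates E_i concrete, here is a toy sketch of repeated train-and-test splits; the "majority class" classifier and the random labels are purely illustrative, not part of the course material.

```python
# Repeat the random split K times and average the per-split error estimates E_i.
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)          # toy labels, for illustration only
K, errors = 3, []

for _ in range(K):
    perm = rng.permutation(len(y))
    train, test = perm[:200], perm[200:]
    majority = np.bincount(y[train]).argmax()     # "train" the trivial model
    errors.append(np.mean(y[test] != majority))   # E_i on this split

print("true error estimate:", np.mean(errors))    # average of the E_i
```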

  13. Data Partitioning Methods
      Cross-validation (k-fold; k = 10 is most popular)
        randomly partition the data into k mutually exclusive subsets D_1, ..., D_k of approximately equal size
        at the i-th iteration, use D_i as the test set and the remaining folds as the training set
        the mean of the measures obtained over the k iterations is used as the overall performance measure
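
A minimal k-fold sketch, assuming scikit-learn (not prescribed by the slides): each fold D_i serves once as the test set, and the per-fold accuracies are averaged.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
import numpy as np

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy on fold D_i

print("mean accuracy over folds:", np.mean(scores))
```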

  14. Data Partitioning Methods
      Cross-validation (k-fold; k = 10 is most popular)
        divide the total dataset into three subsets:
          training data is used for learning the parameters of the model
          validation data is not used for learning, but for deciding what type of model and what amount of regularization works best
          test data is used to get a final, unbiased estimate of how well the model works; we expect this estimate to be worse than on the validation data
        as before, the true error is estimated as the average error rate: E = (E_1 + E_2 + ... + E_k) / k
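
A sketch of the three-way split described above, assuming scikit-learn and 60/20/20 proportions (the proportions are an assumption, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off the final test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Fit candidate models on (X_train, y_train), pick the best on (X_val, y_val),
# and report the final, unbiased estimate on (X_test, y_test) only once.
```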

  15. Data Partitioning Methods
      Leave-one-out
        k-fold cross-validation with k = number of samples; suited to small datasets
        as usual, the true error is estimated as the average error rate over the test examples: E = (E_1 + ... + E_N) / N, where N is the number of samples
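
Leave-one-out is simply k-fold with k = N. A small scikit-learn sketch (the 3-nearest-neighbour model is illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

X, y = load_iris(return_X_y=True)
errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    errors.append(int(model.predict(X[test_idx])[0] != y[test_idx][0]))

print("estimated true error:", np.mean(errors))  # average error over all N runs
```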

  16. Data Partitioning Methods
      Stratified cross-validation
        the folds are stratified so that the class distribution in each fold is approximately the same as in the initial data
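
A brief stratified k-fold sketch with scikit-learn (an assumption), checking that each fold preserves the overall class proportions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
import numpy as np

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for i, (_, test_idx) in enumerate(skf.split(X, y), start=1):
    counts = np.bincount(y[test_idx])
    print(f"fold {i} class counts: {counts}")  # roughly equal, matching the full dataset
```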

  17. How Many Folds Are Needed?
      With a large number of folds
        + the bias of the true-error-rate estimator will be small (the estimator will be very accurate)
        - the variance of the estimator will be large
        - the computation time will be large as well (many experiments)
      With a small number of folds
        + the number of experiments, and therefore the computation time, is reduced
        + the variance of the estimator will be small
        - the bias of the estimator will be large (conservative, i.e., higher than the true error rate)
      In practice, the choice of the number of folds depends on the size of the dataset
        for large datasets, even 3-fold cross-validation is quite accurate
        for very sparse datasets, we may have to use leave-one-out in order to train on as many examples as possible
