

  1. Oral Presentation at MIE 2011, 30th August 2011, Oslo. Applying One-vs-One and One-vs-All Classifiers in k-Nearest Neighbour Method and Support Vector Machines to an Otoneurological Multi-Class Problem. Kirsi Varpa, Henry Joutsijoki, Kati Iltanen, Martti Juhola. School of Information Sciences - Computer Science, University of Tampere, Finland.

  2. Introduction: From a Multi-Class Classifier to Several Two-Class Classifiers
• We studied how splitting a multi-class classification task into several binary classification tasks affected the predictive accuracy of machine learning methods.
• One classifier holding nine disease class patterns was separated into multiple two-class classifiers.
• A multi-class classifier can be converted into
  • One-vs-One (OVO, 1-vs-1) or
  • One-vs-All the rest (OVA, 1-vs-All) classifiers.

  3. From a Multi-Class Classifier to Several Two-Class Classifiers
1-2-3-4-5-6-7-8-9
• OVA: nr of classifiers = 9 = nr of classes
• OVO: nr of classifiers = 36 = nr of classes · (nr of classes − 1) / 2
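The classifier counts above follow directly from the two decomposition schemes; a minimal sketch (function names are illustrative, not from the slides):

```python
def ova_count(n_classes):
    # One-vs-All: one binary classifier per class
    return n_classes

def ovo_count(n_classes):
    # One-vs-One: one binary classifier per unordered pair of classes
    return n_classes * (n_classes - 1) // 2

print(ova_count(9))   # 9 classifiers for the nine disease classes
print(ovo_count(9))   # 36 pairwise classifiers
```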

  4. One-vs-One (OVO) Classifier
• The results of each classifier are put together, giving 36 class proposals (votes) for the class of the test sample.
• The final class for the test sample is chosen by the majority voting method, the max-wins rule: the class that gains the most votes is chosen as the final class.
[2 3 3 4 5 6 7 8 1 2 5 6 7 8 9 1 5 3 7 5 6 1 2 4 8 5 1 7 3 4 1 8 9 1 2 1] → max votes to class 1 (max-wins)
[2 3 3 4 5 6 7 8 1 2 5 6 7 8 9 1 5 3 7 5 6 1 2 4 8 5 6 7 3 4 1 8 9 1 2 9] → max votes to classes 1 and 5 → tie: SVM: 1-NN between tied classes 1 and 5; k-NN: nearest class (1 or 5) from classifiers 5-6, 1-3, 3-5, 1-4, 2-5, 5-8, 1-5, 5-9, 1-7 and 1-8.
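The max-wins rule above can be sketched as a vote count over the 36 pairwise proposals; a tie (more than one winner) is what the slide resolves with 1-NN (SVM) or the nearest-class rule (k-NN):

```python
from collections import Counter

def max_wins(votes):
    """Return the class(es) with the most OVO pairwise votes.
    One winner is the final class; several winners mean a tie
    that needs the slide's 1-NN / nearest-class tie-breaking."""
    counts = Counter(votes)
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)

# First vote vector from the slide: class 1 wins outright.
v1 = [2,3,3,4,5,6,7,8,1,2,5,6,7,8,9,1,5,3,7,5,6,1,2,4,8,5,1,7,3,4,1,8,9,1,2,1]
print(max_wins(v1))  # [1]

# Second vote vector from the slide: classes 1 and 5 tie at 5 votes each.
v2 = [2,3,3,4,5,6,7,8,1,2,5,6,7,8,9,1,5,3,7,5,6,1,2,4,8,5,6,7,3,4,1,8,9,1,2,9]
print(max_wins(v2))  # [1, 5]
```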

  5. One-vs-All (OVA) Classifier
• Each classifier is trained to separate one class from all the rest of the classes.
• The class of all the remaining cases is marked as 0.
• The test sample is input to each classifier, and the final class for the test sample is assigned according to the winner-takes-all rule from the classifier voting for its class.
[0 0 0 0 5 0 0 0 0] → vote to class 5 (winner-takes-all)
[0 0 0 0 0 0 0 0 0] → tie: find 1-NN from all the classes
[0 2 0 0 0 6 0 0 0] → votes to classes 2 and 6 → tie: SVM: 1-NN between tied classes 2 and 6; k-NN: nearest class (2 or 6) from classifiers 2-vs-All and 6-vs-All.
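The winner-takes-all decision above amounts to collecting the non-zero outputs of the nine classifiers; zero or several accepted classes mark the tie cases the slide lists:

```python
def winner_takes_all(outputs):
    """Decide from OVA outputs, where each classifier returns its own
    class label when it accepts a case and 0 for 'all the rest'.
    An empty result or more than one class signals a tie that the
    slide resolves with 1-NN / nearest-class."""
    return [c for c in outputs if c != 0]

print(winner_takes_all([0, 0, 0, 0, 5, 0, 0, 0, 0]))  # [5] -> final class 5
print(winner_takes_all([0, 0, 0, 0, 0, 0, 0, 0, 0]))  # []  -> tie: 1-NN over all classes
print(winner_takes_all([0, 2, 0, 0, 0, 6, 0, 0, 0]))  # [2, 6] -> tie between 2 and 6
```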

  6. Data
• Classifiers were tested with an otoneurological data set containing 1,030 vertigo cases from nine disease classes.
• The data set consists of 94 attributes concerning a patient's health status: occurring symptoms, medical history and clinical findings.
• The data had about 11 % missing values, which were imputed.

Disease name                 N     %
Acoustic Neurinoma         131  12.7
Benign Positional Vertigo  173  16.8
Meniere's Disease          350  34.0
Sudden Deafness             47   4.6
Traumatic Vertigo           73   7.1
Vestibular Neuritis        157  15.2
Benign Recurrent Vertigo    20   1.9
Vestibulopatia              55   5.3
Central Lesion              24   2.3

  7. Methods
• OVO and OVA classifiers were tested using 10-fold cross-validation 10 times with
  • the k-Nearest Neighbour (k-NN) method and
  • Support Vector Machines (SVM).
• The basic 5-NN method (a single classifier with all disease classes) was also run as a baseline against which to compare the effects of using multiple classifiers.
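The evaluation scheme, 10 repeated runs of 10-fold cross-validation, can be sketched as index bookkeeping; this assumes plain random folds (the study may well have used stratified folds, which the slide does not say):

```python
import random

def ten_by_tenfold_indices(n_cases, n_runs=10, n_folds=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for 10 runs of 10-fold CV.
    Illustrative sketch only; fold construction details are assumptions."""
    rng = random.Random(seed)
    for run in range(n_runs):
        idx = list(range(n_cases))
        rng.shuffle(idx)                      # reshuffle for every run
        for f in range(n_folds):
            test = idx[f::n_folds]            # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield run, f, train, test

splits = list(ten_by_tenfold_indices(n_cases=30))
print(len(splits))  # 100 train/test splits in total (10 runs x 10 folds)
```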

  8. k-Nearest Neighbour Method (k-NN)
• The k-NN method is a widely used, basic instance-based learning method that searches the training data for the k cases most similar to a test case.
• The Heterogeneous Value Difference Metric (HVDM) was used in the similarity calculation.
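HVDM (Wilson and Martinez) handles the data set's mix of numeric and nominal attributes: numeric attributes contribute a distance normalized by four standard deviations, nominal attributes a value difference over class-conditional probabilities. A minimal sketch, with all names and the toy numbers being illustrative assumptions, not values from the slides:

```python
import math

def hvdm(x, y, numeric, stds, cond_probs):
    """Heterogeneous Value Difference Metric between cases x and y.
    numeric:    set of attribute indices that are numeric
    stds:       {attr: standard deviation} for numeric attributes
    cond_probs: {(attr, value): [P(class | value), ...]} for nominal ones"""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if a in numeric:
            d = abs(xa - ya) / (4 * stds[a])          # normalized numeric diff
        else:
            px = cond_probs[(a, xa)]                  # value difference metric
            py = cond_probs[(a, ya)]
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(px, py)))
        total += d * d
    return math.sqrt(total)

# Toy call: attribute 0 numeric, attribute 1 nominal with two classes.
probs = {(1, 'yes'): [0.8, 0.2], (1, 'no'): [0.3, 0.7]}
d = hvdm((1.0, 'yes'), (3.0, 'no'), numeric={0}, stds={0: 1.0}, cond_probs=probs)
print(round(d, 4))  # sqrt(0.5**2 + 0.5) = 0.866
```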

  9. Support Vector Machine (SVM)
• The aim of SVM is to find a hyperplane that separates classes C1 and C2 and maximizes the margin, the distance between the hyperplane and the closest members of both classes.
• The points closest to the separating hyperplane are called support vectors.
• Kernel functions were used with SVM because the data was not linearly separable in the input space.
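The margin defined above is the distance from the hyperplane w·x + b = 0 to its nearest points; a toy sketch of that computation (not an SVM solver; hyperplane and points are made up for illustration):

```python
import math

def margin(w, b, points):
    """Geometric margin of hyperplane w.x + b = 0: distance to the
    closest point. Support vectors lie at exactly this distance."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x in points) / norm

# Hyperplane x1 + x2 - 3 = 0 between two toy classes in the plane.
pts = [(0.0, 1.0), (1.0, 0.0), (3.0, 2.0), (2.0, 3.0)]
print(margin((1.0, 1.0), -3.0, pts))  # 2/sqrt(2) = sqrt(2), all four are support vectors
```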

  10. Results (true positive rate per disease class, %; 1,030 cases)

                                        OVO Classifiers        OVA Classifiers
                                              SVM    SVM             SVM    SVM
Disease                    Cases   5-NN    linear    RBF   5-NN   linear    RBF   5-NN
Acoustic Neurinoma           131   89.5      95.0   91.6   87.2     90.2   90.6   90.7
Benign Positional Vertigo    173   77.9      79.0   70.0   67.0     77.6   73.5   78.6
Meniere's Disease            350   92.4      93.1   83.8   90.1     89.8   87.8   91.5
Sudden Deafness               47   77.4      94.3   88.3   79.4     87.4   61.3   58.1
Traumatic Vertigo             73   89.6      96.2   99.9   99.3     77.7   79.9   96.7
Vestibular Neuritis          157   87.7      88.2   82.4   81.4     85.0   85.4   84.3
Benign Recurrent Vertigo      20    3.0       4.0   20.0   16.5      8.0   21.0    8.0
Vestibulopatia                55    9.6      14.0   16.5   22.8     15.8   15.3   13.5
Central Lesion                24    5.0       2.1   26.0   28.5     15.0   19.0   15.8
Median of True Positive Rate (%)   77.9      88.2   82.4   79.4     77.7   73.5   78.6
Total Classification Accuracy (%)  79.8      82.4   77.4   78.2     78.8   76.8   79.4

(The rightmost 5-NN column is the basic multi-class 5-NN baseline.)
Linear kernel with box constraint bc = 0.20 (OVO and OVA).
Radial Basis Function (RBF) kernel with bc = 0.4 and scaling factor σ = 8.20 (OVO), bc = 1.4 and σ = 10.0 (OVA).

  11. Conclusions
• The results show that in most of the disease classes the use of multiple binary classifiers improves the true positive rates of the disease classes.
• Especially, 5-NN with OVO classifiers worked out better with this data than 5-NN with OVA classifiers.

  12. Thank you for your attention! Questions? Kirsi.Varpa@cs.uta.fi
More information about the subject:
• Allwein EL, Schapire RE, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research 2000;1:113-141.
