
An Empirical Study on Lazy Multilabel Classification Algorithms - PowerPoint PPT Presentation



1. An Empirical Study on Lazy Multilabel Classification Algorithms
Eleftherios Spyromitros, Grigorios Tsoumakas and Ioannis Vlahavas
Machine Learning & Knowledge Discovery Group, Department of Informatics
Aristotle University of Thessaloniki, Greece
Sections: Introduction, Lazy Multilabel Algorithms, Experimental Setup, Experimental Results, Conclusions

2. What is Multilabel Classification?
• Single-label Classification
  • Examples are associated with a single label $\lambda$ from a set of disjoint labels $L$
  • If $|L| = 2$: binary classification
  • If $|L| > 2$: multi-class classification
• Multilabel Classification
  • Examples are associated with a set of labels $Y \subseteq L$

3. Data With Multilabel Nature
• Traditional
  • Text classification
    • A web article about the Antikythera Mechanism Research Project can be categorized into both categories {Science_Technology, History_Culture}
  • Medical diagnosis
    • A patient may have multiple diseases {Obesity, Hypertension}
• Modern
  • Gene function classification
    • A gene usually has multiple functions {Protein Synthesis, Cellular Biogenesis, Cellular Transport}
  • Classification of music into emotions
    • A song can make you feel {Sad_Lonely, Quiet_Still}
  • Semantic scene analysis
    • {Mountain, Trees, Lake}

4. Types of Multilabel Classification Methods
• Problem transformation methods
  • They transform the learning problem into one (LP) or more (BR) single-label classification or label ranking problems
  • Algorithm independent
• Algorithm adaptation methods
  • They extend specific algorithms to handle multi-label data
  • SVM, decision tree, neural network, lazy, Bayesian, boosting

5. The Binary Relevance (BR) Method
• How it works
  • Learns one binary classifier $h_\lambda: X \to \{\neg\lambda, \lambda\}$ for each different label $\lambda \in L$
  • The original dataset $D$ is transformed into $|L|$ datasets $D_\lambda$
  • $D_\lambda$ contains all examples of $D$, labeled as $\lambda$ if they are associated with $\lambda$ and as $\neg\lambda$ otherwise
• Criticism
  • Label correlations are not considered
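
The BR transformation can be sketched in a few lines of Python. This is a minimal illustration under assumptions: each example is a `(features, label_set)` pair, the labels `l1`/`l2` are hypothetical, and `br_transform` is a made-up helper name. Each dataset $D_\lambda$ keeps every example and uses `True` for $\lambda$ and `False` for $\neg\lambda$.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

# Assumed representation: an example is a (feature vector, label set) pair.
Example = Tuple[Sequence[float], FrozenSet[Hashable]]


def br_transform(
    data: List[Example], labels: Sequence[Hashable]
) -> Dict[Hashable, List[Tuple[Sequence[float], bool]]]:
    """Binary Relevance: build one binary dataset D_lambda per label lambda in L."""
    datasets: Dict[Hashable, List[Tuple[Sequence[float], bool]]] = {}
    for lam in labels:
        # Every example of D is kept; its target becomes True (lambda) or False (not lambda).
        datasets[lam] = [(x, lam in y) for x, y in data]
    return datasets


# Hypothetical toy data with labels l1 and l2.
D = [
    ([0.1, 0.2], frozenset({"l1", "l2"})),
    ([0.9, 0.4], frozenset({"l2"})),
    ([0.5, 0.5], frozenset()),
]
L = ["l1", "l2"]
binary = br_transform(D, L)
print([target for _, target in binary["l1"]])  # [True, False, False]
print([target for _, target in binary["l2"]])  # [True, True, False]
```

Any single-label learner can then be trained on each `binary[lam]` separately. Each label is learned on its own, which is why BR ignores label correlations.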

6. The Label Powerset (LP) Method
• How it works
  • Considers each different subset of $L$ as a single label
  • It learns one single-label classifier $h: X \to P(L)$, where $P(L)$ is the powerset of $L$
• Criticism
  • Large number of label subsets ($2^{|L|}$)
  • Most of these are associated with very few examples
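
The LP transformation can be sketched the same way. This is an illustration under assumptions: `lp_transform` and the toy data are hypothetical. Each distinct label set becomes one class, and counting the classes on a toy dataset shows the sparsity criticism: few of the $2^{|L|}$ subsets occur, and most of those occur only once.

```python
from collections import Counter
from typing import FrozenSet, Hashable, List, Sequence, Tuple

# Assumed representation: an example is a (feature vector, label set) pair.
Example = Tuple[Sequence[float], FrozenSet[Hashable]]


def lp_transform(data: List[Example]) -> List[Tuple[Sequence[float], FrozenSet[Hashable]]]:
    """Label Powerset: each distinct label set becomes one class of a single-label problem."""
    # A frozenset is hashable, so an entire label set can serve directly as a class value.
    return [(x, frozenset(y)) for x, y in data]


# Hypothetical toy data over L = {l1, l2, l3}.
L = ["l1", "l2", "l3"]
D = [
    ([0.1], frozenset({"l1", "l2"})),
    ([0.2], frozenset({"l1", "l2"})),
    ([0.3], frozenset({"l1"})),
    ([0.4], frozenset({"l2", "l3"})),
    ([0.5], frozenset()),
]
class_counts = Counter(cls for _, cls in lp_transform(D))
print(2 ** len(L))                    # 8 possible classes (subsets of L)
print(len(class_counts))              # 4 classes actually observed
print(sorted(class_counts.values()))  # [1, 1, 1, 2]: most classes have a single example
```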

7. The BRkNN Algorithm
• Origin
  • Equivalent to using the BR method in conjunction with the kNN algorithm
• Refinement
  • $|L|$ times faster than BR + kNN in prediction
  • Avoids the redundant calculation of the k nearest neighbors in each one of the $|L|$ transformed datasets $D_\lambda$
  • A single k nearest neighbors search is followed by independent predictions for each label
• Benefit
  • Especially suited to domains with a large number of labels and examples that require low response times

8. How it works
• Confidence scores
  • BRkNN is based on the calculation of a confidence score $c_\lambda$ for each label $\lambda \in L$
  • The confidence $c_\lambda$ is the percentage of the k nearest neighbors whose label sets include $\lambda$
  • A label is included in the predicted label set when its confidence is higher than or equal to 50%
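
The two BRkNN slides translate into a short Python sketch. It runs one neighbor search, computes $c_\lambda$ for every label from that same neighbor list, and applies the 50% threshold. This is an illustration, not the authors' implementation. The brute-force search, the Euclidean distance, the helper names and the toy data are all assumptions.

```python
import math
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

# Assumed representation: an example is a (feature vector, label set) pair.
Example = Tuple[Sequence[float], FrozenSet[Hashable]]


def k_nearest(train: List[Example], x: Sequence[float], k: int) -> List[Example]:
    """One brute-force k-nearest-neighbor search; Euclidean distance is an assumption."""
    return sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]


def brknn_confidences(train: List[Example], labels: Sequence[Hashable],
                      x: Sequence[float], k: int) -> Dict[Hashable, float]:
    # A single neighbor search is shared by all labels (instead of one search per D_lambda).
    neighbors = k_nearest(train, x, k)
    # c_lambda = fraction of the k neighbors whose label set contains lambda.
    return {lam: sum(lam in y for _, y in neighbors) / k for lam in labels}


def brknn_predict(train: List[Example], labels: Sequence[Hashable],
                  x: Sequence[float], k: int) -> Set[Hashable]:
    conf = brknn_confidences(train, labels, x, k)
    # Independent decision per label: keep it if at least 50% of the neighbors have it.
    return {lam for lam, c in conf.items() if c >= 0.5}


# Hypothetical toy training set.
train = [
    ([0.0, 0.0], frozenset({"l1", "l2"})),
    ([0.1, 0.0], frozenset({"l2"})),
    ([0.0, 0.2], frozenset({"l1"})),
    ([5.0, 5.0], frozenset({"l3"})),
]
labels = ["l1", "l2", "l3"]
print(sorted(brknn_predict(train, labels, [0.05, 0.05], k=3)))  # ['l1', 'l2']
```

The neighbor list is computed once and reused for every label. That reuse is where the $|L|$-fold speed-up over running BR with a separate kNN per $D_\lambda$ comes from.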

9. Independent Predictions…
• The weakness
  • The empty set is a possible overall output
  • It arises when none of the labels has a confidence of at least 50%
• The reason
  • Independent predictions for each label, a general disadvantage of the BR method
• Is this common in BRkNN?
[Figure: percentage of instances where the empty set is output, versus the number of nearest neighbors (1 to 29), for the scene, yeast and emotions datasets]

10. The Proposed Extensions
• Both aim to resolve the empty-set problem described above
• BRkNN-a
  • Checks whether BRkNN outputs the empty set
  • In that case, outputs the label with the highest confidence
• BRkNN-b
  • 1st step: calculates the average size of the label sets of the k nearest neighbors, $s = \frac{1}{k}\sum_{j=1}^{k} |Y_j|$
  • 2nd step: outputs the $[s]$ labels with the highest confidence, where $[s]$ is the nearest integer to $s$
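
The following Python sketch illustrates both extensions under stated assumptions. It takes the label sets of the k nearest neighbors as given, so it is independent of any particular search. Its tie-breaking rules, a round-half-up reading of "nearest integer", and the toy data are assumptions. In the example, every label has confidence below 50%, so plain BRkNN would output the empty set.

```python
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set


def confidences(neighbor_sets: List[FrozenSet[Hashable]],
                labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """c_lambda: fraction of the k nearest neighbors whose label set contains lambda."""
    k = len(neighbor_sets)
    return {lam: sum(lam in y for y in neighbor_sets) / k for lam in labels}


def brknn_a(conf: Dict[Hashable, float]) -> Set[Hashable]:
    predicted = {lam for lam, c in conf.items() if c >= 0.5}  # plain BRkNN rule
    if not predicted:
        # Empty set: fall back to the single most confident label
        # (ties go to the label listed first, an assumption).
        predicted = {max(conf, key=conf.get)}
    return predicted


def brknn_b(conf: Dict[Hashable, float],
            neighbor_sets: List[FrozenSet[Hashable]]) -> Set[Hashable]:
    # Step 1: average label-set size of the k neighbors, s = (1/k) * sum_j |Y_j|.
    s = sum(len(y) for y in neighbor_sets) / len(neighbor_sets)
    size = int(s + 0.5)  # nearest integer, rounding halves up (an assumption)
    # Step 2: the `size` most confident labels (the stable sort keeps label order on ties).
    ranked = sorted(conf, key=conf.get, reverse=True)
    return set(ranked[:size])


# Hypothetical label sets of k = 5 nearest neighbors.
labels = ["a", "b", "c", "d", "e", "f"]
neighbors = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"d"}),
             frozenset({"b", "e"}), frozenset({"f"})]
conf = confidences(neighbors, labels)     # a, b: 0.4; c, d, e, f: 0.2
print(sorted(brknn_a(conf)))              # ['a']
print(sorted(brknn_b(conf, neighbors)))   # ['a', 'b']  (s = 1.6, rounded to 2)
```

BRkNN-a changes the output only when BRkNN would return nothing. BRkNN-b ignores the 50% threshold entirely and always returns $[s]$ labels.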

11. The MLkNN and LPkNN Algorithms
• Two more lazy multi-label classification methods
• LPkNN
  • The pairing of the LP problem transformation method with the kNN algorithm
  • Little discussed in the past
• MLkNN
  • An adaptation of kNN for multi-label data
  • Main difference from BRkNN: prior and posterior probabilities are estimated from the training set
  • Extended with an option for min-max normalization
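
LPkNN can be sketched in Python as a kNN vote in which each distinct label set counts as one class, following the LP view. This is an illustration, not the authors' code. The Euclidean distance, the brute-force search, the tie-breaking and the toy data are assumptions.

```python
import math
from collections import Counter
from typing import FrozenSet, Hashable, List, Sequence, Set, Tuple

# Assumed representation: an example is a (feature vector, label set) pair.
Example = Tuple[Sequence[float], FrozenSet[Hashable]]


def lpknn_predict(train: List[Example], x: Sequence[float], k: int) -> Set[Hashable]:
    """LP + kNN: majority vote over whole label sets of the k nearest neighbors."""
    # Brute-force neighbor search; Euclidean distance is an assumption.
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    # Under LP each distinct label set is one class, so votes are cast per label set.
    votes = Counter(frozenset(y) for _, y in neighbors)
    # most_common keeps first-encountered order on ties, so the nearer neighbor's set wins.
    best, _ = votes.most_common(1)[0]
    return set(best)


# Hypothetical toy training set.
train = [
    ([0.0, 0.0], frozenset({"l1", "l2"})),
    ([0.1, 0.0], frozenset({"l1"})),
    ([0.0, 0.1], frozenset({"l1", "l2"})),
    ([3.0, 3.0], frozenset({"l3"})),
]
print(sorted(lpknn_predict(train, [0.02, 0.02], k=3)))  # ['l1', 'l2']
```

Unlike BRkNN, this rule can only output a label set that already occurs in the training data. It never outputs the empty set unless some training example has no labels.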

12. Evaluation Measures
• Example-based
  • Calculate the difference between the actual and predicted label sets for each example
  • Average the results over all examples of the test set
• Label-based
  • Calculate a binary evaluation measure separately for each label
  • Micro/macro averaging operations over all labels
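
Micro and macro averaging of a label-based measure are easy to confuse, so here is a small Python sketch. F1 is used as the binary measure, which is an assumption since the slide does not name one. The 0/0 convention and the toy data are also assumptions. Macro averaging computes the measure per label and then averages the scores. Micro averaging first pools the per-label contingency counts and computes the measure once.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    # Assumption: F1 is taken as 0 when a label is neither present nor predicted.
    return 2 * tp / denom if denom else 0.0


def label_based_f1(actual: List[Set[Hashable]], predicted: List[Set[Hashable]],
                   labels: Sequence[Hashable]) -> Tuple[float, float]:
    pairs = list(zip(actual, predicted))
    counts = []
    for lam in labels:
        # Per-label contingency counts (true positives, false positives, false negatives).
        tp = sum(1 for y, z in pairs if lam in y and lam in z)
        fp = sum(1 for y, z in pairs if lam not in y and lam in z)
        fn = sum(1 for y, z in pairs if lam in y and lam not in z)
        counts.append((tp, fp, fn))
    # Macro: compute F1 per label, then average the scores.
    macro = sum(f1(*c) for c in counts) / len(counts)
    # Micro: sum the counts over all labels, then compute a single F1.
    micro = f1(*(sum(c[i] for c in counts) for i in range(3)))
    return micro, macro


# Hypothetical actual (Y) and predicted (Z) label sets for three test examples.
actual = [{"a", "b"}, {"a"}, {"c"}]
predicted = [{"a"}, {"a", "c"}, {"c"}]
micro, macro = label_based_f1(actual, predicted, ["a", "b", "c"])
print(round(micro, 3), round(macro, 3))  # 0.75 0.556
```

Macro averaging gives every label equal weight, so the never-predicted label `b` pulls it down more than micro averaging.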

13. Example Based Measures
• Notation
  • Let $(x, Y)$, $Y \subseteq L$, be a multi-label example
  • Let $h$ be a multi-label classifier
  • Let $Z = h(x)$ be the set of labels predicted by $h$ for $x$
• Hamming Loss: $\frac{|Y \triangle Z|}{|L|}$, where $\triangle$ is the symmetric difference of two sets
• Classification Accuracy or Subset Accuracy: 1 if $Y = Z$, 0 if $Y \neq Z$
• IR-inspired measures
  • Precision: $\frac{|Y \cap Z|}{|Z|}$
  • Recall: $\frac{|Y \cap Z|}{|Y|}$
  • F-measure: $\frac{2|Y \cap Z|}{|Z| + |Y|}$
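
The example-based measures map almost one-to-one onto Python set operations: `^` is the symmetric difference and `&` is the intersection. This is a minimal sketch. The formulas leave 0/0 cases undefined, for example an empty prediction $Z$, so the convention below is an assumption, as are the helper names and the toy data. Each measure is averaged over the test examples, as described on the previous slide.

```python
from typing import Dict, Hashable, List, Sequence, Set


def _ratio(num: float, den: int, both_empty: bool) -> float:
    # Assumption: a 0/0 case scores 1 if Y and Z are both empty, otherwise 0.
    if den == 0:
        return 1.0 if both_empty else 0.0
    return num / den


def example_based(actual: List[Set[Hashable]], predicted: List[Set[Hashable]],
                  labels: Sequence[Hashable]) -> Dict[str, float]:
    names = ["hamming_loss", "subset_accuracy", "precision", "recall", "f_measure"]
    totals = dict.fromkeys(names, 0.0)
    for y, z in zip(actual, predicted):
        inter = len(y & z)                       # |Y ∩ Z|
        both_empty = not y and not z
        totals["hamming_loss"] += len(y ^ z) / len(labels)  # |Y Δ Z| / |L|
        totals["subset_accuracy"] += 1.0 if y == z else 0.0
        totals["precision"] += _ratio(inter, len(z), both_empty)
        totals["recall"] += _ratio(inter, len(y), both_empty)
        totals["f_measure"] += _ratio(2 * inter, len(y) + len(z), both_empty)
    # Average each measure over all examples of the test set.
    return {name: total / len(actual) for name, total in totals.items()}


# Hypothetical actual (Y) and predicted (Z) label sets for three test examples.
actual = [{"a", "b"}, {"a"}, {"c"}]
predicted = [{"a"}, {"a", "c"}, {"c"}]
scores = example_based(actual, predicted, ["a", "b", "c"])
for name, value in scores.items():
    print(name, round(value, 3))
# hamming_loss 0.222, subset_accuracy 0.333, precision 0.833, recall 0.833, f_measure 0.778
```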
