Multi-label classification using rule-based classifier systems
Shabnam Nazmi (PhD candidate)
Department of Electrical and Computer Engineering, North Carolina A&T State University
Advisor: Dr. A. Homaifar
Outline
• Motivation
• Introduction
• Multi-label classification overview
• Confidence level in prediction
• Multi-label classification using learning classifier systems (LCSs)
• Simulation results
• Conclusion and future work
Motivation
• Data-driven techniques are ubiquitous in many applications such as classification, estimation, and modeling
• In some classification applications, samples in the data set belong to more than one class simultaneously
• Multi-label classification methods that solve the task as a single learning problem have an advantage over methods that decompose it
• The level of confidence in the labels assigned to the samples is vital for training an accurate model
• When modeling a dynamical system, the overlap among adjacent sub-models can be handled using multi-label data with appropriate confidence levels
Introduction: multi-class classification vs. multi-label classification
Introduction: multi-class classification
• In contrast to simple binary classification, each instance of the data set belongs to one of N > 2 different classes
• The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs
• One-vs-all: trains N binary classifiers, one for each class
• One-vs-one: trains N(N-1)/2 binary classifiers, one to distinguish each pair of classes (see the sketch below)
• Typical learners: decision trees, naïve Bayes, neural networks, …
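The two decompositions can be illustrated with a short scikit-learn sketch (an illustration of my own, not taken from the slides; the synthetic data and the logistic regression base learner are arbitrary choices):

```python
# Minimal sketch: one-vs-all and one-vs-one decompositions of a 4-class problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# One-vs-all: N binary learners, one per class
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-one: N(N-1)/2 binary learners, one per pair of classes
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ova.estimators_), len(ovo.estimators_))  # 4 and 6 for N = 4
```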
Introduction: multi-label classification
• In contrast to conventional (single-label) classification, the setting of multi-label classification (MLC) allows an instance to belong to several classes simultaneously
• Multi-label classification tasks are ubiquitous in real-world problems
  • Text categorization: each document may belong to several predefined topics
  • Bioinformatics: one protein may have many effects on a cell when predicting its functional classes
Definitions
• Notation: D is a multi-label data set; Y = {y_1, y_2, …, y_k} is the finite set of labels; h: X → 2^Y is the learned hypothesis; for the i-th example, Y_i ⊆ Y denotes its true label set and Z_i = h(x_i) its predicted label set
• Label cardinality of D: the average number of labels of the examples in D
• Label density of D: the average number of labels of the examples in D divided by |Y|
• Hamming loss (Δ denotes the symmetric difference):
  HL(h, D) = (1/|D|) Σ_{i=1}^{|D|} |Y_i Δ Z_i| / |Y|
• Ranking loss (the average fraction of relevant/irrelevant label pairs that the ranking function f orders incorrectly; Ȳ_i is the complement of Y_i):
  RL(f, D) = (1/|D|) Σ_{i=1}^{|D|} |{(y_a, y_b) ∈ Y_i × Ȳ_i : f(x_i, y_a) ≤ f(x_i, y_b)}| / (|Y_i| |Ȳ_i|)
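A minimal numpy sketch of these quantities, assuming the labels are stored as n-by-k binary indicator matrices (a representation chosen here for illustration, not prescribed by the slides):

```python
import numpy as np

def label_cardinality(Y):
    """Average number of labels per example (Y is an n-by-k 0/1 matrix)."""
    return Y.sum(axis=1).mean()

def label_density(Y):
    """Label cardinality divided by the total number of labels."""
    return label_cardinality(Y) / Y.shape[1]

def hamming_loss(Y_true, Y_pred):
    """Average fraction of labels in the symmetric difference of the true
    and predicted label sets (i.e. the entry-wise disagreement rate)."""
    return float(np.mean(Y_true != Y_pred))

Y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]])
print(label_cardinality(Y_true), label_density(Y_true), hamming_loss(Y_true, Y_pred))
```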
MLC methods
• Problem transformation methods
• Algorithm adaptation methods
MLC methods: problem transformation methods
• Select family: discards ML data or selects one of the multiple labels for each instance
  • Discards a lot of the information content of the original dataset
• Label power set method: treats each distinct set of labels as a single label
  • May lead to a large number of classes with few examples per class
• Binary relevance: learns |Y| binary classifiers, one for each label (a minimal sketch follows this list)
  • The most common problem transformation method
• Ranking by pairwise comparison: generates |Y|(|Y|-1)/2 binary data sets, one for each pair of labels
  • Outputs a ranking of labels based on the votes of the binary classifiers
• Random k-labelsets (RAkEL): breaks the initial set of labels into small random subsets, disjoint or overlapping
  • Improves on label power set results, but is still challenged by domains with a large number of labels and instances
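A minimal binary relevance sketch (my own illustration; the base learner and the helper class name are assumptions, not code from the slides):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    """Trains one independent binary classifier per label column of Y."""

    def __init__(self, base_estimator=None):
        self.base_estimator = base_estimator or LogisticRegression(max_iter=1000)

    def fit(self, X, Y):
        # Y is an n-by-k binary indicator matrix, one column per label
        self.models_ = [clone(self.base_estimator).fit(X, Y[:, j])
                        for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        # Stack the per-label 0/1 predictions back into an indicator matrix
        return np.column_stack([m.predict(X) for m in self.models_])
```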
MLC methods: algorithm adaptation methods
• Decision trees: C4.5 was adapted to learn ML data
  • Produces ML models that are understandable by humans
• Probabilistic methods: proposed for text classification; a generative model is trained according to which each label generates different words
  • The ML document is generated by a mixture of the word distributions of its labels, fitted with EM
• Neural networks: the back-propagation algorithm is adapted by introducing a new error function similar to the ranking loss
• Lazy methods: the k-nearest neighbors algorithm is used to maximize the posterior probability of the labels assigned to new instances (a simplified sketch follows below)
  • Outputs a ranking function for the probability of each label
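A rough sketch of the lazy, k-NN-style idea under my own simplifying assumptions (it estimates label probabilities from neighbor frequencies rather than implementing the full ML-kNN posterior):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_probabilities(X_train, Y_train, X_query, k=5):
    """Estimate per-label probabilities for each query point from the label
    frequencies among its k nearest training neighbors."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)     # neighbor indices, shape (n_query, k)
    return Y_train[idx].mean(axis=1)    # fraction of neighbors carrying each label

# Thresholding the returned probabilities (e.g. at 0.5) gives a predicted label
# set; the probabilities themselves provide the label ranking mentioned above.
```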
MLC methods: algorithm adaptation methods (continued)
• Support vector machines: the one-versus-one strategy is used to partition a dataset with |Y| labels into |Y|(|Y|-1)/2 double-label subsets
  • Assumes that double-label instances are located in the marginal region between positive and negative instances
• Associative classification methods: construct classification rule sets using association rule mining
  • MMAC learns an initial set of rules, removes the examples associated with this rule set, and recursively learns a new rule set from the remaining examples until no frequent items are left
Confidence in prediction
• The AdaBoost algorithm has been extended to generate a confidence degree for the predictions of "weak" hypotheses
  • Confidence scores indicate the reliability of each prediction
• Classification methods such as probabilistic approaches and logistic regression output a value interpreted as the probability of a label being true
• The idea of confidence in prediction can be extended to one step prior to training: incorporate confidence levels, provided by the expert, into the training data
  • The hypothesis then learns these confidence levels and outputs a confidence degree along with its predicted labels for new instances
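As a small illustration of the second point, an off-the-shelf probabilistic classifier already exposes such per-label confidence values through predict_proba (synthetic data, used only to show the interface):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Each row holds the model's confidence (probability) in each class label
print(clf.predict_proba(X[:3]))
```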
Notations
• X denotes the instance space and Y = {y_1, y_2, …, y_k} is the finite set of class labels
• Each instance x ∈ X is associated with a subset of labels y ⊆ Y
• D is the data set D = {(x_1, λ_1, C_1), (x_2, λ_2, C_2), …, (x_n, λ_n, C_n)}, where C_i holds the expert-provided confidence levels for the labels of x_i
• λ_i is the binary relevance vector of labels for instance x_i:
  λ_{i,j} = 1 if y_j ∈ y and 0 if y_j ∉ y, for all i ∈ [1, n] and j ∈ [1, k]
• The hypothesis h: X → (Ŷ, W) outputs a set of predicted labels (Ŷ) along with a vector (W) of the confidence of the hypothesis in each of those labels
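A small sketch of how this representation might look in code (the array names, shapes, and example values are assumptions made for illustration only):

```python
import numpy as np

n, k = 3, 4                          # n instances, k labels
X = np.random.rand(n, 5)             # instance feature vectors
Lam = np.array([[1, 0, 1, 0],        # binary relevance vectors (lambda_i)
                [0, 1, 0, 0],
                [1, 1, 0, 1]])
C = np.array([[0.9, 0.0, 0.6, 0.0],  # expert confidence in each assigned label
              [0.0, 0.8, 0.0, 0.0],
              [0.7, 0.5, 0.0, 1.0]])

def labels_from_confidence(W, threshold=0.5):
    """Turn a hypothesis' per-label confidence vectors W into predicted
    label sets by thresholding (one possible decision rule)."""
    return [set(np.flatnonzero(w >= threshold)) for w in W]

print(labels_from_confidence(C))     # here C stands in for a hypothesis output W
```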
LCS structure
• A strength-based, Michigan-style classifier system has been used to extract knowledge from ML data
• Michigan-style classifier systems are rule-based, supervised learning systems with a fixed rule length
• A genetic algorithm acts as the driving force that helps evolve useful rules
• The classification model consists of a population of rules of the form "IF condition THEN action"
• Originally structured for learning binary classification problems
• The isolated structure of the action part of the classifiers allows further modification to adapt to more general classification problems, namely multi-class and multi-label
LCS structure
[Diagram: LCS training loop. A training instance drawn from the data set is matched against the rule population [P] to form the match set [M], with covering creating new rules when nothing matches; an action set [A] is selected, rule parameters are updated, and the genetic algorithm evolves the population into the final model.]
• Data set: a set of triples of the form (sample, label, confidence level)
• Training instance: an individual randomly drawn from the data set
• [P]: the population of rules/classifiers
• Classifier parameters:
  • Condition
  • Action
  • Strength (S)
  • Confidence estimate parameters W = {w_1, w_2, …, w_k}
  • Confidence error (ε)
• Condition (a sketch of the rule structure and matching follows below):
  • For binary-valued attributes, composed of {0, 1, #}
  • For real-valued attributes, an ordered list of pairs of center and spread (c_i, s_i)
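A compact sketch of what such a rule might look like for real-valued attributes; the field names, the interval matching test, and the covering helper are my own illustrative assumptions, not the exact implementation behind the slides:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Rule:
    centers: np.ndarray             # c_i, one per attribute
    spreads: np.ndarray             # s_i, one per attribute
    labels: frozenset               # action: the advocated label set
    strength: float = 10.0          # S, updated as the rule is evaluated
    confidence: np.ndarray = None   # W = (w_1, ..., w_k), per-label confidence estimates
    conf_error: float = 0.0         # epsilon, error of the confidence estimates

    def matches(self, x):
        # Interval condition: match when every attribute of x lies in
        # [center - spread, center + spread].
        return bool(np.all(np.abs(x - self.centers) <= self.spreads))

def cover(x, label_set, n_labels, spread=0.1):
    """Covering: create a new rule centered on an instance no rule matched."""
    return Rule(centers=x.copy(),
                spreads=np.full_like(x, spread),
                labels=frozenset(label_set),
                confidence=np.zeros(n_labels))
```

Training would then repeatedly draw (sample, label, confidence) triples, build the match set from rules whose matches() holds, call cover() when that set is empty, update strength, W, and ε, and periodically apply the genetic algorithm, along the lines of the loop in the diagram above.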