Dealing with Concept Drift and Class Imbalance in Multi-label Stream Classification

Eleftherios Spyromitros-Xioufis (1), Myra Spiliopoulou (2), Grigorios Tsoumakas (1) and Ioannis Vlahavas (1)
(1) Department of Informatics, Aristotle University of Thessaloniki, Greece
(2) Faculty of Computer Science, OvG University of Magdeburg, Germany

Outline: Introduction | Our Method | Empirical Evaluation | Conclusions & Future Work

Eleftherios Spyromitros-Xioufis | espyromi@csd.auth.gr | July 2011
Multi-label Classification
• Classification of data which can be associated with multiple labels
• Why more than one label?
  • Orthogonal labels
    • Thematic and confidentiality labels in the categorization of enterprise documents
  • Overlapping labels, typical in news
    • An article about Fukushima could be annotated with {"nuclear crisis", "Asia-Pacific news", "energy", "environment"}
• Where can multi-label classification be useful?
  • Automated annotation of large object collections for information retrieval, tag suggestion, query categorization, ...
Stream Classification
• Classification of instances with the properties of a data stream:
  • Time ordered
  • Arriving continuously and at high speed
  • Concept drift: gradual or abrupt changes in the target variable over time
• Data stream examples:
  • Sensor data, ATM transactions, e-mails
• Desired properties of stream classification algorithms:
  • Handling infinite data with finite resources
  • Adaptation to concept drift
  • Real-time prediction
Multi-label Stream Classification (MLSC)
• The classification of streaming multi-label data
• Multi-label streams are very common (RSS feeds, incoming mail)
• Batch multi-label methods do not have the desired characteristics of stream algorithms
• Stream classification methods are designed for single-label data
• Only a few recent methods exist for MLSC
• Special MLSC challenges (explained next):
  • Multiple concept drift
  • Class imbalance
Concept Drift
• Types of concept drift:
  • Change in the definition of the target variable: what is spam today may not be spam tomorrow
  • Virtual concept drift: change in the prior probability distribution
  • In both cases the model needs to be revised
• In multi-label streams:
  • Multiple concepts (multiple concept drift)
  • Cannot assume that all concepts drift at the same rate
• A mainstream drift adaptation strategy in single-label streams:
  • Moving window: a window that keeps only the most recently read examples
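The moving-window strategy mentioned above can be sketched as a fixed-size buffer that automatically discards the oldest example when a new one arrives. This is a minimal illustrative sketch, not the authors' implementation; the window size of 1000 is an arbitrary example value.

```python
from collections import deque

# A bounded buffer: once full, appending a new example evicts the oldest.
window = deque(maxlen=1000)

def on_new_example(example):
    window.append(example)
    # In a stream classifier, the model would be (re)trained or updated
    # on the current window contents here.
```

Because old examples fall out of the window automatically, a model trained on the window contents tracks the current concept rather than the full stream history.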
Class Imbalance in Multi-label Data
• Multi-label data exhibit class imbalance:
  • Inter-class imbalance: some labels are much more frequent than others
  • Inner-class imbalance: strong imbalance between the numbers of positive and negative examples of a label
• Imbalance can be exacerbated by virtual concept drift:
  • A label may become extremely infrequent for some time
• Consequences:
  • Very few positive training examples for some labels
  • Decision boundaries are pushed away from the positive class
Single Moving Window (SW) in MLSC
[Figure: a 5-instance window over a stream x_{n-10}, ..., x_n; each of the labels λ1..λ5 is marked positive (+) on various instances, with an old concept on the left of the stream and the new/current concept on the right]
• Implication of having a common window:
  • Some labels may have only a few or even no positive examples inside the window (λ2, λ4): an imbalanced learning situation
• If we increase the window size:
  • Enough positive examples for all labels, but risk of including old examples
  • Not necessary for all labels: λ1, λ3, λ5 already have enough positive examples
Multiple Windows (MW) Approach for MLSC
• Motivation: more positive examples for training infrequent labels
• We associate each label with two instance windows:
  • One with positive and one with negative examples
• The size of the positive window is fixed to a number n_p which should be:
  • Large enough to allow learning an accurate model
  • Small enough to decrease the probability of drift inside the window
• The size of the negative window n_n is determined by the formula n_n = n_p / r, where r has the role of balancing the distribution of positive and negative examples
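The per-label window pair described above can be sketched with two bounded buffers, one sized n_p and one sized n_n = n_p / r. The class and method names below are hypothetical; only the parameters n_p and r come from the slide.

```python
from collections import deque

class LabelWindows:
    """One label's pair of instance windows (sketch, not the authors' code)."""

    def __init__(self, n_p=50, r=0.5):
        n_n = int(n_p / r)                    # n_n = n_p / r from the slide
        self.positives = deque(maxlen=n_p)    # most recent positive examples
        self.negatives = deque(maxlen=n_n)    # most recent negative examples

    def update(self, x, is_positive):
        # Appending to a full deque evicts the oldest example automatically.
        (self.positives if is_positive else self.negatives).append(x)

    def training_set(self):
        # The union of the two windows is the binary training set for this label.
        return ([(x, 1) for x in self.positives] +
                [(x, 0) for x in self.negatives])
```

With r < 0.5 the negative window is more than twice the positive one, so r directly controls how balanced the resulting binary training set is.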
Multiple Windows (MW) Approach for MLSC (continued)
[Figure: a stream of positive (p) and negative (n) examples; a single window of size 10 retains 2 positives and 8 negatives (SW: r = 2/8), while the multiple-windows approach with n_p = 4, n_n = 6 retains a more balanced sample (MW: r = 2/3)]
• Compared to an equally-sized single window we:
  • Over-sample the positive examples by adding the most recent ones
  • Under-sample the negative examples by retaining only the most recent ones
• The high variance caused by insufficient positive examples in the SW approach is reduced
• There is a possible increase in bias due to the introduction of old positive examples
  • Usually small, because the negative examples will always be current
Essentially Binary Relevance
• Our method follows the binary relevance (BR) paradigm
  • Transforms the multi-label classification problem into multiple binary classification problems
• Disadvantage:
  • Potential label correlations are ignored
• Advantages:
  • The independent modeling of BR allows handling the expected differences in the frequencies and drift rates of the labels
  • Can be coupled with any binary classifier
  • Can be parallelized to achieve constant time complexity with respect to the number of labels
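The BR transformation described above can be sketched in a few lines: each label gets its own 0/1 target vector over the same instances. The function name and the toy label set are illustrative, not from the slides.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Turn multi-label annotations (one set of labels per instance)
    into one binary target vector per label."""
    return {lbl: [1 if lbl in s else 0 for s in label_sets]
            for lbl in all_labels}

# Example: three instances annotated with subsets of {a, b}.
targets = binary_relevance_targets([{"a"}, {"a", "b"}, set()], ["a", "b"])
# Each resulting vector trains an independent binary classifier for that label.
```

Because each vector is handled by its own classifier, each label can also have its own window sizes and its own adaptation rate, which is exactly the flexibility the slide's first advantage refers to.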
Incremental Thresholding
• BR can typically output a numerical score for each label
• Confidence scores are usually transformed into hard 0/1 classifications via an implicit 0.5 threshold
• We use an incremental version of the PCut (proportional cut) thresholding method:
  • Every n instances (n is a parameter):
    • We calculate for each label a threshold that would most accurately approximate the observed frequency of that label in the last n instances
    • The calculated thresholds are used on the next batch of n instances
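The per-label PCut step described above can be sketched as follows: choose a threshold so that the fraction of scores classified positive matches the label's observed frequency in the last batch. This is a hypothetical sketch of that idea with our own function and variable names; the original PCut may break ties differently.

```python
def pcut_threshold(scores, observed_frequency):
    """Return a threshold so that roughly observed_frequency of the
    given confidence scores fall above it."""
    k = round(observed_frequency * len(scores))  # desired number of positives
    if k <= 0:
        return 1.0   # label was absent in the batch: predict nothing positive
    ranked = sorted(scores, reverse=True)
    if k >= len(ranked):
        return 0.0   # label was always present: predict everything positive
    # Place the threshold between the k-th highest score and the next one.
    return (ranked[k - 1] + ranked[k]) / 2

# Example: 5 scores from the last batch, label observed in 40% of instances.
scores = [0.9, 0.7, 0.4, 0.2, 0.1]
threshold = pcut_threshold(scores, observed_frequency=0.4)
```

The threshold computed on one batch is then applied to the scores of the next batch of n instances, as the slide states.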