Classifier Chains for Multi-label Classification

Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank (University of Waikato, New Zealand). ECML PKDD 2009, September 9, 2009, Bled, Slovenia.


  1. Classifier Chains for Multi-label Classification
     Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank
     University of Waikato, New Zealand
     ECML PKDD 2009, September 9, 2009, Bled, Slovenia

  4. Introduction
     Multi-label classification: each instance may be associated with multiple labels.
     - set of instances X = {x_1, ..., x_m};
     - set of predefined labels L = {l_1, ..., l_n};
     - dataset (x_1, S_1), (x_2, S_2), ... where each S_i ⊆ L.
     For example, a film can be labeled {romance, comedy}.
     Applications:
     - scene and video classification
     - text classification
     - medical classification
     - biology, genomics
     Multi-label issues:
     - label correlations: consider {romance, comedy} vs {romance, horror}
     - computational complexity
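
A label set S_i ⊆ L is often stored as a 0/1 indicator vector over L. The sketch below shows one way to do that; the helper names `to_indicator` and `from_indicator` are illustrative and do not come from the slides, and the label list is borrowed from the film example used later in the deck.

```python
# Label set L from the film example in the slides.
LABELS = ["romance", "horror", "comedy", "drama", "action", "western"]

def to_indicator(label_set, labels=LABELS):
    """Encode a label set S (a subset of L) as a 0/1 vector ordered like `labels`."""
    unknown = set(label_set) - set(labels)
    if unknown:
        raise ValueError(f"labels not in L: {sorted(unknown)}")
    return [int(label in label_set) for label in labels]

def from_indicator(vector, labels=LABELS):
    """Decode a 0/1 vector back into a label set."""
    return {label for label, bit in zip(labels, vector) if bit}

print(to_indicator({"romance", "comedy"}))  # [1, 0, 1, 0, 0, 0]
```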

  9. Prior Work
     Binary relevance method (BR): one binary problem for each label
     - simple, efficient
     - does not take into account label correlations
     - nearest neighbor approaches based on BR, e.g. MLkNN
     - stacking approaches, e.g. meta level stacking (MS)
     - pairwise approaches, e.g. calibrated label ranking
     Label powerset method: label sets are treated as single labels
     - takes into account label correlations
     - computationally complex
     - RAKEL: ensembles of subsets
     - EPS: ensembles of pruned sets
     Many other methods
     - take into account label correlations
     - complex, prone to overfitting
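
The label powerset idea (each distinct label set becomes one class of a single-label problem) can be sketched as a data transformation. This is a minimal illustration only; the function name `label_powerset_classes` is an assumption, and the classifier that would be trained on the resulting class ids is left out.

```python
def label_powerset_classes(label_sets):
    """Map each distinct label set to one class id (single-label view).

    Returns (class_ids, classes), where classes maps frozenset -> id.
    The number of classes can grow up to min(#instances, 2^|L|), which
    is where the computational cost noted in the slide comes from.
    """
    classes = {}
    class_ids = []
    for label_set in label_sets:
        key = frozenset(label_set)
        # Assign the next free id the first time a label set is seen.
        class_ids.append(classes.setdefault(key, len(classes)))
    return class_ids, classes

ids, classes = label_powerset_classes(
    [{"romance", "comedy"}, {"horror"}, {"comedy", "romance"}]
)
print(ids)           # [0, 1, 0]
print(len(classes))  # 2
```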

  11. Binary Relevance (BR)
     L = {romance, horror, comedy, drama, action, western} (|L| = 6)

     Classifier                         Classification
     C_1: x → {romance, !romance}       romance
     C_2: x → {horror, !horror}         !horror
     C_3: x → {comedy, !comedy}         comedy
     C_4: x → {drama, !drama}           !drama
     C_5: x → {action, !action}         !action
     C_6: x → {western, !western}       !western

     Result: Y = {romance, comedy} ⊆ L
     - simple, intuitive
     - efficient
     - useful for incremental contexts
     - doesn't account for label correlations
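
The binary relevance scheme above can be sketched in a few lines: one independent binary classifier per label. The slides use Support Vector Machines in WEKA; the tiny 1-nearest-neighbour learner `OneNN` below is only a stand-in so the example runs without dependencies, and the toy feature vectors are invented for illustration.

```python
LABELS = ["romance", "horror", "comedy", "drama", "action", "western"]

class OneNN:
    """Stand-in base learner (1-nearest neighbour); not the SVM used in the slides."""
    def fit(self, X, y):
        self.X, self.y = [list(row) for row in X], list(y)
        return self

    def predict_one(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]

def train_br(X, Y, labels):
    """Train one binary model per label; each model sees only the features x."""
    return {
        label: OneNN().fit(X, [int(label in S) for S in Y])
        for label in labels
    }

def predict_br(models, x):
    """Union of all labels whose binary model votes 'relevant'."""
    return {label for label, model in models.items() if model.predict_one(x) == 1}

# Toy data (invented): two features per film.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [{"romance", "comedy"}, {"romance"}, {"horror"}, {"horror", "drama"}]
models = train_br(X, Y, LABELS)
print(predict_br(models, [0.1, 0.1]))
```

Because each model is trained independently, nothing here lets the prediction for comedy depend on the prediction for romance, which is the limitation the slide points out.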

  13. Classifier Chains (CC)
     L = {romance, horror, comedy, drama, action, western} (|L| = 6)

     Classifier                                                            Classification
     C_1: x → {romance, !romance}                                          romance
     C_2: x ∪ romance → {horror, !horror}                                  !horror
     C_3: x ∪ romance ∪ !horror → {comedy, !comedy}                        comedy
     C_4: x ∪ romance ∪ !horror ∪ comedy → {drama, !drama}                 !drama
     C_5: x ∪ romance ∪ !horror ∪ comedy ∪ !drama → ...                    !action
     C_6: x ∪ romance ∪ !horror ∪ comedy ∪ !drama ∪ ...                    !western

     Result: Y = {romance, comedy} ⊆ L
     - similar advantages to binary relevance method
     - time complexity similar in practice
     - takes into account label correlations
     - how to order the chain?
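
The chain above can be sketched as follows: classifier j is trained on the features extended with the true values of labels 1..j-1, and at prediction time each prediction is appended to the features before the next classifier runs. As before, `OneNN` is a dependency-free stand-in for the SVM base classifier named in the slides, and the toy data is invented.

```python
class OneNN:
    """Stand-in base learner (1-nearest neighbour); not the SVM used in the slides."""
    def fit(self, X, y):
        self.X, self.y = [list(row) for row in X], list(y)
        return self

    def predict_one(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]

def train_cc(X, Y, order):
    """Train one model per label, following the chain order."""
    models = []
    for j, label in enumerate(order):
        # Augment each x with the TRUE 0/1 values of the labels earlier in the chain.
        X_aug = [list(x) + [int(prev in S) for prev in order[:j]]
                 for x, S in zip(X, Y)]
        models.append(OneNN().fit(X_aug, [int(label in S) for S in Y]))
    return models

def predict_cc(models, order, x):
    """Predict label by label, feeding each prediction to the next model."""
    feats, predicted = list(x), set()
    for label, model in zip(order, models):
        y = model.predict_one(feats)
        if y:
            predicted.add(label)
        feats.append(y)  # the PREDICTED value becomes a feature for later links
    return predicted

order = ["romance", "horror", "comedy", "drama", "action", "western"]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [{"romance", "comedy"}, {"romance"}, {"horror"}, {"horror", "drama"}]
models = train_cc(X, Y, order)
print(predict_cc(models, order, [0.1, 0.1]))
```

The number of models is the same as in binary relevance (one per label); the only extra cost is up to |L| - 1 additional binary features per link, which is consistent with the "time complexity similar in practice" remark.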

  16. Ensembles of Classifier Chains (ECC)
     - ensembles are known for augmenting accuracy
     - more label correlations learnt, without overfitting
     - solves the 'chain order' issue: each chain has a random order
     Training, for i ∈ 1 ... m iterations:
     - L' ← shuffle label set L
     - D' ← subset of training set D
     - train a model CC_i given L' and D'
     Generic vote/score/threshold method for classification:
     - collect votes from models
     - assign a score to each label
     - apply a threshold to determine relevant labels
     Can also be applied to the binary relevance method, i.e. EBR.
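
The training loop and the vote/score/threshold step above can be sketched as below. Several details are assumptions rather than something the slides specify: the subset D' is drawn without replacement at a fraction `frac=0.67`, the score is the fraction of chains voting for a label, and the threshold is fixed at 0.5. `OneNN` is again a stand-in for the SVM base classifier.

```python
import random

class OneNN:
    """Stand-in base learner (1-nearest neighbour); not the SVM used in the slides."""
    def fit(self, X, y):
        self.X, self.y = [list(row) for row in X], list(y)
        return self

    def predict_one(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]

def train_cc(X, Y, order):
    models = []
    for j, label in enumerate(order):
        X_aug = [list(x) + [int(p in S) for p in order[:j]] for x, S in zip(X, Y)]
        models.append(OneNN().fit(X_aug, [int(label in S) for S in Y]))
    return models

def predict_cc(models, order, x):
    feats, predicted = list(x), set()
    for label, model in zip(order, models):
        y = model.predict_one(feats)
        if y:
            predicted.add(label)
        feats.append(y)
    return predicted

def train_ecc(X, Y, labels, m=10, frac=0.67, seed=0):
    """Train m chains, each with a shuffled label order L' and a subset D'."""
    rng = random.Random(seed)
    n_sub = max(1, int(frac * len(X)))  # subset size: an assumed setting
    ensemble = []
    for _ in range(m):
        order = list(labels)
        rng.shuffle(order)                       # L' <- shuffle label set L
        idx = rng.sample(range(len(X)), n_sub)   # D' <- subset of D
        chain = train_cc([X[i] for i in idx], [Y[i] for i in idx], order)
        ensemble.append((order, chain))
    return ensemble

def predict_ecc(ensemble, labels, x, threshold=0.5):
    """Collect votes, turn them into per-label scores, then threshold."""
    votes = {label: 0 for label in labels}
    for order, chain in ensemble:
        for label in predict_cc(chain, order, x):
            votes[label] += 1
    scores = {label: v / len(ensemble) for label, v in votes.items()}
    return {label for label, s in scores.items() if s >= threshold}, scores
```

Dropping the feature augmentation inside `train_cc` and `predict_cc` would turn the same loop into an ensemble of binary relevance models (EBR).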

  17. Experiments
     - WEKA-based framework
     - Support Vector Machines as base classifiers
     Multi-label datasets:

     Datasets      Labels |L|     Instances |D|
     6 Standard    6 ... 103      2407 ... 6000
     6 Large       22 ... 983     7395 ... 95424

     Multi-label evaluation metrics:
     - accuracy, macro F-measure (label set evaluation)
     - log loss, AU(PRC) (per-label evaluation)
     - build times, test times
     Method parameters preset to optimise predictive performance (ECC requires no additional parameters).
     Experiments:
     (1) Compare Classifier Chains (CC) to the Binary Relevance method (BR) and related BR-based methods.
     (2) Compare ECC to EBR and modern methods of proven success: RAKEL, EPS, and MLkNN.
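
The slides name the evaluation measures but do not define them. The sketch below uses definitions that are common in the multi-label literature and should be read as assumptions: accuracy as the mean Jaccard overlap |S ∩ Y| / |S ∪ Y| between true set S and predicted set Y, exact match as the fraction of instances with S = Y, and micro/macro F1 computed from pooled or per-label counts. The handling of empty sets (score 1.0) is also an assumed convention.

```python
def accuracy(true_sets, pred_sets):
    """Mean Jaccard overlap between true and predicted label sets."""
    total = 0.0
    for S, Y in zip(true_sets, pred_sets):
        union = S | Y
        total += len(S & Y) / len(union) if union else 1.0
    return total / len(true_sets)

def exact_match(true_sets, pred_sets):
    """Fraction of instances whose predicted set equals the true set."""
    return sum(S == Y for S, Y in zip(true_sets, pred_sets)) / len(true_sets)

def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def micro_f1(true_sets, pred_sets):
    """F1 over counts pooled across all labels and instances."""
    tp = sum(len(S & Y) for S, Y in zip(true_sets, pred_sets))
    fp = sum(len(Y - S) for S, Y in zip(true_sets, pred_sets))
    fn = sum(len(S - Y) for S, Y in zip(true_sets, pred_sets))
    return _f1(tp, fp, fn)

def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(label in S and label in Y for S, Y in zip(true_sets, pred_sets))
        fp = sum(label not in S and label in Y for S, Y in zip(true_sets, pred_sets))
        fn = sum(label in S and label not in Y for S, Y in zip(true_sets, pred_sets))
        scores.append(_f1(tp, fp, fn))
    return sum(scores) / len(scores)

true_sets = [{"romance", "comedy"}, {"horror"}]
pred_sets = [{"romance"}, {"horror"}]
print(accuracy(true_sets, pred_sets))     # 0.75
print(exact_match(true_sets, pred_sets))  # 0.5
```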

  18. Results 1
     Comparing CC to BR and to the related methods SM (Subset Mapping) and MS (Meta Stacking).

     Table: Standard datasets, wins for each evaluation measure.

     Measure        CC   BR   SM   MS
     Accuracy        5    0    1    0
     Macro F1        5    0    1    0
     Micro F1        3    1    0    2
     Exact Match     6    0    0    0
     Total wins     19    1    2    2

     - CC's chaining technique is justified over default BR
     - CC outperforms other similar methods
     Notes:
     - Subset Mapping (SM): maps the output of BR to the nearest (Hamming distance) known subset
     - Meta Stacking (MS): stacking the output of BR with meta classifiers
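
The Subset Mapping baseline is described only in one line, so the following is a rough sketch of that description, not the implementation evaluated in the slides. It treats the Hamming distance between two label sets as the size of their symmetric difference and takes "known subsets" to mean the label sets seen in training; the function names and the first-wins tie-breaking are assumptions.

```python
def hamming(a, b):
    """Hamming distance between two label sets: labels present in exactly one."""
    return len(a ^ b)

def subset_map(predicted, known_subsets):
    """Replace a BR prediction by the closest known label set.

    Ties are resolved in favour of the first candidate in `known_subsets`
    (an arbitrary choice made for this sketch).
    """
    return min(known_subsets, key=lambda S: hamming(predicted, S))

known = [frozenset({"romance", "comedy"}), frozenset({"horror"})]
print(subset_map({"romance", "comedy", "horror"}, known))
```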
