Work on Multi-label Classification

Jesse Read
Supervised by Bernhard Pfahringer
jmr30@cs.waikato.ac.nz

Machine Learning Group, University of Waikato, Hamilton, New Zealand
Outline

- Multi-label Classification
- Multi-label Applications
- Problem Transformation
- Binary Method
- Combination Method
- PS: Pruned Sets Method
- Results I
- Results II
- On-line Applications
- Experiments II
- Summary
Multi-label Classification

Single-label (multi-class) classification:
- Set of instances D.
- Set of labels (classes) L.
- For each d ∈ D, select one label (class) l ∈ L.
- Single-label representation: (d, l).

Multi-label classification:
- Set of instances D.
- Set of labels L.
- For each d ∈ D, select a label subset S ⊆ L.
- Multi-label representation: (d, S).
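As a concrete illustration (not from the slides), the two representations differ only in the target type: a single class versus a set of labels per instance. A minimal Python sketch:

```python
# Single-label: each instance is paired with exactly one class l from L.
single_label = [("d0", "A"), ("d1", "C"), ("d2", "A")]

# Multi-label: each instance is paired with a label subset S of L.
multi_label = [("d0", {"A", "D"}), ("d1", {"C", "D"}), ("d2", {"A"})]
```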
Applications

Any application where data can be classified with more than one category/subject/label/tag:
- News articles, encyclopedia articles, ...
- Web pages (bookmarks, web directories)
- Academic papers
- Emails, newsgroups, Internet forum posts, RSS feeds, ...
- Medical text classification
- Images, video, music, ...
- Biological applications (genes, ...)
Problem Transformation

1. Problem transformation
   - Transform multi-label data into a single-label representation.
   - Use one or more single-label classifiers.
   - Transform the classifications back into multi-label representations.
   - e.g. Binary Method, Combination Method, Ranking Method (next slides...)

2. Algorithm transformation
   - Transform a single-label algorithm so it can make multi-label classifications.
   - Uses some form of problem transformation internally.
   - e.g. modifications to AdaBoost, SVM, Naive Bayes, decision trees...
PT. Binary Method

One binary (single-label) classifier for each label; a label is either relevant or ¬relevant (1/0).

e.g. L = {A, B, C, D} yields four binary label sets: L_A = {A, ¬A}, L_B = {B, ¬B}, L_C = {C, ¬C}, L_D = {D, ¬D}.

Original multi-label data:

Doc.  S ⊆ L
d0    {A, D}
d1    {C, D}
d2    {A}
d3    {B, C}

Binary training set D_A for label A (D_B, D_C and D_D are built analogously):

Doc.  l ∈ L_A
d0    A
d1    ¬A
d2    A
d3    ¬A

At classification time each binary classifier decides its own label independently and the relevant labels are combined; e.g. a test instance d_t might be classified {C, D}.

- Assumes that all labels are independent.
- Can be unbalanced by many negative examples.
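A minimal sketch of this binary transformation, assuming scikit-learn's LogisticRegression as the base learner (the slides don't fix one; any single-label classifier fits):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_binary_method(X, Y, labels):
    """Train one binary classifier per label.
    X: feature matrix; Y: list of label sets; labels: the full set L."""
    classifiers = {}
    for label in labels:
        # 1 if the label is relevant to the example, 0 otherwise.
        y = np.array([1 if label in S else 0 for S in Y])
        classifiers[label] = LogisticRegression().fit(X, y)
    return classifiers

def predict_binary_method(classifiers, x):
    """Each classifier votes on its own label independently."""
    return {label for label, clf in classifiers.items()
            if clf.predict(x.reshape(1, -1))[0] == 1}
```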
PT. Combination Method

One decision involves multiple labels: each label subset becomes one atomic label.

e.g. L = {A, B, C, D} gives L′ = {AD, CD, A, BC}:

Doc.  S ⊆ L        Doc.  l ∈ L′
d0    {A, D}       d0    AD
d1    {C, D}       d1    CD
d2    {A}          d2    A
d3    {B, C}       d3    BC

- May generate many single labels (classes) from few examples.
- Can only predict combinations seen in the training set.
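A minimal sketch of this combination (label-powerset) transformation, assuming a decision tree as the single-label base learner (an assumption, not fixed by the slides):

```python
from sklearn.tree import DecisionTreeClassifier

class CombinationMethod:
    def fit(self, X, Y):
        """Y: list of label sets. Each distinct subset becomes one class."""
        keys = [tuple(sorted(S)) for S in Y]
        self.classes_ = sorted(set(keys))           # the atomic labels L'
        y = [self.classes_.index(k) for k in keys]
        self.clf_ = DecisionTreeClassifier().fit(X, y)
        return self

    def predict(self, X):
        """Map atomic predictions back to label subsets. Note: can only
        ever return combinations seen in the training set."""
        return [set(self.classes_[c]) for c in self.clf_.predict(X)]
```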
Ensembles of Pruned Sets (E.PS)

The Pruned Sets (PS) method reduces the number of combinations to only the core label combinations.

e.g. 10 examples, 6 distinct combinations:

Doc.  Labels (S ⊆ L)
d1    {Sports, Science}
d2    {Environment, Science, Politics}
d3    {Sports}
d4    {Environment, Science}
d5    {Science}
d6    {Sports}
d7    {Environment, Science}
d8    {Politics}
d9    {Politics}
d10   {Science}

Pruning the combinations that occur only once removes d1 and d2, leaving 8 examples and 4 core combinations.

Lost 20% of the data. Can we save any of it? Yes: split each pruned label set S into more frequent subsets:

d1  {Sports, Science}                →  d1 {Sports},  d1 {Science}
d2  {Environment, Science, Politics}  →  d2 {Environment, Science},  d2 {Politics}

Final training set (12 examples, 4 combinations):

Doc.  Labels (S ⊆ L)
d1    {Sports}
d1    {Science}
d2    {Environment, Science}
d2    {Politics}
d3    {Sports}
d4    {Environment, Science}
d5    {Science}
d6    {Sports}
d7    {Environment, Science}
d8    {Politics}
d9    {Politics}
d10   {Science}
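A minimal sketch of this pruning-plus-splitting transformation. The pruning threshold p and the greedy largest-subsets-first splitting strategy here are illustrative assumptions, not necessarily the exact scheme used in the talk:

```python
from collections import Counter
from itertools import combinations

def pruned_sets_transform(Y, p=1):
    """Y: list of label sets, one per example. Combinations occurring
    <= p times are pruned; each pruned example is re-added as one or
    more examples carrying frequent subsets of its label set."""
    counts = Counter(frozenset(S) for S in Y)
    frequent = {c for c, n in counts.items() if n > p}
    transformed = []                      # (example index, label subset)
    for i, S in enumerate(Y):
        fs = frozenset(S)
        if fs in frequent:
            transformed.append((i, fs))
            continue
        covered = set()                   # labels already re-introduced
        for size in range(len(fs) - 1, 0, -1):   # larger subsets first
            for sub in map(frozenset, combinations(sorted(fs), size)):
                if sub in frequent and not sub <= covered:
                    transformed.append((i, sub))
                    covered |= sub
    return transformed
```

On the 10-example data above with p = 1, this sketch reproduces the splits shown on the slide: d1 becomes {Sports} and {Science}, and d2 becomes {Environment, Science} and {Politics}.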