On-line Hierarchical Multi-label Text Classification

Jesse Read

Supervised by Bernhard (and Eibe and Geoff)
Multi-label Classification

Multi-class ("single-label") classification:
  e.g. class set C = {Sports, Environment, Science, Politics}.
  For a text document d, select a single class c ∈ C.

Multi-label classification:
  e.g. label set L = {Sports, Environment, Science, Politics}.
  For a text document d, select a label subset S ⊆ L, e.g.:

  Doc.  Labels (S ⊆ L)
  1     {Sports, Politics}
  2     {Science, Politics}
  3     {Sports}
  4     {Environment, Science}

...how do we do multi-label classification?
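To make the representation concrete, here is a minimal Python sketch (not from the slides) that encodes each document's label subset S ⊆ L as a binary indicator vector over L, the usual starting point for the transformation methods on the next slides.

```python
# Toy dataset from this slide, with label subsets encoded as 0/1
# indicator vectors over L. All variable names here are illustrative.
LABELS = ["Sports", "Environment", "Science", "Politics"]

docs = {
    1: {"Sports", "Politics"},
    2: {"Science", "Politics"},
    3: {"Sports"},
    4: {"Environment", "Science"},
}

def to_indicator(label_set):
    """Encode a label subset S ⊆ L as a 0/1 vector over L."""
    return [1 if lbl in label_set else 0 for lbl in LABELS]

for doc_id, S in docs.items():
    print(doc_id, to_indicator(S))
# 1 [1, 0, 0, 1]
# 2 [0, 0, 1, 1]
# 3 [1, 0, 0, 0]
# 4 [0, 1, 1, 0]
```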
Problem Transformation Methods (PT)

Transforming a multi-label problem into a multi-class problem without losing information:

1. Label Combination Method (LC)
2. Binary Classifiers Method (BC)
3. Ranking Threshold Method (RT)

Our toy multi-label problem, with label set L = {Sports, Environment, Science, Politics}:

  Doc.  Labels (S ⊆ L)
  1     {Sports, Politics}
  2     {Science, Politics}
  3     {Sports}
  4     {Environment, Science}
1. Label Combination Method (LC)

Train:
  Doc.  Class
  1     Sports+Politics
  2     Science+Politics
  3     Sports
  4     Science+Environment

Test:
  Doc.  Class
  X     ?

• May generate many classes for few documents
• Possibly inflexible for time-ordered data
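A minimal sketch of the LC transformation, assuming a scikit-learn-style multi-class learner; the helper names and toy texts are invented for illustration and are not from the slides. Note that LC can only predict label subsets it saw during training, which is one source of its inflexibility on time-ordered data.

```python
# Label Combination (LC): each distinct label subset becomes a single
# atomic class, so one ordinary multi-class learner handles the problem.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def lc_transform(S):
    """Encode a label subset as one combined class, e.g. 'Politics+Sports'."""
    return "+".join(sorted(S))

def lc_inverse(c):
    """Decode a combined class back into a label subset."""
    return set(c.split("+"))

def lc_train(base_learner, X, label_sets):
    y = [lc_transform(S) for S in label_sets]  # one class per subset seen
    base_learner.fit(X, y)
    return base_learner

def lc_predict(model, X_test):
    # Can only ever return subsets that occurred in the training data.
    return [lc_inverse(c) for c in model.predict(X_test)]

# Illustrative usage on the toy problem (the texts are made up):
texts = ["match election", "atom vote", "goal referee", "forest atom"]
label_sets = [{"Sports", "Politics"}, {"Science", "Politics"},
              {"Sports"}, {"Environment", "Science"}]
X = CountVectorizer().fit_transform(texts)
model = lc_train(MultinomialNB(), X, label_sets)
print(lc_predict(model, X))  # recovers the four training subsets
```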
2. Binary Classifiers Method (BC)

Train one binary classifier per label:

  Doc.  B_Sports  B_Environment  B_Science  B_Politics
  1     1         0              0          1
  2     0         0              1          1
  3     1         0              0          0
  4     0         1              1          0

Test:
  Doc.  B_Sports  B_Environment  B_Science  B_Politics
  X     ?         ?              ?          ?

• Slow: needs |L| classifiers
• Assumes that all labels are independent
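A minimal sketch of BC (elsewhere called binary relevance), again assuming scikit-learn-style base learners; `copy.deepcopy` is just one way of obtaining a fresh, independent classifier per label.

```python
# Binary Classifiers (BC): one independent yes/no classifier per label.
import copy

def bc_train(base_learner, X, label_sets, labels):
    models = {}
    for lbl in labels:
        y = [1 if lbl in S else 0 for S in label_sets]  # per-label targets
        m = copy.deepcopy(base_learner)                 # fresh classifier
        m.fit(X, y)
        models[lbl] = m
    return models

def bc_predict(models, X_test):
    """Union of positive per-label decisions; labels treated as independent."""
    preds = [set() for _ in range(X_test.shape[0])]  # X_test: feature matrix
    for lbl, m in models.items():
        for i, yi in enumerate(m.predict(X_test)):
            if yi == 1:
                preds[i].add(lbl)
    return preds
```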
3. Ranking Threshold Method (RT)

Train on (document, single label) pairs:

  Doc.  Class
  1     Sports
  1     Politics
  2     Science
  2     Politics
  3     Sports
  4     Science
  4     Environment

Test:
  Doc.  Certainty distribution
  X     (Y_w, Y_x, Y_y, Y_z) = (?, ?, ?, ?)

• Difficulty in selecting a threshold
• Assumes that all labels are independent
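A minimal sketch of RT, assuming a scikit-learn-style learner that exposes `predict_proba` and `classes_` (e.g. a vectorizer-plus-NB pipeline that accepts raw documents). The threshold t = 0.2 is an arbitrary illustration; choosing it well is precisely the difficulty noted above.

```python
# Ranking Threshold (RT): train one multi-class model on duplicated
# (document, single-label) pairs, then keep every label whose predicted
# certainty clears a threshold t.

def rt_train(base_learner, X, label_sets):
    X_dup, y = [], []
    for x, S in zip(X, label_sets):   # X: a list of documents/vectors
        for lbl in S:                 # one copy of the document per label
            X_dup.append(x)
            y.append(lbl)
    base_learner.fit(X_dup, y)
    return base_learner

def rt_predict(model, X_test, t=0.2):
    preds = []
    for dist in model.predict_proba(X_test):   # certainty distribution
        preds.append({lbl for lbl, p in zip(model.classes_, dist) if p >= t})
    return preds
```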
Algorithm Adaptation Methods

We have seen the three main problem transformation methods. There are also algorithm adaptation methods, for example:

• Modifying the entropy of J48
• Multiple actions for association rules
• AdaBoost.MH, AdaBoost.MR
• Modifications to SMO, kNN, ...

Most algorithm adaptation methods use a problem transformation method internally, e.g. association rules use LC, AdaBoost.MH uses an "AdaBoost Transformation" (AT), and AdaBoost.MR uses RT.

...what about hierarchy?
Hierarchical Classification

A hierarchical classifier includes some method for recognising relationships between labels. For text data, we use a tree-structured topic hierarchy, known as a taxonomy.

There are two approaches to hierarchical classification:

• Global hierarchical (a.k.a. the "big bang" approach)
• Local hierarchical (a.k.a. the "top-down" approach)
Global Hierarchical

  root
  ├── Americas: US, Canada
  ├── Mid.East: Iraq, Iran
  ├── Sci/Tech
  └── Sports: Soccer, Rugby

+ Improvements in accuracy
− Difficult to maintain; can become very computationally complex

Examples:
• Stacking (e.g. on BC)
• EM (e.g. on LC)
• Boosting (e.g. with AT)
• Association Rules
• Predictive Clustering Trees (multi-label tree learners)
Local Hierarchical

  root
  ├── Americas: US, Canada
  ├── Mid.East: Iraq, Iran
  ├── Sci/Tech
  └── Sports: Soccer, Rugby

+ Divides up the problem: easy to maintain; intuitive
− Error propagation; accuracy similar to flat PT

Examples:
• Pachinko Machine, e.g. Fuzzy Relational Thesauri (FRT)
• Probabilistic
• Hybrid: ECOC, error recovery (can return to higher nodes)
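A minimal sketch of the top-down routing described here, over the taxonomy above; `node_clf` is a hypothetical dict of already-trained per-node classifiers, and there is no error recovery, so a mistake at the root propagates to the leaf (the hybrid variants above address exactly this).

```python
# Local (top-down) hierarchical classification: one classifier per
# internal node routes a document towards a leaf.

TAXONOMY = {
    "root":     ["Americas", "Mid.East", "Sci/Tech", "Sports"],
    "Americas": ["US", "Canada"],
    "Mid.East": ["Iraq", "Iran"],
    "Sports":   ["Soccer", "Rugby"],
}

def top_down_classify(doc, node_clf, node="root"):
    """Descend the taxonomy, asking the local classifier at each node."""
    path = []
    while node in TAXONOMY:                        # stop at a leaf
        node = node_clf[node].predict([doc])[0]    # choose one child
        path.append(node)
    return path   # e.g. ["Sports", "Rugby"]
```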
Multi-label Datasets

  Key    |D|     |L|   UC(D,L)  LC(D,L)  Hier.  Seq.  Text
  YEAST  2,417   14    198      4.24     N      N     N
  MEDC   978     45    94       1.25     N      N     Y
  20NG   19,300  20    55       1.03     Y      Y     Y
  ENRN   1,702   53    753      3.38     Y      Y     Y
  MARX   3,617   101   208      1.13     Y      Y     Y
  REUT   6,000   103   811      1.46     Y      N     Y

where:
  |D| = number of documents
  |L| = number of possible labels
  UC(D,L) = |{ S ⊆ L : ∃ d ∈ D, L(d) = S }|  (number of unique label combinations)
  LC(D,L) = (1/|D|) Σ_{i=1}^{|D|} |S_i|, over labelled documents (d_i, S_i) with S_i ⊆ L  (average label-set size)
  Hier. = hierarchical structure defined within dataset
  Seq.  = time-ordered data
  Text  = text dataset
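Both statistics follow directly from the definitions above; a small sketch, computed here for the toy dataset from earlier.

```python
# UC: number of distinct label combinations occurring in the data.
# LC: average number of labels per document ("label cardinality").

def uc(label_sets):
    return len({frozenset(S) for S in label_sets})

def lc(label_sets):
    return sum(len(S) for S in label_sets) / len(label_sets)

toy = [{"Sports", "Politics"}, {"Science", "Politics"},
       {"Sports"}, {"Environment", "Science"}]
print(uc(toy))  # 4 unique combinations
print(lc(toy))  # 1.75 labels per document on average
```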
Multi-label Evaluation

• Percentage of correctly classified instances? Too harsh.
• Percentage of correctly classified labels? Too easy.

Let C be a multi-label classifier, S_i ⊆ L, and Y_i = C(x_i) the labels predicted by C for document x_i:

  Accuracy(C, D) = (1/|D|) Σ_{i=1}^{|D|} |S_i ∩ Y_i| / |S_i ∪ Y_i|    (1)

Hierarchical evaluation:
• Should we give partial credit?
• If so, how?
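A minimal sketch of Equation (1). Treating two empty sets as a perfect match is my assumption for the undefined 0/0 case; the slide does not specify it.

```python
def accuracy(true_sets, pred_sets):
    """Mean Jaccard overlap |S ∩ Y| / |S ∪ Y| of true and predicted labels."""
    total = 0.0
    for S, Y in zip(true_sets, pred_sets):
        if S or Y:
            total += len(S & Y) / len(S | Y)
        else:
            total += 1.0   # assumption: two empty sets count as agreement
    return total / len(true_sets)

print(accuracy([{"Sports", "Politics"}], [{"Sports"}]))  # 0.5
```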
Algorithms

Multi-class algorithms commonly used in prior multi-label work:

  Key   Type      Description
  NB    Bayes     Naïve Bayes
  BAG   Meta      Bagging (with J48)
  SMO   Function  Support Vector Machines
  J48   Tree      J48 (C4.5 decision trees)
  IBk   kNN       k Nearest Neighbor
  NN    Neural    Neural Networks

Pilot experiments showed that:
• the default NN is too slow
• IBk does not perform well with sparse data
Experiments — Tables

Flat vs. Global Hierarchical vs. Local Hierarchical

1. Problem Transformation

              LC                              BC                              RT
        NB      BAG     SMO     J48     NB      BAG     SMO     J48     NB      BAG     SMO     J48
  MEDI  68.05   71.77*  71.10*  72.13*  55.82   75.58*  73.59*  65.83   67.81   64.20   65.72   60.22
  20NG  57.47*  57.58*  57.35*  52.74   32.33   -       47.67   41.09   56.05*  47.19   54.61*  50.55
  ENRN  32.72*  25.42   -       22.96   21.82   31.35*  30.56*  26.26   15.16   30.25*  24.09   27.82
  MARX  48.15*  48.93*  43.26   44.79   32.60   31.69   38.64   33.95   48.44*  36.07   40.46   38.71
  REUT  43.76   51.47   -       41.68   18.21   44.09   56.23*  43.83   37.13   45.90   58.65*  45.31

2. Global Hierarchical

              LC-EM                           BC-Stack(RT-NB)                 AT
        NB      BAG     SMO     J48     NB      BAG     SMO     J48     BAG     J48
  MEDI  67.45   74.71*  70.75   72.31   56.09   70.76   73.65*  65.85   67.06   67.82
  20NG  57.48*  57.58*  57.45*  53.39   29.80   -       49.06   40.88   -       -
  ENRN  34.60*  25.46   -       23.31   20.66   31.79   27.01   25.35   -       -
  MARX  48.18   50.64*  43.29   44.82   39.09   32.08   38.87   34.25   -       -
  REUT  43.77   51.49*  -       41.69   19.78   43.83   57.32*  43.68   -       -

3. Local Hierarchical

              LC                              BC                              RT
        NB      BAG     SMO     J48     NB      BAG     SMO     J48     NB      BAG     SMO     J48
  20NG  56.49   58.31*  58.83*  53.48   43.68   -       52.44   42.03   54.87   40.58   53.37   49.26
  ENRN  25.96   29.38   27.73   25.23   15.30   34.99*  -       26.26   4.67    25.51   23.59   27.63
  MARX  48.49   54.57*  42.40   46.84   41.69   38.67   40.34   38.65   46.44   33.59   38.32   41.23
Experiments — 20NG — Accuracy

[Plot: accuracy vs. % labeled (training) examples on 20NG, comparing LH.LC-SMO, LH.BC-SMO, LH.RT-NB, GH.LC_EM-NB, GH.BC_STACK-SMO, and GH.AT_J48]
Experiments — 20NG — Build Time

[Plot: build time vs. % labeled (training) examples on 20NG for the same six methods]
Experiments — ENRN — Accuracy

[Plot: accuracy vs. % labeled (training) examples on ENRN for the same six methods]
Experiments — ENRN — Build Time

[Plot: build time vs. % labeled (training) examples on ENRN for the same six methods]
Experiments — MARX — Accuracy

[Plot: accuracy vs. % labeled (training) examples on MARX for the same six methods]
Experiments — MARX — Build Time

[Plot: build time vs. % labeled (training) examples on MARX for the same six methods]
Conclusions

Problem transformation methods:
• No problem transformation method is best on all datasets
• BC and RT might do better with a better-selected |S|
• Complexity is determined by |D|, |L|, LC(D,L), and UC(D,L)

Multi-class algorithms:
• J48 is not that great
• BC does not work well with Naïve Bayes, RT does, and LC works equally well with either

Hierarchical:
• Global PT extensions improve on flat PT
• Local hierarchical classifiers carry more build overhead in practice, but are in theory more flexible