1. Multilabel Classification with Meta-Level Features
   SIGIR ’10, Siddharth Gopal & Yiming Yang

2. Outline
- Introduction
- Motivation
- Proposed approach
  - Ranking
  - Thresholding
- Experiments

3. Webpage / Image / News Article
- Binary classification (e.g.): Ad vs. not-an-ad; spam vs. genuine
- Multiclass classification (e.g.): Which country is it about? Switzerland, France, Italy, United States, ...
- Multilabel classification: Which topics is it related to? Politics, Terrorism, Health, Sports, ...

4. Our goal
- Learn a function F: x -> y that maps an instance (webpage, image, etc.) to a subset of categories, where x ∈ R^d and y ⊆ {1, 2, ..., m}.
- Given:
  - A set of training examples {x_i | x_i ∈ R^d}
  - For each training instance, the set of relevant categories {y_i | y_i ⊆ {1, 2, ..., m}}
(A minimal data-layout sketch follows below.)
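To make the setup concrete, here is a minimal sketch of the data layout; the variable names and random placeholder data are illustrative, not from the paper:

```python
import numpy as np

# Minimal sketch of the multilabel setup; all names and values here are
# illustrative placeholders, not the authors' data or code.
n, d, m = 1500, 103, 14                  # e.g., the Yeast dataset dimensions (slide 15)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))          # x_i in R^d: placeholder feature vectors
# y_i subset of {0, ..., m-1}: each instance's set of relevant categories
Y = [set(rng.choice(m, size=int(rng.integers(1, 5)), replace=False).tolist())
     for _ in range(n)]
```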

5. Related work
- Binary relevance learning: split the problem into several independent binary classification problems (one-vs-rest, pairwise).
- Instance-based multilabel classifiers:
  - Standard ML-kNN (Yang, SIGIR 1994)
  - Bayesian-style ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
  - Logistic-regression style IBLR-ML using kNN features (Cheng and Hüllermeier, Machine Learning 2009)
- Model-based method: Rank-SVM for MLC, a maximum-margin method enforcing partial-order constraints (Elisseeff and Weston, NIPS 2002)

6. Rank-SVM
- Has a global optimization criterion: does not break the problem down into multiple independent binary problems.
- Has a large number of parameters (m·D).
- Different from Rank-SVM for IR (and other learning-to-rank methods in IR).
- Follows a two-step procedure: (a) rank the categories for a given instance; (b) select an instance-specific threshold.
- Our approach: leverage recent learning-to-rank methods from IR to solve (a).

7. The typical learning-to-rank framework
[Diagram: a query q is issued against a corpus of documents d_1, d_2, d_3, ...; a model scores each (q, d_i) pair and returns a ranked list of documents.]
- Documents are represented using a combined feature representation of the query and the document (TF, cosine similarity, BM25, Okapi, etc.).

8. Given a new instance, rank the categories
[Diagram: a document d is fed to the model, which returns a ranked list over categories 1, ..., m.]
- How do we define a combined feature representation?
[Diagram: each (instance, category) pair is represented by a feature vector vec(d, 1), vec(d, 2), ..., vec(d, m).]

9. Define the feature representation of the pair (instance, category) as follows:
  vec(x_i, c) = [ Dist(x_i, D_c^1NN), Dist(x_i, D_c^2NN), ..., Dist(x_i, D_c^kNN) ]
  where D_c is the set of instances that belong to category c, and D_c^jNN is the j-th nearest neighbor of x_i within D_c.
- The distance to the category centroid is also appended.
- L1, L2, and cosine-similarity distances are concatenated.
(A computational sketch follows below.)
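A minimal sketch of how these meta-level features could be computed, assuming L2 distance only (the paper concatenates L1, L2, and cosine distances) and that each category has at least k training instances; the function name is illustrative:

```python
import numpy as np

def meta_features(x, X_train, Y_train, c, k=5):
    """vec(x, c): distances from x to its k nearest training instances of
    category c, with the distance to the category centroid appended.
    L2 only, for brevity; assumes category c has at least k members."""
    D_c = X_train[np.array([c in y for y in Y_train])]  # instances labeled with c
    dists = np.linalg.norm(D_c - x, axis=1)             # L2 distance to each member
    knn_dists = np.sort(dists)[:k]                      # k smallest distances
    centroid_dist = np.linalg.norm(D_c.mean(axis=0) - x)
    return np.concatenate([knn_dists, [centroid_dist]])
```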

10. Pictorially (using only L2 links)
[Diagram: an instance connected by L2-distance links to each category's centroid and category neighborhood.]
- Thicker lines denote links to the centroid.
- Thinner lines denote links to the category neighborhood.

11. In short
- Represent the relation between each instance and each category using vec(x_i, c).
- Substantially fewer model parameters than Rank-SVM for MLC.
- Allows any learning-to-rank algorithm from IR to be used to rank the categories (see the sketch below).
- In our experiments, we used SVM-MAP as the learning-to-rank method.
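Conceptually, each instance plays the role of an IR query and the m categories play the role of candidate documents. A sketch of how the ranking training set might be assembled, reusing meta_features from the sketch above (the actual SVM-MAP input format differs):

```python
def build_ranking_examples(X_train, Y_train, m, k=5):
    """Cast multilabel classification as learning to rank: each instance is
    a 'query', each category a candidate 'document', vec(x, c) the combined
    query-document features, and relevance 1 iff c is a true category."""
    examples = []
    for qid, (x, y) in enumerate(zip(X_train, Y_train)):
        for c in range(m):
            feats = meta_features(x, X_train, Y_train, c, k)
            examples.append((qid, c, feats, int(c in y)))
    return examples  # feed to any LETOR method, e.g. SVM-MAP
```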

12. Outline
- Introduction
- Motivation
- Proposed approach
  - Ranking
  - Thresholding
- Experiments

13. Supervised learning of an instance-specific threshold (Elisseeff and Weston, NIPS 2002)
1) The learning-to-rank model (LETOR) produces, for each instance x_i (i = 1...n), a ranklist of category scores [s_i^1, s_i^2, ..., s_i^m].
2) The threshold for a ranklist is the one that minimizes the sum of false positives and false negatives, giving training pairs ([s_1^1, ..., s_1^m], t_1), ([s_2^1, ..., s_2^m], t_2), ..., ([s_n^1, ..., s_n^m], t_n).
3) Learn w such that t_i ≈ w^T [s_i^1, ..., s_i^m].
4) Predict: threshold t_test = w^T [s_test^1, ..., s_test^m].
(Sketched below.)
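A minimal sketch of this thresholding scheme; a plain least-squares fit stands in here for whatever regressor the authors used, and a bias term is omitted for brevity:

```python
import numpy as np

def best_threshold(scores, relevant):
    """Step 2: the threshold minimizing false positives + false negatives
    for one instance's category scores. `relevant` is a boolean array."""
    s = np.sort(scores)
    # candidates: below the lowest score, between adjacent scores, above the highest
    candidates = np.concatenate([[s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]])
    errors = [np.sum((scores > t) & ~relevant) + np.sum((scores <= t) & relevant)
              for t in candidates]
    return candidates[int(np.argmin(errors))]

def fit_threshold_model(S, R):
    """Steps 3-4: learn w with t_i ~ w^T s_i by least squares.
    S: (n, m) score matrix; R: (n, m) boolean relevance matrix."""
    t = np.array([best_threshold(S[i], R[i]) for i in range(len(S))])
    w, *_ = np.linalg.lstsq(S, t, rcond=None)
    return w  # at test time: t_test = S_test @ w
```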

14. Outline
- Introduction
- Motivation
- Proposed approach
  - Ranking
  - Thresholding
- Experiments

15. Datasets

Dataset         #Training  #Testing  #Categories  Avg. labels/instance  #Features
Emotions              391       202            6                  1.87         72
Scene                1211      1196            6                  1.07        294
Yeast                1500       917           14                  4.24        103
Citeseer             5303      1326           17                  1.26      14601
Reuters-21578        7770      3019           90                  1.23      18637

16. Methods compared
- SVM-MAP-MLC: our proposed approach
- ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
- IBLR-ML (Cheng and Hüllermeier, Machine Learning 2009)
- Rank-SVM (Elisseeff and Weston, NIPS 2002)
- Standard one-vs-rest SVM

17. Evaluation metrics
- Average precision: standard metric in IR; for a ranklist, measures the precision at each relevant category and averages them.
- Ranking loss: measures the average number of inversions between relevant and irrelevant categories in the ranklist.
- Micro-F1 and Macro-F1: F1 is the harmonic mean of precision and recall; micro-averaging gives equal importance to each document, macro-averaging gives equal importance to each category.
(The two rank-based metrics are sketched below.)
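For concreteness, straightforward implementations of the two rank-based metrics for a single instance (my own sketches, not the authors' evaluation code; each assumes at least one relevant and one irrelevant category):

```python
import numpy as np

def average_precision(scores, relevant):
    """Precision at each relevant category in the ranklist, averaged."""
    order = np.argsort(-scores)               # categories from highest to lowest score
    rel_sorted = relevant[order]
    hits = np.cumsum(rel_sorted)              # relevant categories seen so far
    ranks = np.flatnonzero(rel_sorted) + 1    # 1-based ranks of relevant categories
    return float(np.mean(hits[rel_sorted] / ranks))

def ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) pairs ranked in the wrong order."""
    rel, irr = scores[relevant], scores[~relevant]
    inversions = np.sum(rel[:, None] < irr[None, :])
    return float(inversions) / (len(rel) * len(irr))
```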

18. MAP and 1-RankingLoss performance
[Bar charts comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR on MAP and on 1-RankingLoss across the datasets.]

19. Micro-F1 and Macro-F1 performance
[Bar charts comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR on Micro-F1 and on Macro-F1 across the datasets.]

20. Conclusions
- Meta-level features represent the relationship between instances and categories.
- They merge learning to rank and multilabel classification.
- The approach improves on the state of the art for multilabel classification.

21. Future work
- Different kinds of meta-level features
- Different learning-to-rank methods
- Optimizing metrics other than MAP

22. Thanks!

23. A typical scenario in text categorization
[Diagram: a document's bag-of-words representation is fed to a classifier, which assigns categories such as Wall Street, Market, Crime.]
- Support vector machines, logistic regression, or boosting learn m weight vectors, each of length |vocabulary|, a total of m·|vocabulary| parameters (e.g., for Reuters-21578: 90 × 18,637 ≈ 1.7 million parameters). Is this good or bad?

24. Words are fairly discriminative
- Current methods build a predictor based on weighting different words.
- Disadvantages:
  - Too many words.
  - No firm control over how each instance is related to a particular category.

25. Effect of different feature sets
[Bar chart: performance of the full feature set (ALL) vs. L2-only, L1-only, and cosine-only features on Emotions, Yeast, Scene, Citeseer, and Reuters-21578.]

26. Rank-SVM for IR vs. Rank-SVM for MLC
[Side-by-side comparison of the Rank-SVM formulations for IR and for MLC.]

27. [Bar chart: comparison of SVM-MAP, ML-kNN, RankSVM-MLC, SVM, and IBLR-ML; scores on a 0-1 scale.]

28. [Bar chart: comparison of SVM-MAP, ML-kNN, RankSVM-MLC, SVM, and IBLR-ML; scores on a 0-1 scale.]
