Large Margin Taxonomy Embedding with an Application to Document Categorization


  1. Large Margin Taxonomy Embedding with an Application to Document Categorization
     K. Weinberger and O. Chapelle, NIPS 2008
     Presented by J. Silva, Duke University, May 13, 2011 (1 / 16)

  2. Problem
     Multi-class classification
       ◮ Application: document categorization
     The classes (topics) follow a taxonomy (e.g., a hierarchy)
     Misclassification errors are not all equally severe. Examples:
       ◮ It is worse to misclassify a male pedestrian as a traffic light than as a female pedestrian
       ◮ ...or to misclassify a medical journal article on heart attack as a publication on athlete's foot than as one on coronary disease
     The proposed approach is cost-sensitive, and aims to move beyond hierarchical representations to discover a continuous latent semantic space

  3. Multi-class classification of documents based on a taxonomy of topics
     x_i is the i-th document
     p_α is the prototype for the α-th class
     P, W are mappings into the latent semantic space
     Note: all figures adapted from the original paper

  4. Contributions
     A supervised regression algorithm called taxem (taxonomy embedding) is presented
     The algorithm learns the regression for the documents and the placement of the topic prototypes in a single optimization
     The joint optimization is a convex semidefinite program (SDP)
     In this case, the SDP admits a particular form that can be solved efficiently for large datasets

  5. Outline of the presentation
     Notation
     Two-step method
       ◮ Topic embedding
       ◮ Document regression
     One-step combined large-margin optimization
     Results on the OHSUMED medical journal database
     Discussion

  6. Notation
     Documents x_1, …, x_n ∈ X of dimensionality d
       ◮ Can be, e.g., bag-of-words indicators or tf-idf scores
     y_1, …, y_n ∈ {1, …, c} are topic labels in some taxonomy T
     Indices
       ◮ i, j ∈ {1, …, n} for documents
       ◮ α, β ∈ {1, …, c} for classes
     The taxonomy gives rise to a cost matrix C ∈ R^{c×c}, where C_αβ ≥ 0 is the cost of misclassifying class α as β, and C_αα = 0
     We wish to represent
       ◮ each topic α as a prototype p_α ∈ F
       ◮ each document x_i as a low-dimensional vector z_i ∈ F
     We assume C is given
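The slide assumes C is given. One common way to build such a cost matrix from a taxonomy (an illustrative assumption here, not prescribed by the paper) is to set C_αβ to the number of tree edges on the path between topics α and β. A minimal sketch with a hypothetical 4-topic hierarchy:

```python
import numpy as np

# Hypothetical taxonomy (child -> parent):
#        root
#       /    \
#   science  sports
#   /     \
# physics  biology
edges = {
    "science": "root", "sports": "root",
    "physics": "science", "biology": "science",
}

def path_to_root(node):
    path = [node]
    while node in edges:
        node = edges[node]
        path.append(node)
    return path

def tree_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    # first common node along each leaf-to-root path is the LCA
    da = next(i for i, n in enumerate(pa) if n in common)
    db = next(i for i, n in enumerate(pb) if n in common)
    return da + db

topics = ["physics", "biology", "science", "sports"]
C = np.array([[tree_distance(a, b) for b in topics] for a in topics], dtype=float)
```

By construction C is symmetric with zero diagonal, matching the slide's requirements C_αβ ≥ 0 and C_αα = 0.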

  7. Two-step approach: embedding topic prototypes
     Find prototypes p_1, …, p_c ∈ F based on C
     Define P = [p_1 ⋯ p_c] ∈ R^{c×c} (note: it is assumed throughout the paper that F = R^c)
     How to derive P from C?
       ◮ Simplest: ignore C and set P = I_{c×c}; all topics sit at the corners of a (c−1)-dimensional simplex; denote this P^I
       ◮ Better: solve
             P_mds = arg min_P Σ_{α,β=1}^{c} ( ||p_α − p_β||₂² − C_αβ )²
         where mds stands for metric multidimensional scaling (MDS), as in ISOMAP
       ◮ The solution is P_mds = √Λ V⊤, obtained from the eigendecomposition C̄ = V Λ V⊤, where C̄ = −(1/2) H C H and H = I − (1/c) 11⊤
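The MDS step above can be sketched in a few lines of numpy. This is a minimal implementation of the slide's recipe (centering C, eigendecomposing, scaling by the root eigenvalues); clipping small negative eigenvalues to zero is an added safeguard for cost matrices that are not exactly Euclidean:

```python
import numpy as np

def mds_prototypes(C):
    """Metric MDS on the cost matrix, following the slide:
    C_bar = -1/2 H C H with H = I - (1/c) 11^T, then P_mds = sqrt(Lambda) V^T.
    Columns of the returned P are the prototypes p_alpha."""
    c = C.shape[0]
    H = np.eye(c) - np.ones((c, c)) / c
    C_bar = -0.5 * H @ C @ H
    lam, V = np.linalg.eigh(C_bar)       # symmetric eigendecomposition
    lam = np.clip(lam, 0.0, None)        # guard against tiny negative eigenvalues
    return np.diag(np.sqrt(lam)) @ V.T
```

When C happens to be a matrix of squared Euclidean distances, this recovers an embedding whose pairwise squared distances reproduce C exactly (up to rotation), which is the sense in which the slide's objective is minimized.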

  8. Two-step approach: document regression
     Assume we have P
     Find a mapping W : X → F so that x_i with label y_i is placed near p_{y_i}
     Solve the linear ridge regression
         W = arg min_W Σ_{i=1}^{n} ||p_{y_i} − W x_i||₂² + λ ||W||²_F
     The solution has the closed form
         W = P J X⊤ (X X⊤ + λ I)⁻¹
     where X = [x_1 ⋯ x_n] and J ∈ {0,1}^{c×n}, with J_{αi} = 1 iff y_i = α

  9. Inference and performance measure
     Inference (for new documents)
       ◮ Given a new document x_t, first map it into F and then estimate its label via the nearest-neighbor rule
             ŷ_t = arg min_α ||p_α − W x_t||₂²
     Performance measure
       ◮ For a given set of labeled documents (x_1, y_1), …, (x_n, y_n), the quality of the regression is assessed via the averaged cost-sensitive misclassification loss
             E = (1/n) Σ_{i=1}^{n} C_{y_i ŷ_i}
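Both the nearest-prototype rule and the cost-sensitive loss above are one-liners in numpy. A minimal sketch (0-based labels assumed):

```python
import numpy as np

def predict(W, P, X):
    """Nearest-prototype rule: embed each column x_t as z_t = W x_t,
    then pick argmin_alpha ||p_alpha - z_t||^2."""
    Z = W @ X                                                  # (c, n) embeddings
    D2 = ((P[:, :, None] - Z[:, None, :]) ** 2).sum(axis=0)   # (c, n) squared dists
    return D2.argmin(axis=0)

def avg_cost(C, y_true, y_pred):
    """Averaged cost-sensitive loss E = (1/n) sum_i C[y_i, yhat_i]."""
    return C[y_true, y_pred].mean()
```

Note that with a 0/1 cost matrix this reduces to the ordinary misclassification error; the taxonomy only matters through the off-diagonal values of C.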

 10. One-step method
     Learning the prototypes independently of the data is not optimal
     Untangle the mutual dependence between W and P (a "chicken-and-egg" problem)
       ◮ Define A = J X⊤ (X X⊤ + λ I)⁻¹
       ◮ We have W = P A; A is independent of P, and can be pre-computed
     Now find P
       ◮ Let x′_i = A x_i and e_α = [0 ⋯ 1 ⋯ 0]⊤ with the 1 in the α-th position
       ◮ Rewrite p_α = P e_α and z_i = P x′_i
       ◮ We cannot optimize E w.r.t. P directly (the objective is non-continuous and non-differentiable)
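The factorization W = P A is the key computational trick: A depends only on the data and labels, so it is computed once, and every candidate P then yields its W by a single matrix product. A minimal sketch:

```python
import numpy as np

def precompute_A(X, y, c, lam=1.0):
    """A = J X^T (X X^T + lam I)^{-1}, independent of P.
    Afterwards W = P @ A and x'_i = A @ x_i, exactly as on the slide."""
    d, n = X.shape
    J = np.zeros((c, n))
    J[y, np.arange(n)] = 1.0
    return J @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))
```

Multiplying any prototype matrix P by this fixed A reproduces the closed-form ridge solution from the two-step method, which is what lets the one-step method search over P alone.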

 11. One-step method (cont.)
     Surrogate loss function
     Minimize Σ_{i,α} ξ_{iα} s.t.
         ||P(e_{y_i} − x′_i)||₂² + C_{y_i α} ≤ ||P(e_α − x′_i)||₂² + ξ_{iα}
         ξ_{iα} ≥ 0
     This enforces a large-margin condition, so that prototypes that would incur a larger misclassification cost are pushed farther away
     The surrogate loss is an upper bound on E
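At the optimum each slack takes the hinge value ξ_{iα} = max(0, ||P(e_{y_i} − x′_i)||² + C_{y_i α} − ||P(e_α − x′_i)||²), so the surrogate can be evaluated directly for any fixed P. A minimal sketch (a plain loop for readability, not efficiency):

```python
import numpy as np

def surrogate_loss(P, Xp, y, C):
    """Sum over i, alpha of the hinge slacks from the slide's constraints.
    Xp: (c, n) with columns x'_i = A x_i; y: 0-based labels; C: (c, c) costs."""
    c, n = Xp.shape
    E = np.eye(c)
    total = 0.0
    for i in range(n):
        d_true = np.sum((P @ (E[:, y[i]] - Xp[:, i])) ** 2)
        for a in range(c):
            d_a = np.sum((P @ (E[:, a] - Xp[:, i])) ** 2)
            total += max(0.0, d_true + C[y[i], a] - d_a)
    return total
```

The upper-bound claim on the slide follows because, taking α to be the predicted class ŷ_i, the nearest-prototype rule gives ||P(e_{ŷ_i} − x′_i)||² ≤ ||P(e_{y_i} − x′_i)||², so ξ_{i ŷ_i} ≥ C_{y_i ŷ_i}; summing over i shows the surrogate dominates n·E.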

 12. One-step method (cont.) (slide body not captured in this transcript)

 13. One-step method: upper bound on E (slide body not captured in this transcript)

 14. One-step method: convex formulation and regularization
     The above optimization is not convex, due to the quadratic constraints
     Make the problem invariant to rotations by defining Q = P⊤P and writing distances w.r.t. Q:
         ||P(e_α − x′_i)||₂² = (e_α − x′_i)⊤ Q (e_α − x′_i) = ||e_α − x′_i||²_Q
     With Q ⪰ 0, the constraints become linear in Q and the problem is a convex SDP
     µ is a regularization parameter
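The identity that makes this reformulation work is purely algebraic and easy to check numerically. A minimal sketch verifying ||P v||² = v⊤ Q v for Q = P⊤P, where v stands in for e_α − x′_i:

```python
import numpy as np

rng = np.random.default_rng(0)
c = 4
P = rng.normal(size=(c, c))
Q = P.T @ P                      # Q is PSD by construction, hence Q >= 0 is feasible
v = rng.normal(size=c)           # plays the role of e_alpha - x'_i

lhs = np.sum((P @ v) ** 2)       # ||P v||^2
rhs = v @ Q @ v                  # v^T Q v = ||v||_Q^2
assert np.isclose(lhs, rhs)
```

Since the constraints are linear in the entries of Q and the cone of PSD matrices is convex, the reformulated problem is a semidefinite program; P can be recovered from the optimal Q by a matrix square root, up to the rotation the reformulation deliberately quotients out.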

 15. Results (slide body not captured in this transcript)

 16. Results (slide body not captured in this transcript)
