
Efficient Learning of Topic Ranking by Soft-projections onto Polyhedra

Yoram Singer. Part of the work was performed at The Hebrew University, Jerusalem, Israel. AIM Workshop on the Mathematics of Ranking, Aug. 19, 2010.


  1. Efficient Learning of Topic Ranking by Soft-projections onto Polyhedra
     • Yoram Singer
     • Part of the work was performed at The Hebrew University, Jerusalem, Israel
     • AIM Workshop on the Mathematics of Ranking, Aug. 19, 2010

  2. Acknowledgements
     • Koby Crammer, Technion: initial framework & numerous algorithms for online ranking using dual decomposition
     • Shai Shalev-Shwartz, Hebrew U.: specialized efficient ranking algorithm; regret bound analysis using primal-dual method

  3. This Workshop (thus far & today)
     • Overviews of structure of ranking problems
     • Foundations from economics to social choice
     • Probabilistic & statistical analysis of orderings
     • Models for expressing & generating orderings
     • Page rank, graph-based methods, Hodge theory
     • Overview of machine learning 4 ranking (MLR)
     • Learning theoretic analysis of MLR (to follow)

  4. This Talk
     • Specific ranking problem setting
     • Loss minimization framework
     • An efficient learning algorithm
     • Brief experimental discussion
     • High-level overview of formal analysis

  5. Topic Ranking (courtesy of K. Crammer)
     • Document: “The higher minimum wage signed into law … will be welcome relief for millions of workers … . The 90-cent-an-hour increase … .”
     • [Figure: columns “Desired Ordering” and “Relevant topics” over the topics ECONOMICS, CORPORATE / INDUSTRIAL, REGULATION / POLICY, MARKETS, LABOUR, GOVERNMENT / SOCIAL, LEGAL / JUDICIAL, SHARE LISTINGS, PERFORMANCE, ACCOUNTS / EARNINGS, COMMENT / FORECASTS, SHARE CAPITAL, BONDS / DEBT ISSUES, LOANS / CREDITS, STRATEGY / PLANS, INSOLVENCY / LIQUIDITY]
     • Multi-label, bipartite feedback

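  6. Topic Ranking: Setting
     • Instances: vectors in $\mathbb{R}^n$ (documents, images, speech signals, …)
     • Predefined set of $k$ labels (topics, categories, phonemes)
     • Target ranking of labels $\gamma \in \mathbb{R}^k$: label $r$ is preferred over $s$ ($r \succ s$) iff $\gamma_r > \gamma_s$
     • Ranking functions $f : \mathbb{R}^n \to \mathbb{R}^k$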
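The setting above can be illustrated with a minimal sketch. It assumes the target ranking is given as one real value per label (larger means more preferred) and that a ranking function returns one score per label; the names `preferred_pairs` and `induced_ranking` are hypothetical and not part of the talk.

```python
from typing import List, Sequence, Tuple


def preferred_pairs(gamma: Sequence[float]) -> List[Tuple[int, int]]:
    """All pairs (r, s) such that label r is preferred over label s.

    Assumption: r is preferred over s iff gamma[r] > gamma[s].
    """
    k = len(gamma)
    return [(r, s) for r in range(k) for s in range(k) if gamma[r] > gamma[s]]


def induced_ranking(scores: Sequence[float]) -> List[int]:
    """Labels sorted from highest to lowest predicted score."""
    return sorted(range(len(scores)), key=lambda r: -scores[r])


gamma = [2.0, 1.0, 1.0]            # label 0 preferred over labels 1 and 2
pairs = preferred_pairs(gamma)      # labels 1 and 2 are not comparable
ranking = induced_ranking([0.3, 0.9, -0.1])
```

Labels with equal target values produce no pair, so the target only needs to be a partial order.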

  7. Special Cases
     • Binary classification
     • Multiclass categorization
     • Multiclass multilabel
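The formulas for these special cases did not survive in the transcript. The sketch below shows one plausible way to encode each case as a target vector over labels, assuming that a larger target value means "preferred"; the 0/1 encodings and the function names are assumptions, not taken from the slides.

```python
from typing import Iterable, List


def binary_target(positive: bool) -> List[float]:
    """Two labels: the correct class is preferred over the other one."""
    return [1.0, 0.0] if positive else [0.0, 1.0]


def multiclass_target(k: int, correct: int) -> List[float]:
    """k labels: the single correct label is preferred over all others."""
    return [1.0 if r == correct else 0.0 for r in range(k)]


def multilabel_target(k: int, relevant: Iterable[int]) -> List[float]:
    """k labels: every relevant label is preferred over every other label."""
    relevant_set = set(relevant)
    return [1.0 if r in relevant_set else 0.0 for r in range(k)]


t_binary = binary_target(True)
t_multi = multiclass_target(4, correct=2)
t_labels = multilabel_target(5, relevant=[0, 3])
```

Under this encoding the multilabel case yields exactly the bipartite feedback mentioned earlier: relevant labels on one side, the remaining labels on the other.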

  8. Preference-Based View
     • [Figure: preference graphs over numbered labels]

  9. Quality of Predicted Ranking
     • Loss for a pair of labels (r, s) s.t. r is preferred over s
     • Measures whether we predicted the order of a pair correctly, and with sufficient confidence
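The pair loss formula itself is missing from the transcript. A hinge-style loss is one common choice that matches the description (correct order, with sufficient confidence). The sketch below assumes the required margin is the difference of target values; both the formula and the name `pair_loss` are assumptions.

```python
from typing import Sequence


def pair_loss(scores: Sequence[float], gamma: Sequence[float], r: int, s: int) -> float:
    """Hinge-style loss for a pair (r, s) where r is preferred over s.

    Zero when scores[r] exceeds scores[s] by at least gamma[r] - gamma[s],
    and growing linearly with the size of the violation otherwise.
    """
    required_margin = gamma[r] - gamma[s]
    achieved_margin = scores[r] - scores[s]
    return max(0.0, required_margin - achieved_margin)


gamma = [1.0, 0.0]
confident = pair_loss([2.0, 0.0], gamma, 0, 1)    # correct order, enough margin
hesitant = pair_loss([0.5, 0.0], gamma, 0, 1)     # correct order, margin too small
wrong = pair_loss([0.0, 1.0], gamma, 0, 1)        # wrong order
```

The middle case is what "sufficient confidence" refers to: the order is right, yet the loss is still positive.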

  10. Quality of Predicted Ranking (cont.)
      • Loss of f on a subset E of label pairs
      • Loss of f on an entire example: combines the subset losses using predefined subset weights
      • We will assume that each subset defines a bipartite graph
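As a rough illustration of how subset losses and weights might combine, the sketch below assumes the loss on a subset is the largest pair loss inside it and the example loss is the weighted sum over subsets. The transcript does not show the actual formulas, so the max/sum structure, the hinge pair loss, and all names are assumptions.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (r, s): label r should be ranked above label s


def pair_loss(scores: Sequence[float], gamma: Sequence[float], r: int, s: int) -> float:
    """Hinge-style pair loss (assumed form)."""
    return max(0.0, (gamma[r] - gamma[s]) - (scores[r] - scores[s]))


def subset_loss(scores: Sequence[float], gamma: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Loss on one subset of pairs: the worst pair loss in the subset (assumed)."""
    return max((pair_loss(scores, gamma, r, s) for r, s in pairs), default=0.0)


def example_loss(scores: Sequence[float], gamma: Sequence[float],
                 subsets: Sequence[Sequence[Pair]], weights: Sequence[float]) -> float:
    """Loss on a whole example: weighted sum of the subset losses (assumed)."""
    return sum(w * subset_loss(scores, gamma, e) for w, e in zip(weights, subsets))


scores = [1.0, 0.0, 0.5]
gamma = [1.0, 0.0, 0.0]
subsets: List[List[Pair]] = [[(0, 1), (0, 2)]]   # bipartite: {0} versus {1, 2}
total = example_loss(scores, gamma, subsets, weights=[2.0])
```

Here the pair (0, 1) is satisfied and the pair (0, 2) falls short by 0.5, so the subset loss is 0.5 and the weighted example loss is 1.0.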

  11. Linear Ranking Functions
      • Linear predictors
      • “Complexity” of a predictor
      • Complexity of a function set
      • Can be used with Mercer kernels
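A minimal sketch of a linear ranking function, assuming one weight vector per label and a score given by an inner product with the instance. Measuring complexity as half the sum of squared norms is a standard choice but an assumption here, as are the function names.

```python
from typing import List, Sequence


def predict(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Score of each label r: inner product of its weight vector W[r] with x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w_r, x)) for w_r in W]


def complexity(W: Sequence[Sequence[float]]) -> float:
    """Half the sum of squared Euclidean norms of all weight vectors (assumed)."""
    return 0.5 * sum(w_i * w_i for w_r in W for w_i in w_r)


W = [[1.0, 0.0], [0.0, 2.0]]   # two labels, two features
x = [3.0, 4.0]
label_scores = predict(W, x)
size = complexity(W)
```

Because the scores depend on the data only through inner products, the same construction carries over to kernels, which is the point of the last bullet.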

  12. Loss Minimization & Regularization
      • Objective terms labeled on the slide: empirical risk of ranker, complexity of ranker
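The objective on this slide is not preserved in the transcript. One common way to write a regularized objective with these two terms is sketched below. It assumes $m$ training examples $(x_i, \gamma_i)$, one weight vector $w_r$ per label, an example loss $\ell$, and a trade-off constant $C$; all of this notation is an assumption.

```latex
\min_{w_1, \dots, w_k} \;
  \underbrace{C \sum_{i=1}^{m} \ell\bigl(f(x_i), \gamma_i\bigr)}_{\text{empirical risk of ranker}}
  \; + \;
  \underbrace{\frac{1}{2} \sum_{r=1}^{k} \lVert w_r \rVert^2}_{\text{complexity of ranker}}
```

A larger $C$ puts more weight on fitting the training rankings, a smaller one on keeping the ranker simple.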

  13. Loss → Pair-wise Constraints
      • Each pair of (comparable) labels corresponds to a margin constraint
      • Slack variables distinguish between different edge-sets
      • Focus on a single instance & a single edge-set
      • Make use of the fact that

  14. A Reduced Problem
      • Current estimate of ranking functions
      • “New” example
      • Update ranking function
      • Framework for online learning
      • Iterative procedure for batch optimization

  15. Solving the Reduced Problem
      • Direct use of Lagrange multipliers & strong duality leads to as many dual variables as constraints, since each constraint is associated with a variable.

  16. Reducing the Reduced…
      • Introduce |A|+|B|=k new variables
      • [Figure: label sets A and B]

  17. Reduced & Compacted Dual
      • More compact dual problem:
      • Feasible → Feasible
      • Feasible → Feasible

  18. Obtaining (Primal) Solution
      • The new set of ranking vectors:
      • But, we still need to find

  19. “Decoupling” the Dual
      • Compact Dual
      • Suppose we were given
      • We can solve two independent problems

  20. Solving the Decoupled Problems
      • We need to solve
      • Optimal solution takes the form
      • Can be found in linear time using an improved & generalized (CS’99, SS’06, DSSC’09) projection technique by Bertsekas
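The decoupled problems are not shown in the transcript, but projection steps of this kind are closely related to Euclidean projection onto a simplex. The sketch below uses the simpler sort-based O(k log k) variant, not the linear-time technique cited on the slide. The function name and the `z` parameter (the required sum) are assumptions.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float], z: float = 1.0) -> List[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = z}.

    The result has the thresholded form w_i = max(v_i - theta, 0),
    where theta is found by scanning the entries in decreasing order.
    """
    if z <= 0:
        raise ValueError("z must be positive")
    running_sum = 0.0
    theta = 0.0
    for j, u_j in enumerate(sorted(v, reverse=True), start=1):
        running_sum += u_j
        candidate = (running_sum - z) / j
        if u_j - candidate > 0:      # entry j stays positive after the shift
            theta = candidate
    return [max(v_i - theta, 0.0) for v_i in v]


inside = project_to_simplex([0.5, 0.5])    # already on the simplex
clipped = project_to_simplex([2.0, 0.0])   # mass concentrates on the first entry
lifted = project_to_simplex([0.2, 0.2])    # both entries are raised equally
```

The thresholded form is why such projections are cheap: only the single scalar `theta` has to be found.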

  21. Finding C*
      • The sets define |A|+|B|=k “knots”
      • Function is piecewise quadratic
      • Minimum is unique
      • Can be computed in O(k)
      • [Figure: value of dual as a function of C*]

  22. “Coupled” Again
      • Locate the global optimum
      • Check whether C>C*
      • [Figure: value of dual]

  23. Recapping
      1. Focus on a single example
      2. Find a compact form of the dual
      3. Decouple the reduced & compact dual
      4. Compute “knots” for decoupled problems
      5. Find the optimal value C*
      6. Once C* is known we use soft projection to find α and β
      7. Once α and β are known we find w from u
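The last recap step (finding w from u once α and β are known) might look like the sketch below. The update form is an assumption, since the transcript omits the formula: weight vectors of labels in A move toward the instance x by their α value, and those of labels in B move away from it by their β value. The name `recover_primal` and the dictionary-based arguments are hypothetical.

```python
from typing import Dict, List, Sequence


def recover_primal(U: Sequence[Sequence[float]], x: Sequence[float],
                   alpha: Dict[int, float], beta: Dict[int, float]) -> List[List[float]]:
    """Build new weight vectors W from the current ones U (assumed update form).

    alpha maps each label a in A to its dual value: w_a = u_a + alpha_a * x.
    beta maps each label b in B to its dual value:  w_b = u_b - beta_b * x.
    Labels in neither set keep their current weight vector.
    """
    W = [list(u_r) for u_r in U]
    for a, alpha_a in alpha.items():
        W[a] = [u_i + alpha_a * x_i for u_i, x_i in zip(U[a], x)]
    for b, beta_b in beta.items():
        W[b] = [u_i - beta_b * x_i for u_i, x_i in zip(U[b], x)]
    return W


U = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
x = [1.0, 2.0]
W_new = recover_primal(U, x, alpha={0: 0.5}, beta={1: 0.5})
```

Only the labels taking part in the current edge-set change, which keeps each update cheap.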

  24. Back to Multiple Examples
      • Cycle through the examples:
      • Online mode:
        • Visit each example only once
        • Can obtain worst case loss bound
      • Batch mode:
        • Visit each example multiple times and “re-project”
        • Can obtain asymptotic convergence
      • Generalization bounds
      • SOPOPO - SOft Projection Onto POlyhedra
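The difference between the online and batch modes can be sketched with a simple loop. Note that the per-example update below is a plain perceptron-style step on the most violated pair, swapped in for brevity; it is not the SOPOPO update. The hinge pair loss and all names are assumptions.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], Sequence[float]]  # (instance x, target gamma)


def scores_of(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Linear score of every label for instance x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w_r, x)) for w_r in W]


def worst_pair(scores: Sequence[float], gamma: Sequence[float]) -> Tuple[float, int, int]:
    """(loss, r, s) of the most violated preferred pair, or (0.0, -1, -1)."""
    worst = (0.0, -1, -1)
    for r in range(len(gamma)):
        for s in range(len(gamma)):
            if gamma[r] > gamma[s]:
                loss = max(0.0, (gamma[r] - gamma[s]) - (scores[r] - scores[s]))
                if loss > worst[0]:
                    worst = (loss, r, s)
    return worst


def train(examples: Sequence[Example], k: int, n: int, epochs: int = 1):
    """epochs=1 mimics online mode; epochs>1 revisits examples as in batch mode."""
    W = [[0.0] * n for _ in range(k)]
    cumulative_loss = 0.0
    for _ in range(epochs):
        for x, gamma in examples:
            loss, r, s = worst_pair(scores_of(W, x), gamma)
            cumulative_loss += loss
            if loss > 0.0:               # perceptron-style step, not SOPOPO
                for i in range(n):
                    W[r][i] += x[i]
                    W[s][i] -= x[i]
    return W, cumulative_loss


data = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
W_online, loss_online = train(data, k=2, n=2, epochs=1)
W_batch, loss_batch = train(data, k=2, n=2, epochs=3)
```

On this toy data both examples are fit after the first pass, so the extra batch passes add no further loss.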

  25. Empirical Evaluation (Batch convergence)

  26. Empirical Evaluation (Online Error)
      • [Figure: online error curves for Perc, PA, and Sop on two panels, Letter (poly ker.) and Letter (linear)]

  27. Why Does it “Work”? (proof by picture)
      • [Figure: primal objective and dual objective for a single iterate]

  28. Summary
      • General framework for online ranking
      • Specialized efficient algorithm - SOPOPO
      • Ranking: rich structure, interesting, useful, …
      • Primal-dual regret bound analysis
      • Based on:
        • “Efficient Learning of Label Ranking by Soft Projections onto Polyhedra”, Journal of Machine Learning Research, Vol. 7, 2006
        • “A Primal-Dual Perspective of Online Learning Algorithms”, Machine Learning Journal, Vol. 69, 2007
      • Code: http://www.cs.huji.ac.il/~shais/code/sopopo.tgz
      • See also: http://www.magicbroom.info
