  1. Online Ranking Combination
  Erzsébet Frigó, Institute for Computer Science and Control (MTA SZTAKI)
  Joint work with Levente Kocsis

  2. Overview
  ◮ Framework: prequential ranking evaluation
  ◮ Goal: optimize a convex combination of ranking models
  ◮ Our proposal: direct optimization of the ranking function

  3. Model combination in prequential framework with ranking evaluation
  [Diagram: over time, base rankers A1, A2, A3 each score the candidate items i1, ..., im for user u; their scores are combined into a single ranking list.]

  4. Model combination in prequential framework with ranking evaluation
  [Same diagram as the previous slide.]
  Objective: choosing the combination weights.
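A minimal sketch of the setup on this slide, assuming a simple array interface for the base rankers' scores and an nDCG-style reward for the single next item; the function names and the reward definition are illustrative, not taken from the talk.

```python
import numpy as np

def combined_ranking(base_scores, theta):
    """base_scores: (n_models, n_items) array; theta: convex combination weights."""
    return np.argsort(-(theta @ base_scores))  # item indices, best first

def prequential_reward(ranking, target_item):
    """nDCG-style reward for one held-out item: 1 / log2(rank + 1)."""
    rank = int(np.where(ranking == target_item)[0][0]) + 1
    return 1.0 / np.log2(rank + 1)

# toy example: 3 base rankers, 5 candidate items, the user interacts with item 2
scores = np.random.rand(3, 5)
theta = np.array([0.5, 0.3, 0.2])
r = prequential_reward(combined_ranking(scores, theta), target_item=2)
```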

  5. New idea: optimize the ranking function directly
  ◮ Standard method: take a surrogate function and use its gradient
    ◮ e.g., MSE
    ◮ Drawback: the optimum of the surrogate ≠ the optimum of the ranking function
  ◮ Proposed solution: optimize the ranking function directly
  ◮ Two approaches:
    ◮ global search in the weight space
    ◮ gradient approximation (finite differences)
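For contrast, a hedged sketch of the surrogate approach the slide refers to (the SGD baseline used later in the experiments): one MSE gradient step on the weights for a positive item and a sampled negative item. The learning rate and the sampling scheme are illustrative assumptions.

```python
import numpy as np

def sgd_mse_step(theta, pos_scores, neg_scores, lr=0.01):
    """pos_scores / neg_scores: base-ranker scores for one observed (positive)
    and one sampled (negative) item."""
    for x, target in ((pos_scores, 1.0), (neg_scores, 0.0)):
        err = theta @ x - target            # residual of the combined score
        theta = theta - lr * 2.0 * err * x  # gradient of (theta . x - target)^2
    return theta
```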

  6. ExpW
  ◮ Choose a subset Q of the weight space Θ
    ◮ e.g., lay a grid over the parameter space
  ◮ Apply the exponentially weighted forecaster on Q:
    P(select q ∈ Q in round t) = exp(−η_t Σ_{τ=1}^{t−1} (1 − r_τ(q))) / Σ_{s∈Q} exp(−η_t Σ_{τ=1}^{t−1} (1 − r_τ(s)))
  ◮ Theoretical guarantee:
    E[ R_T(best static combination in Θ) − R_T(ExpW) ] < O(√T)
    ◮ if the cumulative reward function R_T is sufficiently smooth
    ◮ and Q is sufficiently large
  ◮ Difficulty: the size of Q is exponential in the number of base rankers, so the approach cannot scale
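A minimal sketch of the exponentially weighted forecaster over a grid Q of candidate weight vectors, as described above; a constant η and a full-information update over all grid points are simplifying assumptions made here.

```python
import numpy as np

class ExpW:
    def __init__(self, grid, eta=0.1):
        self.grid = grid                     # candidate weight vectors q in Q
        self.eta = eta
        self.cum_loss = np.zeros(len(grid))  # sum over tau < t of (1 - r_tau(q))

    def sample(self, rng):
        logits = -self.eta * self.cum_loss
        p = np.exp(logits - logits.max())    # numerically stable softmax
        p /= p.sum()
        idx = rng.choice(len(self.grid), p=p)
        return idx, self.grid[idx]

    def update(self, rewards):
        """rewards[j] = r_t(q_j) for every grid point (full-information update)."""
        self.cum_loss += 1.0 - np.asarray(rewards)

# toy usage: grid of convex weights for two base rankers
grid = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]
forecaster = ExpW(grid)
```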

  7. Simultaneous Perturbation Stochastic Approximation (SPSA)
  ◮ Approximated gradient (for the weight of base ranker i in round t):
    g_ti = [ r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t) ] / ( c_t Δ_ti )
  ◮ θ_t is the current combination weight vector
  ◮ Δ_t = (Δ_t1, ...) is a random vector of ±1 entries
  ◮ c_t is the perturbation step size
  ◮ Online update step: one gradient step using the approximated gradient
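A minimal sketch of one SPSA update following the two-sided formula above; reward_fn stands for the prequential reward r_t of the current round, and c_t and the learning rate are illustrative constants.

```python
import numpy as np

def spsa_step(theta, reward_fn, c_t=0.05, lr=0.01, rng=np.random.default_rng()):
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 perturbation
    r_plus = reward_fn(theta + c_t * delta)
    r_minus = reward_fn(theta - c_t * delta)
    g = (r_plus - r_minus) / (c_t * delta)             # approximated gradient
    return theta + lr * g                              # ascent on the reward
```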

  8. RSPSA ◮ RSPSA = SPSA + Resilient Backpropagation (RProp)

  9. RSPSA
  ◮ RSPSA = SPSA + Resilient Backpropagation (RProp)
  ◮ RProp defines a gradient step size for each weight
  ◮ The perturbation step size is tied to the gradient step size
  ◮ Step sizes are updated using RProp

  10. Resilient Backpropagation (RProp)
  ◮ Gradient update rule
  ◮ Predefined step size for each coordinate
    ◮ ignores the length of the gradient vector
  ◮ Step size is updated based on the sign of the gradient
    ◮ decrease the step if the gradient changed direction
    ◮ increase it otherwise
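A hedged sketch of the RProp rule this slide summarizes, written for reward maximization; the increase/decrease factors and the step-size bounds are the usual defaults, assumed here rather than taken from the talk, and the step logic is simplified.

```python
import numpy as np

def rprop_step(theta, grad, prev_grad, step, up=1.2, down=0.5,
               step_min=1e-6, step_max=1.0):
    """Per-coordinate step sizes: grow while the gradient sign is stable,
    shrink when it flips; only the sign of the gradient is used."""
    sign_change = np.sign(grad) * np.sign(prev_grad)
    step = np.where(sign_change > 0, np.minimum(step * up, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * down, step_min), step)
    theta = theta + np.sign(grad) * step   # ascent step of fixed per-coordinate size
    return theta, step
```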

  11. g_ti = [ r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t) ] / ( c_t Δ_ti )

  12. RFDSA+
  ◮ Switch to finite differences (FD)
    ◮ allows detecting a zero gradient w.r.t. a single coordinate
  ◮ If the gradient is 0 w.r.t. a coordinate, then
    ◮ increase the perturbation size (+) for that coordinate
    ◮ escape the flat section in the right direction
  ◮ RFDSA+ = RSPSA − simultaneous perturbation + finite differences + zero-gradient detection
  ◮ The modifications might seem minor, but they are essential to make the algorithm work
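A hedged sketch of the RFDSA+ idea: per-coordinate finite differences, with the perturbation widened whenever the two-sided difference is exactly zero (a flat section). The constants and the exact coupling with the RProp step sizes are illustrative, not the authors' implementation.

```python
import numpy as np

def rfdsa_plus_step(theta, reward_fn, c, step, grow=1.2):
    """c: per-coordinate perturbation sizes; step: per-coordinate RProp-style step sizes."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = c[i]
        diff = reward_fn(theta + e) - reward_fn(theta - e)  # finite difference
        if diff == 0.0:
            c[i] *= grow   # flat w.r.t. coordinate i: widen the perturbation
        else:
            grad[i] = diff / (2.0 * c[i])
    theta = theta + np.sign(grad) * step                    # sign-based update step
    return theta, c
```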

  13. Experiments - Datasets, base rankers
  ◮ Datasets (5):
    ◮ Amazon: CDs and Vinyl, Movies and TV, Electronics
    ◮ MovieLens 10M
    ◮ Twitter (hashtag prediction)
  ◮ Size:
    ◮ # of events: 2M-10M
    ◮ # of users: 70k-4M
    ◮ # of items: 10k-100k
  ◮ Base rankers:
    ◮ Models updated incrementally: SGD Matrix Factorization, Asymmetric Matrix Factorization, Item-to-item similarity, Most popular
    ◮ Traditional models updated periodically: SGD Matrix Factorization, Implicit Alternating Least Squares MF

  14. Combination algorithms in the experiments
  Direct optimization:
  ◮ ExpW
    ◮ exponentially weighted forecaster on a grid
    ◮ global optimization
  ◮ SPSA
    ◮ gradient method with simultaneous perturbation
  ◮ RSPSA
    ◮ SPSA with RProp
  ◮ RFDSA+
    ◮ our new algorithm
    ◮ finite differences, flat-section detection
  Baselines:
  ◮ ExpA
    ◮ exponentially weighted forecaster on the base rankers
  ◮ ExpAW
    ◮ uses the probabilities of ExpA as weights
  ◮ SGD
    ◮ uses MSE as a surrogate
    ◮ target = 1 for the positive sample
    ◮ target = 0 for generated negative samples

  15. Results - 2 base rankers (i2i, OMF) - nDCG
  [Plot: NDCG over days (0 to 7000) for item2item, OMF, ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, RFDSA+.]

  16. Results - 2 base rankers - Combination weights
  [Plot: combination weight θ (log scale) over days for OptG100+, ExpAW, SGD, SPSA, RSPSA, RFDSA+.]

  17. Cumulative reward as a function of the combination weight
  [Plot: R_T(θ), measured in NDCG, as a function of θ on a log scale.]

  18. Results - Scalability
  [Plot: NDCG as a function of the number of OMF's (1 to 10) for ExpA, ExpAW, SGD, SPSA, RSPSA, RFDSA+.]

  19. Results - 6 base rankers - DCG

  20. Conclusions
  ◮ Problem: combining ranking algorithms
  ◮ Our proposal: optimize the ranking measure directly
  ◮ Global optimization (ExpW) works well in the case of two base algorithms
  ◮ Our new algorithm: RFDSA+
    ◮ solves the remaining problems (scaling, constant sections w.r.t. one coordinate)
    ◮ yields a strong combination

  21. The End
  Online Ranking Combination
  Erzsébet Frigó, Institute for Computer Science and Control (MTA SZTAKI)
  Joint work with Levente Kocsis
