  1. Online Ranking Combination
  Erzsébet Frigó, Institute for Computer Science and Control (MTA SZTAKI)
  Joint work with Levente Kocsis

  2. Overview
  ◮ Framework: prequential ranking evaluation
  ◮ Goal: optimize a convex combination of ranking models
  ◮ Our proposal: direct optimization of the ranking function

  3. Model combination in prequential framework with ranking evaluation
  [Diagram: over time, base rankers A1, A2, A3 each score the candidate items i1, ..., im for user u; their scores are combined into a single ranking list.]

  4. Model combination in prequential framework with ranking evaluation
  [Same diagram as the previous slide.]
  Objective: choosing the combination weights.
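A minimal sketch of the setup on this slide, assuming a simple array interface for the base rankers' scores and an nDCG-style reward for the single next item; the function names and the reward definition are illustrative, not taken from the talk.

```python
import numpy as np

def combined_ranking(base_scores, theta):
    """base_scores: (n_models, n_items) array; theta: convex combination weights."""
    return np.argsort(-(theta @ base_scores))  # item indices, best first

def prequential_reward(ranking, target_item):
    """nDCG-style reward for one held-out item: 1 / log2(rank + 1)."""
    rank = int(np.where(ranking == target_item)[0][0]) + 1
    return 1.0 / np.log2(rank + 1)

# toy example: 3 base rankers, 5 candidate items, the user interacts with item 2
scores = np.random.rand(3, 5)
theta = np.array([0.5, 0.3, 0.2])
r = prequential_reward(combined_ranking(scores, theta), target_item=2)
```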

  5. New idea: optimize the ranking function directly
  ◮ Standard method: take a surrogate function and use its gradient
    ◮ e.g., MSE
    ◮ Drawback: the optimum of the surrogate ≠ the optimum of the ranking function
  ◮ Proposed solution: optimize the ranking function directly
  ◮ Two approaches:
    ◮ global search in the weight space
    ◮ gradient approximation (finite differences)
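For contrast, a hedged sketch of the surrogate approach the slide refers to (the SGD baseline used later in the experiments): one MSE gradient step on the weights for a positive item and a sampled negative item. The learning rate and the sampling scheme are illustrative assumptions.

```python
import numpy as np

def sgd_mse_step(theta, pos_scores, neg_scores, lr=0.01):
    """pos_scores / neg_scores: base-ranker scores for one observed (positive)
    and one sampled (negative) item."""
    for x, target in ((pos_scores, 1.0), (neg_scores, 0.0)):
        err = theta @ x - target            # residual of the combined score
        theta = theta - lr * 2.0 * err * x  # gradient of (theta . x - target)^2
    return theta
```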

  6. ExpW
  ◮ Choose a subset Q of the weight space Θ
    ◮ e.g., lay a grid over the parameter space
  ◮ Apply the exponentially weighted forecaster on Q:
    P(select q ∈ Q in round t) = exp(−η_t Σ_{τ=1}^{t−1} (1 − r_τ(q))) / Σ_{s∈Q} exp(−η_t Σ_{τ=1}^{t−1} (1 − r_τ(s)))
  ◮ Theoretical guarantee:
    E[ R_T(best static combination in Θ) − R_T(ExpW) ] < O(√T)
    ◮ if the cumulative reward function R_T is sufficiently smooth
    ◮ and Q is sufficiently large
  ◮ Difficulty: the size of Q is exponential in the number of base rankers, so the approach cannot scale
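A minimal sketch of the exponentially weighted forecaster over a grid Q of candidate weight vectors, as described above; a constant η and a full-information update over all grid points are simplifying assumptions made here.

```python
import numpy as np

class ExpW:
    def __init__(self, grid, eta=0.1):
        self.grid = grid                     # candidate weight vectors q in Q
        self.eta = eta
        self.cum_loss = np.zeros(len(grid))  # sum over tau < t of (1 - r_tau(q))

    def sample(self, rng):
        logits = -self.eta * self.cum_loss
        p = np.exp(logits - logits.max())    # numerically stable softmax
        p /= p.sum()
        idx = rng.choice(len(self.grid), p=p)
        return idx, self.grid[idx]

    def update(self, rewards):
        """rewards[j] = r_t(q_j) for every grid point (full-information update)."""
        self.cum_loss += 1.0 - np.asarray(rewards)

# toy usage: grid of convex weights for two base rankers
grid = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]
forecaster = ExpW(grid)
```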

  7. Simultaneous Perturbation Stochastic Approximation (SPSA)
  ◮ Approximated gradient (for the weight of base ranker i in round t):
    g_ti = [ r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t) ] / ( c_t Δ_ti )
  ◮ θ_t is the current combination weight vector
  ◮ Δ_t = (Δ_t1, ...) is a random vector of ±1 entries
  ◮ c_t is the perturbation step size
  ◮ Online update step: one gradient step using the approximated gradient
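A minimal sketch of one SPSA update following the two-sided formula above; reward_fn stands for the prequential reward r_t of the current round, and c_t and the learning rate are illustrative constants.

```python
import numpy as np

def spsa_step(theta, reward_fn, c_t=0.05, lr=0.01, rng=np.random.default_rng()):
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 perturbation
    r_plus = reward_fn(theta + c_t * delta)
    r_minus = reward_fn(theta - c_t * delta)
    g = (r_plus - r_minus) / (c_t * delta)             # approximated gradient
    return theta + lr * g                              # ascent on the reward
```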

  8. RSPSA ◮ RSPSA = SPSA + Resilient Backpropagation (RProp)

  9. RSPSA
  ◮ RSPSA = SPSA + Resilient Backpropagation (RProp)
  ◮ RProp defines a gradient step size for each weight
  ◮ The perturbation step size is tied to the gradient step size
  ◮ Step sizes are updated using RProp

  10. Resilient Backpropagation (RProp)
  ◮ Gradient update rule
  ◮ Predefined step size for each coordinate
    ◮ ignores the length of the gradient vector
  ◮ Step size is updated based on the sign of the gradient
    ◮ decrease the step if the gradient changed direction
    ◮ increase it otherwise
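A hedged sketch of the RProp rule this slide summarizes, written for reward maximization; the increase/decrease factors and the step-size bounds are the usual defaults, assumed here rather than taken from the talk, and the step logic is simplified.

```python
import numpy as np

def rprop_step(theta, grad, prev_grad, step, up=1.2, down=0.5,
               step_min=1e-6, step_max=1.0):
    """Per-coordinate step sizes: grow while the gradient sign is stable,
    shrink when it flips; only the sign of the gradient is used."""
    sign_change = np.sign(grad) * np.sign(prev_grad)
    step = np.where(sign_change > 0, np.minimum(step * up, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * down, step_min), step)
    theta = theta + np.sign(grad) * step   # ascent step of fixed per-coordinate size
    return theta, step
```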

  11. g_ti = [ r_t(θ_t + c_t Δ_t) − r_t(θ_t − c_t Δ_t) ] / ( c_t Δ_ti )

  12. RFDSA+
  ◮ Switch to finite differences (FD)
    ◮ allows detecting a zero gradient w.r.t. a single coordinate
  ◮ If the gradient is 0 w.r.t. a coordinate, then
    ◮ increase the perturbation size (+) for that coordinate
    ◮ escape the flat section in the right direction
  ◮ RFDSA+ = RSPSA − simultaneous perturbation + finite differences + zero-gradient detection
  ◮ The modifications might seem minor, but they are essential to make the algorithm work
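A hedged sketch of the RFDSA+ idea: per-coordinate finite differences, with the perturbation widened whenever the two-sided difference is exactly zero (a flat section). The constants and the exact coupling with the RProp step sizes are illustrative, not the authors' implementation.

```python
import numpy as np

def rfdsa_plus_step(theta, reward_fn, c, step, grow=1.2):
    """c: per-coordinate perturbation sizes; step: per-coordinate RProp-style step sizes."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = c[i]
        diff = reward_fn(theta + e) - reward_fn(theta - e)  # finite difference
        if diff == 0.0:
            c[i] *= grow   # flat w.r.t. coordinate i: widen the perturbation
        else:
            grad[i] = diff / (2.0 * c[i])
    theta = theta + np.sign(grad) * step                    # sign-based update step
    return theta, c
```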

  13. Experiments - Datasets, base rankers
  ◮ Datasets (5):
    ◮ Amazon: CDs and Vinyl, Movies and TV, Electronics
    ◮ MovieLens 10M
    ◮ Twitter (hashtag prediction)
  ◮ Size:
    ◮ # of events: 2M-10M
    ◮ # of users: 70k-4M
    ◮ # of items: 10k-100k
  ◮ Base rankers:
    ◮ Models updated incrementally: SGD Matrix Factorization, Asymmetric Matrix Factorization, Item-to-item similarity, Most popular
    ◮ Traditional models updated periodically: SGD Matrix Factorization, Implicit Alternating Least Squares MF

  14. Combination algorithms in the experiments
  Direct optimization:
  ◮ ExpW
    ◮ exponentially weighted forecaster on a grid
    ◮ global optimization
  ◮ SPSA
    ◮ gradient method with simultaneous perturbation
  ◮ RSPSA
    ◮ SPSA with RProp
  ◮ RFDSA+
    ◮ our new algorithm
    ◮ finite differences, flat-section detection
  Baselines:
  ◮ ExpA
    ◮ exponentially weighted forecaster on the base rankers
  ◮ ExpAW
    ◮ uses the probabilities of ExpA as weights
  ◮ SGD
    ◮ uses MSE as a surrogate
    ◮ target = 1 for the positive sample
    ◮ target = 0 for generated negative samples

  15. Results - 2 base rankers (i2i, OMF) - nDCG
  [Plot: NDCG over days (0 to 7000) for item2item, OMF, ExpA, ExpAW, ExpW, SGD, SPSA, RSPSA, RFDSA+.]

  16. Results - 2 base rankers - Combination weights
  [Plot: combination weight θ (log scale) over days for OptG100+, ExpAW, SGD, SPSA, RSPSA, RFDSA+.]

  17. Cumulative reward as a function of the combination weight
  [Plot: R_T(θ), measured in NDCG, as a function of θ on a log scale.]

  18. Results - Scalability
  [Plot: NDCG as a function of the number of OMF's (1 to 10) for ExpA, ExpAW, SGD, SPSA, RSPSA, RFDSA+.]

  19. Results - 6 base rankers - DCG

  20. Conclusions
  ◮ Problem: combining ranking algorithms
  ◮ Our proposal: optimize the ranking measure directly
  ◮ Global optimization (ExpW) works well in the case of two base algorithms
  ◮ Our new algorithm: RFDSA+
    ◮ solves the remaining problems (scaling, constant sections w.r.t. one coordinate)
    ◮ yields a strong combination

  21. The End
  Online Ranking Combination
  Erzsébet Frigó, Institute for Computer Science and Control (MTA SZTAKI)
  Joint work with Levente Kocsis
