A Unified Contextual Bandit Framework for Long- and Short-Term Recommendations Maryam Tavakol and Ulf Brefeld {tavakol,brefeld}@leuphana.de Skopje - Sep 21, 2017
Recommendation Tavakol & Brefeld, Leuphana University Lüneburg 2/22
Personalization
Short-Term Zeitgeist
Proposed Approach
• Goal: combine the long- and short-term interests of users in one unified model
  ➡ Long-term part + short-term component
  ✤ s.t.: generality in terms of optimization
• Framework: contextual multi-armed bandit (MAB)
  • e.g., LinUCB
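LinUCB is named above only as an example of a contextual bandit. The following is a minimal sketch of the standard disjoint LinUCB rule (one ridge-regression model per arm), not the method of this talk; the names `LinUCBArm`, `select_arm`, and the exploration weight `alpha` are illustrative assumptions.

```python
import numpy as np


class LinUCBArm:
    """One arm of disjoint LinUCB (hypothetical helper, not from the talk)."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # regularized Gram matrix of observed contexts
        self.b = np.zeros(dim)    # reward-weighted sum of observed contexts
        self.alpha = alpha        # exploration weight (assumed value)

    def ucb(self, x):
        """Point estimate plus confidence width for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        """Incorporate one observed (context, reward) pair."""
        self.A += np.outer(x, x)
        self.b += reward * x


def select_arm(arms, x):
    """Pick the arm with the largest upper confidence bound."""
    return int(np.argmax([arm.ucb(x) for arm in arms]))
```

An arm that has never been played keeps a wide confidence term, so it is still explored even when its point estimate is low.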
Unified Model
$$\mathbb{E}[r_{t,a_i} \mid u_j] = \underbrace{\theta_i^\top x_t}_{\text{Short-term}} + \underbrace{\beta_j^\top z_{a_i}}_{\text{Long-term}} + b_i$$
• $r_{t,a_i}$: outcome, $u_j$: current user, $x_t$: context, $z_{a_i}$: item features, $b_i$: bias term
• $\theta_i$: parameters of the short-term model
• $\beta_j$: parameters of the long-term model
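As a small illustration of the unified model, the expected reward is the sum of a per-item (short-term) score on the context, a per-user (long-term) score on the item features, and an item bias. The function name `expected_reward` and the toy numbers are assumptions made for this sketch.

```python
import numpy as np


def expected_reward(theta_i, x_t, beta_j, z_ai, b_i=0.0):
    """E[r | u_j] = theta_i^T x_t + beta_j^T z_{a_i} + b_i."""
    short_term = theta_i @ x_t   # item-specific weights applied to the context
    long_term = beta_j @ z_ai    # user-specific weights applied to item features
    return float(short_term + long_term + b_i)


# Toy values, purely illustrative.
theta_i = np.array([1.0, 2.0])
x_t = np.array([0.5, 0.5])
beta_j = np.array([1.0, 0.0, -1.0])
z_ai = np.array([2.0, 3.0, 1.0])
score = expected_reward(theta_i, x_t, beta_j, z_ai, b_i=0.1)
```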
General Optimization
• Objective function with arbitrary loss, $V(\cdot, r_t)$
$$\inf_{\substack{\theta_1,\dots,\theta_n \\ \beta_1,\dots,\beta_m \\ b}} \; \frac{1}{T}\sum_{t=1}^{T} V\!\left(\theta_t^\top x_t + \beta_t^\top z_t + b_t,\; r_t\right) + \underbrace{\frac{\lambda}{2}\sum_i \|\theta_i\|^2 + \frac{\hat{\mu}}{2}\sum_j \|\beta_j\|^2}_{\text{Regularization}}$$
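A minimal sketch of evaluating a regularized objective of this shape, under stated assumptions: the loss is averaged over the T rounds, the subscript t on the parameters is read as "the item and user active in round t", and squared loss is used as the default. The names `objective`, `items`, `users`, `lam`, and `mu` are hypothetical.

```python
import numpy as np


def squared_loss(pred, r):
    """Squared loss; any other loss with the same signature could be passed in."""
    return 0.5 * (pred - r) ** 2


def objective(Theta, Beta, b, X, Z, items, users, r, lam, mu, loss=squared_loss):
    """Average loss over rounds plus L2 penalties on item and user weights.

    Theta: (n_items, d) item weights, Beta: (n_users, p) user weights,
    b: (n_items,) item biases, X: (T, d) contexts, Z: (T, p) item features,
    items/users: index of the item and user active in each round.
    """
    preds = (
        np.einsum("td,td->t", Theta[items], X)    # short-term part
        + np.einsum("tp,tp->t", Beta[users], Z)   # long-term part
        + b[items]                                # bias of the shown item
    )
    data_term = np.mean(loss(preds, r))
    reg = 0.5 * lam * np.sum(Theta ** 2) + 0.5 * mu * np.sum(Beta ** 2)
    return float(data_term + reg)
```

Passing the loss as an argument mirrors the point of the slide: the rest of the objective does not depend on which loss is chosen.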
General Optimization
• Using the Fenchel-Legendre conjugate of the loss function in the dual space:
$$\sup_{\alpha,\; \mathbf{1}^\top \alpha = 0} \; -C \sum_{t=1}^{T} V^{*}\!\left(-\frac{\alpha_t}{C},\, r_t\right) - \frac{1}{2}\,\alpha^\top \Big[ \Big(\sum_i \delta_i \otimes \delta_i^\top\Big) \circ XX^\top + \frac{1}{\mu} \Big(\sum_i \phi_i \otimes \phi_i^\top\Big) \circ ZZ^\top \Big]\, \alpha$$
• Kernel trick
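The bracketed matrix in the dual can be assembled directly from the data. The sketch below assumes that each $\delta_i$ (respectively $\phi_i$) is an indicator vector over rounds marking where item i (respectively user i) is active, so that the summed outer products form a "same item" (respectively "same user") mask; this reading, and the names `joint_kernel`, `items`, and `users`, are assumptions rather than definitions from the slides.

```python
import numpy as np


def joint_kernel(X, Z, items, users, mu):
    """Masked sum of a context kernel and an item-feature kernel.

    X: (T, d) contexts, Z: (T, p) item features,
    items/users: index of the item and user active in each round.
    """
    items = np.asarray(items)
    users = np.asarray(users)
    # Entry (s, t) is 1 when rounds s and t share the same item / user.
    same_item = (items[:, None] == items[None, :]).astype(float)
    same_user = (users[:, None] == users[None, :]).astype(float)
    # Linear kernels; X @ X.T or Z @ Z.T could be swapped for another kernel.
    return same_item * (X @ X.T) + (same_user * (Z @ Z.T)) / mu


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = np.array([[1.0], [2.0], [3.0]])
K = joint_kernel(X, Z, items=[0, 1, 0], users=[0, 0, 1], mu=2.0)
```

Because the data enter only through `X @ X.T` and `Z @ Z.T`, replacing these products with kernel matrices is the natural place for the kernel trick.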
Optimization
• Gradient-based approaches (in dual or primal)
• Calculating the gradient depends on the loss function
• Model parameters $(\theta_i, \beta_j)$ are obtained from $\alpha$
• Kernel functions applicable
Algorithm
Instantiation: Squared Loss
• Conjugate of the loss function:
$$V^{*}\!\left(-\frac{\alpha_t}{C},\, r_t\right) = \frac{1}{2C^2}\,\alpha_t^2 - \frac{1}{C}\,\alpha_t r_t$$
• Becomes a standard quadratic optimization with a constraint
• Confidence bound:
$$c\,\sqrt{x_t^\top (X^\top X)^{-1} x_t + z_t^\top (Z^\top Z)^{-1} z_t}$$
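The confidence bound for squared loss can be computed with two linear solves. This is a sketch under assumptions: `X` and `Z` hold the past contexts and item features row-wise, and the optional `ridge` term is an addition of this example (not on the slide) to keep the Gram matrices invertible when few rounds have been observed.

```python
import numpy as np


def confidence_bound(x_t, z_t, X, Z, c=1.0, ridge=0.0):
    """c * sqrt(x^T (X^T X)^-1 x + z^T (Z^T Z)^-1 z).

    With ridge=0.0 this matches the formula as written and requires
    X^T X and Z^T Z to be invertible.
    """
    A = X.T @ X + ridge * np.eye(X.shape[1])
    B = Z.T @ Z + ridge * np.eye(Z.shape[1])
    width_x = x_t @ np.linalg.solve(A, x_t)   # short-term uncertainty
    width_z = z_t @ np.linalg.solve(B, z_t)   # long-term uncertainty
    return float(c * np.sqrt(width_x + width_z))


X = np.eye(2)
Z = 2.0 * np.eye(2)
bound = confidence_bound(np.array([1.0, 0.0]), np.array([0.0, 2.0]), X, Z)
```

Using `np.linalg.solve` avoids forming the explicit inverse, which is usually the more stable choice.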
Instantiation: Logistic Loss
• Conjugate of the loss function:
$$V^{*}\!\left(-\frac{\alpha_t}{C},\, r_t\right) = \left(1 - \frac{\alpha_t}{C r_t}\right) \log\!\left(1 - \frac{\alpha_t}{C r_t}\right) + \frac{\alpha_t}{C r_t} \log\!\left(\frac{\alpha_t}{C r_t}\right)$$
• Confidence bound, where $V_a$ and $V_u$ are diagonal matrices of the sigmoid model:
$$c\,\sqrt{x_t^\top (X^\top V_a X)^{-1} x_t + z_t^\top (Z^\top V_u Z)^{-1} z_t}$$
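The closed-form conjugate can be checked numerically. The sketch below assumes the usual logistic loss $V(y, r) = \log(1 + e^{-r y})$ with labels $r \in \{-1, +1\}$, which the slide does not spell out; `logistic_conjugate` and `conjugate_by_grid` are hypothetical helper names, and the grid search is only a brute-force sanity check.

```python
import numpy as np


def logistic_conjugate(alpha_t, r_t, C):
    """Closed form: u*log(u) + (1-u)*log(1-u) with u = alpha_t / (C * r_t)."""
    u = alpha_t / (C * r_t)
    if not 0.0 < u < 1.0:
        raise ValueError("alpha_t / (C * r_t) must lie strictly between 0 and 1")
    return float(u * np.log(u) + (1.0 - u) * np.log(1.0 - u))


def conjugate_by_grid(alpha_t, r_t, C):
    """Brute-force sup_y [ s*y - log(1 + exp(-r*y)) ] with s = -alpha_t / C."""
    y = np.linspace(-30.0, 30.0, 600001)
    s = -alpha_t / C
    return float(np.max(s * y - np.logaddexp(0.0, -r_t * y)))


closed = logistic_conjugate(0.3, 1.0, 1.0)
brute = conjugate_by_grid(0.3, 1.0, 1.0)
```

The two values agree up to the grid resolution, which supports the reconstruction of the conjugate under the assumed form of the loss.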
Model Simplification
• Focus on the item model
  • Short-Term: $\mathbb{E}[r_{t,a_i}] = \theta_i^\top x_t$
  • Short-Term+Average: $\mathbb{E}[r_{t,a_i}] = \theta_i^\top x_t + \beta^\top z_{a_i}$
• Focus on the user model
  • Long-Term: $\mathbb{E}[r_{t,a_i} \mid u_j] = \beta_j^\top z_{a_i}$
  • Long-Term+Average: $\mathbb{E}[r_{t,a_i} \mid u_j] = \beta_j^\top z_{a_i} + \theta^\top z_{a_i}$
Empirical Study
• Using squared loss function
• Dataset: User transactions from Zalando*
• Baseline: Matrix Factorization (MF)
• Performance measure: normalized average rank
*www.zalando.com
No New User/Item
• The combined approach outperforms both the short-term and the long-term model, but not the baseline!
Cold Start Scenarios
• Robustness of the combined model in case of a new user or a new item: it generalizes well in both cases
Time Complexity
• The optimization time in the combined model is exponential
Short-Term Models
• The average term compensates for the new items
Long-Term Models
• The average term compensates for the new users
Conclusion
• The short- and long-term interests of users are combined in one model
• Free choice of loss function and model complexity
• There is not one best model: the choice depends on the application
Questions?
Thanks for your attention
Source code available at https://github.com/marytavakol/Bandits