  1. Bandits Under the Influence. Silviu Maniu, Stratis Ioannidis, Bogdan Cautis. Université Paris-Saclay & Northeastern University

  2. Motivation
Recommender systems: recommending items to users
• preferences may be unknown or highly dynamic
• online recommendation systems re-learn preferences on the go
• users can be influenced by other users: social influence
Objective: online recommendation systems that take social influence into account
• solution framework: sequential learning, multi-armed bandits

  3. Setting – Recommendation
Set of users [n], receiving suggestions at time steps t ∈ ℕ, each with a user profile u_i(t) ∈ ℝ^d
Recommended items: d-dimensional vectors v ∈ ℝ^d, with B the catalog of recommendable items
At each time step t, user i is presented an item v_i(t) and provides a rating r_i(t):
r_i(t) = ⟨u_i(t), v_i(t)⟩ + ε
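A minimal sketch of this feedback model, assuming the noise ε is Gaussian (the function name and noise level are illustrative, not taken from the slides):

```python
import numpy as np

def observe_rating(u_i, v_i, noise_std=0.1, rng=None):
    """Return r_i(t) = <u_i(t), v_i(t)> + eps.

    The Gaussian noise model and noise_std are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    return float(u_i @ v_i) + rng.normal(scale=noise_std)
```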

  4. Setting – User Preference Evolution
Users are in a social network, and their interests evolve over time steps:
u_i(t) = α u_i⁰ + (1 − α) Σ_{j ∈ [n]} P_{ij} u_j(t − 1), for i ∈ [n]
• social parameter α ∈ [0, 1]
• influence weight P_{ij} between users i and j
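A minimal numerical sketch of this update, assuming numpy, with U0 the n×d matrix of intrinsic profiles and P a row-stochastic influence matrix (all names are illustrative):

```python
import numpy as np

def evolve_profiles(U0, P, alpha, t):
    """Iterate u_i(t) = alpha * u_i(0) + (1 - alpha) * sum_j P_ij * u_j(t-1).

    U0 : (n, d) matrix of intrinsic user profiles
    P  : (n, n) influence matrix (assumed row-stochastic)
    """
    U = U0.copy()
    for _ in range(t):
        U = alpha * U0 + (1 - alpha) * P @ U
    return U
```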

  5. Our Contributions
1. Establish the link between online recommendation and linear bandits
2. Adapt the classic LinREL and Thompson Sampling algorithms from the bandit literature to this non-stationary setting
3. Study tractable cases for solving the optimizations in each step of the algorithms

  6. Link with Bandits
We want to minimize the aggregate regret:
R(T) = Σ_{t=1}^{T} Σ_{i=1}^{n} [ ⟨u_i(t), v_i*(t)⟩ − ⟨u_i(t), v_i(t)⟩ ]
Bandit setting: we notice that the aggregate reward is a linear function of the matrix of user profiles U_0:
• expected reward r̄(t) = u_0⊤ L(t) v, a function of the vectorized user and item matrices u_0, v and of a matrix L(t) capturing the social evolution
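A hedged sketch of how these two quantities could be evaluated numerically (the array shapes and names are assumptions for illustration):

```python
import numpy as np

def expected_reward(u0_vec, L_t, v_vec):
    """Expected reward r(t) = u0' L(t) v on vectorized profiles and items."""
    return u0_vec @ L_t @ v_vec

def aggregate_regret(U, V_opt, V_rec):
    """R(T) = sum_t sum_i <u_i(t), v*_i(t)> - <u_i(t), v_i(t)>.

    U, V_opt, V_rec : (T, n, d) arrays of profiles, optimal items, recommended items.
    """
    return float(np.sum(np.einsum('tnd,tnd->tn', U, V_opt - V_rec)))
```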

  7. LinREL – Adapting to Recommendations
LinREL:
• arms are selected from a vector space, and the expected reward is a linear function of the arm
• to select an arm we use the Upper Confidence Bound (UCB) principle: a confidence bound on an estimator
• the unknown model is estimated via a least-squares fit, with confidence regions given by L1 or L2 ellipsoids

  8. LinREL – Adapting to Recommendations
In our case:
• arms are the items v, modified by L(t): a non-stationary setting
• the estimator is least-squares:
û_0(t) = arg min_{u ∈ ℝ^{nd}} Σ_{τ=1}^{t−1} ‖X(V(τ), A(τ)) u − r(τ)‖₂²
• recommendations are selected as the solution to the non-convex optimization:
v(t) = arg max_{v ∈ B^(n)} max_{u ∈ C_t} u⊤ L(t) v
• we study the cases C¹, C²: ellipsoids in L1 and L2
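A sketch of the two ingredients above: a least-squares estimate of the vectorized profile and an optimistic UCB-style selection over a finite catalog with an L2 confidence ellipsoid. The ridge regularization, the explicit feature matrices, and the parameter beta are illustrative simplifications, not the exact construction from the paper:

```python
import numpy as np

def least_squares_estimate(X_hist, r_hist, reg=1e-3):
    """argmin_u sum_tau ||X(tau) u - r(tau)||_2^2, with a small ridge term for stability."""
    X = np.vstack(X_hist)          # stacked feature matrices X(V(tau), A(tau))
    r = np.concatenate(r_hist)     # observed ratings
    A = X.T @ X + reg * np.eye(X.shape[1])
    u_hat = np.linalg.solve(A, X.T @ r)
    return u_hat, A

def ucb_select(u_hat, A, L_t, catalog, beta):
    """Pick the catalog item maximizing the optimistic reward over an L2 ellipsoid around u_hat."""
    A_inv = np.linalg.inv(A)
    best, best_score = None, -np.inf
    for v in catalog:
        x = L_t @ v                # non-stationary arm L(t) v
        score = u_hat @ x + np.sqrt(beta * x @ A_inv @ x)
        if score > best_score:
            best, best_score = v, score
    return best
```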

  9. LinREL – Regret Theorem
Assume that, for any 0 < δ < 1:
β_t = max{ 128 n d ln t · ln(t²/δ), (8/3 · ln(t²/δ))² }   (1)
Then, for C_t = C_t²:
Pr[ ∀T, R(T) ≤ n √(8 n d β_T T ln(1 + nT/d)) ] ≥ 1 − δ   (2)
and, for C_t = C_t¹:
Pr[ ∀T, R(T) ≤ 2 n d √(8 β_T T ln(1 + nT/d)) ] ≥ 1 − δ   (3)

  10. LinREL – Computational Issues
For C¹ the optimization can be solved efficiently for two classes of catalogs (see the sketch below):
• if B is a convex set: a convex optimization problem, requiring 2n²d convex problems to be solved
• if B is a finite subset: we can check all |B| items, for a total of 2n²d evaluations
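One way to read the finite-catalog case is to enumerate the signed vertices of the L1 ellipsoid around the estimate and score every catalog item against each vertex; the sketch below follows that reading (the vertex construction and scaling are assumptions, not the paper's exact procedure):

```python
import numpy as np

def l1_vertex_select(u_hat, radius, L_t, catalog):
    """Maximize u' L(t) v over the 2*nd vertices u_hat +/- radius * e_k and a finite catalog."""
    nd = u_hat.shape[0]
    best, best_val = None, -np.inf
    for k in range(nd):
        for sign in (1.0, -1.0):
            u = u_hat.copy()
            u[k] += sign * radius      # one vertex of the L1 ball
            for v in catalog:
                val = u @ (L_t @ v)
                if val > best_val:
                    best, best_val = v, val
    return best
```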

  11. Other Algorithms
Thompson Sampling
• Bayesian interpretation: assumes a prior on u_0
• in each step, samples this vector from the posterior obtained after the feedback has been observed
• computationally efficient
• Bayesian regret of the same order as for LinREL
LinUCB
• similar to LinREL, but does not optimize over an ellipsoid
• non-convex optimization, inefficient
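A hedged sketch of the Thompson Sampling step described above, assuming a Gaussian prior on u_0 and Gaussian rating noise (the prior and noise variances and the feature matrices are illustrative assumptions):

```python
import numpy as np

def thompson_step(X_hist, r_hist, L_t, catalog, prior_var=1.0, noise_var=1.0, rng=None):
    """Sample u_0 from the Gaussian posterior given past feedback, then recommend greedily for that sample."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.vstack(X_hist)
    r = np.concatenate(r_hist)
    precision = X.T @ X / noise_var + np.eye(X.shape[1]) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ r) / noise_var
    u_sample = rng.multivariate_normal(mean, cov)
    scores = [u_sample @ (L_t @ v) for v in catalog]
    return catalog[int(np.argmax(scores))]
```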

  12. Results on Synthetic Datasets
[Plots: (a) Regret, finite set (|B| = 1000); (b) Regret, L2 ball. Regret vs. step (0–100) for RandomBandit, LinREL1, Regression, and ThompsonSampling, each on a finite set and on an L2 ball. Parameters: n = 100, d = 20.]
Synthetic dataset: randomly generated social network, user profiles, and catalog

  13. Results on Real Dataset
[Figure 1: Flixster regret vs. step (0–100) for RandomBanditFiniteSet, LinREL1FiniteSet, RegressionFiniteSet, and ThompsonSamplingFiniteSet. Parameters: n = 206, d = 28, |B| = 100.]
Flixster: filtered dataset
• 1,049,492 users in a social network with 7,058,819 links
• 74,240 movies and 8,196,077 reviews
