

  1. On the Effectiveness of Linear Models for One-Class Collaborative Filtering. Suvash Sedhain (1,2), Aditya Menon (2,1), Scott Sanner (3,1), Darius Braziunas (4). Affiliations: (1) Australian National University, (2) NICTA, (3) Oregon State University, (4) Rakuten Kobo Inc.

  2. Recommender Systems
  • Recommender systems
  – Objective: present personalized items to users
  • Collaborative filtering
  – The de facto method for multi-user recommender systems
  – Find people like you and leverage their preferences
  – One-class: we only observe positive feedback

  3. Sneak Peek: Model Proposal
  • Personalized, user-focused linear model
  • Convex
  • Embarrassingly parallel
  – Each user's model is trained individually

  4. State-of-the-art Collaborative Filtering • Neighborhood methods • Matrix Factorization • SLIM (Sparse Linear Method)

  5. Nearest Neighbors: A Matrix View
  [Figure: matrix view of neighborhood recommendation, R × S = R̂, where S is an item-item similarity matrix]
  • {Jaccard, Cosine} similarity S used in practice
  • Keep only the top k similarities
  • Simple, but learning is limited
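A minimal sketch of this matrix view (my own illustration, not from the slides), assuming a binary user-item matrix R and cosine similarity between item columns:

```python
import numpy as np

def item_knn_scores(R, k=50):
    """Score items as R @ S, where S keeps only the top-k cosine
    similarities between the item columns of the binary matrix R."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-12
    S = (R.T @ R) / (norms.T * norms)        # item-item cosine similarities
    np.fill_diagonal(S, 0.0)                 # no self-similarity
    drop = np.argsort(S, axis=1)[:, :-k]     # indices of all but the top-k per row
    np.put_along_axis(S, drop, 0.0, axis=1)  # keep only the top-k similarities
    return R @ S                             # higher score = stronger recommendation
```

As the slide notes, S is fixed by the chosen metric; nothing here is learned from data.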

  6. Factorization Model: (Weighted) Matrix Factorization
  [Figure: the m × n interaction matrix is approximated by an m × k user projection times a k × n item projection]
  • Works well in general, but non-convex!
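For reference, a common weighted MF objective in the style of WRMF (notation mine, not from the slides; the weights c_ui upweight observed positives):

```latex
\min_{U \in \mathbb{R}^{m \times k},\; V \in \mathbb{R}^{k \times n}}
\;\sum_{u,i} c_{ui} \left( R_{ui} - (UV)_{ui} \right)^2
\;+\; \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)
```

The product UV makes the objective non-convex in (U, V) jointly, which is the drawback flagged on the slide.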

  7. SLIM
  [Figure: matrix view of SLIM, R × W = R̂, where W is a learned item-item weight matrix]
  • Effectively trying to learn item-to-item similarities
  • Not user-focused; complicated optimization
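For reference, SLIM's objective as introduced by Ning and Karypis (2011), with W an n × n item-item weight matrix:

```latex
\min_{W}\;\; \tfrac{1}{2} \lVert R - R W \rVert_F^2
\;+\; \tfrac{\beta}{2} \lVert W \rVert_F^2
\;+\; \lambda \lVert W \rVert_1
\quad \text{s.t.}\;\; W \ge 0,\;\; \operatorname{diag}(W) = 0
```

The elastic-net penalty plus the non-negativity and zero-diagonal constraints are what make the optimization complicated relative to a plain L2-regularized problem.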

  8. Recommender Systems Desiderata
  • Learning-based
  • Convex objective
  • User-focused
  • Parallelizable

  9. Comparison of recommendation methods for OC-CF

  10. Outline • Problem statement • Background • LRec Model • Experiments • Results • Summary

  11. LRec: Recommendation for a Target User
  [Figure: matrix view of LRec, W × R = R̂, with w_u1 the model learned for user u1]
  • Each item is a training instance
  • Can be interpreted as learning user-user affinities
  • The regularizer prevents the trivial solution
  • Any loss function (e.g. squared, logistic); learning a model per user (objective sketched below)
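Reconstructing the per-user objective from this description (each item i is one training instance whose feature vector is the i-th column R_{:i}, labeled by user u's feedback on item i):

```latex
\min_{\mathbf{w}_u \in \mathbb{R}^m}\;
\sum_{i=1}^{n} \ell\!\left( R_{ui},\; \mathbf{w}_u^\top R_{:i} \right)
\;+\; \lambda \lVert \mathbf{w}_u \rVert_2^2
```

Stacking the learned w_u as rows of W gives scores R̂ = W R. Since user u's own row of R appears in every feature vector, the λ term is what rules out the trivial solution of simply copying it (w_u = e_u).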

  12. Properties of LRec
  • User-focused
  – Recommendation as learning a model per user
  • Convex objective
  – Guarantees an optimal solution for the formulation
  • Embarrassingly parallel
  – Each model is completely independent of the others (see the sketch below)
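A minimal sketch of this per-user training loop (my own, assuming a binary interaction matrix R; it uses the logistic-loss variant via scikit-learn's Liblinear solver, matching the deck's later mention of Liblinear):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lrec_fit(R, lam=1.0):
    """Fit one L2-regularized logistic model per user.

    R : (m users x n items) binary interaction matrix.
    Returns W (m x m); row u holds user u's learned affinities
    to every user, and recommendation scores are W @ R.
    """
    m, n = R.shape
    X = R.T                      # each item (a column of R) is one training instance
    W = np.zeros((m, m))
    for u in range(m):           # independent problems: trivially parallelizable
        y = R[u]                 # labels: did user u consume each item?
        if y.min() == y.max():
            continue             # skip users with no positives (or no negatives)
        clf = LogisticRegression(C=1.0 / lam, solver="liblinear")
        clf.fit(X, y)
        W[u] = clf.coef_[0]      # note: R[u] is itself one of the features;
                                 # the L2 penalty keeps w_u off the trivial e_u
    return W
```

Because every iteration reads shared data and writes only its own row of W, the loop can be farmed out with joblib or multiprocessing with no communication between workers, which is what "embarrassingly parallel" means here.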

  13. Relationship with Existing Models
  SLIM:
  – Item-focused
  – Elastic-net penalty + non-negativity constraints
  – Optimization: coordinate descent
  – Levy et al. relaxed the non-negativity constraints; optimization via SGD with truncated gradient
  LRec:
  – User-focused
  – L2 penalty
  – Optimization: L2 loss, or logistic loss via Liblinear (dual formulation iff #users >> #items)

  14. Relationship with Existing Models
  LRec:
  • Learns the weight matrix by solving a classification/regression problem
  – Can be interpreted as learning user-user similarities
  Neighborhood models:
  • Compute similarities using predefined similarity metrics (e.g. Cosine, Jaccard)

  15. Relationship with Existing Models
  LRec:
  • Learns the weight matrix via a classification/regression problem; can be interpreted as learning user-user similarities
  • Recommendation: R̂ = W R, where W is the learned user-user weight matrix
  • Convex objective
  • Full rank
  • Embarrassingly parallel
  Matrix Factorization:
  • Recommendation: R̂ = U V, a rank-k factorization
  • Non-convex objective
  • Low rank
  • Parallelism via distributed communication

  16. Other Advantages of LRec
  • Efficient hyper-parameter tuning for ranking
  – Validate on a small subset of users
  • The model can be fine-tuned per user

  17. Other Advantages of LRec: Incorporating Side Information
  [Figure: the interaction matrix is augmented with item features such as genre and actors before learning]
  • Can easily incorporate abundant item-side information (one way to do this is sketched below)
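A sketch of one natural way to realize this (an assumption on my part, not necessarily the paper's exact construction), given an item feature matrix F (items × d, e.g. genre and actor indicators): extend each item's LRec feature vector, its column of R, with the item's metadata, so every per-user model also learns weights over item features.

```python
import numpy as np

def lrec_item_features(R, F):
    """Augment LRec's per-item features with item side information.

    R : (m users x n items) binary interaction matrix.
    F : (n items x d) item feature matrix (genres, actors, ...).
    Returns X of shape (n, m + d): row i is item i's column of R
    followed by its metadata, ready for the per-user training loop.
    """
    assert R.shape[1] == F.shape[0], "need one feature row per item"
    return np.hstack([R.T, F])
```

The per-user models are then trained on X exactly as before; each w_u simply gains d extra weights over the item features.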

  18. Outline • Problem statement • Background • LRec Model • Experiment & Results • Summary

  19. Dataset Description and Evaluation
  • Datasets: Movielens 1M (ML1M), Kobo, Last FM (LASTFM), Million Song Dataset (MSD)
  • Protocol: 10 random train-test splits, 80%-20%; for MSD, we evaluate on 500 random users
  • Evaluation metrics: precision@k, mean Average Precision@100
  • Error bars show 95% confidence intervals

  20. Experiment Setup
  • Baselines
  – Most Popular
  – Neighborhood: User KNN (U-KNN), Item KNN (I-KNN)
  – Matrix factorization: PureSVD, WRMF, LogisticMF, Bayesian Personalized Ranking (BPR)
  – SLIM
  • LRec variants
  – Elastic-net LRec + non-negativity (LRec + Sq + L1 + NN)
  – Squared-loss LRec (LRec + Sq)
  – Logistic-loss LRec (LRec)

  21. Results
  [Chart: "Did not finish" marks methods that did not complete]

  22. Results
  [Chart: Precision@20 on the ML1M and LASTFM datasets]

  23. Results
  [Chart: Precision@20 on the Kobo and LASTFM datasets; "Did not finish" marks methods that did not complete]

  24. Performance Evaluation
  [Chart: users segmented by their number of observations; % improvement over WRMF on the ML1M dataset]

  25. Case Study
  [Chart: recommendations from WRMF vs. LRec]
  • LRec's recommendations are more personalized

  26. Summary
  • LRec
  – Personalized, user-focused linear recommender
  – Convex objective
  – Embarrassingly parallel
  • Future work
  – Further scaling LRec: computation and memory footprint

  27. Thanks
