

  1. Matrix Factorization and Factorization Machines for Recommender Systems. Chih-Jen Lin, Department of Computer Science, National Taiwan University. Talk at the SDM Workshop on Machine Learning Methods on Recommender Systems, May 2, 2015.

  2. Outline: 1. Matrix factorization; 2. Factorization machines; 3. Conclusions.

  3. In this talk I will briefly discuss two related topics: fast matrix factorization (MF) in shared-memory systems, and factorization machines (FM) for recommender systems and classification/regression. Note that MF is a special case of FM.

  4. Outline (section: Matrix factorization): 1. Matrix factorization (introduction and issues for parallelization; our approach in the package LIBMF); 2. Factorization machines; 3. Conclusions.

  5. Outline (subsection: Introduction and issues for parallelization): 1. Matrix factorization (introduction and issues for parallelization; our approach in the package LIBMF); 2. Factorization machines; 3. Conclusions.

  6. Matrix Factorization. Matrix factorization is an effective method for recommender systems (e.g., the Netflix Prize and KDD Cup 2011), but training is slow. We developed a parallel MF package, LIBMF, for shared-memory systems: http://www.csie.ntu.edu.tw/~cjlin/libmf. It received the best paper award at ACM RecSys 2013.

  7. Matrix Factorization (Cont’d). For recommender systems: a group of users give ratings to some items.

     User   Item   Rating
      1      5      100
      1     10       80
      1     13       30
     ...    ...     ...
      u      v      r_{u,v}
     ...    ...     ...

     This information can be represented by a rating matrix R.

  8. Matrix Factorization (Cont’d). [Figure: the m × n rating matrix R; the entry at row u, column v is r_{u,v}, and the entry at (2,2) is an unknown rating, shown as ?.] Notation: m, n: numbers of users and items; u, v: indices for the u-th user and v-th item; r_{u,v}: the rating the u-th user gives to the v-th item.

  9. Matrix Factorization (Cont’d). [Figure: R (m × n) is approximated by P^T × Q, where P^T is m × k with rows p_1^T, ..., p_m^T and Q is k × n with columns q_1, ..., q_n.] k: number of latent dimensions. Each rating is modeled as r_{u,v} = p_u^T q_v; in particular, the unknown entry is predicted as ?_{2,2} = p_2^T q_2.
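To make the model concrete, here is a minimal NumPy sketch (an illustration, not from the slides) with hypothetical sizes, storing each p_u and q_v as a column of P and Q so that R ≈ P^T Q:

```python
import numpy as np

# Hypothetical sizes: m users, n items, k latent dimensions.
m, n, k = 4, 5, 2
rng = np.random.default_rng(0)
P = rng.normal(size=(k, m))   # column u holds the latent vector p_u
Q = rng.normal(size=(k, n))   # column v holds the latent vector q_v

R_hat = P.T @ Q               # the model's approximation of the m x n matrix R

# A single rating is predicted as r_{u,v} = p_u^T q_v.
u, v = 1, 2
assert np.isclose(P[:, u] @ Q[:, v], R_hat[u, v])
```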

  10. Matrix Factorization (Cont’d). A non-convex optimization problem:

      min_{P,Q} Σ_{(u,v)∈R} [ (r_{u,v} − p_u^T q_v)^2 + λ_P ‖p_u‖_F^2 + λ_Q ‖q_v‖_F^2 ]

      λ_P and λ_Q are regularization parameters. SG (stochastic gradient) is now a popular optimization method for MF. It loops over ratings in the training set.
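As a minimal sketch, this objective can be evaluated by looping over the observed ratings. The function below assumes the ratings are given as (u, v, r_{u,v}) triples and uses the column layout of the previous sketch; the names are hypothetical, not LIBMF's API:

```python
def mf_objective(P, Q, ratings, lam_P, lam_Q):
    """Squared error plus per-rating regularization, summed over (u, v, r) triples."""
    obj = 0.0
    for u, v, r in ratings:
        e = r - P[:, u] @ Q[:, v]            # r_{u,v} - p_u^T q_v
        obj += e * e
        obj += lam_P * (P[:, u] @ P[:, u])   # lambda_P * ||p_u||^2
        obj += lam_Q * (Q[:, v] @ Q[:, v])   # lambda_Q * ||q_v||^2
    return obj
```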

  11. Matrix Factorization (Cont’d). SG update rule:

      p_u ← p_u + γ (e_{u,v} q_v − λ_P p_u),
      q_v ← q_v + γ (e_{u,v} p_u − λ_Q q_v),

      where e_{u,v} ≡ r_{u,v} − p_u^T q_v. SG is inherently sequential.
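The update rule maps directly to code. Here is a minimal sketch of one SG step, continuing the hypothetical layout above; note that both old vectors are read before either is overwritten, since each update uses the other's previous value:

```python
def sg_step(P, Q, u, v, r, gamma, lam_P, lam_Q):
    """One stochastic gradient update on a single sampled rating (u, v, r)."""
    p = P[:, u].copy()     # keep the old p_u and q_v: each update
    q = Q[:, v].copy()     # must use the other's previous value
    e = r - p @ q          # e_{u,v} = r_{u,v} - p_u^T q_v
    P[:, u] = p + gamma * (e * q - lam_P * p)
    Q[:, v] = q + gamma * (e * p - lam_Q * q)
```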

  12. SG for Parallel MF. After r_{3,3} is selected, the ratings in the gray blocks cannot be updated concurrently: updating r_{3,3} = p_3^T q_3 modifies p_3 and q_3, so every rating in row 3 (r_{3,1} = p_3^T q_1, r_{3,2} = p_3^T q_2, ..., r_{3,6} = p_3^T q_6) or in column 3 conflicts with it. But r_{6,6} = p_6^T q_6 shares neither vector, so it can be updated at the same time. [Figure: a 6 × 6 rating matrix with row 3 and column 3 shaded gray.]

  13. SG for Parallel MF (Cont’d). We can split the matrix into blocks, then use threads to update blocks whose ratings do not share any p or q; a sketch of such a split follows. [Figure: a 6 × 6 matrix partitioned into a grid of blocks; blocks on a common diagonal touch disjoint rows and columns.]
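Here is a minimal sketch of the splitting step (an illustration under assumed data structures, not LIBMF's code): ratings are grouped into a T × T grid, and any set of blocks that pairwise share no block row and no block column, e.g. the blocks on a diagonal, can be processed by different threads concurrently:

```python
from collections import defaultdict

def split_into_blocks(ratings, m, n, T):
    """Group (u, v, r) rating triples into a T x T grid of blocks.

    Blocks with different block rows touch disjoint p_u's, and blocks
    with different block columns touch disjoint q_v's, so threads
    working on such blocks never update the same vector.
    """
    blocks = defaultdict(list)
    for u, v, r in ratings:
        bu, bv = u * T // m, v * T // n   # block row, block column
        blocks[(bu, bv)].append((u, v, r))
    return blocks
```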

  14. SG for Parallel MF (Cont’d). This concept of splitting the data into independent blocks seems to work. However, many issues must be handled to get a correct and efficient implementation on the given architecture.

  15. Outline (subsection: Our approach in the package LIBMF): 1. Matrix factorization (introduction and issues for parallelization; our approach in the package LIBMF); 2. Factorization machines; 3. Conclusions.

  16. Our Approach in the Package LIBMF. Parallelization (Zhuang et al., 2013; Chin et al., 2015a): effective block splitting to avoid synchronization time, and a partial random method for the order of SG updates. Adaptive learning rates for SG updates (Chin et al., 2015b). Details are omitted due to time constraints.

  17. Block Splitting and Synchronization. A naive way for T nodes is to split the matrix into T × T blocks. This is used in DSGD (Gemulla et al., 2011) for distributed systems, where the setting is reasonable because communication cost is the main concern: in distributed systems, it is difficult to move data or the model.

  18. Block Splitting and Synchronization (Cont’d). However, for shared-memory systems, synchronization is a concern. Example: we have 3 threads, and the three diagonal blocks take 20s, 10s, and 20s to process.

      Thread   0–10s   10–20s
      1        Busy    Busy
      2        Busy    Idle
      3        Busy    Busy

      Thread 2 finishes its 10s block and then sits idle while the others run for another 10s: 10s wasted!

  19. Lock-Free Scheduling. We split the matrix into enough blocks. For example, with two threads, we split the matrix into 4 × 4 blocks. [Figure: a 4 × 4 grid of blocks, each with an update counter initialized to 0.] The counter records the number of times each block has been updated.

  20. Lock-Free Scheduling (Cont’d). First, T1 selects a block randomly. T2 must then select a block that is neither green nor gray. [Figure: the 4 × 4 counter grid; T1's block is green, and the blocks sharing its row or column are gray.]

  21. Lock-Free Scheduling (Cont’d). T2 randomly selects a block that is neither green nor gray. [Figure: the counter grid with T1 and T2 each occupying a block that shares no row or column with the other.]

  22. Lock-Free Scheduling (Cont’d). After T1 finishes, the counter of the corresponding block is increased by one. [Figure: the grid with that block's counter now 1; T2 is still running.]

  23. Lock-Free Scheduling (Cont’d). T1 can then select any available block to update. Rule: select the one that has been updated the least. A sketch of this scheduler follows. [Figure: T1 chooses among the blocks that do not conflict with T2's block.]
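The scheduling rule from slides 19–23 could be implemented roughly as follows. This is a simplified sketch under stated assumptions (a B × B grid and one short-lived lock around the tiny bookkeeping state), not LIBMF's actual implementation; the point is that the expensive SG updates themselves run without any lock:

```python
import threading

class Scheduler:
    """Hypothetical sketch: hand out non-conflicting, least-updated blocks."""

    def __init__(self, B):
        self.B = B
        self.count = [[0] * B for _ in range(B)]    # per-block update counters
        self.busy_rows, self.busy_cols = set(), set()
        self.mutex = threading.Lock()

    def grab(self):
        """Return a free (row, col) block, preferring the least updated."""
        with self.mutex:
            free = [(self.count[i][j], i, j)
                    for i in range(self.B) for j in range(self.B)
                    if i not in self.busy_rows and j not in self.busy_cols]
            if not free:
                return None
            _, i, j = min(free)                     # least-updated free block
            self.busy_rows.add(i)
            self.busy_cols.add(j)
            return i, j

    def release(self, i, j):
        """Mark the block done: bump its counter and free its row/column."""
        with self.mutex:
            self.count[i][j] += 1
            self.busy_rows.discard(i)
            self.busy_cols.discard(j)
```

Each worker thread would then loop: grab a block, run SG over that block's ratings, and release it.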

  24. Lock-Free Scheduling (Cont’d). SG: lock-free scheduling; SG**: DSGD-like scheduling. [Figure: RMSE versus training time on MovieLens 10M (left) and Yahoo!Music (right); SG reaches low RMSE faster than SG** on both.] Time to reach the target RMSE drops from 18.71s to 9.72s on MovieLens 10M (RMSE 0.835) and from 728.23s to 462.55s on Yahoo!Music (RMSE 21.985).

  25. Memory Discontinuity. Discontinuous memory access can dramatically increase the training time. For SG, two possible update orders are:

      Update order   Advantages          Disadvantages
      Random         Faster and stable   Memory discontinuity
      Sequential     Memory continuity   Not stable

      [Figure: sequential versus random access patterns over R.] Our lock-free scheduling gives randomness, but the resulting code may not be cache friendly.

  26. Partial Random Method. Our solution: within each block, access both R̂ and P̂ continuously; a sketch follows. [Figure: one block of R̂ is the product of P̂^T and Q̂; the ratings in the block are visited in sequence.] Partial: sequential access within each block. Random: the next block is selected at random.
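A minimal sketch of this ordering, reusing the hypothetical split_into_blocks from the earlier sketch: the next block is chosen at random, but the ratings inside each block are visited in row-major order, so accesses within a block stay contiguous:

```python
import random

def partial_random_order(blocks):
    """Yield ratings: random across blocks, sequential within each block."""
    keys = list(blocks)
    random.shuffle(keys)                       # "random": block order
    for key in keys:
        for u, v, r in sorted(blocks[key]):    # "partial": row-major in a block
            yield u, v, r
```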

  27. Partial Random Method (Cont’d). [Figure: RMSE versus training time on MovieLens 10M (left) and Yahoo!Music (right), comparing the Random and Partial Random orders.] The Partial Random method performs better than the Random method.
