

  1. Neighborhood-Enhanced Transfer Learning for One-Class Collaborative Filtering
  Wanling Cai 1,2, Jiongbin Zheng 1, Weike Pan 1*, Jing Lin 1, Lin Li 1, Li Chen 2, Xiaogang Peng 1* and Zhong Ming 1*
  cswlcai@comp.hkbu.edu.hk, jiongbin92@gmail.com, panweike@szu.edu.cn, linjing4@email.szu.edu.cn, lilin20171@email.szu.edu.cn, lichen@comp.hkbu.edu.hk, pengxg@szu.edu.cn, mingz@szu.edu.cn
  1 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  2 Department of Computer Science, Hong Kong Baptist University, Hong Kong, China

  2. Introduction: Problem Definition
  One-Class Collaborative Filtering
  Input: a set of (user, item) pairs P = {(u, i)}, where each (u, i) pair means that user u has given positive feedback on item i.
  Goal: recommend to each user u ∈ U a personalized ranked list of items drawn from the set of unobserved items, i.e., I \ P_u.

  3. Introduction: Challenges
  1. The sparsity of the observed feedback.
  2. The ambiguity of the unobserved feedback.

  4. Introduction: Overview of Our Solution
  Figure: Illustration of our transfer learning solution.
  Transfer by Neighborhood-Enhanced Factorization (TNF):
  We first extract the local knowledge of neighborhood information among users.
  We then transfer it to a global preference learning task in an enhanced factorization-based framework.

  5. Introduction: Advantages of Our Solution
  Our TNF inherits the merits of both the localized neighborhood-based methods and the globalized factorization-based methods. Notice that neighborhood-based methods and factorization-based methods are rarely studied in a single framework or solution for OCCF.
  The factored representation of users and items allows TNF to capture and model transitive relations within a group of close neighbors on datasets of low density.

  6. Introduction: Notations
  n: number of users
  m: number of items
  u ∈ U: user ID
  i, i′ ∈ I: item ID
  R = {(u, i)}: universe of all possible (user, item) pairs
  P = {(u, i)}: the whole set of observed (user, item) pairs
  A, |A| = ρ|P|: a sampled set of negative feedback from R \ P
  I_u: item set observed by user u
  d: number of latent dimensions
  b_u ∈ R: user bias
  b_i ∈ R: item bias
  V_{i·} ∈ R^{1×d}: item-specific latent feature vector
  X_{u′·} ∈ R^{1×d}: user-specific latent feature vector
  N_u: the set of nearest neighbors of user u
  r̂_ui: predicted preference of user u to item i
  α_x, α_v, β_u, β_v: trade-off parameters on the regularization terms
  γ: learning rate
  T: iteration number in the algorithm

  7. Method: Neighborhood Construction
  In order to extract the local knowledge from the records of users' behaviors, we first calculate the cosine similarity between user u and user w,

      s_uw = |I_u ∩ I_w| / (√|I_u| √|I_w|),

  where |I_u|, |I_w| and |I_u ∩ I_w| denote the number of items observed by user u, by user w, and by both user u and user w, respectively. We can then take the most similar users of each user u to construct a neighborhood N_u.
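
  To make the neighborhood-construction step concrete, here is a minimal Python sketch (not the authors' released code); the container item_sets, the helper name build_neighborhoods, and the default neighborhood size k = 20 are illustrative assumptions.

```python
import heapq

def build_neighborhoods(item_sets, k=20):
    """Construct N_u: the k most similar users for each user, using
    cosine similarity over binary (implicit) feedback,
    s_uw = |I_u ∩ I_w| / (sqrt(|I_u|) * sqrt(|I_w|))."""
    users = list(item_sets.keys())
    neighborhoods = {}
    for u in users:
        sims = []
        for w in users:
            if w == u or not item_sets[w]:
                continue
            overlap = len(item_sets[u] & item_sets[w])
            if overlap == 0:
                continue
            s_uw = overlap / ((len(item_sets[u]) ** 0.5) * (len(item_sets[w]) ** 0.5))
            sims.append((s_uw, w))
        # keep the k most similar users as the neighborhood N_u
        neighborhoods[u] = [w for _, w in heapq.nlargest(k, sims)]
    return neighborhoods

# item_sets: {user_id: set(item_ids)} built from the observed pairs P,
# e.g. item_sets = {0: {1, 2, 5}, 1: {2, 5}, 2: {7}}
```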

  8. Method: Assumption
  We assume that the knowledge of the neighborhood extracted from local associations can be incorporated into a global factorization framework so as to better capture the latent representations. This process is analogous to human learning: learners with intense concentration digest knowledge locally but effectively, while others who keep the big picture in mind are good at building correlations between different domains or tasks. Learners who can exploit a key combination of local and global cues are likely to achieve more.

  9. Method: Transfer by Neighborhood-Enhanced Factorization
  Specifically, a recent work [Guo et al., 2017] inspires us to aggregate the preferences of like-minded users. Finally, we have the estimated preference of user u to item i as follows,

      r̂_ui = b_u + b_i + (1/√|N_u|) Σ_{u′∈N_u} X_{u′·} V_{i·}^T.   (1)

  In this way, the local knowledge of the neighborhood can be transferred into the factorization-based method. For this reason, we call it transfer by neighborhood-enhanced factorization (TNF). Notice that a closely related work, FISM [Kabbur et al., 2013], focuses on learning a factored item similarity by incorporating the knowledge of the items that have been observed by user u (i.e., I_u).
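
  A small sketch of the prediction rule in Eq. (1), assuming X and V are dense NumPy matrices of shapes n × d and m × d and that b_u and b_i are bias vectors; the function name and argument layout are ours, not the paper's.

```python
import numpy as np

def predict(u, i, b_u, b_i, X, V, neighborhoods):
    """Eq. (1): r_hat_ui = b_u[u] + b_i[i]
    + (1 / sqrt(|N_u|)) * sum_{u' in N_u} X[u'] . V[i]"""
    N_u = neighborhoods[u]
    if not N_u:
        return b_u[u] + b_i[i]
    U_bar = X[N_u].sum(axis=0) / np.sqrt(len(N_u))  # virtual user vector
    return b_u[u] + b_i[i] + U_bar @ V[i]
```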

  10. Method: Pointwise Preference Learning
  In our TNF, we adopt pointwise preference learning as our preference learning paradigm. The objective function is as follows,

      min_Θ Σ_{(u,i)∈P∪A} f_ui + R(Θ),   (2)

  where f_ui = log(1 + exp(−r_ui r̂_ui)) is a loss function defined on a (u, i) pair, and Θ = {X_{u′·}, V_{i·}, b_u, b_i; i = 1, ..., m, u, u′ = 1, ..., n} denotes the set of model parameters to be learned. Notice that we use r_ui = 1 and r_ui = −1 to denote positive and negative preference for an observed pair (u, i) ∈ P and a sampled unobserved pair (u, i) ∈ A, respectively. In addition, we introduce the regularization term

      R(Θ) = (α_x/2) Σ_{u′∈N_u} ||X_{u′·}||²_F + (α_v/2) ||V_{i·}||²_F + (β_u/2) b_u² + (β_v/2) b_i²

  so that it contributes to avoiding overfitting, where α_x, α_v, β_u and β_v are trade-off hyperparameters.
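
  The per-pair objective can be written down directly; the sketch below assumes the 1/2 factors on the regularization terms (chosen so that the gradients on the next slide follow) and uses illustrative argument names.

```python
import numpy as np

def pointwise_loss(r_ui, r_hat_ui, X_neighbors, V_i, b_u_val, b_i_val,
                   alpha_x, alpha_v, beta_u, beta_v):
    """f_ui + R(Theta) for one (u, i) pair; r_ui is +1 for (u, i) in P
    and -1 for the sampled negative pairs in A."""
    f_ui = np.log(1.0 + np.exp(-r_ui * r_hat_ui))        # logistic loss on the pair
    reg = (alpha_x / 2.0) * np.sum(X_neighbors ** 2) \
        + (alpha_v / 2.0) * np.sum(V_i ** 2) \
        + (beta_u / 2.0) * b_u_val ** 2 \
        + (beta_v / 2.0) * b_i_val ** 2
    return f_ui + reg
```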

  11. Method: Gradients
  In order to solve the optimization problem in Eq. (2), we adopt the commonly used stochastic gradient descent (SGD) algorithm. Specifically, for each (u, i) ∈ P ∪ A, we have the gradients,

      ∇X_{u′·} = ∂f_ui/∂X_{u′·} = −e_ui (1/√|N_u|) V_{i·} + α_x X_{u′·}, u′ ∈ N_u,   (3)
      ∇V_{i·} = ∂f_ui/∂V_{i·} = −e_ui (1/√|N_u|) Σ_{u′∈N_u} X_{u′·} + α_v V_{i·},   (4)
      ∇b_u = ∂f_ui/∂b_u = −e_ui + β_u b_u,   (5)
      ∇b_i = ∂f_ui/∂b_i = −e_ui + β_v b_i,   (6)

  where e_ui = r_ui / (1 + exp(r_ui r̂_ui)), and Ū_u· = (1/√|N_u|) Σ_{u′∈N_u} X_{u′·} is a virtual user-specific latent feature vector of user u aggregated from the set of user neighbors N_u.
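
  A sketch of the gradient computation in Eqs. (3)-(6) for a single (u, i) pair; X_neighbors is assumed to be the |N_u| × d block of X indexed by N_u, and the function signature is illustrative.

```python
import numpy as np

def gradients(r_ui, r_hat_ui, X_neighbors, V_i, b_u_val, b_i_val,
              alpha_x, alpha_v, beta_u, beta_v):
    """Eqs. (3)-(6): gradients for one (u, i) pair."""
    e_ui = r_ui / (1.0 + np.exp(r_ui * r_hat_ui))
    c = 1.0 / np.sqrt(len(X_neighbors))                  # 1 / sqrt(|N_u|)
    grad_X = -e_ui * c * V_i + alpha_x * X_neighbors     # Eq. (3), one row per u' in N_u
    grad_V = -e_ui * c * X_neighbors.sum(axis=0) + alpha_v * V_i   # Eq. (4)
    grad_bu = -e_ui + beta_u * b_u_val                   # Eq. (5)
    grad_bi = -e_ui + beta_v * b_i_val                   # Eq. (6)
    return grad_X, grad_V, grad_bu, grad_bi
```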

  12. Method: Update Rules
  For each (u, i) ∈ P ∪ A, we have the update rules,

      X_{u′·} = X_{u′·} − γ ∇X_{u′·}, u′ ∈ N_u,   (7)
      V_{i·} = V_{i·} − γ ∇V_{i·},   (8)
      b_u = b_u − γ ∇b_u,   (9)
      b_i = b_i − γ ∇b_i,   (10)

  where γ > 0 is the learning rate.

  13. Method: Algorithm
  1: Input: observed pairs P
  2: Output: recommended items for each user
  3: Initialize the model parameters Θ
  4: Construct a neighborhood N_u for each user u
  5: for t1 = 1, ..., T do
  6:   Randomly pick a set A with |A| = ρ|P|
  7:   for t2 = 1, 2, ..., |P ∪ A| do
  8:     Randomly draw a (u, i) pair from P ∪ A
  9:     Calculate Ū_u· = (1/√|N_u|) Σ_{u′∈N_u} X_{u′·}
  10:    Calculate r̂_ui = b_u + b_i + Ū_u· V_{i·}^T
  11:    Calculate e_ui = r_ui / (1 + exp(r_ui r̂_ui))
  12:    Update b_u, b_i, V_{i·}, and X_{u′·} for u′ ∈ N_u
  13:   end for
  14: end for
  Notes: randomly drawing a (u, i) pair from P ∪ A is more efficient than the user-wise sampling strategy in [Pan and Chen, 2013].
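
  Putting the pieces together, below is a hedged end-to-end sketch of this training loop; the initialization scale, the rejection sampling used to draw A, and the default hyperparameter values are illustrative assumptions rather than the reported settings.

```python
import random
import numpy as np

def train_tnf(P, n, m, neighborhoods, d=20, gamma=0.01, rho=3, T=100,
              alpha_x=0.01, alpha_v=0.01, beta_u=0.01, beta_v=0.01, seed=0):
    """Sketch of the TNF training procedure; P is a list of observed (u, i)
    pairs with integer IDs, neighborhoods maps each user u to the list N_u."""
    rng = random.Random(seed)
    np.random.seed(seed)
    X = 0.01 * np.random.randn(n, d)   # user-specific latent vectors
    V = 0.01 * np.random.randn(m, d)   # item-specific latent vectors
    b_u = np.zeros(n)
    b_i = np.zeros(m)
    P = list(P)
    P_set = set(P)
    for _ in range(T):
        # sample a set A of unobserved pairs as negative feedback, |A| = rho * |P|
        A = []
        while len(A) < rho * len(P):
            u, i = rng.randrange(n), rng.randrange(m)
            if (u, i) not in P_set:
                A.append((u, i))
        pairs = [(u, i, 1.0) for (u, i) in P] + [(u, i, -1.0) for (u, i) in A]
        rng.shuffle(pairs)
        for u, i, r_ui in pairs:
            N_u = neighborhoods[u]
            if not N_u:
                continue
            c = 1.0 / np.sqrt(len(N_u))
            U_bar = c * X[N_u].sum(axis=0)                 # aggregated neighbor vector
            r_hat = b_u[u] + b_i[i] + U_bar @ V[i]         # Eq. (1)
            e_ui = r_ui / (1.0 + np.exp(r_ui * r_hat))
            # SGD updates, Eqs. (7)-(10)
            X[N_u] -= gamma * (-e_ui * c * V[i] + alpha_x * X[N_u])
            V[i] -= gamma * (-e_ui * U_bar + alpha_v * V[i])
            b_u[u] -= gamma * (-e_ui + beta_u * b_u[u])
            b_i[i] -= gamma * (-e_ui + beta_v * b_i[i])
    return X, V, b_u, b_i
```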

  14. Experiments: Datasets
  Table: Statistics of the datasets used in the experiments, including the number of users (n), the number of items (m), the number of (user, item) pairs in the training data (|P|), the number of (user, item) pairs in the test data (|P^te|), and the density of each dataset, i.e., (|P| + |P^te|) / n / m.

  Dataset      | n     | m     | |P|     | |P^te|  | (|P| + |P^te|) / n / m
  ML100K       | 943   | 1,682 | 27,688  | 27,687  | 3.49%
  ML1M         | 6,040 | 3,952 | 287,641 | 287,640 | 2.41%
  UserTag      | 3,000 | 2,000 | 123,218 | 123,218 | 4.11%
  Netflix5K5K  | 5,000 | 5,000 | 77,936  | 77,936  | 0.62%
  XING5K5K     | 5,000 | 5,000 | 39,197  | 39,197  | 0.31%

  Notice that the datasets and code are publicly available at http://csse.szu.edu.cn/staff/panwk/publications/TNF/

  15. Experiments: Baselines
  UCF: user-oriented collaborative filtering [Aggarwal et al., 1999]
  MF: matrix factorization with square loss [Koren et al., 2009]
  BPR: Bayesian personalized ranking [Rendle et al., 2009]
  FISM: factored item similarity model [Kabbur et al., 2013]
  NeuMF: neural matrix factorization [He et al., 2017]

  16. Experiments: Parameter Configurations (1/2)
  For UCF and TNF, we use cosine similarity and set the size of the neighborhood to 20.
  For BPR, FISM and our TNF, we adopt the commonly used stochastic gradient descent (SGD) method with the same sampling strategy for a fair comparison, and we fix the dimension as d = 20 and the learning rate as γ = 0.01.
  For FISM and TNF, we set ρ = 3, i.e., |A| = 3|P|.
  For the deep model NeuMF, we implement the method using TensorFlow (https://www.tensorflow.org/) and keep the structure with the best performance as reported in [He et al., 2017].
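
  For convenience, the settings stated on this slide can be gathered into one configuration; the dict below is only a summary sketch, and hyperparameters not mentioned here (e.g., the regularization trade-offs) are deliberately left out.

```python
# Settings reported on this slide for TNF (d, gamma and the sampler are shared
# with the factorization baselines; the neighborhood settings are shared with UCF)
tnf_config = {
    "similarity": "cosine",    # user-user similarity for neighborhood construction
    "neighborhood_size": 20,   # size of N_u
    "d": 20,                   # number of latent dimensions
    "gamma": 0.01,             # SGD learning rate
    "rho": 3,                  # negative-sampling ratio, |A| = 3|P|
}
```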
