
PAT: Preference-Aware Transfer Learning for Recommendation with Heterogeneous Feedback



  1. PAT: Preference-Aware Transfer Learning for Recommendation with Heterogeneous Feedback
Feng Liang a,b,c, Wei Dai a,b,c, Yunfeng Huang a,b,c, Weike Pan a,b,c,*, Zhong Ming a,b,c,*
{liangfeng2018, daiwei20171, huangyunfeng2017}@email.szu.edu.cn, {panweike, mingz}@szu.edu.cn
a National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China

  2. Introduction: Problem Definition
Rating Prediction with Users' Heterogeneous Explicit Feedback
Input: a set of grade score records R = {(u, i, r_ui)} with r_ui ∈ G a grade score such as {0.5, 1, ..., 5}, and a set of binary rating records R̃ = {(u, i, r̃_ui)} with r̃_ui ∈ B = {like, dislike}.
Goal: estimate the grade score of each (user, item) pair in the test data TE.
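A tiny toy example of the two input record sets (the values below are made up for illustration and are not from the paper's datasets):

```python
# Grade score records R: (user, item, grade score), with scores in G = {0.5, 1, ..., 5}.
R = [(1, 10, 4.5), (1, 12, 2.0), (2, 10, 5.0)]
# Binary rating records R_tilde: (user, item, rating), with ratings in B = {"like", "dislike"}.
R_tilde = [(1, 11, "like"), (2, 13, "dislike"), (2, 12, "like")]
```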

  3. Introduction: Motivation
1. TMF exploits the implicit preference context of users from the auxiliary binary data in grade score prediction, but the implicit preference context of users in the target data is not exploited.
2. SVD++ and MF-MPC only exploit the preference context in the target data to model users' personalized preferences, and do not consider the binary ratings in the auxiliary data as TMF does.

  4. Introduction: Our Contributions
1. In order to share knowledge between the two different types of data more fully, we address the problem from a transfer learning perspective, i.e., taking the grade scores as the target data and the binary ratings as the auxiliary data.
2. Besides the observed explicit feedback of grade scores and binary ratings, we propose to exploit the implicit preference context beneath the feedback, which is incorporated into the prediction of users' grade scores for items.
3. We conduct extensive empirical studies on two large public datasets and find that our PAT performs significantly better than the state-of-the-art methods.

  5. Introduction: Notations (1/3)
Table: Some notations and their explanations.
  n: number of users
  m: number of items
  u, u' ∈ {1, 2, ..., n}: user ID
  i, j ∈ {1, 2, ..., m}: item ID
  G = {0.5, 1, ..., 5}: grade score range
  B = {like, dislike}: binary rating range
  r_ui ∈ G: grade score of user u to item i
  r̃_ui ∈ B: binary rating of user u to item i
  R = {(u, i, r_ui)}: grade score records (training data)
  R̃ = {(u, i, r̃_ui)}: binary rating records (training data)
  p = |R|: number of grade scores (training data)
  p̃ = |R̃|: number of binary ratings (training data)
  I^g_u, g ∈ G: items rated by user u with score g (training data)
  P_u: items liked (w/ positive feedback) by user u (training data)
  N_u: items disliked (w/ negative feedback) by user u (training data)
  TE = {(u, i, r_ui)}: grade score records in test data

  6. Introduction: Notations (2/3)
Table: Some notations and their explanations (cont.).
  µ ∈ R: global average rating value
  b_u ∈ R: user bias
  b_i ∈ R: item bias
  d: number of latent dimensions
  U_u·, W_u· ∈ R^{1×d}: user-specific latent feature vector
  U, W ∈ R^{n×d}: user-specific latent feature matrix
  V_i·, C^p_j·, C^n_j·, C^o_i'·, C^g_i'· ∈ R^{1×d}: item-specific latent feature vector
  V, C^p, C^n, C^o, C^g ∈ R^{m×d}: item-specific latent feature matrix
  \hat{r}_{ui}: predicted grade score of user u to item i
  \hat{\tilde{r}}_{ui}: predicted binary rating of user u to item i

  7. Introduction: Notations (3/3)
Table: Some notations and their explanations (cont.).
  γ: learning rate
  ρ: interaction weight between grade scores and binary ratings
  α: tradeoff parameter on the corresponding regularization terms
  δ_O, δ_G, δ_p, δ_n ∈ {0, 1}: indicator variables for the four preference contexts
  w_p, w_n: weights on positive and negative feedback
  T: iteration number in the algorithm

  8. Related Work (1/2)
Probabilistic matrix factorization (PMF) [Salakhutdinov and Mnih, 2008] is a dominant recommendation model that takes the explicit grade score matrix as input and outputs the learned low-rank feature vectors of users and items.
Transfer by collective factorization (TCF) [Pan and Yang, 2013] models users' personalized preferences from both grade scores and binary ratings collectively by sharing users' features and items' features. Notice that when the auxiliary binary ratings are not considered, TCF reduces to PMF.
Interaction-rich transfer by collective factorization (iTCF) [Pan and Ming, 2014] is built on CMF [Singh and Gordon, 2008] and exploits the rich interactions among the user-specific latent features of the target data and the auxiliary data when calculating the gradients of the items in the model training stage. Notice that when these rich interactions are not exploited, iTCF reduces to CMF.
Transfer by mixed factorization (TMF) [Pan et al., 2016] combines the feature vectors learned from two different types of data in a collective and integrative manner. Notice that when the like/dislike feedback of users to items is not considered in grade score prediction, TMF becomes iTCF.

  9. Related Work (2/2)
In SVD++ [Koren, 2008], a user's estimated score for an item is related to the other items that the user rated in the past, which are called the preference context of the user. There is no distinction among these rated items: whatever scores they were assigned, they are placed in the same set, i.e., their effects are treated as a single class, which is a typical example of one-class preference context (OPC). When predicting an unobserved score, OPC provides a global preference context for the user.
In MF-MPC [Pan and Ming, 2017], on the other hand, a given user's rated items other than the target one, i.e., the preference context, are grouped into several classes according to their grade scores, which is named multi-class preference context (MPC). Intuitively, MPC is an advanced version of OPC that not only offers the global preference information of users, but also distinguishes the information carried by different score values.

  10. Method: Collective Matrix Factorization (CMF)
In order to jointly model two different types of explicit feedback, i.e., r_ui and r̃_ui, a state-of-the-art method was proposed to approximate the grade score and the binary rating simultaneously by sharing some latent variables [Singh and Gordon, 2008]:

\hat{r}_{ui} = U_{u\cdot} V_{i\cdot}^\top + b_u + b_i + \mu, \qquad \hat{\tilde{r}}_{ui} = W_{u\cdot} V_{i\cdot}^\top   (1)

where the item-specific latent feature vector V_i· is shared between the two factorization subtasks. However, for the goal of grade score prediction, some implicit preference contexts are missing in this joint modeling approach of CMF [Singh and Gordon, 2008].
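A minimal sketch of the two prediction rules in Eq. (1), assuming the parameters are stored as NumPy arrays (U, W for the n×d user factors, V for the m×d shared item factors, b_user and b_item for the bias vectors, mu for the global average); the names are illustrative and not taken from the authors' implementation.

```python
def cmf_predict_grade_score(U, V, b_user, b_item, mu, u, i):
    """Target task of Eq. (1): r_hat_ui = U_u. V_i.^T + b_u + b_i + mu."""
    return U[u] @ V[i] + b_user[u] + b_item[i] + mu

def cmf_predict_binary_rating(W, V, u, i):
    """Auxiliary task of Eq. (1): r_tilde_hat_ui = W_u. V_i.^T, sharing the item factors V."""
    return W[u] @ V[i]
```

Sharing V between the two functions is the collective part of CMF: gradients from both kinds of feedback update the same item factors.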

  11. Method: Implicit Preference Context
Mathematically, we may represent the one-class preference context as \bar{C}^O_{u\cdot}, the graded preference context as \bar{C}^G_{u\cdot}, the positive preference context as \bar{C}^p_{u\cdot}, and the negative preference context as \bar{C}^n_{u\cdot} as follows [Koren, 2008, Pan and Ming, 2017, Pan et al., 2016]:

\bar{C}^O_{u\cdot} = \delta_O \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i\}} C^o_{i'\cdot}   (2)

\bar{C}^G_{u\cdot} = \delta_G \sum_{g \in \mathcal{G}} \frac{1}{\sqrt{|\mathcal{I}^g_u \setminus \{i\}|}} \sum_{i' \in \mathcal{I}^g_u \setminus \{i\}} C^g_{i'\cdot}   (3)

\bar{C}^p_{u\cdot} = \delta_p w_p \frac{1}{\sqrt{|\mathcal{P}_u|}} \sum_{j \in \mathcal{P}_u} C^p_{j\cdot}   (4)

\bar{C}^n_{u\cdot} = \delta_n w_n \frac{1}{\sqrt{|\mathcal{N}_u|}} \sum_{j \in \mathcal{N}_u} C^n_{j\cdot}   (5)

where δ_O, δ_G, δ_p, δ_n ∈ {0, 1} are indicator variables, and w_p and w_n are the weights on positive and negative feedback, respectively.
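A minimal sketch of how Eqs. (2)-(5) could be computed, under these assumptions: C_o, C_p, C_n are m×d NumPy arrays of item-specific context vectors; C_g_by_grade maps each grade score g to its own m×d array; I_u is the list of items rated by user u; I_u_by_grade maps each grade to the items user u rated with that grade; P_u and N_u are the items user u liked and disliked. All of these names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def preference_contexts(C_o, C_g_by_grade, C_p, C_n,
                        I_u, I_u_by_grade, P_u, N_u, target_item, d,
                        delta_O=1, delta_G=1, delta_p=1, delta_n=1,
                        w_p=1.0, w_n=1.0):
    """Return (C_O_bar, C_G_bar, C_p_bar, C_n_bar) for one user and one target item."""

    def normalized_sum(C, items, exclude=None):
        # (1 / sqrt(|S|)) * sum_{j in S} C[j], with S optionally excluding the target item.
        items = [j for j in items if j != exclude]
        if not items:
            return np.zeros(d)
        return C[items].sum(axis=0) / np.sqrt(len(items))

    # Eq. (2): one-class context over all items rated by u, excluding the target item.
    C_O_bar = delta_O * normalized_sum(C_o, I_u, exclude=target_item)
    # Eq. (3): graded context, one normalized sum per grade score class.
    C_G_bar = delta_G * sum((normalized_sum(C_g_by_grade[g], items, exclude=target_item)
                             for g, items in I_u_by_grade.items()), np.zeros(d))
    # Eqs. (4)-(5): positive and negative contexts from the auxiliary binary ratings.
    C_p_bar = delta_p * w_p * normalized_sum(C_p, P_u)
    C_n_bar = delta_n * w_n * normalized_sum(C_n, N_u)
    return C_O_bar, C_G_bar, C_p_bar, C_n_bar
```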

  12. Method: Transfer with Implicit Preference Context
With the preference contexts defined above, we propose to incorporate them into the collective factorization framework:

\hat{r}_{ui} = U_{u\cdot} V_{i\cdot}^\top + b_u + b_i + \mu + (\bar{C}^O_{u\cdot} + \bar{C}^G_{u\cdot} + \bar{C}^p_{u\cdot} + \bar{C}^n_{u\cdot}) V_{i\cdot}^\top, \qquad \hat{\tilde{r}}_{ui} = W_{u\cdot} V_{i\cdot}^\top   (6)

which pulls the user-specific latent feature vectors of two users u and u' close together if they have similar implicit preference contexts, in a similar way to SVD++ [Koren, 2008]. Notice that we incorporate the preference context into the prediction rule of the grade scores instead of that of the binary ratings, because this matches our final goal of grade score prediction rather than binary rating prediction.
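Putting the pieces together, a sketch of the grade score side of Eq. (6), reusing the preference_contexts helper above (again, names and signatures are illustrative assumptions, not the authors' implementation):

```python
def pat_predict_grade_score(U, V, b_user, b_item, mu, contexts, u, i):
    """Eq. (6): r_hat_ui = (U_u. + C_O_bar + C_G_bar + C_p_bar + C_n_bar) V_i.^T + b_u + b_i + mu."""
    C_O_bar, C_G_bar, C_p_bar, C_n_bar = contexts  # output of preference_contexts(...)
    return (U[u] + C_O_bar + C_G_bar + C_p_bar + C_n_bar) @ V[i] \
           + b_user[u] + b_item[i] + mu

# The auxiliary prediction rule is unchanged from CMF: r_tilde_hat_ui = W_u. V_i.^T.
```

Only the grade score predictor receives the contexts, matching the slide's remark that the final goal is grade score prediction rather than binary rating prediction.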
