CoFiGAN: Collaborative Filtering by Generative and Discriminative Training for One-Class Recommendation

Jixiong Liu^{a,b,c}, Weike Pan^{a,b,c}* and Zhong Ming^{a,b,c}*
liujixiong@email.szu.edu.cn, {panweike, mingz}@szu.edu.cn

a National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
b Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Introduction: Problem Definition

One-Class Collaborative Filtering (OCCF)

Input: Observations in the form of (user, item) pairs.
Training: A min-max game between a generator and a discriminator, in which the generator improves the quality of generated items to fool the discriminator.
Goal: For each target user u, we provide a personalized list of items from I \ I_u via the generator, where I and I_u denote the set of all items and the set of user u's observed items, respectively.
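As a concrete illustration of the goal above, the following minimal sketch (in Python/NumPy, not the authors' code) ranks the items in I \ I_u by the generator's scores; the function and variable names are hypothetical.

    import numpy as np

    def recommend_top_n(U_G, V_G, b_G, observed_items_of_u, u, n=10):
        """Return the top-n items not yet observed by user u, ranked by the generator's scores."""
        scores = U_G[u] @ V_G.T + b_G                  # generator score for every item i
        scores[list(observed_items_of_u)] = -np.inf    # drop already-observed items I_u
        return np.argsort(-scores)[:n]                 # highest-scored unobserved items first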
Introduction: Challenges

Non-differentiable: The discrete sampling step prevents the loss from being back-propagated to the generator.
Mode collapse: The generator becomes lazy to explore different items so as to avoid the penalty from the discriminator.
Introduction: Notations (1/3)

n                       number of users
m                       number of items
d                       number of latent dimensions
bs                      batch size of mini-batch gradient descent
I_u                     items preferred by u in the training data
T_1, T_2, T_3           numbers of iterations
y_ui ∈ {0, 1}           indicator of whether the (u, i) pair is observed
U = {1, 2, ..., n}      whole set of users
I = {1, 2, ..., m}      whole set of items
(U, I)_bs               one batch of (u, i) pairs
Introduction: Notations (2/3)

D                                           discriminator
U^D_{u·} ∈ R^{1×d}                          latent feature vector of user u in D
V^D_{i·} ∈ R^{1×d}                          latent feature vector of item i in D
b^D_i ∈ R                                   bias of item i in D
γ^D                                         learning rate in D
I^{D+}_u                                    positive items generated by D for user u
I^{D−}_u                                    negative items generated by D for user u
r̂^D_{ui} = U^D_{u·} V^{D⊤}_{i·} + b^D_i     preference of user u to item i in D
α^D_u, α^D_v, β^D_v                         tradeoff parameters in D
Introduction: Notations (3/3)

G                                           generator
U^G_{u·} ∈ R^{1×d}                          latent feature vector of user u in G
V^G_{i·} ∈ R^{1×d}                          latent feature vector of item i in G
b^G_i ∈ R                                   bias of item i in G
γ^G                                         learning rate in G
I^{G−}_u                                    negative items generated by G for user u
r̂^G_{ui} = U^G_{u·} V^{G⊤}_{i·} + b^G_i     preference of user u to item i in G
α^G_u, α^G_v, β^G_v                         tradeoff parameters in G
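To make the notation concrete, here is a hypothetical initialization of the two models' parameters (a sketch, not the paper's Java implementation); the sizes and the scale below are illustrative assumptions.

    import numpy as np

    def init_params(n, m, d, scale=0.01, seed=0):
        """User factors U, item factors V and item biases b for one model (G or D)."""
        rng = np.random.default_rng(seed)
        return {
            "U": scale * rng.standard_normal((n, d)),  # U_{u.} in R^{1 x d} for each user u
            "V": scale * rng.standard_normal((m, d)),  # V_{i.} in R^{1 x d} for each item i
            "b": np.zeros(m),                          # b_i in R for each item i
        }

    G = init_params(n=1000, m=5000, d=20)  # generator parameters (example sizes)
    D = init_params(n=1000, m=5000, d=20)  # discriminator parameters (example sizes)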
Related Work: BPR-MF

In BPR-MF [Rendle et al., 2009], the users' preference behaviors are modeled based on a pairwise preference assumption, i.e., a user u prefers an observed item i to an unobserved one j:

\mathrm{Pre}(u, i \mid \phi) > \mathrm{Pre}(u, j \mid \phi),    (1)

where \mathrm{Pre}(u, i \mid \phi) and \mathrm{Pre}(u, j \mid \phi) denote user u's preference to the observed item i and the unobserved item j, respectively, and \phi are the model parameters that we would like to learn.
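A minimal sketch of one BPR-MF stochastic update implied by Eq.(1), using the matrix factorization form r_ui = U[u] V[i]^T + b[i]; the learning rate and regularization weight are illustrative assumptions, not values from the paper.

    import numpy as np

    def bpr_sgd_step(U, V, b, u, i, j, lr=0.05, reg=0.01):
        """One SGD step on -log sigmoid(r_ui - r_uj) plus L2 regularization."""
        r_uij = U[u] @ (V[i] - V[j]) + b[i] - b[j]     # r_ui - r_uj
        g = 1.0 / (1.0 + np.exp(r_uij))                # -d loss / d r_uij
        u_f = U[u].copy()
        U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
        V[i] += lr * (g * u_f - reg * V[i])
        V[j] += lr * (-g * u_f - reg * V[j])
        b[i] += lr * (g - reg * b[i])
        b[j] += lr * (-g - reg * b[j])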
Related Work: Logistic-MF

Logistic-MF [Johnson, 2014] assigns different labels to the observed items and the unobserved ones for each user, which is a pointwise algorithm:

\min_{\phi} \sum_{(u,i) \in \mathcal{P} \cup \mathcal{A}} \log(1 + \exp(-r_{ui} \hat{r}_{ui})) + \lambda \|\phi\|^2,    (2)

where r_{ui} is the label of (u, i), with r_{ui} = 1 if (u, i) \in \mathcal{P} and r_{ui} = 0 otherwise, \hat{r}_{ui} is the predicted preference of user u to item i, and \mathcal{P} and \mathcal{A} represent the set of users' observed items and the set of sampled negative ones in the training data, respectively.
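For comparison, a small sketch of the pointwise objective in Eq.(2) (illustrative only, not the original Logistic-MF code), where r_hat and r are arrays over the sampled (u, i) pairs in P ∪ A and params is a list of the parameter arrays.

    import numpy as np

    def logistic_mf_loss(r_hat, r, lam, params):
        """Sum of log(1 + exp(-r * r_hat)) over the sampled pairs, plus L2 regularization."""
        data_term = np.sum(np.log1p(np.exp(-r * r_hat)))
        reg_term = lam * sum(np.sum(p ** 2) for p in params)
        return data_term + reg_term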
Related Work: IRGAN (1/3)

J^{G^*, D^*} = \min_{\theta} \max_{\phi} \sum_{u=1}^{n} \left( \mathbb{E}_{i \sim p_{true}} \left[ \log D(i \mid u) \right] + \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \left[ \log(1 - D(i \mid u)) \right] \right),    (3)

where \theta and \phi denote the parameters of the generator and the discriminator, respectively, p_{true} denotes the distribution of users' true preferences, p_{\theta}(i \mid u) denotes the item sampling probability given user u, and D(i \mid u) = \sigma(\hat{r}^D_{ui}) denotes the estimated probability of item i belonging to the ground truth data of user u.
Related Work: IRGAN (2/3)

Discriminative training: The discriminator aims to discriminate the ground truth preference items from the items generated by the generator.
Generative training: The generator aims to generate/sample high-quality items that look like the ground truth items so as to earn a higher reward from the discriminator, and it updates itself using the reward or penalty signal from the discriminator, i.e., a policy gradient approximation [Sutton et al., 1999].
Related Work: IRGAN (3/3) - Policy Gradient

\nabla_{\theta} J^G = \nabla_{\theta} \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \left[ \log(1 + \exp(\hat{r}^D_{ui})) \right]
  = \sum_{i=1}^{m} \nabla_{\theta} p_{\theta}(i \mid u) \log(1 + \exp(\hat{r}^D_{ui}))
  = \sum_{i=1}^{m} p_{\theta}(i \mid u) \nabla_{\theta} \log p_{\theta}(i \mid u) \log(1 + \exp(\hat{r}^D_{ui}))
  = \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \left[ \nabla_{\theta} \log p_{\theta}(i \mid u) \log(1 + \exp(\hat{r}^D_{ui})) \right]
  \approx \frac{1}{K} \sum_{k=1}^{K} \nabla_{\theta} \log p_{\theta}(i_k \mid u) \log(1 + \exp(\hat{r}^D_{u i_k})),    (4)

where p_{\theta}(i \mid u) denotes the sampling probability of item i via the generator, K is the number of generated samples, and \log(1 + \exp(\hat{r}^D_{ui})) is the reward/penalty from the discriminator.
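The sampled estimate in the last line of Eq.(4) can be sketched as follows (a simplified illustration, not IRGAN's code); here p_theta(i|u) is assumed to be a softmax over the generator's scores, and the gradient is taken with respect to those scores.

    import numpy as np

    def policy_gradient_for_user(scores_G, scores_D, K=5, seed=0):
        """Estimate d J^G / d scores_G from K items sampled from p_theta(i|u)."""
        rng = np.random.default_rng(seed)
        p = np.exp(scores_G - scores_G.max())
        p /= p.sum()                                  # p_theta(i|u) as a softmax (assumption)
        sampled = rng.choice(len(p), size=K, p=p)
        grad = np.zeros_like(scores_G)
        for i in sampled:
            reward = np.log1p(np.exp(scores_D[i]))    # log(1 + exp(r^D_ui)) from the discriminator
            d_log_p = -p.copy()                       # d log p_theta(i|u) / d scores_G
            d_log_p[i] += 1.0
            grad += reward * d_log_p
        return grad / K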
Our Method: Our Solution (CoFiGAN)

Our CoFiGAN also plays a min-max game inherited from GAN:

J^{G^*, D^*} = \min_{\theta} \max_{\phi} \sum_{u=1}^{n} \left( \mathbb{E}_{i \sim p_{true}} \left[ \log D(i \mid u) \right] + \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \left[ \log(1 - D(i \mid u)) \right] \right),    (5)

where \phi and \theta are the parameters of the discriminator and the generator, respectively, and D(i \mid u) denotes the estimated probability of item i belonging to the ground truth data of user u. Notice that we propose a more direct approximation of the generator's loss function, which is the main difference compared with IRGAN.
Our Method: Discriminative Training (1/2)

\phi^* = \arg\max_{\phi} \sum_{u=1}^{n} \left[ \mathbb{E}_{i \sim p_{true}} \log D(i \mid u) + \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \log(1 - D(i \mid u)) \right].    (6)

The discriminator aims to label the ground truth data and the generated samples as '1' and '0', respectively, so it is natural to take the binomial logistic regression model as the discriminator:

\min_{\phi} \sum_{(u,i) \in (\mathcal{U}, \mathcal{I})_{bs}} f^D_{ui},    (7)

where

f^D_{ui} = (1 - y_{ui}) \hat{r}^D_{ui} + \log(1 + \exp(-\hat{r}^D_{ui})) + \frac{\alpha^D_u}{2} \|U^D_{u\cdot}\|^2 + \frac{\alpha^D_v}{2} \|V^D_{i\cdot}\|^2 + \frac{\beta^D_v}{2} \|b^D_i\|^2,

and the label y_{ui} = 1 if (u, i) is a ground truth pair and y_{ui} = 0 otherwise.
Our Method: Discriminative Training (2/2)

In order to avoid numerical overflow of \exp(-\hat{r}^D_{ui}) when \hat{r}^D_{ui} < 0, we follow the same trick used in TensorFlow^1 and use the following objective function in our own implementation in the Java programming language. Since e is used as the base of the log function in our experiments, we can easily verify that the following function is equivalent to f^D_{ui} above:

f^D_{ui} =
  (1 - y_{ui}) \hat{r}^D_{ui} + \log(1 + e^{-\hat{r}^D_{ui}}) + \frac{\lambda^D}{2} \|\phi\|^2,   if \hat{r}^D_{ui} \geq 0
  -\hat{r}^D_{ui} \cdot y_{ui} + \log(1 + e^{\hat{r}^D_{ui}}) + \frac{\lambda^D}{2} \|\phi\|^2,   if \hat{r}^D_{ui} < 0

where \frac{\lambda^D}{2} \|\phi\|^2 = \frac{\alpha^D_u}{2} \|U^D_{u\cdot}\|^2 + \frac{\alpha^D_v}{2} \|V^D_{i\cdot}\|^2 + \frac{\beta^D_v}{2} \|b^D_i\|^2.

^1 Implementation of sigmoid cross entropy with logits in TensorFlow
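A small sketch of the numerically stable form of f^D_ui above (the same idea as TensorFlow's sigmoid cross entropy with logits); the L2 regularization term is omitted for brevity.

    import numpy as np

    def f_D(r_hat, y):
        """(1 - y) * r_hat + log(1 + exp(-r_hat)), computed without overflow."""
        # Branch-free equivalent of the two cases above:
        # max(r_hat, 0) - y * r_hat + log(1 + exp(-|r_hat|))
        return np.maximum(r_hat, 0.0) - y * r_hat + np.log1p(np.exp(-np.abs(r_hat)))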
Our Method: Generative Training (1/3)

G^* = \arg\min_{\theta} \sum_{u=1}^{n} \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \log(1 - D(i \mid u))
    = \arg\max_{\theta} \sum_{u=1}^{n} \mathbb{E}_{i \sim p_{\theta}(i \mid u)} \left[ \log(1 + \exp(\hat{r}^D_{ui})) \right],    (8)

where p_{\theta}(i \mid u) denotes the item sampling probability given user u calculated by the generator, and \log(1 + \exp(\hat{r}^D_{ui})) can be interpreted as the reward/penalty scored by the discriminator. Thus the update rule of the generator can be interpreted as encouraging the sampling of items with high reward in the next sampling step.
Our Method: Generative Training (2/3)

To solve Eq.(8), we propose a more direct approximation by narrowing the distance between the samples generated by the generator and by the discriminator, which contrasts with the policy gradient approximation:

\theta^* = \arg\min_{\theta} \sum_{u=1}^{n} \mathbb{E}_{S^G \sim p_{\theta}(S \mid u)} \left[ \mathrm{Dist}(S^G, S^D) \right],    (9)

where S^G and S^D are high-quality samples that can confuse the discriminator, from the perspectives of the generator and the discriminator, respectively.
Our Method: Generative Training (3/3)

To narrow Dist(S^G, S^D) in Eq.(9), the generator is designed to generate items close to the high-quality ones and away from the low-quality ones from the perspective of the discriminator, i.e., I^{D+}_u and I^{D−}_u:

\min_{\theta} \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}^{D+}_u} \sum_{j \in \mathcal{I}^{D-}_u} f^G_{uij},    (10)

where \theta = \{U^G_{u\cdot}, V^G_{i\cdot}, b^G_i, u \in \mathcal{U}, i \in \mathcal{I}\}, \sigma is the sigmoid function, and

f^G_{uij} = \frac{1}{|\mathcal{I}_u|} \left( -\log \sigma(\hat{r}^G_{ui} - \hat{r}^G_{uj}) + \frac{\alpha^G_u}{2} \|U^G_{u\cdot}\|^2 + \frac{\alpha^G_v}{2} \|V^G_{i\cdot}\|^2 + \frac{\alpha^G_v}{2} \|V^G_{j\cdot}\|^2 + \frac{\beta^G_v}{2} \|b^G_i\|^2 + \frac{\beta^G_v}{2} \|b^G_j\|^2 \right),

which is the same as BPR-MF.
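A minimal sketch of one generator update for Eq.(10) (an illustration under assumed hyperparameters, not the paper's Java implementation): for each user u, the BPR-style pairwise term pulls r^G_ui above r^G_uj for i in I^{D+}_u and j in I^{D−}_u.

    import numpy as np

    def generator_step(U, V, b, u, pos_items, neg_items, n_obs,
                       lr=0.05, a_u=0.01, a_v=0.01, b_v=0.01):
        """One SGD pass over the (i, j) pairs of f^G_uij for a single user u."""
        for i in pos_items:                  # i in I^{D+}_u: high-quality items per the discriminator
            for j in neg_items:              # j in I^{D-}_u: low-quality items per the discriminator
                r_uij = U[u] @ (V[i] - V[j]) + b[i] - b[j]
                g = (1.0 / (1.0 + np.exp(r_uij))) / n_obs   # includes the 1/|I_u| scaling of Eq.(10)
                u_f = U[u].copy()
                U[u] += lr * (g * (V[i] - V[j]) - a_u / n_obs * U[u])
                V[i] += lr * (g * u_f - a_v / n_obs * V[i])
                V[j] += lr * (-g * u_f - a_v / n_obs * V[j])
                b[i] += lr * (g - b_v / n_obs * b[i])
                b[j] += lr * (-g - b_v / n_obs * b[j])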