Sequence-Aware Factored Mixed Similarity Model for Next-Item Recommendation



1. Sequence-Aware Factored Mixed Similarity Model for Next-Item Recommendation
Liulan Zhong, Jing Lin, Weike Pan* and Zhong Ming*
zhongliulan2017@email.szu.edu.cn, linjing2018@email.szu.edu.cn, panweike@szu.edu.cn, mingz@szu.edu.cn
National Engineering Laboratory for Big Data System Computing Technology, Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
IEEE BigComp 2020

2. Introduction: Problem Definition (Next-Item Recommendation)
Input: $(u, \mathcal{S}_u)$, i.e., a sequence of items for each user $u$.
Goal: rank the unobserved items at user $u$'s next step by estimating the scores $\hat{r}_{uj}$, $j \in \mathcal{I} \setminus \mathcal{I}_u$, to form the recommendation list.
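To make the goal concrete, a minimal NumPy sketch of turning predicted scores into a ranked recommendation list; the scoring model is left abstract, and all names (`recommend_next`, `scores`, `interacted`) are illustrative rather than from the paper.

```python
import numpy as np

def recommend_next(u, scores, interacted, top_k=10):
    """Rank the items user u has not interacted with by their predicted next-step scores."""
    candidates = [j for j in range(scores.shape[1]) if j not in interacted[u]]
    ranked = sorted(candidates, key=lambda j: scores[u, j], reverse=True)
    return ranked[:top_k]

# Toy usage: 2 users, 5 items, random scores standing in for r_hat.
rng = np.random.default_rng(0)
scores = rng.random((2, 5))
interacted = {0: {1, 3}, 1: {0}}
print(recommend_next(0, scores, interacted, top_k=3))
```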

3. Introduction: Notations (1/2)
Table: Some notations and explanations.
$n$ : number of users
$m$ : number of items
$u$ : user ID, $u \in \{1, 2, \ldots, n\}$
$i$ : item ID, $i \in \{1, 2, \ldots, m\}$
$\mathcal{U}$ : the whole set of users
$\mathcal{I}$ : the whole set of items
$\mathcal{P}$ : the whole set of observed $(u, i)$ pairs
$\mathcal{A}$ : a sampled set of unobserved $(u, i)$ pairs
$\mathcal{I}_u$ : the set of items that have been interacted with by user $u$
$d \in \mathbb{R}$ : number of latent dimensions
$V_{i\cdot}, W_{i\cdot} \in \mathbb{R}^{1 \times d}$ : item-specific latent feature vectors w.r.t. item $i$
$b_i \in \mathbb{R}$ : item bias
$\gamma$ : learning rate
$\alpha_w, \alpha_v, \beta_\eta, \beta_v$ : tradeoff parameters of the regularization terms
$T$ : iteration number

4. Introduction: Notations (2/2)
Table: Some notations and explanations.
$\mathcal{S}_u = \{i_u^1, i_u^2, \ldots, i_u^{|\mathcal{S}_u|}\}$ : the sequence of items of user $u$
$i_u^t$ : the $t$-th item in $\mathcal{S}_u$
$\hat{r}_{u i_u^t}$ : predicted preference of user $u$ to item $i_u^t$
$s_{ij}$ : predefined similarity between item $i$ and item $j$
$\lambda$ : tradeoff parameter in the mixed similarity
$L$ : the order of the Markov chains
$\ell$ : the $\ell$-th order of the Markov chains, $\ell \in \{1, 2, \ldots, L\}$
$i_u^{t-\ell}$ : the $(t-\ell)$-th item in $\mathcal{S}_u$
$\eta \in \mathbb{R}^{1 \times L}$ : global weighting vector
$\eta^u \in \mathbb{R}^{1 \times L}$ : personalized weighting vector w.r.t. user $u$
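To ground the notation, a minimal sketch of the model parameters and their shapes; the sizes and the Gaussian initialization are illustrative assumptions, not values from the paper.

```python
import numpy as np

n, m, d, L = 100, 500, 20, 3            # numbers of users, items, latent dims, Markov order
rng = np.random.default_rng(42)

V = rng.normal(0.0, 0.01, (m, d))       # V_i. : item factors on the prediction side
W = rng.normal(0.0, 0.01, (m, d))       # W_i. : item factors on the history side
b = np.zeros(m)                         # b_i  : item biases
eta = np.zeros(L)                       # eta_l   : global position weights (R^{1 x L})
eta_u = np.zeros((n, L))                # eta_l^u : personalized position weights per user
```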

5. Background: Motivation
Previously proposed methods usually model the general representation and the sequential representation in two separate factorization components. Our proposed model integrates the items' general similarity and the items' learnable sequential representations in a unified component.
Fossil [He et al., 2016] captures short-term sequential information via high-order Markov chains. The rationale behind its term $\eta_\ell + \eta_\ell^u$ is that each of the previous $L$ positions should contribute to the high-order smoothness with a different weight; however, it lacks a weight contribution from the specific items that occupy those latest positions.

6. Background: Fossil's Prediction Rule
On the basis of FISM [Kabbur et al., 2013], Fossil combines a similarity-based method with high-order Markov chains. Its prediction function is

$\hat{r}_{u i_u^t} = b_{i_u^t} + \bar{U}_u^{-i_u^t} \cdot V_{i_u^t \cdot}^T$,   (1)

where

$\bar{U}_u^{-i_u^t} = \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i_u^t\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i_u^t\}} W_{i' \cdot} + \sum_{\ell=1}^{L} (\eta_\ell + \eta_\ell^u) W_{i_u^{t-\ell} \cdot}$,   (2)

and $\eta_\ell^u$ controls the weight of user $u$'s preference and sequential dynamics, while $\eta_\ell$ is a global parameter shared by all the users.
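A minimal NumPy sketch of Eqs. (1)-(2), reusing the parameter shapes sketched earlier; `seq` is user $u$'s item sequence, `I_u` the set of items in it, and it assumes $t \geq L$ so that all $L$ previous items exist. Names are illustrative, not taken from the authors' code.

```python
import numpy as np

def fossil_score(u, t, seq, I_u, V, W, b, eta, eta_u, L):
    """Eqs.(1)-(2): Fossil's score for the item at step t of user u's sequence (sketch)."""
    i_t = seq[t]
    rest = [i for i in I_u if i != i_t]                  # I_u \ {i_u^t}
    u_bar = W[rest].sum(axis=0) / np.sqrt(len(rest))     # long-term, similarity-based part
    for l in range(1, L + 1):                            # high-order Markov part
        u_bar += (eta[l - 1] + eta_u[u, l - 1]) * W[seq[t - l]]
    return b[i_t] + u_bar @ V[i_t]
```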

7. Background: Overview of Our Solution
Sequence-Aware Factored Mixed Similarity Model (S-FMSM)
Our S-FMSM considers both the weight of each specific history item $i_u^{t-\ell}$ and the weight of its relative position when modeling its contribution to the target item $i_u^t$.

8. Method: S-FMSM's Prediction Rule
The predicted preference of user $u$ to item $i_u^t$ is

$\hat{r}_{u i_u^t} = b_{i_u^t} + \bar{U}_u^{-i_u^t} \cdot V_{i_u^t \cdot}^T$,   (3)

where

$\bar{U}_u^{-i_u^t} = \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i_u^t\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i_u^t\}} W_{i' \cdot} + \sum_{\ell=1}^{L} (\eta_\ell + \eta_\ell^u) \big( (1-\lambda) + \lambda s_{i_u^t i_u^{t-\ell}} \big) W_{i_u^{t-\ell} \cdot}$.   (4)

Notes:
$s_{i_u^t i_u^{t-\ell}}$ is the cosine similarity between item $i_u^t$ and item $i_u^{t-\ell}$. In fact, what it captures is the weight of the history item $i_u^{t-\ell}$ in contributing to the target item $i_u^t$.
The tradeoff parameter $\lambda$, tuned among $\{0, 0.2, 0.4, 0.6, 0.8, 1\}$, adjusts the influence of $s_{i_u^t i_u^{t-\ell}}$ in preference prediction. Notice that when $\lambda = 0$, the model reduces to Fossil [He et al., 2016].
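A minimal NumPy sketch of Eqs. (3)-(4), parallel to the Fossil sketch above; `S` is an $m \times m$ matrix holding the predefined similarities $s_{ij}$ and `lam` is $\lambda$. All names are illustrative assumptions.

```python
import numpy as np

def s_fmsm_score(u, t, seq, I_u, V, W, b, eta, eta_u, S, lam, L):
    """Eqs.(3)-(4): S-FMSM's score, mixing a constant weight with the predefined
    similarity between the target item and each of the L most recent history items."""
    i_t = seq[t]
    rest = [i for i in I_u if i != i_t]                  # I_u \ {i_u^t}
    u_bar = W[rest].sum(axis=0) / np.sqrt(len(rest))     # long-term, similarity-based part
    for l in range(1, L + 1):                            # short-term, sequence-aware part
        i_prev = seq[t - l]
        mix = (1 - lam) + lam * S[i_t, i_prev]           # mixed similarity weight
        u_bar += (eta[l - 1] + eta_u[u, l - 1]) * mix * W[i_prev]
    return b[i_t] + u_bar @ V[i_t]
```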

9. Method: S-FMSM's Objective Function
The objective function is

$\min_{\Theta} \sum_{u \in \mathcal{U}} \sum_{i_u^t \in \mathcal{S}_u, t \neq 1} \sum_{j \notin \mathcal{I}_u} f_{u i_u^t j}$,   (5)

where $\Theta = \{V_{i\cdot}, W_{i\cdot}, b_i, \eta_\ell, \eta_\ell^u\}$ with $i = 1, 2, \ldots, m$; $u = 1, 2, \ldots, n$; $\ell = 1, 2, \ldots, L$, and

$f_{u i_u^t j} = -\ln \sigma(\hat{r}_{u i_u^t} - \hat{r}_{uj}) + \frac{\alpha_v}{2} \|V_{i_u^t \cdot}\|^2 + \frac{\alpha_v}{2} \|V_{j\cdot}\|^2 + \frac{\alpha_w}{2} \sum_{i' \in \mathcal{I}_u} \|W_{i'\cdot}\|^2 + \frac{\beta_v}{2} b_{i_u^t}^2 + \frac{\beta_v}{2} b_j^2 + \frac{\beta_\eta}{2} \|\eta_\ell\|^2 + \frac{\beta_\eta}{2} \|\eta_\ell^u\|^2$

is a tentative objective function for a randomly sampled triple $(u, i_u^t, j)$, obtained via "first positive $(u, i_u^t)$, then negative $j$".
Notes:
Because the pairwise preference assumption relaxes the pointwise one, we adopt a personalized pairwise ranking loss and minimize it over the sampled triples.
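A minimal sketch of the tentative objective $f_{u i_u^t j}$ for one sampled triple; the argument names (`V_pos`, `W_hist`, etc.) are illustrative, and the regularization weights correspond to $\alpha_v, \alpha_w, \beta_v, \beta_\eta$ above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def triple_loss(r_pos, r_neg, V_pos, V_neg, W_hist, b_pos, b_neg, eta, eta_u,
                alpha_v, alpha_w, beta_v, beta_eta):
    """Tentative objective f_{u i_u^t j} for one sampled triple (sketch).

    r_pos / r_neg : predicted scores for the positive item i_u^t and the negative item j;
    W_hist        : stacked W_{i'.} for all i' in I_u (regularized as a whole).
    """
    loss = -np.log(sigmoid(r_pos - r_neg))                                  # pairwise ranking term
    loss += 0.5 * alpha_v * (np.sum(V_pos ** 2) + np.sum(V_neg ** 2))       # V regularization
    loss += 0.5 * alpha_w * np.sum(W_hist ** 2)                             # W regularization
    loss += 0.5 * beta_v * (b_pos ** 2 + b_neg ** 2)                        # bias regularization
    loss += 0.5 * beta_eta * (np.sum(eta ** 2) + np.sum(eta_u ** 2))        # position-weight reg.
    return loss
```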

10. Method: Gradients (1/2)
The gradient of each parameter $\theta \in \Theta$, i.e., $\nabla\theta = \frac{\partial f_{u i_u^t j}}{\partial \theta}$, is computed as follows:

$\nabla b_{i_u^t} = \beta_v b_{i_u^t} - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t})$,   (6)

$\nabla b_j = \beta_v b_j + \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t})$,   (7)

$\nabla V_{i_u^t \cdot} = \alpha_v V_{i_u^t \cdot} - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) \Big[ \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i_u^t\}|}} \sum_{i' \in \mathcal{I}_u \setminus \{i_u^t\}} W_{i'\cdot} + \sum_{\ell=1}^{L} (\eta_\ell + \eta_\ell^u) \big( (1-\lambda) + \lambda s_{i_u^t i_u^{t-\ell}} \big) W_{i_u^{t-\ell} \cdot} \Big]$,   (8)

$\nabla V_{j\cdot} = \alpha_v V_{j\cdot} + \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) \Big[ \frac{1}{\sqrt{|\mathcal{I}_u|}} \sum_{i' \in \mathcal{I}_u} W_{i'\cdot} + \sum_{\ell=1}^{L} (\eta_\ell + \eta_\ell^u) \big( (1-\lambda) + \lambda s_{j i_u^{t-\ell}} \big) W_{i_u^{t-\ell} \cdot} \Big]$.   (9)
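A minimal sketch of Eqs. (6)-(9), assuming the bracketed aggregated vectors have already been computed: `u_bar_pos` is the vector of Eq. (4) for the positive item $i_u^t$, and `u_bar_neg` is its analogue for the negative item $j$ (summed over the full $\mathcal{I}_u$ and using $s_{j i_u^{t-\ell}}$). Names are illustrative.

```python
import numpy as np

def bias_and_v_gradients(r_pos, r_neg, u_bar_pos, u_bar_neg,
                         b_pos, b_neg, V_pos, V_neg, alpha_v, beta_v):
    """Eqs.(6)-(9) (sketch): gradients of the biases and prediction-side item factors."""
    e = 1.0 / (1.0 + np.exp(-(r_neg - r_pos)))    # sigma(r_hat_uj - r_hat_{u i_u^t})
    grad_b_pos = beta_v * b_pos - e               # Eq.(6)
    grad_b_neg = beta_v * b_neg + e               # Eq.(7)
    grad_V_pos = alpha_v * V_pos - e * u_bar_pos  # Eq.(8)
    grad_V_neg = alpha_v * V_neg + e * u_bar_neg  # Eq.(9)
    return grad_b_pos, grad_b_neg, grad_V_pos, grad_V_neg
```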

11. Method: Gradients (2/2)

$\nabla \eta_\ell = \beta_\eta \eta_\ell - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) W_{i_u^{t-\ell} \cdot} \Big[ V_{i_u^t \cdot}^T \big( (1-\lambda) + \lambda s_{i_u^t i_u^{t-\ell}} \big) - V_{j\cdot}^T \big( (1-\lambda) + \lambda s_{j i_u^{t-\ell}} \big) \Big]$, $\ell = 1, \ldots, L$,   (10)

$\nabla \eta_\ell^u = \beta_\eta \eta_\ell^u - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) W_{i_u^{t-\ell} \cdot} \Big[ V_{i_u^t \cdot}^T \big( (1-\lambda) + \lambda s_{i_u^t i_u^{t-\ell}} \big) - V_{j\cdot}^T \big( (1-\lambda) + \lambda s_{j i_u^{t-\ell}} \big) \Big]$, $\ell = 1, \ldots, L$,   (11)

$\nabla W_{i'\cdot} = \alpha_w W_{i'\cdot} - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) \Big[ \frac{1}{\sqrt{|\mathcal{I}_u \setminus \{i_u^t\}|}} V_{i_u^t \cdot} - \frac{1}{\sqrt{|\mathcal{I}_u|}} V_{j\cdot} \Big]$, $i' \in \mathcal{I}_u \setminus \{i_u^t, i_u^{t-1}, \ldots, i_u^{t-L}\}$,   (12)

$\nabla W_{i_u^t \cdot} = \alpha_w W_{i_u^t \cdot} - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) \Big[ -\frac{1}{\sqrt{|\mathcal{I}_u|}} V_{j\cdot} \Big]$,   (13)

$\nabla W_{i_u^{t-\ell} \cdot} = \alpha_w W_{i_u^{t-\ell} \cdot} - \sigma(\hat{r}_{uj} - \hat{r}_{u i_u^t}) \Big[ \Big( V_{i_u^t \cdot} \big( (1-\lambda) + \lambda s_{i_u^t i_u^{t-\ell}} \big) - V_{j\cdot} \big( (1-\lambda) + \lambda s_{j i_u^{t-\ell}} \big) \Big) (\eta_\ell + \eta_\ell^u) \Big]$, $\ell = 1, \ldots, L$.   (14)
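A minimal sketch of Eqs. (10)-(11) for a single order $\ell$; `W_prev` is $W_{i_u^{t-\ell} \cdot}$, `s_pos` and `s_neg` stand for $s_{i_u^t i_u^{t-\ell}}$ and $s_{j i_u^{t-\ell}}$, and all names are illustrative assumptions.

```python
import numpy as np

def eta_gradients(r_pos, r_neg, W_prev, V_pos, V_neg, s_pos, s_neg,
                  eta_l, eta_ul, lam, beta_eta):
    """Eqs.(10)-(11) (sketch): gradients of the l-th global and personalized position
    weights for one history item i_u^{t-l} with factor W_prev."""
    e = 1.0 / (1.0 + np.exp(-(r_neg - r_pos)))                  # sigma(r_hat_uj - r_hat_{u i_u^t})
    common = W_prev @ (V_pos * ((1 - lam) + lam * s_pos)
                       - V_neg * ((1 - lam) + lam * s_neg))     # bracketed scalar in Eqs.(10)-(11)
    grad_eta_l = beta_eta * eta_l - e * common                  # Eq.(10)
    grad_eta_ul = beta_eta * eta_ul - e * common                # Eq.(11)
    return grad_eta_l, grad_eta_ul
```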

12. Method: Update Rules
We have the update rule for each parameter:

$\theta = \theta - \gamma \nabla\theta$,   (15)

where $\gamma > 0$ is the learning rate and $\theta \in \Theta = \{V_{i\cdot}, W_{i\cdot}, b_i, \eta_\ell, \eta_\ell^u\}$ with $i = 1, 2, \ldots, m$; $u = 1, 2, \ldots, n$; $\ell = 1, 2, \ldots, L$.
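A one-line sketch of Eq. (15); the gradient values in the usage example are made up purely for illustration.

```python
import numpy as np

def sgd_update(theta, grad, gamma=0.01):
    """Eq.(15): theta <- theta - gamma * grad, applied to every parameter in Theta."""
    return theta - gamma * grad

# Illustrative usage with made-up gradients for one bias and one latent vector.
b_i = sgd_update(0.1, -0.05)
V_i = sgd_update(np.zeros(20), np.full(20, 0.01))
```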

13. Method: Algorithm
Algorithm 1: The algorithm of S-FMSM.
1: Initialize the model parameters.
2: for t = 1, ..., T do
3:   for each (u, i_u^t) ∈ P in a random order do
4:     Randomly pick an item j from I \ I_u
5:     Calculate the gradients according to Eqs. (6)-(14)
6:     Update the model parameters via Eq. (15)
7:   end for
8: end for
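A minimal sketch of the training loop in Algorithm 1; the two callbacks stand in for the gradient and update steps and are assumptions of this sketch, not part of the paper.

```python
import random

def train_s_fmsm(P, I, I_u, T, compute_gradients, apply_updates):
    """Algorithm 1 (sketch): SGD over randomly ordered observed pairs for T iterations.

    P   : list of observed (u, i_u^t) pairs
    I   : set of all items
    I_u : dict mapping each user u to the set of items he or she has interacted with
    """
    for _ in range(T):                             # T passes over the data
        random.shuffle(P)                          # visit (u, i_u^t) in a random order
        for u, i_t in P:
            j = random.choice(list(I - I_u[u]))    # sample a negative item j from I \ I_u
            grads = compute_gradients(u, i_t, j)   # Eqs.(6)-(14)
            apply_updates(grads)                   # Eq.(15)
```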

14. Experiments: Datasets (1/2)
We adopt commonly used datasets from two sources in the experiments: the MovieLens data, i.e., MovieLens 100K (ML100K) and MovieLens 1M (ML1M), and the Amazon e-commerce data, i.e., Office Products (Office), Automotive (Auto), Video Games (Video), and Cell Phones & Accessories (Cell).
We treat all the observed behaviors as positive feedback and preprocess each dataset as follows (see the sketch after this list):
- We remove the records of users who rated fewer than five times;
- We remove the records of items that were rated fewer than five times;
- We sort all the records by timestamp and split each user's sequence into three parts: the item(s) at the last step for testing, the item(s) at the penultimate step for validation, and the remaining items for training.
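A minimal pandas sketch of this preprocessing, assuming a dataframe with `user`, `item`, and `timestamp` columns; it filters once per side and treats each record as one step, which simplifies the "item(s) per step" wording above.

```python
import pandas as pd

def preprocess(df):
    """Sketch of the described preprocessing: 5-core style filtering plus a
    leave-last-out split per user (last -> test, penultimate -> validation)."""
    # Drop users and items with fewer than five records.
    df = df[df.groupby("user")["item"].transform("count") >= 5]
    df = df[df.groupby("item")["user"].transform("count") >= 5]
    # Sort by time, then split each user's sequence from the end.
    df = df.sort_values(["user", "timestamp"])
    pos_from_end = df.groupby("user").cumcount(ascending=False)
    train = df[pos_from_end >= 2]
    valid = df[pos_from_end == 1]
    test = df[pos_from_end == 0]
    return train, valid, test
```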
