  1. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer Advisor: Jia-Ling Koh Presenter: You-Xiang Chen Source: CIKM’19 Date: 2020/04/20

  2. INTRODUCTION

  3. Introduction ▪ Sequential recommendation: given a user's historical subsequence of interactions, the recommender system predicts the next target item.

  4. Motivation & Goal ▪ Unidirectional models often assume a rigidly ordered sequence over the data, which is not always true for user behaviors in real-world applications. ▪ Goal: propose a bidirectional self-attention network, BERT4Rec.

  5. Motivation & Goal ▪ Conventional bidirectional models encode each historical subsequence to predict its target item. ▪ This approach is very time- and resource-consuming, since a new sample must be created for each position in the sequence and each target predicted separately. ▪ The Cloze task is introduced to produce more training samples and train a more powerful model (see the masking sketch below).
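
A minimal sketch of Cloze-style masking for generating training samples; the mask ratio and the id reserved for the [mask] token are illustrative assumptions, not values taken from these slides:

```python
import random

MASK_ID = 0        # assumed id reserved for the [mask] token
MASK_PROB = 0.15   # assumed mask ratio; the paper tunes this per dataset

def cloze_mask(sequence, mask_prob=MASK_PROB, seed=None):
    """Randomly replace items with [mask]; the model must recover them."""
    rng = random.Random(seed)
    masked_seq, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            masked_seq.append(MASK_ID)   # hide the item
            labels.append(item)          # ...and keep it as the target
        else:
            masked_seq.append(item)
            labels.append(None)          # not a prediction position
    return masked_seq, labels

# One user history can yield many different (masked_seq, labels) samples.
print(cloze_mask([5, 12, 7, 42, 3], seed=1))
```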

  6. METHOD

  7. Problem Statement ▪ Input: the sets of users and items, and each user's interaction sequence. ▪ Output: the item the user will interact with at the next time step.

  8. Framework

  9. Embedding Layer ▪ Input representation: the sum of the item's row in the item embedding matrix and the corresponding row of a d-dimensional learnable position embedding matrix.
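
A sketch of such an embedding layer, assuming PyTorch; the sizes (num_items, max_len, d_model) are hypothetical, and the input representation is the item embedding plus a learnable position embedding:

```python
import torch
import torch.nn as nn

class BERT4RecEmbedding(nn.Module):
    def __init__(self, num_items, max_len, d_model):
        super().__init__()
        # +2 leaves room for padding (id 0) and the [mask] token (id num_items + 1)
        self.item_emb = nn.Embedding(num_items + 2, d_model, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, d_model)   # learnable positions

    def forward(self, item_ids):                        # (batch, seq_len)
        positions = torch.arange(item_ids.size(1), device=item_ids.device)
        # input representation = item embedding + position embedding
        return self.item_emb(item_ids) + self.pos_emb(positions)

emb = BERT4RecEmbedding(num_items=1000, max_len=50, d_model=64)
print(emb(torch.randint(1, 1001, (2, 50))).shape)       # torch.Size([2, 50, 64])
```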

  10. Transformer Layer ▪ Multi-Head Self-Attention: stack the hidden vectors of all positions into a matrix; multi-head attention linearly projects it into several subspaces (one per head), applies Scaled Dot-Product Attention in each subspace in parallel, and concatenates the results.

  11. Transformer

  12. Multi-Head Attention
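
A compact sketch of scaled dot-product attention and a multi-head wrapper, assuming PyTorch (dimension names are illustrative; PyTorch also ships a built-in nn.MultiheadAttention):

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def _split(self, x, b, t):
        # (batch, seq, d_model) -> (batch, heads, seq, d_k): one subspace per head
        return x.view(b, t, self.h, self.d_k).transpose(1, 2)

    def forward(self, x):                                 # (batch, seq, d_model)
        b, t, _ = x.shape
        out = scaled_dot_product_attention(self._split(self.w_q(x), b, t),
                                           self._split(self.w_k(x), b, t),
                                           self._split(self.w_v(x), b, t))
        # concatenate the heads and apply the output projection
        return self.w_o(out.transpose(1, 2).reshape(b, t, self.h * self.d_k))

attn = MultiHeadSelfAttention(d_model=64, num_heads=4)
print(attn(torch.randn(2, 50, 64)).shape)                 # torch.Size([2, 50, 64])
```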

  13. Transformer Layer ▪ Position-wise Feed-Forward Network: applied to each position separately and identically, using the Gaussian Error Linear Unit (GELU) activation function.
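
A sketch of the position-wise feed-forward network, assuming PyTorch: the same two linear maps with a GELU in between act on every position independently (the inner dimension of 4*d is a common convention, not a value stated on this slide):

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, d_hidden=None):
        super().__init__()
        d_hidden = d_hidden or 4 * d_model    # common choice; an assumption here
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),                        # Gaussian Error Linear Unit
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        # the same weights are applied to each position separately and identically
        return self.net(x)

ffn = PositionwiseFeedForward(d_model=64)
print(ffn(torch.randn(2, 50, 64)).shape)      # torch.Size([2, 50, 64])
```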

  14. Gaussian Error Linear Units https://arxiv.org/pdf/1606.08415.pdf
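
For reference, the activation defined in the cited paper is

```latex
\mathrm{GELU}(x) = x\,\Phi(x)
\approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)
```

where Φ(x) is the cumulative distribution function of the standard normal.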

  15. Transformer Layer ▪ Stacking Transformer Layers, where LN(·) is the layer normalization function https://arxiv.org/pdf/1607.06450.pdf
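
A self-contained sketch of one stacked Transformer layer with the residual-plus-LayerNorm sublayer connection LN(x + Dropout(sublayer(x))); it uses PyTorch's built-in nn.MultiheadAttention rather than the slides' own implementation, and the hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, d_model=64, num_heads=4, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, h):
        # sublayer connection: LN(x + Dropout(sublayer(x)))
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        h = self.norm1(h + self.drop(attn_out))
        return self.norm2(h + self.drop(self.ffn(h)))

# Stacking L such layers yields the deep bidirectional encoder.
encoder = nn.Sequential(*[TransformerLayer() for _ in range(2)])
print(encoder(torch.randn(2, 50, 64)).shape)      # torch.Size([2, 50, 64])
```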

  16. Output Layer ▪ W^P: learnable projection matrix ▪ E: embedding matrix of items
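
A sketch of an output layer following the slide's notation, assuming PyTorch: the hidden state at a masked position is passed through the learnable projection W^P with a GELU and then scored against the item embedding matrix E (shared with the input layer), giving logits over the item vocabulary:

```python
import torch
import torch.nn as nn

class OutputLayer(nn.Module):
    def __init__(self, d_model, item_embedding: nn.Embedding):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)      # W^P (and its bias)
        self.act = nn.GELU()
        self.item_emb = item_embedding               # E, shared with the input layer
        self.bias = nn.Parameter(torch.zeros(item_embedding.num_embeddings))

    def forward(self, h):                            # (batch, seq, d_model)
        # logits over items: GELU(h W^P + b) E^T + output bias
        return self.act(self.proj(h)) @ self.item_emb.weight.T + self.bias

emb = nn.Embedding(1002, 64, padding_idx=0)
out = OutputLayer(64, emb)
print(out(torch.randn(2, 50, 64)).shape)             # torch.Size([2, 50, 1002])
```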

  17. Model Learning ▪ Train with the Cloze objective on the masked version of the user behavior sequence: the loss is the negative log-likelihood of the masked items, computed only at the masked positions.
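
A sketch of this masked-item objective, assuming PyTorch: positions that were not masked are excluded from the loss via an ignore label (the label convention is an assumption of this sketch):

```python
import torch
import torch.nn.functional as F

def masked_item_loss(logits, labels, ignore_index=-100):
    """logits: (batch, seq, num_items); labels: (batch, seq) holding the true
    item id at masked positions and ignore_index everywhere else."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1),
                           ignore_index=ignore_index)

# Toy example: only the two positions labelled != -100 contribute to the loss.
logits = torch.randn(2, 5, 1002)
labels = torch.full((2, 5), -100)
labels[0, 2], labels[1, 4] = 42, 7        # the two masked items
print(masked_item_loss(logits, labels))
```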

  18. EXPERIMENT

  19. Datasets

  20. Baselines ▪ Non-sequential: POP, BPR-MF, NCF ▪ Sequential: FPMC (Markov-chain based), GRU4Rec+, Caser, SASRec

  21. Evaluation metrics ▪ Hit Ratio: HR@K = (Number of Hits @ K) / |GT| ▪ Mean Reciprocal Rank: MRR = (1/N) Σ_{i=1..N} 1/rank_i over the N test instances ▪ Normalized Discounted Cumulative Gain: DCG@K = Σ_{i=1..K} (2^{rel_i} − 1) / log2(i + 1), NDCG@K = DCG@K / IDCG@K
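
A minimal sketch of the three metrics for a single test instance with one ground-truth item, as in a leave-one-out evaluation (the rank and cut-off K in the example are hypothetical):

```python
import math

def hr_at_k(rank, k):
    """Hit Ratio: 1 if the ground-truth item is ranked within the top K."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    """With a single relevant item, DCG@K = 1/log2(rank + 1) and IDCG@K = 1."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def reciprocal_rank(rank):
    """Reciprocal rank of the ground-truth item; averaged over users to get MRR."""
    return 1.0 / rank

# Example: the ground-truth item sits at position 3 of the ranked list.
print(hr_at_k(3, 10), ndcg_at_k(3, 10), reciprocal_rank(3))   # 1.0 0.5 0.333...
```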

  22. Performance ▪ Results table annotated by model family (Transformer-based, sequential, non-sequential), marking the worst-performing baselines.

  23. Analysis on Bidirection and Cloze

  24. CONCLUSION ▪ We introduce a deep bidirectional sequential model called BERT4Rec for sequential recommendation. ▪ For model training, we introduce the Cloze task which predicts the masked items using both left and right context. ▪ Extensive experimental results on four real-world datasets show that our model outperforms state-of-the-art baselines.
