  1. Xing Zhao, Qingquan Song, James Caverlee, and Xia Hu. Department of Computer Science and Engineering, Texas A&M University, USA.

  2. Dataset Statistics

     Items                         Quantity    Proportion of positive samples
     Playlists                     1,000,000
     Unique tracks                 2,262,292   100%
     Unique tracks (freq ≥ 5)        599,341   96.05%
     Unique tracks (freq ≥ 100)       70,229   80.67%
     Unique albums                   734,684
     Unique artists                  295,860

     [Figure: number of remaining tracks and cumulative proportion of positive samples as a function of how many times a track appears in the training data (1 to 40,000).]

     Therefore, in some parts of our methods, we only consider these frequent tracks for training.

  3. Our Method: TrailMix

     Cold start (Task 1): CC-Title
     Continuation (Tasks 2 to 10): DNCF and C-Tree

  4. CC-Title: Context Clustering using Titles

     [Diagram: a word-track matrix over 9,817 preprocessed title words and 2,262,292 tracks, where cell (j, i) counts the playlists whose title contains word j and that include track i (e.g., track i appears in 6 such playlists). Rows are normalized and clustered into word clusters, and each cluster recommends its tracks for a new title, e.g. "Pop Punk 2018 Summer".]

  5. CC-Title: Cont.

     Steps:
     1. Preprocessing: stemming, stop words, emoji, punctuation, etc.
     2. Building a word-track matrix of size 9,817 x 2,262,292
     3. Normalizing cells using 'IDF'
     4. Clustering words based on row similarity
     5. Recommending tracks in each cluster for a new title

     Items                                      Quantity
     Unique titles                              92,944
     Unique normalized titles                   17,381
     Unique non-stop normalized words           9,817
     Playlists without title after processing   22,921
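The steps above can be sketched in plain Python. The exact IDF weighting and clustering algorithm are not fixed on the slide, so the IDF-style scheme and the greedy single-pass clustering below are illustrative assumptions:

```python
import math
from collections import defaultdict

def build_word_track_matrix(playlists):
    """Rows: title words; columns: tracks. Cell (word, track) counts the
    playlists whose (normalized) title contains the word and that
    include the track. `playlists` is a list of (title_words, tracks)."""
    rows = defaultdict(lambda: defaultdict(int))
    for title_words, tracks in playlists:
        for word in dict.fromkeys(title_words):  # dedup, keep order
            for track in tracks:
                rows[word][track] += 1
    return rows

def idf_normalize(rows):
    """Down-weight tracks that occur under many words (an IDF-style
    scheme; the talk only says cells are normalized 'using IDF')."""
    n_words = len(rows)
    track_df = defaultdict(int)
    for counts in rows.values():
        for track in counts:
            track_df[track] += 1
    return {word: {t: c * math.log((1 + n_words) / (1 + track_df[t]))
                   for t, c in counts.items()}
            for word, counts in rows.items()}

def cosine(u, v):
    num = sum(u[t] * v[t] for t in set(u) & set(v))
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def cluster_words(rows, threshold=0.3):
    """Greedy single-pass clustering of word rows by cosine similarity,
    a stand-in for whatever clustering the authors actually used."""
    clusters = []  # (representative row, member words)
    for word, row in rows.items():
        for rep, members in clusters:
            if cosine(rep, row) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((row, [word]))
    return clusters
```

A new title is then matched to the clusters of its words, and the tracks with the highest aggregate weights in those clusters are recommended.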

  6. CC-Title: Cont.

     Highlights:
     1. CC-Title handles large-scale matrix computation efficiently.
     2. In some cases (clusters), the performance is very good.

  7. DNCF: Decorated Neural Collaborative Filtering

     Neural Collaborative Filtering
     Pros:
     1. Simple and generic.
     2. Ensembles the advantages of the basic matrix factorization model and an MLP.
     Cons: Computationally inefficient to apply directly to the target problem, due to the huge item scope and the matrix sparsity.

     He et al., "Neural Collaborative Filtering". WWW, 2017.
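For reference, the NCF architecture this slide builds on can be sketched as a forward pass in plain Python. This is He et al.'s generic NCF fusion (GMF branch plus MLP branch), not the authors' decorated variant; the layer sizes and random weights are purely illustrative:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(v):
    return [max(0.0, x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class TinyNCF:
    """Minimal NCF forward pass: a GMF branch (elementwise product of
    user/item embeddings) fused with an MLP branch (concatenated
    embeddings through one hidden layer). Training is omitted."""

    def __init__(self, n_users, n_items, dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        r = lambda n: [rng.uniform(-0.1, 0.1) for _ in range(n)]
        self.p = [r(dim) for _ in range(n_users)]       # user embeddings
        self.q = [r(dim) for _ in range(n_items)]       # item embeddings
        self.w_gmf = r(dim)                             # GMF output weights
        self.w_hidden = [r(2 * dim) for _ in range(hidden)]
        self.w_mlp = r(hidden)                          # MLP output weights

    def predict(self, u, i):
        pu, qi = self.p[u], self.q[i]
        gmf = [a * b for a, b in zip(pu, qi)]           # elementwise product
        h = relu([dot(w, pu + qi) for w in self.w_hidden])
        return sigmoid(dot(self.w_gmf, gmf) + dot(self.w_mlp, h))
```

Even this toy version makes the slide's "cons" visible: scoring one user against all 2,262,292 tracks means millions of forward passes, which motivates the constraints on the next slides.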

  8. DNCF: Cont.

     Two modifications to address the efficiency issue:
     Training phase: constrained negative sampling.
     Testing phase: constrained recommendation with reordering.

  9. DNCF: Cont.

     Training phase: constrained negative sampling.
     1. Constrain the negative sampling space to the tracks appearing 100 or more times in the training data.
     2. Positive samples still span the whole dataset during training, to preserve feasible embeddings and predictions for all the testing data (Tasks 2-10).

     [Figure: same track-frequency distribution as on the dataset statistics slide.]
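A minimal sketch of this sampling scheme, assuming playlists arrive as (id, track list) pairs; the (playlist, track, label) triple format is an assumed training representation, not taken from the talk:

```python
import random
from collections import Counter

def constrained_negative_samples(playlists, min_count=100, seed=0):
    """Yield (playlist_id, track, label) training triples. Negatives are
    drawn only from tracks appearing >= min_count times in the training
    data, while positives keep the full track vocabulary, matching the
    two constraints on the slide."""
    rng = random.Random(seed)
    freq = Counter(t for _, tracks in playlists for t in tracks)
    popular = sorted(t for t, c in freq.items() if c >= min_count)
    for pid, tracks in playlists:
        members = set(tracks)
        # never sample a playlist's own tracks as negatives
        candidates = [t for t in popular if t not in members]
        for t in tracks:
            yield pid, t, 1                      # positive: any track
            if candidates:                       # negative: popular only
                yield pid, rng.choice(candidates), 0
```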

  10. DNCF: Cont.

     Testing phase: constrained recommendation with reordering.
     1. Constrain the recommendation space by recommending only the popular tracks (appearing >= 100 times) during the testing phase, toward a more targeted prediction.
     2. Reorder the predicted 500 tracks with an ensemble trick leveraging two types of predictions provided by the Word2Vec embedding.

     [Diagram: three scoring functions φ1 (Word2Vec 1), φ2 (Word2Vec 2), and φ3 (DNCF) produce candidate lists L1, L2, and L3, which are combined (e.g., φ1 restricted outside L1 ∪ L2 ∪ L3) into the reordered output.]
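The reordering step can be sketched as follows. Treating "more of the auxiliary lists agree" as the ensemble signal is one plausible reading of the φ1/φ2/φ3 diagram, not the authors' exact rule:

```python
def reorder(candidates, ranked_lists, k=500):
    """Reorder candidate tracks so that tracks endorsed by more of the
    auxiliary ranked lists (e.g., two Word2Vec lists and the DNCF list)
    come first; ties break on the best rank any list assigns, and the
    sort is stable for tracks no list mentions."""
    ranks = [{t: i for i, t in enumerate(lst)} for lst in ranked_lists]

    def key(track):
        votes = sum(track in r for r in ranks)
        best = min((r[track] for r in ranks if track in r),
                   default=len(candidates))
        return (-votes, best)

    return sorted(candidates, key=key)[:k]
```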

  11. DNCF: Result

     Highlights:
     1. Results steadily increase, with maximum performance at 25 seeds.
     2. It performs better for playlists with random seed tracks (R) than with sequential seed tracks.

  12. C-Tree: Constructed Tree

     A playlist is:
     1. A natural tree structure: a playlist consists of different tracks, and each track belongs to a specific album by an artist.
     2. A meaningful cluster: the tracks in a specific playlist always share latent similarity, such as genre, style, listening sense, etc.

     [Image: phylogenetic tree. Source: https://www.creative-biostructure.com/custom-phylogenetic-tree-construction-service-399.htm]

  13. C-Tree: Cont.

     A real example (PID: 11548):
     • Playlist title: "Pop Puck"
     • 48 tracks belonging to 12 albums by 5 artists (2 rock bands and 3 pop punk bands)

     How do we compare the internal relationships?
     How do we compare it with another tree (external)?

  14. C-Tree: Cont.

     External comparison
     Incomplete tree: a playlist that contains only part of its tracks (the seeds), waiting for recommendation.
     Training data: complete trees.
     Testing data: incomplete trees.

  15. C-Tree: Cont.

     Steps:
     1. Building the forest: 1 million complete trees (Playlist 1 ... Playlist n).
     2. Comparing and normalizing the distance between the incomplete tree T-test and each complete tree T-train.
     3. Recommending the tracks (leaves) from each T-train to the incomplete tree T-test, based on the score of each leaf.
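A toy version of these steps, representing each playlist tree as artist → album → tracks. The Jaccard overlap of internal nodes is a simplified stand-in for the normalized tree distance the slide mentions:

```python
def tree_overlap(seed_tree, full_tree):
    """Similarity between an incomplete (seed) tree and a complete tree:
    Jaccard overlap of their internal nodes (artists and albums)."""
    def nodes(tree):
        s = set()
        for artist, albums in tree.items():
            s.add(("artist", artist))
            for album in albums:
                s.add(("album", artist, album))
        return s
    a, b = nodes(seed_tree), nodes(full_tree)
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(seed_tree, forest, k=10):
    """Score every leaf (track) of every complete tree by its tree's
    overlap with the seed tree, then return the top-k unseen tracks."""
    seen = {t for albums in seed_tree.values()
            for tracks in albums.values() for t in tracks}
    scores = {}
    for tree in forest:
        w = tree_overlap(seed_tree, tree)
        if w == 0.0:
            continue  # unrelated tree contributes nothing
        for albums in tree.values():
            for tracks in albums.values():
                for t in tracks:
                    if t not in seen:
                        scores[t] = scores.get(t, 0.0) + w
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```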

  16. C-Tree: Result

     Highlights:
     1. Results steadily increase, with maximum performance at 25 seeds.
     2. It performs better for playlists with random seed tracks (R) than with sequential seed tracks.

  17. TrailMix: Ensemble Model

     [Diagram: based on num_holdout, two methods (Method 1 and Method 2) are selected from CC-Title, DNCF (A/B), and C-Tree (A/B), and their outputs are combined into the final recommendation.]
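The ensemble dispatch can be sketched as below. The task-to-methods mapping and the interleaving merge are illustrative assumptions, since the slide's diagram only shows that num_holdout selects two methods whose outputs are combined:

```python
def trailmix(task_methods, task_id, seeds, k=500):
    """Dispatch sketch for the TrailMix ensemble: each challenge task
    maps to a pair (method_1, method_2); their ranked lists are
    interleaved (method_1 wins ties), seed tracks are excluded, and the
    top-k merged tracks form the final recommendation."""
    m1, m2 = task_methods[task_id]
    l1, l2 = m1(seeds), m2(seeds)
    out, seen = [], set(seeds)
    for a, b in zip(l1 + [None] * len(l2), l2 + [None] * len(l1)):
        for t in (a, b):
            if t is not None and t not in seen:
                out.append(t)
                seen.add(t)
    return out[:k]
```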

  18. Experiment and Result

     Experiment setting:
     • Training 80%, testing 20%; cross-validation for hyperparameter tuning.
     • Testing data strictly follows the rules designed by the RecSys Challenge 2018.

  19. Thank you!

  20. Q&A
