Recommender System 2017 Challenge Polimi - Team palo

  1. Recommender System 2017 Challenge Polimi
     Team palo: Angelo Falci, Riccardo Cantoni

  2. Hybrid Algorithm
     The core of our solution is a hybrid algorithm that joins the 10 different models described in the following slides. In order to join these 10 models we:
     1) Divided each R matrix by its highest value, so that every R matrix contains values between 0 and 1.
     2) Summed the 10 R matrices, giving each one a different weight in order to give more importance to certain models and less to others. For example: R1*4 + R2*3 + R3*1 + R4*5 + R5*7 + R6*2 + R7*8 + R8*6 + R9*2 + R10*5.
     After this we created an algorithm that searches for the optimal weight values (or rather, the stable ones) based on the MAP scores measured on our personal test set.
     For the models that generate an R matrix with too high a density, we keep only the k highest elements, so that summing all the R matrices does not cause a memory error (see the sketch below).
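
     The slides give no code; a minimal sketch of this combination step, assuming the R matrices are non-negative scipy sparse matrices and using our own (hypothetical) function names:

        import numpy as np
        import scipy.sparse as sps

        def keep_top_k_per_row(R, k):
            """Keep only the k highest scores in each row to bound memory usage."""
            R = R.tocsr()
            for row in range(R.shape[0]):
                start, end = R.indptr[row], R.indptr[row + 1]
                data = R.data[start:end]          # view into R.data, edited in place
                if data.size > k:
                    threshold = np.partition(data, -k)[-k]
                    data[data < threshold] = 0.0
            R.eliminate_zeros()
            return R

        def hybrid(R_list, weights, k=100):
            """Scale each estimated-ratings matrix into [0, 1], prune it, then sum with weights."""
            combined = None
            for R, w in zip(R_list, weights):
                R = keep_top_k_per_row(R, k)
                term = w * (R / R.max())          # values now between 0 and 1
                combined = term if combined is None else combined + term
            return combined

     Pruning each R matrix before the sum keeps the combined matrix sparse, which is what avoids the memory error mentioned above.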

  3. Content Based using Artists and Albums
     In the first 2 models we used two content-based approaches for the predictions: in one we used the track-to-track similarity based on Artists, in the other the one based on Albums.
     In our best content-based models for artists and albums we did two main things:
     1) We divided every row of the C matrix by the total sum of that row plus a constant, in order to penalize more the artists and albums that appear few times (best constants found: 14 and 10).
     2) Before multiplying the S matrix with the URM, we divided every column of the URM matrix by the total sum of that column plus a constant, in order to penalize more the tracks that appear few times in the playlists (best constant found: 17).
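
     A sketch of the two normalizations (our scipy-based formulation and names, not the team's code), assuming C is the track-by-feature matrix and URM the playlist-by-track matrix:

        import numpy as np
        import scipy.sparse as sps

        def normalize_rows(C, constant):
            """Divide every row of C by (row sum + constant)."""
            row_sums = np.asarray(C.sum(axis=1)).ravel()
            return sps.diags(1.0 / (row_sums + constant)) @ C

        def normalize_cols(URM, constant):
            """Divide every column of URM by (column sum + constant)."""
            col_sums = np.asarray(URM.sum(axis=0)).ravel()
            return URM @ sps.diags(1.0 / (col_sums + constant))

        # e.g. C_albums_norm = normalize_rows(C_albums, 10)
        #      URM_norm = normalize_cols(URM, 17)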

  4. Content Based using Tags with Low Frequency
     In the third content-based model we predicted the tracks using the track-to-track similarity based on Tags that appear few times.
     In our best content-based model on low-frequency tags we did three main things:
     1) We selected only the tags that appear at least 0 times and at most 100 times.
     2) We divided every row of the C matrix by the total sum of that row plus a constant, in order to penalize more the tags that appear few times (best value found: 28).
     3) Before multiplying the S matrix with the URM, we divided every column of the URM matrix by the total sum of that column plus a constant, in order to penalize more the tracks that appear few times in the playlists (best value found: 75).
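
     A possible sketch of the frequency-band selection (names are ours; ICM_tags is assumed to be a track-by-tag sparse matrix):

        import numpy as np

        def select_tags_by_frequency(ICM_tags, low, high):
            """Keep only the tag columns that appear between `low` and `high` times."""
            freq = np.asarray((ICM_tags > 0).sum(axis=0)).ravel()
            mask = (freq >= low) & (freq <= high)
            return ICM_tags[:, mask]

        # This model:           select_tags_by_frequency(ICM_tags, 0, 100)
        # The next slide uses:  select_tags_by_frequency(ICM_tags, 101, 2000)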

  5. Content Based using Tags with High Frequency
     In the fourth content-based model we predicted the tracks using the track-to-track similarity based on Tags that appear with a high frequency.
     In our best content-based model for high-frequency tags we did three main things:
     1) We selected only the tags that appear at least 101 times and at most 2000 times.
     2) We divided every row of the C matrix by the total sum of that row plus a constant, in order to penalize more the tags that appear few times (best value found: 27).
     3) We divided every column of the URM matrix by the total sum of that column plus a constant, in order to penalize more the tracks that appear few times in the playlists (best value found: 36).
     Then, using shrinkage, we calculated the similarity with the cosine measure and kept only the k highest values in the S matrix. Best model: shrinkage = 9, k = 50 (see the sketch below).
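
     A minimal sketch of the shrunk cosine similarity with top-k pruning (dense for readability; a real implementation would stay sparse and work in blocks):

        import numpy as np
        import scipy.sparse as sps

        def shrunk_cosine_topk(ICM, shrinkage=9, k=50):
            """S[i, j] = (c_i . c_j) / (||c_i|| * ||c_j|| + shrinkage), keeping the k highest per row."""
            norms = np.sqrt(np.asarray(ICM.multiply(ICM).sum(axis=1)).ravel())
            S = (ICM @ ICM.T).toarray() / (np.outer(norms, norms) + shrinkage)
            np.fill_diagonal(S, 0.0)                    # an item is not its own neighbor
            # Zero everything except the k highest similarities in each row.
            low_idx = np.argpartition(S, -k, axis=1)[:, :-k]
            np.put_along_axis(S, low_idx, 0.0, axis=1)
            return sps.csr_matrix(S)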

  6. User Content Based
     We applied the same content-based algorithm seen before to find similarities among playlists, and then recommended tracks according to these similarities. We took into account the following attributes:
     ● owner
     ● title
     In this way we recommend tracks that belong to the same users or that are inside playlists with the same titles.

  7. Item and User Collaborative Filtering
     These models are completely based on the user interactions (the URM matrix), and their focus is on finding the similarity between items and between users.
     Features implemented:
     ● IDF normalization of the URM matrix
     ● shrinkage factor
     ● possibility to keep only the k highest similarities in the S matrix
     ● different similarity measures: cosine, Jaccard, conditional probability, and a simple dot product
     ● SVD decomposition
     Best models found:
     ● item based: IDF normalization, k = 20, shrinkage = 5, similarity = cosine, no SVD decomposition
     ● user based: IDF normalization, k = 50, shrinkage = 5, similarity = cosine, no SVD decomposition
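
     The slide does not state the exact IDF variant; a common formulation, down-weighting tracks that occur in many playlists, might look like this:

        import numpy as np
        import scipy.sparse as sps

        def idf_normalize(URM):
            """Weight each track (column) by how rarely it appears across playlists."""
            n_playlists = URM.shape[0]
            track_freq = np.asarray((URM > 0).sum(axis=0)).ravel()
            idf = np.log(n_playlists / (track_freq + 1.0))   # +1 avoids division by zero
            return URM @ sps.diags(idf)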

  8. Matrix Factorization
     We tried different models based on matrix factorization:
     ● BPRMF
     ● AsySVD
     ● IALS
     But only one of these models gave us a positive result: IALS. The configuration of that model is: number of iterations = 5, 750 factors, learning rate = 0.01, regularization = 0.015.
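
     For reference, a comparable configuration with the open-source `implicit` library (our choice of library, not confirmed by the slides; note that an ALS solver has no learning rate, so that parameter has no counterpart here):

        import implicit

        # factors, regularization and iterations taken from the slide
        model = implicit.als.AlternatingLeastSquares(
            factors=750,
            regularization=0.015,
            iterations=5,
        )
        # Depending on the library version, fit() expects the user-by-item
        # or the item-by-user confidence matrix in CSR format:
        # model.fit(URM)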

  9. Graph Based Model
     This model takes into account user interactions, items' attributes and users' attributes. All the data is represented as a k-partite graph. Starting from each user node we want to compute the probability of reaching each item node, and we then use this probability as a rating. The algorithm used for computing the ratings is the random walk with restart.
     Configuration of the random walk:
     ● number of iterations: 7
     ● probability to restart: 0.5
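
     A minimal sketch of the random walk with restart, assuming P is the row-stochastic transition matrix of the k-partite graph (dense NumPy here for simplicity) and start is the one-hot vector of a user node:

        import numpy as np

        def random_walk_with_restart(P, start, restart_prob=0.5, n_iterations=7):
            """Iterate p <- (1 - c) * p @ P + c * start; the item entries of p become the ratings."""
            p = start.copy()
            for _ in range(n_iterations):
                p = (1.0 - restart_prob) * (p @ P) + restart_prob * start
            return p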

  10. Algorithm to find the Best Constants
     In order to find the best constants in each single model, we wrote an algorithm that searches for them by modifying one constant at a time.

  11. Algorithm to find the best weights for the hybrid algorithm
     1) Start with a vector that represents the weight of each model (for example [1, 1, ..., 1, 1]).
     2) Try to change one single weight and calculate the MAP.
     3) Apply the change that gives the best MAP and repeat the algorithm with the new weight vector.
     4) If we don't find any better solution, try to change the weights in a different way (for example, add 2 instead of 1). A sketch follows below.
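
     A sketch of this hill-climbing search, where evaluate_map is a hypothetical callback that scores a weight vector with MAP on our personal test set:

        import numpy as np

        def search_weights(evaluate_map, n_models=10, step=1.0):
            """Greedy search: change one weight at a time, keep changes that raise the MAP."""
            weights = np.ones(n_models)
            best_map = evaluate_map(weights)
            improved = True
            while improved:
                improved = False
                for i in range(n_models):
                    for delta in (step, -step):
                        candidate = weights.copy()
                        candidate[i] += delta
                        score = evaluate_map(candidate)
                        if score > best_map:
                            best_map, weights = score, candidate
                            improved = True
            # Step 4 of the slide: if stuck, retry with a different step (e.g. 2 instead of 1).
            return weights, best_map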

  12. Dataset
     The original dataset was split into a training set and a test set:
     ● Test set: from the playlists with at least 10 tracks we randomly sampled 5 tracks each. The size of our test set is 20% of the whole dataset.
     ● The remaining interactions were used as the training set.
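
     A sketch of this split under the stated rules (hypothetical names; playlists maps each playlist id to its list of track ids):

        import numpy as np

        def split_train_test(playlists, n_holdout=5, min_tracks=10, seed=42):
            """Hold out n_holdout random tracks from every playlist with at least min_tracks."""
            rng = np.random.default_rng(seed)
            train, test = {}, {}
            for playlist_id, tracks in playlists.items():
                if len(tracks) >= min_tracks:
                    held_out = set(rng.choice(tracks, size=n_holdout, replace=False).tolist())
                    test[playlist_id] = held_out
                    train[playlist_id] = [t for t in tracks if t not in held_out]
                else:
                    train[playlist_id] = list(tracks)
            return train, test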

  13. Results
     [Charts: performance of all the models]
     Best weights found for the hybrid: [19.0, 21.0, 19.0, 2.0, -7.0, 22.0, 29.0, 31.0, 10.0, 227.0]
     (order: [cbf_Al, cbf_Ar, cbf_T, cbf_TL, ucb_title, ucb_owner, basic_itemCF, basic_userCF, ials, gb])

  14. Thanks for your attention
