2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)
Chicago, IL (USA), 27 October 2011

Hybrid algorithms for recommending new items
http://dx.doi.org/10.1145/2039320.2039325

Roberto Turrin (Moviri, R&D)
Paolo Cremonesi (Politecnico di Milano)
Fabio Airoldi (Moviri, R&D)
..in a nutshell
• Hybrid algorithms
• Real domain requirements
  • scalability
  • modularity
  • many unrated items
• New-item stressing experiments
• Datasets
  • Private TV dataset
  • MovieLens
Credits: http://dpaki.com/?p=2591
Traditional recommender systems
Collaborative (CF)
• Pros: high quality
• Cons: new items problem (since they do not have ratings); popularity bias
Content-based (CBF)
• Pros: work on new items
• Cons: low quality (since user ratings are ignored); profile overfitting
R. TURRIN, P. Cremonesi, F. Airoldi - Hybrid algorithms for recommending new items
..so CF or CBF? ..many variables
[Figure: quality over time for CF and CBF, from new system to mature system]
TV domain: new items
• The EPG (electronic program guide) is characterized by many unrated, new TV programs
• The percentage of new items cannot be neglected
Existing hybrid algorithms
• Several hybrid algorithms mix CF and CBF (but also demographics, social), e.g.:
  • P. Melville, R. J. Mooney, and R. Nagarajan. “Content-boosted collaborative filtering for improved recommendations”, 2002
  • B. Mobasher, X. Jin, and Y. Zhou. “Semantically Enhanced Collaborative Filtering on the Web”, 2003
• Pros
  • Some approaches show better quality than CF/CBF
• Cons
  • Low scalability / no real-time recommendations
  • Only partial focus on new-item problem
  • Not working with implicit, binary ratings
Our hybrid algorithms
• GOALS
  • New-item recommendation
  • Quality comparable to collaborative
• REQUIREMENTS:
  • Batch/real-time scalability/complexity
  • Updated recommendations
  • Modularity: ability to re-use existing CF and CBF algorithms
  • Implicit/explicit ratings
Main contributions
• Two hybrid algorithms:
  • extension of SimComb algorithm
  • introduction of a new hybrid algorithm
• New-item stressing evaluation
STATE-OF-THE-ART RECOMMENDER ALGORITHMS
Collaborative algorithms
User Rating Matrix (URM): the entry for user u and item i is the rating given by user u to item i. In an implicit dataset it is either 1 or 0.
Implemented strategies:
• Item-item neighborhood-based (NNCos): recommendations are based on item-item similarities computed with the cosine metric
• Latent factor models (PureSVD): recommendations are based on hidden factors implicitly discovered by means of a matrix factorization (SVD)
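As an illustration of the item-item cosine metric behind NNCos, here is a minimal sketch in plain Python. The function name `item_cosine_similarities` and the toy URM are assumptions made for this example, not taken from the slides. The third item has no ratings, so all its similarities are zero, which is the new-item problem in miniature.

```python
from math import sqrt

def item_cosine_similarities(urm):
    """urm: list of user rows, each a list of ratings (0 = unrated).

    Returns a dense item-item matrix of cosine similarities between
    item rating profiles (the columns of the URM).
    """
    n_items = len(urm[0])
    # One rating profile (column vector) per item.
    columns = [[row[i] for row in urm] for i in range(n_items)]
    norms = [sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            # Self-similarity is left at 0; unrated items have zero norm.
            if i != j and norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Hypothetical implicit URM: 3 users x 3 items; item 2 is new (no ratings).
urm = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
]
sim = item_cosine_similarities(urm)
```

The dense double loop is only for readability; a real system would work on sparse data.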
Content-based algorithm
Item-content matrix (ICM): the entry for item i and feature f is the weight of feature f in item i.
• Computed as TF-IDF
• Example of features: genre, actors, directors, …
LSA (Latent Semantic Analysis): the ICM is factorized by means of SVD in order to discover latent semantic …
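The slides do not say which TF-IDF variant is used, so the sketch below assumes one common choice: term frequency normalized by the number of features of the item, and idf = log(#items / #items containing the feature). The function name `tfidf_icm` and the sample features are hypothetical.

```python
from math import log

def tfidf_icm(item_features):
    """item_features: dict item -> list of feature strings.

    Returns a sparse ICM as dict item -> {feature: TF-IDF weight}.
    """
    n_items = len(item_features)
    # Document frequency: number of items in which each feature occurs.
    df = {}
    for feats in item_features.values():
        for f in set(feats):
            df[f] = df.get(f, 0) + 1
    icm = {}
    for item, feats in item_features.items():
        weights = {}
        for f in set(feats):
            tf = feats.count(f) / len(feats)   # assumed TF normalization
            idf = log(n_items / df[f])         # assumed IDF variant
            weights[f] = tf * idf
        icm[item] = weights
    return icm

# Hypothetical item metadata (genre and actor features).
items = {
    "A": ["comedy", "actor_x"],
    "B": ["comedy", "actor_y"],
    "C": ["drama", "actor_x"],
}
icm = tfidf_icm(items)
```

Rare features (here "drama") receive a higher weight than features shared by several items.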
Hybrid algorithms
Interleaved (INTL)
• Trivial hybrid implementation where the final recommendation list is formed by alternating items recommended by the CF algorithm with items recommended by the CBF algorithm
• Example: CF list = Item A, Item B, Item C; CBF list = Item Z, Item Y, Item X; hybrid list = Item A, Item Z, Item B, Item Y
SimComb [Mobasher et al. 2004]
• Two item-item similarity matrices are computed and linearly combined:
  (1 - α) · (CF item-item similarities) + α · (CBF item-item similarities) = HYBRID item-item similarities
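Both baseline hybrids can be sketched in a few lines of plain Python. The function names `interleave` and `simcomb`, the duplicate-skipping rule, and the toy similarity values are assumptions for illustration; the item lists reproduce the slide's example.

```python
from itertools import zip_longest

def interleave(cf_list, cbf_list, n):
    """Alternate CF and CBF recommendations, skipping duplicates, up to n items."""
    hybrid = []
    for pair in zip_longest(cf_list, cbf_list):
        for item in pair:
            if item is not None and item not in hybrid and len(hybrid) < n:
                hybrid.append(item)
    return hybrid

def simcomb(cf_sim, cbf_sim, alpha):
    """Element-wise linear combination: (1 - alpha) * CF + alpha * CBF."""
    return [
        [(1 - alpha) * c + alpha * b for c, b in zip(cf_row, cbf_row)]
        for cf_row, cbf_row in zip(cf_sim, cbf_sim)
    ]

# The example from the slide: CF list A, B, C and CBF list Z, Y, X.
hybrid_list = interleave(["A", "B", "C"], ["Z", "Y", "X"], 4)

# Hypothetical 2 x 2 similarity matrices.
cf_sim = [[0.0, 0.8], [0.8, 0.0]]
cbf_sim = [[0.0, 0.2], [0.2, 0.0]]
hybrid_sim = simcomb(cf_sim, cbf_sim, 0.5)
```

With α = 0 the combination reduces to pure CF similarities, and with α = 1 to pure CBF similarities.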
PROPOSED HYBRID ALGORITHMS
• FFA (Filtered Feature Augmentation)
• SIMinjKnn (Similarity Injection Knn)
Collaborative filtering as main brick
• CF: we trust CF recommendations when the model has been trained with “enough” information (i.e., ratings)
• CBF: we add CBF-based data (i.e., ratings or features) to better train the CF when not enough information is available
Item-item model
KNN item-item similarity matrix (indexed by items i and j)
A number of recommendation algorithms (both CF and CBF) can compute item-item similarities.
Item-item model: real-time recommendations
[Figure: user ratings (entries +, -, ?) and the KNN item-item similarity matrix, indexed by items i and j]
Item-item model: real-time recommendations
[Figure: user ratings (entries +, -, ?) * MODEL]
Real-time requirements:
• Memory: K * #items
• Memory: #items
• Time: f(#ratings, K) * #items
• Use of existing algorithms
• Updated recommendations
• Implicit/explicit ratings
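A minimal sketch of how such a model could serve real-time recommendations, assuming the model stores only the K most similar items per item (hence K * #items memory) and that an item's score is the similarity-weighted sum of the user's current ratings. The names `build_knn_model` and `recommend` and the similarity values are hypothetical.

```python
def build_knn_model(sim, k):
    """Keep, for each item, only its k most similar items (sparse rows)."""
    model = []
    for j, row in enumerate(sim):
        neighbors = sorted(
            ((s, i) for i, s in enumerate(row) if i != j and s > 0),
            reverse=True,
        )[:k]
        model.append({i: s for s, i in neighbors})
    return model

def recommend(user_ratings, model, n):
    """user_ratings: dict item index -> rating. Returns top-n unrated items."""
    scores = {}
    for j, neighbors in enumerate(model):
        if j in user_ratings:
            continue  # never recommend items the user has already rated
        scores[j] = sum(s * user_ratings.get(i, 0) for i, s in neighbors.items())
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return ranked[:n]

# Hypothetical item-item similarities for 4 items.
sim = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.3],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.3, 0.8, 0.0],
]
model = build_knn_model(sim, k=2)
top = recommend({0: 1}, model, n=2)
```

Because the user profile is passed in at request time, a new rating changes the recommendations immediately without retraining the model.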
Filtered Feature Augmentation (FFA)
Idea: add pseudo-ratings to the item profiles
Motivation
• Pseudo-ratings model new items
• Less sparse item-profiles
Filter: entropy-based filtering (e.g., Gini impurity measure) of the predicted ratings
[Figure: CONTENT → CBF → predicted ratings → Filter; the filtered pseudo-ratings and the RATINGS feed the CF, which produces the Model]
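The slides do not specify how the filter applies the impurity measure, so the sketch below only illustrates the two ingredients under stated assumptions: `gini_impurity` computes the standard measure 1 - Σ p², and `augment_urm` fills unrated cells with CBF pseudo-ratings that pass a pluggable filter. The threshold filter used in the example is a stand-in, not the authors' filter, and all names and values are hypothetical.

```python
def gini_impurity(values):
    """Gini impurity 1 - sum(p^2) of non-negative values normalized to sum 1."""
    total = sum(values)
    if total == 0:
        return 0.0
    return 1.0 - sum((v / total) ** 2 for v in values)

def augment_urm(urm, pseudo, keep):
    """Fill unrated cells (0) with CBF pseudo-ratings accepted by keep(u, i, p)."""
    augmented = [row[:] for row in urm]
    for u, row in enumerate(urm):
        for i, r in enumerate(row):
            if r == 0 and keep(u, i, pseudo[u][i]):
                augmented[u][i] = pseudo[u][i]
    return augmented

# Hypothetical data: item 2 is new (no real ratings).
urm = [
    [1, 0, 0],
    [0, 1, 0],
]
pseudo = [
    [0.9, 0.1, 0.8],
    [0.2, 0.9, 0.3],
]
# Stand-in filter (assumption): keep only confident pseudo-ratings.
augmented = augment_urm(urm, pseudo, keep=lambda u, i, p: p >= 0.7)
```

The augmented matrix can then be passed unchanged to any existing CF algorithm, which is what makes the approach modular.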
Similarity Injection Knn (SIMinjKnn)
Idea: mixing CF and CBF similarities
Motivation
• Discovering relationships between new and old items
[Figure: CONTENT → CBF → Model and RATINGS → CF → Model, with the two models merged by a Combiner]
EVALUATION
Datasets
• ML: 1M Movielens
  • ~6K users, ~3.9K items, 1M ratings
• TV: an implicit, binary dataset collected from 15’000 IPTV users over a period of six months
  • ~15K users, ~800 rated items out of ~4K, ~26K ratings
  • Multilanguage (mainly German, French) content data
  • Available at http://home.dei.polimi.it/cremones/memo/downloads/TV2.zip
Testing methodology (1)
• H1: set of existing items
• H2: set of new items
• Training set: extracted from H1
• Test set:
  • (100 - β)% existing items: extracted from H1
  • β% new items: extracted from H2
• Remaining ratings are discarded
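One way to realize this split is sketched below. The slides do not give the sampling details, so the random sampling, the `test_size` parameter, and the function name `new_item_split` are assumptions; ratings are represented as unique (user, item) pairs.

```python
import random

def new_item_split(ratings, new_items, test_size, beta, seed=0):
    """Split (user, item) ratings into a training set and a test set.

    beta percent of the test ratings come from new items (H2), the rest
    from existing items (H1). H2 ratings not sampled are discarded.
    """
    rng = random.Random(seed)
    h1 = [r for r in ratings if r[1] not in new_items]
    h2 = [r for r in ratings if r[1] in new_items]
    n_new = round(test_size * beta / 100)
    n_old = test_size - n_new
    test_new = rng.sample(h2, n_new)
    test_old = rng.sample(h1, n_old)
    held_out = set(test_old)
    # Training ratings come only from H1, so new items stay unrated.
    train = [r for r in h1 if r not in held_out]
    return train, test_old + test_new

# Hypothetical data: 10 users x 5 items, item 4 plays the role of a new item.
ratings = [(u, i) for u in range(10) for i in range(5)]
train, test = new_item_split(ratings, new_items={4}, test_size=10, beta=30)
```

Raising β stresses the algorithms with a larger share of test items that have no ratings in the training set.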
Testing methodology (2)
• For each <user, item> pair <u, i> in H1+2:
  • Generate rating prediction for i
  • Generate rating prediction for every other item
  • Sort the items according to predicted rating
  • There is a “hit” if rank(i) < N, i.e., item i appears in the top-N. In our tests, N=20
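The hit test can be sketched as follows, assuming a 0-based rank (the number of items with a strictly higher predicted rating), so that rank(i) < N means item i is in the top-N. The names `is_hit` and `hit_rate` and the toy scores are hypothetical; aggregating hits into a fraction is an assumption about how the results are summarized.

```python
def is_hit(scores, target, n):
    """scores: dict item -> predicted rating for one user.

    The 0-based rank of target is the number of other items scored
    strictly higher; a hit means the target is within the top n.
    """
    rank = sum(
        1 for item, s in scores.items()
        if item != target and s > scores[target]
    )
    return rank < n

def hit_rate(test_cases, n):
    """test_cases: list of (scores, target) pairs. Returns the fraction of hits."""
    hits = sum(is_hit(scores, target, n) for scores, target in test_cases)
    return hits / len(test_cases)

# Hypothetical predictions for one user over three items.
scores = {"a": 0.9, "b": 0.5, "c": 0.1}
rate = hit_rate([(scores, "a"), (scores, "b")], n=1)
```

With n=1 only the top-scored item counts as a hit, so the test on "a" succeeds and the test on "b" fails.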
Non-hybrid algorithms
[Figure: results on the ML and TV datasets]
Hybrid algorithms: ML
[Figure: results on the ML dataset]
Hybrid algorithms: TV
[Figure: results on the TV dataset]