Distributed Collaborative Filtering and Adaptive User-to-User Correlation
Francesco Ricci
Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
Joint work with Shlomo Berkovsky (CSIRO), Tsvi Kuflik (University of Haifa), and Linas Baltrunas (University of Bozen)
Content
• Introduction to recommender systems and collaborative filtering
• Motivations:
  • Decentralized collaborative filtering
  • Improve accuracy by partitioning ratings and re-aggregating information
• Domain-based rating partitioning
• Importing user modelling information
• Computing inter-domain correlations
• Evaluation
• Extension: adapting the similarity metric to the prediction problem
What movie should I see?
The Internet Movie Database (IMDb) provides information about actors, films, television shows, television stars, video games and production crew personnel.
• Owned by Amazon.com since 1998
• As of September 15, 2008, IMDb featured 1,039,447 titles and 2,723,306 people
• More than 57M users per month
Recommender Systems
• In everyday life we rely on recommendations from other people, either by word of mouth, recommendation letters, or movie and book reviews printed in newspapers …
• In a typical recommender system people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients
  • Aggregation of recommendations
  • Match the recommendations with those searching for recommendations
[Resnick and Varian, 1997]
Examples
• Amazon.com – looks at the user's past buying history and recommends products bought by users with similar buying behavior
• Tripadvisor.com – quotes product reviews from a community of users
• Myproductadvisor.com – asks questions about the benefits sought (product features) to reduce the number of candidate products
• Yahoo.com – “Today’s Picks” highlights ten destinations that are highly relevant to individual users, based on recent online activity and preferences
• iTunes Genius – recommends albums similar to those found in your library
• Smarter Kids – self-selection of a user profile; classification of products in user profiles
Social Filtering
Movie Lens
http://movielens.umn.edu
Matrix of ratings
[Figure: the matrix of ratings, with Users as rows and Items as columns]
Collaborative-Based Filtering
• A collection of n users u_i and a collection of m products p_j
• An n × m matrix of ratings v_{ij}, with v_{ij} = ? if user i did not rate product j
• The prediction for user i and product j is computed as

$$v^*_{ij} = \bar{v}_i + K \sum_{k:\, v_{kj} \neq ?} u_{ik}\,(v_{kj} - \bar{v}_k)$$

• where \bar{v}_i is the average rating of user i, K is a normalization factor such that the weights K u_{ik} sum to 1, and the similarity of users i and k is

$$u_{ik} = \frac{\sum_j (v_{ij} - \bar{v}_i)(v_{kj} - \bar{v}_k)}{\sqrt{\sum_j (v_{ij} - \bar{v}_i)^2 \, \sum_j (v_{kj} - \bar{v}_k)^2}}$$

• where the sums (and averages) are over the items j such that v_{ij} and v_{kj} are not “?”.
[Breese et al., 1998]
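The user-to-user similarity above is a Pearson correlation restricted to co-rated items. The following is a minimal sketch of it, not the authors' implementation: the function name `pearson_similarity`, the dict-based rating representation, and the choice to return 0 when fewer than two items are co-rated are all assumptions made for illustration.

```python
from math import sqrt


def pearson_similarity(ratings_i, ratings_k):
    """Similarity u_ik of two users, computed over co-rated items only.

    ratings_i, ratings_k: dicts mapping item id -> rating.
    A missing key plays the role of "?" in the rating matrix.
    """
    common = [j for j in ratings_i if j in ratings_k]
    if len(common) < 2:
        # Assumption: too little overlap is treated as no correlation.
        return 0.0
    # Averages are taken over the co-rated items, as stated on the slide.
    mean_i = sum(ratings_i[j] for j in common) / len(common)
    mean_k = sum(ratings_k[j] for j in common) / len(common)
    num = sum((ratings_i[j] - mean_i) * (ratings_k[j] - mean_k) for j in common)
    den = sqrt(
        sum((ratings_i[j] - mean_i) ** 2 for j in common)
        * sum((ratings_k[j] - mean_k) ** 2 for j in common)
    )
    # Assumption: a user with constant ratings has undefined correlation; use 0.
    return num / den if den else 0.0
```

Two users whose ratings rise and fall together get a similarity close to 1, and users with opposite tastes get a value close to -1.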
Example
Ratings for product p_j and average ratings of the users:

User   Rating for p_j   Average rating
u_5    4                4
u_i    ?                3.2
u_8    3                3.5
u_9    5                3

Users’ similarities: u_{i5} = 0.5, u_{i8} = 0.5, u_{i9} = 0.8

$$v^*_{ij} = \bar{v}_i + K \sum_{k:\, v_{kj} \neq ?} u_{ik}\,(v_{kj} - \bar{v}_k)$$

v*_{ij} = 3.2 + 1/(0.5 + 0.5 + 0.8) × [0.5 (4 − 4) + 0.5 (3 − 3.5) + 0.8 (5 − 3)]
        = 3.2 + 1/1.8 × [0 − 0.25 + 1.6]
        = 3.2 + 0.75 = 3.95
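The arithmetic of this example can be checked with a few lines of Python. This is a sketch under stated assumptions: `predict_rating` and its tuple layout are hypothetical names, and K is taken as one over the plain sum of the similarities, exactly as in the worked example.

```python
def predict_rating(mean_i, neighbors):
    """Predict v*_ij from the active user's mean and the neighbors' data.

    neighbors: list of (similarity u_ik, rating v_kj, mean rating of user k),
    one tuple per user k who rated the target item.
    """
    total_sim = sum(u for u, _, _ in neighbors)
    if total_sim == 0:
        # Assumption: with no usable neighbors, fall back to the user's mean.
        return mean_i
    k_norm = 1.0 / total_sim  # normalization factor K
    return mean_i + k_norm * sum(u * (v - m) for u, v, m in neighbors)


# Users u_5, u_8 and u_9 from the example.
neighbors = [(0.5, 4, 4.0), (0.5, 3, 3.5), (0.8, 5, 3.0)]
prediction = predict_rating(3.2, neighbors)
print(round(prediction, 2))  # 3.95
```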
Distributed Scenario
[Figure: the Target Recommender Sys. receives q = <user = i> and recommends j; it sends q = <user = i, item = j, target = t> to three Remote Recommender Sys.]
Reply from a remote system:
• User identifiers
• User models
• User identifiers and their similarities
• Rating prediction for j
Related Works
• B. N. Miller, J. A. Konstan, J. Riedl, “PocketLens: Toward a Personal Recommender System”, 2004.
• R. Burke, “Hybrid Web Recommender Systems”, in The Adaptive Web, pages 377-408, Springer Berlin / Heidelberg, 2007.
• K. Yu, X. Xu, M. Ester, H. P. Kriegel, “Feature Weighting and Instance Selection for Collaborative Filtering: An Information-Theoretic Approach”, in Knowledge Information Systems, vol. 5(2), 2003.
• J. Freyne, B. Smyth, “Communities, Collaboration and Cooperation in Personalized Web Search”, in Proc. of the ITWP Workshop, Edinburgh, UK, 2005.
Information processing in CF
1. Similarity computation: assessing the similarity of all the users to the active user, i.e., the user for whom a recommendation is searched
2. Neighborhood formation: selecting the K most similar users to the active user
3. Computing the active user rating prediction, for a target item whose rating is unknown:
   1. weight the ratings (on the target item) of the K most similar users found at (2) according to the user-to-user similarity computed at (1)
   2. the predicted rating is the weighted average.
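Step 2 above, neighborhood formation, can be sketched as a simple top-K selection over the similarities computed in step 1. The function name `top_k_neighbors` and the dict layout are hypothetical; ties are broken by insertion order here, which the slides do not specify.

```python
def top_k_neighbors(similarities, k):
    """Return the k users most similar to the active user.

    similarities: dict mapping user id -> similarity to the active user.
    Returns a list of (user id, similarity) pairs, most similar first.
    """
    ranked = sorted(similarities.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


similarities = {"u5": 0.5, "u8": 0.5, "u9": 0.8, "u2": 0.1}
neighborhood = top_k_neighbors(similarities, 2)
```

The resulting pairs are what step 3 would use as weights for the neighbors' ratings on the target item.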
What information can be exchanged
• UMs (rating vectors) stored by the remote systems
• Lists of the neighborhood candidates computed by the remote systems
• Degrees of similarity between the active user and the other users, computed over the data stored by the remote systems
• Complete predictions generated by the remote systems.
Assumptions
• Users can be identified uniquely in all the domains
• Items can be identified uniquely in all domains
• The target domain sends a request to remote domains specifying q = <i, j, t>
  • i is the identifier of the active user
  • j is the target item identifier (possibly null)
  • t is the target domain.
• Different distributed prediction methods are characterized by "what the remote domains reply".
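Rating Matrix and Domains
• Given the assumptions, there is a "centralized" (aggregated) model of the distributed scenario
• V is the overall rating matrix
• V_a, V_b, V_c are rating sub-matrices for three domains R_a, R_b, R_c

$$V = \begin{pmatrix} v_{11} & v_{12} & v_{13} & \dots & v_{1m} \\ v_{21} & v_{22} & v_{23} & \dots & v_{2m} \\ v_{31} & v_{32} & v_{33} & \dots & v_{3m} \\ \vdots & & & & \vdots \\ v_{n1} & v_{n2} & v_{n3} & \dots & v_{nm} \end{pmatrix}$$

[Figure: the matrix V with the sub-matrices V_a, V_b and V_c marked]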
Prediction Methods - Remote Replies
• Local Prediction: the remote systems {R_d}, d ∈ D, do not return any data
• Centralized Prediction: all the ratings managed by {R_d}, d ∈ D, are sent back; we assume that all the domains are related and D is the full set of domains
• Distributed Peer Identification: the identifiers of the users that the remote systems {R_d}, d ∈ D, consider as “similar” to the target user i are sent back
• Distributed Neighborhood Formation: the identifiers of the users that the remote systems {R_d}, d ∈ D, consider as “similar” to the target user i are sent back together with their similarities to the target user i; similarities are computed by each remote system using only the ratings in V_d
• Distributed Prediction: the rating predictions for item j, computed by the remote systems {R_d}, d ∈ D, using the ratings contained in V_d, are sent back.
Distributed Peer Identification
• The identifiers of K users that the remote systems {R_d}, d ∈ D, consider as “similar” to the target user i are sent back
  • D is the set of all remote domains
  • In our experiments a domain is identified by a tag (a genre)
• The target domain merges the received peers and makes a prediction using the local data (ratings only in the target domain).
• Remote systems provide knowledge as an informed selection of users.
Distributed Peer Identification
• V_a is the target domain
• V_b and V_c are remote domains (containing some ratings of the target user)
[Figure: the rating matrix V split into V_a (target domain), V_b and V_c; the row of the target user is marked, together with the rows of the peers returned by V_b and V_c]
Distributed Neighborhood Formation
• The identifiers of some users that the remote systems {R_d}, d ∈ D, consider as “similar” to the target user i are sent back together with their similarities to the target user i; similarities are computed by each remote system using only the ratings in V_d
• The similarity of a neighbor user l with the target user i is a weighted average

$$sim(i, l) = \frac{\sum_{d \in D} cor(d, t)\, sim_d(i, l)}{\sum_{d \in D} cor(d, t)}$$

• where D is the set of all the domains (including t), t is the target domain, and
• cor(d, t) is the Inter-Domain Correlation measure between domains.
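The weighted average above can be sketched as follows. The function name `combined_similarity` and the dict-based inputs are hypothetical, and skipping domains that reported no similarity for the pair (i, l) is an assumption the slides do not address.

```python
def combined_similarity(domain_sims, correlations):
    """Aggregate per-domain similarities sim_d(i, l) into sim(i, l).

    domain_sims: dict mapping domain d -> sim_d(i, l) reported by that domain.
    correlations: dict mapping domain d -> cor(d, t) with the target domain t.
    """
    # Assumption: only domains that reported a similarity take part.
    shared = [d for d in domain_sims if d in correlations]
    total = sum(correlations[d] for d in shared)
    if total == 0:
        return 0.0
    return sum(correlations[d] * domain_sims[d] for d in shared) / total


# Target domain "a" (cor = 1.0 with itself) and one remote domain "b".
sim_il = combined_similarity({"a": 0.8, "b": 0.4}, {"a": 1.0, "b": 0.5})
```

With these inputs the remote domain counts half as much as the target domain, giving (0.8 + 0.2) / 1.5, about 0.67.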
Inter-Domain Correlations
• Content-based method
  • Mining the textual descriptions of the items in the domains from external data sources to obtain a tf-idf description (vector v) of each domain
  • Computing the cosine correlation of the domain representations

$$cor_{content}(d_1, d_2) = sim(v_1, v_2) = \frac{v_1 \cdot v_2}{\|v_1\| \, \|v_2\|}$$

• Rating-based method
  • Average correlation of the items in the domains (J_d)

$$cor_{ratings}(d_1, d_2) = AVG\{\, sim(j, k) : j \neq k,\ j \in J_{d_1},\ k \in J_{d_2} \,\}$$
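Both correlation measures can be sketched in a few lines. This is an illustration under assumptions: the tf-idf vectors are taken as already computed and stored as sparse dicts, the item-to-item similarity is passed in as a function (the slides do not say which one is used), and all names are hypothetical.

```python
from math import sqrt


def content_correlation(v1, v2):
    """Cosine of two domain descriptions given as dicts term -> tf-idf weight."""
    dot = sum(weight * v2.get(term, 0.0) for term, weight in v1.items())
    norm1 = sqrt(sum(w * w for w in v1.values()))
    norm2 = sqrt(sum(w * w for w in v2.values()))
    # Assumption: an empty description is uncorrelated with everything.
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def rating_correlation(items_d1, items_d2, item_sim):
    """Average item_sim(j, k) over pairs j in J_d1, k in J_d2 with j != k."""
    sims = [item_sim(j, k) for j in items_d1 for k in items_d2 if j != k]
    return sum(sims) / len(sims) if sims else 0.0
```

Domains described by the same terms in the same proportions get a content correlation of 1, and domains with no shared terms get 0.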
Distributed Prediction
• The rating predictions for item j, computed by the remote systems {R_d}, d ∈ D, using the ratings contained in V_d, are sent back to the target domain R_t
• Upon receiving the set of predictions, R_t aggregates all the predictions (including the local one) into a single value by averaging the predictions
• We do not use here the Inter-Domain Correlation.
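The aggregation step at the target domain is a plain average, which can be sketched as follows. The function name `aggregate_predictions` is hypothetical, and representing a remote system that could not produce a prediction by `None` is an assumption not covered by the slides.

```python
def aggregate_predictions(local_prediction, remote_predictions):
    """Average the local prediction with the predictions sent back by remotes.

    remote_predictions: list of predicted ratings for item j; None marks a
    remote system that returned no prediction (assumption) and is skipped.
    """
    predictions = [local_prediction]
    predictions += [p for p in remote_predictions if p is not None]
    return sum(predictions) / len(predictions)


final = aggregate_predictions(4.0, [3.0, None, 5.0])
print(final)  # 4.0
```

Every prediction gets the same weight, in line with the note that the Inter-Domain Correlation is not used here.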