Example: User-based CF
[Figure: user-item rating matrix with user similarities sim(u,v) (0.87, 1, -1, NA) and predicted ratings marked with *: 3.51*, 3.81*, 2.42*, 2.48*.]
Alexandros Karatzoglou – September 06, 2013 – Recommender Systems
Example: Item-based CF
Target item: the item for which the CF prediction task is performed.
[Figure: user-item rating matrix.]
Item-based CF
The basic steps:
1. Identify the set of users who rated the target item i.
2. Identify which other items (neighbours) were rated by this set of users.
3. Compute the similarity between each neighbour and the target item (similarity function).
4. Optionally, select the k most similar neighbours.
5. Predict ratings for the target item (prediction function).
Item-Based Similarity
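Item-Based Similarity
Target item: i. y_{u,j} is the rating of user u for item j, and \bar{y}_j is the average rating for item j.
Similarity sim(i,j) between items i and j (Pearson correlation), where U is the set of users who rated both i and j:
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (y_{u,i} - \bar{y}_i)(y_{u,j} - \bar{y}_j)}{\sqrt{\sum_{u \in U} (y_{u,i} - \bar{y}_i)^2}\,\sqrt{\sum_{u \in U} (y_{u,j} - \bar{y}_j)^2}}$$
Predicted rating of user u for the target item i, where N is the set of neighbours of i rated by u:
$$\hat{y}_{u,i} = \bar{y}_i + \frac{\sum_{j \in N} \mathrm{sim}(i,j)\,(y_{u,j} - \bar{y}_j)}{\sum_{j \in N} \lvert \mathrm{sim}(i,j) \rvert}$$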
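The steps and formulas of item-based CF can be sketched in Python with NumPy. This is a minimal sketch, not the slides' implementation: it assumes a dense users × items array `Y` with `np.nan` for missing ratings, computes the Pearson correlation with means over the co-rated users, uses item averages over all ratings in the prediction and, when `k` is given, keeps the k neighbours with the largest |sim|. The function names are hypothetical.

```python
import numpy as np

def pearson_item_sim(Y, i, j):
    """Pearson correlation between items i and j over the users who rated both.

    Returns nan (like an NA similarity) when fewer than two users co-rated the
    items or one of them has zero variance on the co-rated set.
    """
    both = ~np.isnan(Y[:, i]) & ~np.isnan(Y[:, j])
    if both.sum() < 2:
        return np.nan
    a = Y[both, i] - Y[both, i].mean()
    b = Y[both, j] - Y[both, j].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else np.nan


def predict_item_based(Y, u, i, k=None):
    """Mean-centred weighted prediction of user u's rating for target item i."""
    item_mean = np.nanmean(Y, axis=0)
    # Neighbours: other items rated by u whose similarity to i can be computed.
    neighbours = []
    for j in range(Y.shape[1]):
        if j != i and not np.isnan(Y[u, j]):
            s = pearson_item_sim(Y, i, j)
            if not np.isnan(s):
                neighbours.append((s, j))
    if k is not None:  # optionally keep only the k most similar neighbours
        neighbours = sorted(neighbours, key=lambda t: abs(t[0]), reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return float(item_mean[i])  # no usable neighbour: fall back to the item average
    dev = sum(s * (Y[u, j] - item_mean[j]) for s, j in neighbours)
    return float(item_mean[i] + dev / norm)
```

For example, with `Y = np.array([[1.0, 2.0, np.nan], [2.0, 4.0, 3.0], [3.0, 6.0, 5.0]])`, both neighbours of item 2 have similarity 1 and `predict_item_based(Y, 0, 2)` returns 4 + ((1 - 2) + (2 - 4)) / 2 = 2.5.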
Example: Item-based CF
[Figure: user-item rating matrix with the similarities sim(i,j) to the target item: -1, -1, 0.86, 1, NA.]
sim(6,5) cannot be calculated.
Example: Item-based CF
[Figure: user-item rating matrix with sim(i,j) = -1, -1, 0.86, 1, NA and the predicted ratings for the target item marked with *: 2.94*, 2.48*, 1.12*.]
Item Similarity Computation
Pearson (r) correlation-based similarity: does not account for user rating biases.
Cosine-based similarity: does not account for user rating biases.
Adjusted cosine similarity: takes care of user rating biases, as each pair in the co-rated set corresponds to a different user and each rating is centred on that user's average rating.
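The difference between plain and adjusted cosine can be seen in a few lines of NumPy. A minimal sketch, assuming a users × items array `Y` with `np.nan` for missing ratings and similarities computed over co-rated users only; the function names are hypothetical.

```python
import numpy as np

def co_rated(Y, i, j):
    """Boolean mask of the users who rated both items i and j (nan = missing)."""
    return ~np.isnan(Y[:, i]) & ~np.isnan(Y[:, j])

def cosine_sim(Y, i, j):
    """Plain cosine between the co-rated rating vectors of items i and j."""
    m = co_rated(Y, i, j)
    a, b = Y[m, i], Y[m, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else float("nan")

def adjusted_cosine_sim(Y, i, j):
    """Cosine after subtracting each user's mean rating (removes user rating bias)."""
    user_mean = np.nanmean(Y, axis=1)
    m = co_rated(Y, i, j)
    a = Y[m, i] - user_mean[m]
    b = Y[m, j] - user_mean[m]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else float("nan")
```

With `Y = np.array([[5.0, 4.0], [2.0, 1.0]])` (a generous and a strict user who both rate item 0 one point above item 1), `cosine_sim(Y, 0, 1)` is about 0.99 because all ratings are positive, whereas `adjusted_cosine_sim(Y, 0, 1)` is -1: after removing each user's mean, both users rate item 0 above and item 1 below their average.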
Performance Implications
Bottleneck: similarity computation. Its time complexity makes it highly time-consuming with millions of users and items in the database.
Two-step process:
“off-line component” / “model”: similarity computation, precomputed and stored.
“on-line component”: prediction process.
Two-step process
[Figure: diagram of the offline and online components.]
Performance Implications
User-based similarity is more dynamic: precomputing user neighbourhoods can lead to poor predictions.
Item-based similarity is static: we can precompute item neighbourhoods and compute the predicted ratings online.
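A minimal sketch of the two-step process for item-based CF, assuming a dense users × items NumPy array `R` in which 0 means "not rated" and plain cosine similarity (Pearson or adjusted cosine could be swapped in). `offline_item_neighbours` builds the precomputed, stored model; `online_predict` only reads it at request time. Both names are hypothetical.

```python
import numpy as np

def offline_item_neighbours(R, k):
    """Off-line component: cosine similarity between all item columns of R
    (users x items, 0 = not rated), keeping the k most similar items per item."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0                     # avoid division by zero for unrated items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)                    # an item is not its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]        # stored model: neighbour indices ...
    sims = np.take_along_axis(S, nbrs, axis=1)  # ... and their similarities
    return nbrs, sims

def online_predict(R, nbrs, sims, u, i):
    """On-line component: weighted average of u's ratings on i's stored neighbours."""
    r = R[u, nbrs[i]]
    w = sims[i] * (r > 0)                       # ignore neighbours that u has not rated
    return float(w @ r / w.sum()) if w.sum() > 0 else 0.0

R = np.array([[5, 5, 0], [4, 4, 1], [0, 1, 5]], dtype=float)
nbrs, sims = offline_item_neighbours(R, k=2)    # computed once, off-line
print(nbrs[0], online_predict(R, nbrs, sims, u=2, i=0))  # cheap, on-line
```

Only the small `nbrs`/`sims` arrays are needed at prediction time, which is why the expensive similarity computation can be moved off-line for items but is riskier for the more dynamic user neighbourhoods.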
Memory-based CF
+ Requires minimal knowledge-engineering effort
+ Users and products are symbols without any internal structure or characteristics
+ Produces good-enough results in most cases
- Requires a large number of explicit and reliable “ratings”
- Requires standardized products: users should have bought exactly the same product
- Assumes that prior behaviour determines current behaviour without taking into account “contextual” knowledge
Personalised vs Non-Personalised CF
CF recommendations are personalized: the prediction is based on the ratings expressed by similar users, so the neighbours are different for each target user.
A non-personalized collaborative recommendation can be generated by averaging the recommendations of ALL users.
How would the two approaches compare?
Personalised vs Non-Personalised CF

Data set     users   items   total ratings   density   MAE non-pers.   MAE pers.
Jester       48483     100         3519449     0.725           0.220       0.152
MovieLens     6040    3952         1000209     0.041           0.233       0.179
EachMovie    74424    1649         2811718     0.022           0.223       0.151

Not much difference indeed!
Mean Absolute Error, non-personalized, where v_{ij} is the rating of user i for product j and \bar{v}_j is the average rating for product j:
$$\mathrm{MAE}_{NP} = \frac{\sum_{i,j} \lvert v_{ij} - \bar{v}_j \rvert}{\text{num.ratings}}$$
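The non-personalized MAE translates directly into a few lines of NumPy. A minimal sketch, assuming a users × items array `Y` (row i, column j, as in v_ij) with `np.nan` for missing ratings; the function name is hypothetical.

```python
import numpy as np

def mae_non_personalized(Y):
    """MAE_NP: mean |v_ij - mean_j| over all observed ratings (nan = missing)."""
    item_mean = np.nanmean(Y, axis=0)          # v_j: average rating of each product
    observed = ~np.isnan(Y)
    errors = np.abs(Y - item_mean)[observed]   # broadcast item means across users
    return float(errors.sum() / observed.sum())

Y = np.array([[1.0, np.nan], [3.0, 4.0]])
print(mae_non_personalized(Y))
```

Here the item averages are 2 and 4, the absolute errors of the three observed ratings are 1, 1 and 0, so the function returns 2/3.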
The Sparsity Problem
Typically large product sets and few user ratings, e.g. Amazon:
in a catalogue of 1 million books, the probability that two users who bought 100 books each have a book in common is about 0.01;
in a catalogue of 10 million books, the probability that two users who bought 50 books each have a book in common is about 0.00025.
CF needs a number of users of about 10% of the product catalogue size.
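The Amazon figures can be checked with a short calculation. A minimal sketch, assuming each user's purchases are a uniformly random set of distinct books: the probability of no overlap is the product, over the second user's b books, of (N - a - k) / (N - k). The function name `p_common` is hypothetical.

```python
def p_common(catalogue, a, b):
    """Probability that two users who bought a and b distinct books, each set drawn
    uniformly at random from the catalogue, have at least one book in common."""
    p_none = 1.0
    for k in range(b):
        # the (k+1)-th book of the second user avoids the first user's a books
        p_none *= (catalogue - a - k) / (catalogue - k)
    return 1.0 - p_none

print(p_common(1_000_000, 100, 100))   # ≈ 0.00995
print(p_common(10_000_000, 50, 50))    # ≈ 0.00025
```

Both values are close to a·b / N, the expected number of shared books, which is why the overlap collapses as the catalogue grows.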
The Sparsity Problem
Methods for dimensionality reduction: matrix factorization, SVD, clustering.
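As an illustration of dimensionality reduction, the rank-k truncated SVD gives the best rank-k approximation of a complete matrix. A minimal sketch, assuming missing ratings are first filled with item averages (one simple choice among many) and a hypothetical 4 × 4 rating matrix; the entries of `R_hat` at the originally missing positions can then be read as predictions.

```python
import numpy as np

def fill_with_item_means(Y):
    """Replace missing ratings (nan) by the item's average rating."""
    return np.where(np.isnan(Y), np.nanmean(Y, axis=0), Y)

def low_rank_approx(R, k):
    """Best rank-k approximation of a dense matrix R via truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Hypothetical ratings: rows are users, columns are items, nan = not rated.
Y = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 4.0, 1.0],
              [1.0, 1.0, 2.0, 5.0],
              [np.nan, 2.0, 1.0, 4.0]])
R_hat = low_rank_approx(fill_with_item_means(Y), k=2)
print(np.round(R_hat, 2))
```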
Model-Based Collaborative Filtering
Model-Based CF Algorithms
Models are learned from the underlying data rather than built from heuristics.
Models of user ratings (or purchases):
Clustering (classification)
Association rules
Matrix Factorization
Restricted Boltzmann Machines
Other models:
Bayesian networks (probabilistic)
Probabilistic Latent Semantic Analysis
...
Clustering
Cluster customers into categories based on preferences and past purchases.
Compute recommendations at the cluster level: all customers within a cluster receive the same recommendations.
Clustering
B, C and D form one CLUSTER, while A and E form another cluster.
“Typical” preferences for CLUSTER are:
Book 2: very high
Book 3: high
Books 5 and 6: may be recommended
(Books 1 and 4: not recommended)
Clustering
Customer F is classified as a new member of CLUSTER and will receive recommendations based on the CLUSTER's preferences:
Book 2 will be highly recommended to Customer F.
Book 6 will also be recommended to some extent.
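The clustering example can be reproduced with a small k-means sketch. The ratings matrix is hypothetical (chosen so that B, C and D like Books 2 and 3 while A and E like Books 1 and 4), the farthest-first initialisation is an assumption made to keep the result deterministic, and the function names are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return centroids, labels

def recommend_from_cluster(centroids, x, top_n=2):
    """Assign a new customer to the nearest cluster and recommend the unseen books
    with the highest 'typical' (centroid) score."""
    c = int(((centroids - x) ** 2).sum(axis=1).argmin())
    unseen = np.flatnonzero(x == 0)
    ranked = unseen[np.argsort(-centroids[c, unseen], kind="stable")]
    return c, ranked[:top_n].tolist()

# Hypothetical ratings (rows: customers A-E, columns: Books 1-6, 0 = not bought).
X = np.array([[5, 0, 0, 4, 0, 0],   # A
              [0, 5, 4, 0, 2, 0],   # B
              [0, 5, 5, 0, 0, 2],   # C
              [1, 4, 4, 0, 2, 3],   # D
              [4, 0, 1, 5, 0, 0]],  # E
             dtype=float)
centroids, labels = kmeans(X, k=2)
F = np.array([0, 0, 4, 0, 0, 0], dtype=float)   # new customer F
cluster, books = recommend_from_cluster(centroids, F)
print(labels, cluster, [b + 1 for b in books])  # → [0 1 1 1 0] 1 [2, 6]
```

With this data, A and E end up in one cluster and B, C and D in the other; F is assigned to the B/C/D cluster and receives Books 2 and 6, mirroring the example.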
Clustering
+ It can also be applied to select the k most relevant neighbours in a CF algorithm
+ Faster: recommendations are computed per cluster
- Less personalized: recommendations are per cluster, whereas in CF they are per user
Association rules
Past purchases are used to find relationships between items that are commonly purchased together.
Association rules
+ Fast to implement
+ Fast to execute
+ Not much storage space required
+ Not “individual”-specific
+ Very successful in broad applications for large populations, such as shelf layout in retail stores
- Not suitable if preferences change rapidly
- Rules can be used only when enough data validates them; false associations can arise
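A minimal sketch of pairwise association rules, using a support threshold to keep only pairs that enough data validates and a confidence threshold to keep only strong rules. The baskets, thresholds and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Mine rules x -> y between pairs of items.

    support(x, y)    = fraction of baskets containing both x and y
    confidence(x->y) = count(x and y) / count(x)
    """
    n = len(baskets)
    item_count = Counter(item for basket in baskets for item in set(basket))
    pair_count = Counter(pair for basket in baskets
                         for pair in combinations(sorted(set(basket)), 2))
    rules = []
    for (first, second), both in pair_count.items():
        support = both / n
        if support < min_support:
            continue  # not enough data validates this pair
        for x, y in ((first, second), (second, first)):
            confidence = both / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "jam"}, {"milk"}]
for x, y, s, c in pair_rules(baskets):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Here only {bread, butter} reaches the support threshold (2 of 4 baskets), giving bread → butter with confidence 2/3 and butter → bread with confidence 1.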
Matrix Factorization
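Loss Functions for MF
With f_{ij} the predicted score of user i for item j and y_{ij} the observed value:
Squared error loss: $\ell(f_{ij}, y_{ij}) = (f_{ij} - y_{ij})^2$
Mean Absolute Error: $\ell(f_{ij}, y_{ij}) = \lvert f_{ij} - y_{ij} \rvert$
Binary hinge loss: $\ell(f_{ij}, y_{ij}) = \max(0, 1 - y_{ij} f_{ij})$, with $y_{ij} \in \{-1, +1\}$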
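The three losses can be written as functions of a prediction f and a target y, together with the (sub)gradient that a gradient-based learner needs. A minimal sketch; returning the derivative alongside the value and the subgradient chosen at the kinks are assumptions of this sketch.

```python
def squared_loss(f, y):
    """(f - y)^2 and its derivative with respect to the prediction f."""
    return (f - y) ** 2, 2.0 * (f - y)

def absolute_loss(f, y):
    """|f - y| and a subgradient (0 when f == y)."""
    d = f - y
    return abs(d), (d > 0) - (d < 0)

def hinge_loss(f, y):
    """max(0, 1 - y f) for binary labels y in {-1, +1}, and a subgradient."""
    margin = 1.0 - y * f
    return max(0.0, margin), (-y if margin > 0 else 0.0)

for loss in (squared_loss, absolute_loss, hinge_loss):
    print(loss.__name__, loss(0.5, 1))
```

The hinge loss is zero (with zero gradient) once the prediction has the right sign with a margin of at least 1, so it only cares about the ordering of positives and negatives, not about exact rating values.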
Learning: Stochastic Gradient Descent
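A minimal sketch of stochastic gradient descent for matrix factorization with the squared error loss and L2 regularisation, assuming ratings given as `(user, item, value)` triples and predictions `U[u] @ M[i]`. The learning rate, regularisation strength, rank and epoch count are illustrative defaults, not values from the slides.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Learn U (users x k) and M (items x k) so that U[u] @ M[i] approximates y.

    Per rating, minimises 0.5 * (U[u] @ M[i] - y)^2 + 0.5 * reg * (|U[u]|^2 + |M[i]|^2)
    with one gradient step, visiting the ratings in a random order each epoch.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    M = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, y = ratings[idx]
            err = U[u] @ M[i] - y               # d/df of 0.5 * (f - y)^2
            grad_u = err * M[i] + reg * U[u]
            grad_m = err * U[u] + reg * M[i]    # both gradients use the old factors
            U[u] -= lr * grad_u
            M[i] -= lr * grad_m
    return U, M

# Hypothetical ratings generated by a rank-1 model (user factor x item factor).
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0), (2, 1, 6.0)]
U, M = sgd_mf(ratings, n_users=3, n_items=2)
rmse = np.sqrt(np.mean([(U[u] @ M[i] - y) ** 2 for u, i, y in ratings]))
print(f"training RMSE: {rmse:.3f}")
```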
Restricted Boltzmann Machines
A generative stochastic neural network that learns a probability distribution over its inputs.
Used in dimensionality reduction, CF, topic modeling and feature learning.
Essential components of deep learning methods (DBNs, DBMs).
Restricted Boltzmann Machines
Restricted Boltzmann Machines
Each unit is in a state which can be active or not active.
Each input of a unit is associated with a weight.
The transfer function Σ calculates for each unit a score based on the weighted sum of its inputs.
This score is passed to the activation function φ, which calculates the probability that the unit's state is active.
Restricted Boltzmann Machines
Each unit v_i in the visible layer corresponds to one item.
The number of hidden units h_j is a parameter.
Each v_i is connected to each h_j through a weight w_ij.
In the training phase, for each user:
if the user purchased an item, the corresponding v_i is activated;
the activation states of all v_i are the input of each h_j;
based on this input, the activation state of each h_j is calculated;
the activation states of all h_j now become the input of each v_i;
the activation state of each v_i is recalculated;
for each v_i, the difference between the present and the previous activation state is used to update the weights w_ij and the thresholds θ_j.
Restricted Boltzmann Machines
Restricted Boltzmann Machines
In the prediction phase, using a trained RBM, when recommending to a user:
for the items of the user, the corresponding v_i are activated;
the activation states of all v_i are the input of each h_j;
based on this input, the activation state of each h_j is calculated;
the activation states of all h_j now become the input of each v_i;
the activation state of each v_i is recalculated;
the activation probabilities are used to recommend items.
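The training and prediction phases can be sketched with a small binary RBM. A minimal sketch, assuming binary "purchased" visible units, a sigmoid as the activation function φ, and one step of contrastive divergence (CD-1) as the concrete form of "use the difference between the present and the previous activation states to update the weights"; the hidden biases play the role of the thresholds θ_j with opposite sign. The class, data and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Activation function phi: maps a unit's score to P(unit is active)."""
    return 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Tiny RBM over binary 'purchased' visible units (one per item)."""

    def __init__(self, n_items, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_items, n_hidden))  # weights w_ij
        self.b = np.zeros(n_items)    # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases (thresholds theta_j with opposite sign)

    def hidden_probs(self, v):
        # transfer function (weighted sum of the inputs) followed by phi
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def train_step(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) update for one user's purchase vector."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        pv1 = self.visible_probs(h0)                            # recalculated visible layer
        ph1 = self.hidden_probs(pv1)
        self.W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
        self.b += lr * (v0 - pv1)
        self.c += lr * (ph0 - ph1)

    def recommend(self, v, top_n=1):
        """Prediction phase: reconstruct v and rank the items the user has not bought."""
        pv = self.visible_probs(self.hidden_probs(v))
        pv = np.where(v > 0, -np.inf, pv)
        return np.argsort(-pv)[:top_n].tolist()

# Hypothetical purchases: items 0-2 tend to be bought together, as do items 3-5.
data = np.array([[1, 1, 1, 0, 0, 0]] * 5 + [[0, 0, 0, 1, 1, 1]] * 5, dtype=float)
rbm = BinaryRBM(n_items=6, n_hidden=2)
for epoch in range(1000):
    for v in data:
        rbm.train_step(v)
print(rbm.recommend(np.array([1, 1, 0, 0, 0, 0], dtype=float)))
```

A user who bought items 0 and 1 should be recommended item 2, the remaining item of the group that is usually bought together.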
Limitations of CF
Requires user-item data: there must be enough users in the system; new items need to get enough ratings; new users need to provide enough ratings (cold start).
Sparsity: it is hard to find users who rated the same items.
Popularity bias: cannot recommend items to users with unique tastes; tends to recommend popular items.
Cold-start
New user problem: the system must first learn the user's preferences from the ratings. Hybrid RS, which combine content-based and collaborative techniques, can help.
New item problem: until the new item is rated by a substantial number of users, the RS is not able to recommend it.
Index
1. Introduction: What is a Recommender System?
2. Approaches
   1. Collaborative Filtering
   2. Content-based Recommendations
   3. Context-aware Recommendations
   4. Other Approaches
   5. Hybrid Recommender Systems
3. Research Directions
4. Conclusions
5. References
Content-Based Recommendations
Recommendations are based on information about the content of items rather than on other users' opinions.
A machine learning algorithm is used to model the user's preferences from examples, based on a description of the content.
What is the content of an item?
Explicit attributes or characteristics, e.g. for a movie:
Genre: Action / adventure
Feature: Bruce Willis
Year: 1995
Textual content, e.g. for a book: title, description, table of contents
In content-based recommendations, the recommended items for a user are based on the profile built up by analysing the content of the items the user has liked in the past.
Content-Based Recommendation
Suitable for text-based products (web pages, books).
Items are “described” by their features (e.g. keywords).
Users are described by the keywords in the items they bought.
Recommendations are based on the match between the content (item keywords) and the user keywords.
The user model can also be a classifier (neural networks, SVM, Naïve Bayes, ...).
Advantages of the CB Approach
+ No need for data on other users
+ No cold-start or sparsity problems
+ Can recommend to users with unique tastes
+ Can recommend new and unpopular items
+ Can provide explanations of recommended items by listing the content features that caused an item to be recommended
Disadvantages of the CB Approach
- Only for content that can be encoded as meaningful features
- Some types of items (e.g. movies, music) are not amenable to easy feature extraction methods
- Even for texts, IR techniques cannot consider multimedia information, aesthetic qualities or download time: a positive rating may be unrelated to the presence of certain keywords
- Users' tastes must be represented as a learnable function of these content features
- Hard to exploit the quality judgements of other users
- Serendipity is difficult to implement
Content-based Methods
Content(s) := item profile, i.e. a set of attributes/keywords characterizing item s.
The weight w_ij measures the “importance” (or “informativeness”) of keyword k_i in document d_j.
Term frequency/inverse document frequency (TF-IDF) is a popular weighting technique in IR.
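A minimal TF-IDF sketch in plain Python, assuming documents are already tokenised into keyword lists and using the common variant w_ij = (tf_ij / |d_j|) · log(N / df_i), L2-normalised per document. Other TF and IDF variants exist; the function name is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of keyword lists. Returns one {keyword: weight} dict per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

docs = [["data", "mining"], ["data", "crm"], ["marketing"]]
for vec in tfidf_vectors(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

In the first document, "mining" gets a larger weight than "data" because it occurs in fewer documents; a keyword present in every document would get weight 0.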
Content-based User Profile
ContentBasedProfile(c) := profile of user c.
Profiles are obtained by analysing the content of the user's previous items using keyword analysis techniques, e.g. ContentBasedProfile(c) := (w_{c,1}, ..., w_{c,k}), a vector of weights where w_{c,i} denotes the importance of keyword k_i to user c.
Similarity Measurements
In content-based systems, the utility function u(c,s) is defined as:
$$u(c,s) = \mathrm{score}(\mathrm{ContentBasedProfile}(c), \mathrm{Content}(s))$$
where ContentBasedProfile(c) of user c and Content(s) of document s are both represented as TF-IDF vectors of keyword weights.
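Similarity Measurements
The utility function u(c,s) is usually represented by some scoring heuristic defined in terms of vectors, such as the cosine similarity measure, with \vec{w}_c = ContentBasedProfile(c) and \vec{w}_s = Content(s):
$$u(c,s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\sum_{i=1}^{k} w_{c,i}\, w_{s,i}}{\sqrt{\sum_{i=1}^{k} w_{c,i}^2}\,\sqrt{\sum_{i=1}^{k} w_{s,i}^2}}$$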
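A minimal sketch of the content-based utility u(c,s), assuming sparse keyword-weight dictionaries such as TF-IDF vectors, and assuming the user profile is the average of the vectors of the items the user liked (one simple way to build ContentBasedProfile(c)). Function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {keyword: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_item_vectors):
    """ContentBasedProfile(c) as the average keyword vector of the items c liked."""
    profile = {}
    for vec in liked_item_vectors:
        for t, w in vec.items():
            profile[t] = profile.get(t, 0.0) + w / len(liked_item_vectors)
    return profile

def utility(profile, item_vector):
    """u(c, s) = cos(ContentBasedProfile(c), Content(s))."""
    return cosine(profile, item_vector)

def rank_items(profile, candidates):
    """Sort candidate item ids by decreasing utility for the user."""
    return sorted(candidates, key=lambda s: utility(profile, candidates[s]), reverse=True)

profile = build_profile([{"data": 0.6, "mining": 0.8}, {"data": 1.0}])
print(rank_items(profile, {"a": {"mining": 1.0}, "b": {"marketing": 1.0}, "c": {"data": 1.0}}))
```

The profile here is data 0.8, mining 0.4, so an item described only by "data" ranks above one described only by "mining", and an item sharing no keyword with the profile scores 0.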
Content-based Recommendation: an (unrealistic) example
How can we compute book recommendations based only on their titles?
A customer buys the book: Building data mining applications for CRM.
Seven books are possible candidates for a recommendation:
Accelerating Customer Relationships: Using CRM and Relationship Technologies
Mastering Data Mining: The Art and Science of Customer Relationship Management
Data Mining Your Website
Introduction to marketing
Consumer behaviour
Marketing research, a handbook
Customer knowledge management
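COUNT: term counts of the book titles (plural forms merged, e.g. relationships → relationship, technologies → technology) over the vocabulary Management, Accelerating, applications, Introduction, Knowledge, technology, relationship, Handbook, Consumer, Marketing, Mastering, customer, Research, behavior, science, Building, website, mining, using, CRM, data, your, and, the, art, for, to, of, a. Each listed term has count 1 unless stated otherwise:
Building data mining applications for CRM: building, data, mining, applications, for, CRM
Accelerating customer relationships: using CRM and relationship technologies: accelerating, customer, relationship (2), using, CRM, and, technology
Mastering Data Mining: the art and science of Customer Relationship Management: mastering, data, mining, the, art, and, science, of, customer, relationship, management
Data Mining your website: data, mining, your, website
Introduction to Marketing: introduction, to, marketing
Consumer behavior: consumer, behavior
Marketing Research: a Handbook: marketing, research, a, handbook
Customer Knowledge Management: customer, knowledge, management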
Content-based Recommendation
Compute the distances between this book and all the others, and recommend the “closest” books:
#1: Data Mining Your Website
#2: Accelerating Customer Relationships: Using CRM and Relationship Technologies
#3: Mastering Data Mining: The Art and Science of Customer Relationship Management
TF-IDF normed vectors (non-zero weights per title):
Building data mining applications for CRM: building 0.502, applications 0.502, for 0.502, CRM 0.344, data 0.251, mining 0.251
Accelerating customer relationships: using CRM and relationship technologies: accelerating 0.432, using 0.432, technology 0.432, relationship 0.468, CRM 0.296, and 0.296, customer 0.216
Mastering Data Mining: the art and science of Customer Relationship Management: mastering 0.374, the 0.374, art 0.374, science 0.374, of 0.374, and 0.256, relationship 0.256, management 0.256, data 0.187, mining 0.187, customer 0.187
Data Mining your website: your 0.632, website 0.632, data 0.316, mining 0.316
Introduction to Marketing: introduction 0.636, to 0.636, marketing 0.436
Consumer behavior: consumer 0.707, behavior 0.707
Marketing Research: a Handbook: research 0.537, a 0.537, handbook 0.537, marketing 0.368
Customer Knowledge Management: knowledge 0.736, management 0.522, customer 0.381
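The example can be checked by ranking the candidates by cosine similarity with the purchased title, using the TF-IDF normed vectors of the slide. The slide's columns are scrambled, so the assignment of weights to terms below is reconstructed from document frequencies (terms occurring in a single title get the largest weight, and each vector has roughly unit length); treat it as a reconstruction rather than the original data layout.

```python
import math

# Non-zero TF-IDF normed weights per title (term assignment reconstructed, see above).
purchased = {"building": 0.502, "applications": 0.502, "for": 0.502,
             "crm": 0.344, "data": 0.251, "mining": 0.251}
candidates = {
    "Accelerating Customer Relationships: Using CRM and Relationship Technologies":
        {"accelerating": 0.432, "using": 0.432, "technology": 0.432, "relationship": 0.468,
         "crm": 0.296, "and": 0.296, "customer": 0.216},
    "Mastering Data Mining: The Art and Science of Customer Relationship Management":
        {"mastering": 0.374, "the": 0.374, "art": 0.374, "science": 0.374, "of": 0.374,
         "and": 0.256, "relationship": 0.256, "management": 0.256,
         "data": 0.187, "mining": 0.187, "customer": 0.187},
    "Data Mining Your Website": {"data": 0.316, "mining": 0.316, "your": 0.632, "website": 0.632},
    "Introduction to marketing": {"introduction": 0.636, "to": 0.636, "marketing": 0.436},
    "Consumer behaviour": {"consumer": 0.707, "behavior": 0.707},
    "Marketing research, a handbook":
        {"research": 0.537, "a": 0.537, "handbook": 0.537, "marketing": 0.368},
    "Customer knowledge management": {"knowledge": 0.736, "management": 0.522, "customer": 0.381},
}

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

ranking = sorted(candidates, key=lambda title: cosine(purchased, candidates[title]), reverse=True)
for title in ranking[:3]:
    print(f"{cosine(purchased, candidates[title]):.3f}  {title}")
```

The top three are Data Mining Your Website (about 0.159), Accelerating Customer Relationships (about 0.102) and Mastering Data Mining (about 0.094), matching the ranking on the slide; the other four titles share no term with the purchased book and score 0.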
Context
Context is a dynamic set of factors describing the state of the user at the moment of the user's experience.
Context factors can change rapidly and affect how the user perceives an item.
Context in Recommendations
Temporal: time of the day, weekday/weekend
Spatial: location, home, work, etc.
Social: with friends, family
Recommendations should be tailored to the user and to the user's current context.
Level of Adaptation
Context-Aware RS: Pre-filtering
Context-Aware RS: Post-filtering
Context-Aware RS: Tensor Factorization
Context-Aware RS
Pre-filtering
+ Simple
+ Works with large amounts of data
- Increases sparseness
- Does not scale well with many context variables
Post-filtering
+ Single model
+ Takes into account context interactions
- Computationally expensive
- Increases data sparseness
- Does not model the context directly
Tensor Factorization
+ Performance
+ Linear scalability
+ Models context directly
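A minimal sketch of contextual pre-filtering: keep only the ratings observed in the target context, then run an ordinary two-dimensional recommender on what remains. The ratings, contexts and the deliberately simple item-average recommender are hypothetical stand-ins; the smaller filtered data set also illustrates why pre-filtering increases sparseness.

```python
from collections import defaultdict

def prefilter(ratings, context):
    """Keep the (user, item, rating) triples observed in the given context."""
    return [(u, i, r) for u, i, r, ctx in ratings if ctx == context]

def item_average_recommender(ratings, user, top_n=1):
    """A deliberately simple 2D recommender: rank items the user has not rated by average rating."""
    totals, counts, seen = defaultdict(float), defaultdict(int), set()
    for u, i, r in ratings:
        totals[i] += r
        counts[i] += 1
        if u == user:
            seen.add(i)
    averages = {i: totals[i] / counts[i] for i in totals if i not in seen}
    return sorted(averages, key=averages.get, reverse=True)[:top_n]

def contextual_recommend(ratings, user, context, top_n=1):
    """Pre-filtering: reduce the data to the current context, then recommend as usual."""
    return item_average_recommender(prefilter(ratings, context), user, top_n)

ratings = [  # (user, item, rating, context): hypothetical data
    ("ann", "hiking_guide", 5, "weekend"),
    ("bob", "hiking_guide", 4, "weekend"),
    ("bob", "spreadsheet_tips", 2, "weekend"),
    ("ann", "spreadsheet_tips", 5, "weekday"),
    ("bob", "spreadsheet_tips", 5, "weekday"),
    ("bob", "hiking_guide", 1, "weekday"),
]
print(contextual_recommend(ratings, "cara", "weekend"))  # ['hiking_guide']
print(contextual_recommend(ratings, "cara", "weekday"))  # ['spreadsheet_tips']
```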
Ranking
Most recommendations are presented in a sorted list.
Recommendation is a ranking problem.
Popularity is the obvious baseline.
Users pay attention to only a few items at the top of the list.
Ranking: Approaches
(I) Re-ranking: based on features, e.g. predicted rating, popularity, etc.
(II) Learning to rank: build ranking CF models.
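A minimal re-ranking sketch, assuming each candidate comes with a predicted rating and a popularity count; both features are min-max normalised and combined linearly. The candidates and the weights are hypothetical tuning parameters; setting `w_rating=0` reduces the ranking to the popularity baseline.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1] (all zeros if the values are constant)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def rerank(candidates, w_rating=0.7, w_pop=0.3):
    """candidates: list of (item, predicted_rating, popularity) tuples."""
    items = [item for item, _, _ in candidates]
    rating = min_max([r for _, r, _ in candidates])
    popularity = min_max([p for _, _, p in candidates])
    score = {item: w_rating * r + w_pop * p for item, r, p in zip(items, rating, popularity)}
    return sorted(items, key=score.get, reverse=True)

candidates = [("a", 4.5, 10), ("b", 4.0, 1000), ("c", 3.0, 50)]
print(rerank(candidates))                           # ['b', 'a', 'c']
print(rerank(candidates, w_rating=0.0, w_pop=1.0))  # popularity baseline: ['b', 'c', 'a']
```

With the default weights, item b's much higher popularity outweighs item a's slightly higher predicted rating.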
Re-ranking
Recommend
More recommend