Announcements:
▪ Submit your project group TODAY (Ed Pinned Post)
▪ Project Proposal due this Thursday (no late periods)
▪ Upload homework on time (by 23:59)!

CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu
It is always possible to decompose a real matrix A into A = U Σ V^T, where:
▪ U, Σ, V: unique*
▪ U, V: column orthonormal
  ▪ U^T U = I; V^T V = I (I: identity matrix)
  ▪ (Columns are orthogonal unit vectors)
▪ Σ: diagonal
  ▪ Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
* Up to permutations for redundant singular values and orientation of singular vectors (details)
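A minimal numpy sketch (not from the slides) that illustrates these properties on an arbitrary matrix:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])     # any real matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)        # s holds the singular values

# U, V are column orthonormal: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(2)), np.allclose(Vt @ Vt.T, np.eye(2)))
# Singular values are non-negative and sorted in decreasing order
print(s)                                                # approx. [9.53, 0.51]
# The factorization reconstructs A
print(np.allclose(U @ np.diag(s) @ Vt, A))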
High dim. data: Locality sensitive hashing; Clustering; Dimensionality reduction
Graph data: PageRank, SimRank; Community Detection; Spam Detection
Infinite data: Sampling data streams; Filtering data streams; Queries on streams
Machine learning: SVM; Decision Trees; Perceptron, kNN
Apps: Recommender systems; Association Rules; Duplicate document detection
Customer X
▪ Buys Metallica CD
▪ Buys Megadeth CD

Customer Y
▪ Does search on Metallica
▪ Recommender system suggests Megadeth from data collected about customer X
Examples:
[Figure: items (products, web sites, blogs, news items, …) reach the user either via Search or via Recommendations]
Shelf space is a scarce commodity for traditional retailers
▪ Also: TV networks, movie theaters, …
The Web enables near-zero-cost dissemination of information about products
▪ From scarcity to abundance
More choice necessitates better filters:
▪ Recommendation engines
▪ Association rules: How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html
[Figure; source: Chris Anderson (2004)]
Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!
Editorial and hand curated
▪ List of favorites
▪ Lists of "essential" items
Simple aggregates
▪ Top 10, Most Popular, Recent Uploads
Tailored to individual users (today's class)
▪ Amazon, Netflix, …
X = set of Customers
S = set of Items
Utility function u: X × S → R
▪ R = set of ratings
▪ R is a totally ordered set
▪ e.g., 1-5 stars, real number in [0,1]
Utility matrix (blank = unknown rating):

          Avatar   LOTR   Matrix   Pirates
Alice       1                0.2
Bob                 0.5               0.3
Carol       0.2              1
David                                 0.4
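A minimal sketch (not from the slides) of one way to store such a sparse utility matrix, using the entries above:

# Sparse utility matrix stored as a dict of dicts: user -> {item: rating}
# Missing entries are simply absent rather than stored as zeros.
utility = {
    "Alice": {"Avatar": 1.0, "Matrix": 0.2},
    "Bob":   {"LOTR": 0.5, "Pirates": 0.3},
    "Carol": {"Avatar": 0.2, "Matrix": 1.0},
    "David": {"Pirates": 0.4},
}

def rating(user, item):
    # Returns the known rating, or None for an unknown (to-be-predicted) entry
    return utility.get(user, {}).get(item)

print(rating("Alice", "Matrix"))   # 0.2
print(rating("Alice", "Pirates"))  # None -> unknown; the recommender's job is to estimate it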
(1) Gathering "known" ratings for the matrix
▪ How to collect the data in the utility matrix
(2) Extrapolating unknown ratings from the known ones
▪ Mainly interested in high unknown ratings
▪ We are not interested in knowing what you don't like but what you like
(3) Evaluating extrapolation methods
▪ How to measure success/performance of recommendation methods
Explicit
▪ Ask people to rate items
▪ Doesn't work well in practice – people don't like being bothered
▪ Crowdsourcing: pay people to label items
Implicit
▪ Learn ratings from user actions
▪ E.g., purchase implies high rating
▪ E.g., add to playlist, play in full, skip song, …
▪ What about low ratings?
Key problem: Utility matrix U is sparse
▪ Most people have not rated most items
▪ Cold Start Problem:
  ▪ New items have no ratings
  ▪ New users have no history
Three approaches to recommender systems:
▪ 1) Content-based (today!)
▪ 2) Collaborative
▪ 3) Latent factor based
Main idea: Recommend items to customer x similar to previous items rated highly by x
Example: Movie recommendations
▪ Recommend movies with same actor(s), director, genre, …
Websites, blogs, news
▪ Recommend other sites with "similar" content
[Figure: content-based pipeline – from the items the user likes, build item profiles; from these, build a user profile; match the user profile against item profiles to recommend new items]
For each item, create an item profile
Profile is a set (vector) of features
▪ Movies: author, title, actor, director, …
▪ Text: set of "important" words in the document
How to pick important features?
▪ Usual heuristic from text mining is TF-IDF (Term Frequency * Inverse Document Frequency)
▪ Term … Feature
▪ Document … Item
f_ij = frequency of term (feature) i in doc (item) j

TF_ij = f_ij / max_k f_kj
▪ Note: we normalize TF to discount for "longer" documents
▪ Large when term i appears often in doc j

n_i = number of docs that mention term i
N = total number of docs

IDF_i = log(N / n_i)
▪ Large when term i appears in very few documents

TF-IDF score: w_ij = TF_ij × IDF_i
Doc profile = set of words with highest TF-IDF scores, together with their scores
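A minimal Python sketch (not from the slides) of this TF-IDF weighting; the toy documents are invented for illustration:

import math
from collections import Counter

docs = {                      # hypothetical toy corpus: item -> text
    "d1": "star wars space space saga",
    "d2": "space documentary about mars",
    "d3": "romantic comedy about weddings",
}

tokenized = {d: text.split() for d, text in docs.items()}
N = len(docs)                                   # total number of docs
n = Counter()                                   # n[i] = number of docs that mention term i
for words in tokenized.values():
    n.update(set(words))

def tf_idf_profile(doc, top_k=3):
    counts = Counter(tokenized[doc])
    max_f = max(counts.values())
    scores = {}
    for term, f in counts.items():
        tf = f / max_f                          # TF_ij = f_ij / max_k f_kj
        idf = math.log(N / n[term])             # IDF_i = log(N / n_i)
        scores[term] = tf * idf                 # w_ij = TF_ij * IDF_i
    # Doc profile: words with the highest TF-IDF scores, together with their scores
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(tf_idf_profile("d1"))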
User profile possibilities:
▪ Weighted average of rated item profiles
▪ Variation: weight by difference from average rating for item

Prediction heuristic: cosine similarity of user and item profiles
▪ Given user profile x and item profile i, estimate
  u(x, i) = cos(x, i) = (x · i) / (||x|| ⋅ ||i||)

How do you quickly find items closest to x?
▪ Job for LSH!
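A minimal sketch (not from the slides) of this content-based scoring; the item feature vectors and ratings are invented for illustration:

import numpy as np

# Hypothetical item profiles (e.g., TF-IDF or genre feature vectors)
item_profiles = {
    "movie_a": np.array([1.0, 0.0, 0.5]),
    "movie_b": np.array([0.9, 0.1, 0.4]),
    "movie_c": np.array([0.0, 1.0, 0.2]),
}
# Ratings the user has already given to some items
user_ratings = {"movie_a": 5.0, "movie_c": 1.0}

# User profile: weighted average of rated item profiles (one of the options above)
x = sum(r * item_profiles[i] for i, r in user_ratings.items()) / sum(user_ratings.values())

def score(x, i):
    # u(x, i) = cos(x, i) = (x . i) / (||x|| * ||i||)
    return float(x @ i / (np.linalg.norm(x) * np.linalg.norm(i)))

# Recommend unrated items with the highest cosine score
candidates = [i for i in item_profiles if i not in user_ratings]
print(sorted(candidates, key=lambda i: -score(x, item_profiles[i])))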
+: No need for data on other users
▪ No cold-start or sparsity problems
+: Able to recommend to users with unique tastes
+: Able to recommend new & unpopular items
▪ No first-rater problem
+: Able to provide explanations
▪ Can provide explanations of recommended items by listing the content features that caused an item to be recommended
–: Finding the appropriate features is hard
▪ E.g., images, movies, music
–: Recommendations for new users
▪ How to build a user profile?
–: Overspecialization
▪ Never recommends items outside user's content profile
▪ People might have multiple interests
▪ Unable to exploit quality judgments of other users!
Harnessing quality judgments of other users
Consider user x
Find set N of other users whose ratings are "similar" to x's ratings
Estimate x's ratings based on the ratings of users in N
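A minimal sketch (not from the slides) of this user-user scheme; the ratings are made up, and the similarity-weighted average used here is just one common way to combine the neighbours' ratings (Pearson similarity as defined on the next slide):

import numpy as np

def pearson(rx, ry):
    # Pearson correlation over items rated by both users (0 if fewer than 2 common items)
    common = [i for i in rx if i in ry]
    if len(common) < 2:
        return 0.0
    a = np.array([rx[i] for i in common]) - np.mean(list(rx.values()))
    b = np.array([ry[i] for i in common]) - np.mean(list(ry.values()))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict(x, item, ratings, k=2):
    # Take the k most similar users who rated the item (positive similarity only),
    # then average their ratings weighted by similarity
    neighbours = [(pearson(ratings[x], ratings[y]), ratings[y][item])
                  for y in ratings if y != x and item in ratings[y]]
    top = [(s, r) for s, r in sorted(neighbours, reverse=True)[:k] if s > 0]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total > 0 else None

ratings = {   # hypothetical utility matrix: user -> {item: rating}
    "x": {"i1": 5, "i2": 1},
    "y": {"i1": 4, "i2": 1, "i3": 5},
    "z": {"i1": 1, "i2": 5, "i3": 2},
}
print(predict("x", "i3", ratings))   # 5.0: only y has positive similarity to x and rated i3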
Let r_x be the vector of user x's ratings
  r_x = [*, _, _, *, ***]
  r_y = [*, _, **, **, _]

Jaccard similarity measure: treat r_x, r_y as sets:
  r_x = {1, 4, 5}, r_y = {1, 3, 4}
▪ Problem: ignores the value of the rating

Cosine similarity measure: treat r_x, r_y as points:
  r_x = [1, 0, 0, 1, 3], r_y = [1, 0, 2, 2, 0]
▪ sim(x, y) = cos(r_x, r_y) = (r_x · r_y) / (||r_x|| ⋅ ||r_y||)
▪ Problem: treats missing ratings as "negative"

Better: Pearson correlation coefficient
▪ S_xy = items rated by both users x and y
▪ sim(x, y) = Σ_{s∈S_xy} (r_xs − r̄_x)(r_ys − r̄_y) / ( sqrt(Σ_{s∈S_xy} (r_xs − r̄_x)²) ⋅ sqrt(Σ_{s∈S_xy} (r_ys − r̄_y)²) )
▪ r̄_x, r̄_y … avg. rating of x, y
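A minimal Python sketch (not from the slides) computing all three measures for the example vectors above:

import numpy as np

rx = np.array([1, 0, 0, 1, 3])   # 0 marks a missing rating
ry = np.array([1, 0, 2, 2, 0])

# Jaccard: compare the sets of rated items only (ignores rating values)
sx, sy = set(np.nonzero(rx)[0]), set(np.nonzero(ry)[0])
jaccard = len(sx & sy) / len(sx | sy)

# Cosine: treat the vectors as points (missing ratings act like a low rating of 0)
cosine = rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry))

# Pearson: restrict to co-rated items and subtract each user's mean rating
common = sorted(sx & sy)
ax = rx[common] - rx[list(sx)].mean()
ay = ry[common] - ry[list(sy)].mean()
pearson = ax @ ay / (np.linalg.norm(ax) * np.linalg.norm(ay))

print(jaccard, cosine, pearson)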
Cosine sim:
  sim(x, y) = Σ_i r_xi ⋅ r_yi / ( sqrt(Σ_i r_xi²) ⋅ sqrt(Σ_i r_yi²) )

Intuitively we want: sim(A, B) > sim(A, C)
▪ Jaccard similarity: 1/5 < 2/4
▪ Cosine similarity: 0.380 > 0.322
  ▪ Considers missing ratings as "negative"
  ▪ Solution: subtract the (row) mean
▪ Centered cosine, sim(A,B) vs. sim(A,C): 0.092 > −0.559
Notice cosine sim. is correlation when data is centered at 0
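A minimal sketch (not from the slides) that reproduces these numbers; the underlying ratings table is not in the extracted text and is assumed here to be the standard 7-movie example for users A, B, C (it reproduces the slide's 0.380/0.322 and 0.092/−0.559 exactly):

import numpy as np

# Assumed ratings (rows: users A, B, C; columns: 7 movies; 0 = missing rating)
R = np.array([
    [4, 0, 0, 5, 1, 0, 0],   # A
    [5, 5, 4, 0, 0, 0, 0],   # B
    [0, 0, 0, 2, 4, 5, 0],   # C
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def center(row):
    # Subtract the row mean from the rated entries only; missing entries stay 0
    rated = row != 0
    out = row.copy()
    out[rated] -= row[rated].mean()
    return out

A, B, C = R
print(round(cosine(A, B), 3), round(cosine(A, C), 3))                                   # 0.38 0.322
print(round(cosine(center(A), center(B)), 3), round(cosine(center(A), center(C)), 3))   # 0.092 -0.559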