

  1. Announcements: • Submit your project group TODAY (Ed Pinned Post) • Project Proposal due this Thursday (no late periods) • Upload homework on time (by 23:59)! CS246: Mining Massive Datasets, Jure Leskovec, Stanford University, http://cs246.stanford.edu

  2. It is always possible to decompose a real matrix A into A = U Σ V^T, where: • U, Σ, V: unique* • U, V: column orthonormal ▪ U^T U = I; V^T V = I (I: identity matrix) ▪ (Columns are orthogonal unit vectors) • Σ: diagonal ▪ Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0) *Up to permutations for redundant singular values and orientation of singular vectors
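
As a quick sanity check of these properties, here is a minimal sketch (not from the slides) using NumPy's SVD; the matrix A is an arbitrary random matrix chosen purely for illustration.

    import numpy as np

    A = np.random.rand(5, 3)                          # arbitrary real matrix (illustrative)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Columns of U and V are orthonormal: U^T U = I and V^T V = I
    assert np.allclose(U.T @ U, np.eye(3))
    assert np.allclose(Vt @ Vt.T, np.eye(3))

    # Singular values are nonnegative and sorted in decreasing order
    assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])

    # Reconstruction: A = U diag(sigma) V^T
    assert np.allclose(A, U @ np.diag(s) @ Vt)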

  3. • High dim. data: Locality sensitive hashing; Clustering; Dimensionality reduction • Graph data: PageRank, SimRank; Community Detection; Spam Detection • Infinite data: Sampling data streams; Filtering data streams; Queries on streams • Machine learning: SVM; Decision Trees; Perceptron, kNN • Apps: Recommender systems; Association Rules; Duplicate document detection

  4. • Customer X ▪ Buys Metallica CD ▪ Buys Megadeth CD • Customer Y ▪ Does search on Metallica ▪ Recommender system suggests Megadeth from data collected about customer X

  5. Examples: Search, Recommendations • Items: products, web sites, blogs, news items, …

  6. • Shelf space is a scarce commodity for traditional retailers ▪ Also: TV networks, movie theaters, … • Web enables near-zero-cost dissemination of information about products ▪ From scarcity to abundance • More choice necessitates better filters: ▪ Recommendation engines ▪ Association rules: How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

  7. [Figure: the long tail of product popularity] Source: Chris Anderson (2004)

  8. Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!

  9. • Editorial and hand curated ▪ List of favorites ▪ Lists of “essential” items • Simple aggregates ▪ Top 10, Most Popular, Recent Uploads • Tailored to individual users (today’s class) ▪ Amazon, Netflix, …

  10. • X = set of Customers • S = set of Items • Utility function u : X × S → R ▪ R = set of ratings ▪ R is a totally ordered set ▪ e.g., 1–5 stars, real number in [0,1]

  11. Example utility matrix (users × movies; blank = unknown rating):

               Avatar   LOTR   Matrix   Pirates
      Alice      1               0.2
      Bob                0.5               0.3
      Carol      0.2              1
      David                                0.4
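
In practice the utility matrix is stored sparsely, since most entries are unknown. A minimal sketch of one such representation, using the example ratings reconstructed above (a dict-of-dicts; not code from the lecture):

    # Sparse utility matrix: user -> {item: rating}; absent entries are unknown, not zero
    utility = {
        "Alice": {"Avatar": 1.0, "Matrix": 0.2},
        "Bob":   {"LOTR": 0.5, "Pirates": 0.3},
        "Carol": {"Avatar": 0.2, "Matrix": 1.0},
        "David": {"Pirates": 0.4},
    }

    def rating(user, item):
        """Return the known rating u(user, item), or None if it is unobserved."""
        return utility.get(user, {}).get(item)

    print(rating("Alice", "Matrix"))   # 0.2
    print(rating("Alice", "Pirates"))  # None -> to be extrapolated by the recommender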

  12. • (1) Gathering “known” ratings for the matrix ▪ How to collect the data in the utility matrix • (2) Extrapolating unknown ratings from the known ones ▪ Mainly interested in high unknown ratings ▪ We are not interested in knowing what you don’t like, but what you like • (3) Evaluating extrapolation methods ▪ How to measure success/performance of recommendation methods

  13. • Explicit ▪ Ask people to rate items ▪ Doesn’t work well in practice – people don’t like being bothered ▪ Crowdsourcing: pay people to label items • Implicit ▪ Learn ratings from user actions ▪ E.g., purchase implies high rating ▪ E.g., add to playlist, play in full, skip song, … ▪ What about low ratings?

  14. • Key problem: utility matrix U is sparse ▪ Most people have not rated most items ▪ Cold start problem: ▪ New items have no ratings ▪ New users have no history • Three approaches to recommender systems: ▪ 1) Content-based (today!) ▪ 2) Collaborative ▪ 3) Latent factor based

  15. • Main idea: Recommend to customer x items similar to previous items rated highly by x • Example: Movie recommendations ▪ Recommend movies with same actor(s), director, genre, … • Websites, blogs, news ▪ Recommend other sites with “similar” content

  16. [Figure: content-based pipeline — from the items a user likes (e.g., red circles, triangles), build item profiles, infer a user profile, match it against other item profiles, and recommend matching items]

  17. • For each item, create an item profile • Profile is a set (vector) of features ▪ Movies: author, title, actor, director, … ▪ Text: set of “important” words in the document • How to pick important features? ▪ Usual heuristic from text mining is TF-IDF (term frequency × inverse document frequency) ▪ Term ↔ Feature, Document ↔ Item

  18. • f_ij = frequency of term (feature) i in doc (item) j • TF_ij = f_ij / max_k f_kj ▪ Note: we normalize TF to discount for “longer” documents ▪ Large when term i appears often in doc j • n_i = number of docs that mention term i, N = total number of docs • IDF_i = log(N / n_i) ▪ Large when term i appears in very few documents • TF-IDF score: w_ij = TF_ij × IDF_i • Doc profile = set of words with highest TF-IDF scores, together with their scores
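
To make the definitions concrete, here is a minimal sketch of building doc (item) profiles with these TF and IDF formulas; the toy documents and the profile size k are made-up illustrations, not lecture material.

    import math
    from collections import Counter

    docs = {                                   # toy items (illustrative)
        "d1": "star wars space space opera",
        "d2": "space documentary space station",
        "d3": "romantic comedy opera",
    }

    tf = {}                                    # TF_ij = f_ij / max_k f_kj
    df = Counter()                             # n_i = number of docs mentioning term i
    for j, text in docs.items():
        counts = Counter(text.split())
        max_f = max(counts.values())
        tf[j] = {term: f / max_f for term, f in counts.items()}
        df.update(counts.keys())

    N = len(docs)
    idf = {term: math.log(N / n_i) for term, n_i in df.items()}   # IDF_i = log(N / n_i)

    def profile(j, k=3):
        """Doc profile: the k terms with the highest TF-IDF score w_ij = TF_ij * IDF_i."""
        scores = {term: tf_ij * idf[term] for term, tf_ij in tf[j].items()}
        return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

    print(profile("d1"))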

  19. • User profile possibilities: ▪ Weighted average of rated item profiles ▪ Variation: weight by difference from average rating for item • Prediction heuristic: cosine similarity of user and item profiles ▪ Given user profile x and item profile i, estimate u(x, i) = cos(x, i) = (x · i) / (||x|| · ||i||) • How do you quickly find the items closest to x? ▪ Job for LSH!
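
A minimal sketch of this heuristic under simple assumptions (user profile = rating-weighted average of item profile vectors, then rank candidate items by cosine similarity); the feature vectors, ratings, and item names are invented for illustration.

    import numpy as np

    def user_profile(item_profiles, ratings):
        """User profile = weighted average of rated item profiles (weights = ratings)."""
        weights = np.array([ratings[i] for i in ratings], dtype=float)
        vecs = np.array([item_profiles[i] for i in ratings])
        return weights @ vecs / weights.sum()

    def estimate(x, i):
        """u(x, i) = cos(x, i) = x·i / (||x|| ||i||)"""
        return float(x @ i / (np.linalg.norm(x) * np.linalg.norm(i)))

    # Illustrative item profiles (e.g., TF-IDF or genre features) and one user's ratings
    items = {"m1": np.array([1.0, 0.0, 1.0]),
             "m2": np.array([0.0, 1.0, 1.0]),
             "m3": np.array([1.0, 1.0, 0.0])}
    x = user_profile(items, {"m1": 5, "m2": 1})

    # Rank the unrated item(s) by estimated utility
    print(sorted(((estimate(x, v), k) for k, v in items.items() if k not in ("m1", "m2")),
                 reverse=True))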

  20. • +: No need for data on other users ▪ No cold-start or sparsity problems • +: Able to recommend to users with unique tastes • +: Able to recommend new & unpopular items ▪ No first-rater problem • +: Able to provide explanations ▪ Can provide explanations of recommended items by listing the content features that caused an item to be recommended

  21. • –: Finding the appropriate features is hard ▪ E.g., images, movies, music • –: Recommendations for new users ▪ How to build a user profile? • –: Overspecialization ▪ Never recommends items outside the user’s content profile ▪ People might have multiple interests ▪ Unable to exploit quality judgments of other users!

  22. Harnessing quality judgments of other users

  23. • Consider user x • Find set N of other users whose ratings are “similar” to x’s ratings • Estimate x’s ratings based on the ratings of users in N
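
The last step needs a concrete aggregation rule; one common choice (an assumption here, not stated on this slide) is a similarity-weighted average of the neighbors' ratings. A minimal sketch, where sim can be any of the measures on the next slide:

    def predict(x, item, neighbors, ratings, sim):
        """Estimate user x's rating of item from the users in neighborhood N
        who rated it, weighting each neighbor's rating by sim(x, y)."""
        num = den = 0.0
        for y in neighbors:
            if item in ratings[y]:
                num += sim(x, y) * ratings[y][item]
                den += abs(sim(x, y))
        return num / den if den > 0 else None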

  24. • Let r_x be the vector of user x’s ratings, e.g., r_x = [*, _, _, *, ***], r_y = [*, _, **, **, _] • Jaccard similarity measure: treat r_x, r_y as sets of rated items: r_x = {1, 4, 5}, r_y = {1, 3, 4} ▪ Problem: ignores the value of the rating • Cosine similarity measure: treat r_x, r_y as points: r_x = [1, 0, 0, 1, 3], r_y = [1, 0, 2, 2, 0] ▪ sim(x, y) = cos(r_x, r_y) = (r_x · r_y) / (||r_x|| · ||r_y||) ▪ Problem: treats some missing ratings as “negative” • Better: Pearson correlation coefficient ▪ S_xy = items rated by both users x and y ▪ sim(x, y) = Σ_{s∈S_xy} (r_xs − r̄_x)(r_ys − r̄_y) / ( √(Σ_{s∈S_xy} (r_xs − r̄_x)²) · √(Σ_{s∈S_xy} (r_ys − r̄_y)²) ), where r̄_x, r̄_y are the avg. ratings of x and y
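
A small sketch computing the three measures on the example vectors above (0 encodes a missing rating; the Pearson variant centers each user by their own average rating and sums over co-rated items, matching the formula on the slide):

    import numpy as np

    r_x = np.array([1.0, 0, 0, 1, 3])
    r_y = np.array([1.0, 0, 2, 2, 0])

    def jaccard(a, b):
        """Jaccard over the sets of rated items (ignores rating values)."""
        sa, sb = set(np.nonzero(a)[0]), set(np.nonzero(b)[0])
        return len(sa & sb) / len(sa | sb)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def pearson(a, b):
        """Sum over items rated by both users; center by each user's average rating."""
        both = (a != 0) & (b != 0)
        da = a[both] - a[a != 0].mean()
        db = b[both] - b[b != 0].mean()
        denom = np.linalg.norm(da) * np.linalg.norm(db)
        return float(da @ db / denom) if denom > 0 else 0.0

    print(jaccard(r_x, r_y), cosine(r_x, r_y), pearson(r_x, r_y))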

  25. • Cosine sim: sim(x, y) = Σ_i r_xi · r_yi / ( √(Σ_i r_xi²) · √(Σ_i r_yi²) ) • Intuitively we want: sim(A, B) > sim(A, C) • Jaccard similarity: 1/5 < 2/4 • Cosine similarity: 0.380 > 0.322 ▪ Problem: considers missing ratings as “negative” ▪ Solution: subtract the (row) mean before computing cosine, giving sim(A, B) vs. sim(A, C): 0.092 > −0.559 • Notice: cosine similarity is correlation when the data is centered at 0
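
The ratings table behind these numbers is on the slide image, not in this transcript; assuming the standard three-user, seven-movie example these figures come from (an assumption), a sketch that reproduces them:

    import numpy as np

    # Assumed ratings table (0 = unrated): users A, B, C over seven movies
    R = np.array([
        [4, 0, 0, 5, 1, 0, 0],   # A
        [5, 5, 4, 0, 0, 0, 0],   # B
        [0, 0, 0, 2, 4, 5, 0],   # C
    ], dtype=float)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def center(row):
        """Subtract the row mean from the rated entries only (missing stay 0)."""
        out = row.copy()
        rated = row != 0
        out[rated] -= row[rated].mean()
        return out

    A, B, C = R
    print(round(cosine(A, B), 3), round(cosine(A, C), 3))    # 0.38  0.322
    print(round(cosine(center(A), center(B)), 3),
          round(cosine(center(A), center(C)), 3))            # 0.092 -0.559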
