CS535 Big Data 4/13/2020 Week 12-A Sangmi Lee Pallickara


CS535 Big Data | Computer Science | Colorado State University

CS535 BIG DATA FAQs
• Wednesday (4/15) is the GEAR Session IV presentation
• Discussion will be available on 4/15, 4/16, and 4/17
• Watch the video clips on Canvas → Assignments → Echo360

PART B. GEAR SESSIONS
SESSION 4: LARGE SCALE RECOMMENDATION SYSTEMS AND SOCIAL MEDIA
Sangmi Lee Pallickara
Computer Science, Colorado State University
http://www.cs.colostate.edu/~cs535

Topics of Today's Class
• Part 1: Collaborative Filtering with the case study of Item-to-Item CF
• Part 2: Collaborative Filtering with the case study of Latent Factor CF
• Part 3: Evaluating Recommendation Systems

GEAR Session 4. Large Scale Recommendation Systems and Social Media
Lecture 2. Large Scale Recommendation Systems
Amazon.com: Item-to-item collaborative filtering

Recommendation System
• Amazon.com uses recommendations as a targeted marketing tool
  • Email campaigns
  • Most of their web pages
• Find a set of customers whose purchased and rated items overlap the user's purchased and rated items
  • Eliminate items the user has already purchased (or rated)
  • Recommend the remaining items to the user

http://www.cs.colostate.edu/~cs535 Spring 2020 Colorado State University, page 1
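The Recommendation System slide describes a flow: find overlapping customers, drop the items the user already has, and recommend the rest. The Python sketch below shows that flow. It is a hypothetical helper (`recommend_by_overlap`), not Amazon's code. It assumes that "overlap" means sharing at least one item. It ranks the remaining items by how many overlapping customers bought them, which is only one simple choice. The sample data reuses the purchase records U_A to U_G from the co-occurrence example later in this lecture, with U_E as the target user.

```python
from collections import Counter


def recommend_by_overlap(user_items, others, top_n=3):
    """Hypothetical helper sketching the slide's flow; not Amazon's code.

    user_items: set of items the target user purchased or rated.
    others: dict mapping every other customer to their set of items.
    """
    # Find the customers whose purchased/rated items overlap the user's
    # (assumption: sharing at least one item counts as overlapping).
    overlapping = [c for c, items in others.items() if items & user_items]
    # Count how many of those customers bought each item, eliminating items
    # the user already purchased (or rated).
    counts = Counter(
        item
        for c in overlapping
        for item in others[c]
        if item not in user_items
    )
    # Recommend the remaining items, most widely purchased first.
    return [item for item, _ in counts.most_common(top_n)]


# Purchase records reused from the co-occurrence example in this lecture;
# the target user is U_E, who bought I1 and I3.
others = {
    "U_A": {"I1", "I3", "I4"},
    "U_B": {"I2", "I3", "I4"},
    "U_C": {"I2"},
    "U_D": {"I0", "I5", "I6"},
    "U_F": {"I0", "I3", "I5"},
    "U_G": {"I5", "I6"},
}
print(recommend_by_overlap({"I1", "I3"}, others, top_n=1))  # ['I4']
```

U_A, U_B, and U_F share I3 with U_E. I4 is the only new item that two of them bought, so it ranks first. This version scans every other customer, and that cost is what the Traditional CF slides analyze.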

What if they use a Traditional CF [1/4]
• Build a utility matrix
  • An N-dimensional vector of items per user, based on the user's ratings
    • Where N is the number of distinct catalog items
  • Positive for purchased or positively rated items
  • Negative for negatively rated items
  • To compensate for the best-selling items
    • Multiply the vector components by the inverse frequency (the inverse of the number of customers who have purchased or rated the item)
    • This makes less well-known items more relevant

What if they use a Traditional CF [2/4]
• Find similar users
  • Cosine similarity between the vectors
  • E.g. for users A and B: Cosine_Similarity(A,B) = $\cos(A,B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
• Select items within the group of items purchased by the similar users
  • E.g. rank each item according to how many similar customers purchased it
  • The highly ranked item(s) will be recommended

What if they use a Traditional CF [3/4]
• For N items (in the catalog) and M users
  • Worst case
    • O(MN)
  • The average customer vector is extremely sparse
    • Performance is closer to O(M+N)
    • Most of the scanning will be approximately O(M)
    • There are a few customers who have purchased or rated a significant percentage of the catalog, and each of them requires O(N) processing
    • Therefore, the final performance of the algorithm is approximately O(M+N)

What if they use a Traditional CF [4/4]
• Dimensionality reduction
  • Reducing M by randomly sampling customers or discarding customers with few purchases
  • Reducing N by discarding very popular or unpopular items
• What will be the problem with the above approaches?
• Disadvantages
  • Hard to capture the similarity between the users
  • Item-space partitioning restricts recommendations to a specific product or subject area
  • If the algorithm discards the most popular or unpopular items, they will never appear as recommendations

Item-to-item collaborative filtering
• It does NOT match the user to similar customers
• Instead, item-to-item collaborative filtering
  • Matches each of the user's purchased and rated items to similar items
  • Combines those similar items into a recommendation list
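The utility-matrix step and the similar-user step of Traditional CF [1/4] and [2/4] can be sketched in Python as follows. This is an illustration and not Amazon's implementation. The helper names `user_vectors`, `cosine`, and `most_similar_users` are hypothetical. Vectors are stored as sparse dicts, which fits the sparsity point in [3/4]. Ratings are assumed to be already encoded as +1 (purchased or positively rated) or -1 (negatively rated).

```python
import math


def user_vectors(ratings):
    """Build inverse-frequency-weighted utility vectors as sparse dicts.

    ratings: dict customer -> dict item -> +1 (purchased or positively rated)
             or -1 (negatively rated). Encoding is an assumption.
    """
    # Number of customers who purchased or rated each item.
    freq = {}
    for items in ratings.values():
        for item in items:
            freq[item] = freq.get(item, 0) + 1
    # Multiply each component by the inverse frequency to damp best-sellers.
    return {
        customer: {item: value / freq[item] for item, value in items.items()}
        for customer, items in ratings.items()
    }


def cosine(a, b):
    """cos(A, B) = A . B / (||A|| * ||B||) for sparse dict vectors."""
    dot = sum(value * b[item] for item, value in a.items() if item in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_users(target, vectors, k):
    """Compare the target with every other customer and keep the top k."""
    scores = {c: cosine(vectors[target], v)
              for c, v in vectors.items() if c != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical ratings: A and B both bought X and Y; C rated X negatively.
RATINGS = {
    "A": {"X": 1, "Y": 1},
    "B": {"X": 1, "Y": 1, "Z": 1},
    "C": {"X": -1, "W": 1},
}
vectors = user_vectors(RATINGS)
print(most_similar_users("A", vectors, k=1))  # ['B']
```

The sketch stores only nonzero components, so each comparison stays cheap when customer vectors are sparse. `most_similar_users` still visits all M customers. That outer scan is the cost the dimensionality-reduction ideas in [4/4] try to cut.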

Determining the most-similar match
• The algorithm builds a similar-items table
  • By finding items that customers tend to purchase together
  • This is not the same as "similarity" between the items themselves
  • It is based on the co-occurring items in a client's purchase history
  • E.g. if client A has bought a headset X and a lawn mower Y, X and Y can be considered “similar” items in this context
• How about building a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair?
  • Many product pairs have no common customer
    • If you already bought a TV today, will you buy another TV again today?
• Instead, calculate the similarity between a single product and all related products
• How to build a similar-items matrix:
    For each item in product catalog, I1
      For each customer C who purchased I1
        For each item I2 purchased by customer C
          Record that a customer purchased I1 and I2
      For each item I2
        Compute the similarity between I1 and I2

Part 1: tracking co-occurrence items
Purchase records:
• U_A = {I1, I3, I4}
• U_B = {I2, I3, I4}
• U_C = {I2}
• U_D = {I0, I5, I6}
• U_E = {I1, I3}
• U_F = {I0, I3, I5}
• U_G = {I5, I6}
The matrix starts empty. After U_A's purchases are recorded, each pair among I1, I3, and I4 has an entry of 1 (in both the row and the column).

Co-occurrence matrix after all purchase records (entry = number of customers who purchased both items)
      I0  I1  I2  I3  I4  I5  I6
  I0   0   0   0   1   0   2   1
  I1   0   0   0   2   1   0   0
  I2   0   0   0   1   1   0   0
  I3   1   2   1   0   2   1   0
  I4   0   1   1   2   0   0   0
  I5   2   0   0   1   0   0   2
  I6   1   0   0   0   0   2   0

Part 2: Computing similarity between items
• Using the cosine measure
  • Each vector corresponds to an item
  • Items A and B (rather than customers)
  • Cosine_Similarity(A,B) = $\cos(A,B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
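The similar-items pseudocode can be translated into Python as follows. The translation assumes binary purchase data, meaning each item is either bought or not bought. The helper `similar_items_table` is hypothetical. For the similarity step, the sketch assumes that each item's vector has one dimension per customer, set to 1 if that customer bought the item. With binary vectors, the co-occurrence count is exactly the dot product A · B. Each norm is the square root of the item's number of buyers.

```python
import math
from collections import defaultdict


def similar_items_table(purchases):
    """Sketch of the similar-items pseudocode, assuming binary purchase data.

    purchases: dict mapping customer -> set of items that customer purchased.
    Returns (co, table) where co[i1][i2] is the number of customers who bought
    both items and table[i1] lists (i2, cosine similarity), most similar first.
    """
    # Inverted index: item -> set of customers who purchased it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    co = defaultdict(lambda: defaultdict(int))
    table = {}
    for i1 in buyers:                    # each purchased catalog item, I1
        for c in buyers[i1]:             # each customer C who purchased I1
            for i2 in purchases[c]:      # each item I2 purchased by C
                if i2 != i1:
                    co[i1][i2] += 1      # record that a customer bought I1 and I2
        # Cosine of the binary customer vectors of I1 and I2: the dot product
        # is the co-occurrence count and each norm is sqrt(number of buyers).
        sims = [
            (i2, count / math.sqrt(len(buyers[i1]) * len(buyers[i2])))
            for i2, count in co[i1].items()
        ]
        table[i1] = sorted(sims, key=lambda pair: pair[1], reverse=True)
    return co, table


# Purchase records U_A ... U_G from the slides.
PURCHASES = {
    "U_A": {"I1", "I3", "I4"},
    "U_B": {"I2", "I3", "I4"},
    "U_C": {"I2"},
    "U_D": {"I0", "I5", "I6"},
    "U_E": {"I1", "I3"},
    "U_F": {"I0", "I3", "I5"},
    "U_G": {"I5", "I6"},
}
ITEMS = [f"I{k}" for k in range(7)]

co, table = similar_items_table(PURCHASES)
for i1 in ITEMS:
    # Each printed row should match the co-occurrence matrix for U_A ... U_G.
    print(i1, [co[i1][i2] for i2 in ITEMS])
print(table["I1"])  # I3 (about 0.707) ranks ahead of I4 (0.5)
```

For example, I1 was bought by U_A and U_E, and I3 was bought by U_A, U_B, U_E, and U_F. The two items co-occur twice, so their similarity is 2 / √(2 · 4) ≈ 0.707. Only pairs that actually share a customer are ever touched, which avoids the all-pairs product-to-product matrix.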
