

CS535 Big Data | Computer Science | Colorado State University
Week 12-A, 4/13/2020
PART B. GEAR SESSIONS
SESSION 4: LARGE SCALE RECOMMENDATION SYSTEMS AND SOCIAL MEDIA
Sangmi Lee Pallickara


  1. CS535 BIG DATA, PART B. GEAR SESSIONS
     SESSION 4: LARGE SCALE RECOMMENDATION SYSTEMS AND SOCIAL MEDIA
     Sangmi Lee Pallickara, Computer Science, Colorado State University
     http://www.cs.colostate.edu/~cs535 (Spring 2020)

     FAQs
     • Wednesday (4/15) is the GEAR Session IV presentation
     • Discussion will be available on 4/15, 16, and 17
     • Watch video clips on Canvas → Assignments → Echo360

  2. Topics of Today's Class
     • Part 1: Collaborative Filtering with the case study of Item-to-Item CF
     • Part 2: Collaborative Filtering with the case study of Latent Factor CF
     • Part 3: Evaluating Recommendation Systems

     GEAR Session 4. Large Scale Recommendation Systems and Social Media
     Lecture 2. Large Scale Recommendation Systems
     Amazon.com: Item-to-item collaborative filtering

  3. • Amazon.com uses recommendations as a targeted marketing tool
       • Email campaigns
       • Most of their web pages

     Recommendation System
     • Find a set of customers whose purchased and rated items overlap the user's purchased and rated items
     • Eliminate items the user has already purchased (or rated)
     • Recommend the remaining items to the user

  4. What if they use a Traditional CF [1/4]
     • Build a utility matrix
       • N-dimensional vector of items per user regarding their ratings
         • Where N is the number of distinct catalog items
       • Positive for purchased or positively rated items
       • Negative for negatively rated items
     • To compensate for the best-selling items
       • Multiplies the vector components by the inverse frequency
       • Making less well-known items more relevant

     What if they use a Traditional CF [2/4]
     • Find out similar users
       • Cosine similarity between the vectors
       • E.g. user A and B
       • Cosine_Similarity(A,B) = $\cos(A,B) = \frac{A \cdot B}{\|A\| \, \|B\|}$
     • Select items within the group of items purchased by the similar users
       • E.g. Rank each item according to how many similar customers purchased it
       • Highly ranked item(s) will be recommended
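The traditional user-based approach on the two slides above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the `ratings` data is invented, "inverse frequency" is taken to mean 1 divided by the number of customers who purchased or rated the item, and the neighbor cutoff `k` is a hypothetical parameter. The slides do not specify any of these.

```python
import math
from collections import Counter

# Hypothetical data: +1 = purchased or positively rated, -1 = negatively rated.
ratings = {
    "alice": {"book": 1, "lamp": 1, "pen": 1},
    "bob":   {"book": 1, "lamp": 1, "mug": 1},
    "carol": {"book": 1, "pen": -1, "sofa": 1},
}

def weighted_vectors(ratings):
    """Scale each component by 1 / (number of customers who touched the item).

    Assumption: this is one plausible reading of "inverse frequency".
    """
    counts = Counter(item for vec in ratings.values() for item in vec)
    return {
        user: {item: r / counts[item] for item, r in vec.items()}
        for user, vec in ratings.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user, ratings, k=1):
    """Rank unseen items by how many of the k most similar users bought them."""
    vectors = weighted_vectors(ratings)
    others = [u for u in ratings if u != user]
    neighbors = sorted(
        others, key=lambda u: cosine(vectors[user], vectors[u]), reverse=True
    )[:k]
    votes = Counter(
        item
        for u in neighbors
        for item, r in ratings[u].items()
        if r > 0 and item not in ratings[user]
    )
    return [item for item, _ in votes.most_common()]
```

With this toy data, `recommend("alice", ratings)` picks bob as the nearest neighbor and suggests the one item of his that alice has not yet purchased or rated.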

  5. What if they use a Traditional CF [3/4]
     • For N items (in the catalog) and M users
       • Worst case: O(MN)
       • Because the average customer vector is extremely sparse, performance tends to be closer to O(M+N)
         • Scanning every customer is approximately O(M), since most customer vectors contain only a few items
         • A few customers have purchased or rated a significant percentage of the catalog, and each of them requires O(N) processing
       • Therefore, the final performance of the algorithm is approximately O(M+N)

     What if they use a Traditional CF [4/4]
     • Dimensionality reduction
       • Reducing M by randomly sampling customers or discarding customers with few purchases
       • Reducing N by discarding very popular or unpopular items
     • What problems do the above approaches cause?
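Returning to the cost argument in Traditional CF [3/4], the following sketch counts how many non-zero entries a scan visits when customer vectors are stored sparsely. It is an illustration only: the catalog size, the synthetic customers, and the function name `overlap_scores` are all hypothetical, and the counter is an upper bound on the work per customer rather than a measurement.

```python
N_ITEMS = 1000  # hypothetical catalog size (N)

# 100 light customers with 3 items each, plus one heavy customer
# who has purchased half of the catalog.
customers = {f"c{i}": {i, i + 1, i + 2} for i in range(100)}
customers["heavy"] = set(range(500))

def overlap_scores(target, customers):
    """Score every other customer by item overlap with `target`.

    Returns the scores and the number of non-zero entries visited,
    which is what a sparse scan pays for instead of N per customer.
    """
    target_items = customers[target]
    scores = {}
    visited = 0
    for name, items in customers.items():
        if name == target:
            continue
        visited += len(items)  # only stored (non-zero) entries are touched
        scores[name] = len(target_items & items)
    return scores, visited

scores, visited = overlap_scores("c0", customers)
dense_cost = (len(customers) - 1) * N_ITEMS  # cost if every vector were dense
```

Here `visited` is dominated by one term proportional to the number of customers and one term from the single heavy customer, while `dense_cost` grows with the product of the two, which is the gap between O(M+N) and O(MN) that the slide describes.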

  6. What if they use a Traditional CF [4/4]
     • Dimensionality reduction
       • Reducing M by randomly sampling customers or discarding customers with few purchases
       • Reducing N by discarding very popular or unpopular items
     • Disadvantages
       • Hard to capture the similarity between the users
       • Item-space partitioning restricts recommendations to a specific product or subject area
       • If the algorithm discards the most popular or unpopular items
         • They will never appear as recommendations

     Item-to-item collaborative filtering
     • It does NOT match the user to similar customers
     • Item-to-item collaborative filtering
       • Matches each of the user's purchased and rated items to similar items
       • Combines those similar items into a recommendation list
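The two item-to-item steps above (match each of the user's items to similar items, then combine them into one list) can be sketched as follows. This is a minimal sketch, not the method Amazon.com actually uses: the `similar_items` table and its scores are invented, and summing scores to combine candidates is an assumption chosen for simplicity.

```python
from collections import defaultdict

# Hypothetical precomputed similar-items table: item -> {similar item: score}.
similar_items = {
    "tv":      {"hdmi_cable": 0.9, "soundbar": 0.7},
    "headset": {"soundbar": 0.4, "mic": 0.8},
}

def recommend_from_similar_items(user_items, similar_items, top_n=3):
    """Combine the similar items of everything the user already has."""
    scores = defaultdict(float)
    for item in user_items:
        for candidate, score in similar_items.get(item, {}).items():
            if candidate in user_items:
                continue  # never recommend something the user already has
            scores[candidate] += score  # assumption: combine by summing
    # Highest combined score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:top_n]]
```

Note that the lookup cost depends only on how many items the user has purchased or rated, not on the number of customers, because no customer-to-customer comparison happens at recommendation time.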

  7. Determining the most-similar match
     • The algorithm builds a similar-items table
       • By finding items that customers tend to purchase together
     • How about building a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair?
       • Many product pairs have no common customer
       • If you already bought a TV today, will you buy another TV again today?

     Determining the most-similar match
     • Calculating the similarity between a single product and all related products
       • Here "similarity" does not mean that the items are alike
       • It is based on the items that co-occur in a client's purchase history
       • E.g. if a client A has bought a headset X and a lawn mower Y, X and Y can be considered "similar" items in this context
     • How to build a similar-items matrix

       For each item in product catalog, I1
         For each customer C who purchased I1
           For each item I2 purchased by customer C
             Record that a customer purchased I1 and I2
         For each item I2
           Compute the similarity between I1 and I2
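The recording loops of the pseudocode above can be sketched in Python as follows. This is an illustrative sketch under assumptions: the `purchases` data is invented (reusing the headset and lawn mower from the example), the inverted index `buyers` is a helper introduced here to find the customers who purchased I1, and the final similarity step is left out, since any metric over these counts could be plugged in.

```python
from collections import defaultdict

# Hypothetical purchase histories: customer -> set of purchased items.
purchases = {
    "A": {"headset", "lawn_mower"},
    "B": {"headset", "mic"},
    "C": {"headset", "mic", "lawn_mower"},
}

def co_purchase_counts(purchases):
    """Count, for each item I1, how many customers also purchased each I2."""
    # Inverted index: item -> customers who purchased it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    counts = defaultdict(lambda: defaultdict(int))
    for i1 in buyers:                        # for each item in the catalog, I1
        for customer in buyers[i1]:          # for each customer C who purchased I1
            for i2 in purchases[customer]:   # for each item I2 purchased by C
                if i2 != i1:
                    counts[i1][i2] += 1      # record that C purchased I1 and I2
    return counts

counts = co_purchase_counts(purchases)
```

Only pairs that share at least one customer ever get an entry, which is why this avoids the wasted work of iterating through all item pairs.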

  8. Part 1: tracking co-occurrence items [1/3]
     Purchase record for the user U_A = {I1, I3, I4}
     Purchase record for the user U_B = {I2, I3, I4}
     Purchase record for the user U_C = {I2}
     Purchase record for the user U_D = {I0, I5, I6}
     Purchase record for the user U_E = {I1, I3}
     Purchase record for the user U_F = {I0, I3, I5}
     Purchase record for the user U_G = {I5, I6}

     Co-occurrence matrix with rows and columns I0 to I6, initially empty.

       For each item in product catalog, I1
         For each customer C who purchased I1
           For each item I2 purchased by customer C
             Record that a customer purchased I1 and I2
         For each item I2
           Compute the similarity between I1 and I2

     Part 1: tracking co-occurrence items [2/3]
     Same purchase records; matrix after recording the purchases of user U_A (- = no entry yet):

          I0  I1  I2  I3  I4  I5  I6
      I0   -   -   -   -   -   -   -
      I1   -   -   -   1   1   -   -
      I2   -   -   -   -   -   -   -
      I3   -   1   -   -   1   -   -
      I4   -   1   -   1   -   -   -
      I5   -   -   -   -   -   -   -
      I6   -   -   -   -   -   -   -

  9. Part 1: tracking co-occurrence items [3/3]
     Purchase record for the user U_A = {I1, I3, I4}
     Purchase record for the user U_B = {I2, I3, I4}
     Purchase record for the user U_C = {I2}
     Purchase record for the user U_D = {I0, I5, I6}
     Purchase record for the user U_E = {I1, I3}
     Purchase record for the user U_F = {I0, I3, I5}
     Purchase record for the user U_G = {I5, I6}

     Co-occurrence matrix

          I0  I1  I2  I3  I4  I5  I6
      I0   0   0   0   1   0   2   1
      I1   0   0   0   2   1   0   0
      I2   0   0   0   1   1   0   0
      I3   1   2   1   0   2   1   0
      I4   0   1   1   2   0   0   0
      I5   2   0   0   1   0   0   2
      I6   1   0   0   0   0   2   0

     Part 2: Computing similarity between items
     • Using cosine measure
       • Each vector corresponds to an item
       • Item A and B (rather than customers)
       • Cosine_Similarity(A,B) = $\cos(A,B) = \frac{A \cdot B}{\|A\| \, \|B\|}$
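The cosine measure from Part 2 can be applied to the co-occurrence matrix above as a short Python sketch. One assumption is made loudly here: each item's vector is taken to be its row of the co-occurrence matrix. The slide does not spell out the vector's dimensions, and vectors indexed by customer would be another valid reading. The helper name `most_similar` is also introduced only for illustration.

```python
import math

# Co-occurrence matrix from Part 1 [3/3]; row i and column j refer to items Ii and Ij.
cooccurrence = [
    [0, 0, 0, 1, 0, 2, 1],  # I0
    [0, 0, 0, 2, 1, 0, 0],  # I1
    [0, 0, 0, 1, 1, 0, 0],  # I2
    [1, 2, 1, 0, 2, 1, 0],  # I3
    [0, 1, 1, 2, 0, 0, 0],  # I4
    [2, 0, 0, 1, 0, 0, 2],  # I5
    [1, 0, 0, 0, 0, 2, 0],  # I6
]

def cosine(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(item, matrix):
    """Index of the item whose row has the highest cosine with `item`'s row.

    Assumption: rows of the co-occurrence matrix serve as item vectors.
    """
    scores = [
        (cosine(matrix[item], matrix[j]), j)
        for j in range(len(matrix))
        if j != item
    ]
    return max(scores)[1]
```

Under this reading, I1 and I2 come out as each other's closest match (cosine of about 0.95) even though no customer purchased both, because both items co-occur with I3 and I4.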
