CS535 Big Data | Computer Science | Colorado State University
4/13/2020, Week 12-A, Spring 2020
Sangmi Lee Pallickara
http://www.cs.colostate.edu/~cs535

CS535 BIG DATA
PART B. GEAR SESSIONS
SESSION 4: LARGE SCALE RECOMMENDATION SYSTEMS AND SOCIAL MEDIA
Sangmi Lee Pallickara
Computer Science, Colorado State University

FAQs
• Wednesday (4/15) is the GEAR Session IV presentation
• Discussion will be available on 4/15, 4/16, and 4/17
• Watch video clips on Canvas → Assignments → Echo360

Topics of Today's Class
• Part 1: Collaborative Filtering with the case study of Item-to-Item CF
• Part 2: Collaborative Filtering with the case study of Latent Factor CF
• Part 3: Evaluating Recommendation Systems

GEAR Session 4. Large Scale Recommendation Systems and Social Media
Lecture 2. Large Scale Recommendation Systems
Amazon.com: Item-to-item collaborative filtering

• Amazon.com uses recommendations as a targeted marketing tool
  • Email campaigns
  • Most of their web pages

Recommendation System
• Find a set of customers whose purchased and rated items overlap the user's purchased and rated items
• Eliminate items the user has already purchased (or rated)
• Recommend the remaining items to the user

What if they use a Traditional CF [1/4]
• Build a utility matrix
  • Each user is represented by an N-dimensional vector of item ratings
    • N is the number of distinct catalog items
  • Components are positive for purchased or positively rated items
  • Components are negative for negatively rated items
• To compensate for the best-selling items
  • Multiply the vector components by the inverse frequency (the inverse of the number of customers who have purchased or rated the item)
  • This makes less well-known items more relevant

What if they use a Traditional CF [2/4]
• Find similar users
  • Cosine similarity between the user vectors
  • E.g. users A and B
  • $\text{Cosine\_Similarity}(A, B) = \cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$
• Select items from the group of items purchased by the similar users
  • E.g. rank each item according to how many similar customers purchased it
  • Highly ranked item(s) will be recommended
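The two traditional-CF steps above (inverse-frequency weighting, then cosine similarity between user vectors) can be sketched in a few lines of Python. This is a minimal illustration, not Amazon's implementation: the three-user utility matrix `ratings`, the helper names, and the choice of 1/count as the weight are assumptions made for the example.

```python
import math

def inverse_frequency_weights(user_vectors):
    """One weight per item: 1 / (number of users with a nonzero entry)."""
    num_items = len(user_vectors[0])
    weights = []
    for j in range(num_items):
        count = sum(1 for vec in user_vectors if vec[j] != 0)
        weights.append(1.0 / count if count else 0.0)
    return weights

def apply_weights(vector, weights):
    """Scale each component of a user vector by its item weight."""
    return [x * w for x, w in zip(vector, weights)]

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical utility matrix: 3 users x 4 catalog items (1 = purchased).
ratings = {
    "A": [1, 1, 0, 0],
    "B": [1, 0, 1, 0],
    "C": [0, 0, 1, 1],
}

weights = inverse_frequency_weights(list(ratings.values()))
weighted = {user: apply_weights(vec, weights) for user, vec in ratings.items()}
sim_ab = cosine_similarity(weighted["A"], weighted["B"])
sim_ac = cosine_similarity(weighted["A"], weighted["C"])
print(f"sim(A, B) = {sim_ab:.3f}, sim(A, C) = {sim_ac:.3f}")
```

Users A and B share only item 0, which two users bought, so its down-weighted component contributes less to their similarity than a rarely purchased shared item would.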

What if they use a Traditional CF [3/4]
• For N items (in the catalog) and M users
• Worst case: O(MN)
  • The algorithm examines M customers and up to N items for each customer
• The average customer vector is extremely sparse, so performance tends toward O(M+N)
  • Scanning every customer is approximately O(M), because most customer vectors contain only a few items
  • A few customers have purchased or rated a significant percentage of the catalog, and each of them requires O(N) time
  • Therefore, the final performance of the algorithm is approximately O(M+N)

What if they use a Traditional CF [4/4]
• Dimensionality reduction
  • Reducing M by randomly sampling customers or discarding customers with few purchases
  • Reducing N by discarding very popular or unpopular items
• What are the problems with the above approaches?

What if they use a Traditional CF [4/4]: disadvantages of dimensionality reduction
• With only a sample of customers, it is hard to capture the similarity between users
• Item-space partitioning restricts recommendations to a specific product or subject area
• If the algorithm discards the most popular or unpopular items, they will never appear as recommendations

Item-to-item collaborative filtering
• It does NOT match the user to similar customers
• Instead, item-to-item collaborative filtering
  • Matches each of the user's purchased and rated items to similar items
  • Combines those similar items into a recommendation list
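The match-and-combine step can be sketched as a lookup into a precomputed similar-items table. Everything here is illustrative: the table `similar_items`, the product names, and the similarity values are made up, and summing similarity scores is one simple way to combine candidates rather than a method stated in the slides.

```python
def recommend(purchased, similar_items, top_n=3):
    """Combine the similar items of everything the user bought into one ranked list."""
    scores = {}
    for item in purchased:
        for other, sim in similar_items.get(item, {}).items():
            if other in purchased:
                continue  # never recommend what the user already has
            scores[other] = scores.get(other, 0.0) + sim
    # Highest combined score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical similar-items table: item -> {similar item: similarity}.
similar_items = {
    "tv": {"hdmi_cable": 0.9, "soundbar": 0.7},
    "soundbar": {"hdmi_cable": 0.4, "tv": 0.7, "speaker_stand": 0.5},
}

print(recommend({"tv", "soundbar"}, similar_items))
```

The lookup cost depends only on how many items the user has purchased or rated, not on the number of customers, which is the point of precomputing the table.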

Determining the most-similar match
• The algorithm builds a similar-items table
  • By finding items that customers tend to purchase together
• How about building a product-to-product matrix by iterating through all item pairs and computing a similarity metric for each pair?
  • Many product pairs have no common customer
  • If you already bought a TV today, will you buy another TV again today?

Determining the most-similar match
• Calculate the similarity between a single product and all related products
  • This is not the usual notion of "similarity" between items
  • It is based on the items that co-occur in a client's purchase history
  • E.g. if client A has bought a headset X and a lawn mower Y, X and Y can be considered "similar" items in this context
• How to build a similar-items matrix

  For each item in product catalog, I1
    For each customer C who purchased I1
      For each item I2 purchased by customer C
        Record that a customer purchased I1 and I2
    For each item I2
      Compute the similarity between I1 and I2

Part 1: tracking co-occurrence items [1/3]
Purchase records:
  U_A = {I1, I3, I4}
  U_B = {I2, I3, I4}
  U_C = {I2}
  U_D = {I0, I5, I6}
  U_E = {I1, I3}
  U_F = {I0, I3, I5}
  U_G = {I5, I6}
The co-occurrence matrix (rows and columns I0 to I6) starts out empty.

Part 1: tracking co-occurrence items [2/3]
Co-occurrence matrix after recording the purchases of U_A = {I1, I3, I4} (a dot marks a cell not yet filled):

    I0  I1  I2  I3  I4  I5  I6
I0   .   .   .   .   .   .   .
I1   .   .   .   1   1   .   .
I2   .   .   .   .   .   .   .
I3   .   1   .   .   1   .   .
I4   .   1   .   1   .   .   .
I5   .   .   .   .   .   .   .
I6   .   .   .   .   .   .   .
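The tracking step can be sketched in Python using the seven purchase records from these slides. The function name `build_cooccurrence` and the inverted index `buyers` are choices made for this sketch. Items are written as integers 0 to 6 for I0 to I6.

```python
from collections import defaultdict

# Purchase records from the slides; items I0..I6 are written as 0..6.
purchases = {
    "U_A": {1, 3, 4},
    "U_B": {2, 3, 4},
    "U_C": {2},
    "U_D": {0, 5, 6},
    "U_E": {1, 3},
    "U_F": {0, 3, 5},
    "U_G": {5, 6},
}

def build_cooccurrence(purchases, num_items):
    """Count, for every item pair, how many customers purchased both."""
    # Inverted index: item -> set of customers who purchased it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    matrix = [[0] * num_items for _ in range(num_items)]
    for i1 in range(num_items):                # for each item I1 in the catalog
        for customer in buyers[i1]:            # for each customer C who purchased I1
            for i2 in purchases[customer]:     # for each item I2 purchased by C
                if i2 != i1:
                    matrix[i1][i2] += 1        # record that C purchased I1 and I2
    return matrix

cooccurrence = build_cooccurrence(purchases, 7)
for label, row in enumerate(cooccurrence):
    print(f"I{label}", row)
```

Only pairs that share at least one customer are ever touched, so item pairs with no common customer cost nothing beyond their initial zero.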

Part 1: tracking co-occurrence items [3/3]
Co-occurrence matrix after processing all purchase records (U_A to U_G):

    I0  I1  I2  I3  I4  I5  I6
I0   0   0   0   1   0   2   1
I1   0   0   0   2   1   0   0
I2   0   0   0   1   1   0   0
I3   1   2   1   0   2   1   0
I4   0   1   1   2   0   0   0
I5   2   0   0   1   0   0   2
I6   1   0   0   0   0   2   0

Part 2: Computing similarity between items
• Using cosine measure
  • Each vector corresponds to an item (rather than a customer)
  • E.g. items A and B
  • $\text{Cosine\_Similarity}(A, B) = \cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$
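A small sketch of the item-to-item cosine measure follows. It assumes each item vector has one binary component per customer (1 if that customer purchased the item), which the slides do not spell out. The helper names are hypothetical, and the purchase records are the seven from the co-occurrence example. Under this assumption the dot product of two item vectors equals their co-occurrence count.

```python
import math

# Purchase records from the co-occurrence example; items I0..I6 are 0..6.
purchases = {
    "U_A": {1, 3, 4},
    "U_B": {2, 3, 4},
    "U_C": {2},
    "U_D": {0, 5, 6},
    "U_E": {1, 3},
    "U_F": {0, 3, 5},
    "U_G": {5, 6},
}

def item_vector(item, purchases):
    """Binary vector with one component per customer (assumed representation)."""
    return [1 if item in items else 0 for items in purchases.values()]

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def item_similarity(i1, i2, purchases):
    """Cosine similarity between the customer vectors of two items."""
    return cosine_similarity(item_vector(i1, purchases), item_vector(i2, purchases))

for pair in [(1, 3), (5, 6), (0, 1)]:
    print(pair, round(item_similarity(*pair, purchases), 4))
```

For I1 and I3 the dot product is 2 (U_A and U_E bought both), I1 has 2 buyers and I3 has 4, so the similarity is 2 / sqrt(2 * 4). I0 and I1 share no customer, so their similarity is 0.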
