ClustKNN: A Highly Scalable Hybrid Model- & Memory-Based CF Algorithm
Al Mamunur Rashid, Shyong K. Lam, George Karypis, and John Riedl
University of Minnesota
Problem Domain
• Collaborative filtering (CF)-based recommender systems (RS)
• Issue:
  − Scalability
Al Mamunur Rashid, WebKDD 2006
Background: Why Recommender Systems?
• Information overload:
  − More than 1.3 million articles!
  − About 50 million blogs!
  − About 130 million photos!
Background: Why Recommender Systems?
• One solution:
  − Recommender systems
    • Tools that suggest items of interest based on
      − Users' expressed preferences
      − Observed behaviors
      − Information about the items
  − Collaborative filtering
    • Recommendations based on like-minded users
Many CF Algorithms So Far…
• Most of the early ones: kNN
  − GroupLens (1994), Ringo (1995)
• View it as a special regression problem.
  − Nearly all statistical and ML approaches can be applied!
• Classification by Breese et al. (1998):

                             Memory-based CF    Model-based CF
  Simplicity                       ✓                  ✗
  Training cost                    ✓ (none)           ✗
  Online prediction cost           ✗                  ✓
  Adding new information           ✓                  ✗
Many CF Algorithms So Far…
• Accuracy:
  − So far the main focus
    • However, how much of the accuracy difference do users perceive?
• Does it scale, though?
User-based kNN CF Algorithm
• Classic memory-based CF
• Assumption:
  − Linear relationship between two users' preferences
    • User similarities measured by the Pearson correlation coefficient
• Works very well
  − Very good accuracy & explainable to general users
• Problem: doesn't scale!
  − O(mn) online cost
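The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the rating matrix `R` (with 0 for "unrated"), the helper names, and the mean-centered weighted average are assumptions.

```python
import numpy as np

def pearson_sim(ru, rv):
    """Pearson correlation over the items both users rated (0 = unrated)."""
    both = (ru > 0) & (rv > 0)
    if both.sum() < 2:
        return 0.0
    a = ru[both] - ru[both].mean()
    b = rv[both] - rv[both].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def predict(R, u, i, k=20):
    """Predict user u's rating of item i from the k most similar raters.
    R is an n x m (users x items) matrix; this online step costs O(mn)."""
    sims = np.array([pearson_sim(R[u], R[v]) if v != u else -np.inf
                     for v in range(R.shape[0])])
    mean_u = R[u][R[u] > 0].mean()
    raters = [v for v in np.argsort(-sims) if R[v, i] > 0][:k]
    den = sum(abs(sims[v]) for v in raters)
    if den == 0:
        return float(mean_u)  # no usable neighbors: fall back to u's mean
    num = sum(sims[v] * (R[v, i] - R[v][R[v] > 0].mean()) for v in raters)
    return float(mean_u + num / den)
```

The O(mn) cost is visible directly: every prediction scans all n user profiles of length m to compute similarities.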
ClustKNN: Proposed Approach
• Retain the good properties of user-based kNN
• Make it scale:
  n users → bisecting k-means clustering → k clusters → take the k centroids → k surrogate users
• Online cost: O(km) ≅ O(m)
  − (k ≪ m, k ≪ n)
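The online step under this scheme can be sketched as follows. This is illustrative only: the dense `centroids` matrix is assumed to come from the clustering step, and `np.corrcoef` stands in for the similarity function.

```python
import numpy as np

def clustknn_predict(centroids, user, item, k_nn=5):
    """Predict `user`'s rating of `item` from the most similar surrogate
    users (cluster centroids). Unrated cells in `user` are 0; centroids
    are dense, so every surrogate 'rates' every item. Scoring the target
    user against k centroids instead of n users costs O(km), not O(mn)."""
    rated = user > 0
    sims = np.nan_to_num(np.array(
        [np.corrcoef(user[rated], c[rated])[0, 1] for c in centroids]))
    top = np.argsort(-sims)[:k_nn]
    weight = np.abs(sims[top]).sum()
    if weight == 0:
        return float(centroids[:, item].mean())  # no signal: global mean
    return float((sims[top] * centroids[top, item]).sum() / weight)
```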
ClustKNN: Proposed Approach
• Bisecting k-means clustering
  − A better k-means
    • Cluster sizes are more uniform
    • Better results found in document clustering (Steinbach 2000)
• Similarity function:
  − Same in both cluster-building and CF
  − The two nicely complement each other
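Bisecting k-means itself is simple: repeatedly split the largest cluster in two with plain 2-means until k clusters remain. A minimal sketch follows; Euclidean 2-means with a fixed iteration budget is an assumption here (the paper's similarity function could be substituted).

```python
import numpy as np

def two_means(data, rng, iters=10):
    """One run of Lloyd's algorithm with k=2; returns a 0/1 label array."""
    centers = data[rng.choice(len(data), 2, replace=False)].copy()
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        dists = ((data[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in (0, 1):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(0)
    return labels

def bisecting_kmeans(X, k, seed=0):
    """Split the largest cluster in two until k clusters remain;
    return the k centroids -- the 'surrogate users'."""
    rng = np.random.default_rng(seed)
    clusters = [X]
    while len(clusters) < k:
        big = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        data = clusters.pop(big)
        labels = two_means(data, rng)
        left, right = data[labels == 0], data[labels == 1]
        if len(left) == 0 or len(right) == 0:
            clusters.append(data)  # degenerate split: put it back, retry
            continue
        clusters += [left, right]
    return np.stack([c.mean(0) for c in clusters])
```

Always bisecting the largest remaining cluster is one reason the resulting cluster sizes tend to be more uniform than with ordinary k-means.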
Other Algorithms Considered
Time-complexities
Experiments: Datasets
• Movie recommendation data from
Experiments: Evaluation Metrics
• Prediction evaluation metrics
  − NMAE
    • Divide MAE by the expected MAE
    • Limitation:
      − The same error magnitude always gets the same treatment
      − No difference between the (prediction, actual) pairs (5, 2) and (2, 5)
  − Expected Utility (EU)
• Recommendation-list evaluation metrics
  − Precision, recall, F1
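As a concrete example, the NMAE computation can be sketched as below. A 1–5 rating scale with uniformly random predictions and ratings for the normalizer is an assumption; on that scale the expected MAE works out to 1.6.

```python
import numpy as np

def nmae(preds, actuals, ratings=(1, 2, 3, 4, 5)):
    """MAE divided by the MAE expected if both predictions and actual
    ratings were drawn uniformly at random from the rating scale."""
    mae = np.abs(np.asarray(preds, float) - np.asarray(actuals, float)).mean()
    r = np.asarray(ratings, float)
    expected_mae = np.abs(r[:, None] - r[None, :]).mean()  # 1.6 on a 1-5 scale
    return float(mae / expected_mae)
```

The limitation from the slide is visible here: `nmae([5, 2], [2, 5])` equals `nmae([2, 5], [5, 2])`, because only the error magnitude matters.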
Evaluation Metric: EU
• Two tables:
  − A contingency table
    • Rows: predictions; columns: actual ratings
  − A utility table
    • Filled with a linear utility function
    • Penalizes false positives more than false negatives
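The slide's utility function itself is not reproduced in the text, so the linear form below, with over-prediction penalized `alpha` times as hard, is an illustrative assumption, as are the function and parameter names.

```python
import numpy as np

RATINGS = np.arange(1, 6)  # a 1-5 rating scale is assumed

def expected_utility(preds, actuals, alpha=2.0):
    """Build the contingency table (rows: rounded predictions, columns:
    actual ratings), weight it by a linear utility table that penalizes
    over-prediction (false positives) alpha times as hard as
    under-prediction, and average over all predictions."""
    cont = np.zeros((len(RATINGS), len(RATINGS)))
    for p, a in zip(preds, actuals):
        cont[int(round(p)) - 1, int(a) - 1] += 1
    utility = np.array([[-alpha * (p - a) if p > a else -(a - p)
                         for a in RATINGS] for p in RATINGS])
    return float((cont / cont.sum() * utility).sum())
```

Unlike NMAE, this scores the pairs (5, 2) and (2, 5) differently: recommending an item the user ends up disliking costs more than missing one they would have liked.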
Results
[Figure: Expected Utility and NMAE vs. the number of clusters in the model (20–500), comparing ClustKNN with user-based kNN]
Results: Prediction Accuracy
Results: Recommendation List
ClustKNN: Discussion
• Scalable!
• Simple and explainable
• A hybrid of the model- and memory-based approaches
• Great for occasionally-connected, low-storage devices!
  − Memory requirement: only O(km + m)!
Thanks for listening! Questions?