Sturgeon and the Cool Kids
Problems with Top-N Recommender Evaluation
Michael D. Ekstrand, People and Information Research Team, Boise State University
Vaibhav Mahant, Texas State University
https://goo.gl/bfVg1T

What can editorials in mid-20th-century sci-fi mags tell us about evaluating recommender systems?

Evaluating Recommenders
Recommenders find items for users. They are evaluated:
• Online, by measuring actual user response
• Offline, by using existing data sets
  • Prediction accuracy with rating data (RMSE)
  • Top-N accuracy with ratings, purchases, clicks, etc. (IR metrics: MAP, MRR, P/R, AUC, nDCG)

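Offline Evaluation
[Diagram: purchase / rating data is split into train data and test data; the recommender is built from the train data, and its recommendations are compared with the test data and measured.]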

The Candidate Set
• Candidate set $C_u$
• Decoy set $D_u$
• Test set $T_u$
Often: $C_u = I \setminus R_u$ (all items not rated in training)
The recommender is a classifier separating relevant items ($T_u$) from decoy items ($D_u$)
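The common construction described on this slide can be sketched in a few lines. This is a minimal illustration only, assuming items are held in plain Python sets; the function name `candidate_set` and the toy item IDs are hypothetical and do not come from the slides.

```python
def candidate_set(all_items, train_rated, test_items):
    """Build the candidate set and split it into relevant items and decoys."""
    # Candidates: every item the user did not rate in the training data.
    candidates = set(all_items) - set(train_rated)
    # Relevant items: the user's held-out test items among the candidates.
    relevant = set(test_items) & candidates
    # Decoys: every other candidate, assumed (not known) to be irrelevant.
    decoys = candidates - relevant
    return candidates, relevant, decoys


# Toy data (hypothetical): six items, two rated in training, one held out.
all_items = {"a", "b", "c", "d", "e", "f"}
train_rated = {"a", "b"}
test_items = {"c"}

candidates, relevant, decoys = candidate_set(all_items, train_rated, test_items)
print(sorted(candidates), sorted(relevant), sorted(decoys))
```

With this construction the decoy set is everything the user has no recorded opinion on, which is why an unrated but genuinely good item ends up counted against the recommender.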

Missing Data
IR metrics assume a fully coded corpus
• Real data has unknowns
• Unknown = irrelevant
For recommender systems, this assumption is 👖🔥
Example ranking:
☐ Zootopia
☑ The Iron Giant
☑ Frozen
☒ Seven
☐ Tangled
RR = 0.5, AP = 0.417

Misclassified Decoys
Example ranking:
☐ Zootopia
☑ The Iron Giant
☑ Frozen
☒ Seven
☐ Tangled
RR = 0.5, AP = 0.417
3 possibilities for Zootopia:
• I don’t like it
• I do but data doesn’t know
• I do but I don’t know yet

Misclassified Decoys
If I would like Zootopia but have not yet seen it, then it is likely a very good recommendation, but the recommender is penalized.
How can we fix this?

IR Solutions
Rank Effectiveness
• Only rank test items, don’t pick from big set
• Requires ratings or negative samples
Pooling
• Requires judges; doesn’t work for recsys
Relevance Inference
• Reduces to the recommendation problem
• Can we really use a recommender to evaluate a recommender?

Sturgeon’s Law
Ninety percent of everything is crud. (T. Sturgeon, 1958)
Only 1% is ‘really good’ (P. S. Miller, 1960)

Sturgeon’s Decoys
Most items are not relevant.
Corollary: a randomly selected item is probably not relevant.

Random Decoys
• Generalization of One-Plus-Random protocol (Cremonesi et al. 2008)
• Candidate set contains
  • Test items
  • Randomly selected decoy items
One-Plus-Random tries to recommend each test item separately
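A rough sketch of building such a candidate set follows. It assumes decoys are sampled uniformly from items the user has no known interaction with; the function name `random_decoy_candidates`, its parameters, and the toy data are hypothetical, and this is not the authors' implementation.

```python
import random


def random_decoy_candidates(all_items, train_rated, test_items, n_decoys, seed=None):
    """Return (candidates, decoys): the test items plus randomly sampled decoys."""
    rng = random.Random(seed)
    test = set(test_items)
    # Decoys come from items that are neither rated in training nor in the test set.
    # Sorting gives a stable order so a fixed seed yields the same sample.
    pool = sorted(set(all_items) - set(train_rated) - test)
    decoys = set(rng.sample(pool, min(n_decoys, len(pool))))
    return test | decoys, decoys


# Toy data (hypothetical): 100 items, three rated in training, two held out.
all_items = range(100)
train_rated = {0, 1, 2}
test_items = {3, 4}

candidates, decoys = random_decoy_candidates(
    all_items, train_rated, test_items, n_decoys=10, seed=42
)
print(len(candidates), len(decoys))  # 12 10
```

Fixing the seed is one way to make the sampled decoy set repeatable across runs; without it, each evaluation draws a different candidate set.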

How Many Decoys?
Koren (2008): right # is open problem, used 1000
Our origin story: find a good number or fraction

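Modeling Goodness
Starting point: $\Pr[i \in G_u] = g$, the probability that item $i$ is good for user $u$ (goodness rate $g$), where $G_u$ is the set of items good for $u$
Want: $\Pr[D_u \cap G_u = \emptyset] \ge 1 - \alpha$ (high likelihood of no misclassified decoys), where $D_u$ is the decoy set and $N = |D_u|$
Simplifying assumption: goodness is independent
$$\Pr[D_u \cap G_u = \emptyset] = \prod_{i \in D_u} \Pr[i \notin G_u] = (1 - g)^N$$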

What’s the damage?
For $\alpha = 0.05$ (95% certainty) and $N = 1000$ decoys:
$$1 - g = 0.95^{1/N}, \qquad g \approx 0.0001$$
Only 1 in 10,000 can be relevant!
MovieLens users like 10s to 100s of 25K films
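The arithmetic on this slide can be checked numerically. The sketch below assumes the independence model from the previous slide, in which the chance that a decoy set of a given size contains no good item is (1 - goodness rate) raised to that size; the function names are hypothetical. The unrounded bound comes out near 5 in 100,000, consistent with the slide's 0.0001 when rounded to four decimal places.

```python
def max_goodness_rate(alpha, n_decoys):
    """Largest goodness rate for which all n_decoys are non-good with prob. >= 1 - alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_decoys)


def prob_no_good_decoy(goodness_rate, n_decoys):
    """Probability that none of n_decoys independent random items is good."""
    return (1.0 - goodness_rate) ** n_decoys


# Values from the slide: 95% certainty with 1000 decoys.
g = max_goodness_rate(alpha=0.05, n_decoys=1000)
print(f"max goodness rate: {g:.6f}")

# Illustrative rates only: a user who likes 10 or 100 of 25,000 films.
for liked in (10, 100):
    rate = liked / 25_000
    print(liked, f"{prob_no_good_decoy(rate, 1000):.3f}")
```

Under these illustrative rates the probability of a clean decoy set falls well below 95%, which is the mismatch the slide points to.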

Why so serious?
If there is even one good item in the decoy set, then it is the recommender’s job to find that item.
If no unknown items are good, why recommend?

Popularity Bias
Evaluation naively favors popular recommendations
Why?
• Popular items are more likely to be rated
• And therefore more likely to be ‘right’
Problem: how much of this is ‘real’?

Sturgeon and Popularity
Random items are
• less likely to be relevant (we hoped)
• less likely to be popular
Result: popularity is even more likely to separate test items from decoys. Oops.

Empirical Results

Empirical Findings
• Didn’t see theoretically expected impact
• Absolute difference depends on decoy set size
• Statistical significance depends on set size!
• No clear inflection points for choosing a size
• Algorithm ordering unaffected

Takeaways
Random decoys seem useful, but
• have unquantified benefit
• may not achieve benefit
• have complex problems
• hurt reproducibility

Future Work
• Compare under Bellogin’s techniques
• What happens with decoy sizes when neutralizing popularity bias?
• Try with more domains
• Try one-class classifier techniques
• Extend theoretical analysis to ‘Personalized Sturgeon’s Law’

Thank you
• Thanks to Sole Pera and the PIReTs
• Texas State for supporting initial work
Questions?
https://goo.gl/bfVg1T
