CSE 255 – Lecture 6 Data Mining and Predictive Analytics Combining models of ratings and reviews
Ratings – Latent Factor Models Two models we’ve seen so far. 1: Latent factor models (Lecture 5): learn my (the user’s) “preferences” and the product’s (the item’s, e.g. HP’s) “properties”. e.g. Koren & Bell (2011)
Text – Latent Dirichlet Allocation Two models we’ve seen so far. 2: Topic models (Today!): discover a document’s topics. E.g. LDA on a review of “The Chronicles of Riddick” might discover topics such as Sci-fi (space, future, planet, …) and Action (action, loud, fast, explosion, …). Blei & McAuliffe (2007)
Low-dimensional representations • Both of these models try to summarize complex data into low-dimensional representations • If both of these models are based on the same principle (project high-dimensional data into low-dimensional spaces), can we combine them? • In other words, can we come up with low- dimensional representations that capture the common structure present in both types of data simultaneously?
Why combine ratings and text? Reason 1 (modeling): it takes lots of ratings to estimate high-dimensional models of users and items – we might get away with fewer reviews. Reason 2 (understanding): standard rating models offer little interpretation – text might help us explain the dimensions of opinions. ACM RecSys 2013 (w/ Leskovec)
Combining ratings and reviews The parameters of a “standard” recommender system

$\mathrm{rec}(u, i) = \alpha + \beta_u + \beta_i + \gamma_u \cdot \gamma_i$

($\alpha$: offset; $\beta_u, \beta_i$: user/item biases; $\gamma_u, \gamma_i$: latent factors) are fit so as to minimize the mean-squared error

$\frac{1}{|T|} \sum_{(u,i) \in T} \left( \mathrm{rec}(u, i) - R_{u,i} \right)^2$

where $T$ is a training corpus of ratings
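The biased latent-factor prediction and its mean-squared error objective can be sketched as follows (a minimal illustration; the parameter containers and function names are my own, not from the lecture):

```python
import numpy as np

def predict(alpha, beta_u, beta_i, gamma_u, gamma_i):
    # alpha: global offset; beta_u, beta_i: user/item biases;
    # gamma_u, gamma_i: K-dimensional latent factor vectors
    return alpha + beta_u + beta_i + gamma_u @ gamma_i

def mse(ratings, alpha, beta_user, beta_item, gamma_user, gamma_item):
    # ratings: the training corpus T, as (user, item, rating) triples
    errs = [(predict(alpha, beta_user[u], beta_item[i],
                     gamma_user[u], gamma_item[i]) - r) ** 2
            for u, i, r in ratings]
    return sum(errs) / len(errs)
```

Fitting minimizes this quantity over all parameters (typically with regularization, omitted here for brevity).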
Combining ratings and reviews Our approach: find topics in reviews that inform us about opinions. Item “factors” are linked to review “topics” by a transform, so each item’s topic distribution is tied to its latent factor vector.
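In the accompanying RecSys 2013 paper, the transform linking item factors to topic proportions is a softmax with a learned “peakiness” parameter $\kappa$. A sketch (the default value of `kappa` here is a placeholder, not a fitted quantity):

```python
import numpy as np

def factors_to_topics(gamma_i, kappa=1.0):
    # Map an item's latent factor vector gamma_i to a topic distribution
    # theta_i via a softmax; kappa controls how peaked the distribution is.
    e = np.exp(kappa * (gamma_i - gamma_i.max()))  # subtract max for stability
    return e / e.sum()
```

Because the same vector drives both rating prediction and topic proportions, each latent dimension must simultaneously explain ratings and review text.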
Combining ratings and reviews We replace this objective with one that uses the review text as a regularizer:

$\underbrace{\frac{1}{|T|} \sum_{(u,i) \in T} \left( \mathrm{rec}(u,i) - R_{u,i} \right)^2}_{\text{rating parameters}} - \mu \underbrace{\ell(T \mid \theta, \phi)}_{\text{LDA parameters}}$

where $\ell(T \mid \theta, \phi)$ is the likelihood of the review corpus under the topic model and $\mu$ trades off the two terms.
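Assembling the combined objective is then a one-liner; a sketch, assuming the rating MSE and the corpus log-likelihood have already been computed elsewhere:

```python
def combined_objective(rating_mse, corpus_log_likelihood, mu):
    # Trade off rating error against how well the topics explain the
    # review text; mu controls the strength of the text "regularizer".
    # Subtracting the log-likelihood means minimizing this objective
    # simultaneously fits the ratings and the reviews.
    return rating_mse - mu * corpus_log_likelihood
```

Setting `mu = 0` recovers the standard rating-only model.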
Model fitting Repeat steps (1) and (2) until convergence: Step 1: fit a rating model, regularized by the topics, solved via gradient ascent using L-BFGS (see e.g. Koren & Bell, 2011). Step 2: identify topics that “explain” the ratings, solved via Gibbs sampling (see e.g. Blei & McAuliffe, 2007).
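The alternating scheme above can be sketched generically; the two step functions are stand-ins for the L-BFGS update and the Gibbs sampler (and a fixed iteration count stands in for a real convergence check):

```python
def alternate(step1, step2, params, topics, n_iters=50):
    # Alternating optimization: repeat steps (1) and (2).
    # step1: update rating parameters given the current topics
    #        (e.g. gradient ascent via L-BFGS)
    # step2: update topic assignments given the current rating parameters
    #        (e.g. Gibbs sampling)
    for _ in range(n_iters):
        params = step1(params, topics)
        topics = step2(topics, params)
    return params, topics
```

Each step holds the other block of parameters fixed, so the combined objective improves monotonically within each step.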
Outcomes – rating prediction Rating prediction: • Amazon (35M reviews): 6% better than the state of the art • Yelp (230K reviews): 4% better than the state of the art New users: • Improvements are largest for users with few reviews
Outcomes – interpretation Interpretability: topics are highly interpretable across all datasets. [Table: top words for five discovered topics per dataset.]
Beers: • “pale ales”: ipa, pine, grapefruit, citrus, ipas, piney, citrusy, floral, hoppy, dipa • “lambics”: funk, brett, saison, vinegar, raspberry, lambic, barnyard, funky, tart, raspberries • “dark beers”: chocolate, coffee, black, dark, roasted, stout, bourbon, porter, vanilla • “spices”: pumpkin, nutmeg, cinnamon, pie, coriander • “wheat beers”: wheat, yellow, straw, pilsner, summer, banana, pils
Musical Instruments: • “drums”: sticks, snare, cymbals, heads, mute • “strings”: guitar, violin, strap, capo, neck, picks, bridge, daddario, tuner • “wind”: reeds, harmonica, mouthpiece, reed, harp, harps • “mics”: mic, microphone, stand, condenser, wireless, microphones • “software”: software, interface, midi, usb, drivers, windows, mp3, computer, program
Outcomes – usefulness prediction What makes a review useful? “Useful” reviews discuss topics in proportion to their importance. Do the topics in my review match those that the community finds important?
Questions?