SLIDE 1

Unsupervised Machine Learning and Data Mining

DS 5230 / DS 4420 - Fall 2018

Lecture 20

Jan-Willem van de Meent

SLIDE 2

Schedule

SLIDE 3
Schedule Adjustments

  • Wed 28 Nov: Review Lecture
  • Mon 3 Dec: Project Presentations
  • Fri 7 Dec: Project Reports Due
  • Wed 12 Dec: Final Exam
  • Fri 14 Dec: Peer Reviews Due

SLIDE 4

Project

SLIDE 5
Project Reports

  • ~10 pages (rough guideline)
  • Guidelines for contents:
    • Introduction / Motivation
    • Exploratory analysis (if applicable)
    • Data mining analysis
    • Discussion of results

SLIDE 6
Project Review

  • 2 reviews per person (randomly assigned)
  • Reviews should discuss 4 aspects of the report:
    • Clarity (is the writing clear?)
    • Technical merit (are the methods valid?)
    • Reproducibility (is it clear how results were obtained?)
    • Discussion (are the results interpretable?)

SLIDE 7

Recommender Systems

SLIDE 8

The Long Tail

(from: https://www.wired.com/2004/10/tail/)



SLIDE 14

Problem Setting

  • Task: Predict user preferences for unseen items
SLIDE 15

Content-based Filtering

[Figure: movies placed along two latent dimensions ("geared towards females" vs. "geared towards males", "serious" vs. "escapist"), e.g. The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility, with example users Gus and Dave]

Two Approaches:

  • 1. Predict rating using item features on a per-user basis
  • 2. Predict rating using user features on a per-item basis
SLIDE 16

Collaborative Filtering

[Figure: user Joe and items #1–#4]

Idea: Predict rating based on similarity to other users

SLIDE 17

Problem Setting

  • Task: Predict user preferences for unseen items
  • Content-based filtering: Model user/item features
  • Collaborative filtering: Implicit similarity of users or items
SLIDE 18

Applications of Recommender Systems

  • Movie recommendation (Netflix)
  • Related product recommendation (Amazon)
  • Web page ranking (Google)
  • Social recommendation (Facebook)
  • Priority inbox & spam filtering (Google)
  • Online dating (OK Cupid)
  • Computational Advertising (Everyone)
SLIDE 19

Challenges

  • Scalability
    • Millions of objects
    • 100s of millions of users
  • Cold start
    • Changing user base
    • Changing inventory
  • Imbalanced dataset
    • User activity / item reviews are power-law distributed
  • Ratings are not missing at random
SLIDE 20

Running Example: Netflix Data

Training data:

  score  date      movie  user
  1      5/7/02    21     1
  5      8/2/04    213    1
  4      3/6/01    345    2
  4      5/1/05    123    2
  3      7/15/02   768    2
  5      1/22/01   76     3
  4      8/3/00    45     4
  1      9/10/05   568    5
  2      3/5/03    342    5
  2      12/28/00  234    5
  5      8/11/02   76     6
  4      6/15/03   56     6

Test data:

  score  date      movie  user
  ?      1/6/05    62     1
  ?      9/13/04   96     1
  ?      8/18/05   7      2
  ?      11/22/05  3      2
  ?      6/13/02   47     3
  ?      8/12/01   15     3
  ?      9/1/00    41     4
  ?      8/27/05   28     4
  ?      4/4/05    93     5
  ?      7/16/03   74     5
  ?      2/14/04   69     6
  ?      10/3/03   83     6

  • Released as part of $1M competition by Netflix in 2006
  • Prize awarded to BellKor in 2009
SLIDE 21

Running Yardstick: RMSE

$$\text{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$$

SLIDE 22

Running Yardstick: RMSE

$$\text{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$$

(doesn’t tell you how to actually do recommendation)
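
As a concrete reference, here is a minimal NumPy sketch of this yardstick (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def rmse(r_hat, r):
    """Root mean squared error over a set S of (item, user) pairs.

    r_hat : predicted ratings for the pairs in S
    r     : observed ratings for the same pairs
    """
    r_hat = np.asarray(r_hat, dtype=float)
    r = np.asarray(r, dtype=float)
    return np.sqrt(np.mean((r_hat - r) ** 2))

# e.g. rmse([3.8, 2.1, 4.6], [4, 2, 5]) ≈ 0.26
```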

SLIDE 23

Content-based Filtering

SLIDE 24

Item-based Features


SLIDE 27

Per-user Regression

$$w_u = \arg\min_{w} \; \| r_u - X w \|^2$$

Learn a set of regression coefficients for each user.
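
A minimal sketch of this per-user regression, assuming a feature matrix X whose rows describe the items a given user has rated (the feature values here are made up for illustration):

```python
import numpy as np

def fit_user_weights(X, r_u):
    """Per-user least squares: w_u = argmin_w ||r_u - X w||^2.

    X   : (n_rated, n_features) feature rows of the items this user rated
    r_u : (n_rated,) the user's ratings of those items
    """
    w_u, *_ = np.linalg.lstsq(X, r_u, rcond=None)
    return w_u

# Toy example with three binary item features (e.g. genre indicators)
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
r_u = np.array([4.0, 2.0, 3.0, 5.0])
w_u = fit_user_weights(X, r_u)
x_new = np.array([1.0, 0.0, 0.0])   # features of an unseen item
print(x_new @ w_u)                  # predicted rating for this user
```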

SLIDE 28

User Bias and Item Popularity

SLIDE 33

Bias

Problem: Some movies are universally loved / hated, and some users are more picky than others.

Solution: Introduce a per-movie and per-user bias.

[Figure: example ratings for Moonrise Kingdom (4 5 4 4 0.3 0.2 3 3 3)]
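
In the usual notation from the recommender-systems literature (an illustration consistent with the slide rather than a quote from it), the baseline prediction becomes a global mean plus per-user and per-movie offsets:

$$\hat{r}_{ui} = \mu + b_u + b_i$$

where μ is the overall mean rating, b_u captures how far user u's ratings tend to sit from the mean, and b_i how far item i's ratings tend to sit from the mean.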

SLIDE 34

Collaborative Filtering

SLIDE 35

Neighborhood Based Methods

[Figure: bipartite graph of user Joe and items #1–#4]

Users and items form a bipartite graph (edges are ratings)

SLIDE 36

Neighborhood Based Methods

(user, user) similarity

  • predict rating based on average from k-nearest users
  • good if item base is small
  • good if item base changes rapidly

(item, item) similarity

  • predict rating based on average from k-nearest items
  • good if the user base is small
  • good if user base changes rapidly
SLIDE 37

Parzen-Window Style CF

  • Define a similarity s_ij between items
  • Find the set ε_k(i,u) of k-nearest neighbors to i that were rated by user u
  • Predict rating using a weighted average over this set (see the sketch below)
  • How should we define s_ij?

[Figure: bipartite graph of user Joe and items #1–#4]
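
A minimal sketch of the weighted-average prediction described in the list above; the similarity function `sim` and the dict layout are illustrative assumptions:

```python
def predict_rating(i, user_ratings, sim, k=20):
    """Predict a user's rating of item i from the k most similar items they rated.

    user_ratings : dict mapping item j -> rating r_uj for items the user rated
    sim          : function sim(i, j) returning the similarity s_ij
    """
    # epsilon_k(i, u): the k nearest rated neighbours of i
    neighbours = sorted(user_ratings, key=lambda j: sim(i, j), reverse=True)[:k]
    num = sum(sim(i, j) * user_ratings[j] for j in neighbours)
    den = sum(abs(sim(i, j)) for j in neighbours)
    return num / den if den > 0 else None
```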

SLIDE 38

Pearson Correlation Coefficient

$$s_{ij} = \frac{\mathrm{Cov}[r_{ui}, r_{uj}]}{\mathrm{Std}[r_{ui}]\,\mathrm{Std}[r_{uj}]}$$

Each item is rated by a distinct set of users.

[Figure: user rating vectors for items i and j, with many missing entries]

SLIDE 39

(item,item) similarity

Empirical estimate of the Pearson correlation coefficient, where U(i, j) is the set of users who have rated both i and j:

$$\hat{\rho}_{ij} = \frac{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})^2 \;\sum_{u \in U(i,j)} (r_{uj} - b_{uj})^2}}$$

Regularize towards 0 for small support:

$$s_{ij} = \frac{|U(i, j)| - 1}{|U(i, j)| - 1 + \lambda}\, \hat{\rho}_{ij}$$

(Regularize towards the baseline for small neighborhoods.)
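
A sketch of this shrunk estimate, assuming the baseline-corrected ratings r_ui − b_ui are available as dicts keyed by user (the names and the default λ are illustrative):

```python
import math

def shrunk_item_similarity(resid_i, resid_j, lam=100.0):
    """Shrunk Pearson similarity s_ij between items i and j.

    resid_i, resid_j : dicts user -> (r_ui - b_ui) for each item
    lam              : shrinkage strength lambda
    """
    common = resid_i.keys() & resid_j.keys()        # U(i, j)
    n = len(common)
    if n < 2:
        return 0.0
    num = sum(resid_i[u] * resid_j[u] for u in common)
    den = math.sqrt(sum(resid_i[u] ** 2 for u in common) *
                    sum(resid_j[u] ** 2 for u in common))
    rho = num / den if den > 0 else 0.0
    return (n - 1) / (n - 1 + lam) * rho            # shrink towards 0 for small support
```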

SLIDE 40

Similarity for binary labels

Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks).

Let m_i be the number of users acting on item i, m_ij the number of users acting on both i and j, and m the total number of users.

Jaccard similarity:

$$s_{ij} = \frac{m_{ij}}{\alpha + m_i + m_j - m_{ij}}$$

Observed / expected ratio:

$$s_{ij} = \frac{\text{observed}}{\text{expected}} \approx \frac{m_{ij}}{\alpha + m_i m_j / m}$$
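
Both measures only need co-occurrence counts; a minimal sketch following the slide's notation (the default α is an assumption):

```python
def jaccard_similarity(m_i, m_j, m_ij, alpha=1.0):
    """Jaccard-style similarity from counts of users acting on i, j, and both."""
    return m_ij / (alpha + m_i + m_j - m_ij)

def observed_expected_similarity(m_i, m_j, m_ij, m, alpha=1.0):
    """Observed co-occurrences over (roughly) the count expected under independence."""
    return m_ij / (alpha + m_i * m_j / m)
```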

SLIDE 41

Matrix Factorization Methods

SLIDE 43

Matrix Factorization

[Figure: example ratings row for Moonrise Kingdom (4 5 4 4 0.3 0.2)]

Idea: pose as (biased) matrix factorization problem

SLIDE 44

Matrix Factorization

[Figure: a ratings matrix (users × items, entries 1–5) and a rank-3 SVD approximation, factored into item factors and user factors]

SLIDE 45

Prediction

[Figure: the same rank-3 SVD approximation, now with an unobserved rating entry marked "?"]

SLIDE 46

Prediction

[Figure: the unobserved rating is predicted as 2.4 from the rank-3 SVD approximation]

SLIDE 47

SVD with missing values

[Figure: rating matrix with missing entries and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • Pose as a regression problem
  • Regularize using the Frobenius norm
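
Written out, the regression-with-Frobenius-regularization idea is usually stated as the following objective over the observed entries (a standard formulation, not copied from the slide):

$$\min_{W,\,X} \;\sum_{(u,i)\;\text{observed}} \bigl(r_{ui} - w_u^\top x_i\bigr)^2 \;+\; \lambda \bigl(\|W\|_F^2 + \|X\|_F^2\bigr)$$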

SLIDE 48

Alternating Least Squares

[Figure: rating matrix and its rank-3 factorization]

  • SVD isn't defined when entries are unknown

(regress w_u given X)

SLIDE 49

Alternating Least Squares

[Figure: rating matrix and its rank-3 factorization]

  • SVD isn't defined when entries are unknown

(regress w_u given X)

L2 regularization gives a closed-form solution:

$$w = (X^\top X + \lambda I)^{-1} X^\top y$$

Remember ridge regression?

SLIDE 50

Alternating Least Squares

[Figure: rating matrix and its rank-3 factorization]

  • SVD isn't defined when entries are unknown

(alternate: regress x_i given W, regress w_u given X)
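
A compact NumPy sketch of these alternating ridge regressions; the rank k, λ, and iteration count are illustrative choices, not values from the lecture:

```python
import numpy as np

def als(R, M, k=3, lam=0.1, iters=20):
    """Alternating least squares for factorizing a partially observed rating matrix.

    R : (n_users, n_items) ratings (unobserved entries can hold any value)
    M : (n_users, n_items) boolean mask, True where a rating is observed
    Returns user factors W (n_users, k) and item factors X (n_items, k).
    """
    n_users, n_items = R.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(n_users, k))
    X = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        for u in range(n_users):            # regress w_u given X (ridge, closed form)
            obs = M[u]
            if obs.any():
                A, y = X[obs], R[u, obs]
                W[u] = np.linalg.solve(A.T @ A + reg, A.T @ y)
        for i in range(n_items):            # regress x_i given W
            obs = M[:, i]
            if obs.any():
                A, y = W[obs], R[obs, i]
                X[i] = np.linalg.solve(A.T @ A + reg, A.T @ y)
    return W, X

# Predicted rating for user u, item i: W[u] @ X[i]
```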

SLIDE 51

Stochastic Gradient Descent

[Figure: rating matrix and its rank-3 factorization]

  • SVD isn't defined when entries are unknown
  • No need for locking
  • Multicore updates asynchronously (Recht, Re, Wright, 2012 - Hogwild)
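
For reference, the per-rating SGD step behind this slide is, in its standard form (η and λ are assumed hyperparameters; the Hogwild observation is that many threads can apply such steps asynchronously without locking):

```python
import numpy as np

def sgd_step(W, X, u, i, r_ui, eta=0.01, lam=0.1):
    """One stochastic gradient step on the squared error of a single rating r_ui."""
    w_u, x_i = W[u].copy(), X[i].copy()
    err = r_ui - w_u @ x_i                   # prediction error for this rating
    W[u] += eta * (err * x_i - lam * w_u)    # update user factors
    X[i] += eta * (err * w_u - lam * x_i)    # update item factors
```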

SLIDE 52

Sampling Bias

SLIDE 53

Ratings are not given at random

  • B. Marlin et al., "Collaborative Filtering and the Missing at Random Assumption"

[Figure: rating distributions for Yahoo! survey answers, Yahoo! music ratings, and Netflix ratings]

SLIDE 54

Ratings are not given at random

[Figure: the observed rating matrix r_ui (users × movies) alongside a binary indicator matrix c_ui marking which entries were rated; data for matrix factorization and regression]

SLIDE 55

Temporal Effects

SLIDE 56

Changes in user behavior

[Figure: rating behavior over time, with a change around 2004]

In 2004, Netflix changed its rating labels.

SLIDE 57

Temporal Effects

Are movies getting better with time?

SLIDE 58

Temporal Effects

Are movies getting better with time?

Solution: Model temporal effects in the bias terms, not in the weights.
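
One standard way to write this (an illustration in the spirit of the slide, not its exact notation) is to let the baseline terms vary with time while the factors stay fixed:

$$\hat{r}_{ui}(t) = \mu + b_u(t) + b_i(t) + w_u^\top x_i$$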

SLIDE 59

Netflix Prize

SLIDE 60

Netflix Prize

Training data

  • 100 million ratings, 480,000 users, 17,770 movies
  • 6 years of data: 2000-2005

Test data

  • Last few ratings of each user (2.8 million)
  • Evaluation criterion: Root Mean Square Error (RMSE)

Competition

  • 2,700+ teams
  • Netflix’s system RMSE: 0.9514
  • $1 million prize for a 10% improvement over Netflix's system
SLIDE 61

Improvements

[Figure: "Factor models: Error vs. #parameters", RMSE (0.875–0.91) against millions of parameters (10–100,000) for NMF, BiasSVD, SVD++, SVD v.2, SVD v.3, SVD v.4]

Add biases: do SGD, but also learn the biases μ, b_u and b_i.

SLIDE 62

Improvements

Account for the fact that ratings are not missing at random ("who rated what").

[Figure: "Factor models: Error vs. #parameters", same plot as above]


SLIDE 64

Improvements

Account for temporal effects.

[Figure: "Factor models: Error vs. #parameters", same plot as above]

Still pretty far from the 0.8563 grand prize target.

SLIDE 65

Winning Solution from BellKor

SLIDE 66

Last 30 days

June 26th submission triggers 30-day “last call”

SLIDE 67

BellKor fends off competitors by a hair


SLIDE 69

Ratings aren’t everything

[Figure: the Netflix interface in 2009 vs. in 2017]

  • Only simpler submodels (SVD, RBMs) implemented
  • Ratings eventually proved to be only weakly informative