Unsupervised Machine Learning and Data Mining
DS 5230 / DS 4420 - Fall 2018
Lecture 20
Jan-Willem van de Meent
Schedule
Schedule Adjustments
• Wed 28 Nov: Review Lecture
• Mon 3 Dec: Project Presentations
• Fri 7 Dec: Project Reports Due
• Wed 12 Dec: Final Exam
• Fri 14 Dec: Peer Reviews Due
Project
Project Reports
• ~10 pages (rough guideline)
• Guidelines for contents:
  • Introduction / Motivation
  • Exploratory analysis (if applicable)
  • Data mining analysis
  • Discussion of results
Project Review
• 2 per person (randomly assigned)
• Reviews should discuss 4 aspects of the report:
  • Clarity (is the writing clear?)
  • Technical merit (are the methods valid?)
  • Reproducibility (is it clear how results were obtained?)
  • Discussion (are the results interpretable?)
Recommender Systems
The Long Tail (from: https://www.wired.com/2004/10/tail/)
Problem Setting
• Task: Predict user preferences for unseen items
Content-based Filtering

[Figure: movies (Braveheart, The Color Purple, Amadeus, Lethal Weapon, Sense and Sensibility, Ocean's 11, Dave, The Lion King, Dumb and Dumber, The Princess Diaries, Independence Day, Gus) placed along two axes: geared towards males vs. geared towards females, and serious vs. escapist]

Two approaches:
1. Predict rating using item features on a per-user basis
2. Predict rating using user features on a per-item basis
Collaborative Filtering

[Figure: Joe's ratings compared to those of similar users #1-#4]

Idea: Predict rating based on similarity to other users
Problem Setting
• Task: Predict user preferences for unseen items
• Content-based filtering: Model user/item features
• Collaborative filtering: Implicit similarity of users or items
Applications of Recommender Systems
• Movie recommendation (Netflix)
• Related product recommendation (Amazon)
• Web page ranking (Google)
• Social recommendation (Facebook)
• Priority inbox & spam filtering (Google)
• Online dating (OkCupid)
• Computational advertising (everyone)
Challenges
• Scalability
  • Millions of objects
  • 100s of millions of users
• Cold start
  • Changing user base
  • Changing inventory
• Imbalanced dataset
  • User activity / item reviews are power-law distributed
  • Ratings are not missing at random
Running Example: Netflix Data

Training data                      Test data
user  movie  date      score      user  movie  date      score
1     21     5/7/02    1          1     62     1/6/05    ?
1     213    8/2/04    5          1     96     9/13/04   ?
2     345    3/6/01    4          2     7      8/18/05   ?
2     123    5/1/05    4          2     3      11/22/05  ?
2     768    7/15/02   3          3     47     6/13/02   ?
3     76     1/22/01   5          3     15     8/12/01   ?
4     45     8/3/00    4          4     41     9/1/00    ?
5     568    9/10/05   1          4     28     8/27/05   ?
5     342    3/5/03    2          5     93     4/4/05    ?
5     234    12/28/00  2          5     74     7/16/03   ?
6     76     8/11/02   5          6     69     2/14/04   ?
6     56     6/15/03   4          6     83     10/3/03   ?

• Released as part of a $1M competition by Netflix in 2006
• Prize awarded to BellKor's Pragmatic Chaos in 2009
Running Yardstick: RMSE

$$\mathrm{rmse}(S) = \sqrt{\frac{1}{|S|} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$$

(doesn't tell you how to actually do recommendation)
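As a concrete reference, here is a minimal numpy sketch of this yardstick (the function name and example values are illustrative, not from the slides):

```python
import numpy as np

def rmse(r_hat, r):
    """Root mean squared error over a test set S of (i, u) pairs."""
    r_hat, r = np.asarray(r_hat, float), np.asarray(r, float)
    return np.sqrt(np.mean((r_hat - r) ** 2))

print(rmse([3.8, 2.1, 4.5], [4, 2, 5]))  # ~0.32
```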
Content-based Filtering
Item-based Features
Per-user Regression

Learn a set of regression coefficients for each user:

$$w_u = \operatorname*{argmin}_w \, \| r_u - X w \|^2$$
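A minimal sketch of this per-user fit, assuming an item-feature matrix X and a rating vector r_u (names are illustrative; the slides don't prescribe an implementation):

```python
import numpy as np

def fit_user_weights(X, r_u):
    """w_u = argmin_w ||r_u - X w||^2 via ordinary least squares.

    X:   (n_rated, n_features) feature rows for the items user u rated
    r_u: (n_rated,) user u's ratings of those items
    """
    w_u, *_ = np.linalg.lstsq(X, r_u, rcond=None)
    return w_u

# predicted rating for an unseen item with feature vector x_new:
# r_hat = x_new @ w_u
```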
User Bias and Item Popularity
Bias

[Figure: example user ratings for Moonrise Kingdom]

Problem: Some movies are universally loved / hated,
and some users are more picky than others.

Solution: Introduce a per-movie and per-user bias.
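One simple way to estimate such biases, sketched under the assumption that ratings live in an (items x users) matrix with np.nan for missing entries (a sequential estimate; the biases can also be fit jointly by regularized least squares):

```python
import numpy as np

def fit_biases(R):
    """Baseline b_ui = mu + b_u + b_i from an (items x users) matrix R."""
    mu = np.nanmean(R)                               # global mean rating
    b_i = np.nanmean(R, axis=1) - mu                 # per-movie bias
    b_u = np.nanmean(R - b_i[:, None], axis=0) - mu  # per-user bias
    return mu, b_u, b_i

# baseline prediction for user u, item i:  mu + b_u[u] + b_i[i]
```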
Collaborative Filtering
Neighborhood Based Methods

[Figure: bipartite graph of users (Joe, #1-#4) and items]

Users and items form a bipartite graph (edges are ratings)
Neighborhood Based Methods

(user, user) similarity
• predict rating based on average from k-nearest users
• good if item base is small
• good if item base changes rapidly

(item, item) similarity
• predict rating based on average from k-nearest items
• good if the user base is small
• good if user base changes rapidly
Parzen-Window Style CF

[Figure: Joe's ratings and the bipartite user-item graph]

• Define a similarity s_ij between items
• Find the set ε_k(i, u) of k-nearest neighbors to i that were rated by user u
• Predict the rating using a weighted average over this set (see the sketch below)
• How should we define s_ij?
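A sketch of this predictor, assuming a precomputed item-item similarity matrix S with nonnegative entries; normalizing the weighted average by the sum of similarities is a common choice, though not the only one:

```python
import numpy as np

def knn_predict(u, i, R, S, k=10):
    """Predict r_ui as a similarity-weighted average of user u's
    ratings of the k items most similar to i.

    R: (items x users) ratings with np.nan for missing entries
    S: (items x items) similarity matrix s_ij
    """
    rated = np.flatnonzero(~np.isnan(R[:, u]))       # items rated by u
    rated = rated[rated != i]
    nbrs = rated[np.argsort(S[i, rated])[::-1][:k]]  # the set eps_k(i, u)
    w = S[i, nbrs]
    return np.sum(w * R[nbrs, u]) / np.sum(w)
```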
(item, item) similarity

• Pearson correlation coefficient
• Each item is rated by a distinct set of users

User ratings for item i:  ? ? 1 ? ? 5 5 3 ? ? 4 2 ? ? ? 4 ? 5 4 1 ?
User ratings for item j:  ? ? 4 2 5 ? ? 1 2 5 ? ? 2 ? ? 3 ? ? ? 5 4

$$s_{ij} = \frac{\mathrm{Cov}[r_{ui}, r_{uj}]}{\mathrm{Std}[r_{ui}]\,\mathrm{Std}[r_{uj}]}$$
(item, item) similarity

Empirical estimate of the Pearson correlation coefficient:

$$\hat{\rho}_{ij} = \frac{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})^2 \, \sum_{u \in U(i,j)} (r_{uj} - b_{uj})^2}}$$

where U(i, j) is the set of users who have rated both i and j.

Regularize towards 0 for small support (and towards the baseline for small neighborhoods):

$$s_{ij} = \frac{|U(i,j)| - 1}{|U(i,j)| - 1 + \lambda} \, \hat{\rho}_{ij}$$
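A direct transcription of these two formulas into numpy (the matrix layout and the shrinkage default are my own assumptions):

```python
import numpy as np

def item_similarity(R, B, i, j, lam=100.0):
    """Shrunk empirical Pearson correlation between items i and j.

    R:   (items x users) ratings, np.nan where missing
    B:   (items x users) baseline predictions b_ui
    lam: shrinkage strength (regularize towards 0 for small support)
    """
    both = ~np.isnan(R[i]) & ~np.isnan(R[j])   # the set U(i, j)
    n = int(both.sum())
    if n < 2:
        return 0.0
    di = R[i, both] - B[i, both]
    dj = R[j, both] - B[j, both]
    denom = np.sqrt(np.sum(di ** 2) * np.sum(dj ** 2))
    if denom == 0.0:
        return 0.0
    return (n - 1) / (n - 1 + lam) * np.sum(di * dj) / denom
```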
Similarity for binary labels

Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks).

Jaccard similarity:

$$s_{ij} = \frac{m_{ij}}{\alpha + m_i + m_j - m_{ij}}$$

Observed / expected ratio:

$$s_{ij} = \frac{\text{observed}}{\text{expected}} \approx \frac{m_{ij}}{\alpha + m_i m_j / m}$$

where m_i is the number of users acting on i, m_ij the number of users acting on both i and j, and m the total number of users.
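Both similarities can be computed for all item pairs at once from a binary action matrix; a vectorized sketch (the damping constant alpha is a free parameter, and the matrix layout is an assumption):

```python
import numpy as np

def binary_similarities(A, alpha=10.0):
    """Jaccard and observed/expected similarities from a binary
    (items x users) action matrix A (views, purchases, clicks)."""
    A = A.astype(float)
    m = A.shape[1]        # total number of users
    m_i = A.sum(axis=1)   # users acting on each item
    m_ij = A @ A.T        # users acting on both i and j
    jaccard = m_ij / (alpha + m_i[:, None] + m_i[None, :] - m_ij)
    obs_exp = m_ij / (alpha + np.outer(m_i, m_i) / m)
    return jaccard, obs_exp
```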
Matrix Factorization Methods
Matrix Factorization

[Figure: example user ratings for Moonrise Kingdom]

Idea: pose as (biased) matrix factorization problem
Matrix Factorization

[Figure: an items x users ratings matrix approximated by the product of an items x 3 factor matrix and a 3 x users factor matrix]

A rank-3 SVD approximation
Prediction

[Figure: a missing entry in the ratings matrix is predicted (here as 2.4) from the rank-3 factors of the corresponding item and user]
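In code, prediction from a learned factorization is just an inner product; a small sketch with random rank-3 factors standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # item factors (5 items, rank 3)
W = rng.normal(size=(4, 3))   # user factors (4 users, rank 3)

R_hat = X @ W.T               # all predicted ratings (items x users)
r_21 = X[2] @ W[1]            # predicted rating of item 2 by user 1
```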
SVD with Missing Values

[Figure: the same rank-3 factorization, now with missing entries in the ratings matrix]

• SVD isn't defined when entries are unknown
• Pose as a regression problem over the observed ratings:

$$\min_{X, W} \sum_{(u,i) \in S} \big( r_{ui} - w_u^\top x_i \big)^2$$

• Regularize using the Frobenius norm:

$$\min_{X, W} \sum_{(u,i) \in S} \big( r_{ui} - w_u^\top x_i \big)^2 + \lambda \big( \|X\|_F^2 + \|W\|_F^2 \big)$$
Alternating Least Squares

[Figure: the same rank-3 factorization with missing entries]

• SVD isn't defined when entries are unknown
• Alternate between two regression problems:
  • regress w_u given X (for each user u)
  • regress x_i given W (for each item i)
• With L2 regularization each step has a closed-form solution (remember ridge regression?):

$$w = (X^\top X + \lambda I)^{-1} X^\top y$$
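A compact sketch of the full ALS loop, applying the ridge closed form alternately to user and item factors (initialization scale, lambda, and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def als(R, k=3, lam=0.1, iters=20, seed=0):
    """Alternating least squares on an (items x users) matrix R
    with np.nan marking missing entries."""
    n_items, n_users = R.shape
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    W = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    reg = lam * np.eye(k)
    for _ in range(iters):
        for u in range(n_users):                   # regress w_u given X
            obs = ~np.isnan(R[:, u])
            Xo = X[obs]
            W[u] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ R[obs, u])
        for i in range(n_items):                   # regress x_i given W
            obs = ~np.isnan(R[i])
            Wo = W[obs]
            X[i] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ R[i, obs])
    return X, W   # predict with X @ W.T
```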