Unsupervised Machine Learning and Data Mining
DS 5230 / DS 4420 - Fall 2018
Lecture 20
Jan-Willem van de Meent
Schedule Adjustments
Wed 28 Nov: Review Lecture
Mon 3 Dec: Project Presentations
Fri 7 Dec: Project Reports
Project report criteria:
(is the writing clear?)
(are the methods valid?)
(is it clear how the results were obtained?)
(are the results interpretable?)
(from: https://www.wired.com/2004/10/tail/)
[Figure: movies arranged along two axes, "geared towards females" vs "geared towards males" and "serious" vs "escapist": The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility]
Two Approaches

[Figure: bipartite graph of users (Joe, Gus, Dave) and items #1, #2, #3, #4]
Idea: Predict rating based on similarity to other users
Training data:

  score  date      movie  user
  1      5/7/02    21     1
  5      8/2/04    213    1
  4      3/6/01    345    2
  4      5/1/05    123    2
  3      7/15/02   768    2
  5      1/22/01   76     3
  4      8/3/00    45     4
  1      9/10/05   568    5
  2      3/5/03    342    5
  2      12/28/00  234    5
  5      8/11/02   76     6
  4      6/15/03   56     6

Test data:

  score  date      movie  user
  ?      1/6/05    62     1
  ?      9/13/04   96     1
  ?      8/18/05   7      2
  ?      11/22/05  3      2
  ?      6/13/02   47     3
  ?      8/12/01   15     3
  ?      9/1/00    41     4
  ?      8/27/05   28     4
  ?      4/4/05    93     5
  ?      7/16/03   74     5
  ?      2/14/04   69     6
  ?      10/3/03   83     6
Evaluate with root mean squared error over the set S of observed (item, user) pairs:

RMSE = √( (1/|S|) Σ_{(i,u)∈S} (r̂_ui − r_ui)² )

(doesn't tell you how to actually do recommendation)
w_u = argmin_w |r_u − X w|²
Learn a set of regression coefficients for each user
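The per-user regression above can be sketched with ordinary least squares. The item feature matrix and ratings below are illustrative toy values, not the lecture's data:

```python
import numpy as np

# Sketch: one regression per user (toy values, names are illustrative).
# Rows of X are feature vectors for the items user u rated; r_u holds the ratings.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
r_u = np.array([4.0, 2.0, 5.0])

# w_u = argmin_w |r_u - X w|^2, solved by ordinary least squares.
w_u, *_ = np.linalg.lstsq(X, r_u, rcond=None)
pred = X @ w_u  # predicted ratings for the rated items
```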
[Figure: example ratings for Moonrise Kingdom: 4, 5, 4, 4; regression weights 0.3, 0.2; baseline ratings 3, 3, 3]

Problem: Some movies are universally loved / hated, and some users are more picky than others.

Solution: Introduce a per-movie and per-user bias.
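A minimal sketch of such a baseline predictor, r̂_ui = μ + b_u + b_i, where the biases are estimated as simple mean offsets (the triples below are toy data, not the lecture's):

```python
import numpy as np

# Baseline predictor b_ui = mu + b_u + b_i (toy data; mean-offset estimates
# are an illustrative simplification).
ratings = [(0, 0, 4.0), (0, 1, 5.0), (1, 0, 3.0), (1, 1, 4.0)]

mu = np.mean([r for _, _, r in ratings])  # global mean rating
users = {u for u, _, _ in ratings}
items = {i for _, i, _ in ratings}
# Per-user and per-item offsets from the global mean.
b_u = {u: np.mean([r for uu, _, r in ratings if uu == u]) - mu for u in users}
b_i = {i: np.mean([r for _, ii, r in ratings if ii == i]) - mu for i in items}

def baseline(u, i):
    """Predict a rating from the global mean plus user and item offsets."""
    return mu + b_u[u] + b_i[i]
```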
Users and items form a bipartite graph (edges are ratings). Two neighborhood approaches:

(user, user) similarity: predict r_ui from the k-nearest users
(item, item) similarity: predict r_ui from the k-nearest items to i that were rated by user u
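The item-item variant can be sketched as a similarity-weighted average over the k most similar items the user has rated. The function and data below are illustrative, not the lecture's code:

```python
import numpy as np

# Item-item neighborhood prediction (illustrative sketch).
def predict_item_knn(u, i, R, S, k=2):
    """R: ratings dict {(user, item): score}; S: item-item similarity matrix.
    Average the user's ratings on the k items most similar to i."""
    rated = [j for (uu, j) in R if uu == u and j != i]
    neighbors = sorted(rated, key=lambda j: -S[i][j])[:k]
    num = sum(S[i][j] * R[(u, j)] for j in neighbors)
    den = sum(S[i][j] for j in neighbors)
    return num / den if den > 0 else None

# Toy data: user 0 rated items 1 and 2; predict their rating for item 0.
R = {(0, 1): 4.0, (0, 2): 2.0}
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
pred = predict_item_knn(0, 0, R, S, k=2)  # dominated by similar item 1
```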
Note that each item is rated by a distinct set of users.
User ratings for item i:  1 ? ? 5 5 3 ? ? ? 4 2 ? ? ? ? 4 ? 5 4 1 ?
User ratings for item j:  ? ? 4 2 5 ? ? 1 2 5 ? ? 2 ? ? 3 ? ? ? 5 4
Empirical estimate of the Pearson correlation coefficient:

ρ̂_ij = Σ_{u∈U(i,j)} (r_ui − b_ui)(r_uj − b_uj) / √( Σ_{u∈U(i,j)} (r_ui − b_ui)² · Σ_{u∈U(i,j)} (r_uj − b_uj)² )

Shrink the estimate when the support is small:

s_ij = (|U(i,j)| − 1) / (|U(i,j)| − 1 + λ) · ρ̂_ij

Regularize towards 0 for small support; regularize towards the baseline for small neighborhoods.
U(i, j): set of users who have rated both i and j
Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks). Let m_i be the number of users acting on item i, m_ij the number of users acting on both i and j, and m the total number of users.

Jaccard similarity: s_ij = m_ij / (α + m_i + m_j − m_ij)

Observed/expected ratio: s_ij = observed / expected ≈ m_ij / (α + m_i·m_j / m)
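These similarity measures are short enough to write out directly; the function names and defaults below are illustrative:

```python
import numpy as np

# Similarity measures for neighborhood methods (illustrative sketch).
def shrunk_pearson(ri, rj, bi, bj, lam=10.0):
    """Pearson correlation over the co-rating users U(i,j), shrunk toward 0
    for small support. ri, rj: their ratings; bi, bj: baseline values."""
    di, dj = ri - bi, rj - bj
    rho = (di @ dj) / np.sqrt((di @ di) * (dj @ dj))
    n = len(ri)  # |U(i,j)|
    return (n - 1) / (n - 1 + lam) * rho

def jaccard(m_ij, m_i, m_j, alpha=1.0):
    """Jaccard similarity for binary signals (views, purchases, clicks)."""
    return m_ij / (alpha + m_i + m_j - m_ij)

def observed_expected(m_ij, m_i, m_j, m, alpha=1.0):
    """Observed / expected co-occurrence ratio."""
    return m_ij / (alpha + m_i * m_j / m)
```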
Idea: pose as (biased) matrix factorization problem
[Figure: a users × items ratings matrix approximated by a rank-3 SVD; each user and each item is summarized by a 3-dimensional factor vector]
[Figure: a missing entry "?" in the ratings matrix is filled in as 2.4 using the rank-3 SVD approximation]
[Figure: rank-3 factorization of the ratings matrix]

Pose as a regression problem and regularize using the Frobenius norm:

min_{W,X} Σ_{(i,u)∈S} (r_ui − w_uᵀ x_i)² + λ ( ‖W‖²_F + ‖X‖²_F )
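One way to minimize a regularized factorization objective like this is stochastic gradient descent over the observed entries. The following is a sketch with toy data and illustrative hyperparameters, not the lecture's implementation:

```python
import numpy as np

# Regularized matrix factorization by SGD (toy sketch).
rng = np.random.default_rng(0)
n_users, n_items, d = 4, 5, 3
# Observed (user, item, rating) triples -- illustrative data.
S = [(0, 0, 4.0), (0, 2, 5.0), (1, 1, 3.0), (2, 3, 2.0), (3, 4, 1.0), (1, 0, 4.0)]

W = 0.1 * rng.standard_normal((n_users, d))  # user factors w_u
X = 0.1 * rng.standard_normal((n_items, d))  # item factors x_i
lam, lr = 0.01, 0.05

for epoch in range(200):
    for u, i, r in S:
        err = r - W[u] @ X[i]
        # Gradient step on (r_ui - w_u . x_i)^2 + lam (|w_u|^2 + |x_i|^2)
        W[u] += lr * (err * X[i] - lam * W[u])
        X[i] += lr * (err * W[u] - lam * X[i])

train_rmse = np.sqrt(np.mean([(r - W[u] @ X[i]) ** 2 for u, i, r in S]))
```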
[Figure: rank-3 factorization of the ratings matrix]

Holding the item factors X fixed, solving for each user's factors is an ordinary regression problem (regress w_u given X).
Remember ridge regression? With L2 regularization each regression has a closed-form solution (regress w_u given X):

w = (XᵀX + λI)⁻¹ Xᵀy
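As a sanity check, the closed form can be written directly in NumPy (toy data; the `ridge` name is illustrative):

```python
import numpy as np

def ridge(X, y, lam=1.0):
    """Closed-form ridge solution: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = ridge(X, y, lam=0.0)  # lam=0 recovers ordinary least squares
```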
Alternate between the two sets of regressions: regress w_u given X, then regress x_i given W. This is alternating least squares.
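A minimal alternating-least-squares sketch, assuming a fully observed toy matrix so that each sweep reduces to a single ridge solve (the real objective sums only over observed entries):

```python
import numpy as np

# Alternating least squares on a fully observed toy ratings matrix.
rng = np.random.default_rng(1)
R = np.array([[4.0, 5.0, 1.0],
              [4.0, 4.0, 2.0],
              [1.0, 2.0, 5.0]])
d, lam = 2, 0.1
X = rng.standard_normal((3, d))  # item factors (initialized randomly)

for sweep in range(20):
    # Regress w_u given X (one ridge problem per user) ...
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ R.T).T
    # ... then regress x_i given W (one ridge problem per item).
    X = np.linalg.solve(W.T @ W + lam * np.eye(d), W.T @ R).T

rmse = np.sqrt(np.mean((R - W @ X.T) ** 2))
```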
The objective can also be minimized with stochastic gradient descent, which parallelizes well even with asynchronous updates (Recht, Re, Wright, 2012 - Hogwild).
The "Missing at Random" Assumption

[Figure: rating distributions for Yahoo! survey answers, Yahoo! music ratings, and Netflix ratings]
[Figure: the users × movies ratings matrix alongside a binary users × movies indicator matrix marking which entries are observed; matrix factorization vs. regression on these data]
Do movies get better with time? Ratings drift over time; in 2004 Netflix changed its rating labels.

Solution: Model temporal effects in the bias, not the weights.
[Figure: "Factor models: Error vs. #parameters". RMSE (0.875 to 0.91) versus millions of parameters (10 to 100,000) for NMF, BiasSVD, SVD++, SVD v.2, SVD v.3, SVD v.4]

Add biases: do SGD, but also learn the biases μ, b_u and b_i.

Account for the fact that ratings are not missing at random.
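The "add biases" step can be sketched by extending the SGD updates to learn μ, b_u and b_i alongside the factors (toy data and hyperparameters; an illustrative BiasSVD-style sketch, not the lecture's code):

```python
import numpy as np

# SGD matrix factorization with learned biases mu, b_u, b_i (toy sketch).
rng = np.random.default_rng(2)
S = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 3.0), (1, 2, 2.0), (2, 1, 4.0), (2, 2, 1.0)]
n_users, n_items, d = 3, 3, 2

mu = np.mean([r for _, _, r in S])  # initialize at the global mean
b_u = np.zeros(n_users)
b_i = np.zeros(n_items)
W = 0.1 * rng.standard_normal((n_users, d))
X = 0.1 * rng.standard_normal((n_items, d))
lam, lr = 0.01, 0.05

for epoch in range(200):
    for u, i, r in S:
        err = r - (mu + b_u[u] + b_i[i] + W[u] @ X[i])
        # Update biases and factors with the same regularized gradient step.
        b_u[u] += lr * (err - lam * b_u[u])
        b_i[i] += lr * (err - lam * b_i[i])
        W[u] += lr * (err * X[i] - lam * W[u])
        X[i] += lr * (err * W[u] - lam * X[i])

rmse = np.sqrt(np.mean([(r - (mu + b_u[u] + b_i[i] + W[u] @ X[i])) ** 2
                        for u, i, r in S]))
```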
SVD++ additionally exploits "who rated what" as an implicit feedback signal.
Later variants add temporal effects.
Still pretty far from the grand prize threshold of 0.8563.
June 26th submission triggers 30-day “last call”
[Figure: Netflix's interface in 2009 vs. 2017]