Recommender Systems Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824
Administrative • HW 4 due April 10
Unsupervised Learning • Clustering, K-means • Expectation maximization • Dimensionality reduction • Anomaly detection • Recommendation system
Motivating example: Monitoring machines in a data center
[Figure: two scatter plots with axes $x_1$ (CPU load) and $x_2$ (Memory use)]
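Multivariate Gaussian (normal) distribution
• $x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \cdots$ separately
• Model $p(x)$ all in one go.
• Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)
• $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$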
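The density on this slide can be evaluated directly with NumPy. The following is a minimal sketch, assuming the standard form of the multivariate normal density with a factor of one half in the exponent; the function name `gaussian_density` and the sample inputs are illustrative and not part of the lecture.

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Multivariate normal density at a single point x (1-D array of length n)."""
    n = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff rather than forming the inverse explicitly.
    z = np.linalg.solve(sigma, diff)
    norm_const = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * (diff @ z)) / norm_const)

mu = np.zeros(2)
sigma = np.eye(2)
# At the mean of a 2-D standard normal the density is 1 / (2 * pi).
peak = gaussian_density(np.zeros(2), mu, sigma)
```

Using `np.linalg.solve` avoids an explicit matrix inverse, which is usually the more numerically stable choice.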
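Multivariate Gaussian (normal) examples: $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 0.6 & 0 \\ 0 & 0.6 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ [Figures: density plots over $x_1$, $x_2$]
Multivariate Gaussian (normal) examples: $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 0.6 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ [Figures: density plots over $x_1$, $x_2$]
Multivariate Gaussian (normal) examples: $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$ [Figures: density plots over $x_1$, $x_2$]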
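Anomaly detection using the multivariate Gaussian distribution
1. Fit model $p(x)$ by setting
   $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$
   $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^\top$
2. Given a new example $x$, compute
   $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$
   Flag an anomaly if $p(x) < \epsilon$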
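The two steps on this slide (fit the mean and covariance, then threshold the density) can be sketched as follows. This is an illustration under stated assumptions: the training data is synthetic, the threshold value `epsilon = 1e-4` is hypothetical, and the function names are not from the lecture.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate mean and covariance from an (m, n) array of m examples."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / m   # 1/m normalization, as on the slide
    return mu, sigma

def density(X, mu, sigma):
    """Multivariate normal density for each row of X."""
    n = mu.shape[0]
    centered = X - mu
    # Row-wise quadratic form (x - mu)^T sigma^{-1} (x - mu).
    quad = np.sum(centered * np.linalg.solve(sigma, centered.T).T, axis=1)
    norm_const = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))        # synthetic "normal" machines (assumption)
mu, sigma = fit_gaussian(X_train)

epsilon = 1e-4                              # hypothetical threshold
X_new = np.array([[0.0, 0.0], [6.0, -6.0]])
is_anomaly = density(X_new, mu, sigma) < epsilon
```

In practice the threshold would be chosen on a labeled validation set rather than fixed by hand.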
Original model
• $p(x_1; \mu_1, \sigma_1^2) \times p(x_2; \mu_2, \sigma_2^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)$
• Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values
• Computationally cheaper (alternatively, scales better)
• OK even if training set size is small
Multivariate Gaussian
• $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$
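The original model multiplies independent per-feature Gaussians, which is the same as a multivariate Gaussian whose covariance matrix is diagonal. The sketch below checks this numerically and shows that a covariance with off-diagonal entries gives a different density. All numbers and function names are illustrative assumptions.

```python
import numpy as np

def univariate_density(x, mu, var):
    """Element-wise 1-D Gaussian density with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def multivariate_density(x, mu, sigma):
    """Multivariate normal density at a single point x."""
    n = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

x = np.array([0.3, -1.2])
mu = np.array([0.0, 0.5])
var = np.array([1.0, 2.0])

# Product of independent per-feature densities (the original model).
p_product = np.prod(univariate_density(x, mu, var))
# Multivariate density with the same variances on the diagonal.
p_diagonal = multivariate_density(x, mu, np.diag(var))
# Adding correlation between the features changes the density.
sigma_corr = np.array([[1.0, 0.8], [0.8, 2.0]])
p_correlated = multivariate_density(x, mu, sigma_corr)
```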
Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization
You may also like..?
Example: Predicting movie ratings
• User rates movies using zero to five stars

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)
Love at last           5          5        0          0
Romance forever        5          ?        ?          0
Cute puppies of love   ?          4        0          ?
Nonstop car chases     0          0        5          4
Swords vs. karate      0          0        5          ?

• $n_u$ = no. users
• $n_m$ = no. movies
• $r(i,j) = 1$ if user $j$ has rated movie $i$
• $y^{(i,j)}$ = rating given by user $j$ to movie $i$
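One possible way to hold this data in code is a ratings matrix with one row per movie and one column per user, plus a boolean mask marking which entries are rated. The sketch below assumes `np.nan` as the marker for a missing rating ("?" in the table) and uses 0-based indices, unlike the 1-based indices on the slide; the variable names are illustrative.

```python
import numpy as np

# Rows are movies, columns are users (Alice, Bob, Carol, Dave).
# np.nan marks a missing rating ("?" in the table).
Y = np.array([
    [5,      5,      0,      0],       # Love at last
    [5,      np.nan, np.nan, 0],       # Romance forever
    [np.nan, 4,      0,      np.nan],  # Cute puppies of love
    [0,      0,      5,      4],       # Nonstop car chases
    [0,      0,      5,      np.nan],  # Swords vs. karate
], dtype=float)

R = ~np.isnan(Y)                 # R[i, j] is True when user j has rated movie i
n_m, n_u = Y.shape               # number of movies, number of users
ratings_per_user = R.sum(axis=0) # how many movies each user has rated
```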
Content-based recommender systems

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  $x_1$ (romance)  $x_2$ (action)
Love at last           5          5        0          0         0.9              0
Romance forever        5          ?        ?          0         1.0              0.01
Cute puppies of love   ?          4        0          ?         0.99             0
Nonstop car chases     0          0        5          4         0.1              1.0
Swords vs. karate      0          0        5          ?         0                0.9

For each user $j$, learn a parameter $\theta^{(j)} \in \mathbb{R}^3$. Predict user $j$ as rating movie $i$ with $(\theta^{(j)})^\top x^{(i)}$ stars.
Example (with intercept feature $x_0 = 1$): $x^{(3)} = \begin{bmatrix} 1 \\ 0.99 \\ 0 \end{bmatrix}$, $\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$, so $(\theta^{(1)})^\top x^{(3)} = 5 \times 0.99 = 4.95$
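The worked prediction on this slide is a single dot product between a user's parameter vector and a movie's feature vector. A minimal sketch, assuming each feature vector is prefixed with an intercept term of 1 and using 0-based row indices (so the third movie is row 2); variable names are illustrative.

```python
import numpy as np

# Movie features: intercept term 1, then (romance, action) from the table.
X = np.array([
    [1.0, 0.9,  0.0],   # Love at last
    [1.0, 1.0,  0.01],  # Romance forever
    [1.0, 0.99, 0.0],   # Cute puppies of love
    [1.0, 0.1,  1.0],   # Nonstop car chases
    [1.0, 0.0,  0.9],   # Swords vs. karate
])

theta_alice = np.array([0.0, 5.0, 0.0])  # example parameter vector for user 1

# Alice's predicted rating for "Cute puppies of love".
prediction = theta_alice @ X[2]
# Predicted ratings for every movie at once.
all_predictions = X @ theta_alice
```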
Problem formulation
• $r(i,j) = 1$ if user $j$ has rated movie $i$
• $y^{(i,j)}$ = rating given by user $j$ to movie $i$
• $\theta^{(j)}$ = parameter vector for user $j$
• $x^{(i)}$ = feature vector for movie $i$
• For user $j$ and movie $i$, predicted rating: $(\theta^{(j)})^\top x^{(i)}$
• $m^{(j)}$ = no. of movies rated by user $j$
• Goal: learn $\theta^{(j)}$:
  $\min_{\theta^{(j)}} \frac{1}{2m^{(j)}} \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2m^{(j)}} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$
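Optimization objective
• Learn $\theta^{(j)}$ (parameter for user $j$):
  $\min_{\theta^{(j)}} \frac{1}{2} \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$
• Learn $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$:
  $\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$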
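The objective over all users can be computed in a few vectorized lines. The sketch below is one possible implementation, assuming missing ratings are stored as `np.nan`, the first feature column is the intercept term (left out of the regularization sum, which starts at index 1), and the example parameter values are hand-picked; the function name `content_based_cost` is illustrative.

```python
import numpy as np

def content_based_cost(Theta, X, Y, R, lam):
    """Regularized squared-error cost summed over all users.

    Theta: (n_u, n + 1) user parameters, X: (n_m, n + 1) movie features
    with X[:, 0] == 1, Y: (n_m, n_u) ratings, R: boolean mask of rated entries.
    """
    # Prediction errors, counted only where a rating exists.
    errors = np.where(R, X @ Theta.T - Y, 0.0)
    data_term = 0.5 * np.sum(errors ** 2)
    # The intercept column (index 0) is not regularized.
    reg_term = 0.5 * lam * np.sum(Theta[:, 1:] ** 2)
    return data_term + reg_term

X = np.array([[1, 0.9, 0], [1, 1.0, 0.01], [1, 0.99, 0],
              [1, 0.1, 1.0], [1, 0, 0.9]], dtype=float)
Y = np.array([[5, 5, 0, 0], [5, np.nan, np.nan, 0], [np.nan, 4, 0, np.nan],
              [0, 0, 5, 4], [0, 0, 5, np.nan]], dtype=float)
R = ~np.isnan(Y)
# Hand-picked parameters: two romance fans, two action fans (assumption).
Theta = np.array([[0, 5, 0], [0, 5, 0], [0, 0, 5], [0, 0, 5]], dtype=float)

cost_no_reg = content_based_cost(Theta, X, Y, R, lam=0.0)
cost_reg = content_based_cost(Theta, X, Y, R, lam=1.0)
```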
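Optimization algorithm
$\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$
Gradient descent update:
$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right) x_k^{(i)}$ (for $k = 0$)
$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$ (for $k \neq 0$)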
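The update rule on this slide can be applied to all users at once with matrix operations. This is a sketch under assumptions: missing ratings are `np.nan`, column 0 of the features is the intercept and is not regularized, and the step size, regularization strength, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

X = np.array([[1, 0.9, 0], [1, 1.0, 0.01], [1, 0.99, 0],
              [1, 0.1, 1.0], [1, 0, 0.9]], dtype=float)
Y = np.array([[5, 5, 0, 0], [5, np.nan, np.nan, 0], [np.nan, 4, 0, np.nan],
              [0, 0, 5, 4], [0, 0, 5, np.nan]], dtype=float)
R = ~np.isnan(Y)

def cost(Theta, X, Y, R, lam):
    """Regularized squared-error cost over all rated entries."""
    errors = np.where(R, X @ Theta.T - Y, 0.0)
    return 0.5 * np.sum(errors ** 2) + 0.5 * lam * np.sum(Theta[:, 1:] ** 2)

def gradient_step(Theta, X, Y, R, alpha, lam):
    """One gradient descent update of every user's parameter vector."""
    errors = np.where(R, X @ Theta.T - Y, 0.0)   # shape (n_m, n_u)
    grad = errors.T @ X                          # sum over rated movies of error * x_k
    grad[:, 1:] += lam * Theta[:, 1:]            # regularize only k != 0
    return Theta - alpha * grad

Theta = np.zeros((Y.shape[1], X.shape[1]))
initial_cost = cost(Theta, X, Y, R, lam=0.1)
for _ in range(500):
    Theta = gradient_step(Theta, X, Y, R, alpha=0.05, lam=0.1)
final_cost = cost(Theta, X, Y, R, lam=0.1)
```

Because the movie features are fixed here, each user's problem is an independent regularized linear regression, so the cost is convex in the parameters.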
Problem motivation

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  $x_1$ (romance)  $x_2$ (action)
Love at last           5          5        0          0         0.9              0
Romance forever        5          ?        ?          0         1.0              0.01
Cute puppies of love   ?          4        0          ?         0.99             0
Nonstop car chases     0          0        5          4         0.1              1.0
Swords vs. karate      0          0        5          ?         0                0.9
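Problem motivation

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  $x_1$ (romance)  $x_2$ (action)
Love at last           5          5        0          0         ?                ?
Romance forever        5          ?        ?          0         ?                ?
Cute puppies of love   ?          4        0          ?         ?                ?
Nonstop car chases     0          0        5          4         ?                ?
Swords vs. karate      0          0        5          ?         ?                ?

$\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$, $\theta^{(2)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$, $\theta^{(3)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix}$, $\theta^{(4)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix}$, $x^{(1)} = \begin{bmatrix} ? \\ ? \\ ? \end{bmatrix}$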
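Optimization algorithm
• Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, to learn $x^{(i)}$:
  $\min_{x^{(i)}} \frac{1}{2} \sum_{j: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$
• Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, to learn $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$:
  $\min_{x^{(1)}, \cdots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$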
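Learning the movie features from known user parameters mirrors the earlier update with the roles swapped. The sketch below makes several assumptions that are not from the slides: the intercept term is dropped so each vector has two components (romance, action), the user parameters are the hand-picked values from the motivating example without their leading zero, and the step size, iteration count, and $\lambda = 0$ are illustrative.

```python
import numpy as np

Y = np.array([[5, 5, 0, 0], [5, np.nan, np.nan, 0], [np.nan, 4, 0, np.nan],
              [0, 0, 5, 4], [0, 0, 5, np.nan]], dtype=float)
R = ~np.isnan(Y)

# Known user parameters (romance weight, action weight); intercept dropped (assumption).
Theta = np.array([[5, 0], [5, 0], [0, 5], [0, 5]], dtype=float)

def feature_gradient_step(X, Theta, Y, R, alpha, lam):
    """One gradient descent update of every movie's feature vector."""
    errors = np.where(R, X @ Theta.T - Y, 0.0)   # shape (n_m, n_u)
    grad = errors @ Theta + lam * X              # sum over users who rated the movie
    return X - alpha * grad

X_learned = np.zeros((Y.shape[0], Theta.shape[1]))
for _ in range(500):
    X_learned = feature_gradient_step(X_learned, Theta, Y, R, alpha=0.01, lam=0.0)
```

With these inputs the romance movies end up with a large first component and the action movies with a large second component, which is the qualitative behavior the slide describes.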
Collaborative filtering
• Given $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$ (and movie ratings), can estimate $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$
• Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, can estimate $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$
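Collaborative filtering optimization objective
• Given $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$, estimate $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$:
  $\min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$
• Given $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$, estimate $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$:
  $\min_{x^{(1)}, \cdots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$
• Minimize over $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$ and $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}$ simultaneously:
  $J = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$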
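Collaborative filtering optimization objective
$J(x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}, \theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$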
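The joint objective can be minimized by updating the movie features and the user parameters together with gradient descent. The following is a sketch under assumptions that go beyond the slide: no intercept term, two latent features, small random initialization, and arbitrary values for the step size, regularization strength, and iteration count. The function names are illustrative.

```python
import numpy as np

Y = np.array([[5, 5, 0, 0], [5, np.nan, np.nan, 0], [np.nan, 4, 0, np.nan],
              [0, 0, 5, 4], [0, 0, 5, np.nan]], dtype=float)
R = ~np.isnan(Y)

def cofi_cost(X, Theta, Y, R, lam):
    """Joint cost over movie features X and user parameters Theta."""
    errors = np.where(R, X @ Theta.T - Y, 0.0)
    return (0.5 * np.sum(errors ** 2)
            + 0.5 * lam * np.sum(Theta ** 2)
            + 0.5 * lam * np.sum(X ** 2))

def cofi_step(X, Theta, Y, R, alpha, lam):
    """One simultaneous gradient descent update of X and Theta."""
    errors = np.where(R, X @ Theta.T - Y, 0.0)
    X_grad = errors @ Theta + lam * X            # gradient w.r.t. movie features
    Theta_grad = errors.T @ X + lam * Theta      # gradient w.r.t. user parameters
    return X - alpha * X_grad, Theta - alpha * Theta_grad

rng = np.random.default_rng(0)
n_features = 2                                   # number of latent features (assumption)
X = 0.1 * rng.normal(size=(Y.shape[0], n_features))
Theta = 0.1 * rng.normal(size=(Y.shape[1], n_features))

initial_cost = cofi_cost(X, Theta, Y, R, lam=0.1)
for _ in range(5000):
    X, Theta = cofi_step(X, Theta, Y, R, alpha=0.005, lam=0.1)
final_cost = cofi_cost(X, Theta, Y, R, lam=0.1)

# Predicted ratings for every (movie, user) pair, including the missing ones.
predictions = X @ Theta.T
```

Both gradients are computed from the same error matrix before either variable is changed, so the update is simultaneous rather than alternating. The objective is not convex in both sets of variables jointly, so the result can depend on the initialization.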