Semi-Supervised Learning Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824
Administrative • HW 4 due April 10
Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization
Problem motivation

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x_1 (romance)  x_2 (action)
Love at last               5         5        0          0          0.9            0
Romance forever            5         ?        ?          0          1.0            0.01
Cute puppies of love       ?         4        0          ?          0.99           0
Nonstop car chases         0         0        5          4          0.1            1.0
Swords vs. karate          0         0        5          ?          0              0.9
Problem motivation

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x_1 (romance)  x_2 (action)
Love at last               5         5        0          0           ?              ?
Romance forever            5         ?        ?          0           ?              ?
Cute puppies of love       ?         4        0          ?           ?              ?
Nonstop car chases         0         0        5          4           ?              ?
Swords vs. karate          0         0        5          ?           ?              ?

Given \theta^{(1)} = (0, 5, 0)^\top, \theta^{(2)} = (0, 5, 0)^\top, \theta^{(3)} = (0, 0, 5)^\top, \theta^{(4)} = (0, 0, 5)^\top, what is x^{(1)}? Since (\theta^{(1)})^\top x^{(1)} = 5 and (\theta^{(3)})^\top x^{(1)} = 0 for "Love at last", we get x^{(1)} = (1, 1.0, 0.0)^\top.
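As a quick sketch, the movie's feature vector can be recovered from the users' parameter vectors by least squares. The snippet below drops the intercept entry of each \theta^{(j)} to keep the example two-dimensional (an assumption for illustration; the variable names are hypothetical):

```python
import numpy as np

# User parameter vectors from the slide, intercept entry dropped.
Theta = np.array([[5.0, 0.0],   # Alice
                  [5.0, 0.0],   # Bob
                  [0.0, 5.0],   # Carol
                  [0.0, 5.0]])  # Dave
y1 = np.array([5.0, 5.0, 0.0, 0.0])  # their ratings of "Love at last"

# Solve Theta @ x1 = y1 for the movie's feature vector in the least-squares sense.
x1, *_ = np.linalg.lstsq(Theta, y1, rcond=None)
print(x1)  # x1 is approximately (1.0, 0.0): pure romance, no action
```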
Optimization algorithm
• Given \theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}, to learn x^{(i)}:
  \min_{x^{(i)}} \frac{1}{2} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
• Given \theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}, to learn x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}:
  \min_{x^{(1)}, \cdots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
Collaborative filtering
• Given x^{(1)}, x^{(2)}, \cdots, x^{(n_m)} (and movie ratings), we can estimate \theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}
• Given \theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(n_u)}, we can estimate x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}
Collaborative filtering optimization objective
• Given x^{(1)}, \cdots, x^{(n_m)}, estimate \theta^{(1)}, \cdots, \theta^{(n_u)}:
  \min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2
• Given \theta^{(1)}, \cdots, \theta^{(n_u)}, estimate x^{(1)}, \cdots, x^{(n_m)}:
  \min_{x^{(1)}, \cdots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
Collaborative filtering optimization objective
• Given x^{(1)}, \cdots, x^{(n_m)}, estimate \theta^{(1)}, \cdots, \theta^{(n_u)}:
  \min_{\theta^{(1)}, \cdots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2
• Given \theta^{(1)}, \cdots, \theta^{(n_u)}, estimate x^{(1)}, \cdots, x^{(n_m)}:
  \min_{x^{(1)}, \cdots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
• Minimize over x^{(1)}, \cdots, x^{(n_m)} and \theta^{(1)}, \cdots, \theta^{(n_u)} simultaneously:
  J = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2
Collaborative filtering optimization objective
J(x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)}) =
  \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2
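As a sketch, the joint objective J can be evaluated in a vectorized way with a binary mask R over the observed ratings (the matrix layout and names below are assumptions for illustration):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint collaborative-filtering objective J.

    X     : (n_m, n) matrix whose rows are the movie features x^(i)
    Theta : (n_u, n) matrix whose rows are the user parameters theta^(j)
    Y     : (n_m, n_u) ratings y^(i,j)
    R     : (n_m, n_u) mask, r(i,j) = 1 where a rating exists
    lam   : regularization weight lambda
    """
    E = (X @ Theta.T - Y) * R  # residuals, zeroed where no rating exists
    return 0.5 * np.sum(E**2) + lam / 2 * (np.sum(X**2) + np.sum(Theta**2))
```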
Collaborative filtering algorithm
• Initialize x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)} to small random values
• Minimize J(x^{(1)}, \cdots, x^{(n_m)}, \theta^{(1)}, \cdots, \theta^{(n_u)}) using gradient descent (or an advanced optimization algorithm). For every i = 1, \cdots, n_m and j = 1, \cdots, n_u:
  x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)
  \theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i: r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)
• For a user with parameters \theta and a movie with (learned) features x, predict a star rating of \theta^\top x
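The two update rules can be written as a single vectorized gradient step. A minimal sketch (the matrix shapes, names, and the toy data are assumptions, not part of the deck):

```python
import numpy as np

def cofi_step(X, Theta, Y, R, lam, alpha):
    """One simultaneous gradient-descent step on all x^(i) and theta^(j).

    X: (n_m, n) movie features; Theta: (n_u, n) user parameters;
    Y: (n_m, n_u) ratings; R: (n_m, n_u) observed-rating mask.
    """
    E = (X @ Theta.T - Y) * R                    # residuals on rated entries only
    X_new = X - alpha * (E @ Theta + lam * X)
    Theta_new = Theta - alpha * (E.T @ X + lam * Theta)
    return X_new, Theta_new

rng = np.random.default_rng(0)
Y = rng.integers(0, 6, size=(5, 4)).astype(float)  # toy ratings
R = np.ones_like(Y)                                # pretend all ratings observed
X = 0.01 * rng.standard_normal((5, 2))             # small random initialization
Theta = 0.01 * rng.standard_normal((4, 2))
for _ in range(1000):
    X, Theta = cofi_step(X, Theta, Y, R, lam=0.0, alpha=0.01)
# predicted rating for user j on movie i: Theta[j] @ X[i]
```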
Collaborative filtering

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)
Love at last               5         5        0          0
Romance forever            5         ?        ?          0
Cute puppies of love       ?         4        0          ?
Nonstop car chases         0         0        5          4
Swords vs. karate          0         0        5          ?
Collaborative filtering
• Predicted ratings: Y = X \Theta^\top, where
  X = \begin{bmatrix} - \, (x^{(1)})^\top \, - \\ - \, (x^{(2)})^\top \, - \\ \vdots \\ - \, (x^{(n_m)})^\top \, - \end{bmatrix}, \qquad \Theta = \begin{bmatrix} - \, (\theta^{(1)})^\top \, - \\ - \, (\theta^{(2)})^\top \, - \\ \vdots \\ - \, (\theta^{(n_u)})^\top \, - \end{bmatrix}
• This is called low-rank matrix factorization
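With the feature values from the earlier motivation table standing in for learned factors (an assumption; in practice both matrices come out of the optimization), the full prediction matrix is a single product:

```python
import numpy as np

X = np.array([[0.9, 0.0],      # rows: x^(i) for the five movies
              [1.0, 0.01],
              [0.99, 0.0],
              [0.1, 1.0],
              [0.0, 0.9]])
Theta = np.array([[5.0, 0.0],  # rows: theta^(j) for the four users
                  [5.0, 0.0],
                  [0.0, 5.0],
                  [0.0, 5.0]])
Y_pred = X @ Theta.T           # entry (i, j) is (theta^(j))^T x^(i)
print(Y_pred.shape)            # (5, 4): one predicted rating per movie-user pair
```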
Finding related movies/products
• For each product i, we learn a feature vector x^{(i)} \in \mathbb{R}^n
  x_1: romance, x_2: action, x_3: comedy, …
• How do we find a movie j related to movie i? If \| x^{(i)} - x^{(j)} \| is small, movies i and j are “similar”
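The nearest-feature-vector search can be sketched in a few lines, here reusing the motivation table's feature values as a stand-in for learned ones (names and data are illustrative assumptions):

```python
import numpy as np

def most_related(X, i, k=2):
    """Indices of the k movies whose feature vectors lie closest to x^(i)."""
    d = np.linalg.norm(X - X[i], axis=1)  # ||x^(i) - x^(j)|| for every j
    d[i] = np.inf                         # exclude the query movie itself
    return np.argsort(d)[:k].tolist()

# Features from the motivation table (x_1 = romance, x_2 = action):
X = np.array([[0.9, 0.0], [1.0, 0.01], [0.99, 0.0], [0.1, 1.0], [0.0, 0.9]])
print(most_related(X, 0))  # the two other romance movies
```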
Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization
Users who have not rated any movies

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  Eve (5)
Love at last               5         5        0          0         ?
Romance forever            5         ?        ?          0         ?
Cute puppies of love       ?         4        0          ?         ?
Nonstop car chases         0         0        5          4         ?
Swords vs. karate          0         0        5          ?         ?

J = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2

Since Eve has rated no movies, \theta^{(5)} appears only in the regularization term, so minimizing J gives \theta^{(5)} = (0, 0)^\top.
Users who have not rated any movies

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  Eve (5)
Love at last               5         5        0          0         0
Romance forever            5         ?        ?          0         0
Cute puppies of love       ?         4        0          ?         0
Nonstop car chases         0         0        5          4         0
Swords vs. karate          0         0        5          ?         0

With \theta^{(5)} = (0, 0)^\top, every predicted rating (\theta^{(5)})^\top x^{(i)} for Eve is 0, which is not a useful recommendation.
Mean normalization
• Learn \theta^{(j)}, x^{(i)} from the mean-normalized ratings
• For user j, on movie i, predict: (\theta^{(j)})^\top x^{(i)} + \mu_i
• User 5 (Eve): \theta^{(5)} = \mathbf{0}, so the prediction is (\theta^{(5)})^\top x^{(i)} + \mu_i = \mu_i, the movie's average rating
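A minimal sketch of the normalization step (variable names are assumptions; Y holds ratings and R marks which entries are observed):

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean observed rating mu_i from its row."""
    n_m = Y.shape[0]
    mu = np.array([Y[i][R[i] == 1].mean() if R[i].sum() > 0 else 0.0
                   for i in range(n_m)])
    Y_norm = (Y - mu[:, None]) * R  # keep unobserved entries at 0
    return Y_norm, mu

# After learning on Y_norm, predict (theta^(j))^T x^(i) + mu[i]; a brand-new
# user with theta = 0 therefore receives each movie's average rating mu[i].
```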
Recommender Systems • Motivation • Problem formulation • Content-based recommendations • Collaborative filtering • Mean normalization
Review: Supervised Learning • K nearest neighbor • Linear Regression • Naïve Bayes • Logistic Regression • Support Vector Machines • Neural Networks
Review: Unsupervised Learning • Clustering: K-means • Expectation maximization • Dimensionality reduction • Anomaly detection • Recommender systems
Advanced Topics • Semi-supervised learning • Probabilistic graphical models • Generative models • Sequence prediction models • Deep reinforcement learning
Semi-supervised Learning • Motivation • Problem formulation • Consistency regularization • Entropy-based method • Pseudo-labeling
Classic Paradigm Insufficient Nowadays • Modern applications generate massive amounts of raw data. • Only a tiny fraction can be annotated by human experts. Examples: protein sequences, billions of webpages, images.
Semi-supervised Learning
Active Learning
Semi-supervised Learning • Motivation • Problem formulation • Consistency regularization • Entropy-based method • Pseudo-labeling
Semi-supervised Learning Problem Formulation
• Labeled data: D_l = \{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m_l)}, y^{(m_l)}) \}
• Unlabeled data: D_u = \{ x^{(1)}, x^{(2)}, \cdots, x^{(m_u)} \}
• Goal: learn a hypothesis h_\theta (e.g., a classifier) that has small error
Combining labeled and unlabeled data - Classical methods
• Transductive SVM [Joachims '99]
• Co-training [Blum and Mitchell '98]
• Graph-based methods [Blum and Chawla '01] [Zhu, Ghahramani, Lafferty '03]
Transductive SVM • The separator goes through low-density regions of the space (i.e., it keeps a large margin with respect to both labeled and unlabeled points)
Transductive SVM

SVM inputs: (x_l^{(i)}, y_l^{(i)})
\min_{\theta} \frac{1}{2} \sum_{k=1}^{n} \theta_k^2
s.t. \; y_l^{(i)} \, \theta^\top x_l^{(i)} \ge 1

Transductive SVM inputs: (x_l^{(i)}, y_l^{(i)}), \; x_u^{(i)}
\min_{\theta} \frac{1}{2} \sum_{k=1}^{n} \theta_k^2
s.t. \; y_l^{(i)} \, \theta^\top x_l^{(i)} \ge 1
\quad y_u^{(i)} \, \theta^\top x_u^{(i)} \ge 1, \quad y_u^{(i)} \in \{-1, 1\}
Transductive SVMs
• First maximize the margin over the labeled points
• Use the resulting separator to assign initial labels to the unlabeled points
• Try flipping the labels of unlabeled points to see if doing so increases the margin
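The label-then-refine idea can be caricatured in a few lines. The sketch below is not Joachims' TSVM: it substitutes a ridge-regression stand-in for the margin maximization and a simple relabel-and-refit loop for the pairwise label flips, just to show the overall shape of the procedure (all names and data are illustrative assumptions):

```python
import numpy as np

def fit_linear(X, y, lam=1e-3):
    """Ridge-regression stand-in for fitting a linear classifier."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def tsvm_sketch(X_l, y_l, X_u, n_rounds=5):
    w = fit_linear(X_l, y_l)              # 1. fit on labeled points only
    for _ in range(n_rounds):
        y_u = np.sign(X_u @ w)            # 2. pseudo-label the unlabeled points
        y_u[y_u == 0] = 1.0
        X_all = np.vstack([X_l, X_u])     # 3. refit on labeled + pseudo-labeled
        w = fit_linear(X_all, np.concatenate([y_l, y_u]))
    return w
```

A real TSVM would instead solve the constrained margin problem and search over label assignments of the unlabeled points.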
Deep Semi-supervised Learning