Unsupervised Learning
Shan-Hung Wu, shwu@cs.nthu.edu.tw
Department of Computer Science, National Tsing Hua University, Taiwan
Machine Learning
Outline
1. Unsupervised Learning
2. Predictive Learning
3. Autoencoders & Manifold Learning
4. Generative Adversarial Networks
Unsupervised Learning
Dataset: X = {x^(i)}_i
No supervision
What can we learn?
Clustering I
Goal: to group similar x^(i)'s
Clustering II
K-means algorithm
Hierarchical clustering
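The K-means algorithm named above can be sketched in a few lines: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This is a minimal illustration, not an optimized implementation (no k-means++ initialization, no convergence check):

```python
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    """Minimal K-means sketch: alternate assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

On well-separated data this converges in a handful of iterations; in general K-means only finds a local optimum of the within-cluster squared distance.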
Factorization and Recommendation
Goal: to uncover the latent factors behind data (e.g., a rating matrix)
Commonly used in recommender systems
Non-negative matrix factorization (NMF) [9, 10]
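NMF factors a non-negative matrix V (e.g., user-by-item ratings) into non-negative factors W and H with V ≈ WH. A common fitting procedure is the multiplicative-update rule; the sketch below assumes the squared-error objective and a fully observed V (real recommender data would need masking of missing entries):

```python
import numpy as np

def nmf(V, k, n_iters=500, eps=1e-9, seed=0):
    """Sketch of NMF via multiplicative updates: V ≈ W @ H, all entries >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1  # positive random init
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iters):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each update multiplies by a non-negative ratio, so non-negativity is preserved automatically, which is the appeal of this scheme over projected gradient descent.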
Dimension Reduction
Goal: to reduce the dimension of each x^(i), e.g., PCA
Predictive learning: learn to "fill in the blanks"
Manifold learning: learn the tangent vectors at a given point
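PCA, the example named above, can be sketched directly via the SVD of the centered data matrix: the top right-singular vectors are the directions of maximal variance, and projecting onto them gives the reduced representation.

```python
import numpy as np

def pca(X, d):
    """Project rows of X onto the top-d principal components (via SVD)."""
    mu = X.mean(axis=0)
    Xc = X - mu                                # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:d]                        # top-d directions of max variance
    return Xc @ components.T, components, mu   # codes, directions, mean
```

Reconstruction is the reverse map, `Z @ components + mu`; when the data truly lie in a d-dimensional affine subspace, this reconstruction is exact.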
Data Generation I
Goal: to generate new data points/samples
Generative adversarial networks (GANs)
Data Generation II
Text to image based on conditional GANs: "This bird is completely red with black wings and pointy beak."
Predictive Learning
I.e., blank filling
E.g., word2vec [13, 12]: "... the cat sat on ..."
Doc2Vec
How to encode a document?
Bag of words (TF-IDF), averaged word2vec, etc.
These do not capture the semantic meaning of a doc: "I like final project" ≠ "Final project likes me"
Predictive learning for docs?
Doc2vec [7]: captures the context not explained by the words alone
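The order-blindness of the averaging baseline is easy to demonstrate: two documents containing the same words in different orders get exactly the same averaged embedding. The toy vocabulary and random "word vectors" below are purely illustrative, not real word2vec output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy vocabulary with random stand-in "word vectors".
vocab = {w: rng.normal(size=8) for w in ["final", "project", "like", "i"]}

def avg_embed(tokens):
    """Averaging word vectors discards word order entirely."""
    return np.mean([vocab[t] for t in tokens], axis=0)

d1 = avg_embed(["i", "like", "final", "project"])
d2 = avg_embed(["project", "final", "like", "i"])  # same words, reversed order
# d1 and d2 are identical, so the two "documents" are indistinguishable.
```

This is exactly the failure doc2vec addresses by learning a separate paragraph vector trained to predict words in context.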
Filling Images
How? PixelRNN [19]
More
Predicting the future by watching unlabeled videos [6, 21]
Autoencoders I
Encoder: learns a low-dimensional representation c (called the code) of the input x
Decoder: reconstructs x from c
Cost function:
argmin_Θ −log P(X | Θ) = argmin_Θ −∑_n log P(x^(n) | Θ)
Sigmoid output units a_j^(L) = ρ̂_j for x_j ∼ Bernoulli(ρ_j):
P(x^(n) | Θ) = ∏_j (a_j^(L))^(x_j^(n)) (1 − a_j^(L))^(1 − x_j^(n))
Linear output units a^(L) = z^(L) = μ̂ for x ∼ N(μ, Σ):
−log P(x^(n) | Θ) ∝ ‖x^(n) − z^(L)‖²
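The two output choices above correspond to the two standard reconstruction losses: the Bernoulli negative log-likelihood is binary cross-entropy, and the Gaussian negative log-likelihood (unit covariance) is squared error up to an additive constant. A small numeric sketch:

```python
import numpy as np

def bernoulli_nll(x, a):
    """NLL of sigmoid outputs a = ρ̂ under x_j ~ Bernoulli(ρ_j): binary cross-entropy."""
    return -np.sum(x * np.log(a) + (1 - x) * np.log(1 - a))

def gaussian_nll(x, z):
    """NLL of linear outputs z = μ̂ under x ~ N(μ, I), up to an additive constant."""
    return 0.5 * np.sum((x - z) ** 2)

x = np.array([1.0, 0.0, 1.0])   # binary input
a = np.array([0.9, 0.2, 0.7])   # sigmoid reconstruction
loss = bernoulli_nll(x, a)      # = -(log 0.9 + log 0.8 + log 0.7)
```

Minimizing either loss over the network parameters Θ is exactly the maximum-likelihood objective on the slide.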
Autoencoders II
A 32-bit code can roughly represent a 32 × 32 MNIST image
Convolutional Autoencoders
Convolution + deconvolution
How to train the deconvolution layer? Treat it as a convolution layer
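"Treat it as a convolution layer" works because a transposed convolution ("deconvolution") can be computed as zero-insertion upsampling followed by an ordinary convolution, so the same weights and the same backpropagation machinery apply. A 1D sketch, assuming a simple stride and full padding:

```python
import numpy as np

def transposed_conv1d(x, w, stride=2):
    """1D 'deconvolution' sketch: insert zeros between inputs, then convolve.
    This shows why a deconvolution layer can be trained like a convolution layer."""
    up = np.zeros(stride * (len(x) - 1) + 1)
    up[::stride] = x              # zero-insertion upsampling
    return np.convolve(up, w)     # ordinary (full) convolution

# e.g. transposed_conv1d([1, 2], [1, 1, 1]) upsamples to [1, 0, 2] then convolves
```

The output is longer than the input, which is exactly what the decoder needs to map a small code back to image resolution.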
Manifolds I
In many applications, data concentrate around one or more low-dimensional manifolds
A manifold is a topological space that is locally linear
Manifolds II
For each point x on a manifold, we have its tangent space spanned by tangent vectors
These local directions specify how one can change x infinitesimally while staying on the manifold
Learning Manifolds I
How to learn manifolds with autoencoders?
Contractive autoencoder [16]: regularize the code c so that it is invariant to local changes of x:
Ω(c) = ∑_n ‖∂c^(n)/∂x^(n)‖_F²
where ∂c^(n)/∂x^(n) is the Jacobian matrix of the encoder
The encoder preserves local structure in the code space
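For a one-layer sigmoid encoder c = σ(Wx + b), the Jacobian in the penalty above has the closed form diag(c ⊙ (1 − c)) W, so Ω can be computed cheaply. A sketch for a single example (a deep encoder would chain such factors layer by layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(W, b, x):
    """‖∂c/∂x‖_F² for a one-layer sigmoid encoder c = σ(Wx + b).
    The Jacobian is diag(c * (1 - c)) @ W."""
    c = sigmoid(W @ x + b)
    J = (c * (1 - c))[:, None] * W   # row-scale W by the sigmoid derivative
    return np.sum(J ** 2)            # squared Frobenius norm
```

Adding this term to the reconstruction loss pushes the encoder's derivatives toward zero except along directions needed to reconstruct the data, i.e., along the manifold's tangent directions.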
Learning Manifolds II
In practice, it is easier to train a denoising autoencoder [20]:
Encoder: encodes x corrupted with random noise
Decoder: reconstructs x without the noise
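The denoising objective itself is simple to state in code: corrupt the input before encoding, but measure reconstruction error against the clean input. The `encode`/`decode` callables below are placeholders for any encoder/decoder pair, and Gaussian corruption is just one common choice (masking noise is another):

```python
import numpy as np

def denoising_loss(encode, decode, x, noise_std, rng):
    """Denoising autoencoder objective: corrupt the input, reconstruct the clean x."""
    x_noisy = x + noise_std * rng.normal(size=x.shape)  # encoder sees the noisy input
    return np.sum((decode(encode(x_noisy)) - x) ** 2)   # but the target is the clean x
```

Because the network must map corrupted points back onto the data, it implicitly learns the directions off the manifold, which connects this objective to the contractive penalty above.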