Classification: SVHN Performance
Semi-supervised test error (%) benchmarks on SVHN with 1000 randomly and evenly distributed labeled examples.

Model                                      Test error (%) with 1000 labels
M1+TSVM, Kingma et al., NIPS'14            55.33 (± 0.11)
M1+M2, Kingma et al., NIPS'14              36.02 (± 0.10)
SWWAE, Zhao et al., ICLR'16                23.56
ADGM, Maaløe et al., ICML'16               22.86
SDGM, Maaløe et al., ICML'16               16.61 (± 0.24)
Improved GAN, Salimans et al., NIPS'16      8.11 (± 1.3)
Proposed                                   21.74 (± 0.41)

Smaller values for test error indicate better performance. All the results of the related works are reported from the original papers.
Clustering: Visualization (MNIST)
Figure panels: (a) Epoch 1, (b) Epoch 5, (c) Epoch 20, (d) Epoch 50, (e) Epoch 80, (f) Epoch 100.
Image Generation
Use the feature vector obtained from g_φ(x) and vary the category c (one-hot).
Unsupervised Clustering
What if we have no labeled data?
No labeled data (CIFAR-10); large amounts of unlabeled data (ImageNet).
Can we learn good representations and cluster data in an unsupervised way?
Unsupervised Clustering: Related Works
Intuition
Use of pretrained features
Fine-tuning
• Deep Embedding Clustering (DEC), Xie et al., ICML'16
• Deep Clustering Network (DCN), Yang et al., ICML'17
• Improved Deep Embedding Clustering (IDEC), Guo et al., IJCAI'17
End-to-End
• Joint Unsupervised Learning (JULE), Yang et al., CVPR'16
• Deep Embedded Regularized Clustering (DEPICT), Dizaji et al., ICCV'17
Complex Structure: Generative Models
• Variational Deep Embedding (VaDE), Jiang et al., IJCAI'17
• Gaussian Mixture Variational Autoencoders, Dilokthanakul et al., arXiv'17
Unsupervised Clustering: Proposed Method
Intuition
Our Probabilistic Model
Figure: inference model and generative model.
Stacked M1+M2 generative model
Figure: inference and generative models of M1, inference and generative models of M2, and the probabilistic graphical models of the combined M1+M2.
Semi-supervised Learning with Deep Generative Models, Kingma et al., NIPS'14
Stacked M1+M2: Training
Train the M1 model first, then use its feature representations to train the M2 model separately.
Problem with hierarchical stochastic latent variables
Inactive stochastic units: Ladder Variational Autoencoders, Sønderby et al., NIPS'16
Solutions require complex models (figure: inference and generative models): Auxiliary Deep Generative Models, Maaløe et al., ICML'16
Avoiding the problem of hierarchical stochastic variables
Replace the stochastic layer with a deterministic one that produces x̂ = g(x).
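As a rough illustration of this design choice, a minimal sketch of such a deterministic feature extractor in PyTorch is shown below; the layer sizes and the names g and x_flat are illustrative assumptions, not the architecture used in this work.

```python
import torch
import torch.nn as nn

# Minimal sketch: a deterministic feature extractor x_hat = g(x) standing in
# for a stochastic latent layer. Layer sizes are illustrative only.
g = nn.Sequential(
    nn.Linear(784, 500),   # e.g., flattened 28x28 MNIST digits
    nn.ReLU(),
    nn.Linear(500, 256),   # deterministic feature representation x_hat
    nn.ReLU(),
)

x_flat = torch.rand(32, 784)   # dummy batch of inputs
x_hat = g(x_flat)              # features fed to the stochastic part of the model
```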
Other Differences with the M1+M2 model
• Use of Gumbel-Softmax instead of marginalization over the stochastic discrete variables (see the sketch after this list).
• Training end-to-end instead of pre-training.
• Unsupervised model: labels are not required.
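A minimal sketch of the Gumbel-Softmax (concrete) relaxation is shown below, in the spirit of Jang et al. and Maddison et al.; the function name and temperature value are illustrative assumptions, not necessarily the exact implementation used in this work.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=0.5, eps=1e-20):
    """Differentiable, approximately one-hot sample from Cat(softmax(logits))."""
    # Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1)
    uniform = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(uniform + eps) + eps)
    # Perturb the logits and relax the argmax with a temperature-controlled softmax
    return F.softmax((logits + gumbel) / temperature, dim=-1)

logits = torch.randn(4, 10)            # e.g., unnormalized cluster scores
c = gumbel_softmax_sample(logits)      # rows sum to 1, gradients flow to the logits
```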
Variational Lower Bound
Figure: inference model and generative model.
Loss function: $\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + \mathcal{L}_G$
Proposed Model
Reconstruction Loss
$\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + \mathcal{L}_G$

Binary cross-entropy: $\mathcal{L}_{\text{BCE}} = -\big( x \log \tilde{x} + (1 - x) \log(1 - \tilde{x}) \big)$
Mean squared error: $\mathcal{L}_{\text{MSE}} = \lVert x - \tilde{x} \rVert^2$
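For concreteness, the two reconstruction terms can be written in PyTorch roughly as follows; x and x_tilde are placeholders for the input batch and its reconstruction (with x_tilde assumed to come from a sigmoid output in the BCE case).

```python
import torch
import torch.nn.functional as F

x = torch.rand(32, 784)        # placeholder input batch in [0, 1]
x_tilde = torch.rand(32, 784)  # placeholder reconstruction (sigmoid output)

# Binary cross-entropy: -(x log x_tilde + (1 - x) log(1 - x_tilde))
loss_bce = F.binary_cross_entropy(x_tilde, x, reduction='sum')

# Mean squared error: ||x - x_tilde||^2
loss_mse = F.mse_loss(x_tilde, x, reduction='sum')
```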
Gaussian and Categorical Regularizers
$\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + \mathcal{L}_G$

$\mathcal{L}_G = \mathrm{KL}\big( \mathcal{N}(\mu(x), \sigma(x)) \,\|\, \mathcal{N}(0, 1) \big) = -\frac{1}{2} \sum_{k=1}^{K} \big( 1 + \log \sigma_k^2 - \mu_k^2 - \sigma_k^2 \big)$

$\mathcal{L}_C = \mathrm{KL}\big( \mathrm{Cat}(\pi) \,\|\, \mathcal{U}(0, 1) \big) = \sum_{k=1}^{K} \pi_k \log ( K \pi_k )$
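A minimal sketch of these two regularizers, assuming the inference network outputs the Gaussian parameters (mu, log_var) and the cluster probabilities pi; the variable names are placeholders, not the thesis code.

```python
import torch

def gaussian_kl(mu, log_var):
    """L_G = KL(N(mu, sigma) || N(0, 1)), summed over the latent dimensions."""
    return -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp(), dim=-1)

def categorical_kl(pi, eps=1e-10):
    """L_C = KL(Cat(pi) || uniform) = sum_k pi_k log(K * pi_k)."""
    K = pi.size(-1)
    return torch.sum(pi * torch.log(K * pi + eps), dim=-1)

mu, log_var = torch.zeros(32, 16), torch.zeros(32, 16)   # placeholder encoder outputs
pi = torch.full((32, 10), 0.1)                           # uniform cluster probabilities
loss_g = gaussian_kl(mu, log_var).mean()                 # 0 for an exact N(0, 1) posterior
loss_c = categorical_kl(pi).mean()                       # 0 for an exactly uniform pi
```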
Experiments
Datasets
• MNIST: 70,000 samples, 10 classes, 28×28 images
• USPS: 9,298 samples, 10 classes, 16×16 images
• REUTERS-10K: 10,000 samples, 4 classes
Analysis of Clustering Performance
Figure: ACC and NMI (performance, %) versus iteration number. Clustering performance at each epoch, considering all loss weights equal to 1.
Analysis of loss function weights
$\mathcal{L}_{\text{total}} = \mathcal{L}_R + \mathcal{L}_C + w_G \mathcal{L}_G$
Figure: ACC (%) and NMI (%) as a function of the loss function weight (w_*) applied to L_R, L_C, and L_G.
Quantitative Results: Clustering Performance
Clustering performance, ACC (%) and NMI (%), on all datasets.

Method        MNIST ACC    MNIST NMI    USPS ACC     USPS NMI     REUTERS-10K ACC   REUTERS-10K NMI
k-means       53.24        -            66.82        -            51.62             -
GMM           53.73        -            -            -            54.72             -
AE+k-means    81.82        74.73        69.31        66.20        70.52             39.79
AE+GMM        82.18        -            -            -            70.13             -
GMVAE         82.31 (±4)   -            -            -            -                 -
DCN           83.00        81.00        -            -            -                 -
DEC           86.55        83.72        74.08        75.29        73.68             49.76
IDEC          88.06        86.72        76.05        78.46        75.64             49.81
VaDE          94.46        -            -            -            79.83             -
Proposed      85.75 (±8)   82.13 (±5)   72.58 (±3)   67.01 (±2)   80.41 (±5)        52.13 (±5)

Larger values for ACC and NMI indicate better performance. Colored rows (in the original table) denote methods that require pre-training. All the results of the related works are reported from the original papers.
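ACC in tables like this one is conventionally computed with the best one-to-one mapping between predicted clusters and ground-truth labels (Hungarian algorithm), and NMI is available directly in scikit-learn; the sketch below shows this standard evaluation, not the thesis' own evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Unsupervised ACC: best one-to-one match between clusters and labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                        # co-occurrence of cluster p and label t
    rows, cols = linear_sum_assignment(-count)  # maximize the matched counts
    return count[rows, cols].sum() / y_true.size

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])           # clusters permuted w.r.t. labels
acc = clustering_accuracy(y_true, y_pred)        # 1.0
nmi = normalized_mutual_info_score(y_true, y_pred)
```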
Quantitative Results: Classification - MNIST Performance
MNIST test error rate (%) for kNN.

Method      k=3      k=5      k=10
VAE         18.43    15.69    14.19
DLGMM        9.14     8.38     8.42
VaDE         2.20     2.14     2.22
Proposed     3.46     3.30     3.44

Smaller values for test error indicate better performance.
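Numbers of this kind are typically obtained by fitting a k-nearest-neighbour classifier on the learned feature vectors; a hedged sketch with scikit-learn is below, where z_train, z_test, and the labels are placeholders standing in for the extracted representations.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholders: feature representations extracted by the inference network
z_train, y_train = np.random.rand(1000, 16), np.random.randint(0, 10, 1000)
z_test, y_test = np.random.rand(200, 16), np.random.randint(0, 10, 200)

for k in (3, 5, 10):
    knn = KNeighborsClassifier(n_neighbors=k).fit(z_train, y_train)
    error = 100.0 * (1.0 - knn.score(z_test, y_test))   # test error rate in %
    print(f"k={k}: test error {error:.2f}%")
```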
Qualitative Results: Image Generation (10 clusters)
Fix the category c (one-hot) and vary the latent variable z.
Qualitative Results: Image Generation (7 and 14 clusters)
Figure: generated samples when training with 7 clusters and with 14 clusters.
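As an illustration of this generation procedure, the sketch below fixes a one-hot category and varies the latent style z sampled from the prior; the decoder, latent size, and cluster count are placeholders, not the trained model from this work.

```python
import torch
import torch.nn as nn

K, Z_DIM = 10, 32    # illustrative cluster count and latent size

# Placeholder decoder p(x | z, c); in practice this is the trained generative network.
decoder = nn.Sequential(
    nn.Linear(Z_DIM + K, 500), nn.ReLU(),
    nn.Linear(500, 784), nn.Sigmoid(),
)

def generate_from_cluster(cluster_k, n_samples=8):
    """Fix the one-hot category c = e_k and vary the latent variable z ~ N(0, I)."""
    c = torch.zeros(n_samples, K)
    c[:, cluster_k] = 1.0
    z = torch.randn(n_samples, Z_DIM)
    return decoder(torch.cat([z, c], dim=-1))   # n_samples generated digits

images = generate_from_cluster(cluster_k=3)
```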
Qualitative Results: Style Generation
Input a test image x (first column) through q_φ(z | x̂), then use the obtained vector and vary the category c (one-hot).
Qualitative Results: Clustering Visualization
Figure panels: (a) Epoch 1, (b) Epoch 5, (c) Epoch 20, (d) Epoch 50, (e) Epoch 150, (f) Epoch 300.
Visualization of the feature representations on the MNIST data set at different epochs.
Conclusions and Future Work
Contributions
For semi-supervised clustering, our contributions were:
• a semi-supervised auxiliary task which aims to define clustering assignments;
• a regularization on the feature representations of the data;
• a loss function that combines a variational loss with our auxiliary task to guide the learning process.
Contributions
For unsupervised clustering, our contributions were:
• a combination of deterministic and stochastic layers to solve the problem of hierarchical stochastic variables, allowing end-to-end learning;
• a simple deep generative model represented by the combination of a simple Gaussian and a categorical distribution.
Future Work
• Use of clustering algorithms (e.g., K-means, DBSCAN, agglomerative clustering) over the feature representations to improve the learning process.
• Improvements to our probabilistic generative model could be made by using generative adversarial networks (GANs).