Unsupervised Learning Jointly With Image Clustering
Jianwei Yang, Devi Parikh, Dhruv Batra
Virginia Tech
https://filebox.ece.vt.edu/~jw2yang/
Huge amount of images! Learning without annotation effort. What do we need to learn? An open problem, a hot problem, with various methodologies.
Learning distribution (structure): Clustering
• K-means (Image Credit: Jesse Johnson)
• Hierarchical Clustering
• Spectral Clustering, Zelnik-Manor et al., NIPS'04
• Graph Cut, Shi et al., TPAMI'00
• DBSCAN, Ester et al., KDD'96 (Image Credit: Jesse Johnson)
• EM Algorithm, Dempster et al., JRSS'77
• NMF, Xu et al., SIGIR'03 (Image Credit: Conrad Lee)
Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM Computing Surveys (CSUR) 31.3 (1999): 264-323.
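Since the approach presented later builds on agglomerative clustering, a minimal numpy sketch of the bottom-up idea may help: start from singleton clusters and repeatedly merge the closest pair under average linkage. The function name and toy data below are illustrative, not from the talk.

    import numpy as np

    def agglomerative(X, n_clusters):
        clusters = [[i] for i in range(len(X))]      # start: every sample is its own cluster
        while len(clusters) > n_clusters:
            # Find the pair with the smallest average pairwise distance (average linkage).
            best, best_d = None, np.inf
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = np.mean([np.linalg.norm(X[i] - X[j])
                                 for i in clusters[a] for j in clusters[b]])
                    if d < best_d:
                        best, best_d = (a, b), d
            a, b = best
            clusters[a] += clusters.pop(b)           # merge the closest pair
        return clusters

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
    print([len(c) for c in agglomerative(X, 2)])     # -> two clusters of 20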
Learning distribution (structure): Sub-space Analysis
• PCA (Image Credit: Jesse Johnson)
• ICA (Image Credit: Shylaja et al.)
• t-SNE, van der Maaten et al., JMLR'08
• Subspace Clustering, Vidal et al.
• Sparse Coding, Olshausen et al., Vision Research'97
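PCA, the first method above, is compact enough to sketch in a few lines of numpy via the SVD; names and toy data are illustrative:

    import numpy as np

    def pca(X, k):
        # Center the data; the top-k right singular vectors are the principal directions.
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T       # project onto the first k principal components

    X = np.random.default_rng(0).normal(size=(100, 5))
    print(pca(X, 2).shape)         # -> (100, 2)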
Learning representation (feature)
• Autoencoder, Hinton et al., Science'06
• DBN, Hinton et al., Science'06
• DBM, Salakhutdinov et al., AISTATS'09 (Image Credit: Jesse Johnson)
• Bengio et al., TPAMI'13
Yoshua Bengio, Aaron Courville, and Pierre Vincent. "Representation learning: A review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1798-1828.
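For concreteness, a minimal toy autoencoder in numpy: one tanh hidden layer trained by gradient descent to reconstruct its input, so the 3-dimensional bottleneck becomes the learned representation. The sizes, learning rate, and data here are arbitrary illustrative choices, not taken from any of the cited works.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))            # toy data
    W1 = rng.normal(size=(8, 3)) * 0.1       # encoder: 8 -> 3 bottleneck
    W2 = rng.normal(size=(3, 8)) * 0.1       # decoder: 3 -> 8

    for _ in range(1000):
        H = np.tanh(X @ W1)                  # encode
        E = H @ W2 - X                       # reconstruction error
        gW2 = H.T @ E / len(X)
        gW1 = X.T @ ((E @ W2.T) * (1 - H**2)) / len(X)
        W1 -= 0.2 * gW1
        W2 -= 0.2 * gW2

    print(np.mean((np.tanh(X @ W1) @ W2 - X) ** 2))  # reconstruction MSE after training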
Learning representation (feature)
• VAE, Kingma et al., arXiv'13 (Image Credit: Fast Forward Labs)
• GAN, Goodfellow et al., NIPS'14
• DCGAN, Radford et al., arXiv'15 (Image Credit: Mike Swarbrick Jones)
Most Recent CV Works
• Ego-motion, Jayaraman et al., ICCV'15
• Spatial context, Doersch et al., ICCV'15
• Temporal context, Wang et al., ICCV'15
• Solving Jigsaw Puzzles, Noroozi et al., ECCV'16
• Context Encoder, Pathak et al., CVPR'16
Most Recent CV Works
• Visual concept clustering, Huang et al., CVPR'16
• TAGnet, Wang et al., SDM'16
• Graph constraint, Li et al., ECCV'16
• Deep Embedding, Xie et al., ICML'16
Our Work
Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters
Outline
• Intuition
• Approach
• Experiments
• Extensions
Intuition
Meaningful clusters can provide supervisory signals to learn image representations. Good representations help to get meaningful clusters.
Intuition
• Cluster images first, and then learn representations
• Learn representations first, and then cluster images
• Cluster images and learn representations progressively
Intuition
[Diagram: good representations lead to good clusters, while poor representations lead to poor clusters, and vice versa.]
Approach
• Framework
• Objective
• Algorithm & Implementation
Approach: Framework
Overall objective: $\arg\min_{y,\theta} \mathcal{L}(y, \theta \mid I)$, optimized by alternating two steps:
• Agglomerative Clustering (fix $\theta$, update the image clusters): $\arg\min_{y} \mathcal{L}(y \mid \theta, I)$
• Representation Learning with a Convolutional Neural Network (fix $y$, update $\theta$): $\arg\min_{\theta} \mathcal{L}(\theta \mid y, I)$
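To make the alternation concrete, here is a fully runnable toy sketch in numpy, with a linear map W standing in for the CNN parameters θ: each iteration performs one agglomerative merge with W fixed, then one gradient step on W using the current clusters as supervision. This is an illustration of the scheme under toy assumptions, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.6, (15, 4)) for m in (0, 4, 8)])  # toy "images", 3 true groups
    W = rng.normal(size=(4, 2)) * 0.1        # linear map standing in for CNN parameters theta
    labels = np.arange(len(X))               # initially, every sample is its own cluster

    def centroids(F, labels):
        ids = np.unique(labels)
        return ids, np.stack([F[labels == i].mean(axis=0) for i in ids])

    for t in range(len(X) - 3):              # run until 3 clusters remain
        # Clustering step, argmin_y L(y | theta, I): merge the two closest clusters.
        ids, C = centroids(X @ W, labels)
        D = np.linalg.norm(C[:, None] - C[None], axis=-1) + 1e9 * np.eye(len(ids))
        a, b = np.unravel_index(D.argmin(), D.shape)
        labels[labels == ids[b]] = ids[a]
        # Representation step, argmin_theta L(theta | y, I): pull features toward centroids.
        ids, C = centroids(X @ W, labels)
        target = C[np.searchsorted(ids, labels)]
        W -= 0.05 * 2 * X.T @ (X @ W - target) / len(X)

    print(np.unique(labels).size)            # -> 3 clusters, learned jointly with W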
Approach: Recurrent Framework
[Figure: agglomerative clustering unrolled over time-steps as a recurrent process; each time-step merges clusters and refines the representation.]
Approach: Recurrent Framework
Backpropagating at every time-step is time-consuming and prone to over-fitting! How about updating once for multiple time-steps?
Approach: Recurrent Framework
Partial unrolling: divide all T time-steps into P periods. In each period, we merge clusters multiple times and update the CNN parameters only at the end of the period. P is determined by a hyper-parameter that will be introduced later.
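Restructuring the toy sketch above gives a picture of partial unrolling: the T merge steps are split into P periods, clusters are merged several times per period with the parameters fixed, and the parameters are updated once at each period's end. All names and sizes remain illustrative.

    import numpy as np

    def merge_once(F, labels):
        # One agglomerative step: merge the two closest cluster centroids.
        ids = np.unique(labels)
        C = np.stack([F[labels == i].mean(axis=0) for i in ids])
        D = np.linalg.norm(C[:, None] - C[None], axis=-1) + 1e9 * np.eye(len(ids))
        a, b = np.unravel_index(D.argmin(), D.shape)
        labels[labels == ids[b]] = ids[a]
        return labels

    def update_params(X, W, labels, lr=0.05):
        # One parameter update with the current clusters as supervision.
        ids = np.unique(labels)
        C = np.stack([(X @ W)[labels == i].mean(axis=0) for i in ids])
        target = C[np.searchsorted(ids, labels)]
        return W - lr * 2 * X.T @ (X @ W - target) / len(X)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.6, (15, 4)) for m in (0, 4, 8)])
    W = rng.normal(size=(4, 2)) * 0.1
    labels = np.arange(len(X))

    T, P = len(X) - 3, 6                     # T merge steps split into P periods
    for period in np.array_split(np.arange(T), P):
        for _ in period:                     # merge several times with W fixed ...
            labels = merge_once(X @ W, labels)
        W = update_params(X, W, labels)      # ... and update W once per period
    print(np.unique(labels).size)            # -> 3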
Approach: Objective Function
Overall objective: $\arg\min_{y,\theta} \mathcal{L}(y, \theta \mid I)$, alternating $\arg\min_{y} \mathcal{L}(y \mid \theta, I)$ and $\arg\min_{\theta} \mathcal{L}(\theta \mid y, I)$.
Overall loss: the sum of per-time-step losses over the T merging steps, $\mathcal{L}(y, \theta \mid I) = \sum_{t=1}^{T} \mathcal{L}^t(y^t, \theta \mid I)$.
Approach: Objective Function
Loss at time-step t (merge the i-th cluster with its nearest neighbour cluster):
$\mathcal{L}^t = -\Big[\mathcal{A}(\mathcal{C}_i, \mathcal{C}_i^1) - \frac{\lambda}{K_c - 1}\sum_{k=2}^{K_c}\big(\mathcal{A}(\mathcal{C}_i, \mathcal{C}_i^1) - \mathcal{A}(\mathcal{C}_i, \mathcal{C}_i^k)\big)\Big]$
where $\mathcal{A}(\cdot,\cdot)$ is the affinity measure, $\mathcal{C}_i$ is the i-th cluster, and $\mathcal{C}_i^1, \dots, \mathcal{C}_i^{K_c}$ are the $K_c$ nearest neighbour clusters of the i-th cluster. The conventional agglomerative clustering strategy considers only the first term, the affinity between the i-th cluster and its nearest neighbour; the proposed strategy additionally weighs the differences between two cluster affinities within the neighbourhood, then merges these two clusters.
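A small numpy sketch of evaluating this criterion for each cluster, using negative centroid distance as a stand-in affinity (the paper's actual affinity measure differs; λ, K_c, the toy data, and the function names here are illustrative):

    import numpy as np

    def affinity(Ci, Cj):
        # Stand-in affinity: negative distance between cluster centroids (higher = closer).
        return -np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0))

    def criterion(clusters, i, Kc=3, lam=1.0):
        # Affinities from cluster i to all others, sorted so aff[0] is its nearest neighbour.
        aff = sorted((affinity(clusters[i], clusters[j])
                      for j in range(len(clusters)) if j != i), reverse=True)
        # A(C_i, C_i^1) - lambda/(Kc-1) * sum_k [ A(C_i, C_i^1) - A(C_i, C_i^k) ]
        return aff[0] - lam / (Kc - 1) * sum(aff[0] - a for a in aff[1:Kc])

    rng = np.random.default_rng(0)
    clusters = [rng.normal(m, 0.5, (10, 2)) for m in (0.0, 1.0, 5.0, 9.0)]
    scores = [criterion(clusters, i) for i in range(len(clusters))]
    best = int(np.argmax(scores))  # L^t = -max score; merge cluster `best` with its NN
    print(best, [round(s, 2) for s in scores])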