Deep Learning of Binary Hash Codes for Fast Image Retrieval Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, Chu-Song Chen Yahoo! Taiwan CVPR 2015 2016. 11. 6. 박중언 1
Index • Review • Background & Motivation • Method • Experiment & Result • Q & A • Quiz 2
Review 3
Review - Video Object Segmentation http://sglab.kaist.ac.kr/~sungeui/IR/Presentation/first_2016/%EC%A3%BC%EC%84%B8%ED%98%84.pdf 4
Background & Motivation 6
Background - Inverted Index • Reduces the search space effectively with an acceptable loss of accuracy • Uses ANN techniques to find nearby clusters efficiently http://sglab.kaist.ac.kr/~sungeui/IR/Slides2016/Lec4b-bow.pdf 7
Motivation • Fast retrieval is needed over huge image datasets. • Compact binary codes should be generated directly from the deep CNN. 8
Motivation • Consider how CNN features change with layer depth • Features from shallow layers • similar appearance • Features from deep layers • similar high-level semantics 9
Method 10
Method • The method consists of three main components: • Pre-training • Fine-tuning • Hierarchical search 11
Method – pre-training • Supervised pre-training on the large-scale ImageNet dataset • > 1M images, 1000 categories • Trained with AlexNet 12
Method – fine-tuning • Fine-tune the network with the latent layer to simultaneously learn domain-specific feature representations and a set of hash-like functions 13
Method – fine-tuning • The weights of the latent layer H and the final classification layer F8 are randomly initialized. • The initial random weights of the latent layer H act like LSH [6], which uses random projections to construct the hashing bits • [6] A. Gionis, P. Indyk, R. Motwani, et al. Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518–529, 1999. 14
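The LSH analogy can be illustrated with a minimal sketch. It assumes sign-of-random-projection hashing with Gaussian directions, which is one common LSH variant and not necessarily the exact scheme of [6]. The function name `random_projection_hash`, the seeds, and the toy data are hypothetical.

```python
import numpy as np

def random_projection_hash(features, n_bits, seed=0):
    """Map real-valued feature rows to n_bits binary codes via random hyperplanes."""
    rng = np.random.default_rng(seed)
    # One random Gaussian direction per bit (a stand-in for untrained weights of H).
    projections = rng.standard_normal((features.shape[1], n_bits))
    # The sign of each projection yields one hash bit.
    return (features @ projections > 0).astype(np.uint8)

# Toy data (assumed): a base vector, a slightly perturbed copy, and an unrelated vector.
rng = np.random.default_rng(1)
base = rng.standard_normal((1, 4096))
near = base + 0.01 * rng.standard_normal((1, 4096))
far = rng.standard_normal((1, 4096))

codes = random_projection_hash(np.vstack([base, near, far]), n_bits=48)
d_near = int(np.count_nonzero(codes[0] != codes[1]))  # bits differing from the near copy
d_far = int(np.count_nonzero(codes[0] != codes[2]))   # bits differing from the unrelated vector
```

Nearby inputs tend to fall on the same side of most random hyperplanes, so `d_near` is expected to be much smaller than `d_far`. Fine-tuning then replaces these random directions with learned ones.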
Method – fine-tuning • The nodes of the latent layer H are activated by sigmoid functions, so their activations are approximated to {0,1}. • Sigmoid function • To achieve domain adaptation, fine-tune the proposed network on the target-domain dataset via backpropagation. 15
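How sigmoid activations become binary codes can be sketched as follows. The threshold of 0.5 and the helper names `sigmoid` and `binarize` are assumptions for illustration; the slide only states that the activations are approximated to {0,1}.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binarize(activations, threshold=0.5):
    """Turn latent-layer activations into bits (threshold is an assumed value)."""
    return (activations >= threshold).astype(np.uint8)

# Hypothetical pre-activation values of five latent nodes.
pre_activations = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])
activations = sigmoid(pre_activations)
bits = binarize(activations)
```

Strongly negative or positive pre-activations saturate near 0 or 1, so thresholding loses little information for those nodes.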
Method – image retrieval • Retrieves images similar to the query via hierarchical deep search • Hierarchical search has two steps. • First, a coarse-level search finds the n nearest candidates using coarse features. • Second, a fine-level search ranks the candidates found by the coarse-level search. 16
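The two steps can be sketched as below. This is a minimal illustration, assuming the coarse level keeps the `n_candidates` codes nearest in Hamming distance and the fine level ranks them by Euclidean distance on real-valued features. The function name, parameter names, distance choice, and toy data are hypothetical.

```python
import numpy as np

def hierarchical_search(query_code, query_feat, db_codes, db_feats, n_candidates, top_k):
    """Coarse-to-fine retrieval: filter by binary codes, then rank by features."""
    # Coarse level: Hamming distance between the query code and every database code.
    hamming = np.count_nonzero(db_codes != query_code, axis=1)
    candidates = np.argsort(hamming, kind="stable")[:n_candidates]
    # Fine level: rank only the candidates by Euclidean distance on the features.
    dists = np.linalg.norm(db_feats[candidates] - query_feat, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return candidates[order]

# Toy database (assumed): 4-bit codes and 2-dimensional features.
db_codes = np.array([[0, 0, 0, 0],
                     [0, 0, 0, 1],
                     [1, 1, 1, 1],
                     [0, 0, 1, 1]], dtype=np.uint8)
db_feats = np.array([[0.0, 0.0],
                     [0.2, 0.1],
                     [0.05, 0.0],
                     [3.0, 3.0]])

result = hierarchical_search(
    query_code=np.array([0, 0, 0, 0], dtype=np.uint8),
    query_feat=np.array([0.1, 0.1]),
    db_codes=db_codes, db_feats=db_feats,
    n_candidates=2, top_k=2,
)
```

Only the candidates that survive the coarse step are compared on the expensive real-valued features, which is where the saving comes from.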
Method – image retrieval • Similarity in the coarse-level search is measured by the Hamming distance between binary codes 17
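Hamming distance on bit-packed codes can be sketched with XOR plus a byte-wise popcount table. The helper names and the 12-bit toy codes are hypothetical; real codes in the experiments are 48 or 128 bits.

```python
import numpy as np

# Lookup table: number of set bits in every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def pack_codes(bits):
    """Pack rows of 0/1 values into bytes (zero-padded to a whole byte)."""
    return np.packbits(bits, axis=-1)

def hamming_distances(query_packed, db_packed):
    """Hamming distance from one packed query code to each packed database code."""
    xor = np.bitwise_xor(db_packed, query_packed)  # differing bits are set to 1
    return POPCOUNT[xor].sum(axis=1)

query_bits = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1], dtype=np.uint8)
two_flips = query_bits.copy()
two_flips[[0, 11]] ^= 1                 # flip the first and last bit
db_bits = np.vstack([query_bits, two_flips, 1 - query_bits])

distances = hamming_distances(pack_codes(query_bits), pack_codes(db_bits))
```

Here `distances` holds 0 for the identical code, 2 for the code with two flipped bits, and 12 for the complement.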
Method – image retrieval 18
Experiment & Result 19
Experiment • Supervised pre-training on ImageNet • Fine-tuning on the target domain • MNIST, CIFAR-10 • Image retrieval via hierarchical deep search 20
Experiment • Experiments were conducted on the MNIST, CIFAR-10, and Yahoo-1M datasets • Performance is measured by precision@k • (number of ground-truth images in top k) / k 21
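The precision@k definition can be written out as a short sketch. It assumes that a retrieved image counts as ground truth when its label matches the query label. The function name and the toy ranking are hypothetical.

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved items whose label matches the query label."""
    top_k = retrieved_labels[:k]
    relevant = sum(1 for label in top_k if label == query_label)
    return relevant / k

# Hypothetical ranked result list for a query labeled "cat".
ranked = ["cat", "cat", "dog", "cat", "bird"]
p_at_3 = precision_at_k(ranked, "cat", 3)  # 2 relevant of 3
p_at_5 = precision_at_k(ranked, "cat", 5)  # 3 relevant of 5
```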
Experiment - MNIST • F8 is set to 10-way for the 10 digit categories, and h, the number of nodes in the latent layer, is set to 48 and 128. • 50,000 training iterations 22
Experiment - MNIST • Classification performed for 1000 images on the training set (left) and the test set (right) 23
Experiment – CIFAR-10 • F8 is set to 10-way for the 10 object categories, and h is set to 48 and 128. 24
Experiment – CIFAR-10 • Classification performed for 1000 images on the training set (left) and the test set (right) 25
Experiment – Yahoo! 1M dataset • 116 object categories, and h in the latent layer is set to 128 • 1000 images are randomly selected 26
Experiment – Yahoo! 1M dataset • (1) AlexNet: F7 features from the pre-trained CNN [14]; • (2) Ours-ES: F7 features from our network; • (3) Ours-BCS: Latent binary codes from our network; • (4) Ours-HDS: F7 features and latent binary codes from our network. 27
Result – Yahoo! 1M dataset • Classification performed for 1000 images on the training set (left) and the test set (right) 28
Result - Speed • 971.3x faster than traditional exhaustive search with 4096-dimensional features. 29
Conclusion • Introduces a simple yet effective supervised learning framework for rapid image retrieval. • The proposed CNN learns domain-specific image representations and a set of hashing-like functions for rapid image retrieval. • The proposed method outperforms all of the state-of-the-art works on the public datasets • Our approach learns binary hash codes in a point-wise manner and scales easily with the data size, in contrast to conventional pair-wise approaches 30
Q & A 31
Thank you! 32