  1. Deep Learning of Binary Hash Codes for Fast Image Retrieval
     Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, Chu-Song Chen (Yahoo! Taiwan), CVPR 2015
     Presented 2016. 11. 6. by 박중언

  2. Index
     • Review
     • Background & Motivation
     • Method
     • Experiment & Result
     • Q & A
     • Quiz

  3. Review

  4. Review - Video Object Segmentation http://sglab.kaist.ac.kr/~sungeui/IR/Presentation/first_2016/%EC%A3%BC%EC%84%B8%ED%98%84.pdf

  5. Review - Video Object Segmentation http://sglab.kaist.ac.kr/~sungeui/IR/Presentation/first_2016/%EC%A3%BC%EC%84%B8%ED%98%84.pdf

  6. Background & Motivation

  7. Background - Inverted Index
     • Reduces the search space effectively with an acceptable loss of accuracy
     • Uses approximate nearest neighbor (ANN) techniques for efficiently finding near clusters
     http://sglab.kaist.ac.kr/~sungeui/IR/Slides2016/Lec4b-bow.pdf
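
A minimal sketch of the inverted-index idea, assuming images have already been quantized into visual-word IDs by some earlier step; all names here are illustrative, not from the slides:

    # Minimal inverted index: visual word -> set of image IDs.
    from collections import defaultdict

    index = defaultdict(set)

    def add_image(image_id, visual_words):
        for w in visual_words:
            index[w].add(image_id)

    def candidate_pool(query_words):
        # Union of all images sharing at least one visual word with the query;
        # only this reduced pool is scored, not the whole database.
        pool = set()
        for w in query_words:
            pool |= index[w]
        return pool

    add_image("img1", [3, 17, 42])
    add_image("img2", [17, 99])
    print(candidate_pool([17, 42]))   # {'img1', 'img2'}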

  8. Motivation
     • Need fast retrieval over huge image datasets
     • Need to generate compact binary codes directly from the deep CNN

  9. Motivation
     • Consider how features change with CNN layer depth
     • Features from shallow layers capture similar appearance (low-level detail)
     • Features from deep layers capture similar high-level semantics

  10. Method

  11. Method
     • The method consists of three main steps:
       1. Pre-Training
       2. Fine-Tuning
       3. Hierarchical Search

  12. Method – Pre-Training
     • Supervised pre-training on the large-scale ImageNet dataset
     • Over 1M images in 1,000 object categories
     • Trained with AlexNet
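
As a rough sketch of this step (the paper used its own Caffe setup; torchvision's pretrained AlexNet is used here as a stand-in assumption):

    # Load an ImageNet-pretrained AlexNet as the starting point for fine-tuning.
    from torchvision import models

    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    print(net.classifier)   # F6/F7 (4096-d) and the 1000-way F8 live here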

  13. Method – Fine-Tuning
     • Fine-tune the network with a latent layer H to simultaneously learn domain-specific feature representations and a set of hash-like functions
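
A minimal PyTorch sketch of this architecture, under the assumption that the paper's Caffe model can be re-expressed with torchvision's AlexNet; the module split and the names HashNet, f6_f7, latent, and f8 are hypothetical:

    import torch
    import torch.nn as nn
    from torchvision import models

    class HashNet(nn.Module):
        def __init__(self, h=48, num_classes=10):
            super().__init__()
            alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
            self.features = alexnet.features        # conv layers kept from pre-training
            self.avgpool = alexnet.avgpool
            # Everything in the original classifier except the last 1000-way layer,
            # i.e. up to and including F7 (4096-d).
            self.f6_f7 = nn.Sequential(*list(alexnet.classifier.children())[:-1])
            self.latent = nn.Linear(4096, h)        # latent layer H
            self.f8 = nn.Linear(h, num_classes)     # new classification layer F8

        def forward(self, x):
            x = self.avgpool(self.features(x)).flatten(1)
            f7 = self.f6_f7(x)                      # deep feature used at the fine level
            h_act = torch.sigmoid(self.latent(f7))  # hash-like activations in (0, 1)
            return self.f8(h_act), h_act, f7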

  14. Method – Fine-Tuning
     • The weights of the latent layer H and the final classification layer F8 are randomly initialized
     • The initial random weights of the latent layer H act like LSH [6], which uses random projections for constructing the hashing bits
     [6] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518–529, 1999.
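
For intuition, a toy sign-of-projection hash in the spirit of LSH [6]; the dimensions and names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4096, 48))   # 48 random hyperplanes in feature space

    def lsh_code(feature):
        # feature: (4096,) real vector -> (48,) bits. Random projections alone
        # already form a weak locality-sensitive hash, which is why the randomly
        # initialized latent layer H is a reasonable starting point.
        return (feature @ W >= 0).astype(np.uint8)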

  15. Method – Fine-Tuning
     • The latent layer H is activated by sigmoid functions, so the activations are approximated to {0, 1}
     • Sigmoid function: σ(z) = 1 / (1 + e^(-z))
     • To achieve domain adaptation, fine-tune the proposed network on the target-domain dataset via back-propagation
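
A one-line sketch of the binarization this implies, thresholding the sigmoid activations at 0.5 (continuing the hypothetical HashNet above):

    import torch

    def binarize(h_act, threshold=0.5):
        # h_act: (batch, h) sigmoid outputs -> (batch, h) binary codes in {0, 1}.
        return (h_act >= threshold).to(torch.uint8)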

  16. Method – Image Retrieval
     • Retrieves images similar to the query via hierarchical deep search
     • The hierarchical search has two steps:
     • First, coarse-level search finds the n candidates whose binary codes are nearest to the query's
     • Second, fine-level search re-ranks the candidates belonging to that coarse pool

  17. Method – Image Retrieval
     • Similarity in the coarse-level search is measured by the Hamming distance between binary codes (sketched below)
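
Putting both levels together, a minimal sketch of the hierarchical deep search; it assumes binary codes and F7 features for the whole database were precomputed with the hypothetical HashNet above, and all names are illustrative:

    import numpy as np

    def hierarchical_search(q_code, q_f7, db_codes, db_f7, radius=2, k=10):
        # Coarse level: Hamming distance between binary codes.
        hamming = (db_codes != q_code).sum(axis=1)   # db_codes: (N, h) uint8
        pool = np.where(hamming <= radius)[0]        # candidate pool
        # Fine level: Euclidean distance on 4096-d F7 features within the pool.
        dists = np.linalg.norm(db_f7[pool] - q_f7, axis=1)
        return pool[np.argsort(dists)[:k]]           # indices of the top-k images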

  18. Method – Image Retrieval

  19. Experiment & Result

  20. Experiment
     • Supervised pre-training on ImageNet
     • Fine-tuning on the target domain: MNIST, CIFAR-10
     • Image retrieval via hierarchical deep search

  21. Experiment
     • Experiments were done on the MNIST, CIFAR-10, and Yahoo-1M datasets
     • precision@k is defined to measure performance: (number of ground-truth images in the top k) / k
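
A direct sketch of this metric; here a retrieved image counts as ground truth when it shares the query's class label:

    def precision_at_k(retrieved_labels, query_label, k):
        # Fraction of the top-k retrieved images matching the query's label.
        top_k = retrieved_labels[:k]
        return sum(1 for lbl in top_k if lbl == query_label) / k

    print(precision_at_k(["cat", "cat", "dog", "cat"], "cat", k=4))   # 0.75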

  22. Experiment – MNIST
     • F8 is changed to a 10-way layer for the 10 digit categories; the latent layer size h is set to 48 and 128
     • 50,000 training iterations

  23. Experiment – MNIST
     • Classification performance for 1,000 images on the training set (left) and test set (right)

  24. Experiment – CIFAR-10
     • For CIFAR-10, F8 is changed to a 10-way layer for the 10 object categories; h is again set to 48 and 128

  25. Experiment – CIFAR-10
     • Classification performance for 1,000 images on the training set (left) and test set (right)

  26. Experiment – Yahoo-1M Dataset
     • 116 object categories; h in the latent layer is set to 128
     • Randomly select 1,000 query images

  27. Experiment – Yahoo-1M Dataset
     • Four methods are compared:
     • (1) AlexNet: F7 feature from the pre-trained CNN [14]
     • (2) Ours-ES: F7 features from our network
     • (3) Ours-BCS: latent binary codes from our network
     • (4) Ours-HDS: F7 features and latent binary codes from our network

  28. Result – Yahoo-1M Dataset
     • Classification performance for 1,000 images on the training set (left) and test set (right)

  29. Result – Speed
     • 971.3x faster than traditional exhaustive search with 4096-dimensional features

  30. Conclusion
     • Introduced a simple yet effective supervised learning framework for rapid image retrieval
     • Proposed CNN techniques that learn domain-specific image representations and a set of hash-like functions for rapid image retrieval
     • The proposed method outperforms the state-of-the-art works on the public datasets
     • The approach learns binary hash codes in a pointwise manner and easily scales to the data size, in contrast to conventional pairwise approaches

  31. Q & A

  32. Thank you!
