  1. Simultaneous Feature Learning and Hash Coding with Deep Neural Networks Presenter: MinKu Kang 1

  2. Why did I choose this paper? • Efficient image retrieval via binary encoding of images: efficient bitwise operations and space-efficient storage. • Many of the techniques can readily be reused in other research fields. • An advanced neural network structure (a weight-shared network) is used, from which I can learn a lot. 2
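
As an aside on the bitwise-operations point above: the Hamming distance between two binary codes reduces to an XOR followed by a popcount. A minimal illustration (the helper below is hypothetical, not from the slides):

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Hamming distance between two binary hash codes stored as integers.

    XOR marks the differing bits; counting the set bits gives the distance.
    """
    return bin(code_a ^ code_b).count("1")

# two 8-bit codes differing in exactly two positions
assert hamming_distance(0b10110100, 0b10011100) == 2
```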

  3. Background - Similarity-Preserving Hashing 3

  4. Related Work Two-stage framework: first learn binary hash codes, then learn the binary hashing functions 4

  5. Decomposing Similarity Matrix 5
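
The slide only names the decomposition, but in the CNNH-style two-stage approach this talk follows, the first stage approximates the pairwise similarity matrix S (entries +1 for similar, -1 for dissimilar) by (1/q)·H·Hᵀ, where H holds the n relaxed q-bit codes. A minimal sketch, with plain gradient descent standing in for the paper's coordinate-descent procedure (step sizes and iteration counts are arbitrary):

```python
import numpy as np

def approximate_hash_codes(S, q, iters=500, lr=0.01, seed=0):
    """Find H in [-1, 1]^{n x q} such that (1/q) * H @ H.T roughly matches S."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.uniform(-1.0, 1.0, size=(n, q))
    for _ in range(iters):
        R = H @ H.T / q - S          # residual of the approximation
        grad = (4.0 / q) * R @ H     # gradient of ||R||_F^2 w.r.t. H
        H = np.clip(H - lr * grad, -1.0, 1.0)
    return np.where(H >= 0, 1, -1)   # quantize the relaxed codes to +-1
```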

  6. Learning Hash Functions The approximate hash codes learned in the first stage are used as ground truth for training the hash functions 6
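
A sketch of how the second stage could be wired up, with the stage-1 codes as targets; the convolutional trunk and layer sizes below are placeholders, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class HashPredictor(nn.Module):
    """Maps an image to q hash-bit logits; trained against the stage-1 codes."""
    def __init__(self, q: int = 48):
        super().__init__()
        self.features = nn.Sequential(           # stand-in convolutional trunk
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.bits = nn.Linear(32 * 4 * 4, q)     # one logit per hash bit

    def forward(self, x):
        return self.bits(self.features(x))

model = HashPredictor(q=48)
criterion = nn.BCEWithLogitsLoss()               # targets are (H + 1) / 2, i.e. bits in {0, 1}
```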

  7. Two-Stage Framework Analytical 7

  8. Optimization Cost Function 8

  9. Augmenting Output Layer with Binary Class Labels 9

  10. Dataset for Experiments 10

  11. Results on CIFAR-10 For 48-bit hash codes 11

  12. Results on NUS-WIDE For 48-bit hash codes ("as much as two orders of magnitude") 12

  13. Related Work – Metric Learning Based Hashing Haomiao et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016 13

  14. Similarity-Preserving Loss Function One loss term for similar pairs (penalizing their distance) and one for dissimilar pairs (penalizing them only when they fall too close together) 14
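
A sketch of the two terms named on the slide, written in the contrastive form commonly used for pairwise hashing; the squared Euclidean distance and the margin value are assumptions, not taken from the slide:

```python
import numpy as np

def pairwise_hash_loss(b1, b2, similar: bool, margin: float = 4.0) -> float:
    """Similar pairs are pulled together; dissimilar pairs are pushed apart,
    but only while their distance still lies inside the margin."""
    d2 = float(np.sum((np.asarray(b1) - np.asarray(b2)) ** 2))
    if similar:
        return 0.5 * d2                    # loss for similar pairs
    return 0.5 * max(margin - d2, 0.0)     # loss for dissimilar pairs
```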

  15. Relaxation of the Loss Function Typically, the output layer is replaced by a sigmoid activation layer to obtain binary-like outputs, but the authors did not use a sigmoid layer because it slows down convergence. Instead, a regularizer encourages each output value to stay near -1 or +1 (the ends of the range (-1, 1)). This is a weaker constraint than a [0, 1] sigmoid layer, but it shows better performance. 15
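
A sketch of the relaxation: instead of a sigmoid layer, a regularization term nudges each real-valued output toward -1 or +1; the L1 form and the weight alpha are assumptions about the exact formulation:

```python
import numpy as np

def quantization_regularizer(b, alpha: float = 0.01) -> float:
    """Penalty that vanishes only when every entry of b is exactly -1 or +1."""
    return alpha * float(np.sum(np.abs(np.abs(np.asarray(b)) - 1.0)))

# added to the pairwise loss for each code in the pair, so the network can use
# unconstrained outputs instead of a slow-converging sigmoid layer
```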

  16. Effect of Regularizer Histograms of the output values (x-axis: output value; y-axis: number of values in the output layer; one panel shows results with a sigmoid output layer for comparison): with the regularizer the distribution becomes more peaked, i.e. more binary-like 16

  17. Effect of Regularizer Retrieval performance (mAP) of models under different settings of the parameters 17

  18. Results on CIFAR-10 18

  19. Background - Metric Learning Siamese Network: similarity of x1 and x2 (pairwise). Triplet Network: x is more similar to x+ than to x- (higher-order relationship) 19

  20. Triplet-Loss-Based Network Hanjiang et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015 20

  21. Pairwise versus Triplet Ranking Pairwise similarity: each pair (query, image) is labeled similar or dissimilar. Triplet ranking: image I is more similar to image I+ than to image I- 21

  22. Training Architecture Three weight-shared CNNs map I, I+, and I- to F(I), F(I+), and F(I-), which are fed into the triplet ranking loss. A sigmoid activation layer restricts the output values to the range [0, 1]. 22
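
The weight sharing means the three CNNs in the diagram are one set of parameters applied three times. A minimal sketch with a placeholder trunk (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

class HashNet(nn.Module):
    """Placeholder trunk producing q outputs squashed to [0, 1] by a sigmoid."""
    def __init__(self, q: int = 48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(2), nn.Flatten(),
            nn.Linear(16 * 2 * 2, q), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

net = HashNet()
I, I_pos, I_neg = (torch.randn(8, 3, 32, 32) for _ in range(3))
F_I, F_pos, F_neg = net(I), net(I_pos), net(I_neg)   # three passes, one set of weights
```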

  23. Triplet Ranking Loss A margin-based hinge loss; the constant term (the "-1" on the slide) sets the ranking margin 23
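
A sketch of the loss, assuming the common hinge form with margin 1 (which is presumably the constant term the slide refers to) and squared Euclidean distances standing in for Hamming distance:

```python
import torch

def triplet_ranking_loss(f, f_pos, f_neg, margin: float = 1.0):
    """Zero once the dissimilar image is farther from the query than the
    similar one by at least `margin`; otherwise the violation is penalized."""
    d_pos = ((f - f_pos) ** 2).sum(dim=1)   # distance to the similar image
    d_neg = ((f - f_neg) ** 2).sum(dim=1)   # distance to the dissimilar image
    return torch.clamp(margin + d_pos - d_neg, min=0).mean()
```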

  24. Training Architecture The same three weight-shared CNNs producing F(I), F(I+), and F(I-), now annotated with a shared weight W_ij; each training step updates W_ij using the gradient of the triplet ranking loss, ∂l/∂W_ij, scaled by a step size β 24

  25. Weight Update Analytically differentiated via the chain rule: ∂l(F(I), F(I+), F(I-))/∂W_ij is the sum of three terms, one propagated through each of F(I), F(I+), and F(I-), i.e. a function g of the outputs of all three networks (O, O+, O-). Updating W_ij requires values from the three networks! 25
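
With shared weights, automatic differentiation realizes this chain-rule sum for you: one backward pass accumulates the contributions flowing through F(I), F(I+), and F(I-) into the same parameters. A tiny self-contained sketch (a single linear map stands in for the CNN):

```python
import torch
import torch.nn as nn

shared_F = nn.Linear(10, 4)                       # stand-in for the shared CNN F
I, I_pos, I_neg = (torch.randn(5, 10) for _ in range(3))

d_pos = ((shared_F(I) - shared_F(I_pos)) ** 2).sum(dim=1)
d_neg = ((shared_F(I) - shared_F(I_neg)) ** 2).sum(dim=1)
loss = torch.clamp(1.0 + d_pos - d_neg, min=0).mean()
loss.backward()

# shared_F.weight.grad now holds the sum of the three chain-rule terms,
# one per branch, matching the analytic expression on the slide.
print(shared_F.weight.grad.shape)                 # torch.Size([4, 10])
```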

  26. Weight Update Updating W_ij requires values from the three networks! • We need three forward propagations for each training triplet. • We need to keep the three weight-shared copies of the network in memory. • The weight update is computationally expensive compared to a typical single-network structure. • The number of possible triplets in a training set is very large. • There is a dedicated paper (*) reporting a training-time improvement of about two orders of magnitude. * Bohan et al., Fast Training of Triplet-based Deep Binary Embedding Networks, CVPR 2016. 26

  27. Divide-and-Encode Module Enforces an independence property • Each hash bit is generated from a separate slice of the features, • so the output hash codes may be less redundant with respect to each other. • There is no mathematical proof of this. 27
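
A sketch of the module as described on the slide: the feature vector is split into q disjoint slices, and each slice is mapped by its own small projection to one bit-like output. The slice size and layer shapes are assumptions:

```python
import torch
import torch.nn as nn

class DivideAndEncode(nn.Module):
    """Maps a feature vector to q outputs, one per disjoint feature slice."""
    def __init__(self, feat_dim: int = 480, q: int = 48):
        super().__init__()
        assert feat_dim % q == 0
        self.slice = feat_dim // q
        # one tiny encoder per slice, so the bits do not share input features
        self.encoders = nn.ModuleList(nn.Linear(self.slice, 1) for _ in range(q))

    def forward(self, feats):
        chunks = torch.split(feats, self.slice, dim=1)
        bits = [torch.sigmoid(enc(c)) for enc, c in zip(self.encoders, chunks)]
        return torch.cat(bits, dim=1)             # shape (batch, q), values in (0, 1)

codes = DivideAndEncode()(torch.randn(8, 480))
```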

  28. Overall Structure Input image → CNN → divide-and-encode module → quantization. At test time, a single trained network is used 28
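
At test time the [0, 1] outputs are binarized to obtain the hash code; a sketch assuming the usual threshold of 0.5:

```python
import numpy as np

def quantize(outputs: np.ndarray) -> np.ndarray:
    """Threshold sigmoid-range outputs into a binary hash code."""
    return (outputs >= 0.5).astype(np.uint8)

print(quantize(np.array([0.1, 0.7, 0.5, 0.3])))   # -> [0 1 1 0]
```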

  29. Results on SVHN 29

  30. Results on CIFAR-10 30

  31. Results on NUS-WIDE 31

  32. Divide-and-Encode versus Fully-Connected-Encode 32

  33. DSH (pairwise) versus DNNH (triplet) In the DSH paper, the authors state that they implemented DNNH themselves, and that the divide-and-encode structure largely degraded the retrieval mAP on CIFAR-10. Training inefficiencies of the triplet network may have resulted in its inferior performance 33

  34. Conclusion • While the Triplet Network can learn higher-order relationships between training samples, it suffers from training inefficiencies. • In practice, the pairwise metric-learning-based method shows better performance. • Efficient sampling strategies for triplets are needed. • Solving the training inefficiencies of the Triplet Network could be key to better results. • An end-to-end architecture is preferred. 34

  35. References & Acknowledgement • Rongkai Xia et al., Supervised Hashing for Image Retrieval via Image Representation Learning, AAAI 2014 • Haomiao et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016 • Hanjiang et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015 35

  36. Quiz • 1. What is the advantage of the Triplet Network over the Pairwise Network? a) fast training speed b) low complexity of the architecture c) capturing higher-order relationships between training samples • 2. Why did the authors design the Divide-and-Encode Module? a) to enhance the training speed b) to enforce the independence property between hash functions c) to lower the complexity of the problem 36
