Simultaneous Feature Learning and Hash Coding with Deep Neural Networks Presenter: MinKu Kang 1
Why did I choose this paper? • Efficient image retrieval via binary encoding of images: efficient bitwise operations and space-efficient storage. • It contains many useful techniques that can readily be applied in other research fields. • It uses an advanced (weight-shared) neural network structure from which I can learn a lot. 2
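Side note (my own illustration, not from the paper): binary codes make retrieval efficient because a 48-bit code fits in a single machine word, and the Hamming distance between two codes reduces to an XOR followed by a popcount.

```python
# Minimal illustration (not from the paper): retrieval with binary hash codes.
# A 48-bit code fits in a single Python int; Hamming distance is XOR + popcount.

def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(code_a ^ code_b).count("1")

def rank_database(query_code: int, database_codes: list) -> list:
    """Indices of database codes sorted by Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))

# Toy example: db[1] differs from the query in 1 bit, db[2] in 2 bits, db[0] in 4 bits.
query = 0b101101 << 42
db = [0b000000, query ^ 0b1, query ^ 0b11]
print(rank_database(query, db))  # -> [1, 2, 0]
```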
Background - Similarity-Preserving Hashing 3
Related Work – Two-Stage Framework • Stage 1: learn binary hash codes. • Stage 2: learn binary hashing functions. 4
Decomposing Similarity Matrix 5
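For reference, my reconstruction of the Stage-1 objective from Xia et al. (notation mine): S is the pairwise similarity matrix (S_ij = 1 for similar pairs, −1 for dissimilar pairs), and row i of H ∈ {−1, +1}^(n×q) is the q-bit code of image i.

```latex
\min_{H \in \{-1,+1\}^{n \times q}} \left\| S - \frac{1}{q}\, H H^{\top} \right\|_F^2
```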
Learning Hash Functions • The approximate hash codes from Stage 1 are used as ground truth for training the hash functions. 6
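A rough sketch of how Stage 2 could be realized (my simplification, assuming a small CNN with sigmoid outputs fitted to the Stage-1 codes by mean squared error; the paper's actual architecture and loss may differ):

```python
import torch
import torch.nn as nn

# Hypothetical Stage-2 setup: a small CNN with q sigmoid outputs is trained to
# reproduce the approximate hash codes obtained in Stage 1.
q = 48
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, q), nn.Sigmoid(),
)

images = torch.randn(8, 3, 32, 32)                   # dummy image batch
target_codes = torch.randint(0, 2, (8, q)).float()   # Stage-1 codes mapped to {0, 1}

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = nn.functional.mse_loss(model(images), target_codes)
loss.backward()
optimizer.step()
print(float(loss))
```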
Two-Stage Framework [Figure: pipeline of the two stages; annotation: "Analytical"] 7
Optimization Cost Function 8
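Assuming this slide refers to the Stage-1 objective ‖S − (1/q)HHᵀ‖²_F shown earlier, here is a toy way to minimize it by relaxing H to real values; the paper itself uses a coordinate-descent-style procedure, so this is only illustrative:

```python
import numpy as np

def approximate_hash_codes(S, q, steps=500, lr=0.01, seed=0):
    """Toy solver: relax H to [-1, 1], run gradient descent on
    ||S - (1/q) H H^T||_F^2, then binarize by sign. Illustrative only;
    the paper uses a coordinate-descent procedure instead."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.uniform(-1.0, 1.0, size=(n, q))
    for _ in range(steps):
        R = S - (H @ H.T) / q          # residual of the approximation
        grad = -(4.0 / q) * (R @ H)    # gradient of the squared Frobenius norm
        H = np.clip(H - lr * grad, -1.0, 1.0)
    return np.where(H >= 0, 1, -1)

# Tiny example: items 0 and 1 are similar, item 2 is dissimilar to both.
S = np.array([[ 1.0,  1.0, -1.0],
              [ 1.0,  1.0, -1.0],
              [-1.0, -1.0,  1.0]])
H = approximate_hash_codes(S, q=8)
print((H @ H.T) / 8)   # should roughly recover the structure of S
```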
Augmenting Output Layer with Binary Class Labels 9
Datasets for Experiments 10
Results on CIFAR-10 (48-bit hash codes) 11
Results on NUS-WIDE (48-bit hash codes) – as much as two orders of magnitude 12
Related Work – Metric-Learning-Based Hashing Liu et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016 13
Similarity-Preserving Loss Function • One loss term for similar pairs • One loss term for dissimilar pairs 14
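Written out, my reconstruction of this DSH-style pairwise loss (the regularizer discussed on the next slide is omitted; y = 0 for similar pairs, y = 1 for dissimilar pairs, m is a margin, b1 and b2 are the real-valued outputs for the two images):

```latex
L(\mathbf{b}_1, \mathbf{b}_2, y) = \tfrac{1}{2}(1-y)\,\lVert \mathbf{b}_1 - \mathbf{b}_2 \rVert_2^2
\;+\; \tfrac{1}{2}\, y \,\max\!\big(m - \lVert \mathbf{b}_1 - \mathbf{b}_2 \rVert_2^2,\; 0\big)
```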
Relaxation of the Loss Function • Typically, the output layer is replaced by a sigmoid activation layer to obtain binary-like outputs. • The DSH authors did not use a sigmoid layer because it slows down convergence. • Instead, a regularizer encourages the output values to lie in the vicinity of −1 or +1. • This is a weaker constraint than a [0, 1] sigmoid layer, but it shows better performance. 15
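A minimal PyTorch sketch of this relaxed loss including the regularizer, as I read the DSH formulation; the margin m and weight alpha are hyperparameters, and the values below are arbitrary.

```python
import torch

def dsh_style_loss(b1, b2, y, m=8.0, alpha=0.01):
    """b1, b2: real-valued outputs of shape (batch, q); y: 0 for similar
    pairs, 1 for dissimilar pairs. The last term replaces a sigmoid layer
    by pushing each output value toward +1 or -1."""
    d = (b1 - b2).pow(2).sum(dim=1)                       # squared Euclidean distance
    similar_term = 0.5 * (1 - y) * d
    dissimilar_term = 0.5 * y * torch.clamp(m - d, min=0)
    regularizer = (b1.abs() - 1).abs().sum(dim=1) + (b2.abs() - 1).abs().sum(dim=1)
    return (similar_term + dissimilar_term + alpha * regularizer).mean()

# Dummy usage with 48-dimensional outputs
b1, b2 = torch.randn(4, 48), torch.randn(4, 48)
y = torch.tensor([0.0, 1.0, 0.0, 1.0])
print(dsh_style_loss(b1, b2, y))
```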
Effect of Regularizer [Figure: histograms of output values (x-axis: output value, y-axis: # values in the output layer), with a sigmoid output layer vs. with the regularizer – the distribution becomes more peaked and more binary-like] 16
Effect of Regularizer Retrieval performance (mAP) of models under different parameter settings 17
Results on CIFAR-10 18
Background - Metric Learning • Siamese network: similarity of x1 and x2 (pairwise). • Triplet network: x is more similar to x+ than to x− (higher-order relationship). 19
Triplet-Loss-Based Network Lai et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015 20
Pairwise versus Triplet Ranking • Pairwise similarity: pairs are labeled similar or dissimilar. • Triplet ranking: query image I is more similar to image I+ (more-similar) than to image I− (less-similar). 21
Training Architecture [Figure: three weight-shared CNNs map I, I+ and I− to F(I), F(I+) and F(I−), which feed a triplet ranking loss] • A sigmoid activation layer restricts the output values to the range [0, 1]. 22
Triplet Ranking Loss • A hinge-style loss; the constant term 1 serves as the margin. 23
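My reconstruction of the relaxed triplet ranking loss (F(·) ∈ [0, 1]^q is the network output, squared Euclidean distance stands in for the Hamming distance, and the constant 1 is the margin mentioned above):

```latex
\ell\big(F(I), F(I^{+}), F(I^{-})\big) =
\max\!\Big(0,\; 1 - \big(\lVert F(I) - F(I^{-}) \rVert_2^2 - \lVert F(I) - F(I^{+}) \rVert_2^2\big)\Big)
```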
Training Architecture [Figure: the same three weight-shared CNNs, with the shared weights W_ij marked in each branch, feeding the triplet ranking loss] • Gradient-descent update of a shared weight: W_ij ← W_ij − α · ∂ℓ/∂W_ij 24
Weight Update • Analytically differentiated via the chain rule: ∂ℓ(F(I), F(I+), F(I−))/∂W_ij = ∂ℓ/∂F(I) · ∂F(I)/∂(⋯) · ∂(⋯)/∂W_ij + ∂ℓ/∂F(I+) · ∂F(I+)/∂(⋯) · ∂(⋯)/∂W_ij + ∂ℓ/∂F(I−) · ∂F(I−)/∂(⋯) · ∂(⋯)/∂W_ij • Updating W_ij requires values from all three networks: ∂ℓ(F(I), F(I+), F(I−))/∂W_ij = g(O, O+, O−) 25
Weight Update • Updating W_ij requires values from all three networks: ∂ℓ(F(I), F(I+), F(I−))/∂W_ij = g(O, O+, O−) • We need three forward propagations for each training triplet. • We need to keep the three weight-shared copies of the network in memory. • The weight update is computationally expensive compared to a typical single-network structure. • The number of possible triplets in a training set is very large. • A dedicated paper (*) reports a training-time improvement of about two orders of magnitude. (*) Zhuang et al., Fast Training of Triplet-based Deep Binary Embedding Networks, CVPR 2016. 26
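In practice the three weight-shared "copies" are one network applied three times per triplet; a rough sketch of a single training step (my own illustration, with a placeholder fully-connected backbone standing in for the paper's CNN):

```python
import torch
import torch.nn as nn

q = 48
# Placeholder network standing in for the shared CNN + encoding layers.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, q), nn.Sigmoid())
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

def triplet_ranking_loss(f, f_pos, f_neg, margin=1.0):
    d_pos = (f - f_pos).pow(2).sum(dim=1)    # distance to the similar image
    d_neg = (f - f_neg).pow(2).sum(dim=1)    # distance to the dissimilar image
    return torch.clamp(margin - (d_neg - d_pos), min=0).mean()

anchor, positive, negative = (torch.randn(16, 3, 32, 32) for _ in range(3))

# Three forward passes through the *same* parameters; the gradients from all
# three branches accumulate into that single set of weights.
loss = triplet_ranking_loss(net(anchor), net(positive), net(negative))
loss.backward()
optimizer.step()
print(float(loss))
```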
Divide-and-Encode Module – Enforces an independence property • Each hash bit is generated from a separate slice of the features, • so the output hash bits may be less redundant with each other. • No mathematical proof is given. 27
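A sketch of the divide-and-encode idea as I understand it: each output bit is computed from its own disjoint slice of the feature vector (layer sizes below are placeholders, not the paper's):

```python
import torch
import torch.nn as nn

class DivideAndEncode(nn.Module):
    """Split the feature vector into q slices; each slice is projected to a
    single sigmoid output, so every bit sees a disjoint group of features."""
    def __init__(self, feature_dim=480, num_bits=48):
        super().__init__()
        assert feature_dim % num_bits == 0
        self.slice_dim = feature_dim // num_bits
        self.projections = nn.ModuleList(
            [nn.Linear(self.slice_dim, 1) for _ in range(num_bits)]
        )

    def forward(self, features):
        slices = features.split(self.slice_dim, dim=1)   # q disjoint slices
        bits = [torch.sigmoid(proj(s)) for proj, s in zip(self.projections, slices)]
        return torch.cat(bits, dim=1)                    # (batch, q) values in (0, 1)

module = DivideAndEncode()
print(module(torch.randn(4, 480)).shape)   # torch.Size([4, 48])
```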
Overall Structure Input image → CNN → Divide-and-Encode → Quantization • At test time, a single trained network is used. 28
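At test time the real-valued outputs are thresholded into bits; a minimal sketch, assuming sigmoid outputs in [0, 1] and a threshold of 0.5 (the paper's exact quantization rule may differ):

```python
import torch

def quantize(outputs):
    """Threshold sigmoid outputs in [0, 1] into binary hash codes."""
    return (outputs > 0.5).to(torch.uint8)

print(quantize(torch.tensor([[0.1, 0.7, 0.4, 0.9]])))  # tensor([[0, 1, 0, 1]], dtype=torch.uint8)
```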
Results on SVHN 29
Results on CIFAR-10 30
Results on NUS-WIDE 31
Divide-and-Encode versus Fully-Connected-Encode 32
DSH (pairwise) versus DNNH (triplet) • In the DSH paper, the authors state that they implemented DNNH themselves. • They also report that the divide-and-encode structure largely degraded retrieval mAP on CIFAR-10. • Training inefficiencies of the triplet network may have resulted in its inferior performance. 33
Conclusion • While a triplet network can learn higher-order relationships between training samples, it suffers from training inefficiencies. • In practice, the pairwise metric-learning-based method shows better performance. • Efficient sampling strategies for triplets are needed. • Solving the training inefficiencies of triplet networks could be a key to better results. • An end-to-end architecture is preferred. 34
References & Acknowledgement • Xia et al., Supervised Hashing for Image Retrieval via Image Representation Learning, AAAI 2014 • Liu et al., Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016 • Lai et al., Simultaneous Feature Learning and Hash Coding with Deep Neural Networks, CVPR 2015 35
Quiz • 1. What is the advantage of a triplet network over a pairwise network? a) fast training speed b) low complexity of the architecture c) capturing higher-order relationships between training samples • 2. Why did the authors design the Divide-and-Encode module? a) to enhance the training speed b) to enforce the independence property between hash functions c) to lower the complexity of the problem 36