Siamese Neural Networks and Similarity Learning
What can ML do for us?
• Classification problem
[Figure: an input image passed through a neural network, predicted label: CAT]
What can ML do for us?
• Classification problem on ImageNet with thousands of categories
What can ML do for us?
• Performance on ImageNet – the size of the blobs indicates the number of parameters
A. Canziani et al. "An Analysis of Deep Neural Network Models for Practical Applications". arXiv:1605.07678, 2016
What can ML do for us?
• Regression problem: pose regression
[Figure: a pretrained network performs feature extraction, producing y ∈ R^2048; a fully connected layer performs linear regression to output p ∈ R^3 and q ∈ R^4]
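A minimal PyTorch sketch of this setup. All names and layer sizes are illustrative except the dimensions shown on the slide; p is assumed to be a 3D position and q a 4D quaternion orientation:

```python
import torch
import torch.nn as nn

class PoseRegressionHead(nn.Module):
    """Linear regression from a pretrained 2048-d feature y to a pose (p, q)."""
    def __init__(self, feature_dim: int = 2048):
        super().__init__()
        self.fc_position = nn.Linear(feature_dim, 3)     # p in R^3
        self.fc_orientation = nn.Linear(feature_dim, 4)  # q in R^4 (quaternion)

    def forward(self, y: torch.Tensor):
        return self.fc_position(y), self.fc_orientation(y)

head = PoseRegressionHead()
y = torch.randn(8, 2048)  # batch of features from a frozen, pretrained CNN
p, q = head(y)            # p: (8, 3), q: (8, 4)
```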
What can ML do for us?
• Regression problem: bounding box regression
D. Held et al. "Learning to Track at 100 FPS with Deep Regression Networks". ECCV 2016
What can ML do for us?
• A third type of problem
[Figure: two face images; image A → classification: person, face, female; image B → classification: person, face, male]
What can ML do for us?
• A third type of problem: given images A and B, is it the same person?
What can ML do for us?
• A third type of problem: similarity learning – comparison and ranking between images A and B
Similarity learning: when and why?
• Application: unlocking your iPhone with your face (training)
Similarity learning: when and why?
• Application: unlocking your iPhone with your face (testing: A → YES, B → NO)
• This can be solved as a classification problem
Similarity learning: when and why?
• Application: a face recognition system so students can enter the exam room without an ID check (training on Person 1, Person 2, Person 3)
Similarity learning: when and why?
• Application: a face recognition system so students can enter the exam room without an ID check
• What is the problem with this approach? Scalability – we would need to retrain the model every time a new student registers for the course
Similarity learning: when and why?
• Application: a face recognition system so students can enter the exam room without an ID check
• Can we train one model and use it every year?
Similarity learning: when and why?
• Learn a similarity function
[Figure: a matching pair (A, B) receives a high similarity score; a non-matching pair receives a low similarity score]
Similarity learning: when and why?
• Learn a similarity function: testing
• If d(A, B) > τ: not the same person
Similarity learning: when and why?
• Learn a similarity function
• If d(A, B) < τ: same person
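The verification decision thus reduces to a threshold test on the learned distance. A minimal sketch, assuming the embeddings are PyTorch tensors and τ is a threshold on the squared distance (all names here are illustrative):

```python
import torch

def same_person(f_a: torch.Tensor, f_b: torch.Tensor, tau: float) -> bool:
    """Return True if the squared embedding distance is below the threshold tau."""
    d = (f_a - f_b).pow(2).sum().item()  # d(A, B) = ||f(A) - f(B)||^2
    return d < tau
```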
Similarity learning
• How do we train a network to learn similarity?
Siamese Neural Networks
Similarity learning
• How do we train a network to learn similarity?
• A CNN followed by a fully connected layer maps a face image to a representation in 128 values
Taigman et al. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". CVPR 2014
Similarity learning
• How do we train a network to learn similarity?
• Image A → embedding f(A); image B → embedding f(B)
Taigman et al. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". CVPR 2014
Similarity learning
• Siamese network = shared weights
• Image A → f(A) and image B → f(B), both computed by the same network
Taigman et al. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". CVPR 2014
Similarity learning
• Siamese network = shared weights
• We use the same network to obtain an encoding of the image, f(A)
• Still to be done: compare the encodings
Taigman et al. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". CVPR 2014
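A minimal sketch of a Siamese encoder in PyTorch. "Shared weights" simply means one module is applied to both inputs; the architecture below is illustrative, not the DeepFace network:

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """A single CNN + FC encoder; calling it on both A and B realizes the weight sharing."""
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(64, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.cnn(x))

encoder = SiameseEncoder()
a, b = torch.randn(1, 3, 96, 96), torch.randn(1, 3, 96, 96)
f_a, f_b = encoder(a), encoder(b)  # the same parameters produce both encodings
```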
Similarity learning
• Distance function: d(A, B) = ||f(A) − f(B)||^2
• Training: learn the parameters of f such that
– if A and B depict the same person, d(A, B) is small
– if A and B depict different people, d(A, B) is large
Taigman et al. "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". CVPR 2014
Similarity learning
• Loss function for a positive pair:
– if A and B depict the same person, d(A, B) should be small
L(A, B) = ||f(A) − f(B)||^2
Similarity learning
• Loss function for a negative pair:
– if A and B depict different people, d(A, B) should be large
– better: use a hinge loss, L(A, B) = max(0, m^2 − ||f(A) − f(B)||^2)
– if two elements are already far apart, do not spend energy pushing them even further apart
Similarity learning
• Contrastive loss:
L(A, B) = y · ||f(A) − f(B)||^2 + (1 − y) · max(0, m^2 − ||f(A) − f(B)||^2)
– positive pair (y = 1): reduces the distance between the elements
– negative pair (y = 0): pushes the elements apart, up to the margin
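The contrastive loss translates directly into code. A sketch, assuming a batch of embedding pairs and a 0/1 label vector y (function and variable names are illustrative):

```python
import torch

def contrastive_loss(f_a: torch.Tensor, f_b: torch.Tensor,
                     y: torch.Tensor, m: float = 1.0) -> torch.Tensor:
    """y = 1 for positive pairs, y = 0 for negative pairs; m is the margin."""
    d2 = (f_a - f_b).pow(2).sum(dim=1)              # ||f(A) - f(B)||^2 per pair
    pos = y * d2                                    # pull positive pairs together
    neg = (1 - y) * torch.clamp(m**2 - d2, min=0)   # push negatives beyond the margin
    return (pos + neg).mean()
```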
Similarity learning
• Training the Siamese network
– since the two branches share weights, you can compute the weight updates for each branch independently and then average them
• This loss function allows us to learn to bring positive pairs together and push negative pairs apart
Triplet Loss
Triplet loss
• The triplet loss allows us to learn a ranking over anchor (A), positive (P), and negative (N) examples
We want: ||f(A) − f(P)||^2 < ||f(A) − f(N)||^2
Schroff et al. "FaceNet: A Unified Embedding for Face Recognition and Clustering". CVPR 2015
Triplet loss
• The triplet loss allows us to learn a ranking:
||f(A) − f(P)||^2 < ||f(A) − f(N)||^2
||f(A) − f(P)||^2 − ||f(A) − f(N)||^2 < 0
||f(A) − f(P)||^2 − ||f(A) − f(N)||^2 + m < 0, where m is the margin
• This gives the loss:
L(A, P, N) = max(0, ||f(A) − f(P)||^2 − ||f(A) − f(N)||^2 + m)
Schroff et al. "FaceNet: A Unified Embedding for Face Recognition and Clustering". CVPR 2015
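A direct transcription of this loss into PyTorch (a sketch; names are illustrative). Note that PyTorch's built-in nn.TripletMarginLoss is similar but uses the non-squared Euclidean distance by default:

```python
import torch

def triplet_loss(f_a: torch.Tensor, f_p: torch.Tensor,
                 f_n: torch.Tensor, m: float = 0.2) -> torch.Tensor:
    d_pos = (f_a - f_p).pow(2).sum(dim=1)  # ||f(A) - f(P)||^2
    d_neg = (f_a - f_n).pow(2).sum(dim=1)  # ||f(A) - f(N)||^2
    return torch.clamp(d_pos - d_neg + m, min=0).mean()
```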
Triplet loss
• Hard negative mining: training with the hard cases
L(A, P, N) = max(0, ||f(A) − f(P)||^2 − ||f(A) − f(N)||^2 + m)
– train for a few epochs
– choose the hard cases where d(A, P) ≈ d(A, N)
– train with those to refine the learned distance (a sketch of this selection step follows)
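A sketch of the selection step, assuming the anchor-positive and anchor-negative distances for a batch of candidate triplets have already been computed (the tolerance eps and all names are illustrative):

```python
import torch

def mine_hard_triplets(d_pos: torch.Tensor, d_neg: torch.Tensor,
                       eps: float = 0.1) -> torch.Tensor:
    """Return indices of triplets where d(A, P) is close to d(A, N), i.e. the hard cases."""
    hard = (d_pos - d_neg).abs() < eps
    return hard.nonzero(as_tuple=True)[0]
```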
Triplet loss
[Figure: effect of training on a triplet – the positive is pulled closer to the anchor, the negative is pushed further away]
Triplet loss: test time
• Just do nearest-neighbor search!
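At test time the trained encoder is frozen and recognition becomes a lookup: embed every enrolled face once, then assign a query to its nearest neighbor in embedding space. A minimal sketch (gallery size and dimensions are illustrative):

```python
import torch

def nearest_neighbor(query: torch.Tensor, gallery: torch.Tensor) -> int:
    """Return the index of the gallery embedding closest to the query."""
    d2 = (gallery - query).pow(2).sum(dim=1)  # squared distance to each entry
    return int(d2.argmin())

gallery = torch.randn(100, 128)  # embeddings of 100 enrolled identities
query = torch.randn(128)         # embedding of the face to identify
identity = nearest_neighbor(query, gallery)
```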
Triplet loss: challenges
• Random sampling does not work: the number of possible triplets is O(n^3), so the network would need to be trained for a very long time
• Even with hard negative mining, there is a risk of getting stuck in a local minimum
Several approaches to improve similarity learning
Improving similarity learning
• Loss: contrastive vs. triplet loss
• Sampling: choose the best triplets to train with; sample the space wisely = diversity of classes + hard cases
• Ensembles: why not use several networks, each trained with a subset of the triplets?
• Can we use a classification loss for similarity learning?