Learning Covariant Feature Detectors
Karel Lenc and Andrea Vedaldi, University of Oxford
GMDL Workshop, ECCV 2016
Local Feature Detection for Image Matching
[figure: corresponding local features x ↔ xʹ detected in two views]
The challenges of learning feature detectors [A. Zisserman]
Detector goals:
- selection: pick a small number of features → learn what to detect
- covariance: detect consistently under a viewpoint change → learn covariance
Learning detectors: prior work
Anchor points defined a priori:
- J. Sochman and J. Matas: Learning fast emulators of binary decision processes. IJCV 2009 (Haar cascade, Hessian detector).
- E. Rosten, R. Porter, and T. Drummond: Faster and better: a machine learning approach to corner detection. TPAMI 2010 (decision tree, Harris detector).
Detect points stable over time:
- Y. Verdie et al.: TILDE: A Temporally Invariant Learned DEtector. CVPR 2015.
Learning descriptors: extensive prior work
- S. Winder et al.: Picking the best DAISY. CVPR 2009
- K. Simonyan et al.: Learning local feature descriptors using convex optimization. TPAMI 2014
- S. Zagoruyko et al.: Learning to Compare Image Patches via Convolutional Neural Networks. CVPR 2015
- M. Paulin et al.: Local Convolutional Features with Unsupervised Training for Image Retrieval. ICCV 2015
- E. Simo-Serra et al.: Discriminative learning of deep convolutional feature point descriptors. ICCV 2015
- V. Balntas et al.: Learning local feature descriptors with triplets and shallow convolutional neural networks. BMVC 2016
- F. Radenovic et al.: CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. ECCV 2016
But note: detector ≠ descriptor.
See also the workshop "Local Features: State of the art, open problems and performance evaluation".
Detection by regression: bypassing the feature selection problem
Detection by regression: design a function Ψ that translates the centre of an image window x on top of the nearest feature.
Advantages:
- feature selection is performed implicitly by the function Ψ
- the function Ψ can be implemented by a CNN
- this simplifies the learning formulation
From regression to detection: a convolutional approach
- Apply the function Ψ at all locations, convolutionally
- Accumulate votes for the feature locations
- Perform non-maxima suppression in the vote map
[figure: input image → estimated translations → voted features]
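The three steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `psi` stands in for the learned regressor (any callable returning a (dy, dx) offset from the window centre to the nearest feature), and the window size, stride, and NMS radius are assumed values chosen for the example.

```python
import numpy as np

def detect_by_regression(psi, image, patch=28, stride=4, nms_radius=2):
    """Apply the regressor `psi` at every window location, accumulate
    votes for feature positions, then keep local maxima of the vote map."""
    H, W = image.shape
    votes = np.zeros((H, W))
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            dy, dx = psi(image[y:y + patch, x:x + patch])
            cy = int(round(y + patch / 2 + dy))
            cx = int(round(x + patch / 2 + dx))
            if 0 <= cy < H and 0 <= cx < W:
                votes[cy, cx] += 1
    # non-maxima suppression: keep cells that dominate their neighbourhood
    keypoints = []
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - nms_radius), min(H, y + nms_radius + 1)
            x0, x1 = max(0, x - nms_radius), min(W, x + nms_radius + 1)
            if votes[y, x] > 0 and votes[y, x] == votes[y0:y1, x0:x1].max():
                keypoints.append((y, x))
    return votes, keypoints
```

Because many overlapping windows vote for the same feature location, the vote map concentrates mass on repeatable features even though no explicit selection criterion was ever defined.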
Covariance constraint
The CNN normalises the patch by putting the feature in the centre.
Covariance constraint
Two translated patches become the same after normalisation: the normalised patches are related by the identity map, so the transformation between the patches composes as Tʹ ∘ I ∘ T⁻¹, which for translations reduces to the offset Tʹ − T.
Covariance constraint
We do not know a priori what the normalised patch looks like, only that the two normalised patches are related by the identity map; the offset between the patches is still Tʹ − T.
Covariance constraint
But we may know the transformation T_gt between the patches:
T_gt = Tʹ − T = Ψ(xʹ) − Ψ(x)
Covariance constraint
The constraint T_gt = Tʹ − T = Ψ(xʹ) − Ψ(x) is satisfied exactly when the normalised patches are identical: different normalised patches violate it (✗), identical ones satisfy it (✔).
Learning formulation
T_gt = Ψ(xʹ) − Ψ(x)  ⇒  minimise ‖T_gt − Ψ(xʹ) + Ψ(x)‖² ≅ 0
Using synthetic translations of real patches, the loss terms are:
xʹ_1 = T_gt,1 x_1 :  ‖T_gt,1 − Ψ(T_gt,1 x_1) + Ψ(x_1)‖²
xʹ_2 = T_gt,2 x_2 :  ‖T_gt,2 − Ψ(T_gt,2 x_2) + Ψ(x_2)‖²
xʹ_3 = T_gt,3 x_3 :  ‖T_gt,3 − Ψ(T_gt,3 x_3) + Ψ(x_3)‖²
…
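The loss can be made concrete with a toy model (an assumption for illustration, not the paper's network): represent a patch by the offset of its feature from the patch centre, and let a perfect regressor Ψ simply read that offset back. Translating the patch content by T_gt then shifts the regressed offset by T_gt, and the loss vanishes.

```python
import numpy as np

def psi(feature_offset):
    """Idealised regressor: returns the feature's offset from the centre."""
    return np.asarray(feature_offset, dtype=float)

def covariance_loss(T_gt, x, x_prime):
    """|| T_gt - Psi(x') + Psi(x) ||^2 -- zero for a covariant regressor."""
    return float(np.sum((np.asarray(T_gt, dtype=float) - psi(x_prime) + psi(x)) ** 2))

x = np.array([3.0, -1.0])    # feature offset in patch x
T_gt = np.array([2.0, 5.0])  # synthetic ground-truth translation
x_prime = x + T_gt           # the same feature seen in the translated patch

assert covariance_loss(T_gt, x, x_prime) == 0.0   # covariance satisfied
```

A regressor that ignores the translation (returning Ψ(xʹ) = Ψ(x)) pays the full penalty ‖T_gt‖², so minimising this loss over many synthetic pairs forces covariance.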
Generalisation: affine covariant detectors
Features are oriented ellipses and transformations are affinities.
With Ψ(x) = (A, T) and Ψ(xʹ) = (Aʹ, Tʹ), the covariance constraint becomes
G_gt = Ψ(xʹ) ∘ Ψ(x)⁻¹
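In the affine case the constraint is a composition of maps rather than a difference of vectors. A small sketch, representing each frame (A, T) as a 3×3 homogeneous matrix (the specific matrices below are made-up values for illustration):

```python
import numpy as np

def frame(A, T):
    """Homogeneous 3x3 matrix for the affine frame (A, T)."""
    M = np.eye(3)
    M[:2, :2] = A
    M[:2, 2] = T
    return M

# Ground-truth affinity relating the two images (illustrative values):
G_gt = frame(np.array([[1.2, 0.1], [0.0, 0.9]]), np.array([4.0, -2.0]))

# A covariant detector maps the frame found in x to the frame found in x':
Psi_x = frame(np.array([[2.0, 0.0], [0.3, 1.5]]), np.array([10.0, 7.0]))
Psi_xp = G_gt @ Psi_x  # the corresponding frame in the second image

# Covariance constraint: G_gt = Psi(x') o Psi(x)^-1
residual = Psi_xp @ np.linalg.inv(Psi_x) - G_gt
assert np.allclose(residual, 0.0)
```

The translation-only case is recovered when A = Aʹ = identity, where composition with an inverse collapses to the offset Tʹ − T.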
A very general framework
By choosing different transformation groups, we obtain different detector types:
Covariance group | Residual group   | Feature         | Example
Translation      | −                | Point           | FAST
Similarity       | −                | Pointed disk    | DoG + orientation detection
Affine           | −                | Pointed ellipse | Harris Affine + orientation detection
Rotation         | −                | Line bundle     | Orientation detection
Euclidean        | Rotation         | Point           | Harris
Translation      | 1D translation   | Line            | Edge detector
Similarity       | Rotation         | Disk            | SIFT
Affine           | Rotation         | Ellipse         | Harris Affine
Residual transformations
DoG is covariant with scale, rotation, and translation, but fixes only scale and translation.
Solution: replace the identity with a residual transformation Q supplying the missing information. With Ψ(x) = (s, T) and Ψ(xʹ) = (sʹ, Tʹ):
G_gt = (s_gt, R_gt, T_gt) = Ψ(xʹ) ∘ Q ∘ Ψ(x)⁻¹
See the paper for a full theoretical characterisation.
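The residual can be illustrated numerically. In this sketch (assumed values, similarity frames as 3×3 matrices) the detector fixes only scale and translation, so when the ground-truth transformation contains a rotation, that rotation reappears as the residual Q = Ψ(xʹ)⁻¹ ∘ G_gt ∘ Ψ(x):

```python
import numpy as np

def sim_frame(s, T):
    """Similarity frame with no rotation fixed: scale s, centre T."""
    M = np.eye(3)
    M[:2, :2] = s * np.eye(2)
    M[:2, 2] = T
    return M

def rot(theta):
    """Pure rotation as a homogeneous 3x3 matrix."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(3)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

theta = 0.7  # rotation the detector cannot fix
G_gt = sim_frame(2.0, np.array([3.0, 1.0])) @ rot(theta)

Psi_x = sim_frame(1.5, np.array([5.0, -2.0]))  # frame (s, T) in image 1
Psi_xp = G_gt @ Psi_x @ rot(-theta)            # rotation-free frame in image 2

# The residual absorbs exactly the missing rotation:
Q = np.linalg.inv(Psi_xp) @ G_gt @ Psi_x
assert np.allclose(Q, rot(theta))
```

Note that Ψ(xʹ) is still rotation-free (its linear part is a pure scaling), which is all a DoG-style detector can report; the rotation lives entirely in Q.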
A very general framework
By choosing different transformation groups, we obtain different detector types:
Covariance group | Residual group   | Feature         | Example
Translation      | −                | Point           | FAST
Similarity       | −                | Pointed disk    | DoG + orientation detection
Affine           | −                | Pointed ellipse | Harris Affine + orientation detection
Rotation         | −                | Line bundle     | Orientation detection
Euclidean        | Rotation         | Point           | Harris
Translation      | 1D translation   | Line            | Edge detector
Similarity       | Rotation         | Disk            | DoG
Affine           | Rotation         | Ellipse         | Harris Affine
An example CNN architecture
DetNet: a simple Siamese CNN architecture with six layers to estimate T.
- Training uses synthetic patch pairs generated from the ILSVRC12 training set
- A Laplace operator samples salient image patches of size 28 × 28
- Loss: ‖T_gt − Ψ(xʹ) + Ψ(x)‖²
Each branch (shared weights), from input x to output Ψ(x):
Conv 5×5×40 → Pool ↓2 → Conv 5×5×100 → Pool ↓2 → Conv 4×4×300 → Conv 1×1×500 → Conv 1×1×500 → Conv 1×1×2
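A quick sanity check on the layer stack: walking the spatial size of a 28 × 28 patch through the listed layers (assuming "valid" convolutions and stride-2 pooling, which is an assumption of this sketch) shows the branch collapses to a single spatial cell with 2 channels, i.e. one regressed translation vector.

```python
def out_size(n, k, stride=1):
    """Spatial size after a 'valid' convolution or pooling of kernel k."""
    return (n - k) // stride + 1

n = 28                 # input patch side
n = out_size(n, 5)     # Conv 5x5x40   -> 24
n = out_size(n, 2, 2)  # Pool /2       -> 12
n = out_size(n, 5)     # Conv 5x5x100  -> 8
n = out_size(n, 2, 2)  # Pool /2       -> 4
n = out_size(n, 4)     # Conv 4x4x300  -> 1
# the three 1x1 convolutions keep the size at 1; the last has 2 channels,
# so each 28x28 patch yields exactly one 2-vector: the translation.
assert n == 1
```

Because every layer is convolutional, the same branch applied to a full image produces a dense map of translation estimates, which is what the voting step consumes.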
DetNet: easy test [figure: example detections]
DetNet: hard test [figure: example detections]
[figures: example detections, DetNet compared with DoG and Harris]
DetNet results: repeatability on DTU [plots]
RotNet architecture
The final L2 normalisation places the regressed 2-vector on the unit circle, so the output can be read as (cos θ_gt, sin θ_gt).
- Loss: ‖R_gt − Ψ(xʹ) + Ψ(x)‖²
Each branch (shared weights), from input x:
Conv 5×5×40 → Pool ↓2 → Conv 5×5×100 → Pool ↓2 → Conv 4×4×300 → Conv 1×1×500 → Conv 1×1×500 → Conv 1×1×2 → L2 Norm
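The L2-normalisation head is simple enough to sketch directly; this is an illustration of the idea, not the trained network. The raw 2-channel output is projected onto the unit circle, and the angle is recovered with `arctan2`:

```python
import numpy as np

def angle_head(raw):
    """L2-normalise the raw 2-vector so it lies on the unit circle and
    can be read as (cos theta, sin theta)."""
    v = np.asarray(raw, dtype=float)
    return v / np.linalg.norm(v)

out = angle_head([3.0, 4.0])        # a point on the unit circle: [0.6, 0.8]
theta = np.arctan2(out[1], out[0])  # recover the regressed angle
```

Regressing (cos θ, sin θ) rather than θ itself avoids the 2π wrap-around discontinuity in the loss.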
RotNet: easy test [figure]
RotNet: hard test [figure]
Summary
Detection by regression
- bypasses the feature selection problem
- formulated as a learnable convolutional regressor
Covariance constraint
- covariance is all you need to learn features
- naturally integrates with detection by regression
- a very general formulation encompassing all detector types
- bonus: a nice theory of features
Results
- very good detectors can be learned
- however, not yet dramatically better than existing ones
This is just the beginning
- a whole new approach
- many details still need ironing out
DDet models available at: https://github.com/lenck/ddet
DetNet results: repeatability on VGG-Affine [plots]
RotNet example patch pairs: angle error and matching score on VGG-Affine