Spatial Coordinate Coding To Reduce Histogram Representations, Dominant Angle And Colour Pyramid Match
P. Koniusz, K. Mikolajczyk
CVSSP, University of Surrey, UK
{P.Koniusz, K.Mikolajczyk}@surrey.ac.uk
September 11, 2011
Introduction
Recognition approach (Bag of Words)
1. Feature extraction
2. Visual vocabulary
3. Mid-level features
4. Classification
[Figure: recognition pipeline. Detect key-points, compute descriptors, cluster descriptors into codewords, build histograms, average or max pooling, spatial pyramid match over levels L0, L1, L2, kernel + SVM or KDA.]
Spatial Pyramid Match [S. Lazebnik, 2006] is at the heart of modern object category recognition and exploits spatial bias in images
Mid-level feature representations result from mapping low-level features (e.g. descriptors) to a given vocabulary space
Increasing the number of quantisation levels results in very large histogram vectors of 200K or more elements
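To make the vector growth concrete, here is a minimal sketch of average pooling over a spatial pyramid. It assumes the common layout in which level l splits the image into 2^l x 2^l cells and coordinates are normalised to [0, 1); the slides do not state the exact grid, and the function name `spatial_pyramid_pool` is hypothetical.

```python
import numpy as np

def spatial_pyramid_pool(H, xy, levels=3):
    """Average-pool mid-level features H (N x K) over a spatial pyramid.

    xy holds key-point coordinates normalised to [0, 1) (N x 2).
    Assumption: level l splits the image into 2**l x 2**l cells.
    """
    N, K = H.shape
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        # Cell index of every key-point along x and y at this level.
        idx = np.minimum((xy * cells).astype(int), cells - 1)
        for cx in range(cells):
            for cy in range(cells):
                mask = (idx[:, 0] == cx) & (idx[:, 1] == cy)
                # Empty cells contribute a zero histogram.
                pooled.append(H[mask].mean(axis=0) if mask.any() else np.zeros(K))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
H = rng.random((500, 40))
H /= H.sum(axis=1, keepdims=True)   # rows behave like assignment probabilities
xy = rng.random((500, 2))
v = spatial_pyramid_pool(H, xy, levels=3)
print(v.shape)  # (840,) = 40 * (1 + 4 + 16)
```

Under this assumed layout the pooled vector is 21 times longer than a single histogram for three levels, so the size scales directly with the vocabulary size K.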
Introduction
Aim
To propose a new joint appearance and spatial representation
To reduce resulting vector sizes and therefore both computational and memory requirements
To investigate which of the pooling modalities (spatial, dominant angle, scale, colour bias) benefit from multiple levels of quantisation
Bias in images (Spatial Pyramid Match)
[Figure: image split into pyramid regions, each labelled with the objects it contains (sky, tree, ship, grass).]
Coordinate set $X_s$ of an object $s$ introduces spatial bias: $p(s \mid \vec{x}) \ge p(s)$ for $\vec{x} \in X_s$
Introduction
Bias in images (Dominant Edge Orientation)
[Figure: image with regions labelled fence and trunk.]
Trunks $t$ remain largely vertical, with orientations in a set $\Theta_t$: $p(t \mid \theta) \ge p(t)$ if $\theta \in \Theta_t$
Bias in images (Dominant Colours)
[Figure: image with regions labelled sky and trees.]
Foliage $f$ is of a limited colour set $C_f$, thus $p(f \mid \vec{c}) \ge p(f)$ if $\vec{c} \in C_f$
Spatial Coordinate Coding for Soft Assignment
Descriptor to mid-level features mapping: $\vec{h}_n = f(\vec{x}_n),\ n = 1, \dots, N$
$\vec{x}_n \in X$: image descriptors
$\vec{h}_n$: mid-level features
Mid-level features are Component Membership Probabilities of a GMM:
$$h_{nk} = p(\vec{m}_k \mid \vec{x}_n) = \frac{g(\vec{x}_n; \vec{m}_k, \sigma)}{\sum_{k'=1}^{K} g(\vec{x}_n; \vec{m}_{k'}, \sigma)}$$
$\vec{m}_k \in M$: visual words
$\sigma$: model parameter
Average (or maximum) pooling is performed on the columns of matrix $H_{N \times K}$
We assume independence of visual appearance and spatial bias and code both modalities as a joint distribution (key idea):
$$g_\alpha(n, k) = \underbrace{g\left[(1-\alpha)\,\vec{x}_n;\ (1-\alpha)\,\vec{m}_k,\ \sigma\right]}_{\text{visual term}} \cdot \underbrace{g\left(\alpha\,\vec{x}'_n;\ \alpha\,\vec{m}'_k,\ \sigma'\right)}_{\text{spatial term}}$$
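The soft-assignment mapping and its spatially enhanced variant can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation: it assumes g is an isotropic Gaussian, reads the joint term as a visual Gaussian on (1 - alpha)-scaled descriptors times a spatial Gaussian on alpha-scaled coordinates, and uses hypothetical names (`soft_assign`, `scc_soft_assign`, `sigma_s` for the spatial bandwidth).

```python
import numpy as np

def _sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A (N x D) and B (K x D)."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def _normalise(log_g):
    """Turn unnormalised log-Gaussians (N x K) into row-wise membership probabilities."""
    log_g = log_g - log_g.max(axis=1, keepdims=True)  # numerical stability
    g = np.exp(log_g)
    return g / g.sum(axis=1, keepdims=True)

def soft_assign(X, M, sigma):
    """h_nk = g(x_n; m_k, sigma) / sum_k' g(x_n; m_k', sigma), isotropic Gaussian g."""
    return _normalise(-_sq_dists(X, M) / (2.0 * sigma ** 2))

def scc_soft_assign(X, Xs, M, Ms, alpha, sigma, sigma_s):
    """Joint visual and spatial assignment (assumed reading of the key idea).

    Xs are key-point coordinates, Ms the spatial parts of the visual words.
    """
    vis = _sq_dists((1 - alpha) * X, (1 - alpha) * M) / (2.0 * sigma ** 2)
    spa = _sq_dists(alpha * Xs, alpha * Ms) / (2.0 * sigma_s ** 2)
    return _normalise(-(vis + spa))

rng = np.random.default_rng(0)
X, Xs = rng.normal(size=(200, 16)), rng.random((200, 2))
M, Ms = rng.normal(size=(10, 16)), rng.random((10, 2))
H = scc_soft_assign(X, Xs, M, Ms, alpha=0.3, sigma=1.0, sigma_s=0.2)

# Pooling over the columns of H gives a single K-dimensional image signature.
avg_hist, max_hist = H.mean(axis=0), H.max(axis=0)
```

With alpha = 0 the spatial term vanishes and the sketch falls back to plain soft assignment; the pooled signature keeps length K regardless of alpha.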
Spatial Coordinate Coding for Sparse Coding
Mid-level features by optimising:
$$\arg\min_{\vec{h}_n} \left\| \vec{x}_n - M \vec{h}_n \right\|^2 + \beta\,|\vec{h}_n|$$
$M_{D \times K}$: visual vocabulary with $K$ atoms of length $D$
Spatial descriptor $\vec{x}'_n$ and dictionary $M'$ terms added to the problem (key idea):
$$\arg\min_{\vec{h}_n} \underbrace{(1-\alpha) \left\| \vec{x}_n - M \vec{h}_n \right\|^2}_{\text{visual term}} + \underbrace{\alpha \left\| \vec{x}'_n - M' \vec{h}_n \right\|^2}_{\text{spatial term}} + \beta\,|\vec{h}_n| \qquad (1)$$
Soft Assignment and Sparse Coding can be spatially enhanced by just concatenating image descriptors with the spatial information $\vec{x}'_n$, i.e. (key outcome):
$$\vec{x}^{\,aug}_n = \Big[ \underbrace{\sqrt{1-\alpha}\;\vec{x}_n^{T}}_{\text{visual term}},\ \underbrace{\sqrt{\alpha}\;(\vec{x}'_n)^{T}}_{\text{spatial term}} \Big]^{T}$$
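A quick numerical check shows why concatenation is enough: the weighted two-term reconstruction error in (1) equals a single reconstruction error on the augmented descriptor. The sketch assumes the dictionary is augmented with the same square-root weights as the descriptor (the slide only writes out the descriptor); all sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Ds, K, alpha = 16, 2, 10, 0.3

x, xs = rng.normal(size=D), rng.random(Ds)              # descriptor and its coordinates
M, Ms = rng.normal(size=(D, K)), rng.random((Ds, K))    # visual and spatial dictionaries
h = rng.normal(size=K)                                  # any candidate code

# Two-term data fidelity from (1), without the sparsity penalty.
two_term = (1 - alpha) * np.sum((x - M @ h) ** 2) + alpha * np.sum((xs - Ms @ h) ** 2)

# Same quantity written as one term on augmented vectors.
# Assumption: the dictionary is stacked with the same sqrt weights.
x_aug = np.concatenate([np.sqrt(1 - alpha) * x, np.sqrt(alpha) * xs])
M_aug = np.vstack([np.sqrt(1 - alpha) * M, np.sqrt(alpha) * Ms])
one_term = np.sum((x_aug - M_aug @ h) ** 2)

print(np.isclose(two_term, one_term))  # True
```

Because the identity holds for every code h, an unmodified sparse coding solver run on the augmented descriptors and dictionary minimises objective (1).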
Experiments on Spatial Information (VOC 2010)
Spatial Coordinate Coding
Pascal 2010 [M. Everingham, 2010] Action Classification set: 9 classes, 301 training, 307 validation, and 613 testing bounding boxes
Soft Assignment (SA) and Spatial Coordinate Coding (SCC) with RBF $\chi^2$ kernels used
Results reported as Mean Average Precision:
SA + SPM (3 levels), validation, 1 kernel: 49.8
SA + SCC, validation, 1 kernel: 51.6
SA + SCC, test, multiple kernels: 62.15
Spatial Coordinate Coding outperforms Spatial Pyramid Match
Experiments on Spatial Information (Flower 17)
Spatial Coordinate Coding
Flower 17 [M. E. Nilsback, 2008]: 17 classes, 3 splits of data, each consisting of 680 training, 340 validation, and 340 testing images
Soft Assignment, $\chi^2$ kernel: SCC 91.16, SPM (3 levels) 89.3
Sparse Coding, linear kernel: SCC 88.43, SPM (4 levels) 88.86
Spatial Coordinate Coding is a weaker performer if Sparse Coding and a linear classifier are used
Pyramid Match elevates histogram data to a higher-dimensional representation (vital for a linear classifier)
Experiments on Dominant Angle Pyramid Match
Dominant Angle Pooling
Pascal 2007 consists of 20 object categories with high variability in intra-class appearance, rotation, and spatial position
Dominant Angle (DA) on descriptor level (variant, invariant, and descriptor augmentation cases):
DA invariant: 46.00
DA variant: 50.23
DA coordinate appended: 50.24
Dominant Angle is important in classification
Dominant Angle (DA) with multiple quantisation levels (DAPM) and Spatial Pyramid Match (SPM):
SPM (3 levels): 54.3
DAPM (5 levels): 53.40
DAPM + SPM: 56.3
Best results achieved when using both Spatial (3 levels) and Dominant Angle Pyramid Match (5 levels)
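Pooling over dominant angle at several quantisation levels can be sketched the same way as spatial pooling, with angle bins in place of image cells. The bin layout here is an assumption (level l uses 2^l equal bins over [0, 2*pi)); the slides give only the number of levels, and `dominant_angle_pyramid_pool` is a hypothetical name.

```python
import numpy as np

def dominant_angle_pyramid_pool(H, theta, levels=5):
    """Average-pool mid-level features H (N x K) over dominant-angle bins.

    theta holds each key-point's dominant angle in radians, in [0, 2*pi).
    Assumption: level l uses 2**l equal-width angle bins.
    """
    N, K = H.shape
    pooled = []
    for level in range(levels):
        bins = 2 ** level
        # Bin index of every key-point at this quantisation level.
        idx = np.minimum((theta / (2 * np.pi) * bins).astype(int), bins - 1)
        for b in range(bins):
            mask = idx == b
            # Empty bins contribute a zero histogram.
            pooled.append(H[mask].mean(axis=0) if mask.any() else np.zeros(K))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
H = rng.random((300, 8))
theta = rng.random(300) * 2 * np.pi
v = dominant_angle_pyramid_pool(H, theta, levels=5)
print(v.shape)  # (248,) = 8 * (1 + 2 + 4 + 8 + 16)
```

Each level yields its own block of histograms, so a kernel can be built per level and the classifier can weight the levels, which is in the spirit of combining DAPM with SPM.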
Experiments on Colour Pyramid Match
Colour Component Pooling
Flower 17 set used for further evaluation as it greatly benefits from colour information
Soft Assignment (SA) and Spatial Coordinate Coding (SCC) with RBF $\chi^2$ kernels used
Results reported as Average Accuracy:
SCC: 86.4%
SCC + Colour Pyramid Match: 87.4%
SCC + Colour Pyramid Match + Opponent SIFT: 91.4%
MKL based approach [F. Yan, 2010]: 86.7%
Conclusions
Spatial Coordinate Coding outperforms SPM (3 levels) (e.g. by 1.8% on Flower 17)
It reduces histogram sizes from e.g. 56K to 4K, bypassing Spatial Pyramid Match
Spatial bias does not benefit much from multi-level quantisation
Dominant Angle benefits from multi-level quantisation (DAPM)
DAPM + SPM results in a 2.0% improvement on VOC 2007
Colour Pyramid Match further improves Spatial Coordinate Coding by 1.0% on Flower 17
Letting the classifier decide the right level of quantisation for multiple modalities leads to performance improvement
Thank You
References
S. Lazebnik et al. (2006). Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR.
J. C. Van Gemert et al. (2010). Visual Word Ambiguity. PAMI.
J. Yang et al. (2009). Linear spatial pyramid matching using sparse coding for image classification. CVPR.
M. E. Nilsback et al. (2008). Automated Flower Classification over a Large Number of Classes. ICCV.
M. Everingham et al. (2010). The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. ICCV.
F. Yan et al. (2010). Lp Norm Multiple Kernel Fisher Discriminant Analysis for Object and Image Categorisation. CVPR.