Extended Bag-of-Words Formalism for Image Classification


  1. Extended Bag-of-Words Formalism for Image Classification
      Sandra Avila 1,2 (Cotutelle PhD Candidate), Arnaldo de A. Araújo 1 (Advisor), Matthieu Cord 2 (Advisor), Nicolas Thome 2 (Co-Advisor), Eduardo Valle 3 (Collaborator)
      1 Federal University of Minas Gerais, NPDI Lab – UFMG, Belo Horizonte, Brazil
      2 Pierre and Marie Curie University, UPMC-Sorbonne Universities, LIP6, Paris, France
      3 State University of Campinas, RECOD Lab, FEEC – UNICAMP, Campinas, Brazil
      Sandra Avila (UFMG/UPMC) sandra@dcc.ufmg.br June 2013 1 / 56

  2. Image Classification: Why do we care?

  4. A huge number of images is available

  5. Why is image classification a hard problem?

  6. Many classes and concepts

  7. Much diversity in the data: viewpoint changes, illumination variations, occlusion, background clutter, inter-class similarity, intra-class diversity

  8. How do we classify images?

  10. Problem Statement: Given an image dataset, how should its visual content be represented for a classification task?

  12. [Figure: example image classes: night scenes, sunset scenes, young people, old people]

  13. Bag-of-Visual-Words (BoW) [Sivic and Zisserman, 2003; Csurka et al., 2004]. Slide credit: Ken Chatfield

  14. Low-level Visual Feature Extraction

      Local feature extraction: each row of the matrix holds the feature vector of one patch, from patch 1 (first row) to patch M (last row).

      $$\begin{bmatrix} l_{1,1} & \cdots & l_{1,N} \\ l_{2,1} & \cdots & l_{2,N} \\ \vdots & & \vdots \\ l_{M,1} & \cdots & l_{M,N} \end{bmatrix}$$

      • Patch detection: interest points, dense sampling, ...
      • Feature extraction: SIFT [Lowe, 2004], SURF [Bay et al., 2008], ...

  15. Visual Codebook: Coding step
      • Visual codebook learning: random, unsupervised (e.g., k-means, GMM), supervised [Perronnin et al., 2006; Goh et al., 2012], ...
      • Coding: hard-assignment, soft-assignment [van Gemert et al., 2008, 2010], sparse coding [Yang et al., 2009; Boureau et al., 2010], ...
      • Feature coding based on the vector difference: VLAD [Jégou et al., 2010], SVC [Zhou et al., 2010], VLAT [Picard et al., 2011], ...

  16. Pooling step
      • Pooling: sum/average-pooling, max-pooling [Yang et al., 2009], ...
      • Spatial pooling: spatial pyramid matching [Lazebnik et al., 2006], [Jia et al., 2012], ...
      [Figure: Spatial Pyramid Matching]

  17. Other Approaches
      • Biologically-inspired Models [Fukushima and Miyake, 1982; LeCun et al., 1990; Riesenhuber and Poggio, 1999; Serre et al., 2007; Thériault et al., 2012]
      • Deep Learning Models [Hinton and Salakhutdinov, 2006; Ranzato et al., 2007; Bengio, 2009]

  18. BossaNova Representation

  19. Coding & Pooling: Matrix Representation

      $$H = \begin{bmatrix} \alpha_{1,1} & \cdots & \alpha_{1,j} & \cdots & \alpha_{1,N} \\ \vdots & & \vdots & & \vdots \\ \alpha_{m,1} & \cdots & \alpha_{m,j} & \cdots & \alpha_{m,N} \\ \vdots & & \vdots & & \vdots \\ \alpha_{M,1} & \cdots & \alpha_{M,j} & \cdots & \alpha_{M,N} \end{bmatrix}$$

      Rows of H are indexed by the codewords c_1, ..., c_m, ..., c_M; columns are indexed by the descriptors x_1, ..., x_j, ..., x_N. The coding function f acts down each column, and the pooling function g acts along each row.

      Notations:
      • $X = \{x_j\},\ j \in \{1, \ldots, N\}$: set of local descriptors (e.g., SIFT)
      • $C = \{c_m\},\ m \in \{1, \ldots, M\}$: visual codebook

      Coding: $x_j \rightarrow f(x_j) = \{\alpha_{m,j}\}$, with $\alpha_{m,j} = 1$ iff $m = \arg\min_{k \in \{1, \ldots, M\}} \|x_j - c_k\|_2^2$

      Pooling: $g(\{\alpha_j\}) = z : \forall m,\ z_m = \sum_{j=1}^{N} \alpha_{m,j}$

      BoW representation: $z = [z_1, z_2, \cdots, z_M]^T$
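
As a rough illustration of the coding and pooling functions f and g on this slide, the following sketch builds a BoW signature with hard assignment and sum pooling. The function names and the toy 2-D data are assumptions made for the example and do not come from the slides.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def hard_assign(x, codebook):
    """Coding f: one column of H, with a single 1 at the nearest codeword."""
    distances = [squared_distance(x, c) for c in codebook]
    nearest = distances.index(min(distances))  # ties go to the lowest index
    return [1 if m == nearest else 0 for m in range(len(codebook))]


def bow_signature(descriptors, codebook):
    """Pooling g: sum each row of H, giving z = [z_1, ..., z_M]."""
    columns = [hard_assign(x, codebook) for x in descriptors]
    return [sum(col[m] for col in columns) for m in range(len(codebook))]


# Toy 2-D data (illustrative values only; real descriptors would be e.g. SIFT).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [3.5, 4.2]]
z = bow_signature(descriptors, codebook)
print(z)
```

With hard assignment every descriptor contributes to exactly one codeword, so the entries of z sum to the number of descriptors.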

  23. Early Ideas
      • We pointed out the weakness in the standard pooling operation used in the BoW signature generation.
      • Instead of averaging all the values from one row in the H matrix, we proposed to describe their distribution.
      • The BOSSA representation (Bag Of Statistical Sampling Analysis) introduces our density function-based pooling strategy.

  26. Our Pooling: Illustration
      [Figure: BoW pooling compared with our pooling]

  27. Our Pooling Formalism

      $$g : \mathbb{R}^N \longrightarrow \mathbb{R}^B, \qquad \alpha_m \longrightarrow g(\alpha_m) = z_m$$

      $$z_{m,b} = \operatorname{card}\left( x_j \;\middle|\; \alpha_{m,j} \in \left[ \frac{b}{B} ; \frac{b+1}{B} \right] \right), \qquad \frac{b}{B} \geq \alpha_m^{\min} \ \text{and} \ \frac{b+1}{B} \leq \alpha_m^{\max}$$

      B denotes the number of bins of each histogram $z_m$, and $[\alpha_m^{\min} ; \alpha_m^{\max}]$ limits the range of distances.
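
The histogram pooling above can be sketched for a single codeword as follows. This is one possible reading, not the authors' implementation: it assumes that the range between the minimum and maximum distance is split into B equal-width bins and that distances outside that range are ignored. The function name and sample values are hypothetical.

```python
def histogram_pool(alpha_m, num_bins, alpha_min, alpha_max):
    """Count the distances of one row of H (one codeword) per bin.

    Assumption: [alpha_min, alpha_max] is divided into num_bins
    equal-width bins; values outside the range are skipped.
    """
    histogram = [0] * num_bins
    width = (alpha_max - alpha_min) / num_bins
    for a in alpha_m:
        if a < alpha_min or a > alpha_max:
            continue  # outside the distance range kept for this codeword
        b = int((a - alpha_min) / width)
        b = min(b, num_bins - 1)  # a == alpha_max falls in the last bin
        histogram[b] += 1
    return histogram


# Distances from one codeword to six descriptors (illustrative values).
row = [0.1, 0.2, 0.45, 0.6, 0.95, 1.5]
z_m = histogram_pool(row, num_bins=2, alpha_min=0.0, alpha_max=1.0)
print(z_m)
```

Unlike sum pooling, which would reduce this row to a single number, the histogram keeps a coarse picture of how the distances are distributed around the codeword.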

  28. BossaNova Representation

  29. BossaNova Representation

      Entries $\alpha_{m,j}$ of the coding matrix (rows c_1, ..., c_M; columns x_1, ..., x_N):

      $$\alpha_{m,j} = \frac{\exp\left(-\beta_m\, d^2(x_j, c_m)\right)}{\sum_{m'=1}^{K} \exp\left(-\beta_m\, d^2(x_j, c_{m'})\right)}$$
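
A minimal sketch of the normalized exponential weighting in this formula is given below. It makes two simplifying assumptions that the slide does not state: a single shared `beta` replaces the per-codeword β_m, and the list passed in already contains the K codewords that enter the sum. All names and values are illustrative.

```python
import math


def squared_distance(x, c):
    """Squared Euclidean distance d^2(x, c)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def soft_assign(x, codewords, beta):
    """Return the weights alpha_{m,j} of one descriptor x.

    Assumptions: one shared beta instead of a per-codeword beta_m, and
    `codewords` holds exactly the K codewords used in the denominator.
    """
    weights = [math.exp(-beta * squared_distance(x, c)) for c in codewords]
    total = sum(weights)
    return [w / total for w in weights]


codewords = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
alpha = soft_assign([0.9, 1.2], codewords, beta=1.0)
print(alpha)
```

The weights are non-negative and sum to one, and a larger `beta` concentrates more of the weight on the nearest codeword.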

  32. BossaNova Representation

      $$z = \begin{bmatrix} z_1,\ s\,t_1 \\ \vdots \\ z_m,\ s\,t_m \\ \vdots \\ z_M,\ s\,t_M \end{bmatrix}$$
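
The stacked vector on this slide could be assembled as in the sketch below. The slide does not define s or t_m, so the code treats s as a scalar weight and t_m as one scalar per codeword; both readings, as well as the function name and the sample numbers, are assumptions made for illustration.

```python
def assemble_signature(histograms, t, s):
    """Concatenate, for each codeword m, the histogram z_m and the scalar s * t_m.

    Assumption: `t` holds one scalar per codeword and `s` is a scalar weight;
    the slide shows these symbols without defining them.
    """
    if len(histograms) != len(t):
        raise ValueError("need one t_m per codeword histogram")
    vector = []
    for z_m, t_m in zip(histograms, t):
        vector.extend(z_m)       # the B histogram bins of codeword m
        vector.append(s * t_m)   # the additional weighted scalar term
    return vector


histograms = [[3, 2], [0, 1]]   # M = 2 codewords, B = 2 bins each
signature = assemble_signature(histograms, t=[5, 1], s=2)
print(signature)
```

Under these assumptions the final vector has M × (B + 1) entries.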

  33. BossaNova Scheme

  34. BossaNova Scheme
      • SIFT descriptors on a dense spatial grid at multiple scales
      • Dimensionality reduction by applying PCA (128 → 64)

  35. BossaNova Scheme
      • k-means algorithm
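
For readers unfamiliar with the k-means step named on this slide, here is a small pure-Python sketch of Lloyd's iteration for learning a codebook. The slides give no initialization or stopping details, so the fixed iteration count, the caller-supplied initial centers and the toy points are assumptions.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update.

    Assumptions: initial centers are supplied by the caller and the loop
    runs for a fixed number of iterations.
    """
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            distances = [squared_distance(p, c) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        for m, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[m] = [sum(p[i] for p in members) / len(members)
                              for i in range(dim)]
    return centers


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
codebook = kmeans(points, centers=[[0.0, 0.0], [10.0, 10.0]])
print(codebook)
```

Each returned center plays the role of one codeword c_m.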

  37. BossaNova Scheme
      • SVM classifiers are applied using a nonlinear Gauss–ℓ2 kernel
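
The slide names the kernel without giving its formula. A common form of a Gaussian kernel on the ℓ2 distance is k(x, y) = exp(−γ ‖x − y‖²), sketched below. Treating this as the kernel meant on the slide is an assumption, as are the parameter name `gamma` and the sample vectors.

```python
import math


def gaussian_l2_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||_2^2); assumed form, not taken from the slides."""
    squared_l2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_l2)


a = [3.0, 2.0, 10.0]
b = [2.0, 2.0, 9.0]
similarity = gaussian_l2_kernel(a, b, gamma=0.5)
print(similarity)
```

The kernel equals 1 for identical signatures and decays toward 0 as they move apart; a kernel SVM would evaluate it on pairs of image signatures.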
