AXES @ TRECVid MED 2013

  1. AXES @ TRECVid MED 2013
     - Matthijs Douze¹, Zaid Harchaoui¹, Dan Oneață¹, Danila Potapov¹, Jérôme Revaud¹, Cordelia Schmid¹, Jochen Schwenninger², Jakob Verbeek¹, Heng Wang¹
     - ¹ INRIA–LEAR, Grenoble, France
     - ² Fraunhofer Sankt Augustin, Germany

  2. Outline
     - Low-level features: improved trajectories, SIFT, color, MFCC
     - High-level features: OCR, ASR
     - Encoding: spatial Fisher vector (two channels), Fisher vector (two channels), bag-of-words (two channels)
     - One classifier per channel; the classifier outputs are combined (+) for the final classification

  8. Table of Contents
     1. Low-level features: static, motion, audio
     2. Feature encoding: Fisher vector
     3. High-level features
     4. Experiments and results

  10. Static and audio features
     - Scale-invariant feature transform (SIFT; Lowe, 2004)
     - Mel-frequency cepstral coefficients (MFCC; Rabiner and Schafer, 2007)
     - Color descriptors (Clinchant et al., 2007): mean and variance (2) of RGB values (3) in 4 × 4 cells (16), giving a descriptor dimensionality of 2 × 3 × 16 = 96
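
The dimensionality count for the color descriptor can be checked with a minimal sketch. It assumes the descriptor is computed on an RGB patch stored as a `(height, width, 3)` array that is at least 4 × 4 pixels; the function name `color_descriptor` and the cell-splitting rule are illustrative choices, not taken from the slides.

```python
import numpy as np

def color_descriptor(patch, grid=4):
    """Mean and variance of each RGB channel in a grid x grid layout of cells."""
    patch = np.asarray(patch, dtype=np.float64)
    height, width, channels = patch.shape
    # Cell boundaries; cells differ by at most one pixel if sizes do not divide evenly.
    rows = np.linspace(0, height, grid + 1).astype(int)
    cols = np.linspace(0, width, grid + 1).astype(int)
    parts = []
    for i in range(grid):
        for j in range(grid):
            cell = patch[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].reshape(-1, channels)
            parts.append(cell.mean(axis=0))  # one mean per channel
            parts.append(cell.var(axis=0))   # one variance per channel
    return np.concatenate(parts)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32, 3))
print(color_descriptor(patch).shape)  # (96,)
```

With 16 cells, 3 channels, and 2 statistics per channel, the output has 96 entries, matching the slide.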

  12. Improved motion features (Wang and Schmid, ICCV, 2013)
     - Builds upon dense trajectory features (?, CVPR, ?)
     - Dense trajectories can be affected by camera motion.
     - [Figure: dense sampling in each spatial scale; tracking in each spatial scale separately; trajectory description with HOG, HOF, MBH]

  14. Improved motion features (Wang and Schmid, ICCV, 2013)
     - Idea: stabilize camera motion before computing optical flow.
     - Method:
       1. extract feature points (SURF descriptors and dense optical flow)
       2. match feature points and estimate homography with RANSAC
       3. warp the optical flow.
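
The effect of the last step can be illustrated with a simplified sketch. It assumes the homography `H` has already been estimated (in practice, for example, with OpenCV's `cv2.findHomography` using the RANSAC method) and that the flow is stored as an `(height, width, 2)` array of `(dx, dy)` displacements. Instead of warping a frame and recomputing the flow, this sketch subtracts the displacement induced by `H` from an existing flow field, which is a simplification and not necessarily the authors' exact procedure. The function names are hypothetical.

```python
import numpy as np

def camera_flow(H, height, width):
    """Displacement of every pixel if it moved only according to homography H."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    points = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous (h, w, 3)
    warped = points @ H.T
    warped_xy = warped[..., :2] / warped[..., 2:3]           # back to pixel coordinates
    return warped_xy - points[..., :2]

def compensate_flow(flow, H):
    """Remove the camera-induced component from a dense flow field."""
    height, width = flow.shape[:2]
    return flow - camera_flow(H, height, width)

# Toy example: the camera pans by (2, -1) pixels and one region moves on its own.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0]])
flow = np.tile(np.array([2.0, -1.0]), (6, 6, 1))
flow[2:4, 2:4] += np.array([3.0, 0.0])
residual = compensate_flow(flow, H)
```

After compensation, background pixels have near-zero residual flow, so trajectories there can be discarded, while the independently moving region keeps its own motion.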

  18. Improved motion features (Wang and Schmid, ICCV, 2013)
     - Idea: stabilize camera motion before computing optical flow.
       - improves flow estimation
       - removes background tracks.
     - [Figure panels: two successive frames; optical flow; warped optical flow; removed trajectories]

  19. Removed trajectories under various camera motions

  22. Fisher vector for appearance
     - Generalization of the bag-of-words.
     - Strong performance across multiple tasks:
       - action recognition, action detection, event recognition (Oneață et al., ICCV, 2013)
       - image classification (Chatfield et al., BMVC, 2011)
       - image retrieval (Jégou et al., PAMI, 2012)
       - fine-grained image classification (Gavves et al., ICCV, 2013)
       - face verification (Simonyan et al., BMVC, 2013)
       - word spotting (Almazán et al., ICCV, 2013).
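
To make the "generalization of the bag-of-words" concrete, here is a minimal sketch of a Fisher vector under a diagonal-covariance Gaussian mixture model. The GMM parameters (`weights`, `means`, `sigmas`) are assumed to be given, and the final power and ℓ2 normalization is a common practice assumed here rather than something stated on the slides. Where a bag-of-words keeps only soft counts per visual word, the Fisher vector also keeps first- and second-order deviations from each component.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Fisher vector of descriptors X (N, D) under a K-component diagonal GMM.

    weights: (K,), means: (K, D), sigmas: (K, D) standard deviations.
    Returns a vector of length 2 * K * D.
    """
    N, D = X.shape
    z = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (N, K, D)
    log_prob = (-0.5 * np.sum(z ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)
                - 0.5 * D * np.log(2.0 * np.pi))                   # (N, K)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                        # soft assignments
    # Gradients with respect to the means and the standard deviations.
    g_mu = np.einsum('nk,nkd->kd', post, z) / (N * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('nk,nkd->kd', post, z ** 2 - 1.0) / (N * np.sqrt(2.0 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                         # power normalization (assumed)
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

rng = np.random.default_rng(0)
K, D = 4, 8
X = rng.normal(size=(200, D))
fv = fisher_vector(X, np.full(K, 1.0 / K), rng.normal(size=(K, D)), np.ones((K, D)))
print(fv.shape)  # (64,)
```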

  27. Fisher vector for location
     - Spatial Fisher vector (SFV) (Krapac et al., ICCV, 2011)
       - encodes first and second moments of visual word locations
       - adds 6 entries for each visual word: µ and σ for (x, y, t) coordinates.
     - Compared to spatial pyramids (Oneață et al., ICCV, 2013):
       - similar performance gain
       - SFV are more compact
       - complementary.
     - [Figure: schematic illustration of the spatial Fisher vector for three types of visual words in an image]
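
The six extra entries per visual word can be sketched as follows. The sketch assumes soft assignments `posteriors` of shape `(N, K)` are already available, that each visual word has a Gaussian location model with hypothetical parameters `loc_means` and `loc_sigmas` of shape `(K, 3)`, and that locations are `(x, y, t)` triples. Scaling by the Fisher information is omitted for brevity, so this is an illustration of the structure rather than the exact formulation of Krapac et al.

```python
import numpy as np

def spatial_fisher_entries(posteriors, locations, loc_means, loc_sigmas):
    """Per-word location statistics: 3 mean gradients and 3 std gradients.

    posteriors: (N, K), locations: (N, 3), loc_means and loc_sigmas: (K, 3).
    Returns an array of shape (K, 6).
    """
    N = locations.shape[0]
    z = (locations[:, None, :] - loc_means[None, :, :]) / loc_sigmas[None, :, :]  # (N, K, 3)
    g_mean = np.einsum('nk,nkc->kc', posteriors, z) / N              # first moment
    g_sigma = np.einsum('nk,nkc->kc', posteriors, z ** 2 - 1.0) / N  # second moment
    return np.concatenate([g_mean, g_sigma], axis=1)

rng = np.random.default_rng(0)
N, K = 100, 5
posteriors = rng.dirichlet(np.ones(K), size=N)   # rows sum to one
locations = rng.uniform(size=(N, 3))             # normalized (x, y, t)
entries = spatial_fisher_entries(posteriors, locations,
                                 np.full((K, 3), 0.5), np.full((K, 3), 0.3))
print(entries.shape)  # (5, 6)
```

The cost is 6 numbers per visual word regardless of the descriptor dimension, which is why the representation stays compact compared with replicating the whole encoding for every cell of a spatial pyramid.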

  29. High-level features: OCR and ASR
     - Optical character recognition (OCR)
     - Automatic speech recognition (ASR) (from Fraunhofer IAIS)
       - trained on 100 hours of English broadcasts
       - language model trained on news articles and patents
     - For both systems:
       - bag-of-words encoding with 110,000 words
       - tf-idf weighting
       - ℓ2 normalization.
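
The encoding shared by both systems can be sketched as below. The slides do not say which tf-idf variant was used, so the smoothed idf formula here is an assumption, and the tiny count matrix stands in for the 110,000-word vocabulary.

```python
import numpy as np

def tfidf_l2(counts):
    """tf-idf weighting followed by l2 normalization of each row.

    counts: (num_videos, vocab_size) matrix of word counts.
    """
    counts = np.asarray(counts, dtype=np.float64)
    num_docs = counts.shape[0]
    doc_freq = np.count_nonzero(counts, axis=0)
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = np.log((1.0 + num_docs) / (1.0 + doc_freq)) + 1.0
    lengths = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, lengths, out=np.zeros_like(counts), where=lengths > 0)
    weighted = tf * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    # Videos with no recognized words keep an all-zero vector.
    return np.divide(weighted, norms, out=np.zeros_like(weighted), where=norms > 0)

counts = [[2, 0, 1],
          [0, 0, 0],
          [1, 1, 0]]
features = tfidf_l2(counts)
```

Rows for videos with at least one recognized word end up with unit ℓ2 norm, so videos with long and short transcripts become comparable.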

  30. Pipeline diagram
     - Low-level features: improved trajectories, SIFT, color, MFCC
     - High-level features: OCR, ASR
     - Encoding: spatial Fisher vector (two channels), Fisher vector (two channels), bag-of-words (two channels)
     - One classifier per channel; the classifier outputs are combined (+) for the final classification
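
The "+" node in the diagram combines the per-channel classifiers. The slides do not specify how, so the following is only one plausible reading: a weighted average of per-channel scores (late fusion). The channel names, scores, and weights are all hypothetical.

```python
import numpy as np

def late_fusion(channel_scores, weights=None):
    """Weighted average of per-channel classifier scores.

    channel_scores: dict mapping channel name to an array of scores, one per video.
    weights: optional dict of non-negative weights; uniform if omitted.
    """
    names = sorted(channel_scores)
    scores = np.stack([np.asarray(channel_scores[n], dtype=np.float64) for n in names])
    if weights is None:
        w = np.full(len(names), 1.0 / len(names))
    else:
        w = np.array([weights[n] for n in names], dtype=np.float64)
        w = w / w.sum()                       # normalize so weights sum to one
    return w @ scores

fused = late_fusion({"sift": [1.0, 0.0], "mfcc": [0.0, 1.0]})
```

In a real system the weights would typically be tuned on held-out data rather than fixed by hand.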

  36. Initial experiments on TRECVid ’11 subset
     - Spatial Fisher vectors improve results for color and SIFT.
     - Comparison of the motion features (HOG, HOF, MBH):
       - MBH > HOG > HOF
       - HOG+MBH > HOF+MBH
