  1. On the use of semantic features for the semantic indexing task. Bahjat Safadi, Nadia Derbas, Abdelkader Hamadi, Mateusz Budnik, Philippe Mulhem and Georges Quénot (UJF-LIG), and many other people from the IRIM group of GDR 720 ISIS. 10 November 2014

  2. Outline
  • System overview
  • Semantic features
  • Contrast experiments
  • Engineered versus learned features
  • Conclusion

  3. Main runs scores 2014 (from NIST). [Bar chart of mean InfAP over all submitted runs, median = 0.206. Legend: non-LIG submitted runs in 2013 against 2014 testing data (progress runs); LIG submitted runs in 2014 against 2014 testing data (main runs); LIG (Quaero) submitted runs in 2013 against 2014 testing data (progress runs).]

  4. Basic classification pipeline: Text / Audio / Image → Descriptor extraction → Classification → Late fusion → Classification score
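As a rough illustration of this pipeline, here is a minimal Python sketch assuming one classifier per descriptor type and late fusion by weighted averaging of per-descriptor scores; the scikit-learn SVM, the data layout and the equal fusion weights are illustrative assumptions, not the actual IRIM implementation.

import numpy as np
from sklearn.svm import SVC

def train_per_descriptor(descriptors, labels):
    """Train one classifier per descriptor type (text, audio, image).
    descriptors: dict name -> (n_shots, dim) matrix; labels: (n_shots,)."""
    models = {}
    for name, X in descriptors.items():
        clf = SVC(kernel='rbf', probability=True)   # placeholder classifier
        clf.fit(X, labels)
        models[name] = clf
    return models

def late_fusion(models, descriptors, weights=None):
    """Average per-descriptor classification scores into one score per shot."""
    names = list(models)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}   # equal weights
    return sum(weights[n] * models[n].predict_proba(descriptors[n])[:, 1]
               for n in names)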

  5. LIG/Quaero/IRIM classification pipeline + hierarchical fusion [Strat et al., ECCV/IFCVCR workshop 2012, Springer 2014]: Text / Audio / Image → Descriptor extraction → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Classification score
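One possible reading of this two-level fusion in code: scores of classifier/descriptor variants are first averaged within each descriptor family, and the per-family scores are then fused at a higher level. The grouping structure and plain averaging are assumptions; the cited work uses more elaborate weighted, hierarchical fusion.

import numpy as np

def hierarchical_fusion(variant_scores, groups):
    """Two-level fusion of per-shot scores.
    variant_scores: dict variant_name -> (n_shots,) score array
    groups:         dict family_name  -> list of variant names"""
    family_scores = {
        fam: np.mean([variant_scores[v] for v in members], axis=0)
        for fam, members in groups.items()           # fusion within each family
    }
    return np.mean(list(family_scores.values()), axis=0)   # higher-level fusion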

  6. LIG/Quaero/IRIM classification pipeline + temporal re-ranking [Safadi et al., CIKM 2011; Wang et al., TV 2009]: update shot scores considering other shots’ scores for a same concept. Text / Audio / Image → Descriptor extraction → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring) → Classification score
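A minimal sketch of temporal re-scoring under the assumption that relevant shots cluster in time: each shot's score for a concept is mixed with the best score among its temporal neighbours in the same video. The window size and mixing weight alpha are illustrative, not the values from the cited papers.

import numpy as np

def temporal_rescore(scores, window=2, alpha=0.7):
    """scores: 1-D array of per-shot scores for one concept within one video."""
    rescored = np.empty_like(scores, dtype=float)
    for i in range(len(scores)):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        neighbours = np.concatenate([scores[lo:i], scores[i + 1:hi]])
        support = neighbours.max() if neighbours.size else 0.0
        rescored[i] = alpha * scores[i] + (1 - alpha) * support   # neighbour boost
    return rescored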

  7. LIG/Quaero/IRIM classification pipeline + descriptor optimization [Safadi et al., MTAP 2014]: combination of PCA-based dimensionality reduction and pre- and post-power transformations. Text / Audio / Image → Descriptor extraction → Descriptor transformation → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring) → Classification score
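The descriptor optimization step can be sketched as a power ("root") transformation applied before PCA-based dimensionality reduction and again after it; the exponents and the target dimension below are placeholder values, not the ones tuned in [Safadi et al., MTAP 2014].

import numpy as np
from sklearn.decomposition import PCA

def optimize_descriptor(X, pre_power=0.5, post_power=0.5, dim=256):
    """X: (n_samples, n_features) raw descriptor matrix."""
    Xp = np.sign(X) * np.abs(X) ** pre_power          # pre-PCA power transform
    pca = PCA(n_components=dim, whiten=True).fit(Xp)  # dimensionality reduction
    Z = pca.transform(Xp)
    return np.sign(Z) * np.abs(Z) ** post_power       # post-PCA power transform

In practice the PCA would be fitted on the training descriptors only and reused unchanged on the test descriptors.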

  8. LIG/Quaero/IRIM classification pipeline + conceptual feedback [Hamadi et al., MTAP 2014]: Text / Audio / Image → Descriptor extraction → Descriptor transformation → Conceptual feedback → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring) → Classification score

  9. LIG/Quaero/IRIM classification pipeline + conceptual re-ranking [Hamadi et al., MTAP 2014]: update concept scores considering other concepts’ scores for a same shot. Text / Audio / Image → Descriptor extraction → Descriptor transformation → Conceptual feedback → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring) → Classification score
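A hedged sketch of conceptual re-scoring: a shot's score for one concept is adjusted with a weighted combination of its scores for related concepts (e.g. "boat" supported by "water"). The relation matrix and the mixing weight beta are assumptions; the cited work derives concept relations from annotation statistics.

import numpy as np

def conceptual_rescore(shot_scores, relation, beta=0.8):
    """shot_scores: (n_shots, n_concepts) initial scores.
    relation: (n_concepts, n_concepts), relation[i, j] = influence of
    concept j on concept i, each row assumed to sum to 1."""
    context = shot_scores @ relation.T                # contextual score per shot
    return beta * shot_scores + (1 - beta) * context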

  10. LIG/Quaero/IRIM classification pipeline + semantic descriptors [TRECVid 2013 and 2014]: Text / Audio / Image → Descriptor extraction → Descriptor transformation → Conceptual feedback → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring) → Classification score

  11. Conceptual feedback: unfolded graph. [Diagram: the full pipeline (Descriptor extraction → Descriptor transformation → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring)) is run twice on the Text / Audio / Image inputs; the classification score of iteration 0 (original) is reused as an input for iteration 1 (feedback).]

  12. Conceptual feedback: semantic descriptor (computed only once), shared components. [Diagram: iteration 0 and iteration 1 share the standard descriptor extraction processing (Image / Audio / Text → Descriptor extraction → Descriptor transformation → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring)); the classification score of iteration 0 serves as a semantic descriptor for iteration 1.]

  13. Semantic descriptor: general case. Any classification system, using any approach, trained on any annotated data, for any target concept set, can be used for semantic descriptor extraction from the Image / Audio / Text input (model vectors [Smith et al., ICME 2003]); the resulting semantic descriptor then follows the standard descriptor processing (Descriptor extraction → Descriptor transformation → Classification → Descriptors and classifier variants fusion → Higher level hierarchical fusion → Re-ranking (re-scoring) → Classification score).
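In code, the model-vector idea amounts to concatenating the outputs of a bank of concept classifiers into a descriptor that the standard pipeline then treats like any other; the classifier-bank interface below (scikit-learn style decision_function) is an assumption for illustration.

import numpy as np

def semantic_descriptor(keyframe_features, concept_classifiers):
    """keyframe_features: low-level descriptor of one key frame, shape (dim,).
    concept_classifiers: models trained on some source concept set
    (e.g. ImageNet classes), each exposing decision_function()."""
    scores = [clf.decision_function(keyframe_features[None, :])[0]
              for clf in concept_classifiers]
    return np.asarray(scores)        # one dimension per source concept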

  14. Semantic descriptors trained on ImageNet
  • Fisher Vector based descriptors [Perronnin, IJCV 2013]:
  - XEROX/ilsvrc2010: vectors of 1000 scores trained on ILSVRC10 and applied to key frames, kindly produced by Florent Perronnin from Xerox (XRCE)
  - XEROX/imagenet10174: same with 10174 concept scores trained on ImageNet
  • Deep learning based descriptors, computed by Eurecom and LIG using the Berkeley caffe tool [Jia et al., 2013]:
  - EUR/caffe1000: vectors of 1000 scores trained on ILSVRC12 and applied to key frames, fusing the outputs for 10 variants of each input image
  - LIG/caffe1000b: same with a different version of the tool and using only one variant of each input image
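The "10 variants of each input image" fusion can be sketched as averaging the 1000-dimensional class-score vector over ten crops (four corners plus centre, each mirrored), which is the usual AlexNet recipe; the exact variant set used for EUR/caffe1000 is not specified here, so this is an assumption.

import numpy as np

def ten_crop_scores(image, crop_size, forward):
    """image: H x W x 3 array; forward(crop) -> 1000-dim class-score vector."""
    H, W = image.shape[:2]
    c = crop_size
    offsets = [(0, 0), (0, W - c), (H - c, 0), (H - c, W - c),
               ((H - c) // 2, (W - c) // 2)]          # 4 corners + centre
    scores = []
    for y, x in offsets:
        crop = image[y:y + c, x:x + c]
        scores.append(forward(crop))
        scores.append(forward(crop[:, ::-1]))         # horizontal mirror
    return np.mean(scores, axis=0)                    # fused 1000-dim descriptor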

  15. “Quasi-semantic” descriptors from deep learning and ImageNet [Krizhevsky et al., 2012]
  • 7 hidden layers, 650K units, 630M connections, 60M parameters
  • GPU implementation (50× speed-up over CPU)
  • Trained on two GPUs for a week
  [Diagram: network architecture with the last layers labelled fc5, fc6, fc7 and the 1000-class output b1000]

  16. “Quasi-semantic” descriptors from deep learning and ImageNet
  • Deep learning based descriptors, computed by LIG using the Berkeley caffe tool [Jia et al., 2013] (a minimal extraction sketch follows below):
  - LIG/caffe_fc7b_4096: 4096 values of the last hidden layer (non convolutional)
  - LIG/caffe_fc6b_4096: 4096 values of the last-but-one hidden layer (non convolutional)
  - LIG/caffe_fc5b_43264: 43264 values of the last-but-two hidden layer (convolutional, 13×13×256)
  • Not strictly semantic, as these are not classification scores, but close to the semantic level
  • Expected to perform better than the last (output) layer:
  - No (or less) information loss due to the targeting of different and/or unrelated target concepts
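A minimal pycaffe sketch of extracting such a hidden-layer descriptor; the layer names ('fc6', 'fc7') follow the BVLC reference model, and the prototxt/caffemodel paths and preprocessing are placeholders, not the exact LIG setup.

import numpy as np
import caffe

# Placeholder model files for a BVLC reference (AlexNet-like) network.
net = caffe.Net('deploy.prototxt', 'bvlc_reference.caffemodel', caffe.TEST)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))          # HWC -> CHW

def hidden_layer_descriptor(image_path, layer='fc7'):
    """Return the activations of one hidden layer as a flat descriptor."""
    image = caffe.io.load_image(image_path)
    net.blobs['data'].data[0] = transformer.preprocess('data', image)
    net.forward()
    return net.blobs[layer].data[0].copy().ravel()    # e.g. 4096 values for fc7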

  17. Local semantic descriptors trained on TRECVid 2003
  • Scores for 15 TRECVid 2003 concepts (sky, building, water, greenery ...) on image patches, trained using local annotations [Ayache et al., IVP 2007]
  - LIG/percepts*: computed at various resolutions in a pyramidal way, aggregated by concatenation (see the sketch below)
  - Computed using local color and texture descriptors
  • No longer state of the art
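A hedged sketch of the pyramidal aggregation: patch-level scores for the 15 concepts are pooled per cell at several grid resolutions and the cell vectors are concatenated. The grid levels, the mean pooling, and the assumption that the grids divide the patch map evenly are illustrative, not the cited configuration.

import numpy as np

def pyramidal_percepts(patch_scores, levels=(1, 2, 4)):
    """patch_scores: (rows, cols, 15) map of local concept scores."""
    rows, cols, k = patch_scores.shape
    parts = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                cell = patch_scores[i * rows // n:(i + 1) * rows // n,
                                    j * cols // n:(j + 1) * cols // n]
                parts.append(cell.mean(axis=(0, 1)))   # k scores per cell
    return np.concatenate(parts)     # (1 + 4 + 16) * 15 = 315 values here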

  18. Experiments
  • Use of SIN 2013 development data only (no tuning on SIN 2013 test data) and various components using ImageNet annotated data → D type submissions
  • Evaluation on SIN 2013 and 2014 test data
  • Use of a combination of kNN and MSVM for classification [Safadi, RIAO 2010]
  • Use of uploader information: multiplicative factor at the video level, weighted at 10%, provided by Eurecom [Niaz, TV 2012]
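The classifier combination and uploader prior might look like the following sketch, where kNN and MSVM scores are averaged and then corrected by a per-video uploader factor damped to a 10% contribution; the exact form of the uploader factor in [Niaz, TV 2012] is not given here, so it is an assumption.

import numpy as np

def combine_scores(knn_scores, msvm_scores, uploader_factor, w_uploader=0.1):
    """All inputs are per-shot arrays for one concept; uploader_factor holds,
    for each shot, the uploader prior of the video it belongs to."""
    base = 0.5 * (knn_scores + msvm_scores)           # kNN + MSVM combination
    # Multiplicative, video-level uploader correction weighted at 10%.
    return base * ((1 - w_uploader) + w_uploader * uploader_factor)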

  19. Performance of “low-level” descriptors. [Bar chart: MAP 2013 and MAP 2014 (scale 0 to 0.3) for 13 low-level “engineered” descriptors, including LIRIS OC-LBP, LIG opponent SIFT, CEALIST pyramidal bag of SIFT, ETIS color (lab BoW) and texture (wavelets), LISTIC SIFT with retina masking, and ETIS VLAT (vector of locally aggregated tensors).]

  20. Performance of semantic descriptors. [Bar chart: MAP 2013 and MAP 2014 (scale 0 to 0.3) for the 13 “low-level” “engineered” descriptors and the semantic descriptors: Xerox semantic features ILSVRC 1000, Xerox semantic features ImageNet 10174, Xerox semantic features (fused), Caffe semantic features LIG, Caffe semantic features Eurecom, Caffe semantic features output layer (fused), Caffe quasi-semantic hidden layer 5 (43264), Caffe quasi-semantic hidden layer 6 (4096), Caffe quasi-semantic hidden layer 7 (4096), Caffe semantic features last two hidden layers…, LIG/concepts first iteration (includes Xerox), and LIG/concepts second iteration (includes Xerox).]
