Combining text/image in WikipediaMM task 2009
Christophe Moulin, Cécile Barat, Cédric Lemaître, Mathias Géry, Christophe Ducottet, Christine Largeron
Laboratoire Hubert Curien, Saint-Étienne, France
October 1st 2009
Christophe Moulin et al. (LaHC) Combining text/image in WikipediaMM task 2009 October 1st 2009 1 / 16
Outline
1 Model overview
  Textual vector space model
  Visual vocabulary
  Combining text and image modalities
2 Experiments
3 Conclusion and future work
Model overview
A textual/visual model based on the bag-of-words approach
[Figure: pipeline documents → indexing (bag of words) → combining, with the two modalities weighted by α and (1 − α)]
Model overview: Textual vector space model
Textual vocabulary creation
Main steps of the textual bag-of-words creation: stop-word filtering → Porter stemming → bag-of-words creation
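The three steps above can be sketched as follows. This is a minimal illustration, not the authors' indexer: the stop-word list is a tiny hypothetical sample, and `toy_stem` is a crude suffix stripper standing in for the Porter stemmer (it is NOT the Porter algorithm).

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption: the slides do not give the list used).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}


def toy_stem(word: str) -> str:
    """Crude suffix stripping used as a stand-in for Porter stemming (NOT Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str) -> Counter:
    """Tokenize, drop stop words, stem, and count the remaining terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(toy_stem(t) for t in tokens if t not in STOP_WORDS)


print(bag_of_words("The cats and the dogs are playing in the garden"))
```

In a real system the stemmer would be a full Porter implementation; the structure (filter, stem, count) is the part this sketch is meant to show.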
Model overview: Textual vector space model
Textual vector weighting
Salton-based tf.idf weighting [1]: each bag of words becomes a vector of tf.idf weights
$w_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_j$ [2]
$\mathrm{tf}_{i,j}$: representativeness of term j in document i
$\mathrm{idf}_j$: discrimination power of term j
[1]: Salton et al. A vector space model for automatic indexing, 1975
[2]: Robertson et al. Okapi at TREC-3, 1994
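A sketch of the weighting $w_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_j$ is given below. Since the formula cites Okapi [2], the tf part here uses a BM25-style saturation; the parameters `k1 = 1.2` and `b = 0.75` and the plain `log(N / df)` idf are assumptions, as the exact variant used by the authors is not stated on the slide.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[Counter], k1: float = 1.2, b: float = 0.75) -> List[Dict[str, float]]:
    """Return w[i][j] = tf(i, j) * idf(j) for every term j of document i.

    tf: Okapi BM25-style saturation (assumed variant; k1 and b are assumptions).
    idf: log(N / df(j)), so a term present in every document gets weight 0.
    """
    n_docs = len(docs)
    avg_len = sum(sum(d.values()) for d in docs) / n_docs
    # Iterating a Counter yields each term once, so this counts documents, not occurrences.
    df = Counter(term for d in docs for term in d)
    vectors = []
    for d in docs:
        length = sum(d.values())
        norm = k1 * (1 - b + b * length / avg_len)  # document-length normalisation
        vectors.append({
            term: (f * (k1 + 1) / (f + norm)) * math.log(n_docs / df[term])
            for term, f in d.items()
        })
    return vectors


docs = [Counter({"cat": 2, "sky": 1}), Counter({"cat": 1, "sea": 1})]
print(tfidf_vectors(docs))
```

Here "cat" occurs in both documents and therefore has no discrimination power (weight 0), while "sky" and "sea" get positive weights.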
Model overview: Textual vector space model
Exploiting the text around an image
Two sources of text: metadata + text extracted from the original Wikipedia articles
metadata of the Wikipedia image, as used in ImageCLEFwiki
original Wikipedia article (n characters around the image)
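The extraction of "n characters around the image" could look like the sketch below. The helper `text_around_image` and the image-reference string are hypothetical, and taking n characters on each side of the first reference is an assumption: the slide does not say how the window is placed.

```python
def text_around_image(article: str, image_ref: str, n: int) -> str:
    """Return up to n characters on each side of the first occurrence of image_ref.

    Hypothetical helper: the window of n characters on each side is an assumption.
    """
    pos = article.find(image_ref)
    if pos == -1:
        return ""  # image not referenced in this article's text
    end = pos + len(image_ref)
    before = article[max(0, pos - n):pos]
    after = article[end:end + n]
    # Join both sides and collapse runs of whitespace.
    return " ".join((before + " " + after).split())


article = "Cats sleep a lot. [[File:Cat.jpg]] They purr too."
print(text_around_image(article, "[[File:Cat.jpg]]", 5))
```

The experiments use n = 50 and n = 100 characters; the extracted text is then indexed together with the image metadata.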
Model overview: Visual vocabulary
Visual representation
Similar to the text representation, using a visual codebook [3]
Visual vocabulary creation: descriptors → visual vocabulary
Image representation: descriptors → projection onto the vocabulary → bag of visual words → vector of tf.idf weights
[3]: Jurie et al. Creating efficient codebooks for visual recognition, 2005
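The projection step can be sketched as a nearest-codeword assignment. This assumes Euclidean distance and a codebook that has already been built; how the codebook itself is created (the clustering method from [3]) is not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float], vocabulary: List[Sequence[float]]) -> int:
    """Index of the visual word closest to the descriptor (Euclidean distance, assumed)."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(descriptor, vocabulary[k]))


def bag_of_visual_words(descriptors: List[Sequence[float]], vocabulary: List[Sequence[float]]) -> Counter:
    """Project each local descriptor onto its nearest visual word and count occurrences."""
    return Counter(nearest_word(d, vocabulary) for d in descriptors)


vocabulary = [(0.0, 0.0), (10.0, 10.0)]
print(bag_of_visual_words([(1, 1), (9, 8), (0, 2)], vocabulary))
```

The resulting counts play the role of term frequencies, so the same tf.idf weighting as for text can be applied to them.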
Model overview: Visual vocabulary
Visual features computation
Two different descriptors are used:
regular partitioning into 16 × 16 cells:
  meanstd (6 dimensions, 9350 visual words)
  sift2 (128 dimensions, 9630 visual words)
interest regions from the MSER detector:
  sift1 (128 dimensions, 9303 visual words)
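A possible reading of the 6-dimensional meanstd descriptor is sketched below: the mean and standard deviation of each of the three colour channels, computed per cell of a regular grid. Both the channel interpretation and reading "16 × 16 cells" as a 16 × 16 grid (rather than 16 × 16-pixel cells) are assumptions; the image is a plain list of rows of (R, G, B) tuples.

```python
import statistics
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def meanstd_descriptors(image: List[List[Pixel]], grid: int = 16) -> List[List[float]]:
    """6-D descriptor per cell: [mean R, std R, mean G, std G, mean B, std B].

    Assumptions: the image is split into a grid x grid partition and the
    population standard deviation is used.
    """
    h, w = len(image), len(image[0])
    descriptors = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            pixels = [image[y][x] for y in rows for x in cols]
            if not pixels:
                continue  # image smaller than the grid: skip empty cells
            desc = []
            for c in range(3):
                values = [p[c] for p in pixels]
                desc += [statistics.fmean(values), statistics.pstdev(values)]
            descriptors.append(desc)
    return descriptors


img = [[(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)], [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]]
print(meanstd_descriptors(img, grid=1))
```

Each cell descriptor would then be projected onto the meanstd visual vocabulary, as for the SIFT descriptors.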
Model overview: Combining text and image modalities
Score matching
Distance computed between query and document vectors:
score 1: query weighted by tf, documents weighted by tf.idf
score 2: query and documents both weighted by tf.idf
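The difference between score 1 and score 2 lies only in how the query is weighted. The sketch below uses cosine similarity as the matching measure; this is an assumption, since the slides do not name the distance used.

```python
import math
from typing import Dict

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between sparse vectors (assumed matching measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weight_query(query_tf: Dict[str, int], idf: Vector, use_idf: bool) -> Vector:
    """Raw tf weights (score 1) or tf.idf weights (score 2) for the query terms."""
    return {t: f * idf.get(t, 0.0) if use_idf else float(f) for t, f in query_tf.items()}


def score_1(query_tf: Dict[str, int], idf: Vector, doc_tfidf: Vector) -> float:
    """score 1: tf-weighted query against a tf.idf-weighted document."""
    return cosine(weight_query(query_tf, idf, use_idf=False), doc_tfidf)


def score_2(query_tf: Dict[str, int], idf: Vector, doc_tfidf: Vector) -> float:
    """score 2: tf.idf-weighted query against a tf.idf-weighted document."""
    return cosine(weight_query(query_tf, idf, use_idf=True), doc_tfidf)
```

With score 2, query terms that occur in many documents (low idf) contribute less to the match, which is consistent with score 2 outperforming score 1 in the results.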
Model overview: Combining text and image modalities
Model overview
Linear combination of the textual and visual bag-of-words scores, with weights α and (1 − α)
α is fixed globally on the ImageCLEFwiki 2008 collection
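A minimal sketch of the linear fusion and the resulting ranking is shown below. It assumes that α weights the visual score and (1 − α) the textual score, which fits the small α values reported in the results, but the slides do not state the assignment explicitly. They also do not say whether the two scores are normalised before fusion; the sketch uses them as-is.

```python
from typing import Dict, List


def combined_score(text_score: float, visual_score: float, alpha: float) -> float:
    """Linear fusion; assumes alpha weights the visual score, 1 - alpha the text score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * visual_score + (1.0 - alpha) * text_score


def rank_documents(text: Dict[str, float], visual: Dict[str, float], alpha: float) -> List[str]:
    """Rank documents by fused score; a missing modality score counts as 0."""
    docs = set(text) | set(visual)
    return sorted(
        docs,
        key=lambda d: combined_score(text.get(d, 0.0), visual.get(d, 0.0), alpha),
        reverse=True,
    )


print(rank_documents({"a": 0.9, "b": 0.5}, {"a": 0.0, "b": 1.0}, alpha=0.084))
```

Because α is a single global value, it can be chosen once on the 2008 data; learning one α per query, as proposed in the future work, would replace the constant by a query-dependent value.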
Experiments
Global results
rank  participant/score  text      image              MAP     num ret  num rel ret
1     deuceng            TXT       -                  0.2397  43052    1351
5     lahc/score 2       100 char  meanstd (α=0.025)  0.2178  44993    1213
6     lahc/score 2       50 char   meanstd (α=0.025)  0.2148  44993    1218
14    lahc/score 2       metadata  sift2 (α=0.084)    0.1903  44993    1212
15    lahc/score 2       100 char  -                  0.1890  38004    1205
16    lahc/score 2       50 char   -                  0.1880  37041    1198
20    lahc/score 2       metadata  meanstd (α=0.025)  0.1845  44993    1208
21    lahc/score 2       metadata  sift1 (α=0.012)    0.1807  44995    1200
24    lahc/score 2       metadata  meanstd (α=0.015)  0.1792  44993    1213
33    lahc/score 2       metadata  -                  0.1667  35611    1192
44    lahc/score 1       metadata  -                  0.1432  35611    1164
52    lahc/score 2       metadata  sift2              0.0365  619      142
53    lahc/score 2       metadata  meanstd            0.0338  574      76
54    lahc/score 2       metadata  sift1              0.0321  637      120
57    sztaki             -         IMG                0.0068  44993    80
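The MAP column can be illustrated with the standard non-interpolated definition: for each query, average the precision at the rank of every relevant document (relevant documents never retrieved contribute 0), then average over queries. The slides do not describe the official evaluation tooling; this sketch only shows the metric.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)  # relevant documents never retrieved contribute 0


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP: average of AP over all queries that have relevance judgements."""
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)


runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d1", "d2", "d3", "d4"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d1", "d5"}}
print(mean_average_precision(runs, qrels))
```

This also explains why the visual-only runs (ranks 52 to 54) score so low: with only a few hundred documents retrieved, most relevant documents are never ranked and contribute 0 to the average.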
Experiments
Textual results
[Plot: curves for score 1 (MAP 0.1432), score 2 (MAP 0.1667), score 2 50 char (MAP 0.1880), score 2 100 char (MAP 0.1890)]
Improvement provided by the additional text: MAP 0.1667 → 0.1890 (about +13%)
Experiments
Textual+visual results
[Plot: curves for score 2 (MAP 0.1667), score 2 + sift1 with α=0.012 (MAP 0.1807), score 2 + meanstd with α=0.025 (MAP 0.1845), score 2 + sift2 with α=0.084 (MAP 0.1903)]
Ranking by MAP: sift2 > meanstd > sift1
Experiments
Best results
[Plot: curves for score 2 50 char (MAP 0.1880), score 2 100 char (MAP 0.1890), score 2 50 char + meanstd (MAP 0.2148), score 2 100 char + meanstd (MAP 0.2178)]
Improvement provided by the visual information: MAP 0.1890 → 0.2178 (about +15%)
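The relative gains quoted in these experiments can be checked directly from the MAP values in the results table. The pairing of runs below (metadata only vs. metadata + 100 char; 100 char vs. 100 char + meanstd) is our choice of the most directly comparable runs.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative MAP improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# MAP values taken from the results table.
text_gain = relative_gain(0.1667, 0.1890)    # metadata only -> metadata + 100 char
visual_gain = relative_gain(0.1890, 0.2178)  # 100 char -> 100 char + meanstd
print(f"additional text: +{text_gain:.1f}%, visual information: +{visual_gain:.1f}%")
```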
Conclusion and future work
Conclusion
Improvements over our model from last year
What works:
Text around the image in the original Wikipedia articles (about +13% MAP)
Addition of visual features (MSER + sift), exploiting color/texture complementarity
Text-image combination (about +15% MAP)
Conclusion and future work
Future work
Combination with more than one visual descriptor
Other fusion methods
Learning α for each query