
Spatial and Temporal Representations for Multi-Modal Visual Retrieval (PowerPoint presentation)



  1. Spatial and Temporal Representations for Multi-Modal Visual Retrieval. 17th December 2018. Noa Garcia Docampo, PhD Candidate, Aston University

  2-4. Introduction
     Millions of images are created every day...
     Problem: how can we find images in large collections?
     Solution: visual retrieval!
     ● Image retrieval has existed since the 1990s
     ● There are many types of visual retrieval

  5-6. Introduction: We classify visual retrieval into three main types, depending on the query object and the dataset content: symmetric visual retrieval, asymmetric visual retrieval, and cross-modal retrieval.

  7-8. Structure
     ● Introduction and Background
     ● Symmetric Visual Retrieval
     ● Asymmetric Visual Retrieval
     ● Cross-Modal Retrieval
     ● Conclusions and Final Remarks

  9. Contributions
     Symmetric Visual Retrieval:
     ● CNNs for non-metric visual similarity
     ● Pushing performance on standard CBIR datasets
     Asymmetric Visual Retrieval:
     ● MoviesDB: image-to-video retrieval dataset
     ● Binary descriptors for local aggregation of video features
     ● Spatio-temporal encoders for global aggregation of video features
     ● Item video retrieval application
     Cross-Modal Retrieval:
     ● SemArt: semantic art understanding dataset
     ● Cross-modal retrieval for semantic art understanding

  10. Introduction and Background; Symmetric Visual Retrieval

  11. Symmetric Visual Retrieval: standard CBIR system

  12-15. Symmetric Visual Retrieval: standard CBIR system
     Drawbacks of metric distances:
     ● They do not consider the data distribution
     ● They must satisfy the metric distance constraints:
        ○ Non-negativity: $d(x, y) \geq 0$
        ○ Identity of indiscernibles: $d(x, y) = 0 \iff x = y$
        ○ Symmetry: $d(x, y) = d(y, x)$
        ○ Triangle inequality: $d(x, z) \leq d(x, y) + d(y, z)$
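The constraints above can be tested empirically. The following Python sketch checks the four metric axioms on a finite sample of points; it can only find violations, never prove that a function is a metric. `squared_euclidean` and the hypothetical `weighted_by_query` are illustrative examples chosen here, not functions from the thesis: the first is symmetric but breaks the triangle inequality, and the second depends on which item plays the query role, so it is not symmetric.

```python
import itertools
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Standard Euclidean distance, a true metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def check_metric_axioms(d: Distance, points: Sequence[Vector],
                        tol: float = 1e-9) -> Dict[str, bool]:
    """Test the four metric axioms on a finite sample of points.

    True means no violation was found on this sample (not a proof);
    False means at least one violating pair or triple was found.
    """
    ok = {"non_negativity": True, "identity": True,
          "symmetry": True, "triangle_inequality": True}
    for a, b in itertools.product(points, repeat=2):
        dab = d(a, b)
        if dab < -tol:
            ok["non_negativity"] = False
        same_point = all(abs(x - y) <= tol for x, y in zip(a, b))
        if same_point != (abs(dab) <= tol):
            ok["identity"] = False
        if abs(dab - d(b, a)) > tol:
            ok["symmetry"] = False
    for a, b, c in itertools.product(points, repeat=3):
        if d(a, c) > d(a, b) + d(b, c) + tol:
            ok["triangle_inequality"] = False
    return ok


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Illustrative example: symmetric, but breaks the triangle inequality."""
    return euclidean(a, b) ** 2


def weighted_by_query(a: Vector, b: Vector) -> float:
    """Hypothetical dissimilarity that depends on which item is the query."""
    return euclidean(a, b) * (1.0 + abs(a[0]))
```

For example, with the points (0, 0), (1, 0) and (2, 0), the squared Euclidean distance between the outer points is 4, while the path through the middle point costs only 1 + 1 = 2.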

  16. Symmetric Visual Retrieval: standard CBIR system and proposed CBIR system. Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal
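A minimal sketch of the retrieval step, assuming each image has already been encoded as a feature vector: the standard pipeline ranks the database with a fixed similarity such as cosine similarity, while the proposed pipeline would swap in a learned scoring function. Here that swap is modelled only as the `similarity` parameter of a hypothetical `rank_database` helper; the learned model itself is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Fixed similarity used by a standard CBIR system."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_database(query: Vector,
                  database: Sequence[Vector],
                  similarity: Similarity = cosine_similarity,
                  top_k: int = 5) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k highest-scoring database items.

    Passing a different `similarity` (e.g. a trained network's scoring
    function) changes the ranking without touching the rest of the pipeline.
    """
    scored = [(i, similarity(query, item)) for i, item in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable for ties
    return scored[:top_k]
```

Keeping the scoring function as a parameter isolates the only component that differs between the two systems shown on the slide.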

  17. Similarity Networks

  18-21. Symmetric Visual Retrieval: off-the-shelf methods; fine-tuned methods (Garcia & Vogiatzis, 2018, Learning Non-Metric Visual Similarity for Image Retrieval)

  22. Contributions
     Symmetric Visual Retrieval:
     ● CNNs for non-metric visual similarity
     ● Pushing performance on standard CBIR datasets

  23. Introduction and Background; Symmetric Visual Retrieval; Asymmetric Visual Retrieval

  24-25. Asymmetric Visual Retrieval. Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018

  26-27. Asymmetric Visual Retrieval (diagram panels: no temporal aggregation; Chapter 5; Chapter 6). Garcia & Vogiatzis (2018), ICMR 2018

  28. Asymmetric Visual Retrieval: Temporal Local Aggregation (feature indexing). Garcia & Vogiatzis (2018). Dress like a Star: Retrieving Fashion Products from Videos. In: CVF workshop ICCV 2017

  29. Asymmetric Visual Retrieval: Temporal Local Aggregation (search and retrieval). Garcia & Vogiatzis, Dress like a Star (CVF workshop ICCV 2017)
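One way to picture binary local aggregation of video features is sketched below, under loudly stated assumptions: frame features are binarized by sign, consecutive frames are merged while their codes stay within a Hamming-distance threshold of the group's first frame, one code per group is indexed, and a query image is compared against every indexed group. The helper names and the threshold rule are hypothetical and are not taken from the Dress like a Star paper.

```python
from typing import Dict, List, Sequence, Tuple

Group = Tuple[int, int, int]  # (first_frame, last_frame, binary_code)


def binarize(features: Sequence[float]) -> int:
    """Sign-binarize a feature vector: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def aggregate_frames(frame_features: Sequence[Sequence[float]],
                     max_hamming: int) -> List[Group]:
    """Hypothetical local aggregation: merge consecutive frames while their
    codes stay within max_hamming bits of the group's first frame."""
    if not frame_features:
        return []
    groups: List[Group] = []
    start, anchor = 0, binarize(frame_features[0])
    for t in range(1, len(frame_features)):
        code = binarize(frame_features[t])
        if hamming(code, anchor) > max_hamming:
            groups.append((start, t - 1, anchor))  # close the current group
            start, anchor = t, code
    groups.append((start, len(frame_features) - 1, anchor))
    return groups


def search(query_features: Sequence[float],
           index: Dict[str, List[Group]],
           top_k: int = 3) -> List[Tuple[int, str, int, int]]:
    """Compare the query code with every indexed group; smaller distance ranks first."""
    q = binarize(query_features)
    hits = [(hamming(q, code), video_id, first, last)
            for video_id, groups in index.items()
            for first, last, code in groups]
    hits.sort()
    return hits[:top_k]
```

Each hit returns the frame range of the matching group, so the result localizes the query within the video, at the cost of one comparison per indexed group.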

  30. Asymmetric Visual Retrieval (diagram panels: no temporal aggregation; Chapter 5; Chapter 6)

  31. Asymmetric Visual Retrieval: Spatio-Temporal Global Aggregation. Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

  32. Asymmetric Visual Retrieval: comparison
     Temporal Local Aggregation (Chapter 5):
     ● High accuracy
     ● High compression rates
     ● Multiple searches per query
     Spatio-Temporal Global Aggregation (Chapter 6):
     ● State-of-the-art accuracy for global aggregation
     ● High compression rates
     ● Single search per query
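The contrast in the table between multiple searches per query and a single search per query can be illustrated with one global descriptor per video. This sketch uses temporal mean pooling followed by L2 normalization as a simple stand-in for the learned spatio-temporal encoders described on the slides, and it only L2-normalizes the query, whereas an asymmetric system would pass the query image through its own encoder. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def global_video_descriptor(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in encoder (assumption): temporal mean pooling over all frames,
    then L2 normalization, giving one vector per video."""
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(frame[i] for frame in frame_features) / n for i in range(dim)]
    return l2_normalize(mean)


def search_videos(query: Sequence[float],
                  index: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Single search per query: one dot product per video descriptor."""
    q = l2_normalize(query)
    scores = [(video_id, sum(a * b for a, b in zip(q, desc)))
              for video_id, desc in index.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With one descriptor per video, a query costs one comparison per video regardless of how many frames the video contains; the price is that the descriptor no longer says where in the video the match occurs.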

  33. Contributions
     Symmetric Visual Retrieval:
     ● CNNs for non-metric visual similarity
     ● Pushing performance on standard CBIR datasets
     Asymmetric Visual Retrieval:
     ● MoviesDB: image-to-video retrieval dataset
     ● Binary descriptors for local aggregation of video features
     ● Spatio-temporal encoders for global aggregation of video features
     ● Item video retrieval application

  34. Introduction and Background; Symmetric Visual Retrieval; Asymmetric Visual Retrieval; Cross-Modal Retrieval

  35. Cross-Modal Retrieval: retrieve paintings from artistic comments
     ● Artistic comments:
        ○ Not only descriptions of the content, but also information about the author, context, techniques, etc.
     ● Fine-art paintings:
        ○ Figurative representations
     Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

  36. Cross-Modal Retrieval
     ● Visual encoding (images): VGG16, ResNet, RMAC
     ● Text encoding (comments and titles): BOW, MLP, RNN
     ● Cross-modal transformation: CCA, cosine margin loss, augmented with metadata
     (Garcia & Vogiatzis, VISART workshop ECCV 2018)
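As a sketch of the text side and of a margin-based cross-modal objective, the code below builds bag-of-words (BOW) vectors, projects them with a linear map into a space shared with image embeddings, and scores a pair with a cosine loss with margin. The loss has the same form as PyTorch's `CosineEmbeddingLoss` (1 - cos for matching pairs, max(0, cos - margin) for non-matching pairs); whether it matches the exact cosine margin loss used in the thesis is an assumption. The vocabulary, projection weights and margin value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bow_encode(text: str, vocabulary: Dict[str, int]) -> List[float]:
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vec = [0.0] * len(vocabulary)
    for token, count in Counter(text.lower().split()).items():
        if token in vocabulary:
            vec[vocabulary[token]] = float(count)
    return vec


def project(vec: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear map into the shared space; weights[j] is the row for output j."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cosine_margin_loss(image_emb: Sequence[float], text_emb: Sequence[float],
                       is_match: bool, margin: float = 0.1) -> float:
    """Pull matching pairs together; push non-matching pairs below the margin.

    margin=0.1 is an arbitrary illustrative value (assumption)."""
    c = cosine(image_emb, text_emb)
    return 1.0 - c if is_match else max(0.0, c - margin)
```

Non-matching pairs whose cosine is already below the margin contribute zero loss, so training effort concentrates on the confusable pairs.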

  37. Cross-Modal Retrieval: human comparison on an easy set and a difficult set (figure panels: same-type images; random images). (Garcia & Vogiatzis, VISART workshop ECCV 2018)


  39. Introduction and Background; Symmetric Visual Retrieval; Asymmetric Visual Retrieval; Cross-Modal Retrieval; Conclusions and Final Remarks

  40. Future Work
     Symmetric Visual Retrieval:
     ● Similarity networks for other retrieval tasks
     Asymmetric Visual Retrieval:
     ● Temporal aggregation at the scene level
     ● Asymmetric techniques for video-to-image retrieval
     Cross-Modal Retrieval:
     ● Style and content detector for cross-modal retrieval in art
     ● SemArt dataset for alternative tasks

  41. Q&A

  42. Introduction and Background; Symmetric Visual Retrieval

  43. Content-Based Image Retrieval

  44-47. Similarity Networks
     ● Input: concatenation of feature vectors
     ● Architecture: fully connected layers with ReLU
     ● Output: similarity score
     ● Loss function, with terms for the network output, the pair label, a margin, and the standard similarity
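The architecture described above (concatenated feature vectors in, fully connected layers with ReLU, a single similarity score out) can be sketched in plain Python. The layer sizes, the Gaussian weight initialisation and the class name `SimilarityNetwork` are assumptions for illustration; the weights are random and untrained, and the loss function from the slides is not reproduced. Because the two inputs are concatenated in a fixed order, `score(x1, x2)` need not equal `score(x2, x1)`, which is what lets such a network express non-metric similarity.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer: one weighted sum plus bias per output unit."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


class SimilarityNetwork:
    """Hypothetical MLP: concat(x1, x2) -> (FC + ReLU) x n -> FC -> scalar score."""

    def __init__(self, feature_dim: int, hidden_sizes: Sequence[int], seed: int = 0):
        self.feature_dim = feature_dim
        rng = random.Random(seed)  # random, untrained weights for illustration
        sizes = [2 * feature_dim, *hidden_sizes, 1]
        self.layers: List[Layer] = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def score(self, x1: Sequence[float], x2: Sequence[float]) -> float:
        if len(x1) != self.feature_dim or len(x2) != self.feature_dim:
            raise ValueError("both feature vectors must have length feature_dim")
        h: List[float] = list(x1) + list(x2)  # input: concatenation, order matters
        for i, layer in enumerate(self.layers):
            h = dense(h, layer)
            if i < len(self.layers) - 1:  # ReLU on hidden layers only
                h = relu(h)
        return h[0]
```

Usage: `SimilarityNetwork(feature_dim=4, hidden_sizes=[8, 8]).score(a, b)` returns one unbounded scalar; a trained version would be plugged in wherever a fixed similarity was used before.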
