Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang
Goal
Text-to-Image Retrieval
• Bright room with a couch and various different dressers
Image-to-Text Retrieval
• A young man doing a skateboard trick while others watch
• A man doing a skate trick during a competition event with a audience
• Guys on a course made for skate boarding
• A group of people doing skateboarding tricks on a car …
• A boy riding on his skateboard at a skate park while other guys watch …
Classical Pipeline
[Figure: an Image Encoder maps image i to an Image Feature v; a Text Encoder maps caption c (e.g. "Bright room with a couch and various different dressers") to a Text Feature t; a Similarity is computed between v and t.]
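A minimal sketch of the matching step in the classical pipeline, assuming the two encoders have already produced fixed-length feature vectors. Cosine similarity is an assumption here (the slide only says "Similarity"), and the function names and toy feature values are hypothetical, chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_feature, candidate_features):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_feature, c) for c in candidate_features]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)

# Toy stand-ins for encoder outputs (hypothetical values, not from the slides).
text_feature = [0.9, 0.1, 0.0]   # feature of the query caption
image_features = [
    [0.0, 1.0, 0.2],             # candidate image 0
    [0.8, 0.2, 0.1],             # candidate image 1
    [0.1, 0.1, 0.9],             # candidate image 2
]

# Text-to-image retrieval: rank all candidate images for one caption.
ranking = rank_candidates(text_feature, image_features)
print(ranking)
```

Image-to-text retrieval would use the same ranking function with an image feature as the query and caption features as the candidates.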
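Motivation: Look → Imagine → Match
[Figure: two panels, Text-to-Image Retrieval and Image-to-Text Retrieval. In each, image i and caption c are encoded into features v and t, which are compared by a Global Similarity and a Local Similarity. The text-to-image panel adds an imagined image î; the image-to-text panel adds an imagined caption ĉ.]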
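The slide names a Global Similarity and a Local Similarity but does not say how the two are combined. The sketch below assumes a simple weighted sum of two cosine scores; the fusion rule, the `weight` parameter, and the function names are all hypothetical and may differ from what the authors actually use.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fused_similarity(global_img, global_txt, local_img, local_txt, weight=0.5):
    """Weighted sum of a global and a local similarity score.

    ASSUMPTION: the weighted-sum rule and the default weight of 0.5 are
    illustrative only; the slides do not specify the combination.
    """
    global_score = cosine_similarity(global_img, global_txt)
    local_score = cosine_similarity(local_img, local_txt)
    return weight * global_score + (1.0 - weight) * local_score

# Toy features (hypothetical): the global features agree perfectly,
# the local features are orthogonal.
score = fused_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(score)
```

Setting `weight=1.0` reduces this to matching on the global features alone, as in the classical pipeline.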
Look → Imagine
Match
Proposed Approach
Cross-Modal Retrieval with Generative Learning
Results
Results (Classical Pipeline)
Results (Ours)
At the Poster:
• Additional details
• Quantitative results
• Discussion