Phrase-based Image Captioning



  1. Phrase-based Image Captioning. Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert. Idiap Research Institute / EPFL. ICML, 9 July 2015

  2. Image Captioning ◮ Objective: Generate descriptive sentences given a sample image. [Diagram: image → Model → "A man is grinding a ramp on a skateboard."] Rémi Lebret (Idiap Research Institute / EPFL) Phrase-based Image Captioning ICML 2015 2 / 18

  3. Related Works ◮ Recent models based on Deep CNN + RNN [Vinyals et al., Karpathy & Fei-Fei, Mao et al., Donahue et al.]. "A man is grinding a ramp on a skateboard."

  4. Related Works (continued) ◮ Visual features with Deep CNN.

  5. Related Works (continued) ◮ Sentence generation with RNN (e.g. LSTM).

  6. Related Works (continued) ◮ Can similar performance be achieved with a simpler model?

  7. Syntax Analysis of Image Descriptions. A given image i ∈ I. Ground-truth descriptions s ∈ S:
     ◮ a man riding a skateboard up the side of a wooden ramp
     ◮ a man is grinding a ramp on a skateboard
     ◮ man riding on edge of an oval ramp with a skate board
     ◮ a man in a helmet skateboarding before an audience
     ◮ a man on a skateboard is doing a trick

  8. Syntax Analysis of Image Descriptions (continued). Chunk sequence of "a man riding a skateboard up the side of a wooden ramp": NP VP NP PP NP PP NP. → Chunking approach to identify the sentence constituents.

  9. Syntax Analysis of Image Descriptions (continued). Chunk sequence of "a man is grinding a ramp on a skateboard": NP VP NP PP NP.

  10. Syntax Analysis of Image Descriptions (continued). Chunk sequence of "man riding on edge of an oval ramp with a skate board": NP VP NP PP NP PP NP.

  11. Syntax Analysis of Image Descriptions (continued). Chunk sequence of "a man in a helmet skateboarding before an audience": NP PP NP PP NP.

  12. Syntax Analysis of Image Descriptions (continued). Chunk sequence of "a man on a skateboard is doing a trick": NP PP NP VP NP.

  13. Syntax Analysis of Image Descriptions (continued) ◮ Noun phrases (NP) → Key elements in images. ◮ Verbal phrases (VP) and prepositional phrases (PP) → Interactions between elements.
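The chunk-level view of a caption can be sketched in a few lines of Python. This is only an illustration: the captions are three of the ground-truth descriptions from the slides, the word-to-chunk alignment is an assumption reconstructed from the chunk sequences shown there, and the names `CHUNKED_CAPTIONS`, `phrases_by_type` and `chunk_sequence` are hypothetical. In practice the chunks would come from an automatic chunker rather than hand annotation.

```python
from collections import defaultdict

# Hand-chunked captions taken from the slides. The alignment of words to
# chunk tags is an assumption inferred from the tag sequences shown there.
CHUNKED_CAPTIONS = [
    [("NP", "a man"), ("VP", "riding"), ("NP", "a skateboard"), ("PP", "up"),
     ("NP", "the side"), ("PP", "of"), ("NP", "a wooden ramp")],
    [("NP", "a man"), ("VP", "is grinding"), ("NP", "a ramp"), ("PP", "on"),
     ("NP", "a skateboard")],
    [("NP", "a man"), ("PP", "on"), ("NP", "a skateboard"), ("VP", "is doing"),
     ("NP", "a trick")],
]


def phrases_by_type(chunked_captions):
    """Collect the distinct phrases of each chunk type, in first-seen order."""
    grouped = defaultdict(list)
    for caption in chunked_captions:
        for tag, phrase in caption:
            if phrase not in grouped[tag]:
                grouped[tag].append(phrase)
    return dict(grouped)


def chunk_sequence(caption):
    """Return the syntactic skeleton of a caption, e.g. 'NP VP NP PP NP'."""
    return " ".join(tag for tag, _ in caption)


if __name__ == "__main__":
    for tag, phrases in phrases_by_type(CHUNKED_CAPTIONS).items():
        print(tag, phrases)
    for caption in CHUNKED_CAPTIONS:
        print(chunk_sequence(caption))
```

Grouping phrases by type gives the phrase vocabulary (NP for key elements, VP and PP for interactions), while the tag sequence captures how those phrases are arranged in a sentence.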

  14. Large-scale Syntax Analysis ◮ Two datasets: Flickr30k + COCO (≈ 560k training sentences). [Figure: appearance frequencies (%) and cumulative distribution function of the most frequent chunk sequences; the x-axis lists sequences such as "NP VP NP PP NP O", "NP PP NP VP NP O" and "NP VP NP O".] ◮ Describing images: 1. Predicting NP, VP and PP. 2. Finding how they all interact.
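The kind of statistic plotted on this slide (appearance frequency of each chunk sequence, plus its cumulative distribution) can be computed with a simple counter. The sketch below is a toy illustration, not the authors' pipeline: it uses only the five example chunk sequences from the earlier slides instead of the ≈ 560k training sentences, it omits the trailing `O` tag that appears on the figure's axis labels, and `CAPTION_SEQUENCES` and `sequence_statistics` are hypothetical names.

```python
from collections import Counter
from itertools import accumulate

# Chunk sequences of the five example captions from the earlier slides
# (toy data; the real analysis covers Flickr30k + COCO).
CAPTION_SEQUENCES = [
    "NP VP NP PP NP PP NP",
    "NP VP NP PP NP",
    "NP VP NP PP NP PP NP",
    "NP PP NP PP NP",
    "NP PP NP VP NP",
]


def sequence_statistics(sequences):
    """Rank chunk sequences by frequency.

    Returns (frequencies, cumulative) where frequencies is a list of
    (sequence, percentage) pairs sorted from most to least frequent and
    cumulative is the running fraction of captions covered.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    ranked = counts.most_common()
    frequencies = [(seq, 100.0 * n / total) for seq, n in ranked]
    cumulative = list(accumulate(n / total for _, n in ranked))
    return frequencies, cumulative


if __name__ == "__main__":
    frequencies, cumulative = sequence_statistics(CAPTION_SEQUENCES)
    for (seq, pct), cdf in zip(frequencies, cumulative):
        print(f"{seq:<22} {pct:5.1f}%  cumulative={cdf:.2f}")
```

On a real corpus, the cumulative column shows how many captions a small set of frequent syntactic patterns covers.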

  15. Phrase-based Model for Image Descriptions. Our approach:
     1. A bilinear model that learns a metric between an image and phrases used to describe it.
     2. Sentences generated using a simple language model based on caption syntax statistics.

  16. A Bilinear Model $U^{\top} V$
     I = set of training images; C = set of all phrases used to describe I.
     Trainable parameters θ: $U = (u_{c_1}, \dots, u_{c_{|C|}}) \in \mathbb{R}^{m \times |C|}$ and $V \in \mathbb{R}^{m \times n}$.
     Example phrases: NP: a man, a skate board, a wooden ramp. VP: riding, is grinding. PP: on, with.
     Example captions:
     ◮ A man in a helmet skateboarding before an audience.
     ◮ Man riding on edge of an oval ramp with a skate board.
     ◮ A man riding a skateboard up the side of a wooden ramp.
     ◮ A man on a skateboard is doing a trick.
     ◮ A man is grinding a ramp on a skateboard.
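A minimal sketch of how such a bilinear score could be evaluated, under assumptions the slide does not spell out: the image is taken to be represented by a feature vector `z` of length n (for example, CNN features), and phrase c is scored as $u_c^{\top} V z$. The toy matrices, the helper `matvec`, and the functions `phrase_scores` and `rank_phrases` are hypothetical and use plain Python lists to stay dependency-free. Training of U and V is not shown.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def phrase_scores(U, V, z):
    """Score every phrase against one image.

    U: m x |C| matrix whose columns are phrase embeddings u_c.
    V: m x n matrix mapping image features into the phrase space.
    z: length-n image feature vector (an assumption, not on the slide).
    Returns a list of |C| scores, where score_c = u_c^T V z.
    """
    projected = matvec(V, z)  # V z, a vector of length m
    return [sum(u * p for u, p in zip(column, projected))
            for column in zip(*U)]  # zip(*U) iterates over columns of U


def rank_phrases(U, V, z, phrases):
    """Return (phrase, score) pairs sorted from best to worst score."""
    scored = zip(phrases, phrase_scores(U, V, z))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example with m = 2, n = 3 and three phrases (values are made up).
PHRASES = ["a man", "riding", "on"]
U = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
V = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
z = [2.0, -1.0, 5.0]

if __name__ == "__main__":
    for phrase, score in rank_phrases(U, V, z, PHRASES):
        print(f"{phrase:<8} {score:+.1f}")
```

With trained parameters, the highest-scoring NP, VP and PP candidates for an image would be the ones handed to the sentence-generation step.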
