SPICE: Semantic Propositional Image Caption Evaluation
Presented to the COCO Consortium, Sept 2016
Peter Anderson¹, Basura Fernando¹, Mark Johnson² and Stephen Gould¹
¹ Australian National University
² Macquarie University
ARC Centre of Excellence for Robotic Vision
Image captioning
Sources: MS COCO Captions dataset; http://aipoly.com/
Automatic caption evaluation
• Benchmark datasets require evaluation metrics that are fast to compute, accurate and inexpensive
• Good metrics can be used to help construct better models
The Evaluation Task: Given a candidate caption c_i and a set of m reference captions R_i = {r_i1, …, r_im}, compute a score S_i that represents the similarity between c_i and R_i.
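As a concrete illustration of this interface (not from the original slides), a caption metric is simply a function from one candidate and m references to a scalar score. The sketch below uses hypothetical names and a toy unigram-overlap F1 as the similarity; none of the published metrics are implemented here.

```python
# Minimal sketch of the caption-evaluation interface described above.
# evaluate_caption() and unigram_f1() are illustrative names, and the toy
# unigram F1 stands in for a real metric such as BLEU, CIDEr or SPICE.

def unigram_f1(candidate: str, reference: str) -> float:
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate_caption(candidate: str, references: list[str]) -> float:
    """Score S_i for candidate c_i against references R_i = {r_i1, ..., r_im}."""
    return max(unigram_f1(candidate, r) for r in references)

print(round(evaluate_caption(
    "a young girl standing on a tennis court",
    ["a girl stands on a tennis court", "a child playing tennis"],
), 3))
```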
Existing state of the art
• Nearest neighbour captions often ranked higher than human captions
Source: Lin Cui, Large-scale Scene UNderstanding Workshop, CVPR 2015
Existing metrics
• BLEU: Precision over n-grams, geometric mean, with brevity penalty
• METEOR: Align fragments, take harmonic mean of precision & recall
• ROUGE-L: F-score based on Longest Common Subsequence
• CIDEr: Cosine similarity with TF-IDF weighting
Motivation
'False positive' (high n-gram similarity):
• "A young girl standing on top of a tennis court."
• "A giraffe standing on top of a green field."
'False negative' (low n-gram similarity):
• "A shiny metal pot filled with some diced veggies."
• "The pan on the stove has chopped vegetables in it."
…n-gram overlap is neither necessary nor sufficient for two sentences to mean the same thing
…SPICE primarily addresses false positives
Source: MS COCO Captions dataset
Is this a good caption?
"A young girl standing on top of a basketball court"
Is this a good caption?
"A young girl standing on top of a basketball court"
Semantic propositions:
1. There is a girl
2. The girl is young
3. The girl is standing
4. There is a court
5. The court is used for basketball
6. The girl is on the court
Key Idea – scene graphs¹
Pipeline: 1. Input → 2. Parse² → 3. Scene Graph³ → 4. Tuples
Tuples: (girl), (court), (girl, young), (girl, standing), (court, tennis), (girl, on-top-of, court)
¹ Johnson et al., Image Retrieval Using Scene Graphs, CVPR 2015
² Klein & Manning, Accurate Unlexicalized Parsing, ACL 2003
³ Schuster et al., Generating semantically precise scene graphs from textual descriptions for improved image retrieval, EMNLP 2015
SPICE Calculation
SPICE is calculated as an F-score over tuples, with:
• Merging of synonymous nodes, and
• WordNet synsets used for tuple matching and merging.
Given a candidate caption c, a set of reference captions S, and the mapping T from captions to tuples:
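The equation itself appeared as an image on the original slide; the block below restates the definition following the SPICE paper, where ⊗ denotes the binary matching operator over tuple sets:

```latex
% Precision, recall and F-score over semantic proposition tuples.
% T(c) and T(S) are the tuple sets of the candidate and the references;
% \otimes is the binary matching operator (WordNet-aware tuple matching).
\begin{align}
P(c, S) &= \frac{|T(c) \otimes T(S)|}{|T(c)|} \\
R(c, S) &= \frac{|T(c) \otimes T(S)|}{|T(S)|} \\
\mathrm{SPICE}(c, S) &= F_1(c, S) = \frac{2 \cdot P(c, S) \cdot R(c, S)}{P(c, S) + R(c, S)}
\end{align}
```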
Example – good caption
Example – weak caption
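The per-caption scores shown in these example slides come from tuple overlap. As a rough, self-contained illustration (exact tuple matching only, without the WordNet synonym merging the real metric uses, and with hypothetical tuple sets), the sketch below scores a good and a weak candidate against the same reference tuples:

```python
# Toy illustration of SPICE-style scoring: an F-score over proposition tuples.
# Exact matching only; the real metric also merges WordNet synonyms.

def tuple_fscore(candidate_tuples, reference_tuples):
    matched = len(set(candidate_tuples) & set(reference_tuples))
    if matched == 0:
        return 0.0
    precision = matched / len(set(candidate_tuples))
    recall = matched / len(set(reference_tuples))
    return 2 * precision * recall / (precision + recall)

# Hypothetical tuples for the "young girl on a tennis court" image.
reference = {("girl",), ("court",), ("girl", "young"), ("girl", "standing"),
             ("court", "tennis"), ("girl", "on-top-of", "court")}

good_caption = {("girl",), ("court",), ("girl", "young"),
                ("girl", "standing"), ("court", "tennis")}
weak_caption = {("girl",), ("court",), ("court", "basketball")}

print(round(tuple_fscore(good_caption, reference), 2))  # ~0.91
print(round(tuple_fscore(weak_caption, reference), 2))  # ~0.44
```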
Evaluation – MS COCO (C40)
Pearson ρ correlation between evaluation metrics and human judgments for the 15 competition entries plus human captions in the 2015 COCO Captioning Challenge, using 40 reference captions.
Source: Our thanks to the COCO Consortium for performing this evaluation using MS COCO Captions C40.
Evaluation – MS COCO (C40)
SPICE picks the same top-5 as human evaluators. Absolute scores are lower with 40 reference captions than with 5 reference captions.
Source: Our thanks to the COCO Consortium for performing this evaluation using MS COCO Captions C40.
Gameability
• SPICE measures how well caption models recover objects, attributes and relations
• Fluency is neglected (as with n-gram metrics)
• If fluency is a concern, include a fluency metric such as surprisal*
* Hale, J.: A Probabilistic Earley Parser as a Psycholinguistic Model, 2001; Levy, R.: Expectation-based Syntactic Comprehension, 2008
SPICE for error analysis
Breakdown of SPICE F-score over objects, attributes and relations
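One way to produce such a breakdown (a sketch, not the official SPICE implementation) is to split the proposition tuples by arity: objects as 1-tuples, attributes as 2-tuples, relations as 3-tuples, then compute the F-score within each subset. The snippet reuses tuple_fscore() and the example tuples from the earlier sketch.

```python
# Sketch of a per-category SPICE breakdown: group tuples by arity
# (1 = object, 2 = attribute, 3 = relation) and score each group separately.
# Reuses tuple_fscore(), weak_caption and reference from the earlier sketch;
# illustrative only, not the official SPICE code.

CATEGORIES = {1: "objects", 2: "attributes", 3: "relations"}

def spice_breakdown(candidate_tuples, reference_tuples):
    scores = {}
    for arity, name in CATEGORIES.items():
        cand = {t for t in candidate_tuples if len(t) == arity}
        ref = {t for t in reference_tuples if len(t) == arity}
        scores[name] = tuple_fscore(cand, ref)
    return scores

print(spice_breakdown(weak_caption, reference))
# {'objects': 1.0, 'attributes': 0.0, 'relations': 0.0}
```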
Can caption models count?
Breakdown of attribute F-score over color, number and size attributes
Summary
• SPICE measures how well caption models recover objects, attributes and relations
• SPICE captures human judgment better than CIDEr, BLEU, METEOR and ROUGE
• Tuples can be categorized for detailed error analysis
• Scope for further improvement as better semantic parsers are developed
• Next steps: Using SPICE to build better caption models!
Thank you
Link: SPICE Project Page (http://panderson.me/spice)
Acknowledgement: We are grateful to the COCO Consortium for re-evaluating the 2015 Captioning Challenge entries using SPICE.