image as a single label



  1. Image as a single label: “king crab”. Image Source: ImageNet

  2. Image as an object set: Man, Person, Woman, Woman, Girl, Coat, King crab, Box. Image Source: ImageNet

  3. Image as a scene graph. Objects: Man, Woman, Woman, Woman, Girl, Coat, King crab, Box. Relationships: embrace, hold, wear, look at (“Man hold king crab”, “Woman wear coat”, “Man embrace woman”, “Woman look at box”). Image Source: ImageNet

  4. Image as a scene graph. Objects: Man, Woman, Woman, Woman, Girl, Coat, King crab, Box. Relationships: embrace, hold, wear, look at (“Man hold king crab”, “Woman wear coat”, “Man embrace woman”, “Woman look at box”). Attributes: “Red king crab”, “Transparent box”, “Blue coat”, “Smiling woman”, “Smiling man”. Image Source: ImageNet
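The scene-graph view on the slides above can be written down as a small data structure. The following is a minimal sketch, not code from the presentation: the `SceneGraph` class and its method names are hypothetical, and the several "Woman" nodes on the slide are collapsed into one node because the slide text does not say which woman takes part in which relationship.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Objects are nodes; attributes and relationships refer to nodes by index."""
    objects: list = field(default_factory=list)
    attributes: list = field(default_factory=list)     # (object index, attribute)
    relationships: list = field(default_factory=list)  # (subject index, predicate, object index)

    def add_object(self, name):
        """Append a node and return its index."""
        self.objects.append(name)
        return len(self.objects) - 1

    def phrases(self):
        """Render attributes and relationships as short phrases."""
        attr = [f"{a} {self.objects[i]}" for i, a in self.attributes]
        rel = [f"{self.objects[s]} {p} {self.objects[o]}"
               for s, p, o in self.relationships]
        return attr + rel


graph = SceneGraph()
man = graph.add_object("man")
woman = graph.add_object("woman")  # assumption: one node stands in for all women
crab = graph.add_object("king crab")
box = graph.add_object("box")
coat = graph.add_object("coat")

# Attributes and relationships quoted on the slide.
graph.attributes += [(crab, "red"), (box, "transparent"), (coat, "blue"),
                     (woman, "smiling"), (man, "smiling")]
graph.relationships += [(man, "hold", crab), (woman, "wear", coat),
                        (man, "embrace", woman), (woman, "look at", box)]

for phrase in graph.phrases():
    print(phrase)
```

Dropping `relationships` and `attributes` leaves the object-set view of slide 2, and keeping only one entry of `objects` gives the single-label view of slide 1.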

  5. Why do we need scene graphs? To distinguish images more accurately. Both images contain Man, Hat and Horse; only the relationship differs: walking with (left) vs. feeding (right). [1] Image Retrieval using Scene Graphs. Johnson et al. CVPR 2015. Left: https://cals.ncsu.edu/wp-content/uploads/2016/08/horse-1500x931.png Right: https://www.videoblocks.com/video/the-man-in-hat-feed-a-brown-horse-with-flowers-on-the-meadow-supmox_3xj0tvkb67

  6. Why do we need scene graphs? To describe images in a more grounded way. Both images contain Man, Hat and Horse; the captions differ: “a man is walking with a horse” (left) vs. “the man is feeding a horse” (right). [1] Auto-Encoding Scene Graphs for Image Captioning. Yang et al. arXiv 2018. [2] Exploring Visual Relationship for Image Captioning. Yao et al. ECCV 2018. Left: https://cals.ncsu.edu/wp-content/uploads/2016/08/horse-1500x931.png Right: https://www.videoblocks.com/video/the-man-in-hat-feed-a-brown-horse-with-flowers-on-the-meadow-supmox_3xj0tvkb67

  7. Why do we need scene graphs? To answer questions more precisely. Both images contain Man, Hat and Horse. Q: What is the man walking with? A: A horse. Q: Is the man feeding a horse? A: Yes. [1] Graph-Structured Representations for Visual Question Answering. Teney et al. CVPR 2017. [2] Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding. Yi et al. NeurIPS 2018. Left: https://cals.ncsu.edu/wp-content/uploads/2016/08/horse-1500x931.png Right: https://www.videoblocks.com/video/the-man-in-hat-feed-a-brown-horse-with-flowers-on-the-meadow-supmox_3xj0tvkb67

  8. Why do we need scene graphs? To generate more grounded questions. Both images contain Man, Hat and Horse. Q: What animal is the man walking with? Q: What is the man doing with the horse? [1] Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition. Yang et al. CoRL 2018. [2] Information Maximizing Visual Question Generation. Krishna et al. CVPR 2019. Left: https://cals.ncsu.edu/wp-content/uploads/2016/08/horse-1500x931.png Right: https://www.videoblocks.com/video/the-man-in-hat-feed-a-brown-horse-with-flowers-on-the-meadow-supmox_3xj0tvkb67

  9. [Diagram] Communication between the Visual System (Scene Graph generator) and the Human.

  10. [Diagram] Communication between the Visual System (Scene Graph generator) and the Human. Visual Question Answering: Answer Questions.

  11. [Diagram] Communication between the Visual System (Scene Graph generator) and the Human. Visual Question Answering: Answer Questions. Visual Question Generation: Ask Questions.

  12. [Diagram] Communication between the Visual System (Scene Graph generator) and the Human. Visual Question Answering: Answer Questions. Visual Question Generation: Ask Questions.

  13. Skeleton Model

  14. Skeleton Model: Input

  15. Skeleton Model: Input → RPN → Region Proposals

  16. Skeleton Model: Input → RPN → Region Proposals → ROI Pooling → Object Features and Relationship Features

  17. Skeleton Model: Input → RPN → Region Proposals → ROI Pooling → Object Features and Relationship Features → Object Scores and Relationship Scores

  18. Skeleton Model: Input → RPN → Region Proposals → ROI Pooling → Object Features and Relationship Features → Object Scores and Relationship Scores. Example object labels: Cup, Dog, Person, Book, TV, Cat. Example relationship labels: Hold, In, On, Watch, Left of, Right of.

  19. Iterative Message Passing (IMP): the skeleton model (Input → RPN → Region Proposals → ROI Pooling → Object and Relationship Features → Object and Relationship Scores) with Message Passing for Feature Updating. Scene Graph Generation by Iterative Message Passing. Xu et al. CVPR 2017

  20. Multi-level Scene Description Network (MSDN): the skeleton model with Region Captions added, and Message Passing for Feature Updating. Scene Graph Generation from Objects, Phrases and Region Captions. Li et al. ICCV 2017

  21. Neural Motif Network: the skeleton model with Feature Updating, Score Updating and a Frequency Prior. Neural Motifs: Scene Graph Parsing with Global Context. Zellers et al. CVPR 2018

  22. Graph R-CNN (Our work): the skeleton model with Message Passing for both Feature Updating and Score Updating. Neural Motifs: Scene Graph Parsing with Global Context. Zellers et al. CVPR 2018

  23. Graph R-CNN (Our work): the skeleton model with Message Passing for both Feature Updating and Score Updating, plus a Relation Proposal Network (RePN). Jianwei Yang*, Jiasen Lu*, Stefan Lee, Dhruv Batra, Devi Parikh. Graph R-CNN for Scene Graph Generation. ECCV 2018.
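The "Frequency Prior" named on the Neural Motif Network slide can be illustrated with a counting exercise. This is a rough sketch under stated assumptions, not the method from the cited paper: it only estimates how often each relationship label occurs for a given (subject, object) label pair. The function name `fit_frequency_prior` and the training triples are hypothetical; the labels are borrowed from the skeleton-model slides.

```python
from collections import Counter, defaultdict


def fit_frequency_prior(triples):
    """Estimate P(predicate | subject label, object label) by counting."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1
    prior = {}
    for pair, counter in counts.items():
        total = sum(counter.values())
        prior[pair] = {pred: n / total for pred, n in counter.items()}
    return prior


# Hypothetical training annotations (made up for illustration).
training = [
    ("person", "hold", "cup"),
    ("person", "hold", "cup"),
    ("person", "watch", "tv"),
    ("person", "watch", "tv"),
    ("person", "left of", "tv"),
    ("cat", "on", "book"),
]

prior = fit_frequency_prior(training)
print(prior[("person", "tv")])
```

A prior like this depends only on the two object categories, which is why it is typically combined with scores computed from the visual features rather than used on its own.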

  24. Motivations. [Figure, panels (a)-(d): example scene graphs with objects car, building, wheel, boy, fire hydrant, sweater and relationships behind, next to, on, near, wear.]

  25. Motivations. [Figure, panels (a)-(d), as on slide 24.] 1. Objects in a scene usually have relationships with others.

  26. Motivations. [Figure, panels (a)-(d), as on slide 24.] 1. Objects in a scene usually have relationships with others. 2. Not all object pairs have relationships; the scene graph is usually sparse.

  27. Motivations. [Figure, panels (a)-(d), as on slide 24.] 1. Objects in a scene usually have relationships with others. 2. Not all object pairs have relationships; the scene graph is usually sparse. 3. The existence of a relationship depends heavily on the object categories, and the type of relationship depends heavily on the context.
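The sparsity point can be made concrete by counting. The sketch below uses the object labels from the motivation figure; the annotated edges are hypothetical, since the slide text does not preserve which objects each relationship connects.

```python
from itertools import permutations

objects = ["car", "building", "wheel", "boy", "fire hydrant", "sweater"]

# Hypothetical annotations: the pairing of labels is an assumption.
annotated = {
    ("wheel", "car"): "on",
    ("boy", "sweater"): "wear",
    ("car", "building"): "next to",
    ("boy", "fire hydrant"): "near",
}

# A fully connected graph scores every ordered (subject, object) pair.
candidate_pairs = list(permutations(objects, 2))
density = len(annotated) / len(candidate_pairs)

print(len(candidate_pairs), len(annotated), round(density, 3))
```

With n objects there are n(n-1) ordered candidate pairs, so six objects already give 30 candidates, most of which carry no relationship in this toy annotation.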

  28. Framework. [Figure: Conv Feature → Object Relational Proposal Network (RePN) → Attentional GCNs (aGCN) → Scene Graph. Graph stages: Dense graph → Sparse graph → Attentional graph. RePN panel: Subject and Object fc layers, Score Matrix. aGCN panel: Source, Target, Attention, fc, ReLU; 1st, 2nd and 3rd layer; example values 0.2, 0.3, 0.05. Example labels: bird, tree, head, leaf, wings, tails, branch; has, on, of, behind, in, stand.]

  29. Framework. [Figure as on slide 28.]

  30. Framework. [Figure as on slide 28.] 1. Relation proposal network (RePN) to learn to prune the densely connected scene graph.

  31. Framework. [Figure as on slide 28.] 1. Relation proposal network (RePN) to learn to prune the densely connected scene graph. 2. Attentional graph convolutional networks (aGCN) to incorporate the contextual information.
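The two framework components can be sketched in a few lines of plain Python. This is a toy illustration under heavy assumptions, not the Graph R-CNN implementation: there are no learned weights, the subject and object vectors are given directly instead of coming from fc layers, pruning is a plain top-k, and attention is a softmax over dot products. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relatedness(subject_vecs, object_vecs):
    """Sigmoid of a dot product for every ordered pair (i, j) with i != j."""
    scores = {}
    for i, s in enumerate(subject_vecs):
        for j, o in enumerate(object_vecs):
            if i != j:
                scores[(i, j)] = 1.0 / (1.0 + math.exp(-dot(s, o)))
    return scores


def prune(scores, k):
    """Keep the k highest-scoring pairs (dense graph -> sparse graph)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentional_update(features, edges):
    """One attention-weighted aggregation step over the kept edges."""
    updated = []
    for i, f in enumerate(features):
        # Neighbours of node i: any node sharing a kept edge with it.
        neighbours = sorted({j for a, j in edges if a == i}
                            | {a for a, j in edges if j == i})
        agg = [0.0] * len(f)
        if neighbours:
            weights = softmax([dot(f, features[j]) for j in neighbours])
            for w, j in zip(weights, neighbours):
                agg = [x + w * y for x, y in zip(agg, features[j])]
        # Residual sum followed by ReLU.
        updated.append([max(0.0, x + y) for x, y in zip(f, agg)])
    return updated


# Toy 2-d features for four detected objects (made-up numbers).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

scores = relatedness(features, features)  # 12 candidate pairs
kept = prune(scores, k=2)                 # sparse graph
updated = attentional_update(features, kept)
print(kept)
print(updated)
```

Nodes left without any kept edge only pass through the ReLU, so pruning controls which objects exchange context in the update step.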
