
The language of visual attributes: Kristen Grauman, Facebook AI (PowerPoint presentation)



  1. The language of visual attributes. Kristen Grauman, Facebook AI Research; University of Texas at Austin

  2. Attributes vs. objects. Attributes (red, round, ripe, fresh) are visual properties; objects are physical entities.

  3. Value of attributes:
     • Interactive recognition: “What color is the beak?”
     • Visual search: “Find a more formal shoe”
     • Zero-shot learning: “Zebras have stripes and four legs…”
     • Image/video description: “A lone cow grazes in a green pasture.”
     [Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Wang & Mori 2010, Berg et al. 2010, Parikh & Grauman 2011, Branson et al. 2010, Kovashka et al. 2012, Kulkarni et al. 2011, Wang et al. 2016, Liu et al. 2015, Singh et al. 2016, …]

  4. The language of visual attributes:
     • Attributes as operators: attributes as adjectives that modify objects as nouns
     • Attributes for comparisons: the relative differences that people describe first
     • Attributes for visual styles: semantic topic models for data-driven styles

  5. Attributes and objects. Attributes (red, round, ripe, fresh) are visual properties; objects are physical entities. Attributes and objects are fundamentally different.

  6. Attribute and object representations. Yet the status quo treats attributes and objects the same way, as latent vector encodings (figure: separate encodings for “sliced” and “apple”), e.g., Wang CVPR16, Liu CVPR15, Singh ECCV16, Lu CVPR17, Su ECCV16, …

  7. Attribute vs. object representations. An object has a prototypical instance (a prototypical “car”), but what would a prototypical “sliced” instance be?

  8. Challenges for the status quo approach. An object-agnostic attribute representation has to capture the attribute’s interactions with every object.

  9. Challenges for the status quo approach. An object-agnostic attribute representation has to capture the attribute’s distinct manifestations, e.g., “old car” vs. “old man”.

  10. Our idea – Attributes as operators. Attributes are operators that transform object encodings. [Nagarajan & Grauman, ECCV 2018]

  11. Our idea – Attributes as operators. Objects are vectors; attributes are operators. A composition T is an attribute operator transforming an object vector. [Nagarajan & Grauman, ECCV 2018]
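A minimal sketch of the composition on this slide, assuming attribute operators are square matrices and object encodings are vectors of the same dimension, so that a composition is a matrix-vector product. The 2-D embeddings and matrices below are hypothetical toy values, not learned parameters from the paper.

```python
def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical 2-D object embeddings ("objects are vectors").
objects = {"apple": [1.0, 0.0], "orange": [0.0, 1.0]}

# Hypothetical attribute operators ("attributes are operators").
attributes = {
    "sliced": [[0.5, 0.0], [0.0, 0.5]],  # toy scaling
    "ripe": [[1.0, 1.0], [0.0, 1.0]],    # toy shear
}

def compose(attr, obj):
    """Composition T(attr, obj): the attribute operator applied to the object vector."""
    return matvec(attributes[attr], objects[obj])

print(compose("sliced", "apple"))  # [0.5, 0.0]
print(compose("ripe", "orange"))   # [1.0, 1.0]
```

Because every attribute is a function of the object vector, the same operator can produce different results for different objects, which is the property the previous slides ask for.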

  12. Linguistically inspired regularizers. Antonym consistency: “unripe” should undo the effect of “ripe”. [Nagarajan & Grauman, ECCV 2018]
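Antonym consistency can be sketched as a penalty term, assuming operators are matrices acting on object vectors and using squared Euclidean distance as a stand-in for whatever distance the paper actually uses. The operators and the object vector are hypothetical.

```python
def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def antonym_loss(M_attr, M_antonym, v_obj):
    """Penalty when the antonym operator fails to undo the attribute operator:
    ||M_antonym (M_attr v_obj) - v_obj||^2 (the squared distance is an assumption)."""
    return sq_dist(matvec(M_antonym, matvec(M_attr, v_obj)), v_obj)

# Hypothetical toy operators and object vector.
ripe = [[2.0, 0.0], [0.0, 1.0]]
unripe_inverse = [[0.5, 0.0], [0.0, 1.0]]   # exactly undoes "ripe"
unripe_identity = [[1.0, 0.0], [0.0, 1.0]]  # leaves the "ripe" effect in place
orange = [1.0, 3.0]

print(antonym_loss(ripe, unripe_inverse, orange))   # 0.0
print(antonym_loss(ripe, unripe_identity, orange))  # 1.0
```

Added to the training objective with some weight, a term like this pushes the learned “unripe” operator toward acting as an inverse of “ripe”.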

  13. Linguistically inspired regularizers. Attribute commutation: attribute effects should stack, regardless of the order in which they are applied. [Nagarajan & Grauman, ECCV 2018]
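The commutation idea can likewise be sketched as a penalty on the difference between applying two attribute operators in either order, again assuming matrix operators and a squared Euclidean distance chosen for illustration. The matrices are hypothetical: diagonal matrices commute, while a shear and a scaling generally do not.

```python
def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def commutation_loss(M_a, M_b, v_obj):
    """Penalty ||M_a (M_b v) - M_b (M_a v)||^2: zero when the order does not matter."""
    ab = matvec(M_a, matvec(M_b, v_obj))
    ba = matvec(M_b, matvec(M_a, v_obj))
    return sq_dist(ab, ba)

scale = [[2.0, 0.0], [0.0, 3.0]]   # hypothetical operator
scale2 = [[4.0, 0.0], [0.0, 5.0]]  # hypothetical operator (commutes with `scale`)
shear = [[1.0, 1.0], [0.0, 1.0]]   # hypothetical operator (does not commute with `scale`)
v = [1.0, 1.0]

print(commutation_loss(scale, scale2, v))  # 0.0
print(commutation_loss(scale, shear, v))   # 1.0
```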

  14. Learning attribute operators [Nagarajan & Grauman, ECCV 2018]

  15. Learning attribute operators: a triplet loss is used to learn the embedding space. [Nagarajan & Grauman, ECCV 2018]

  16. Learning attribute operators: a triplet loss [plus linguistic regularizers] is used to learn the embedding space, initialized with GloVe word embeddings [Pennington et al., EMNLP 2014].
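One way to picture the training objective is a hinge triplet loss between an image embedding, its true composition, and a wrong composition, plus weighted regularizer values such as the antonym and commutation penalties. This is a sketch under assumptions: the squared distance, the margin of 0.5, and the weights are hypothetical, and the GloVe initialization of object vectors is omitted.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss: the image embedding (anchor) should be closer to its true
    composition (positive) than to a wrong one (negative) by at least `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def total_loss(anchor, positive, negative, regularizers, weights, margin=0.5):
    """Triplet loss plus weighted regularizer values (e.g., antonym and commutation terms)."""
    return triplet_loss(anchor, positive, negative, margin) + sum(
        w * r for w, r in zip(weights, regularizers)
    )

image = [0.0, 0.0]       # hypothetical image embedding
true_comp = [1.0, 0.0]   # hypothetical composed vector for the correct pair
wrong_far = [2.0, 0.0]   # a wrong composition that is already far away
wrong_near = [0.0, 1.0]  # a wrong composition as close as the true one

print(triplet_loss(image, true_comp, wrong_far))   # 0.0
print(triplet_loss(image, true_comp, wrong_near))  # 0.5
print(total_loss(image, true_comp, wrong_near, regularizers=[0.2], weights=[0.5]))
```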

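  17. Learning attribute operators: because each attribute is an operator, it can be applied to objects it was never paired with during training, which allows unseen compositions. [Nagarajan & Grauman, ECCV 2018]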
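Recognizing unseen compositions can be sketched as scoring every attribute-object pair, including pairs never observed together, and picking the composed vector nearest the image embedding. The embeddings, operators, and the nearest-neighbor rule with squared distance are hypothetical choices for illustration.

```python
def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def predict_composition(image_embedding, attributes, objects):
    """Score every (attribute, object) pair, seen or unseen in training, and return
    the pair whose composed vector is nearest to the image embedding."""
    pairs = [(a, o) for a in attributes for o in objects]
    return min(pairs, key=lambda p: sq_dist(image_embedding,
                                            matvec(attributes[p[0]], objects[p[1]])))

objects = {"apple": [1.0, 0.0], "orange": [0.0, 1.0]}   # hypothetical
attributes = {"sliced": [[0.5, 0.0], [0.0, 0.5]],       # hypothetical
              "unripe": [[1.0, 0.0], [1.0, 1.0]]}

# Even if "sliced orange" never appeared in training, it can be composed and scored.
print(predict_composition([0.1, 0.45], attributes, objects))  # ('sliced', 'orange')
```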

  18. Evaluation datasets:
     • UT-Zappos50K (Yu & Grauman, CVPR 14): 16 attributes × 12 objects
     • MIT States (Isola et al., CVPR 15): 115 attributes × 245 objects

  19. Evaluating our composition model (figure: example images of sliced carrot, unripe orange, sliced orange, diced carrot, diced onion, and sliced apple, split into train time and test time)

  20. Evaluating our composition model: at test time, the model must recognize attribute-object combinations never seen during training (figure: sliced carrot, unripe orange, sliced orange, diced carrot, diced onion, and sliced apple, split into train time and test time)

  21. Results – Attribute+object composition recognition:
     • MIT States: 6% increase in the open-world setting (3% in harmonic mean, h-mean)
     • UT-Zappos: 14% increase in the open-world setting (12% in h-mean)
     Baselines: *Misra et al., CVPR 2017; #Chen & Grauman, CVPR 2014. [Nagarajan & Grauman, ECCV 2018]

  22. Results – Retrieving unseen compositions: for the query “rusty lock”, the nearest images in ImageNet.

  23. The language of visual attributes:
     • Attributes as operators: attributes as adjectives that modify objects as nouns
     • Attributes for comparisons: the relative differences that people describe first
     • Attributes for visual styles: semantic topic models for data-driven styles

  24. Relative attributes: is an image smiling or not smiling (???), or is one image more smiling than another (>?)? [Parikh & Grauman, ICCV 2011; Singh & Lee, ECCV 2016]

  25. Relative attributes: rather than a binary smiling / not smiling label, compare image pairs (<, >?) by learning a ranking function per attribute. [Parikh & Grauman, ICCV 2011; Singh & Lee, ECCV 2016]

  26. Relative attributes: compare images by an attribute’s “strength” (examples: bright, smiling, natural). [Parikh & Grauman, ICCV 2011]
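Learning a ranking function per attribute can be sketched as a linear ranker r(x) = w·x trained on ordered pairs with a pairwise hinge loss, in the spirit of RankSVM, optimized here by plain subgradient descent. The features, pairs, and hyperparameters are hypothetical, and the cited works use their own formulations (Singh & Lee, for example, train a deep network).

```python
def dot(w, x):
    """Linear ranking score r(x) = w . x."""
    return sum(a * b for a, b in zip(w, x))

def train_ranker(pairs, dim, lr=0.1, reg=0.01, epochs=100):
    """Fit w so that dot(w, x_more) exceeds dot(w, x_less) by a margin of 1 where possible.
    `pairs` holds (x_more, x_less): the first image shows MORE of the attribute."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_more, x_less in pairs:
            grad = [reg * wk for wk in w]                 # L2 regularizer gradient
            if dot(w, x_more) - dot(w, x_less) < 1.0:      # pairwise hinge is active
                grad = [g - (a - b) for g, a, b in zip(grad, x_more, x_less)]
            w = [wk - lr * g for wk, g in zip(w, grad)]
    return w

# Hypothetical 2-D features for "smiling": [mouth-curvature cue, unrelated cue].
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.7, 0.5], [0.3, 0.4]),
    ([0.6, 0.1], [0.2, 0.2]),
]
w_smiling = train_ranker(pairs, dim=2)
print(dot(w_smiling, [0.8, 0.3]) > dot(w_smiling, [0.2, 0.3]))  # True
```

The learned score gives each image an attribute “strength”, so any two images can be compared, not only those seen in labeled pairs.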

  27. Challenge #1: fine-grained comparisons. Which is more sporty? Coarse pairs vs. fine-grained pairs. Sparsity of supervision problem:
     1. Label availability: there are many possible pairs to label.
     2. Image availability: subtle differences are hard to curate.

  28. Idea: Semantic jitter. Overcome the sparsity of available fine-grained image pairs with attribute-conditioned image generation. Status quo: low-level jitter. Our idea: semantic jitter, which varies attributes such as sporty, open, and comfort up (+) or down (-). [Yu & Grauman, ICCV 2017]

  29. Semantic jitter for attribute learning: train rankers with both real and synthetic image pairs, and test on real fine-grained pairs (figure: attribute accuracy on Faces and Shoes for rankers trained with real pairs vs. with synthetic pairs added). Ranking functions are trained with deep spatial transformer ranking networks [Singh & Lee 2016] or Local RankSVM [Yu & Grauman 2014]. [Yu & Grauman, ICCV 2017]
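Semantic jitter can be sketched as follows, assuming access to some attribute-conditioned generator; here `toy_generator` is a hypothetical placeholder, not the generative model used by Yu & Grauman. The point is that two samples drawn from the same latent code, with one attribute nudged up or down, form a fine-grained pair whose ordering label is known by construction, so they can be mixed with real pairs when training rankers.

```python
import random

ATTRS = ["sporty", "open", "comfort"]

def toy_generator(latent, attr_values):
    """Hypothetical stand-in for an attribute-conditioned image generator: the fake
    'image' is the latent code followed by the attribute values, in ATTRS order."""
    return list(latent) + [attr_values[a] for a in ATTRS]

def semantic_jitter_pairs(attr, n_pairs, delta=0.1, seed=0):
    """Create synthetic ordered pairs (more_attr_image, less_attr_image) that differ
    only in `attr`, so the ordering label comes for free."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        latent = [rng.gauss(0.0, 1.0) for _ in range(4)]  # shared appearance code
        base = {a: rng.uniform(0.2, 0.8) for a in ATTRS}
        more = dict(base, **{attr: base[attr] + delta})
        less = dict(base, **{attr: base[attr] - delta})
        pairs.append((toy_generator(latent, more), toy_generator(latent, less)))
    return pairs

synthetic = semantic_jitter_pairs("sporty", n_pairs=5)
```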

  30. Challenge #2: Which attributes matter?

  31. Idea: Prominent relative attributes. Infer which comparisons are perceptually salient. [Chen & Grauman, CVPR 2018]

  32. Approach: What causes prominence?
     • Large difference in attribute strength (example prominent difference: Colorful)
     • Unusual and uncommon attribute occurrences (example: Visible Forehead)
     • Absence of other noticeable differences (example: Dark Hair)
     In general, interactions between all the relative attributes in an image pair cause prominent differences. [Chen & Grauman, CVPR 2018]

  33. Approach: Predicting prominent differences. Input: an image pair. Each image is scored by the relative attribute rankers; a symmetric encoding of the two sets of ranker outputs feeds a multiclass classifier that predicts the prominent difference (e.g., Visible Teeth). [Chen & Grauman, CVPR 2018]
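The pipeline on this slide can be sketched with toy linear rankers and a linear multiclass classifier. The specific symmetric encoding used here (absolute differences followed by sums of ranker outputs) is an assumption chosen because it does not depend on the order of the two images; it is not claimed to be the encoding from the paper. All weights are hypothetical.

```python
def rank_scores(x, rankers):
    """Apply each (hypothetical) linear relative-attribute ranker to feature vector x."""
    return [sum(w * f for w, f in zip(w_m, x)) for w_m in rankers]

def symmetric_encoding(x_i, x_j, rankers):
    """Order-independent pair encoding: |r(x_i) - r(x_j)| followed by r(x_i) + r(x_j)."""
    r_i, r_j = rank_scores(x_i, rankers), rank_scores(x_j, rankers)
    return [abs(a - b) for a, b in zip(r_i, r_j)] + [a + b for a, b in zip(r_i, r_j)]

def predict_prominent(x_i, x_j, rankers, class_weights, names):
    """Linear multiclass classifier over the encoding; returns the top-scoring attribute."""
    phi = symmetric_encoding(x_i, x_j, rankers)
    scores = [sum(w * p for w, p in zip(w_c, phi)) for w_c in class_weights]
    return names[max(range(len(scores)), key=scores.__getitem__)]

names = ["Visible Teeth", "Dark Hair"]
rankers = [[1.0, 0.0], [0.0, 1.0]]      # toy rankers: one feature each
class_weights = [[1.0, 0.0, 0.0, 0.0],  # toy class: large teeth difference
                 [0.0, 1.0, 0.0, 0.0]]  # toy class: large hair difference
face_a, face_b = [0.9, 0.5], [0.1, 0.4]
print(predict_prominent(face_a, face_b, rankers, class_weights, names))  # Visible Teeth
```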

  34. Results: Prominent differences (Top 3 prominent differences for each pair)

  35. Results: Prominent differences (figure, two panels, Rank-SVM and Rank-CNN: accuracy vs. the number of top prominent differences counted as ground truth)

  36. Prominent differences: impact on visual search. Query: “white high-heeled shoes”. Given the initial top … search results, the user gives feedback such as “shinier than these” or “less formal than these”, which yields refined top … search results. Leverage prominence to better focus search results. [Chen & Grauman, CVPR 2018]

  37. Prominent differences: impact on visual search. Faster retrieval of the user’s target image, without using any additional user feedback. Leverage prominence to better focus search results. [Chen & Grauman, CVPR 2018]
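The feedback loop on these slides can be sketched as constraint-based re-ranking, a simplified stand-in in the spirit of WhittleSearch [Kovashka et al. 2012] rather than the method of Chen & Grauman. Each statement such as “shinier than these” becomes a constraint on predicted attribute strength. The per-constraint `weight` is a hypothetical hook where a prominence score could be plugged in; the catalog values are made up.

```python
def rerank(catalog, feedback):
    """Re-rank images by the total weight of relative-feedback constraints they satisfy.

    catalog:  {image_id: {attribute: predicted strength}}
    feedback: list of (attribute, direction, reference_strength, weight), where
              direction is +1 for "more <attr> than these" and -1 for "less".
    """
    def score(strengths):
        return sum(weight for attr, direction, ref, weight in feedback
                   if direction * (strengths[attr] - ref) > 0)
    # sorted() stays stable with reverse=True, so ties keep catalog order.
    return sorted(catalog, key=lambda img: score(catalog[img]), reverse=True)

catalog = {  # hypothetical predicted strengths
    "a": {"shiny": 0.9, "formal": 0.8},
    "b": {"shiny": 0.7, "formal": 0.2},
    "c": {"shiny": 0.1, "formal": 0.1},
}
feedback = [("shiny", +1, 0.5, 1.0),   # "shinier than these"
            ("formal", -1, 0.6, 1.0)]  # "less formal than these"
print(rerank(catalog, feedback))  # ['b', 'a', 'c']
```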

  38. From items to styles

  39. The language of visual attributes:
     • Attributes as operators: attributes as adjectives that modify objects as nouns
     • Attributes for comparisons: the relative differences that people describe first
     • Attributes for visual styles: semantic topic models for data-driven styles

  40. How to represent visual style? CNN image similarity? Manually defined style labels? Or stylistic similarity? Challenges:
     • The same “look” manifests in different garments
     • Styles emerge organically and evolve over time
     • Styles have soft boundaries

  41. Idea: Discovering visual styles. Unsupervised learning of a style-coherent embedding with a polylingual topic model: an outfit is a mixture of (latent) styles, and a style is a distribution over attributes. [Hsiao & Grauman, ICCV 2017; Mimno et al., “Polylingual topic models,” EMNLP 2009]
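A minimal sketch of the generative story behind a polylingual topic model, assuming each garment region (top, bottom) plays the role of a separate “language” with its own per-style attribute distribution, while the outfit’s style mixture is shared across regions. The vocabulary, style names, and probabilities are hypothetical; in the actual approach such distributions are inferred from data, not written by hand.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_outfit(styles, vocab, n_tokens, alpha, rng):
    """Generate one outfit: a style mixture shared across regions, then attributes.

    styles: one dict per style, mapping region -> probabilities over vocab[region]
    vocab:  region -> list of attribute words (each region acts as a 'language')
    """
    theta = sample_dirichlet(alpha, rng)  # the outfit is a mixture of styles
    outfit = {}
    for region, words in vocab.items():
        tokens = []
        for _ in range(n_tokens):
            k = rng.choices(range(len(styles)), weights=theta)[0]            # pick a style
            tokens.append(rng.choices(words, weights=styles[k][region])[0])  # pick an attribute
        outfit[region] = tokens
    return theta, outfit

vocab = {"top": ["floral", "lace", "graphic-tee"], "bottom": ["maxi", "denim", "skinny"]}
styles = [
    {"top": [0.6, 0.4, 0.0], "bottom": [0.8, 0.2, 0.0]},  # hypothetical "bohemian"
    {"top": [0.0, 0.1, 0.9], "bottom": [0.0, 0.4, 0.6]},  # hypothetical "hipster"
]
theta, outfit = sample_outfit(styles, vocab, n_tokens=5, alpha=[0.5, 0.5], rng=random.Random(0))
```

Inference runs this story in reverse: given observed attributes per region, it estimates each outfit’s style mixture, which can then serve as the style-coherent embedding.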

  42. Example discovered styles (dresses) Styles we automatically discover in the Amazon dataset [McAuley et al. 2015]

  43. Example discovered styles (full outfit): styles automatically discovered in the HipsterWars dataset [Kiapour et al.]

  44. Mixing styles: our embedding naturally facilitates browsing for mixes of user-selected styles (example: Bohemian + Hipster). [Hsiao & Grauman, ICCV 2017]
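Browsing for a mix of styles can be sketched by representing each outfit by its style proportions and returning the outfits closest to a user-chosen target mixture. The catalog values are hypothetical, and squared Euclidean distance over style proportions is an assumption, not necessarily the similarity used in the paper.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def style_mix(selected, n_styles):
    """Build a normalized target mixture from {style_index: weight}."""
    target = [0.0] * n_styles
    for idx, weight in selected.items():
        target[idx] = weight
    total = sum(target)
    return [t / total for t in target]

def browse_mix(catalog, target, k):
    """Return the k outfits whose style proportions are closest to the target mix."""
    return sorted(catalog, key=lambda name: sq_dist(catalog[name], target))[:k]

BOHEMIAN, HIPSTER, PREPPY = 0, 1, 2
catalog = {  # hypothetical per-outfit style proportions
    "o1": [0.5, 0.5, 0.0],
    "o2": [0.9, 0.1, 0.0],
    "o3": [0.1, 0.1, 0.8],
    "o4": [0.4, 0.5, 0.1],
}
target = style_mix({BOHEMIAN: 1.0, HIPSTER: 1.0}, n_styles=3)
print(browse_mix(catalog, target, k=2))  # ['o1', 'o4']
```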

  45. Creating a “capsule” wardrobe. Goal: select a minimal set of pieces that mix and match well to create many viable outfits (figure: inventory, capsule pieces, and outfits #1 to #5). Pose this as a subset selection problem: set of garments = argmax (compatibility + versatility). [Hsiao & Grauman, CVPR 2018]
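The subset-selection objective can be sketched with a greedy stand-in: repeatedly add the garment that most increases “compatibility + versatility”. Both terms are hypothetical simplifications (compatibility sums pairwise top/bottom scores above a threshold; versatility counts distinct style labels), and greedy selection is not claimed to be the optimization used by Hsiao & Grauman.

```python
from itertools import product

def objective(capsule, items, compat, lam=0.5, threshold=0.5):
    """Compatibility of viable top/bottom outfits plus lam * number of distinct styles."""
    tops = [i for i in capsule if items[i]["type"] == "top"]
    bottoms = [i for i in capsule if items[i]["type"] == "bottom"]
    compatibility = sum(compat[(t, b)] for t, b in product(tops, bottoms)
                        if compat[(t, b)] >= threshold)  # only viable outfits count
    versatility = len({items[i]["style"] for i in capsule})
    return compatibility + lam * versatility

def greedy_capsule(items, compat, budget, **kwargs):
    """Greedily add the garment with the best objective until the budget is reached."""
    capsule = []
    for _ in range(budget):
        candidates = [i for i in items if i not in capsule]
        if not candidates:
            break
        best = max(candidates, key=lambda i: objective(capsule + [i], items, compat, **kwargs))
        capsule.append(best)
    return capsule

items = {  # hypothetical inventory
    "T1": {"type": "top", "style": "casual"},
    "T2": {"type": "top", "style": "formal"},
    "B1": {"type": "bottom", "style": "casual"},
    "B2": {"type": "bottom", "style": "formal"},
    "B3": {"type": "bottom", "style": "casual"},
}
compat = {  # hypothetical pairwise compatibility scores
    ("T1", "B1"): 0.9, ("T1", "B2"): 0.2, ("T1", "B3"): 0.8,
    ("T2", "B1"): 0.3, ("T2", "B2"): 0.9, ("T2", "B3"): 0.6,
}
print(greedy_capsule(items, compat, budget=3))  # ['T1', 'B1', 'B3']
```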

  46. Creating a “capsule” wardrobe: discover the user’s style preferences from an album to build a personalized capsule. [Hsiao & Grauman, CVPR 2018]

  47. Visual trend forecasting: we predict the future popularity of each style. Amazon dataset [McAuley et al., SIGIR 2015]. [Al-Halah et al., ICCV 2017]

  48. Visual trend forecasting What kind of fabric, texture, color will be popular next year?
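Predicting each style’s future popularity can be sketched with a classical time-series model. Simple exponential smoothing is used here purely as a stand-in; it is not claimed to be the forecasting model of Al-Halah et al., and the popularity histories are made up.

```python
def exp_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast with simple exponential smoothing."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def forecast_styles(popularity, alpha=0.5):
    """Forecast next-period popularity per style and rank styles by it."""
    forecasts = {s: exp_smoothing_forecast(h, alpha) for s, h in popularity.items()}
    return sorted(forecasts, key=forecasts.get, reverse=True), forecasts

popularity = {  # hypothetical share of purchases per season
    "bohemian": [0.1, 0.2, 0.3],
    "preppy": [0.3, 0.2, 0.1],
}
ranking, forecasts = forecast_styles(popularity)
print(ranking)  # ['bohemian', 'preppy']
```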

  49. VizWiz: Answer blind people’s visual questions [Gurari et al., CVPR 2018], Spotlight/Poster Wednesday.
     • Goal-oriented visual questions
     • Conversational language
     • Assistive technology
     Example questions: “Is my monitor on?”, “What type of pills are these?”, “What is this?”, “Hi there can you please tell me what flavor this is?”
