
Learning the right thing with visual attributes Kristen Grauman - PowerPoint PPT Presentation



  1. Learning the right thing with visual attributes. Kristen Grauman, Department of Computer Science, University of Texas at Austin. With Chao-Yeh Chen, Aron Yu, and Dinesh Jayaraman.

  2. Beyond image labels. What does it mean to understand an image? Labels: Cow, Tree, Grass vs. the story of an image: "A lone cow grazes in a bright green pasture near an old tree, probably in the Scottish Highlands."

  3. Attributes: e.g., high heel, flat, metallic, brown, red, has-ornaments, four-legged, outdoors, indoors. • Mid-level semantic properties shared by objects • Human-understandable and machine-detectable [Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Parikh & Grauman 2011, …]

  4. Using attributes: Visual search. Suspect #1: male, sunglasses, black and white hat, blue shirt. "Like this… but more ornate." Relative feedback [Kovashka et al. 2012]; person search [Kumar et al. 2008, Feris et al. 2013]

  5. Using attributes: Interactive recognition. The computer vision system asks "Cone-shaped beak?", the user answers "yes", and the system concludes "American Goldfinch?" [Branson et al. 2010, 2013]

  6. Using attributes: Semantic supervision. Zero-shot learning [Lampert et al. 2009]: Band-tailed pigeons: white collar, yellow feet, yellow bill, red breast. Training with relative descriptions [Parikh & Grauman 2011, Shrivastava & Gupta 2012]: Mules: shorter legs than donkeys, shorter tails than horses. Annotator rationales [Donahue & Grauman 2011]: "strong body" (HOT vs. NOT HOT).

  7. Problem With attributes, it’s easy to learn the wrong thing. • Incidental correlations • Spatially overlapping properties • Subtle visual differences • Partially category-dependent • Variance in human-perceived definitions …yet applications demand that correct meaning be captured!

  8. Goal Learn the right thing. • How to decorrelate attributes that often occur simultaneously? • Are attributes really class-independent? • How to detect fine-grained attribute differences?

  9. The curse of correlation. What will be learned from this training set? Object learning: [a set of labeled training images] → "Cat"

  10. The curse of correlation. What will be learned from this training set? Attribute learning: [the same training images] → Forest animal? Brown? Has ears? Combinations? Problem: attributes that often co-occur cannot be distinguished by the learner.

  11. The curse of correlation. [The same images serve as positives for both "Forest animal" and "Brown".] Problem: attributes that often co-occur cannot be distinguished by the learner.

  12. Idea: Resist the urge to share. Make "Forest animal" and "Brown" "compete" for features. Problem: attributes that often co-occur cannot be distinguished by the learner. JAYARAMAN ET AL., CVPR 2014

  13. Semantic attribute groups • Closely related attributes may share features • Assume attribute “groups” from external knowledge. JAYARAMAN ET AL., CVPR 2014

  14. Standard approach: learning separately. Each attribute is trained with its own loss function over the feature dimensions, independently of all other attributes. JAYARAMAN ET AL., CVPR 2014
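As a minimal sketch of this separate-training baseline (assuming a feature matrix X and binary attribute labels Y; the function and variable names are illustrative, not the paper's code), each attribute simply gets its own classifier:

```python
# Minimal sketch of the standard baseline: one independent linear
# classifier per attribute, trained with no interaction between attributes.
# X: (n_images, n_features) feature matrix; Y: (n_images, n_attributes)
# binary attribute labels. Names are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_independent_attribute_models(X, Y, C=1.0):
    models = []
    for a in range(Y.shape[1]):
        clf = LogisticRegression(C=C, max_iter=1000)
        clf.fit(X, Y[:, a])   # each attribute sees only its own labels
        models.append(clf)
    return models
```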

  15. Proposed group-based formulation. The attribute weight matrix is organized into semantic groups S1, S2, S3 (e.g., color, texture, motion). For each feature dimension, compute the L2 norm of the weights within each group (in-group feature sharing), then penalize the L1 sum of these group norms across groups and dimensions (inter-group competition for features). JAYARAMAN ET AL., CVPR 2014
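The slide does not reproduce the full objective; the following is a hedged sketch of one plausible reading of the group-structured penalty described above, with a weight matrix W of shape (attributes × feature dimensions) and attribute groups given as index lists. It illustrates the in-group-L2 / across-group-L1 idea and is not the authors' implementation:

```python
# Hedged sketch of a group-structured regularizer in the spirit of the slide:
# within each semantic group, attributes may share a feature dimension
# (per-dimension L2 over the group), while the outer L1 sum makes different
# groups compete for dimensions. W: (n_attributes, n_features); groups is a
# list of attribute-index lists, e.g. [[0, 1], [2, 3, 4]].
import numpy as np

def group_competition_penalty(W, groups):
    penalty = 0.0
    for g in groups:                              # each semantic attribute group
        Wg = W[np.asarray(g), :]                  # weights of attributes in the group
        col_norms = np.linalg.norm(Wg, axis=0)    # L2 over the group, per feature dim
        penalty += col_norms.sum()                # L1 across dims -> groups compete
    return penalty

def objective(W, X, Y, groups, lam=0.1):
    # per-attribute logistic loss + structured penalty (Y is binary {0,1})
    scores = X @ W.T                              # (n_images, n_attributes)
    loss = np.mean(np.log1p(np.exp(-(2 * Y - 1) * scores)))
    return loss + lam * group_competition_penalty(W, groups)
```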

  16. Formulation effect (feature selection for "Forest animal" vs. "Brown"): sparse features assume no relationships among attributes; standard multi-task learning shares and conflates features across groups; ours enforces inter-group competition with in-group sharing. JAYARAMAN ET AL., CVPR 2014

  17. Results – Attribute detection (bar charts of AP on Birds, PASCAL, and Animals, comparing our method against baselines including (*) Argyriou et al., Multi-task Feature Learning, NIPS 2007 and (~) Farhadi et al., Describing Objects by Their Attributes, CVPR 2009). By decorrelating attributes, our attribute detectors generalize much better to novel unseen categories. JAYARAMAN ET AL., CVPR 2014

  18. Attribute detection examples. Success cases: no mouth, not brown underparts, no eye, not boxy, no ear. Failure cases: no feather, not furry, eyeline, black breast, not vegetation. JAYARAMAN ET AL., CVPR 2014

  19. Attribute localization examples (brown wing, blue back, olive back, crested head): compared to the standard approach, our method avoids conflation and learns the correct semantic attribute. JAYARAMAN ET AL., CVPR 2014

  20. Goal Learn the right thing. • How to decorrelate attributes that often occur simultaneously? • Are attributes really class-independent? • How to detect fine-grained attribute differences?

  21. Problem: Are attributes really category-independent? Fluffy dog =? Fluffy towel

  22. An intuitive but impractical solution • Learn category-specific attributes? Impractical! Would need labeled examples for all category-attribute combinations (e.g., fluffy dogs vs. non-fluffy dogs)…

  23. Idea: Analogous attributes • Given a sparse set of category-specific models, infer the "missing" analogous attribute classifiers. (1) Learn category-sensitive attribute classifiers for the observed (category, attribute) pairs, e.g., striped, brown, and spotted for categories such as dog and equine, with no training examples for some combinations. (2) Infer the missing classifiers from the learned ones. (3) Predict with the inferred classifier: A striped dog? Yes. Chen & Grauman, CVPR 2014

  24. Transfer via tensor completion. Construct a sparse object-attribute classifier tensor W (category × attribute × feature dimension), discover low-dimensional latent factors, and infer the missing classifiers (the analogous attributes) with Bayesian probabilistic tensor factorization [Xiong et al., SDM 2010].
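As a rough illustration of the completion step (the paper uses the Bayesian probabilistic tensor factorization of Xiong et al.; this stand-in is a plain SGD-based CP factorization with made-up hyperparameters), the observed classifier slices are factorized and the missing ones reconstructed:

```python
# Simplified stand-in for the tensor-completion step: a plain CP
# factorization of the classifier tensor W[category, attribute, feature],
# fit only on the observed (category, attribute) slices, then used to
# synthesize the missing "analogous" classifiers. Illustrative only.
import numpy as np

def complete_classifier_tensor(W_obs, observed, rank=10, lr=0.01,
                               lam=0.01, epochs=200, seed=0):
    """W_obs: (n_cat, n_attr, n_feat), zeros where unobserved.
    observed: list of (category, attribute) pairs with trained classifiers."""
    rng = np.random.default_rng(seed)
    n_cat, n_attr, n_feat = W_obs.shape
    C = 0.1 * rng.standard_normal((n_cat, rank))    # category factors
    A = 0.1 * rng.standard_normal((n_attr, rank))   # attribute factors
    F = 0.1 * rng.standard_normal((n_feat, rank))   # feature factors

    for _ in range(epochs):
        for (c, a) in observed:
            pred = F @ (C[c] * A[a])                # predicted classifier slice
            err = pred - W_obs[c, a]                # residual vs. trained weights
            g = F.T @ err                           # shared gradient term, (rank,)
            Cc, Aa = C[c].copy(), A[a].copy()
            C[c] -= lr * (g * Aa + lam * Cc)
            A[a] -= lr * (g * Cc + lam * Aa)
            F -= lr * (np.outer(err, Cc * Aa) + lam * F)

    # reconstruct the full tensor: missing slices are the inferred classifiers
    return np.einsum('cr,ar,fr->caf', C, A, F)
```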

  25. Datasets • ImageNet attributes [Russakovsky & Fei-Fei 2010]: 9,600 images, 384 object categories, 25 attributes, 1,498 object-attribute pairs available • SUN attributes [Patterson & Hays 2012]: 14,340 images, 280 object categories, 59 attributes, 6,118 object-attribute pairs available

  26. Inferring class-sensitive attributes: 84 total attributes, 664 object/scene classes. (Bar chart of average mAP.) Our approach infers all 18K "missing" classifiers → a savings of 348K labeled images. Category-sensitive attributes outperform the status quo 76% of the time, with an average gain of 15 points in AP. Chen & Grauman, CVPR 2014

  27. Which attributes are analogous? Example analogous attribute sets discovered across categories include color and material attributes (e.g., brown, red, yellow, long, white, gray, shiny, wooden, wet, smooth, rough, metal, wire, tiles, grass, paper, carpet, foliage) and scene/activity attributes (e.g., socializing, eating, gaming, working, cleaning, congregating, conducting business, sailing/boating). Chen & Grauman, CVPR 2014

  28. Goal Learn the right thing. • How to decorrelate attributes that often occur simultaneously? • Are attributes really class-independent? • How to detect fine-grained attribute differences?

  29. Problem: Fine-grained attribute comparisons. Which is more comfortable?

  30. Relative attributes. Use ordered image pairs to train a ranking function over image features, e.g., a linear ranker R(x) = w·x trained so that w·x_i > w·x_j whenever image i is "smiling more than" image j [Parikh & Grauman, ICCV 2011; Joachims 2002]
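A minimal sketch of how such a ranker is commonly trained, via the rank-SVM reduction of Joachims (2002): each ordered pair becomes a feature-difference vector, and a linear SVM on those differences yields the ranking weights. Variable names and the scikit-learn solver are illustrative choices:

```python
# Minimal sketch of training a relative-attribute ranker via the standard
# rank-SVM reduction [Joachims 2002]: each ordered pair (i "more" than j)
# becomes a difference vector x_i - x_j with label +1 (and its negation
# with label -1); a linear SVM on these differences yields w such that
# w @ x is the attribute's predicted strength.
import numpy as np
from sklearn.svm import LinearSVC

def train_relative_attribute_ranker(X, ordered_pairs, C=1.0):
    """X: (n_images, n_features); ordered_pairs: list of (i, j) with
    image i showing MORE of the attribute than image j."""
    diffs, labels = [], []
    for i, j in ordered_pairs:
        diffs.append(X[i] - X[j]); labels.append(1)
        diffs.append(X[j] - X[i]); labels.append(-1)
    svm = LinearSVC(C=C, fit_intercept=False, max_iter=10000)
    svm.fit(np.array(diffs), np.array(labels))
    return svm.coef_.ravel()              # ranking weights w

# Usage: w = train_relative_attribute_ranker(X, pairs)
#        strength = X_test @ w            # higher = "smiling more"
```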

  31. Relative attributes. Rather than simply label images with their properties (not bright, smiling, not natural), …

  32. Relative attributes. … we can compare images by an attribute's "strength" (bright, smiling, natural).

  33. Idea: Local learning for fine-grained relative attributes • Lazy learning: train a query-specific model on the fly. • Local: use only training pairs that are similar/relevant to the test comparison. (Figure: a test comparison and its relevant nearby training pairs.) Yu & Grauman, CVPR 2014
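A hedged sketch of this lazy, query-specific step, reusing train_relative_attribute_ranker from the sketch after slide 30: represent each training pair by concatenated image features, take the K pairs nearest to the test pair, and train a ranker on only those. K and the Euclidean pair distance are assumptions for illustration, not the paper's exact choices:

```python
# Hedged sketch of local learning for a fine-grained comparison: for a test
# pair (xa, xb), select the K training pairs whose concatenated features are
# closest, fit a ranker on just those pairs, and compare the two images.
import numpy as np

def pair_feature(xi, xj):
    # simple pair descriptor: concatenate the two images' features
    return np.concatenate([xi, xj])

def local_compare(xa, xb, X, ordered_pairs, K=100, C=1.0):
    query = pair_feature(xa, xb)
    pair_feats = np.array([pair_feature(X[i], X[j]) for i, j in ordered_pairs])
    dists = np.linalg.norm(pair_feats - query, axis=1)
    nearest = np.argsort(dists)[:K]                  # K most relevant pairs
    local_pairs = [ordered_pairs[k] for k in nearest]
    w = train_relative_attribute_ranker(X, local_pairs, C=C)
    return "first" if w @ xa > w @ xb else "second"  # which shows the attribute more
```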

  34. Idea: Local learning for fine-grained relative attributes. (Figure: global vs. local ranking directions w; the globally trained ranker mis-orders a fine-grained "more/less" test pair that the locally trained ranker gets right.) Yu & Grauman, CVPR 2014

  35. UT Zappos50K dataset: a large shoe dataset of 50,025 catalog images from Zappos.com. • 4 relative attributes (e.g., "open") • High-quality pairwise labels from mTurk workers • Coarse: 6,751 ordered labels + 4,612 "equal" labels • Fine-grained: 4,334 twice-labeled pairs (no "equal" option) Yu & Grauman, CVPR 2014

  36. Results: Fine-grained attributes. (Plots: accuracy of comparisons across all attributes; accuracy on the 30 hardest test pairs.) Yu & Grauman, CVPR 2014

  37. Predicting useful neighborhoods • Most relevant points = most similar points? • Pose neighborhood selection as a large-scale multi-label classification problem: each training image x_i gets a binary label vector y_i = [1, 0, 1, 1, 0, …, 0, 1] marking which training pairs are useful for it; the labels are mapped to a compressed label space and a function from image features to that space is learned at training time; at test time, the predicted code for a query x_q is reconstructed into its neighborhood y_q = [0, 0, 0, 1, 1, …, 1, 0]. [Yu & Grauman, NIPS 2014]
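The slide only outlines the recipe; below is an illustrative sketch of the general compress-regress-reconstruct pattern it points to, using a random projection for label compression and ridge regression as the learned mapping. These specific components are stand-ins and may differ from the method in Yu & Grauman (NIPS 2014):

```python
# Illustrative sketch of compressed multi-label neighborhood prediction:
# each training image x_i has a binary vector y_i marking which training
# pairs are useful neighbors for it. Compress y with a random projection,
# learn a ridge regressor from x to the compressed code, and at test time
# decode the predicted code back into a neighborhood.
import numpy as np
from sklearn.linear_model import Ridge

def fit_neighborhood_predictor(X, Y, code_dim=64, alpha=1.0, seed=0):
    """X: (n, d) image features; Y: (n, m) binary neighbor-indicator labels."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((Y.shape[1], code_dim)) / np.sqrt(code_dim)
    Z = Y @ P                                  # compressed label space
    reg = Ridge(alpha=alpha).fit(X, Z)         # map features -> codes
    return reg, P

def predict_neighborhood(reg, P, x_query, top_k=50):
    z_hat = reg.predict(x_query[None, :])      # predicted code for the query
    scores = (z_hat @ P.T).ravel()             # decode by back-projection
    return np.argsort(-scores)[:top_k]         # indices of predicted useful pairs
```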
