3D Shape Attributes
David Fouhey, Abhinav Gupta and Andrew Zisserman
CMU & University of Oxford
http://www.robots.ox.ac.uk/~vgg
To appear: CVPR 2016
Motivation
• How to describe this object?
  1. Label: Henry Moore sculpture, "Oval with Points"
  2. Shape description: 3D solid object, smooth for the most part but with pointed/conical parts, has a hole, bulbous, rectangular (portrait) aspect ratio, approx. mirror symmetry
Motivation
• Objective: represent the shape of 3D objects in a viewpoint-invariant manner
  1. 3D shape attributes: curvature, contact, volumetric, …
  2. Vector (embedding)
• Addresses the "open-world" problem
• Sculptures are used as objects due to their great variety of shape
Motivation
3D shape from single images:
• A fundamental goal of computer vision is 3D understanding from images, e.g. Koenderink & van Doorn, 1971, and work from the 1980s:
  • shape from contour
  • shape from texture
  • shape from specularities
  • …
Motivation
3D shape from single images is somewhat neglected in the ConvNet era, with some exceptions such as:
• Regressing pixels -> depth map / surface normals, e.g. Eigen et al. '15, Wang et al. '15; among many others: Saxena et al. '07, Barron et al. '11–'15, Karsch '12, Fouhey '13, '14, Eigen '14, '15, Ladicky '14, Liu '14, Baig '15, Wang '15, etc.
• Class-specific reconstructions, e.g. Kar et al., "Category-specific object reconstruction from a single image", CVPR 2015
3D Shape Attributes (12 of these)
Examples Positives: Has Planar Surfaces
Examples Negatives: Has Planar Surfaces
Examples Positives: Has Point/Line Contact
Examples Negatives: Has Point/Line Contact
Examples Positives: Has Thin Structures
Examples Negatives: Has Thin Structures
Examples Positives: Has Rough Surfaces
Examples Negatives: Has Rough Surfaces
3D Shape Attributes (12 of these)
Research Question
• Can ConvNets learn to predict these 3D shape attributes, and a 3D embedding, in a viewpoint-invariant manner?
• And can they also generalize to other (non-sculpture) classes?
Data
Data
Example sculpture locations: London, Malaga, Yorkshire, Princeton, Columbus, Toronto.
Data
242 artists, 2187 works, 143K images in 9352 viewpoint clusters.
Organised by artist and work, e.g. A. Calder (5 Swords, Gwenfritz, Eagle, …), H. Moore (Two Forms, The Arch, Knife Edge, …), R. Serra (…), …
Data Collection
Pipeline: artist/work vocabulary -> query expansion -> cleaning -> viewpoint clustering.
Example vocabulary: artists A. Calder, B. Hepworth, H. Moore, R. Serra; works 5 Swords, Gwenfritz, Eagle, Two Forms, Construction, The Arch, Knife Edge, …
Scale: ~250 artists, ~2K works, ~150K images in ~9K clusters.
Data Statistics

        Artists   Works   Images
Train   122       1196    77K
Val     61        459     31K
Test    59        532     35K
Total   242       2187    143K
Training Loss Functions
• Multi-task learning
  1. Attribute classification loss: sum of 12 cross-entropy losses, one for each attribute
  2. Embedding loss: triplet loss to match images of the same work
Training Loss Functions
1. Attribute classification loss: sum of 12 cross-entropy losses, one for each attribute

$$\mathcal{L}(Y, P) = \sum_{i=1}^{N} \sum_{\substack{l = 1 \\ Y_{i,l} \neq \varnothing}}^{L} Y_{i,l} \log(P_{i,l}) + (1 - Y_{i,l}) \log(1 - P_{i,l})$$

for image $i$ and label $l$, with labels $Y \in \{0, 1, \varnothing\}^{N \times L}$ and predicted probabilities $P \in [0, 1]^{N \times L}$.
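A minimal NumPy sketch of this masked attribute loss, where the mask skips the ∅ (unlabelled) entries; the function and variable names are illustrative rather than taken from the authors' code, and training minimises the negative of the log-likelihood written above.

```python
import numpy as np

def attribute_loss(Y, P, eps=1e-7):
    """Summed binary cross-entropy over N images and L = 12 attributes.

    Y: (N, L) array with entries 0, 1, or np.nan for unlabelled (the "empty set" labels).
    P: (N, L) array of predicted attribute probabilities in [0, 1].
    Terms with unlabelled Y are skipped, matching the sum over Y_{i,l} != empty.
    """
    mask = ~np.isnan(Y)                            # keep only labelled (image, attribute) pairs
    y, p = Y[mask], np.clip(P[mask], eps, 1 - eps) # clip for numerical safety
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```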
Training Loss Functions
2. Embedding loss: triplet loss to match images of the same work.
A CNN encoder $\phi$ maps images into an embedding space $\mathbb{R}^d$: the anchor $a$ and positive $p$ (congruous pair) should be near, while the anchor and negative $n$ (incongruous pair) should be far.
Triplet loss as in Schultz and Joachims '04, Schroff et al. '14, Wang et al. '15, Parkhi et al. '15.
Embedding loss
For each triplet (anchor $a$, positive $p$, negative $n$), the distance to the congruous pair should be smaller than the distance to the incongruous pair by a margin $\alpha$:

$$\|\phi(a) - \phi(p)\|_2 + \alpha \le \|\phi(a) - \phi(n)\|_2$$

which is enforced by minimising the hinge loss over all triplets:

$$\min_{\phi} \sum_{\text{triplets}} \max\!\big(0,\; \alpha + \|\phi(a) - \phi(p)\|_2 - \|\phi(a) - \phi(n)\|_2\big)$$
Learning: a VGG-M network (input -> conv. layers -> FC layers) predicts both the 12D shape attributes and the 1024D shape embedding.
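A schematic PyTorch sketch of such a two-headed network. Torchvision does not ship VGG-M, so VGG-16 stands in for the conv + FC trunk here, and all layer sizes other than the 12D and 1024D outputs are assumptions; this illustrates the shape of the architecture, not the authors' implementation.

```python
import torch.nn as nn
import torchvision.models as models

class ShapeNet(nn.Module):
    """Shared trunk with two heads: 12 attribute logits and a 1024-D shape embedding."""

    def __init__(self, embed_dim=1024, n_attributes=12):
        super().__init__()
        trunk = models.vgg16(weights=None)        # stand-in for the VGG-M trunk
        self.features = trunk.features            # conv. layers
        self.fc = nn.Sequential(                  # FC layers (sizes are assumptions)
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
        )
        self.attr_head = nn.Linear(4096, n_attributes)   # 12D shape attributes (logits)
        self.embed_head = nn.Linear(4096, embed_dim)     # 1024D shape embedding

    def forward(self, x):
        h = self.fc(self.features(x))             # x: (B, 3, 224, 224)
        return self.attr_head(h), self.embed_head(h)
```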
Goals of Experiments • How well can we do? • Are we modeling 3D shape? • Does this generalize?
Qualitative Results
Images ranked from most to least confident for: Point/Line Contact, Rough Surface, and Thin Structures.
Quantitative Results (Mean Area Under ROC per attribute; attribute groups: Curvature, Contact, Occupancy)

Planar        82.8      Has Hole      60.8
Not Planar    77.2      Cubic Ratio   60.3
Cylinder      56.9      Empty         60.4
Rough         76.0      2+ Pieces     69.3
Point/Line    74.4      Is Thin       85.8
Multiple      76.4      Mirror Sym.   87.0
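For reference, per-attribute area under the ROC curve can be computed as in the scikit-learn sketch below; this is only a sketch of the reported metric with an assumed array layout, not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_attribute_auc(Y, P):
    """Area under the ROC curve for each of the 12 attributes.

    Y: (N, 12) ground truth in {0, 1, np.nan}; np.nan marks unlabelled entries.
    P: (N, 12) predicted attribute probabilities.
    Returns a length-12 array of per-attribute AUC values.
    """
    aucs = []
    for l in range(Y.shape[1]):
        labelled = ~np.isnan(Y[:, l])             # score only labelled images
        aucs.append(roc_auc_score(Y[labelled, l], P[labelled, l]))
    return np.array(aucs)
```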
Learning To Predict: the same network (input -> conv. layers -> FC layers) outputs the 12D shape attributes and the 1024D shape embedding.
Mental Rotation (Shepard and Metzler 1971, Tarr et al. '98): are two 3D objects related by a rotation?
Mental Rotation
• Use works from different locations and with different materials
• Classify whether two images show the same work using the distance between their embedding vectors
Mental Rotation – Classification Results
ROC curves over 100 million test image pairs; the "Easy" setting has 0.9% positive pairs, the "Hard" setting 0.3%.
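A minimal sketch of this pair-classification evaluation, assuming each pair is scored by the negated L2 distance between its two embeddings and swept into an ROC curve; function and variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_curve

def same_work_roc(emb_a, emb_b, labels):
    """ROC for classifying whether an image pair shows the same sculpture.

    emb_a, emb_b: (M, d) embeddings of the two images in each of M test pairs.
    labels: (M,) array, 1 if the pair depicts the same work, else 0
            (0.9% / 0.3% positives in the "Easy" / "Hard" settings).
    """
    # Closer embeddings -> higher score -> more likely the same work.
    scores = -np.linalg.norm(emb_a - emb_b, axis=1)
    fpr, tpr, thresholds = roc_curve(labels, scores)
    return fpr, tpr, thresholds
```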
Does it generalize to other classes?
Synthetic Results
Predicted P(Planar), P(Non Planar), and P(Rough Surface) on synthetic shapes.
PASCAL VOC Results
Images ranked from most to least for: Point/Line Contact, Planarity, Toroidal Pieces, and Thin Structures.
Summary
• We have learnt to predict 3D shape attributes and a shape embedding from a single image
• Dataset to be released
• Possible improvements: binary vs. relative attributes