In Search of Art
Elliot J. Crowley and Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford
The Goal • An on-the-fly system for searching paintings visually • A user can type in the name of any category… • …and hundreds of paintings containing that category are retrieved in a matter of seconds [Figure: example results for the query ‘dog’]
Benefits • In many instances, the retrieved paintings will not previously have been known to contain the category • These are therefore new discoveries for the art history community [Figure: further results for ‘dog’]
Why is this good? • Art historians can discover when something first appeared in paintings • They can also observe how things have changed over time
How is this achieved? • Natural images annotated with object categories are abundant on the web • These can be used to learn object classifiers [Figure: Google Images results for ‘dog’]
Dataset of Paintings • We use ‘Your Paintings’ as the dataset • ‘Your Paintings’ consists of over 210,000 paintings from UK galleries http://www.bbc.co.uk/arts/yourpaintings/ • However, the method is independent of the dataset • Other datasets can be used, e.g. Rijksmuseum or PrintART
Outline • Methodology • Quantitative Evaluation • Aligning retrieved objects
What do we do? • We crawl Google Images for a given category and learn a classifier on CNN features • This classifier is applied to a dataset of paintings to retrieve the paintings that contain the category
The Architecture
How do we do this quickly? • The bulk of the data has been pre-processed offline (negative training data, dataset of paintings) • Online processing of Google Images is done in parallel across multiple cores
In more detail… • For a given query, the top 200 Google Images hits are downloaded • For each of these, a CNN feature is computed online • These features form the positive training data
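The online step described above amounts to a parallel map over the downloaded hits. The sketch below is a minimal illustration under stated assumptions: `extract_positive_features` and `toy_feature` are hypothetical names, `toy_feature` merely stands in for a real CNN forward pass so the example runs, and a thread pool is used for portability (a pure-Python feature function would need a process pool to make use of several cores).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Feature = List[float]


def extract_positive_features(
    image_paths: Sequence[str],
    feature_fn: Callable[[str], Feature],
    max_hits: int = 200,
    workers: int = 8,
) -> List[Feature]:
    """Compute one feature per downloaded hit, in parallel, keeping hit order."""
    hits = list(image_paths)[:max_hits]  # keep only the top-ranked hits
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so features[i] belongs to hits[i]
        return list(pool.map(feature_fn, hits))


def toy_feature(path: str) -> Feature:
    # Hypothetical stand-in for a CNN forward pass on the image at `path`
    return [float(len(path)), float(sum(map(ord, path)) % 97)]


if __name__ == "__main__":
    paths = [f"hit_{i:03d}.jpg" for i in range(250)]
    positives = extract_positive_features(paths, toy_feature)
    print(len(positives))  # 200
```

Because `pool.map` preserves input order, each feature stays paired with the hit it came from, which matters if results are later traced back to source images.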
Negative Training Data • Offline, images are downloaded for the Google searches ‘things’ and ‘photos’ • The features for these are pre-computed
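Since the negative features are the same for every query, they can be computed once and cached. A minimal sketch, assuming the features fit in a single NumPy array on disk; `load_or_compute` is a hypothetical helper, and the cache path should end in `.npy` because `np.save` appends that suffix otherwise.

```python
import os

import numpy as np


def load_or_compute(cache_path, compute_fn):
    """Return cached features if present; otherwise compute once and save.

    cache_path should end in ".npy" (np.save adds the suffix if missing,
    which would make the existence check below miss the cached file).
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    feats = np.asarray(compute_fn(), dtype=np.float32)
    np.save(cache_path, feats)
    return feats
```

The same pattern applies to the ‘Your Paintings’ features, which are also pre-computed offline.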
Classification • A Support Vector Machine (SVM) is used to learn a classifier that discriminates the positive training data from the negative training data [Figure: ‘beard’ vs. ‘not beard’ examples]
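The slide does not say which SVM solver is used. As a self-contained illustration, the sketch below trains a linear SVM with Pegasos (stochastic sub-gradient descent on the hinge loss), a swapped-in technique rather than the authors' stated method; a library solver such as LIBLINEAR would be the usual choice in practice. `train_linear_svm` and its default parameters are hypothetical.

```python
import numpy as np


def train_linear_svm(pos, neg, lam=1e-3, epochs=20, seed=0):
    """Pegasos-style linear SVM: positives (+1) vs. negatives (-1).

    pos, neg: arrays of shape (n_pos, d) and (n_neg, d).
    Returns (w, b) so that score(x) = x @ w + b.
    """
    X = np.vstack([pos, neg]).astype(float)
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    # Append a constant 1 so the bias is learned as an extra weight
    # (simplification: the bias is regularised along with w).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                # Pegasos step size
            violated = y[i] * (Xb[i] @ w) < 1.0  # is the hinge loss active?
            w *= 1.0 - eta * lam                 # regulariser shrinkage
            if violated:
                w += eta * y[i] * Xb[i]          # hinge sub-gradient step
    return w[:-1], w[-1]
```

Here `pos` would hold the features of the downloaded Google Images and `neg` the pre-computed ‘things’/‘photos’ features.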
Retrieval • The classifier is applied to the pre-processed features of ‘Your Paintings’ • Each painting is given a score by the classifier
Retrieval • The paintings are displayed in order of score [Figure: top-ranked paintings for ‘beard’]
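Scoring and ranking, as described on the two retrieval slides, reduce to one matrix-vector product and a sort, assuming a linear classifier `(w, b)` and a pre-computed feature matrix. `rank_paintings` is a hypothetical helper for illustration.

```python
import numpy as np


def rank_paintings(features, painting_ids, w, b, top_k=None):
    """Score every painting with a linear classifier and sort by score.

    features: (n_paintings, d) array of pre-computed painting features.
    Returns a list of (painting_id, score), highest score first.
    """
    scores = features @ w + b
    # Stable sort on negated scores gives descending order; ties keep input order.
    order = np.argsort(-scores, kind="stable")
    if top_k is not None:
        order = order[:top_k]
    return [(painting_ids[i], float(scores[i])) for i in order]
```

For a collection of around 210,000 paintings where only the first page of results is needed, `np.argpartition` could select the top `k` before sorting just those.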
The Architecture - Timings [Figure: architecture diagram annotated with stage timings of 2s, 0.5s, <0.5s, 4.5s and <0.5s]
Example Queries [Figure: results for ‘bridge’]
Example Queries [Figure: results for ‘carriage’]
Example Queries [Figure: results for ‘flower’]
Example Queries [Figure: results for ‘house’]
Outline • Methodology • Quantitative Evaluation • Aligning retrieved objects
Quantitative Evaluation • Evaluating the domain transfer problem of learning classifiers on natural images and applying these to paintings
Test Set • An annotated dataset of paintings is required for this evaluation • 10,000 paintings in ‘Your Paintings’ have been tagged by the public • These tags, together with the painting titles, are used to form the ‘Paintings Dataset’, with annotations for a subset of the PASCAL VOC classes
The Paintings Dataset • Assume complete annotation in the PASCAL sense • Assess by calculating the average precision (AP) for each class
Class          Paintings
Aeroplane        200
Bird             805
Boat            2143
Chair           1202
Cow              625
Dining-table    1201
Dog             1145
Horse           1493
Sheep            751
Train            329
[Figure: example paintings of dogs, horses and trains]
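The slide does not specify which AP variant is used. A minimal sketch, assuming the PASCAL VOC 2010+ style (area under the precision/recall curve after making precision monotonically non-increasing); `average_precision` is a hypothetical helper taking one classifier score and one binary label per painting.

```python
def average_precision(scores, labels):
    """VOC-style AP: area under the interpolated precision/recall curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(1 for label in labels if label)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    tp = 0
    recalls, precisions = [], []
    for k, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
        recalls.append(tp / n_pos)
        precisions.append(tp / k)
    # Pad the curve, then replace each precision by the max to its right.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for j in range(len(mpre) - 2, -1, -1):
        mpre[j] = max(mpre[j], mpre[j + 1])
    # Sum rectangle areas wherever recall increases.
    return sum(
        (mrec[j + 1] - mrec[j]) * mpre[j + 1]
        for j in range(len(mrec) - 1)
        if mrec[j + 1] != mrec[j]
    )


print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # ~0.8333
```

With two positives ranked 1st and 3rd, the precisions at those ranks are 1 and 2/3, giving AP = 5/6.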
Training Datasets • Four datasets of natural images are used for training • VOC12, VOC12+, Net Noisy and Net Curated
Experiments • Features compared: shallow features (Fisher Vectors) vs. deep features (Convolutional Neural Networks, CNNs)
Experiments - Features • Fisher Vectors vs. CNN features • CNN features outperform Fisher Vectors • They have the added advantage of lower dimensionality
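Augmentation • No augmentation: the whole image is resized to 224 × 224 • C+F (crop and flip) augmentation: 224 × 224 crops are taken from the image rescaled to 256 pixels, together with their horizontal flips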
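The exact crop positions are not given on the slide. A minimal sketch, assuming the common scheme of four corner crops plus a centre crop, each also horizontally flipped (10 windows in total), on a 256 × 256 image; `crop_flip_windows` is a hypothetical helper that only computes window coordinates.

```python
from typing import List, Tuple

# (x0, y0, x1, y1, flipped): pixel box plus whether to mirror the crop
Window = Tuple[int, int, int, int, bool]


def crop_flip_windows(width: int = 256, height: int = 256, crop: int = 224) -> List[Window]:
    """Corner and centre crops of size crop x crop, plus their horizontal flips."""
    if crop > width or crop > height:
        raise ValueError("crop must fit inside the image")
    positions = [
        (0, 0),                                        # top-left
        (width - crop, 0),                             # top-right
        (0, height - crop),                            # bottom-left
        (width - crop, height - crop),                 # bottom-right
        ((width - crop) // 2, (height - crop) // 2),   # centre
    ]
    windows: List[Window] = []
    for flipped in (False, True):
        for x0, y0 in positions:
            windows.append((x0, y0, x0 + crop, y0 + crop, flipped))
    return windows
```

Without augmentation there is a single window: the whole image resized to 224 × 224.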
Experiments - Augmentation • Sum pool: the classifier is applied to the mean feature of the augmented windows • Max pool: the classifier is applied to each augmented window and the maximum score is recorded • Best performance comes from augmentation with sum pooling, but no augmentation performs almost as well
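The two pooling strategies can be written directly from their definitions, assuming a linear classifier `(w, b)` and a `(n_windows, d)` array holding one feature per augmented window; the function names are hypothetical.

```python
import numpy as np


def sum_pool_score(window_feats, w, b):
    """Classifier applied once, to the mean feature of the augmented windows."""
    return float(window_feats.mean(axis=0) @ w + b)


def max_pool_score(window_feats, w, b):
    """Classifier applied to every augmented window; keep the highest score."""
    return float((window_feats @ w + b).max())
```

For a linear classifier, the sum-pooled score equals the mean of the per-window scores, so sum pooling needs only one dot product per painting and the pooled feature can be stored offline.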
Experiments - Dimensionality • 1K-dimensional CNN features perform best • However, the difference from the other dimensionalities tested is small
Experiment Conclusions • For the on-the-fly system, 1K CNN features are used, as these performed best • Augmented, sum-pooled features are used for ‘Your Paintings’, as these are computed offline and time is not a factor • No augmentation is used for the images downloaded from Google, as it is much faster (0.3s vs. 2.4s per image per core)
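A back-of-the-envelope check of the saving, assuming the 200 downloaded hits and ideal parallel scaling across $N$ cores ($N$ is a free parameter here, not a figure from the talk):

```latex
\begin{align*}
T_{\text{no aug}} &\approx \frac{200 \times 0.3\,\text{s}}{N} = \frac{60\,\text{s}}{N}, \\
T_{\text{C+F}}    &\approx \frac{200 \times 2.4\,\text{s}}{N} = \frac{480\,\text{s}}{N}.
\end{align*}
```

Skipping augmentation therefore cuts online feature-extraction time by a factor of $2.4 / 0.3 = 8$, whatever the number of cores.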
Outline • Methodology • Quantitative Evaluation • Aligning retrieved objects
Alignment • Some objects are automatically aligned… [Figure: retrieved paintings for ‘moustache’]
The Pencil Moustache [Figure: ‘Anonymous Trendsetter, 1565’ alongside ‘Copycats, Now’]
Alignment • Other objects require some work… [Figure: retrieved paintings for ‘train’]
Solution • Learn a DPM [1] on either: 1. annotated bounding boxes (e.g. PASCAL VOC), or 2. the downloaded Google Images
[1] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, IEEE TPAMI, 2010
Auto-alignment [Figure: aligned results for ‘train’]
Auto-alignment [Figure: aligned results for ‘horse’]
Conclusion • We provide a system that can find objects in paintings with high precision in a matter of seconds • The objects found can be further curated, e.g. aligned, using a DPM
Links
• VISOR: Visual Search of BBC News [1] http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/
• CNN code [2] http://www.robots.ox.ac.uk/~vgg/research/deep_eval/
• Our system: coming shortly!
[1] K. Chatfield, A. Zisserman, VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval, ACCV, 2012
[2] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, BMVC, 2014
Thank you • Any questions? • Or email elliot@robots.ox.ac.uk