Learning Deep Features for Scene Recognition using Places Database
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva
NIPS 2014
Bora Çelikkale
INTRODUCTION
Human Visual Recognition
The visual system samples the world several times per second: on the order of millions of images within a year
INTRODUCTION
Primate Brain
Hierarchical organization in layers of increasing processing complexity
This organization inspired CNNs
PROBLEM & MOTIVATION
Object classification has achieved astonishing performance with large databases (ImageNet)
But iconic, object-centric images do not contain the richness and diversity of visual information found in scenes
CONTRIBUTIONS
A scene-centric database (Places) 60x larger than SUN
Metrics for comparing scene datasets: relative density and relative diversity
SCENE DATASETS
Scene15 (Lazebnik et al. 2006): 15 categories, ~3,000 imgs
MIT Indoor67 (Quattoni & Torralba 2009): 67 categories of indoor places, 15,620 imgs
SUN (Xiao et al. 2010): 397 (well-sampled) categories, 130,519 imgs
Places (Zhou et al. 2014): 476 categories, 7,076,580 imgs
PLACES DATASET
1. Image collection: Google Images, Bing Images and Flickr queried with the same categories as SUN, combined with 696 popular English adjectives
>40M images downloaded
PLACES DATASET
2. PCA-based duplicate removal against SUN
Places and SUN therefore contain different images, which allows the two datasets to be combined
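PLACES DATASET
3. Annotation with Amazon Mechanical Turk (AMT)
Yes/no questions (e.g., "Is this a living room?")
Two-round setup: in round 1 the default answer is NO; in round 2 the default answer is YES
Images shown per task: 750 + 60 from SUN as controls
Only annotations from workers with >90% accuracy on the control images are kept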
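The control-image check can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes each worker's answers arrive as a dict from image id to a yes/no boolean and that the true labels of the 60 SUN control images are known. The names `control_accuracy` and `keep_worker` are hypothetical, and the two-round default-answer setup is not modeled.

```python
def control_accuracy(answers, control_truth):
    """Fraction of control images a worker answered correctly.

    answers: dict image_id -> bool (worker's yes/no answer; hypothetical format)
    control_truth: dict image_id -> bool (known SUN label of each control image)
    A control image the worker skipped counts as wrong.
    """
    correct = sum(1 for img, truth in control_truth.items() if answers.get(img) == truth)
    return correct / len(control_truth)


def keep_worker(answers, control_truth, threshold=0.9):
    """Keep a worker's annotations only if control accuracy exceeds the threshold."""
    return control_accuracy(answers, control_truth) > threshold


# Toy example: 60 control images whose true answer is "yes"
truth = {f"sun_{i}": True for i in range(60)}
good = {f"sun_{i}": i >= 2 for i in range(60)}   # 58/60 correct
bad = {f"sun_{i}": i >= 10 for i in range(60)}   # 50/60 correct
print(keep_worker(good, truth), keep_worker(bad, truth))  # True False
```

Using a strict `>` mirrors the ">90%" wording on the slide; whether the original threshold was strict is an assumption.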
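COMPARISON METRICS
Relative Density
A denser dataset is one whose images have more similar nearest neighbors
$\mathrm{Den}_B(A) = p\left(d(a_1, a_2) < d(b_1, b_2)\right)$, where $a_1 \in A$ and $b_1 \in B$ are drawn at random, $a_2 = \mathrm{NN}(a_1)$ is the nearest neighbor of $a_1$ in $A$, and $b_2 = \mathrm{NN}(b_1)$ is the nearest neighbor of $b_1$ in $B$
A is denser than B when $\mathrm{Den}_B(A) > 0.5$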
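Relative density of A with respect to B is the probability that a random image of A is closer to its nearest neighbor in A than a random image of B is to its nearest neighbor in B. As a hedged illustration of that definition only, it can be estimated by Monte Carlo sampling over feature vectors, assuming Euclidean distance on some image descriptor (for example GIST) stands in for the distance d. The function names and sampling scheme are illustrative, not the authors' code; the paper measures these comparisons with human judgments.

```python
import math
import random


def nn_distance(i, data):
    """Distance from data[i] to its nearest neighbour within the same dataset."""
    return min(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)


def relative_density(A, B, trials=1000, seed=0):
    """Monte Carlo estimate of Den_B(A) = p(d(a1, a2) < d(b1, b2)),
    with a1, b1 drawn at random and a2 = NN(a1) in A, b2 = NN(b1) in B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a1 = rng.randrange(len(A))
        b1 = rng.randrange(len(B))
        if nn_distance(a1, A) < nn_distance(b1, B):
            hits += 1
    return hits / trials


# Toy 2-D "features": A is tightly packed, B is spread out
A = [(0.01 * i, 0.0) for i in range(50)]
B = [(1.0 * i, 0.0) for i in range(50)]
print(relative_density(A, B), relative_density(B, A))  # 1.0 0.0
```

Every nearest-neighbor distance in A (about 0.01) is smaller than every one in B (1.0), so A is maximally denser than B in this toy case.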
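COMPARISON METRICS
Relative Diversity
Inspired by the Simpson index: the probability that two randomly chosen individuals belong to the same species
$\mathrm{Div}_B(A) = 1 - p\left(d(a_1, a_2) < d(b_1, b_2)\right)$, where $a_1, a_2 \in A$ and $b_1, b_2 \in B$ are random pairs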
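Relative diversity of A with respect to B is one minus the probability that a random pair from A is more similar than a random pair from B. A minimal Monte Carlo sketch of that definition, again assuming Euclidean distance between hypothetical feature vectors in place of the human similarity judgments used in the paper:

```python
import math
import random


def relative_diversity(A, B, trials=1000, seed=0):
    """Monte Carlo estimate of Div_B(A) = 1 - p(d(a1, a2) < d(b1, b2)),
    where (a1, a2) and (b1, b2) are random pairs from A and B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a1, a2 = rng.sample(range(len(A)), 2)  # two distinct random images from A
        b1, b2 = rng.sample(range(len(B)), 2)  # two distinct random images from B
        if math.dist(A[a1], A[a2]) < math.dist(B[b1], B[b2]):
            hits += 1
    return 1 - hits / trials


A = [(0.01 * i, 0.0) for i in range(50)]  # narrow: random pairs look alike
B = [(1.0 * i, 0.0) for i in range(50)]   # broad: random pairs differ a lot
print(relative_diversity(A, B), relative_diversity(B, A))  # 0.0 1.0
```

Unlike relative density, no nearest-neighbor search is involved: diversity only looks at how far apart random pairs are.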
EXPERIMENTS
1. Density & diversity comparison (AMT): relative diversity vs. relative density per category and dataset
Workers are shown 12 pairs of images and select the most similar pair
Diversity: pairs are drawn at random from each database
Density: each pair is an image and its 5th nearest neighbor under GIST (the 5th rather than the 1st, to avoid near-duplicates)
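The two pairing rules can be sketched as follows. This assumes GIST descriptors are plain numeric vectors compared with Euclidean distance; the helper names are hypothetical, and the presentation of 12 pairs to workers is not modeled.

```python
import math
import random


def kth_nn_index(i, data, k=5):
    """Index of the k-th nearest neighbour of data[i] (k=1 is the closest),
    excluding data[i] itself."""
    others = sorted((j for j in range(len(data)) if j != i),
                    key=lambda j: math.dist(data[i], data[j]))
    return others[k - 1]


def density_pair(i, data, k=5):
    """Pair for the density test: an image and its k-th nearest neighbour."""
    return (i, kth_nn_index(i, data, k))


def diversity_pair(data, rng):
    """Pair for the diversity test: two distinct images drawn at random."""
    return tuple(rng.sample(range(len(data)), 2))


points = [(float(i),) for i in range(10)]
print(density_pair(0, points))                   # (0, 5)
print(diversity_pair(points, random.Random(0)))  # a random pair of distinct indices
```

Skipping the first four neighbors is what keeps near-duplicates from making a dataset look artificially dense.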
EXPERIMENTS
2. Cross-dataset generalization: train on one dataset and test on another, using ImageNet-CNN features with a linear SVM
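The cross-dataset protocol amounts to filling a train-by-test accuracy matrix. The sketch below assumes features have already been extracted (on the slides, by ImageNet-CNN) and, to stay dependency-free, swaps the linear SVM for a nearest-centroid classifier. That substitution, the toy data, and all names are illustrative assumptions.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def train_centroids(samples):
    """Nearest-centroid stand-in for the linear SVM named on the slide.
    samples: list of (feature_vector, label); returns label -> mean vector."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Label of the centroid closest to feature vector x."""
    return min(centroids, key=lambda y: euclidean(centroids[y], x))


def cross_dataset_matrix(datasets):
    """datasets: dict name -> (train_samples, test_samples).
    Returns acc[train_name][test_name]: train on one dataset, test on every dataset."""
    acc = {}
    for train_name, (train, _) in datasets.items():
        model = train_centroids(train)
        acc[train_name] = {}
        for test_name, (_, test) in datasets.items():
            correct = sum(predict(model, x) == y for x, y in test)
            acc[train_name][test_name] = correct / len(test)
    return acc


# Toy features for two hypothetical datasets with shared labels
data = {
    "ds1": ([((0, 0), "beach"), ((0, 1), "beach"), ((10, 10), "kitchen")],
            [((0, 2), "beach"), ((9, 9), "kitchen")]),
    "ds2": ([((4, 4), "beach"), ((6, 6), "kitchen")],
            [((3, 3), "beach"), ((10, 10), "kitchen")]),
}
print(cross_dataset_matrix(data))
```

The diagonal of the matrix is within-dataset accuracy; the off-diagonal entries measure how well a model trained on one dataset generalizes to another.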
EXPERIMENTS
3. Comparison with hand-designed features
EXPERIMENTS
4. Training a CNN for scene recognition: 2.5M images from 205 categories, AlexNet architecture
PLACES-CNNs
Hybrid-AlexNet (Places + ImageNet): 3.5M imgs, 1183 categories; accuracy = 0.5230 on the validation set
Places205-GoogLeNet (205 categories): top-1 = 0.5567, top-5 = 0.8541 on the validation set
Places205-VGG16 (205 categories): top-1 = 0.5890, top-5 = 0.8770 on the validation set
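The top-1 / top-5 figures are standard top-k accuracy: a prediction counts as correct if the true category is among the k highest-scoring classes. A minimal sketch of the metric, assuming a list of per-class scores for each validation image (the function name and toy scores are hypothetical):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists (one list per sample)
    labels: list of true class indices
    """
    hits = 0
    for s, y in zip(scores, labels):
        top = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        hits += y in top
    return hits / len(labels)


scores = [
    [0.10, 0.70, 0.20],  # top-1 prediction: class 1
    [0.50, 0.30, 0.20],  # top-1 prediction: class 0
    [0.35, 0.25, 0.40],  # top-1 prediction: class 2
]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))  # 0.3333333333333333
print(top_k_accuracy(scores, labels, 2))  # 1.0
```

Top-5 is always at least as high as top-1, which is why the slide's top-5 numbers are much larger.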
PLACES2 DATASET
400+ unique scene categories, >10M images
Top-1 accuracy: AlexNet 43.0%, VGG16 47.6%
DEMO
http://places.csail.mit.edu/demo.html
http://places2.csail.mit.edu/demo.html
THANK YOU