Deep Networks for Computer Vision at Google Chuck Rosenberg ImageNet ILSVRC Workshop September 12, 2014
Quick Intro Private Photo Search and Public Image Search Teams Google Photos Our work: Pixels → Knowledge
Search by Image Applications Google Image Search
Applications - Photo Search
Applications - Auto Curation
Google Photos - Auto Awesome
More Image Understanding at Google: YouTube, Google Shopping, Advertising, StreetView / Maps, Self-Driving Cars, Robotics, and much more...
Understanding is about extracting Knowledge
Image Understanding: Pixels → Entities Single-word entities go way beyond simple objects! My photos of … objects: “dog”; fine-grained objects: “Husky”; scenes: “beach”, “sunset”; actions: “kitesurfing”, “kiss”; emotions: “happiness”, “laughter”; events: “birthday”, “basketball game”; abstract concepts: “love”, “zen”
The Deep and now Deeper Hammer Pixels → Deep Neural Network → Target output Deep learning infrastructure by the Google Brain team “ImageNet Classification with Deep Convolutional Neural Networks”, Krizhevsky, Sutskever, Hinton, NIPS 2012
Personal Photos - Example Annotations Crowd Hummingbird Play Christmas tree Cheering Macro photography Meal Red People Reflection Cake Christmas decoration Stadium Red Child Christmas
More Example Network Annotations
More Example Network Annotations
Google Network Stats ● Training Data ○ ImageNet 1K ~1M images ○ X-Net 100’s of millions of images ● Label Set ○ ImageNet 1K ○ X-Net ~10’s of thousands of labels ● Ground Truth Issues ○ Incomplete Training Data ○ Noisy Training Data
Challenge: Incomplete label ground-truth Problem increasingly serious as we add more types of entities and fine-grained categories: “Airedale Terrier” but not “Terrier”, “Dog”, “Animal” or “Pet”, “Cute” or “Curb”, “Grass”, “Street” ...
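To make the incomplete-label problem concrete, here is a minimal sketch in Python. The `PARENT` hierarchy and the helper `implied_labels` are hypothetical and exist only for illustration; the slides do not describe how (or whether) a label hierarchy is used. The sketch shows that walking a hierarchy can recover ancestors such as “Terrier” and “Dog”, but not labels like “Cute” or “Grass”, which are unrelated to the annotated category.

```python
# Hypothetical child -> parent links; not taken from the slides.
PARENT = {
    "Airedale Terrier": "Terrier",
    "Terrier": "Dog",
    "Dog": "Animal",
}

def implied_labels(label):
    """Return every ancestor of `label` in the hypothetical hierarchy."""
    ancestors = []
    while label in PARENT:
        label = PARENT[label]
        ancestors.append(label)
    return ancestors

# What the annotator wrote vs. what is actually true of the photo.
annotated = {"Airedale Terrier"}
true_labels = {"Airedale Terrier", "Terrier", "Dog", "Animal",
               "Pet", "Cute", "Curb", "Grass", "Street"}

# Add the labels that the hierarchy implies.
recovered = set(annotated)
for label in annotated:
    recovered.update(implied_labels(label))

# Labels that are true but still absent from the ground truth.
still_missing = true_labels - recovered
```

Under these assumptions, `still_missing` contains “Pet”, “Cute”, “Curb”, “Grass” and “Street”: the hierarchy closes part of the gap, and the rest remains unlabeled ground truth.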
Challenge: Noisy data “Tortoise Shell Sunglasses” “Random noise” “Tortoise”
Image Understanding: Localization Tasks: object detection, scene parsing, pose estimation [Figure labels: sky, Mountain, human running, grass, dog, road]
Sample detections ImageNet Pascal VOC
Training Embeddings Using Triplets [Figure: Triplets, Deep Neural Net, L2, Embedding, Triplet Loss; example images labeled Anchor, Positive, Negative] ● Training data consists of triplets: an anchor image, a positive image, and a negative image. ● Loss function: see [1] [1] “Learning Fine-grained Image Similarity with Deep Ranking”, Wang, Song, Leung, Rosenberg, Wang, Philbin, Chen, Wu, CVPR 2014 Google Confidential and Proprietary
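The loss formula on the triplet slide did not survive extraction. As a rough illustration, the cited paper [1] uses a hinge loss over distances between embeddings, of the form max(0, g + D(anchor, positive) − D(anchor, negative)), where g is a margin and D is a squared Euclidean distance. The sketch below assumes that form; the function names, the margin value of 1.0, and the toy 2-D embeddings are illustrative choices, not values from the slides.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss on one triplet of embeddings (assumed form, see lead-in).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; otherwise it grows linearly.
    """
    d_pos = squared_l2(anchor, positive)
    d_neg = squared_l2(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

# Toy embeddings (hypothetical values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # squared distance 1 from the anchor
negative = [3.0, 0.0]   # squared distance 9 from the anchor

# Well-separated triplet: 1 + 1 - 9 < 0, so the loss is clipped to zero.
easy_loss = triplet_hinge_loss(anchor, positive, negative)

# Swapping the roles violates the ordering: 1 + 9 - 1 = 9.
hard_loss = triplet_hinge_loss(anchor, negative, positive)
```

In training, the embeddings would come from the deep network and the loss would be averaged over many triplets; only triplets that violate the margin contribute a gradient.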
Embedding Results
Embedding Results
Embedding Results
Some Takeaways What Works ● ImageNet - of course! =) ● More data leads to better performance ● Deeper and bigger networks lead to better performance ● Networks handle many diverse problems very well What Needs Work ● More insight into the “Black Box” - diagnosis and understanding ● Understand and improve training data efficiency ● Efficient means of collecting more training data ● Better ways to deal with noisy training data
Thanks to the teams... ● Image Understanding Team ● Google Photos Team ● Google Brain Team ● Google Research ● Our great interns And We’re Hiring! I’m: chuck@google.com
The End