
Deep Networks for Computer Vision at Google

Chuck Rosenberg, ImageNet ILSVRC Workshop, September 12, 2014


  1. Deep Networks for Computer Vision at Google. Chuck Rosenberg, ImageNet ILSVRC Workshop, September 12, 2014

  2. Quick Intro: Private Photo Search and Public Image Search Teams; Google Photos. Our work: Pixels → Knowledge

  3. Applications - Google Image Search: Search by Image

  4. Applications - Photo Search

  5. Applications - Auto Curation

  6. Google Photos - Auto Awesome

  7. More Image Understanding at Google: YouTube, Google Shopping, Advertising, StreetView / Maps, Self-Driving Cars, Robotics, and much more...

  8. Understanding is about extracting Knowledge

  9. Image Understanding: Pixels → Entities. Single-word entities go way beyond simple objects! My photos of … objects: “dog”; fine-grained objects: “Husky”; scenes: “beach”, “sunset”; actions: “kitesurfing”, “kiss”; emotions: “happiness”, “laughter”; events: “birthday”, “basketball game”; abstract concepts: “love”, “zen”

  10. The Deep and now Deeper Hammer: Pixels → Deep Neural Network → Target output. Deep learning infrastructure by the Google Brain team. “ImageNet Classification with Deep Convolutional Neural Networks”, Krizhevsky, Sutskever, Hinton, NIPS 2012

  11. Personal Photos - Example Annotations: Crowd, Hummingbird, Play, Christmas tree, Cheering, Macro photography, Meal, Red, People, Reflection, Cake, Christmas decoration, Stadium, Red, Child, Christmas

  12. More Example Network Annotations

  13. More Example Network Annotations

  14. Google Network Stats
      ● Training Data
        ○ ImageNet 1K: ~1M images
        ○ X-Net: 100s of millions of images
      ● Label Set
        ○ ImageNet: 1K labels
        ○ X-Net: ~10s of thousands of labels
      ● Ground Truth Issues
        ○ Incomplete Training Data
        ○ Noisy Training Data

  15. Challenge: Incomplete label ground-truth. Problem increasingly serious as we add more types of entities and fine-grained categories: “Airedale Terrier” but not “Terrier”, “Dog”, “Animal”, or “Pet”, “Cute”, or “Curb”, “Grass”, “Street” ...

  16. Challenge: Noisy data. “Tortoise Shell Sunglasses”, “Random noise”, “Tortoise”

  17. Image Understanding: Localization. Tasks: object detection, scene parsing, pose estimation. Labels in the example image: sky, Mountain, human running, grass, dog, road

  18. Sample detections: ImageNet, Pascal VOC

  19. Training Embeddings Using Triplets. Diagram: Triplets, Deep Neural Net, L2, Embedding, Triplet Loss; example images labeled Anchor, Positive, Negative.
      ● Training data consists of triplets: an anchor image, positive image, and negative image.
      ● Loss function: see [1]
      [1] “Learning Fine-grained Image Similarity with Deep Ranking”, Wang, Song, Leung, Rosenberg, Wang, Philbin, Chen, Wu, CVPR 2014
      Google Confidential and Proprietary

  20. Embedding Results

  21. Embedding Results

  22. Embedding Results

  23. Google Confidential and Proprietary

  24. Some Takeaways
      What Works
      ● ImageNet - of course! =)
      ● More data leads to better performance
      ● Deeper and bigger networks lead to better performance
      ● Networks handle many diverse problems very well
      What Needs Work
      ● More insight into the “Black Box” - diagnosis and understanding
      ● Understand and improve training data efficiency
      ● Efficient means of collecting more training data
      ● Better ways to deal with noisy training data

  25. Thanks to the teams...
      ● Image Understanding Team
      ● Google Photos Team
      ● Google Brain Team
      ● Google Research
      ● Our great interns
      And We’re Hiring! I’m: chuck@google.com

  26. The End
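
The triplet training setup on slide 19 (anchor, positive, and negative images) can be illustrated with a small sketch. The loss formula itself did not survive in this transcript, so the code below assumes the hinge form used in the cited Deep Ranking paper [1]: the loss is zero once the negative is at least a margin farther from the anchor than the positive, measured by squared L2 distance. The function names, the margin value of 1.0, and the 2-D example embeddings are all hypothetical, and the deep network that would produce the embeddings is omitted.

```python
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, not taken from the slides
) -> float:
    """Hinge loss on one (anchor, positive, negative) triplet.

    Returns 0 when the negative is at least `margin` farther from the
    anchor than the positive is, and a positive penalty otherwise.
    """
    d_pos = squared_l2(anchor, positive)
    d_neg = squared_l2(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Hypothetical 2-D embeddings, for illustration only.
anchor = [0.0, 0.0]
positive = [1.0, 0.0]  # squared distance to anchor: 1
negative = [3.0, 0.0]  # squared distance to anchor: 9

# Correctly ordered triplet: max(0, 1 + 1 - 9) = 0
satisfied = triplet_hinge_loss(anchor, positive, negative)

# Swapped roles violate the ordering: max(0, 1 + 9 - 1) = 9
violated = triplet_hinge_loss(anchor, negative, positive)
```

In training, a loss of this kind would be averaged over many sampled triplets and minimized with respect to the network weights; only triplets that violate the margin contribute a gradient.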
