ViCoS Eye – a web service for visual object categorization


  1. ViCoS Eye – a web service for visual object categorization
     Domen Tabernik, Luka Čehovin, Matej Kristan, Marko Boben, Aleš Leonardis
     University of Ljubljana, Faculty of Computer and Information Science, Visual Cognitive Systems Laboratory

  2. Computer Vision as Web-Services
     • content-based image search:
       – find similar images from a huge database
     • existing services:
       – Google Image Search
       – TinEye
       – Microglossa
     [figure: query image]
     CVWW2013, Hernstein, February 2013

  3. Limitations of existing services
     • perform well if the query image comes from the web
     • poor performance for images taken by a camera:
     • no knowledge about the content
       – perform similarity matching using local features and color distribution
     • TinEye web site:
       – "TinEye does not commonly return similar matches, and it cannot recognize the contents of any image." [1]
     • our goal: present an architecture for running visual object categorization as backend support for a web service
     [1] http://www.tineye.com/faq#similar

  4. Requirements for our system
     • online web service for computer vision algorithms
       – support for more advanced computer vision algorithms (object categorization etc.)
     • handle hundreds of requests per second
     • fast response time
       – recommended tolerable waiting time (TWT) for web pages is approximately 2 seconds [1]
       – for our web service, 5 seconds is acceptable
     • fully utilize available hardware
       – distributed processing in a cluster of machines (a cloud)
       – allows for high scalability
       – better handling of increased traffic in the future
     [1] F. F.-H. Nah. A study on tolerable waiting time: how long are web users willing to wait? Behaviour & IT, 23(3):153–163, 2004.

  5. Two-level architecture
     Distributed learning
     • process from 1000 to 100 000 images
     • distributing using MapReduce (Hadoop implementation)
     • learn appropriate libraries/models etc.
     Real-time stream processing
     • processing requests from the web service
     • distributing by a Storm application
     • fast response time
     • process hundreds of requests at once

  6. Distributed learning - MapReduce
     MapReduce is a two-stage processing procedure: a map stage followed by a reduce stage.
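
The two stages can be illustrated with a minimal single-process sketch in plain Python. It is not Hadoop code: the `map_reduce` helper, the sample `images` list, and the per-category counting task are all assumptions chosen only to show how a map stage emits key/value pairs and a reduce stage combines the values grouped under each key.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run the map stage, group values by key, then run the reduce stage."""
    groups = defaultdict(list)
    for record in records:
        # Map stage: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce stage: combine all values that share a key.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical input: (file name, category label) pairs.
images = [("a.jpg", "airplane"), ("b.jpg", "chair"), ("c.jpg", "airplane")]

def count_mapper(record):
    _, category = record
    yield category, 1

def count_reducer(category, ones):
    return sum(ones)

counts = map_reduce(images, count_mapper, count_reducer)
print(counts)  # {'airplane': 2, 'chair': 1}
```

In a real cluster the grouping step is the shuffle that the framework performs between the two stages, and both stages run in parallel across machines.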

  7. Distributed learning - implementation
     • Object categorization with a Support Vector Machine (SVM) using the HoC descriptor [Tabernik et al. 2012]:
       1. Process the input image with the LHOP library
       2. Generate a HoC descriptor from the LHOP model of the image
       3. Classify the image into one of the known categories using the SVM
     Pipeline: LHOP model → HoC descriptor → SVM

  8. Distributed learning - implementation
     Translate the following learning process into the MapReduce domain:
       1. Process the input image with the LHOP library
       2. Generate a HoC descriptor from the LHOP model of the image
       3. Train an SVM for each category using all images
     No reduce function
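
One plausible reading of "no reduce function" is a pair of map-only jobs: the first maps over images to produce descriptors, the second maps over categories to train one model per category from all descriptors. The sketch below follows that reading, which is an assumption and not confirmed by the slides. `lhop_model` and `hoc_descriptor` are toy stand-ins, not the real LHOP or HoC code, and `train_one_vs_rest` uses a simple mean-difference vector in place of SVM training.

```python
def lhop_model(image):
    """Hypothetical stand-in for LHOP inference (NOT the real library)."""
    return [float(p) for p in image]

def hoc_descriptor(model):
    """Hypothetical stand-in for the HoC descriptor: [mean, max]."""
    return [sum(model) / len(model), max(model)]

def extract(record):
    """Map function of job 1: (category, image) -> (category, descriptor)."""
    category, image = record
    return category, hoc_descriptor(lhop_model(image))

def column_means(vectors):
    return [sum(col) / len(col) for col in zip(*vectors)]

def train_one_vs_rest(category, samples):
    """Map function of job 2: one model per category, using ALL samples.

    Mean-difference vector as a crude stand-in for SVM training.
    """
    pos = [d for c, d in samples if c == category]
    neg = [d for c, d in samples if c != category]
    return category, [p - n for p, n in zip(column_means(pos), column_means(neg))]

# Hypothetical toy dataset: (category, pixel values).
dataset = [("chair", [0, 2, 4]), ("chair", [2, 4, 6]), ("cup", [1, 1, 1])]

# Job 1 (map only): every image is processed independently.
samples = list(map(extract, dataset))

# Job 2 (map only): every category is trained independently.
categories = sorted({c for c, _ in samples})
models = dict(train_one_vs_rest(c, samples) for c in categories)
print(models)
```

Because neither job needs values to be combined across keys, both parallelize trivially: the first across images, the second across categories.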

  9. Distributed learning - performance
     • Using Caltech-101 as the training dataset
       – 2x9000 images, as we mirror each image
     • 101 categories and ~18000 images trained in less than 40 min
     • Hardware: 3x Hadoop instances, each running in a virtual machine
       – each machine: 40 cores (35 worker nodes), 80 GB RAM
     • Estimation for bigger datasets:
       – 100 000 images: ~4 hours
       – 1 000 000 images: ~40 hours
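
The estimates for bigger datasets are consistent with a simple linear extrapolation from the measured run. The sketch below assumes that training time scales linearly with the number of images (an assumption, not something the slides state) and treats the "less than 40 min" figure as the baseline.

```python
BASE_IMAGES = 18_000   # mirrored Caltech-101, from the slide
BASE_MINUTES = 40.0    # "less than 40 min", used here as an upper bound

def estimated_hours(n_images):
    """Linear extrapolation of training time; linear scaling is an assumption."""
    return n_images / BASE_IMAGES * BASE_MINUTES / 60.0

for n in (100_000, 1_000_000):
    print(f"{n} images: ~{estimated_hours(n):.1f} h")
# 100000 images: ~3.7 h
# 1000000 images: ~37.0 h
```

Rounded up, these match the ~4 hours and ~40 hours quoted on the slide.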

  10. Real-time stream processing - Storm
      • Storm [1] is an open source distributed real-time computation system
      • Provides an infrastructure of workers and queues distributed across cluster machines
      • Applications are written in the form of a digraph (topology):
        – Spouts: sources of data
        – Bolts: processing elements
      [figure: spouts and bolts in a topology]
      [1] http://storm-project.net
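
The spout/bolt idea can be sketched with Python generators. This is a conceptual illustration only and does not use the Storm API: `request_spout`, `make_bolt`, `run_topology`, and the two toy bolts are hypothetical names, and the topology is a simple linear chain, which is a special case of the general digraph.

```python
def request_spout(requests):
    """Spout: a source that emits tuples into the topology."""
    for request_id, payload in requests:
        yield {"id": request_id, "payload": payload}

def make_bolt(fn):
    """Wrap a per-tuple function as a bolt (a stream-to-stream processor)."""
    def bolt(stream):
        for tup in stream:
            yield fn(tup)
    return bolt

def run_topology(spout, bolts):
    """Chain the bolts after the spout and drain the resulting stream."""
    stream = spout
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)

# Two toy bolts standing in for real processing stages.
upper_bolt = make_bolt(lambda t: {**t, "payload": t["payload"].upper()})
length_bolt = make_bolt(lambda t: {**t, "length": len(t["payload"])})

results = run_topology(
    request_spout([(1, "cat"), (2, "chair")]),
    [upper_bolt, length_bolt],
)
print(results)
```

In Storm itself, each bolt runs in its own workers, with queues between stages, so the stages execute concurrently across machines instead of lazily in one process.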

  11. Real-time stream processing – our topology
      • separate workers for bolts: pipeline-like approach
      • multiple workers per bolt
        – handle multiple requests at once
      • easily scalable
      • performance:
        – response time: ~2.1 s = 1.5 s (LHOP) + 0.4 s (SVM) + other
          (hardware: 8-core CPU, 16 GB RAM)
        – estimated max load on 105 nodes: 48 requests per second before saturation
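
A back-of-the-envelope check of the load estimate can be made with Little's law (throughput = concurrency / time in system). The sketch assumes that each of the 105 nodes serves one request at a time for the full ~2.1 s response time. That is an assumption for illustration, and the slides do not say how the 48 requests per second figure was derived.

```python
def max_throughput(concurrent_slots, service_time_s):
    """Little's law upper bound on sustained requests per second."""
    return concurrent_slots / service_time_s

NODES = 105            # from the slide
RESPONSE_TIME_S = 2.1  # from the slide: 1.5 s (LHOP) + 0.4 s (SVM) + other

bound = max_throughput(NODES, RESPONSE_TIME_S)
print(f"upper bound: ~{bound:.0f} requests per second")
```

Under that assumption the bound is about 50 requests per second, in the same range as the slide's estimate of 48 before saturation.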

  12. Web Service front end
      • Web-site: http://eye.vicos.si
      • Android app

  13. Conclusion
      • presented an architecture capable of running computer vision algorithms in a cluster of machines
        – learn from 1000 to 100 000 images in manageable time
        – response time of ~2 s
        – handle ~50 requests per second
      • future work:
        – detection with LHOP (localization + classification)
        – category-oriented content-based image search
