Multi-task Learning for Precise Object Search from Massive Images/Videos


  1. Multi-task Learning for Precise Object Search from Massive Images/Videos
      Fan Yang
      National Engineering Laboratory for Video Technology, School of EE & CS, Peking University

  2. Outline
      - Introduction
      - Motivation
      - Challenge
      - Multi-task learning for precise object search
        1. Multi-task based person re-identification
        2. Multi-task based vehicle search
      - Summary

  4. Laboratory Organization: National Engineering Laboratory for Video Technology
      - People: Dr. Tiejun Huang; Dr. Wen Gao (IEEE Fellow, ACM Fellow)
      - Labs: Video Coding Lab, System Lab, New Media Lab, SoC Lab, Testing Lab

  5. Research Fields and Groups
      - Video coding algorithms: Wen Gao, Siwei Ma, Ruiqin Xiong
        - Video coding standards
        - Cooperation: CCTV, Huawei, AVS Industry Alliance
      - Intelligent video analysis: Tiejun Huang, Yonghong Tian, Wei Zeng, Yaowei Wang
        - Analyzing and mining surveillance videos; recognition-friendly video coding
        - Cooperation: China Security & Protection, Hisense
      - Mobile visual search: Linyu Duan, Shiliang Zhang
        - CDVS international standard
        - Cooperation: Baidu, Singapore media bureau
      - Media content analysis: Yizhou Wang, Tingting Jiang
        - Computer vision
        - Cooperation: Machine Intelligence Lab, Computing Technology, Chinese Academy of Sciences
      - Image/video chips: Xiaodong Xie, Huizhu Jia
        - Industrial production
        - Applications: national defense, cameras, consumer electronics

  6. Cooperation with NVIDIA (NVAIL)
      - Accelerating video encoding: investigate methods for accelerating video encoding on the Graphics Processing Unit (GPU).
      - Video classification/recognition for CDN surveillance: extend the current state-of-the-art methods and further improve their performance, especially for CDN surveillance.
      - Accelerating Compact Descriptors for Visual Search: use the GPU to accelerate the CDVS extraction process.
      - Image super-resolution via convolutional neural networks: extend the current state-of-the-art CNN-based super-resolution approaches and reduce their inference time.

  8. The Big Data Era
      - Big data has been collected, and is still being collected, by societies.
      - More data has been created in the past two years than in the entire previous history of the human race.
      - Data is growing faster than ever before; by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.
      - Figure: the growth trend of Internet data, estimated by IDC (data size in EB; images and videos vs. others).

  9. Surveillance Video: The Biggest Big Data
      - Surveillance video network: the key infrastructure of the intelligent city
      - Surveillance videos: more than half of all big data
      - >100K cameras for a middle-size city (China)
      - Figure labels: Data Center, City Operation, Traffic Surveillance, Healthcare, Public Security, Social Lifes, Video Network
      T. Huang, "Surveillance Video: The Biggest Big Data," Computing Now, vol. 7, no. 2, Feb. 2014, IEEE Computer Society [online]; http://www.computer.org/web/computingnow/archive/february2014.

  10. BUT, data is far from being analyzed and used
      - "Target-rich" data, i.e., data of special value, makes up about 1.5% of the digital universe.
      - To obtain such "target-rich" data, we need to analyze and mine all the data.
      - At the moment, less than 0.5% of all data is ever analyzed and used.

  11. The Status of Current Systems: Less Smart
      - Have eyes (i.e., cameras) but cannot see (i.e., recognition and search)
      - Figure labels: London, Moscow, Paris, Boston

  12. Surveillance Video Analysis
      - Goal: to develop intelligent algorithms, technologies and systems that can detect/recognize/search specific objects (e.g., pedestrian, vehicle), behavior, or events.
      - Enabling technologies
        - Background modeling
        - Object detection/tracking (e.g., pedestrian, vehicle)
        - Object recognition (e.g., face)
        - Object re-identification and search
        - Action/behavior detection/recognition
        - (Abnormal) event detection
        - Crowd analysis
        - Cross-camera tracking
        - …

  13. A Challenging Problem
      - How can we search for a specific object in massive image or video data?
        - NOT for a visually similar object
        - BUT for exactly the same object
      - Figure labels: Query; Gallery (ID=1, ID=2, ID=3, …); Detection and classification; Precise object search

  14. Precise Object Search
      - Task: to search for a specific object in a large-scale dataset that contains a set of visually similar objects captured by different camera networks.
        - Search as Similarity Ranking (SaS)
        - Search as Recognition (SaR)
      - Figure labels: Precise person search; Precise vehicle search
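
The "Search as Similarity Ranking" idea above can be illustrated with a minimal sketch. The slides do not specify the features or the similarity measure, so cosine similarity, the function names `cosine_similarity` and `rank_gallery`, and the toy three-dimensional features are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Sort (object_id, feature) pairs by decreasing similarity to the query."""
    scored = [(obj_id, cosine_similarity(query, feat)) for obj_id, feat in gallery]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy features, invented for this example (not real descriptors).
gallery = [
    ("ID=1", [0.9, 0.1, 0.0]),
    ("ID=2", [0.1, 0.9, 0.1]),
    ("ID=3", [0.7, 0.3, 0.1]),
]
query = [1.0, 0.1, 0.0]

ranking = rank_gallery(query, gallery)
print([obj_id for obj_id, _ in ranking])
```

In precise object search, only gallery entries that show exactly the same object as the query count as correct, so the quality of such a ranking depends on how well the features separate visually similar but different objects.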

  15. Example: Detect Fake License Plate
      - Diagram labels: Tollgate; Car Monitoring 2, Car Monitoring 3, Car Monitoring N; Search Engine; Car Registry Database
      - Honda Accord vs. Peugeot 206: fake plate
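
One possible reading of this example is a cross-check between the vehicle model seen by the camera and the model registered for the plate. The sketch below assumes exactly that; the function name `find_plate_mismatches`, the plate strings, and the data layout are hypothetical and do not come from the slides.

```python
def find_plate_mismatches(observations, registry):
    """Return observations whose visually recognized model differs from the registry.

    observations: list of (plate, visually_recognized_model) pairs
    registry: dict mapping plate -> registered model
    """
    suspects = []
    for plate, seen_model in observations:
        registered = registry.get(plate)
        # Plates missing from the registry are skipped in this sketch.
        if registered is not None and registered.lower() != seen_model.lower():
            suspects.append((plate, registered, seen_model))
    return suspects


# Hypothetical registry and camera observations, for illustration only.
registry = {"PLATE-001": "Honda Accord", "PLATE-002": "Peugeot 206"}
observations = [
    ("PLATE-001", "Honda Accord"),  # consistent with the registry
    ("PLATE-002", "Honda Accord"),  # registered as a Peugeot 206
]

suspects = find_plate_mismatches(observations, registry)
print(suspects)
```

A real system would need the visual recognition step to be reliable enough that a mismatch points to a fake plate rather than to a recognition error.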

  16. Example: Tracing Suspicious Vehicle
      - Search engine results (timestamps): 2014.10.9 10:42:15; 2014.10.19 10:36:33; 2014.10.19 10:22:32; 2014.10.19 10:12:11; 2014.10.19 12:42:11; 2014.10.19 13:02:18

  17. From Search to Recognition
      - Precise object recognition: the ultimate goal
      - Up to now, no recognition technology (including license plate recognition and face recognition) achieves sufficiently high precision in an unconstrained environment.
      - The success stories of Google and Baidu tell us: search can help, and in some cases even substitute for, recognition.

  18. vs. Visual Search
      - The task aims to find visually similar objects in a large database through visual similarity measurement and ranking.
      - In most cases, returned objects that are visually similar (e.g., within the same (sub-)category, or sharing attributes such as color) are treated as correct.
      - Figure labels: Query; Returned List

  19. Recent Work: Deep Learning for Visual Search
      - Three schemes
        - Direct representation
        - Refining by similarity learning: refine with side information (similarity rank loss)
        - Refining by model retraining: refine with class labels (classification loss)
      Wan J, Wang D, Hoi S C H, et al. Deep learning for content-based image retrieval: A comprehensive study. ACM MM 2014.

  20. Recent Work: Large-scale Clothes Image Retrieval
      - Cross-domain image retrieval: given a user photo of a clothing item, the goal is to retrieve the same or attribute-similar clothing items from online shopping stores.
      - Dual Attribute-aware Ranking Network
        1. Two sub-networks, one for each domain.
        2. Feature representations are driven by semantic attribute learning.
        3. Learning to rank by triplet visual similarity constraint.
      Huang, Junshi, et al. "Cross-domain image retrieval with a dual attribute-aware ranking network." ICCV 2015.
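
The triplet visual similarity constraint mentioned above can be sketched as a hinge loss that asks the anchor to be closer to a matching item than to a non-matching one by some margin. This is the commonly used generic form, not necessarily the exact loss of the cited paper; the Euclidean distance, the `margin` value, and the toy vectors are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D features, invented for illustration.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # same item, distance 1.0 from the anchor
far_negative = [3.0, 4.0]   # different item, distance 5.0
near_negative = [0.0, 1.5]  # different item, distance 1.5

print(triplet_loss(anchor, positive, far_negative))   # constraint satisfied
print(triplet_loss(anchor, positive, near_negative))  # constraint violated
```

With these numbers the first triplet gives a loss of 0 (1.0 - 5.0 + 1.0 is negative) and the second gives 0.5 (1.0 - 1.5 + 1.0), so only the near negative would contribute a training signal.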

  22. Challenge 1: Hard to retrieve
      - The exponentially increasing size of images and videos presents a grand challenge to pattern recognition!
      - Figure: Datasize-Recognition Gap (axes: image size, class number)
        - CIFAR-100: 60K images, 100 classes
        - Caltech-256: 30K images, 256 classes
        - ImageNet-ILSVRC '12: 1.2M images, 1000 classes
        - ImageNet: 14M images, 220K classes
        - Vehicle images in a province: 2.2B images, ~15M classes

  23. Challenge 2: Hard to identify
      - Using a unified framework to analyze, recognize and search images/videos captured in an unconstrained environment:
        1) Huge amount of videos;
        2) Different imaging views, illuminations, environmental conditions and image quality;
        3) Visual appearance changes of the suspicious person/vehicle;
        4) Other factors (e.g., lack of training data)
      - Example cases: Changchun Car Theft Case; London Underground bombings; Zhou Kehua Case

  24. Challenge 2: Hard to identify
      - It is difficult to distinguish different objects with similar appearance (e.g., vehicles of the same color and model).
      - Camera view, distance, and illumination variations
      - Figure labels: Different; Same

  25. It is challenging also because …
      - We can NOT depend on strong identification information such as the face or the vehicle license plate number.
        - The face is unavailable in most real-world surveillance cameras.
        - The vehicle license plate may be faked.
      - Face image retrieval scenario [Li, ICCV2015]: how to search given these pictures?
        ✓ No frontal face image is available
        ✓ With some facial makeup
        ✓ Don't know who he is
      - Figure labels: ID Face Database; Surveillance Face Database
