Multi-task Learning for Precise Object Search from Massive Images/Videos. Fan Yang, National Engineering Laboratory for Video Technology, School of EE & CS, Peking University
Outline: Introduction; Motivation; Challenge; Multi-task learning for precise object search (1. Multi-task based person re-identification; 2. Multi-task based vehicle search); Summary
Laboratory Organization: National Engineering Laboratory for Video Technology. Labs: Video Coding Lab, System Lab, New Media Lab, SoC Lab, Testing Lab. Dr. Tiejun Huang; Dr. Wen Gao (IEEE Fellow, ACM Fellow)
Research Fields and Groups. Video coding algorithm: Wen Gao, Siwei Ma, Ruiqin Xiong; video coding standard; cooperation: CCTV, Huawei, AVS Industry Alliance. Intelligent video analysis: Tiejun Huang, Yonghong Tian, Wei Zeng, Yaowei Wang; analyzing and mining surveillance videos, recognition-friendly video coding; cooperation: China Security & Protection, Hisense. Mobile Visual Search: Linyu Duan, Shiliang Zhang; CDVS international standard; cooperation: Baidu, Singapore media bureau. Media content analysis: Yizhou Wang, Tingting Jiang; computer vision; cooperation: Machine intelligence Lab, Computing Technology, Chinese Academy of sciences. Image/Video Chip: Xiaodong Xie, Huizhu Jia; industrial production; application: national defense, camera, consumer electronics
Cooperation with NVIDIA (NVAIL). Accelerating Video Encoding: investigate acceleration methods for video encoding on the Graphics Processing Unit (GPU). Video Classification/Recognition for CDN Surveillance: extend the current state-of-the-art methods and further improve their performance, especially for the CDN surveillance purpose. Accelerating Compact Descriptors for Visual Search: use the GPU to accelerate the CDVS extraction process. Image Super-Resolution via Convolutional Neural Networks: extend the current state-of-the-art CNN-based super-resolution approaches and reduce the inference time of the CNNs.
The Big Data Era: Big Data collected/collecting by societies. More data has been created in the past two years than in the entire previous history of the human race. Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. [Figure: The growth trend of Internet Data, estimated by IDC; data size (EB), split into images and videos versus others.]
Surveillance Video: The Biggest Big Data. [Figure: a surveillance video network at the center, with labels city operation center, traffic, healthcare, public security, and social life.] Surveillance Video Network: the key infrastructure of the intelligent city; >100K cameras for a middle-size city in China. Surveillance Videos: more than half of all big data. T. Huang, "Surveillance Video: The Biggest Big Data," Computing Now, vol. 7, no. 2, Feb. 2014, IEEE Computer Society [online]; http://www.computer.org/web/computingnow/archive/february2014.
BUT, data is far from being analyzed and used. "Target rich" data, i.e., the data with special value, accounts for about 1.5% of the digital universe. To obtain such "target rich" data, we need to analyze and mine all the data. At the moment, less than 0.5% of all data is ever analyzed and used.
The Status of Current Systems: Less Smart. They have eyes (i.e., cameras) but cannot see (i.e., recognition and search). [Images: London, Moscow, Paris, Boston.]
Surveillance Video Analysis: To develop intelligent algorithms, technologies, and systems that can detect, recognize, and search for specific objects (e.g., pedestrians, vehicles), behaviors, or events. Enabling technologies: background modeling; object detection/tracking (e.g., pedestrian, vehicle); object recognition (e.g., face); object re-identification and search; action/behavior detection/recognition; (abnormal) event detection; crowd analysis; cross-camera tracking; …
A Challenging Problem: How can we search for a specific object in massive image or video data? NOT for a visually similar object, BUT for exactly the same object. [Figure: a query image matched against a gallery of identities (ID=1, ID=2, ID=3, …), contrasting detection and classification with precise object search.]
Precise Object Search. Task: to search for a specific object in a large-scale dataset that contains a set of visually similar objects captured from different camera networks. Two formulations: Search as Similarity Ranking (SaS); Search as Recognition (SaR). Two applications: precise person search; precise vehicle search.
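As a minimal sketch of the "search as similarity ranking" formulation named above: the gallery is sorted by similarity to the query feature. The feature vectors, the identity labels, and the choice of cosine similarity are illustrative assumptions, not details given in the slides.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Return (identity, score) pairs sorted from most to least similar."""
    scored = [(item_id, cosine_similarity(query, feat))
              for item_id, feat in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional features; real systems would use learned descriptors.
gallery = {
    "ID=1": [0.9, 0.1, 0.0],
    "ID=2": [0.1, 0.9, 0.2],
    "ID=3": [0.8, 0.3, 0.1],
}
query = [1.0, 0.1, 0.0]
ranking = rank_gallery(query, gallery)
print([item_id for item_id, _ in ranking])
```

In this toy gallery ID=3 looks similar to the query but is a different identity. Precise search needs ID=1 ranked first, which is why the feature representation matters more than the ranking step itself.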
Example: Detect Fake License Plate. [Figure: a tollgate and car monitoring points 2, 3, …, N feed a search engine backed by a car registry database; a vehicle recognized as a Peugeot 206 whose plate is associated with a Honda Accord is flagged as a fake plate.]
Example: Tracing Suspicious Vehicle. [Figure: a search engine returns sightings of the vehicle with timestamps 2014.10.9 10:42:15, 2014.10.19 10:36:33, 2014.10.19 10:22:32, 2014.10.19 10:12:11, 2014.10.19 12:42:11, 2014.10.19 13:02:18.]
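One reading of the fake-plate diagram is a consistency check between the vehicle model seen by the camera and the model registered for the plate. The slide does not spell out the procedure, so the sketch below is an assumption: the registry contents, the plate strings, and the function name `check_plate` are all hypothetical.

```python
def check_plate(plate, observed_model, registry):
    """Compare the model seen on camera with the model registered for the plate."""
    registered_model = registry.get(plate)
    if registered_model is None:
        return "unregistered"
    if registered_model.lower() != observed_model.lower():
        return "suspected fake"
    return "consistent"

# Hypothetical registry: plate number -> registered vehicle model.
registry = {"PLATE-001": "Honda Accord"}

# The camera recognizes a Peugeot 206 carrying a plate registered to a Honda Accord.
print(check_plate("PLATE-001", "Peugeot 206", registry))
print(check_plate("PLATE-001", "Honda Accord", registry))
```

A model mismatch only catches plates moved between different models. A plate cloned onto the same model and color would pass this check, which is where instance-level vehicle search becomes necessary.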
From Search to Recognition. Precise object recognition: the ultimate goal. Until now, no recognition technology (including vehicle plate number recognition and face recognition) has achieved sufficiently high precision in an unconstrained environment. The success stories of Google and Baidu tell us that search can help, and in some cases even substitute for, recognition.
vs. Visual Search: That task aims to find visually similar objects in a large database through visual similarity measurement and ranking. In most cases, returned objects that are visually similar (e.g., within the same (sub-)category, or having the same attributes such as color) are treated as correct. [Figure: a query and its returned list.]
Recent Work: Deep Learning for Visual Search. Three schemes: Direct Representation; Refining by Similarity Learning; Refining by Model Retraining. Refinement signals shown: class labels (classification loss) and side information (similarity rank loss). Wan J, Wang D, Hoi S C H, et al. Deep learning for content-based image retrieval: A comprehensive study. ACM MM 2014.
Recent Work: Large-scale Clothes Image Retrieval. Cross-domain Image Retrieval: given a user photo depicting a clothing image, the goal is to retrieve the same or attribute-similar clothing items from online shopping stores. Dual Attribute-aware Ranking Network: 1. Two sub-networks, one for each domain. 2. Feature representations are driven by semantic attribute learning. 3. Learning to rank by triplet visual similarity constraint. Huang, Junshi, et al. "Cross-domain image retrieval with a dual attribute-aware ranking network." ICCV 2015.
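The triplet visual similarity constraint in point 3 can be illustrated with a generic hinge-style triplet loss. This is a common textbook form, given here as an assumption; it is not claimed to be the exact loss used in the cited network. The margin value and the toy 2-D features are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive (in squared distance)."""
    return max(0.0, margin
               + squared_distance(anchor, positive)
               - squared_distance(anchor, negative))

anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same item, squared distance 1 from the anchor
easy_negative = [3.0, 0.0]   # different item, squared distance 9
hard_negative = [1.0, 0.0]   # different item, squared distance 1

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

The easy negative already satisfies the constraint and contributes no loss. The hard negative, which is as close to the anchor as the true match, is penalized. Such visually similar but different items are the cases precise object search has to separate.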
Challenge 1: Hard to retrieve. The exponentially increasing size of images and videos presents a grand challenge to pattern recognition! [Figure: Datasize-Recognition Gap, plotting image size and class number per dataset: Caltech-256 (30K images, 256 classes); CIFAR-100 (60K images, 100 classes); ImageNet-ILSVRC '12 (1.2M images, 1000 classes); ImageNet (14M images, 220K classes); vehicle images in a province (2.2B images, ~15M classes).]
Challenge 2: Hard to identify. Using a unified framework for analysis, recognition, and search over images/videos that are captured in an unconstrained environment: 1) huge amount of videos; 2) different imaging views, illuminations, environmental conditions, and image quality; 3) visual appearance changes of the suspicious person/vehicle; 4) other factors (e.g., lack of training data). [Images: Changchun Car Theft Case; London Underground bombings; Zhou Kehua Case.]
Challenge 2: Hard to identify. It is difficult to distinguish different objects with similar appearance (e.g., vehicles of the same color and model) under variations in camera view, distance, and illumination. [Figure: image pairs labeled "Different" and "Same".]
It is challenging also because … the search can NOT depend on strong identification information such as the face or the vehicle license plate number: the face is unavailable in most real-world surveillance cameras; the vehicle license plate may be faked. Face Image Retrieval Scenario [Li, ICCV2015]: How to search given these pictures? No frontal face image is available; the subject wears some facial makeup; his identity is unknown. [Figure labels: ID, Face, Surveillance Database, Face Database.]