WHU_NERCMS at TRECVID2018: INS. Dongshu Xu, Longxiang Jiang, Xiaoyu Chai, Jin Chen, Han Fang, Li Jiao, Jiaqi Li, Shichen Lu, and Chao Liang. National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan, 430072, China. cliang@whu.edu.cn

Contents: 1. Introduction; 2. Our approach; 3. Results & conclusions

Introduction: TRECVID 2018 INS task. Given a person name with example images and shots, and a scene name with example images and shots, retrieve the specific person in the specific scene. Example: person (Jane) + scene (cafe2) = specific person in specific scene.

Framework (diagram labels): MTCNN, face features (f_face); re-id features (f_reid); SSD; local scene features (f_local_scene); …; global scene features (f_global_scene); score fusion; ranking list.

Local scene retrieval: framework (diagram labels). SSD. Stage 1: initial pedestrian features, input keyframes. Stage 2: query category, expected results, input image, trained SSD network.

Global scene retrieval: Places365-CNN. The Places365 dataset covers 365 scene categories and also provides pre-trained models for multiple network architectures. Pipeline (diagram): input images → pretrained Places365-CNN (network: ResNet50) → global features → sort.

Training samples for scene retrieval: the training dataset includes samples from different objects and from different views. Scenes and landmarks (figure labels): cafe, Pub, Cafe2, Laun, Market. Dataset production: keyframes are labelled with landmarks.

Face recognition: Face Detection (MTCNN) → Face Alignment → Feature Extraction → Distance Measure

Face recognition: Face Detection → Face Alignment (similarity transformation) → Feature Extraction → Distance Measure

Face recognition: Face Detection → Face Alignment → Feature Extraction (Face-ResNet: Res Block(10), Res Block(6), Res Block(4); legend: Res Block, P, C) → Distance Measure

Face recognition: Face Detection → Face Alignment → Feature Extraction → Distance Measure (cosine distance)

Face recognition: pipeline (diagram labels). Gallery set: Shot 1 … Shot n, with features f1, f2, …, fn. Topic identity; reference identity; extended score; cosine distance. Max: the identity that has the highest score. Loop: for i=1:n { processing the i-th shot }
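The per-shot scoring in the pipeline slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each shot contributes a list of face feature vectors and that the shot score is the maximum cosine similarity to the topic feature (one reading of the "Max" label). The names cosine_similarity, shot_face_score and rank_shots are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treated as "no match" here
    return dot / (norm_a * norm_b)


def shot_face_score(query_feature, shot_face_features):
    """Assumption: a shot is scored by its best-matching face."""
    if not shot_face_features:
        return 0.0  # no face detected in this shot
    return max(cosine_similarity(query_feature, f) for f in shot_face_features)


def rank_shots(query_feature, gallery):
    """gallery maps shot id -> list of face features; returns (shot, score) pairs, best first."""
    scores = {shot: shot_face_score(query_feature, feats)
              for shot, feats in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Shots without any detected face receive a score of 0 in this sketch, so they sink to the bottom of the face ranking rather than being dropped.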

Person re-identification based person search: person search. We apply a person re-id technique based on AlignedReID [1]. Pipeline (diagram): query person examples and person detection (SSD) → AlignedReID [1], global feature (2048-d) → aligned similarity → re-id score → person rank. [1] X. Zhang, H. Luo, et al. AlignedReID: Surpassing Human-Level Performance in Person Re-Identification. arXiv:1711.08184v2, 2017

Person re-identification based person search: how to get the training dataset. Diagram: face bounding box (with id) + person bounding box (without id) → k-means → retag → person bounding box (with id) → training set (example ids in the figure: 76, 7, 98). Details of the training dataset: number of images: 2,486,571; number of ids: 194; number of clusters: 24,864.
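The slide does not spell out how clusters are turned into identity labels. One plausible reading, shown here purely as an assumption, is that each k-means cluster inherits the identity that occurs most often among its members that already carry a face-derived id, and that clusters with no labelled member stay untagged. The function name retag_by_cluster and both input dictionaries are hypothetical.

```python
from collections import Counter, defaultdict


def retag_by_cluster(cluster_of, known_id):
    """Propagate identities through clusters by majority vote.

    cluster_of: person box -> cluster index (e.g. from k-means on re-id features).
    known_id:   person box -> identity, only for boxes matched to a labelled face.
    Returns:    person box -> identity for every box in a cluster with at least one vote.
    """
    votes = defaultdict(Counter)
    for box, identity in known_id.items():
        votes[cluster_of[box]][identity] += 1

    # Assumption: the most frequent identity in a cluster labels the whole cluster.
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

    return {box: cluster_label[c]
            for box, c in cluster_of.items()
            if c in cluster_label}
```

With a large number of small clusters, as in the table above, a majority vote of this kind would mostly operate on visually homogeneous groups, which limits but does not remove label noise.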

Person re-identification based person search: visualization results. Good probe, AlignedReID rank list (top 6): √ √ √ √ √ √. Bad probe, AlignedReID rank list (top 6): √ × √ × √ √. The bad query fails because the clothes are too similar.

Score fusion: weight-based score fusion. Diagram labels: topic, f_scene, false / true, f_face, f
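As a rough illustration of weight-based fusion, the sketch below min-max normalizes each modality's scores and combines them linearly. The weights, the normalization step, and the names min_max_normalize and fuse_scores are assumptions for illustration; the slide does not give the actual weights, and the true/false branch in the diagram is not modelled here.

```python
def min_max_normalize(scores):
    """Rescale a {shot: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}  # no spread, nothing to rank on
    return {shot: (value - lo) / (hi - lo) for shot, value in scores.items()}


def fuse_scores(face_scores, scene_scores, w_face=0.5, w_scene=0.5):
    """Weighted sum of normalized face and scene scores (hypothetical weights).

    A shot missing from one modality contributes 0 for that modality.
    """
    face = min_max_normalize(face_scores)
    scene = min_max_normalize(scene_scores)
    shots = set(face) | set(scene)
    return {shot: w_face * face.get(shot, 0.0) + w_scene * scene.get(shot, 0.0)
            for shot in shots}
```

Normalizing before the weighted sum keeps one modality from dominating only because its raw scores live on a larger scale.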

Score fusion: face filter and person expansion. Diagram: rank with scene score → ranking list. Filter: detected faces are assigned ids from the face library, and shots without the target person id are dropped. Expand: detected persons are assigned ids from the person library, and shots with the target person id are added.
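The filter-then-expand idea can be sketched as follows. This is an interpretation of the diagram under stated assumptions: the scene ranking is a list of shot ids ordered by scene score, each shot has a set of ids from face recognition and a set from person re-id, and expanded shots are appended after the face-confirmed ones. Where expanded shots are actually placed in the final list is not stated on the slide, and filter_and_expand is a hypothetical name.

```python
def filter_and_expand(scene_ranking, face_ids, person_ids, target_id):
    """Keep scene-ranked shots that contain the target, then add re-id matches.

    scene_ranking: shot ids, best scene score first.
    face_ids:      shot id -> set of identities assigned from the face library.
    person_ids:    shot id -> set of identities assigned from the person library.
    """
    # Filter: drop shots whose detected faces do not include the target id.
    kept = [s for s in scene_ranking if target_id in face_ids.get(s, set())]
    kept_set = set(kept)

    # Expand: recover shots where only the person (body) matched the target,
    # e.g. back or side views. Appending them at the end is an assumption.
    expanded = [s for s in scene_ranking
                if s not in kept_set and target_id in person_ids.get(s, set())]
    return kept + expanded
```

Both passes preserve the original scene order, so the scene score still decides the ranking within each group.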

Results & conclusions: results (figures: Auto, Interactive). Analysis: (1) the ineffectiveness of re-id: IoU computation; cluster strategy. (2) the effectiveness of fine-tuning: fine-tuned on some scenes.
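For reference, intersection over union (IoU) for two axis-aligned boxes is computed as below. The (x1, y1, x2, y2) box format and the function name iou are assumptions; the slides do not show how IoU was used in the pipeline.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Note that a small box fully contained in a large one still has a low IoU: a 2×2 face box inside a 10×20 person box gives 4 / 200 = 0.02. Whether this effect is what the analysis refers to is not stated on the slide.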

Results & conclusions: conclusions. (1) Face recognition is a key method for identifying a person; a new person search method should be introduced for person images with back and side views or in low resolution. (2) The training dataset of the scene model needs more effective images, including different views of positive and negative scenes. (3) Score fusion and the expansion method are useful for retrieving hard samples.

THANKS
