Recurrent Neural Networks for Person Re-identification Revisited


  1. Recurrent Neural Networks for Person Re-identification Revisited ▪ Jean-Baptiste Boin (Stanford University, jbboin@stanford.edu) ▪ André Araujo (Google AI, andrearaujo@google.com) ▪ Bernd Girod (Stanford University, bgirod@stanford.edu)

  2. Person video re-identification ▪ Goal: associate person video tracks from different cameras ▪ Applications: › Video surveillance › Home automation › Crowd dynamics understanding ▪ Image credit: PRID2011 dataset [Hirzer et al., 2011]

  3. Person video re-identification: challenges ▪ Viewpoint changes ▪ Lighting variations ▪ Clothing similarity ▪ Background clutter and occlusions ▪ Image credit: iLIDS-VID dataset [Wang et al., 2014]

  4. Framework: re-identification by retrieval ▪ Sequence feature extraction is applied to every video track: database tracks (Camera A) and query tracks (Camera B) ▪ Sequence matching by feature similarity
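A minimal sketch of the retrieval framework above. The slide does not name a similarity metric, so cosine similarity is assumed here, and extract_sequence_feature / rank_database are hypothetical placeholder names for the track-level pipeline covered on later slides:

```python
import numpy as np

def extract_sequence_feature(track_frame_features):
    # Trivial stand-in: mean-pool per-frame features. In the talk this is the
    # CNN (+ RNN/FNN) + temporal pooling pipeline from the following slides.
    return np.mean(track_frame_features, axis=0)

def rank_database(query_track, database_tracks):
    """Return database indices sorted from most to least similar to the query."""
    q = extract_sequence_feature(query_track)
    feats = np.stack([extract_sequence_feature(t) for t in database_tracks])
    # Cosine similarity between the query feature and every database feature.
    sims = feats @ q / (np.linalg.norm(feats, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)
```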

  5. Related work ▪ Most common setup › Frame feature extraction: CNN › Sequence processing: RNN › Temporal pooling: mean pooling › [McLaughlin et al., 2016], [Yan et al., 2016], [Wu et al., 2016] [Diagram: per-frame CNN features → RNN → mean pooling → sequence feature]

  6. Related work ▪ Most common setup › Frame feature extraction: CNN › Sequence processing: RNN › Temporal pooling: mean pooling › [McLaughlin et al., 2016], [Yan et al., 2016], [Wu et al., 2016] ▪ Extensions › Bi-directional RNNs [Zhang et al., 2017] › Multi-scale + attention pooling [Xu et al., 2017] › Fusion of CNN+RNN features [Chen et al., 2017] ▪ See review paper [Zheng et al., 2016]

  7. Outline ▪ Feed-forward RNN approximation with similar representational power ▪ New training protocol to leverage multiple video tracks within a mini-batch ▪ Experimental evaluation ▪ Conclusions

  8. RNN setup [Diagram: the per-frame CNN feature f(t) goes through W_i; the previous output o(t-1) goes through tanh and W_s; the two contributions are summed into the output o(t)]
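A short sketch of this recurrent setup, following the common CNN+RNN baseline cited on the previous slides [McLaughlin et al., 2016]: the recurrence o(t) = W_i f(t) + W_s tanh(o(t-1)), with mean pooling over the outputs as the sequence feature. The function name and the exact pooling details are assumptions read from the diagram, not taken verbatim from the slides:

```python
import numpy as np

def rnn_sequence_feature(frame_feats, W_i, W_s):
    """frame_feats: (T, D) array of per-frame CNN features f(t).
    Assumed recurrence: o(t) = W_i f(t) + W_s tanh(o(t-1)); the sequence
    feature is the temporal mean of the outputs o(t)."""
    outputs = []
    o_prev = np.zeros(W_s.shape[0])
    for f_t in frame_feats:
        o_t = W_i @ f_t + W_s @ np.tanh(o_prev)
        outputs.append(o_t)
        o_prev = o_t
    return np.mean(outputs, axis=0)
```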

  9. Proposed feed-forward approximation (1/2) ▪ “Short-term dependency” approximation: disregard the terms from step (t-2) in the output at step (t)

  10. Proposed feed-forward approximation (2/2) ▪ “Long sequence” approximation: using the approximation from the previous slide, disregard the edge cases (first and last frames), since video tracks are long
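A hedged reconstruction of the two approximations on slides 9 and 10, starting from the recurrence sketched on slide 8; the notation is assumed and may differ from the paper:

```latex
% Recurrence (slide 8): o^{(t)} = W_i f^{(t)} + W_s \tanh\!\left(o^{(t-1)}\right)

% Short-term dependency: drop the contribution of o^{(t-2)} inside the tanh
o^{(t)} \approx W_i f^{(t)} + W_s \tanh\!\left(W_i f^{(t-1)}\right)

% Long sequence: the sequence feature is the temporal mean; shifting the index
% in the second sum (edge terms are negligible for long tracks) makes every
% frame independent of its neighbors
\frac{1}{T}\sum_{t=1}^{T} o^{(t)}
\approx \frac{1}{T}\sum_{t=1}^{T}\left[W_i f^{(t)} + W_s \tanh\!\left(W_i f^{(t-1)}\right)\right]
\approx \frac{1}{T}\sum_{t=1}^{T}\underbrace{\left[W_i f^{(t)} + W_s \tanh\!\left(W_i f^{(t)}\right)\right]}_{\tilde{o}^{(t)}}
```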

  11. Proposed feed-forward approximation: new block ▪ RNN block: o(t) = W_i f(t) + W_s tanh(o(t-1)) ▪ FNN block (ours): õ(t) = W_i f(t) + W_s tanh(W_i f(t)) ▪ Same memory footprint ▪ Direct mapping between RNN and FNN parameters
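A minimal sketch of the feed-forward block, assuming the per-frame form õ(t) = W_i f(t) + W_s tanh(W_i f(t)) with the same parameters (W_i, W_s) as the RNN, which is what allows the direct parameter mapping mentioned on the slide; the function name is hypothetical:

```python
import numpy as np

def fnn_sequence_feature(frame_feats, W_i, W_s):
    """Feed-forward approximation: each frame is processed independently with
    the same parameters (W_i, W_s) as the RNN, then mean-pooled."""
    pooled = []
    for f_t in frame_feats:
        h = W_i @ f_t                # input projection shared with the RNN
        o_t = h + W_s @ np.tanh(h)   # assumed form: õ(t) = W_i f(t) + W_s tanh(W_i f(t))
        pooled.append(o_t)
    return np.mean(pooled, axis=0)
```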

  12. Training pipeline ▪ Training data: frames from video tracks captured by camera A and camera B

  13. Training pipeline: RNN baseline ▪ SEQ: load sequences of consecutive frames into the mini-batch (video tracks from camera A and camera B)

  14. Proposed FNN training pipeline ▪ FRM: load independent frames ▪ Load images from many more identities in a mini-batch (same memory/computational cost) [Diagram: mini-batch composition under SEQ (baseline) vs. FRM (ours); see the sketch below]
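A rough sketch contrasting the two mini-batch sampling strategies. The slide only states the idea, so details such as sampling with or without replacement and identity balancing are assumptions:

```python
import random

def sample_minibatch_seq(tracks, num_tracks, seq_len):
    """SEQ (baseline): a few tracks per batch, consecutive frames from each."""
    batch = []
    for track in random.sample(tracks, num_tracks):
        start = random.randrange(max(1, len(track) - seq_len + 1))
        batch.append(track[start:start + seq_len])
    return batch

def sample_minibatch_frm(tracks, batch_size):
    """FRM (ours): independent frames drawn across many tracks, so many more
    identities appear in a batch at the same memory/computational cost."""
    return [random.choice(random.choice(tracks)) for _ in range(batch_size)]
```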

  15. Data and experimental protocol ▪ Dataset 1: PRID2011 [Hirzer et al., 2011] › 200 identities, average length: 100 frames/track ▪ Dataset 2: iLIDS-VID [Wang et al., 2014] › 300 identities, average length: 71 frames/track ▪ Data splits › Train/test sets with half of the identities each › Performance averaged over 20 splits ▪ Evaluation metric: CMC (equivalent to mean accuracy at rank k)
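For reference, a small sketch of how a CMC curve can be computed in the single-correct-match setting used by the PRID2011 and iLIDS-VID cross-camera protocols; this is a generic illustration, not the authors' evaluation code:

```python
import numpy as np

def cmc_curve(correct_match_ranks, max_rank=20):
    """correct_match_ranks[i]: 0-based position of the correct gallery track in
    the ranked list for query i (one correct match per query).
    CMC at rank k = fraction of queries whose match appears in the top k."""
    r = np.asarray(correct_match_ranks)
    return np.array([(r < k).mean() for k in range(1, max_rank + 1)])
```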

  16. Experiment: Influence of the recurrent connection ▪ Train weights with RNN-SEQ (RNN architecture, SEQ training protocol) ▪ Evaluate on RNN and FNN using those weights directly (no re-training) ▪ Same performance obtained [Plot: CMC on the PRID2011 dataset]

  17. Experiment: Comparison with baseline ▪ FNN-FRM (ours) outperforms RNN-SEQ ▪ More diversity in mini-batches allows for much better training

  18. Comparison with baseline (comprehensive) ▪ Our method outperforms the baseline at all ranks on both datasets [Table: CMC values (in %)]

  19. Comparison with state-of-the-art RNN methods ▪ Our method is considerably simpler than the other state-of-the-art RNN methods compared, yet achieves comparable performance [Table: CMC values (in %)]

  20. Conclusions ▪ Simple feed-forward RNN approximation with similar representational power ▪ New training protocol to leverage multiple video sequences within a mini-batch ▪ Results significantly and consistently improved compared to the baseline ▪ Results on par with or better than other published work based on RNNs, with a much simpler technique ▪ Faster model training compared to the RNN baseline

  21. Questions?
