Feature Representation in Person Re-identification
Hong Chang, Institute of Computing Technology, Chinese Academy of Sciences
January 2020
Contents
– Feature representation in person Re-ID: related recent works
– Learning features with
  – high robustness
  – high discriminativeness
  – low information loss/redundancy
– Discussions
Person Re-identification
– The problem: matching images of the same person across different cameras
– Main challenges: pose, scale, occlusion, illumination
Feature Representation & Metric Learning
– The workflow of person Re-ID: images/videos from Camera A and Camera B → person detection → feature representation → metric learning → matching results
– Two key components: feature representation and metric learning
Recent Works in Feature Representation
– For images: traditional features and deep features
  – global features [1-3]
  – local features: hard part partition [4-6], adaptive part detection [7-10]
– Better person part alignment
– Weaknesses: part detection loss, extra computation, etc.
– Unsolved problems: (a) which regions are discriminative? (b) occlusion
Recent Works in Feature Representation
– For videos: image set features [11-13] and spatial-temporal features
  – low-order information [14]
  – high-order information: recurrent networks, non-local [14-16], 3D convolution [16]
– Unsolved problems: (a) disturbance, (b) occlusion
Feature Representation for Person Re-ID
Our improvements over existing feature representations, along three axes:
– Discriminativeness (towards disturbance & occlusion): cross-attention network, occlusion recovery
– Robustness (towards pose & scale changes): interaction-aggregation
– Completeness (low information loss): knowledge propagation
Interaction-Aggregation Feature Representation
– Goal: to deal with pose and scale changes
– Main idea: unsupervised, lightweight, based on semantic similarity
Spatial IA (SIA)
– Adaptively determines the receptive fields according to the pose and scale of the input person
– Interaction: models the relations between spatial features to generate a semantic relation map T
– Aggregation: aggregates semantically related features across different positions based on T
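A minimal NumPy sketch of this spatial interaction-aggregation idea, assuming a plain dot-product relation with softmax normalization; the actual SIA module in IANet uses learned embeddings and differs in details:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_ia(x):
    """Spatial interaction-aggregation over a (C, H, W) feature map.

    Interaction: a dot-product relation between every pair of spatial
    positions yields the semantic relation map T.
    Aggregation: each position gathers features from semantically
    related positions, weighted by T.
    """
    C, H, W = x.shape
    feats = x.reshape(C, H * W)        # N = H*W spatial positions
    T = softmax(feats.T @ feats)       # (N, N) relation map, rows sum to 1
    out = feats @ T.T                  # position i <- sum_j T[i, j] * feats[:, j]
    return out.reshape(C, H, W)
```

Because the relation map is computed from the input itself, the effective receptive field of each position adapts to the person's pose and scale instead of being fixed by the convolution kernel.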
Channel IA (CIA)
– Selectively aggregates channel features to enhance the feature representation, especially for small-scale visual cues
– Interaction: models the relations between channel features to generate a semantic relation map C
– Aggregation: aggregates channel features based on the relation map C
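The channel counterpart can be sketched the same way, with the relation map taken over channels rather than spatial positions (again a simplified dot-product stand-in for the learned module):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_ia(x):
    """Channel interaction-aggregation over a (C, H, W) feature map.

    Interaction: relations between channel features form the semantic
    relation map Cmap.
    Aggregation: each channel is re-expressed as a weighted mix of the
    semantically related channels.
    """
    C, H, W = x.shape
    feats = x.reshape(C, H * W)
    Cmap = softmax(feats @ feats.T)    # (C, C) channel relation map
    out = Cmap @ feats                 # channel c <- sum_d Cmap[c, d] * feats[d]
    return out.reshape(C, H, W)
```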
Overall model
– IANet: a CNN with IA modules inserted
– Extension: spatial-temporal context IA
Visualization results
– Receptive fields: sub-relation maps with high relation values
– SIA can adaptively localize body parts and visual attributes under various poses and scales
[Figure: example images and their learned receptive fields]
Quantitative results
– Visualization for pose and scale robustness
– Ablation study on Market-1501 & DukeMTMC (G: global feature, P: part feature, MS: multi-scale feature)
[17] R. Hou, B. Ma, H. Chang, X. Gu, S. Shan, and X. Chen. Interaction-and-aggregation network for person re-identification. In CVPR, 2019.
Feature Representation for Person Re-ID
– Next: the cross-attention network, for discriminativeness towards disturbance & occlusion
Cross-Attention Feature Representation
– Motivation: to localize the relevant regions and generate more discriminative features
  – Person re-identification
  – Few-shot classification
– Main idea: utilize semantic relations; meta-learn where to focus
Cross-attention module
– Highlights the relevant regions and generates more discriminative feature pairs
– Correlation layer: calculates a correlation map S ∈ ℝ^((h×w)×(h×w)) between the support feature Q and the query feature R; S denotes the semantic relevance between each pair of spatial positions of Q and R
– Fusion layer: generates the attention map pair B_q, B_r ∈ ℝ^(h×w) based on the corresponding correlation maps S; the kernel w fuses each correlation vector into an attention scalar and should draw attention to the target object
– A meta fusion layer is designed to generate the kernel w
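The correlation and fusion steps above can be sketched as follows; this is a simplified reading in which the fusion kernel is taken as given rather than produced by the meta fusion layer, and cosine similarity stands in for the paper's exact correlation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_attention(q, r, w):
    """Correlation + fusion for (C, H, W) support/query features q, r.

    The kernel w (length H*W) is assumed given here; in the paper it is
    produced by a meta fusion layer conditioned on the correlation map.
    """
    C, H, W = q.shape
    N = H * W
    Q = q.reshape(C, N)
    R = r.reshape(C, N)
    # Correlation layer: cosine similarity between every position pair
    Qn = Q / (np.linalg.norm(Q, axis=0, keepdims=True) + 1e-8)
    Rn = R / (np.linalg.norm(R, axis=0, keepdims=True) + 1e-8)
    S = Qn.T @ Rn                      # (N, N) correlation map
    # Fusion layer: kernel w fuses each correlation vector into a scalar
    a_q = softmax(S @ w)               # attention over positions of q
    a_r = softmax(S.T @ w)             # attention over positions of r
    return a_q.reshape(H, W), a_r.reshape(H, W)
```

Because each attention map is derived from the correlation with the *other* image, the support and query attend to mutually relevant regions rather than to fixed salient parts.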
Experiments on few-shot classification
– State-of-the-art on the miniImageNet and tieredImageNet datasets
– (O: optimization-based, P: parameter-generating, M: metric-learning, T: transductive)
[18] R. Hou, H. Chang, B. Ma, S. Shan, and X. Chen. Cross Attention Network for Few-shot Classification. In NeurIPS, 2019.
Feature Representation for Person Re-ID
– Next: temporal knowledge propagation, for completeness (low information loss)
Temporal Knowledge Propagation
– Image-to-video (I2V) Re-ID
  – Images lack temporal information
  – Information asymmetry between the two modalities increases matching difficulty
– Our solution: temporal knowledge propagation
The framework
– Propagation via features
– Propagation via cross-sample distances
– Integrated triplet loss
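One plausible reading of the two propagation terms is sketched below: the image branch mimics the video branch's features directly, and also preserves its cross-sample distance structure. The exact formulation in [19], and how these terms are weighted together with the triplet loss, may differ:

```python
import numpy as np

def tkp_losses(img_feats, vid_feats):
    """Two propagation losses, for N samples with D-dim features.

    img_feats: (N, D) features from the image branch (the student).
    vid_feats: (N, D) features of the same samples from the video
               branch (the teacher, carrying temporal information).
    """
    # Propagation via features: image features mimic video features
    l_feat = np.mean(np.sum((img_feats - vid_feats) ** 2, axis=1))
    # Propagation via cross-sample distances: the image branch preserves
    # the pairwise distance structure of the video branch
    d_img = np.linalg.norm(img_feats[:, None] - img_feats[None], axis=-1)
    d_vid = np.linalg.norm(vid_feats[:, None] - vid_feats[None], axis=-1)
    l_dist = np.mean((d_img - d_vid) ** 2)
    return l_feat, l_dist
```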
Testing pipeline of I2V Re-ID
– SAP: spatial average pooling
– TAP: temporal average pooling
Visualization
– The learned image features focus more on the foreground
– The feature distributions of the two modalities become more consistent
Experimental results
– Comparison among I2I, I2V and V2V Re-ID
[19] X. Gu, B. Ma, H. Chang, S. Shan, and X. Chen. Temporal Knowledge Propagation for Image-to-Video Person Re-identification. In ICCV, 2019.
Feature Representation for Person Re-ID
– Next: occlusion recovery, for discriminativeness towards occlusion
Occlusion-free Video Re-ID
– Occlusion problem: information loss
– Our solution: explicitly recover the appearance of the occluded parts
– Method overview
  – Similarity scoring mechanism: locates the occluded parts
  – STCnet: recovers the appearance of the occluded parts
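A simple sketch of the similarity-scoring idea, assuming per-frame, per-part features and a cosine score against the temporal mean; the scoring used in [20] may be defined differently:

```python
import numpy as np

def occlusion_scores(part_feats):
    """Cosine score of each part in each frame against the temporal mean.

    part_feats: (T, P, D) — T frames, P body parts, D-dim part features
    (hypothetical shapes). A part whose feature deviates strongly from
    the average of the same part over the whole track gets a low score
    and can be flagged as occluded.
    """
    mean = part_feats.mean(axis=0, keepdims=True)             # (1, P, D)
    num = np.sum(part_feats * mean, axis=-1)                  # (T, P)
    den = (np.linalg.norm(part_feats, axis=-1)
           * np.linalg.norm(mean, axis=-1) + 1e-8)
    return num / den                                          # cosine in [-1, 1]
```

Thresholding these scores yields a per-frame occlusion mask, which then tells STCnet which regions to recover.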
Spatial-Temporal Completion network (STCnet)
– Spatial structure generator: makes a coarse prediction for the occluded parts conditioned on the visible parts
– Temporal attention generator: refines the occluded contents with temporal information
– Discriminator: judges whether the recovered contents look real
– ID guider: an identity classification objective on the recovered contents
Results
– Visualization and quantitative results
– Ablation study on MARS
[20] R. Hou, B. Ma, H. Chang, X. Gu, S. Shan, and X. Chen. VRSTC: Occlusion-free video person re-identification. In CVPR, 2019.
Discussions
As for our methods:
– Cross-attention network: meta-attended discriminative regions; good generalization ability; is the meta design a necessity?
– Occlusion recovery: extension to spatial-temporal context; redundancy for video?
– Interaction-aggregation: plug-in modules for CNNs
– Knowledge propagation: leads temporal information in from videos to images
– Completeness: extended from low information loss to low information loss & redundancy
Discussions
Limitations in feature representation learning
– For images, the discriminative ability is upper bounded
  – Appearance {y_1, y_2, …, y_n} vs. identity z: appearance varies widely and bears little relation to identity, e.g., the same person with different clothes or accessories
  – Application: short-term, restricted regions
– For videos, more discriminative spatial-temporal features are required
  – Key: temporal information representation
  – Other information: trajectory, other spatial-temporal references
  – Application: more real-world scenarios