Leveraging local neighborhood topology for large scale person re-identification

Svebor Karaman (1), Giuseppe Lisanti (1), Andrew D. Bagdanov (2), Alberto Del Bimbo (1)

(1) Media Integration and Communication Center (MICC), University of Florence, Florence, Italy
    {firstname.lastname}@unifi.it, http://www.micc.unifi.it/vim/people
(2) Computer Vision Center, Barcelona (CVC), Universitat Autònoma de Barcelona
    bagdanov@cvc.uab.es, http://www.cvc.uab.es/LAMP

Svebor Karaman et al. (1) Large scale person re-identification July 23, 2013
Person re-identification

Problem definition: identify previously seen individuals in one or more images captured from one or more cameras.
Important for modern surveillance systems: a way of maintaining identity information about targets in multiple views.
Difficult: changes in illumination and pose, occlusions, similarity of appearance, and changes in camera view.
Research focus: mostly on features for re-identification (SDALF, CPS, HPE, etc.).
Recent trend: re-identification from a "learn to rank" point of view (SVR, RPLM, transfer and metric learning, etc.).
Laboratory versus realistic scenarios

Standard formulation: re-identification in terms of gallery images and probe images to be re-identified.
Three standard scenarios:
  ◮ Single-vs-Single (SvsS): exactly one example image of each person in the gallery and at least one instance of each person in the probe set.
  ◮ Multi-vs-Multi (MvsM): a group of M examples of each individual in the gallery and a group of M examples of each individual in the probe set.
  ◮ Multi-vs-Single (MvsS): multiple images of each person are given as groups in the gallery, and exactly one example image of each person in the probe set.

[Figure: test and gallery images in the unstructured scenario (SvsS) and in the structured scenario (MvsM).]

Real-world: many test images, many identities, few labeled samples.
Overview of our approach

[Figure: three panels (a), (b), (c), each showing samples labeled 1, 2 and 3.]

Our goal is to bring some of the advantages of structured re-identification to unstructured problems.
Linear discriminants are used to weakly separate gallery individuals.
A Conditional Random Field (CRF) is built on top of all available images.
Inference in the CRF leverages local structure in feature space to enforce labeling consistency.
Linear SVMs for re-identification

Linear discriminant model estimated for each person given the gallery images:

$$(w_i, \cdot, b_i) = \arg\min_{w_i, \xi, b_i} \; \frac{1}{2} \lVert w_i \rVert^2 + C \sum_{j=1}^{n} \xi_j \qquad (1)$$

subject to

$$\delta^{i}_{l(g_j)} \left( w_i^{T} g_j + b_i \right) \ge 1 - \xi_j, \quad \forall i \in \{1, \dots, N\}, \; \forall j \in \{1, \dots, n\}, \quad \text{and} \quad \xi_j \ge 0, \; \forall j \in \{1, \dots, n\}, \qquad (2)$$

where $\delta^{i}_{j}$ is a modified Kronecker delta function:

$$\delta^{i}_{j} = \begin{cases} 1 & \text{if } i = j \\ -1 & \text{otherwise.} \end{cases} \qquad (3)$$

The discriminative models $(w_i, b_i)$ are learned only on gallery images and are the only supervised components of our approach.
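As an illustration of Eqs. (1)-(3), the following minimal sketch evaluates the primal objective for one person's model. It is not the authors' implementation: it takes l(g_j) to be the identity label of gallery image g_j (an assumption about the notation), sets each slack to its smallest feasible value under Eq. (2), and uses made-up toy features. The names `delta`, `dot` and `svm_objective` are hypothetical. In practice the minimization itself would be handed to an off-the-shelf linear SVM solver.

```python
def delta(i, j):
    """Modified Kronecker delta of Eq. (3): +1 if i == j, otherwise -1."""
    return 1 if i == j else -1


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_objective(w, b, person, gallery, labels, C):
    """Value of the objective in Eq. (1) for the model (w, b) of one person.

    Each slack xi_j takes its smallest value allowed by Eq. (2):
    xi_j = max(0, 1 - delta * (w^T g_j + b)).
    """
    slack = [
        max(0.0, 1.0 - delta(person, label) * (dot(w, g) + b))
        for g, label in zip(gallery, labels)
    ]
    return 0.5 * dot(w, w) + C * sum(slack)


# Toy gallery (hypothetical values): one 2-D feature per identity 0, 1, 2.
gallery = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
labels = [0, 1, 2]

# A model for person 0 that separates it from the others with margin 1.
separating = svm_objective([1.0, 0.0], -1.0, 0, gallery, labels, C=1.0)

# The all-zero model violates every margin, so every slack equals 1.
trivial = svm_objective([0.0, 0.0], 0.0, 0, gallery, labels, C=2.0)
```

With the separating model all slacks vanish and only the regularization term remains, while the trivial model pays C for every gallery image.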
Mapping a re-identification problem on a CRF

Unsupervised approach induces a local topology over all images.
Combined with discriminative models, becomes a semi-supervised approach.
A CRF is defined by a graph $G = (V, E)$ and a label set $L$.
We create one vertex in $V$ to represent each image in the re-identification scenario (all gallery and probe images):

$$V = \{ v_1, v_2, \dots, v_{n+m} \}. \qquad (4)$$

The graph topology is:
  ◮ defined by the group structure of probe images, if given (MvsM); and
  ◮ induced by the nearest neighbor topology in feature space using all images.

We create an edge $(v_i, v_j)$ if one is in the k-nearest neighbors of the other:

$$E = \{ (v_i, v_j) : x_i \in \mathrm{kNN}(x_j) \lor x_j \in \mathrm{kNN}(x_i) \}. \qquad (5)$$
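The edge construction of Eq. (5) can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' code: neighbors are ranked by squared Euclidean distance, ties are broken by index, and the function names and toy features are hypothetical. A large-scale setting would need an efficient nearest-neighbor search instead of the full sort used here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_edges(features, k):
    """Undirected edge set of Eq. (5).

    Vertices i and j are connected if j is among the k nearest neighbors
    of i or i is among the k nearest neighbors of j.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        # Rank all other vertices by their distance to vertex i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: squared_distance(features[i], features[j]),
        )
        for j in others[:k]:
            # Store each undirected edge once, as (smaller index, larger index).
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy 1-D features (hypothetical values) for four images.
features = [[0.0], [1.0], [3.0], [10.0]]
edges = knn_edges(features, k=1)
```

Because an edge is added whenever either endpoint selects the other, a vertex can end up with more than k neighbors, which is what the disjunction in Eq. (5) allows.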
Mapping a re-identification problem on a CRF

Given a hypothetical labeling $\hat{y} = (\hat{y}_1, \dots, \hat{y}_{|V|})$ assigning a label $\hat{y}_i \in L$ to each vertex $v_i \in V$ in graph $G$, we define the energy function of $\hat{y}$ as:

$$E(\hat{y}) = \sum_{i \in V} \phi_i(\hat{y}_i) + \lambda \sum_{(v_i, v_j) \in E} \psi_{ij}(\hat{y}_i, \hat{y}_j). \qquad (6)$$

The unary data cost $\phi_i(\hat{y}_i)$ is defined using the linear SVM models:

$$\phi_i(\hat{y}_i) = e^{-\left( w_{\hat{y}_i}^{T} x_i + b_{\hat{y}_i} \right)}. \qquad (7)$$

The smoothness cost $\psi_{ij}$ is defined using distances in feature space:

$$\psi_{ij}(\hat{y}_i, \hat{y}_j) = \psi(\hat{y}_i, \hat{y}_j) \, e^{-\frac{\lVert x_i - x_j \rVert^2}{\sigma^2}}, \qquad (8)$$

  ◮ $\sigma^2$: variance of the distances between all connected features in the graph; and
  ◮ $\psi(\hat{y}_i, \hat{y}_j)$: average distance between gallery images of identity $\hat{y}_i$ and $\hat{y}_j$:

$$\psi(\hat{y}_i, \hat{y}_j) = \frac{1}{|G_{\hat{y}_i}| \, |G_{\hat{y}_j}|} \sum_{g \in G_{\hat{y}_i}} \sum_{g' \in G_{\hat{y}_j}} \lVert g - g' \rVert_2. \qquad (9)$$
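To make the energy of Eq. (6) concrete, here is a minimal sketch that evaluates it for a given labeling. It is an illustration under assumptions rather than the authors' implementation: the exponent in Eq. (8) is read as a squared Euclidean distance divided by σ², Eq. (9) is read as an average Euclidean distance, σ² is passed in as a parameter instead of being estimated from the graph, and all function names and toy values are hypothetical. Only evaluation is shown; minimizing the energy is a separate inference step.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def unary_cost(x, w, b):
    """Data cost of Eq. (7): exp(-(w^T x + b)) for the model of the chosen label."""
    return math.exp(-(dot(w, x) + b))


def label_distance(gallery_a, gallery_b):
    """Eq. (9): average distance between the gallery images of two identities."""
    total = sum(
        math.sqrt(squared_distance(g, h)) for g in gallery_a for h in gallery_b
    )
    return total / (len(gallery_a) * len(gallery_b))


def energy(labels, features, edges, models, galleries, lam, sigma2):
    """Energy of Eq. (6) for a labeling given as a list of label ids."""
    data = sum(
        unary_cost(features[i], *models[labels[i]]) for i in range(len(features))
    )
    smooth = sum(
        label_distance(galleries[labels[i]], galleries[labels[j]])
        * math.exp(-squared_distance(features[i], features[j]) / sigma2)
        for i, j in edges
    )
    return data + lam * smooth


# Toy problem (hypothetical values): two identical probe features, one edge.
features = [[0.0], [0.0]]
edges = {(0, 1)}
models = {0: ([1.0], 0.0), 1: ([-1.0], 0.0)}   # label -> (w, b)
galleries = {0: [[1.0]], 1: [[-1.0]]}          # label -> gallery features

mixed = energy([0, 1], features, edges, models, galleries, lam=0.5, sigma2=1.0)
same = energy([0, 0], features, edges, models, galleries, lam=0.5, sigma2=1.0)
```

In this toy case both labelings have the same data cost, so the smoothness term alone makes the consistent labeling cheaper: nearby features are penalized for taking labels whose gallery images lie far apart.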
Experiments

Inference in the CRF using graph cuts.
All parameters ($C$, $\lambda$, $k$) estimated from data (see section 3.5).
Experiments on standard structured/unstructured re-identification scenarios.

Table 1: Characteristics of re-identification datasets.

                                  ETHZ1      ETHZ2      ETHZ3      CAVIAR     3DPeS      CMV100
Environment                       Outdoor    Outdoor    Outdoor    Indoor     Outdoor    Indoor
Cameras                           1          1          1          2          8          5
Identities                        83         35         28         72         191        100
Min/Avg/Max images per person     7/58/226   6/56/206   5/62/356   10/17/20   2/5/26     7/361/1245
Total images                      4857       1961       1762       1220       1003 [4]   36171
Average detection size            132 × 60   135 × 63   148 × 66   81 × 34    158 × 74   224 × 75

[4] For 3DPeS we have 1003 labeled images and 62,796 unlabeled "anonymous" images.
Structured re-identification scenarios

Our method can efficiently solve structured scenarios like MvsM.
Comparison with AHPE, CPS, HPE, SDALF and IDINF (see section 4.3 for complete references and results).

Table 2: Comparison with the state-of-the-art on ETHZ and CAVIAR (structured).

Structured   ETHZ1                ETHZ2                ETHZ3                CAVIAR
M =          2      5      10     2      5      10     2      5      10     2      3      5
AHPE         -      91     -      -      90.6   -      -      94     -      7      8      7.5
CPS          -      97.7   -      -      97.3   -      -      98     -      -      13     13
HPE          77     84     85     77     79     81     83     86.5   83     -      -      -
SDALF        78     90.2   89.6   85     91.6   89.6   86.5   93.7   89.6   -      8.5    8.3
IDINF        87     92     99     80.7   94.3   95.9   85.3   92.2   96.1   -      -      -
FEAT+CRF     93.5   99.4   99.6   92.3   99.1   100    98.9   100    100    50.7   65.8   85.3
SVM+CRF      95.7   99.5   99.3   93.7   99.4   100    99.6   100    98.6   60.7   76.1   93.2
Unstructured scenarios

Unstructured scenarios are much harder than structured ones.
Added: Manifold Ranking (MR-L_u) and Multiple Feature Learning (MFL opt) (see section 4.3 for complete references and results).

Table 3: Comparison with the state-of-the-art on ETHZ and CAVIAR (unstructured).

Unstructured   ETHZ1                       ETHZ2                       ETHZ3                       CAVIAR
M =            1      2      5      10     1      2      5      10     1      2      5      10     1
SDALF          64.8   -      -      -      64.4   -      -      -      77     -      -      -      -
AHPE           -      -      -      -      -      -      -      -      -      -      -      -      7
CPS            -      -      -      -      -      -      -      -      -      -      -      -      8.5
MR-L_u k=4     78.8   -      -      -      73.7   -      -      -      85.1   -      -      -      28.1
MR-L_u k=15    78.1   -      -      -      73.3   -      -      -      84.8   -      -      -      27.7
MFL opt.       -      -      -      -      -      -      -      -      -      -      -      -      8.2
IDINF          69.7   83.3   92.2   96.1   65.7   80.4   89.5   89.5   88.1   93.6   98.4   98     -
FEAT+CRF       79     89.9   96.9   98.4   76.3   87.8   95     98     85.4   92.9   99.3   99.6   27.1
SVM+CRF        84.9   92.1   97.2   98.2   78.9   89.1   94.8   97     88.3   96.9   99.6   99.5   31.7
Large-scale unstructured person re-identification

We can solve large scale (up to 30K images) re-identification problems.
Performance of our approach improves when more images are available.

[Figure: accuracy versus number of unlabelled images for FEAT, SVM, FEAT+CRF and SVM+CRF; legend series M1, M2, M3 and T2, T5, T10, T20, T50, T100.
(a) CMV100: one gallery image per person, fixed number of test images (T), varying unlabeled.
(b) 3DPeS: fixed number of gallery images (M), varying unlabeled.]
Discussion

Semi-supervised approach combining discriminative models and a CRF model of local feature-space topology to solve re-identification problems.
Our approach can efficiently solve structured re-identification problems, but particularly excels in the more difficult unstructured case.
Our approach performs very well even when very many probe images must be re-identified on the basis of very few gallery images.
Adding unlabeled images increases the performance of our approach.
  ◮ Higher performance with more test images, while standard discriminative models like SVMs see their performance degrade when test data is added.
  ◮ Increasing test data gives a denser sampling of the manifold in feature space.
Our approach makes the most of all available data, while existing approaches are usually limited to exploiting only gallery images.
It offers the benefits of stronger discriminative models or metric learning, but without the cost of setting aside a portion of the available data for learning.