Smooth Proxy-Anchor Loss for Noisy Metric Learning

Carlos Roig, David Varas, Issey Masuda, Juan Carlos Riveiro, Elisenda Bou-Balust
Vilynx
{carlos,david.varas,issey,eli}@vilynx.com

Abstract

Many industrial applications use Metric Learning as a way to circumvent scalability issues when designing systems with a high number of classes. Because of this, the field is attracting a lot of interest from the academic and non-academic communities. Such industrial applications require large-scale datasets, which are usually generated from web data and, as a result, often contain a high number of noisy labels. Although Metric Learning systems are sensitive to noisy labels, this is usually not tackled in the literature, which relies on manually annotated datasets. In this work, we propose a Metric Learning method that is able to overcome the presence of noisy labels using our novel Smooth Proxy-Anchor Loss. We also present an architecture that uses this loss with a two-phase learning procedure. First, we train a confidence module that computes per-sample class confidences. Second, these confidences are used to weight the influence of each sample during the training of the embeddings. This results in a system that provides robust sample embeddings.

We compare the performance of the described method with current state-of-the-art Metric Learning losses (proxy-based and pair-based) when trained with a dataset containing noisy labels. The results show an improvement of 2.63 and 3.29 in Recall@1 with respect to MultiSimilarity and Proxy-Anchor Loss respectively, proving that our method outperforms the state of the art of Metric Learning in noisy labeling conditions.

Figure 1. Smooth Proxy-Anchor Method. 1) Noisy labels. 2) Samples are relabeled according to class confidences. 3) Proxies are selected for the relabeled samples. 4) Finally, the loss is smoothly weighted by the class confidences.

1. Introduction

Recent deep learning applications use a semantic distance metric, which enables applications such as face verification [17, 4], person re-identification [1, 27], few-shot learning [16, 21], content-based image retrieval [14, 19, 18] or representation learning [28, 14]. These types of applications rely on vectorial spaces (embedding spaces), which are generated with the objective of gathering together samples of the same class while distancing them from those of other classes.

The field of research focused on these topics is called metric learning, and it can be separated into two big categories according to the strategy used when computing the loss, namely proxy-based and pair-based losses. The most important difference between them is that proxy-based losses use embedding placeholders as class representatives when computing similarities between samples, while pair-based losses compute similarity metrics from the vectorial representations of specific samples. Pair-based methods are more sensitive to noisy samples than proxy-based ones, since a proxy is able to ameliorate the effect of such samples; therefore, proxy-based methods are more robust to noise.

There is a lack of literature analyzing the performance of proxy-based methods on noisy datasets. Most research work tackles proxy-based losses on manually curated datasets such as CUB-200-2011 [25], Cars-196 [11] or Stanford Online Products (SOP) [20]. While this is sometimes possible in research, the generation of manually curated datasets in industry is usually too expensive or infeasible. In fact, many datasets are generated by crawling data from the Internet or other sources, resulting in a high number of noisy samples.

In this work we analyze the effect of noisy samples in proxy-based systems, and propose an approach that is not only able to cope with noise, but can in fact use these samples to improve the performance of a metric learning system. This is achieved by modifying the state-of-the-art Proxy-Anchor loss [10]. Our approach extends the loss function with class probabilities, which are used for proxy selection and for weighting each sample's contribution. This is implemented by adding a confidence module to the embedding network, which computes the individual class probabilities using a multi-class classification objective. In order to generalize to unseen classes, this confidence module is removed at inference time. Therefore, it is only used during the training stages of the system, resulting in a network with no computational or size overhead at inference. The resulting modified loss is called Smooth Proxy-Anchor Loss. This yields a method that has the benefits of proxy-based losses while being able to benefit from noisy samples. At the same time, if a sample cannot be matched to any class, it is still used to improve the embedding space by separating it from the other classes.
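To make this two-branch idea concrete, the following is a minimal sketch of an embedding network with an auxiliary confidence head that is only used during training. It is only an illustration: the module names, layer sizes and the softmax-based confidences are assumptions, not the exact architecture of this paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingWithConfidence(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): a shared
    backbone feeding an embedding head and an auxiliary confidence head.
    The confidence head is discarded at inference time."""

    def __init__(self, backbone: nn.Module, feat_dim: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                          # e.g. a CNN trunk producing feat_dim features
        self.embed_head = nn.Linear(feat_dim, embed_dim)  # metric-learning embedding
        self.conf_head = nn.Linear(feat_dim, num_classes) # multi-class classification head

    def forward(self, x, with_confidence: bool = True):
        feats = self.backbone(x)
        emb = F.normalize(self.embed_head(feats), dim=-1)  # L2-normalized embedding
        if not with_confidence:
            return emb                                     # inference: embeddings only
        conf = F.softmax(self.conf_head(feats), dim=-1)    # per-sample class confidences
        return emb, conf
```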

The main contribution of this work is a novel proxy-based loss for metric learning that makes use of noisy samples to improve the performance of the system. This is done by modifying a state-of-the-art proxy-based loss and smoothly weighting the individual contribution of each sample to the loss, resulting in the Smooth Proxy-Anchor Loss. Moreover, we introduce a training architecture that uses two branches to reduce the impact of noisy samples, trained with the aforementioned loss.

The rest of the paper is structured as follows: First, in Section 2, we review previous works in the area of metric learning. Second, in Section 3, we introduce our Smooth Proxy-Anchor loss and propose a system that benefits from noisy samples. Then, in Section 4, we review the experiments that have been performed to assess our system. Finally, conclusions are presented in Section 5.

2. Related Work

In this section we review different types of losses for metric learning, starting with pair-based ones. Then, we review the literature on proxy-based losses. We end the section by reviewing other works related to noisy metric learning.

2.1. Pair-based Loss

Contrastive [2] and Triplet loss [17] have been key in the development of the field of metric learning. The objective of both losses is to group together samples of the same class, while pushing away samples of other classes. The Contrastive loss only uses pairs of samples. Instead, the Triplet loss uses three samples, two from the same class (anchor and positive) and one from another class (negative). N-pair loss [18] and Lifted Structure loss [19] generalized the Triplet loss to a pair of samples from the same class against many negative ones, pulling together the positive pair and pushing away all the negatives. These methods do not use the complete information in a batch, because they select a predefined number of samples for each tuple. To overcome this limitation, Ranked List loss [24] proposes a method that takes into account all the samples in a batch by trying to separate the positive and the negative samples. Multi-Similarity loss [23] goes one step further by weighting the influence of each pair in the batch, trying to focus on the more useful pairs and resulting in a performance and speed gain. The complexity of pair-based methods grows with the number of tuples of data considered in a batch. Different works [17, 26] analyze the impact of having a large number of tuples, which leads to a performance reduction. To handle this growth in the number of tuples, different mining approaches have been proposed in the literature [17, 26, 6], reducing the complexity of the problem and improving the performance of the generated embeddings.
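For reference, a minimal Triplet-loss computation of the kind discussed above could be sketched as follows; the margin value and the use of Euclidean distance are illustrative choices rather than a fixed convention of the works cited here.

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Minimal triplet loss sketch: the positive is pulled towards the anchor
    and the negative is pushed at least `margin` further away (illustrative values)."""
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distances, shape (batch,)
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distances
    return F.relu(d_ap - d_an + margin).mean()

# anchor, positive and negative are (batch, dim) embedding tensors:
# loss = triplet_loss(emb_anchor, emb_positive, emb_negative)
```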
2.2. Proxy-based Loss

Proxy-based methods emerged with the Proxy-NCA loss [14] in order to cope with the complexity problem. By generating a proxy embedding for each of the classes in the dataset, these methods compute the distance between the samples in the batch and the proxies. This reduces the complexity (it lowers the number of possible combinations) and is more robust to noisy samples. Afterwards, SoftTriple was introduced as a modification of the SoftMax loss for classification, which works in a similar way to Proxy-NCA but assigns multiple proxies to each class, resulting in a better intra-class representation. More recently, Proxy-Anchor loss has been presented as a proxy-based loss that takes advantage of pair-based methods. To do so, the loss takes into account all the samples of the batch and weights them based on their distance to the proxies.

2.3. Noisy Metric Learning

Proxy-based and pair-based losses are usually analyzed on manually curated datasets. In fact, there has been very little research on how noise affects metric learning losses. In [22], an Expectation-Maximization algorithm based on Neighbourhood Component Analysis (NCA) is proposed to overcome the label noise present in datasets. Similarly, [9] formulates the metric learning task as a combinatorial optimization problem based on smooth optimization [15]. The proposed approaches are oriented towards very small datasets (single class, 600 images), and are not suitable for larger datasets.
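Since Proxy-Anchor [10] is the starting point for the loss proposed in this work, a rough sketch of a Proxy-Anchor-style computation is included below for reference. The hyper-parameter values and implementation details are illustrative assumptions; the Smooth variant additionally weights each sample's contribution by its class confidence, and its exact formulation is given in Section 3.

```python
import torch
import torch.nn.functional as F

def proxy_anchor_style_loss(embeddings, labels, proxies, margin: float = 0.1, alpha: float = 32.0):
    """Sketch of a Proxy-Anchor-style loss: every class proxy acts as an anchor,
    and all batch samples contribute with a weight that depends on their
    similarity to that proxy (margin and alpha values are illustrative)."""
    emb = F.normalize(embeddings, dim=-1)            # (batch, dim)
    prx = F.normalize(proxies, dim=-1)               # (num_classes, dim)
    sim = emb @ prx.t()                              # cosine similarities, (batch, num_classes)

    pos_mask = F.one_hot(labels, num_classes=prx.size(0)).float()
    neg_mask = 1.0 - pos_mask

    pos_exp = torch.exp(-alpha * (sim - margin))     # pulls positives towards their proxy
    neg_exp = torch.exp(alpha * (sim + margin))      # pushes negatives away from each proxy

    with_pos = pos_mask.sum(dim=0) > 0               # proxies with positive samples in the batch
    pos_term = torch.log1p((pos_exp * pos_mask).sum(dim=0))[with_pos].sum() / with_pos.sum()
    neg_term = torch.log1p((neg_exp * neg_mask).sum(dim=0)).sum() / prx.size(0)
    return pos_term + neg_term
```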
