Semi-supervised Object Detector Learning from Minimal Labels

Sudeep Pillai

December 12, 2012

Abstract

While traditional machine learning approaches to classification involve a substantial training phase with a significant number of training examples, in a semi-supervised setting the focus is on learning the trends in the data from a limited training set and simultaneously using the learned trends to label unlabeled data. The specific scenario that semi-supervised learning (SSL) addresses is one where labels are expensive or difficult to obtain. Given the availability of large amounts of unlabeled data, SSL bootstraps knowledge from the training examples to predict labels for the unlabeled data and propagates that labeling in a well-formulated manner. This report focuses on a particular graph-based semi-supervised learning technique, Laplacian Regularized Least Squares (LapRLS), that can learn from both labeled and unlabeled data as long as the data satisfies a limited set of assumptions. The report compares the performance of traditional supervised learning algorithms against LapRLS and demonstrates that LapRLS outperforms the supervised classifiers on several datasets, especially when the number of training examples is minimal. As a particular application, LapRLS performs considerably well on Caltech-101, an object recognition dataset. The report also describes the methods used for feature selection and dimensionality reduction to build a robust object detector capable of learning purely from a single training example and a reasonably large set of unlabeled examples.

1 Introduction

In a setting where labeled data is hard to find or expensive to obtain, we can formulate the notion of learning from the vast amounts of unlabeled instances in the data, given only a few labels per class. Formally, semi-supervised learning addresses this problem by using a large amount of unlabeled data along with the labeled data to make better predictions of the class of the unlabeled data. Fundamentally, the goal of semi-supervised classification is to train a classifier f from both the labeled and unlabeled data such that it is better than the supervised classifier trained on the labeled data alone. Semi-supervised learning has tremendous practical value in several domains [8], including speech recognition, protein 3D structure prediction, and video surveillance. In this report, the primary focus is to learn trends from labeled image data that may be readily available from human annotation or some external source, and to use the learned knowledge to label the ever-growing deluge of unlabeled images on the internet. In particular, the focus is on applying these semi-supervised techniques to Caltech-101 [4], [3], an object recognition dataset, while also providing convincing results on toy datasets.

2 Background

Before delving into the details of the motivation and implementation behind semi-supervised learning, it is important to differentiate two distinct semi-supervised learning settings. In semi-supervised classification, the training dataset contains some unlabeled data, unlike in the supervised setting. There are therefore two distinct goals: one is to predict the labels on future test data, and the other is to predict the labels of the unlabeled instances in the training dataset. The former is called inductive semi-supervised learning and the latter transductive learning [9].
2.1 Inductive semi-supervised learning

Given labeled training examples $\{(x_i, y_i)\}_{i=1}^{l}$ and unlabeled examples $\{x_j\}_{j=l+1}^{l+u}$, inductive semi-supervised learning learns a function $f : X \mapsto Y$ so that $f$ is expected to be a good predictor on future data, beyond $\{x_j\}_{j=l+1}^{l+u}$.
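To make this setup concrete, the following is a minimal Python sketch of how the labeled set $\{(x_i, y_i)\}_{i=1}^{l}$ and the unlabeled pool $\{x_j\}_{j=l+1}^{l+u}$ might be constructed, using one labeled example per class in the spirit of the minimal-label setting; the two-moons data, the scikit-learn call, and all variable names are illustrative assumptions rather than the report's actual pipeline.

# Illustrative sketch (not the report's code): splitting a dataset into a
# minimal labeled set and a large unlabeled pool for semi-supervised learning.
import numpy as np
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# One labeled example per class: {(x_i, y_i)}_{i=1..l}, with l = 2 here.
labeled_idx = np.array([np.flatnonzero(y == c)[0] for c in np.unique(y)])
unlabeled_idx = np.setdiff1d(np.arange(len(X)), labeled_idx)

X_l, y_l = X[labeled_idx], y[labeled_idx]   # labeled examples
X_u = X[unlabeled_idx]                      # unlabeled examples {x_j}_{j=l+1..l+u}

# An inductive learner fits f using (X_l, y_l) and X_u, but is judged on future
# test points beyond X_u; a transductive learner (Section 2.2) is judged on X_u.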

2.2 Transductive learning

Given labeled training examples $\{(x_i, y_i)\}_{i=1}^{l}$ and unlabeled examples $\{x_j\}_{j=l+1}^{l+u}$, transductive learning trains a function $f : X^{l+u} \mapsto Y^{l+u}$ so that $f$ is expected to be a good predictor on the unlabeled data $\{x_j\}_{j=l+1}^{l+u}$.

2.3 Assumptions

While it is reasonable to expect that semi-supervised learning can use additional unlabeled data to learn a better predictor $f$, the key lies in the model assumptions about the relation between the marginal distribution $P(x)$ and the conditional distribution $P(y \mid x)$. It is therefore important to realize that a semi-supervised learning technique cannot always perform better than the supervised case, even with minimal labels.

Figure 1: Plots visualizing the cluster assumption (left column), the manifold assumption (middle column), and the cluster/manifold assumption (right column).

2.3.1 Cluster & Manifold Assumption

The cluster assumption states that points which can be connected via multiple paths through high-density regions are likely to have the same label. Figure 1 shows high-density regions (in red) in the two-moons dataset, where a cluster is defined by points belonging to a single moon that have multiple pathways between them. The manifold assumption dictates that each class lies on a different continuous manifold; in Figure 1, each of the moon clusters lies on a separate manifold, as is apparent from the figure. Keeping both assumptions in mind, we can state the combined cluster/manifold assumption as follows: points which can be connected via a path through high-density regions on the data manifold are likely to have the same label. In a semi-supervised setting, the idea is to use a regularizer that prefers functions which vary smoothly along the manifold and do not vary within high-density regions, as depicted in Figure 1 (right column).

3 Graph-based Semi-Supervised Learning

To motivate the use of graphs in a semi-supervised setting, we refer back to Section 2. Via the cluster/manifold assumption, we pick a regularizer that prefers functions that are differentiable, vary smoothly along the manifold, and do not vary in high-density regions. In order to assign the same label to points that are similar to each other, we create a graph whose nodes represent the data points $L \cup U$ (labeled and unlabeled; this is discussed in further detail in later sub-sections) and whose edges represent the similarity between data points.

Figure 2: The plots depict the motivation for graph-based learning, where one can leverage information from both labeled and unlabeled data.
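To make the graph construction above concrete, the following is a minimal Python sketch (assuming NumPy; the neighborhood size k and kernel width sigma are illustrative choices, not the report's settings) that builds a k-nearest-neighbor graph over all labeled and unlabeled points with Gaussian edge weights and forms the unnormalized graph Laplacian $L = D - W$ on which graph-based regularizers such as LapRLS rely. It is a sketch of the general technique rather than the report's exact implementation.

# Illustrative sketch: kNN similarity graph over labeled + unlabeled points
# and its unnormalized graph Laplacian.  k and sigma are assumed values.
import numpy as np

def build_graph_laplacian(X, k=10, sigma=1.0):
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian (RBF) similarities used as edge weights.
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Sparsify: keep each node's k strongest edges, then symmetrize so the
    # graph is undirected (an edge survives if either endpoint keeps it).
    nearest = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.repeat(np.arange(n), k), nearest.ravel()] = True
    W = np.where(mask | mask.T, W, 0.0)
    # Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix.
    D = np.diag(W.sum(axis=1))
    return D - W, W

The quadratic form $f^\top L f = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$ is small precisely when the label function $f$ changes little across strongly weighted edges, which is the smoothness preference over high-density regions described above.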
