Improving Vision-based Topological Localization by Combining Local and Global Image Features
Shuai Yang and Han Wang
Nanyang Technological University, Singapore
Introduction
• Self-localization is crucial for autonomous systems.
• Current solutions:
  – GPS: signal may be unavailable at times; accuracy is also limited.
  – 3D sensors (laser): can be costly.
  – Camera: visual odometry, visual SLAM, appearance-based localization.
Introduction
• Appearance-based techniques use image features and a database of previously collected images with known poses.
  – Given a query image, the position can be recovered by matching the query image against the database images (Sunderhauf and Protzel, IROS, 2011).
• Image feature types for place representation:
  – Local image features (e.g., SIFT and SURF).
  – Extracting a local feature means describing the visual information in a patch or neighborhood (M × M pixels) around an interest point.
  – A single image can yield hundreds of local features.
Introduction
• Image feature types for place representation (continued):
  – Global image features (e.g., GIST) incorporate statistics of the overall distribution of visual information in the scene.
  – They describe the entire image with a single feature vector.
Introduction
• Several localization systems based on local image features have been proposed [FAB-MAP, Zamir'13, Badino'11, Valgren'10, etc.]:
  – Features are local, so they are robust to occlusion, illumination changes, and clutter.
  – More discriminative: many features can be generated even for small objects.
  – More time-consuming: extracting, describing, and matching local features takes longer.
  – Global information is lost, e.g., spatial relationships. Spatial relationships between image features are important because they provide a kind of 'linkage' information between otherwise independent features.
Introduction
• Systems based on global image features [SeqSLAM, Murillo'13, etc.]:
  – High efficiency and a compact representation; suitable for large datasets.
  – Global structure is captured.
  – Low invariance to viewpoint changes: the robot does not always follow exactly the same path on its second (live) traversal, and so observes the known scenes from a different angle.
  – Sensitive to occlusion and illumination changes.
• Different feature subsets offer complementary information.
• Intuitively, a hybrid approach that uses both global and local image features to represent locations should perform better.
Proposal
• In this work, we propose an approach that localizes a mobile robot following a previously traversed route, or nearly the same route.
• Main differences:
  – Both global and local image features are used.
  – The global and local feature vectors are not simply concatenated; they are considered independently.
  – A Bayesian framework estimates the probability of the robot's position as the robot moves and new observations are acquired.
Local and Global Features
[Pipeline figure: input query images → compute local feature vectors for detected interest points, and global feature vectors for the entire image → Bayesian tracking filter (spatial-temporal smoothing) → select the image with the highest probability.]
• Local feature:
  – Exact local feature representation: accurate, but has high memory and computational requirements.
  – Quantized local feature representation: a visual bag-of-words model is used.
[Figure: schematic of representing an image with the BoW model.]
Local and Global Features
• Local feature:
  – A term frequency-inverted document frequency (TF-IDF) weighting scheme is used in our bag-of-words model.
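The quantized local representation above can be sketched in a few lines. This is a minimal numpy illustration, not the authors' implementation: it assumes a visual vocabulary has already been built (e.g., by k-means over training descriptors), hard-assigns each descriptor to its nearest visual word, and applies TF-IDF weighting; the function names are illustrative.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Hard-assign each local descriptor to its nearest visual word."""
    # descriptors: (n, d) array; vocabulary: (k, d) array of cluster centres
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def tf_idf_vector(word_ids, k, doc_freq, n_docs):
    """Build a TF-IDF weighted bag-of-words vector for one image."""
    tf = np.bincount(word_ids, minlength=k).astype(float)
    tf /= max(tf.sum(), 1.0)                         # term frequency
    idf = np.log(n_docs / np.maximum(doc_freq, 1))   # inverted document frequency
    v = tf * idf
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v               # L2-normalise for cosine matching
```

With this weighting, words that occur in many database images (high document frequency) contribute little, so matching focuses on the more distinctive visual words.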
Local and Global Features
• Global feature:
  – Gist-based representation: models the shape of a scene (its dominant spatial structure).
  – Given an input image, the gist feature is computed by convolving the image with oriented (Gabor) filters at several different orientations and scales.
• In this work we restrict ourselves to this combination, although similar ideas can be applied to other modalities.
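The gist computation described above can be sketched as follows. This is a simplified numpy illustration, not the original GIST code: the filter-bank parameters (4 orientations, 2 frequencies, a 4 × 4 averaging grid, 9 × 9 kernels) are assumptions chosen for brevity, and the convolution is done circularly via the FFT.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """Real part of a Gabor filter oriented at angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)  # Gaussian-windowed cosine

def gist_descriptor(image, n_orientations=4, freqs=(0.2, 0.4), grid=4):
    """Average Gabor filter responses over a grid x grid spatial layout."""
    feats = []
    for freq in freqs:
        for i in range(n_orientations):
            theta = i * np.pi / n_orientations
            k = gabor_kernel(9, theta, freq, sigma=3.0)
            # circular convolution via FFT (kernel zero-padded to image size)
            resp = np.abs(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(k, image.shape)))
            for band in np.array_split(resp, grid, axis=0):
                for cell in np.array_split(band, grid, axis=1):
                    feats.append(cell.mean())        # one value per grid cell
    return np.array(feats)
```

With these parameters the descriptor has 2 × 4 × 16 = 128 dimensions: one averaged filter response per (frequency, orientation, grid cell), which is what makes it a single compact vector for the entire image.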
Bayesian Framework
• Recursive Bayes filter:
  p(x_t | z_{1:t}) ∝ p(z_t | x_t) Σ_{x_{t-1}} p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1})
  – p(z_t | x_t): likelihood.
  – p(x_t | x_{t-1}): state transition.
  – p(x_{t-1} | z_{1:t-1}): previous posterior.
Bayesian Framework
• Final state estimation:
  – At every time step, the estimated location of the mobile robot is the one with the largest probability within the set of possible locations.
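The recursion and the final MAP selection can be sketched as a discrete Bayes filter over the set of database locations. This is a minimal illustration under assumed inputs: the transition matrix (encoding forward motion along the route) and the per-location likelihoods (from the feature similarities) are placeholders, not the paper's actual models.

```python
import numpy as np

def bayes_update(prior, likelihood, transition):
    """One step of a discrete Bayes filter over database locations."""
    predicted = transition @ prior        # state-transition (prediction) step
    posterior = likelihood * predicted    # weight by the observation likelihood
    return posterior / posterior.sum()    # normalise to a probability distribution

def localize(prior, likelihood, transition):
    """Update the belief and return the MAP location estimate."""
    posterior = bayes_update(prior, likelihood, transition)
    return int(np.argmax(posterior)), posterior
```

Even when the current observation is ambiguous, the transition model keeps probability mass near the previously estimated location, which is the spatial-temporal smoothing referred to in the pipeline.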
Experiments
• Sequences collected by our mobile robot were used. [Figure: robot and path.]
  – Video captured with a Ladybug camera.
  – The route contains a variety of environments (e.g., buildings, trees, open space).
  – Two sequences were captured in different weather conditions (cloudy for the database, sunny for the query).
  – DGPS provides the ground truth.
[Figure: challenging images.]
Conclusion
• We proposed a mobile robot localization system using both local and global image features.
• A Bayesian framework is used to capture spatial-temporal connections.
• Experimental results show the improvement brought by the combination.
Questions?