
Using OpenStreetMap crowdsourced data and Landsat imagery for land cover mapping in the Laguna de Bay area of the Philippines - PowerPoint PPT Presentation



1. Using OpenStreetMap crowdsourced data and Landsat imagery for land cover mapping in the Laguna de Bay area of the Philippines
Brian A. Johnson*, Kotaro Iizuka, Isao Endo, Damasa B. Magcale-Macandog, Milben Bragais
*Institute for Global Environmental Strategies (Hayama, Japan); Institute for Sustainable Humanosphere, Kyoto University (Kyoto, Japan); University of the Philippines Los Baños (Los Baños, Philippines)

2. What is crowdsourced geo-data?
• Geographic data provided by private citizens rather than government agencies.
• Examples:
  • OpenStreetMap: free base map data on roads, land use, buildings, etc.
    • Users digitize points/lines/polygons onto georeferenced satellite imagery (Bing Maps imagery) or upload GPS data taken in the field.
    • Largest source of crowdsourced geo-data.
  • GeoWiki: global land cover validation data.
    • Users label the land cover at random locations based on interpretation of high-resolution images.
  • Flickr (geotagged photos): users upload georeferenced photos with tags.
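
For orientation, the snippet below is a minimal sketch (not from the presentation) of how OSM "landuse" and "natural" polygons for a study area can be downloaded programmatically, assuming the Python osmnx library; the place-name query and output file name are illustrative placeholders.

    # Minimal sketch: download OSM "landuse" and "natural" polygons for an area of
    # interest. Assumes osmnx >= 1.3; the place name and file name are placeholders.
    import osmnx as ox

    tags = {"landuse": True, "natural": True}
    gdf = ox.features_from_place("Laguna, Philippines", tags=tags)

    # Keep only polygon geometries and save them for later overlay with Landsat data
    polygons = gdf[gdf.geometry.geom_type.isin(["Polygon", "MultiPolygon"])]
    polygons.to_file("osm_landuse_natural.gpkg", driver="GPKG")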

3. Potential uses of crowdsourced data for land cover mapping
• For accuracy assessment of land cover maps (e.g. GeoWiki).
• For extracting training data.
  • Benefit: land cover mapping can be done very quickly (no need to collect training data).
  • Challenge: the data contains various types of errors.
    • User errors: volunteer mislabels a polygon or digitizes an inaccurate boundary.
    • Image errors: image not accurately georegistered, or image outdated.

4. Research questions
• What classification methods can handle the noisy training data extracted using OpenStreetMap (OSM)?
  • The OSM "landuse" and "natural" polygon layers were used in this study.
• What level of classification accuracy can be achieved using this extracted training data?
What is new?
• Other studies have used OSM for image classification, but they manually filtered the OSM data first to remove any errors (very time consuming). We try to use the noisy data without manual filtering.

5. Study area and data
• Study area: Laguna de Bay (Lake Laguna), near Manila
  • Largest lake in the Philippines
  • Important water source for millions of people
• OSM data: "landuse" and "natural" polygon layers
• Image data: Landsat NDVI time series from 2014-2015
  • 2014 acquisition dates: 6 Jan, 22 Jan, 7 Feb, 23 Feb, 27 Mar, 12 Apr, 28 Apr, 14 May, 30 May, 15 Jun, 1 Jul, 18 Aug, 5 Oct, 21 Oct, 6 Nov, 22 Nov
  • 2015 acquisition dates: 9 Jan, 25 Jan, 10 Feb, 26 Feb, 14 Mar, 30 Mar, 15 Apr, 1 May, 17 May, 2 Jun, 20 Jul
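
To make the image-data step concrete, here is a minimal sketch (an assumption, not the authors' code) of how one NDVI layer could be computed from a Landsat 8 scene with rasterio, using band 4 (red) and band 5 (NIR); the file names are placeholders.

    # Minimal NDVI sketch for one Landsat 8 acquisition date.
    # Band file names are placeholders, not the study's actual inputs.
    import numpy as np
    import rasterio

    with rasterio.open("LC08_B4_red.tif") as red_src, \
         rasterio.open("LC08_B5_nir.tif") as nir_src:
        red = red_src.read(1).astype("float32")
        nir = nir_src.read(1).astype("float32")
        profile = red_src.profile

    # NDVI = (NIR - red) / (NIR + red); guard against division by zero
    ndvi = np.where((nir + red) == 0, 0, (nir - red) / (nir + red))

    profile.update(dtype="float32", count=1)
    with rasterio.open("ndvi_2014-01-06.tif", "w", **profile) as dst:
        dst.write(ndvi.astype("float32"), 1)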

6. Extracting training data from Landsat images
• OSM classes converted to 6 land cover classes, aggregated to 4 classes after classification:
  • commercial, residential, retail, industrial → impervious
  • forest → tree
  • orchard → orchard (merged into tree after classification)
  • farm → farm (merged into other vegetation after classification)
  • grass, meadow → other vegetation
  • water → water
• OSM polygons split 50/50 to generate training/validation data sets.
• Sample pixels (~10,000) extracted from within the training polygons.
• 300 points generated inside the validation polygons, manually labelled using Google Earth images from 2014-2015.
• Final classes: impervious, tree, other vegetation, water.
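
The extraction step could be scripted roughly as follows. This is a sketch under stated assumptions (placeholder file names, a single multi-band NDVI stack, and a direct mapping of orchard and farm to the four final classes rather than the six-then-aggregate scheme used in the study), not the authors' implementation.

    # Sketch: convert OSM tags to land cover codes and extract labelled training
    # pixels from an NDVI time-series stack. For simplicity, orchard and farm are
    # mapped directly to the four final classes here; the study classified six
    # classes and aggregated afterwards. File names are placeholders.
    import geopandas as gpd
    import numpy as np
    import rasterio
    from rasterio.features import rasterize

    OSM_TO_LULC = {
        "commercial": 1, "residential": 1, "retail": 1, "industrial": 1,  # impervious
        "forest": 2, "orchard": 2,                                        # tree
        "farm": 3, "grass": 3, "meadow": 3,                               # other vegetation
        "water": 4,                                                       # water
    }

    polygons = gpd.read_file("osm_landuse_natural.gpkg")
    # Use the "landuse" tag where present, otherwise the "natural" tag
    polygons["lulc"] = polygons["landuse"].fillna(polygons["natural"]).map(OSM_TO_LULC)
    polygons = polygons.dropna(subset=["lulc"])

    with rasterio.open("ndvi_stack.tif") as src:   # one band per acquisition date
        ndvi = src.read()                          # shape: (n_dates, rows, cols)
        polygons = polygons.to_crs(src.crs)
        label_raster = rasterize(
            ((geom, int(code)) for geom, code in zip(polygons.geometry, polygons["lulc"])),
            out_shape=(src.height, src.width),
            transform=src.transform,
            fill=0,
        )

    # One feature vector (NDVI time series) per labelled pixel
    rows, cols = np.nonzero(label_raster)
    X = ndvi[:, rows, cols].T                      # (n_pixels, n_dates)
    y = label_raster[rows, cols]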

7. Common errors in extracted training data
• (a) Pixels representing "impervious" land cover, extracted from the "industrial" OSM class, contain vegetation (class conversion error).
• (b) Inaccurate boundary of a farm in the OSM data (geolocation error in the Bing Maps imagery).

8. Workflow
• Merge the OpenStreetMap "landuse" and "natural" polygon layers into a combined OpenStreetMap layer.
• Convert the combined layer's OSM classes to land cover classes.
• Overlay the combined layer on the Landsat satellite images and extract training pixels.
• Classify the images to produce the classified LULC map.

9. Image classification
• 3 noise-tolerant algorithms tested for classification:
  • C4.5 (decision tree)
  • Naïve Bayes (probabilistic)
  • Random forest (ensemble of decision trees)
• Synthetic Minority Over-sampling Technique (SMOTE) used to balance the training data.
  • High class imbalance in the training data set due to the different number/size of OSM polygons for each land cover class (classes with larger coverage have more training pixels).
    • Example: forest = 7431 training pixels, water = 205 training pixels.
  • SMOTE generates artificial training samples in the feature space between existing training pixels so that all classes have an equal number of training samples.
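
A rough sketch of this step with scikit-learn and imbalanced-learn is shown below; it assumes the training pixels X, y from the extraction sketch above, and scikit-learn's DecisionTreeClassifier is only a stand-in for C4.5 (scikit-learn implements CART, not C4.5).

    # Sketch of the classification step, assuming scikit-learn and imbalanced-learn.
    # DecisionTreeClassifier approximates C4.5 (scikit-learn actually implements CART).
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.base import clone
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    classifiers = {
        "NB": GaussianNB(),
        "C4.5": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    }

    models = {}
    for name, clf in classifiers.items():
        # Plain classifier trained on the imbalanced OSM-extracted pixels
        models[name] = clone(clf).fit(X, y)
        # SMOTE variant: oversample minority classes before fitting
        models["SMOTE-" + name] = Pipeline(
            [("smote", SMOTE(random_state=0)), ("clf", clone(clf))]
        ).fit(X, y)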

10. Classification accuracies
• NB and SMOTE-RF had the highest overall accuracies (OA).
• NB was more accurate for the "tree" class (the class with the most validation samples), but SMOTE-RF was more accurate for all other classes.

Overall accuracy (four-class system) by classification algorithm:
  Naïve Bayes (NB)     81.3%
  C4.5                 66.0%
  Random forest (RF)   80.3%
  SMOTE-NB             80.0%
  SMOTE-C4.5           71.3%
  SMOTE-RF             84.0%

Confusion matrices (I = "impervious", T = "tree", V = "other vegetation", W = "water"):

NB (OA = 81.3%)
                    True LULC
  Classified     I     T     V     W    Sum   UA (%)
  I             33     1     6     0     40   82.5
  T              0   112    13     0    125   89.6
  V              5    21    62     1     89   69.7
  W              3     2     4    37     46   80.4
  Sum           41   136    85    38    300
  PA (%)      80.5  82.4  72.9  97.4

SMOTE-RF (OA = 84.0%)
                    True LULC
  Classified     I     T     V     W    Sum   UA (%)
  I             38     1     1     0     40   95.0
  T              1   104    20     0    125   83.2
  V              5    12    72     0     89   80.9
  W              3     2     3    38     46   82.6
  Sum           47   119    96    38    300
  PA (%)      80.9  87.4  75.0   100
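
The overall (OA), producer's (PA), and user's (UA) accuracies in the tables can be computed from a confusion matrix as in the sketch below; X_val and y_val (the 300 manually labelled validation points) and the class ordering impervious/tree/other vegetation/water are assumptions carried over from the earlier sketches.

    # Sketch of the accuracy assessment. Assumes X_val / y_val hold the 300
    # manually labelled validation points and that class codes 1-4 correspond to
    # impervious, tree, other vegetation, water (as in the extraction sketch).
    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_pred = models["SMOTE-RF"].predict(X_val)
    cm = confusion_matrix(y_val, y_pred)       # rows = true class, cols = predicted

    overall_accuracy = np.trace(cm) / cm.sum()             # OA
    producers_accuracy = np.diag(cm) / cm.sum(axis=1)      # PA: correct / true totals
    users_accuracy = np.diag(cm) / cm.sum(axis=0)          # UA: correct / predicted totals

    print(f"OA = {overall_accuracy:.1%}")
    for cls, pa, ua in zip(["impervious", "tree", "other vegetation", "water"],
                           producers_accuracy, users_accuracy):
        print(f"{cls}: PA = {pa:.1%}, UA = {ua:.1%}")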

11. Visual comparison of classification results
• NB overestimated the impervious area, but was better at discriminating between trees and other vegetation.
• C4.5 performed worst and produced a noisy result.
• Random forest performed best for the impervious class, but showed some confusion between trees and other vegetation.
• Map panels compared: Landsat composite, NB classification, SMOTE-C4.5, and SMOTE-RF.

12. Conclusions
• The Naïve Bayes and random forest classifiers could produce moderately accurate (>80% OA) land cover maps using training pixels extracted automatically from OpenStreetMap layers.
• Accuracy is possibly lower than if the training data had been gathered the traditional way (due to errors in the OSM-extracted training data), but the approach is faster and more automated.
  • May be useful if budget or time is limited.
• SMOTE could overcome some of the impacts of class imbalance in the training data, particularly for the C4.5 and RF algorithms.

13. Future work
• Test additional classification algorithms.
• Evaluate different filtering methods to automatically identify and remove errors in the OSM-extracted training data.
Thank you for your attention!
*Funding provided by the Climate Change Resilient Low Carbon Society Network (CCR-LCSNet), Japanese Ministry of the Environment.
(Photo: sunset view from IGES.)
