A Step Towards Census from Satellite. Koel Roychowdhury, Simon Jones, Colin Arrowsmith, Karin Reinke. School of Mathematical and Geospatial Sciences, RMIT University, Melbourne, Australia
Overview • Introduction – Research Objectives – Study Area – Datasets • Methods • Models • Results • Key Points • Future Research RMIT University Slide 2
INTRODUCTION Slide 3 RMIT University
Introduction » DMSP-OLS night-time images are the primary source of data for the project » DMSP-OLS images are used for a variety of applications (e.g. environmental sustainability, urban mapping and light pollution) » Census variables are often unavailable, particularly for small administrative regions » Propose an approach to produce surrogate census metrics at the sub-national level RMIT University Slide 4
Research Objective What is the utility of average DN and radiance-calibrated DMSP-OLS images for accurately predicting Indian census metrics at a sub-national level? Achieve this by: • Investigating statistical relationships between census metrics and information derived from DMSP-OLS images • Development of prediction models • Validation and improvement of models • Application of models to derive prediction maps of census metrics RMIT University Slide 5
Study Area State of Maharashtra, India • Size: > 300,000 km² • Population: > 96 million • Urban Population: 41 million in 378 urban centres • Capital City: Mumbai RMIT University Slide 6
Datasets Used • Satellite Images – DMSP-OLS • Average Digital Number (DN) data (2001) • Brightness data (2001) • Census Data – Primary Census Abstract (2001) • Additional Census Data – Maharashtra Development Report (2002) RMIT University Slide 7
METHODS Slide 8 RMIT University
Method Outline [Flowchart] • DMSP image processing: DMSP-OLS Images; Average DN Annual Composite 2001; Brightness Annual Composite 2001; Intercalibration; Mean and Std. Deviation • Census data processing: Census Vector Datasets; Sampling of Districts (24); Statistical Testing; Census Metrics Selected (10 / 144) • Model development and implementation: Models; Validation; Final Outputs RMIT University Slide 9
DMSP-OLS Image Processing RMIT University Slide 10
DMSP-OLS Image: Intercalibration • Differences in average DN between satellites • Reference Image: captured by satellite F12 in 1999 over Sicily • Second order regression equation: RMIT University Slide 11
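The regression equation itself did not survive on this slide. As a rough sketch only, a second-order regression has the general form DN_ref = c0 + c1·DN + c2·DN², and its coefficients can be estimated by least squares. The function names and the DN values below are hypothetical and synthetic; they are not the coefficients or data used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_second_order(dn, dn_ref):
    """Least-squares fit of dn_ref = c0 + c1*dn + c2*dn**2."""
    # Normal equations (X^T X) c = X^T y with design columns [1, dn, dn^2].
    powers = [sum(v ** p for v in dn) for p in range(5)]
    xtx = [[powers[i + j] for j in range(3)] for i in range(3)]
    xty = [sum(y * v ** i for v, y in zip(dn, dn_ref)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def intercalibrate(dn, coeffs):
    """Apply fitted coefficients to a single DN value."""
    c0, c1, c2 = coeffs
    return c0 + c1 * dn + c2 * dn * dn


# Synthetic DN pairs generated from a known quadratic (illustration only).
sample_dn = [0, 10, 20, 30, 40, 50, 63]
reference_dn = [1.0 + 1.2 * v - 0.004 * v * v for v in sample_dn]
coeffs = fit_second_order(sample_dn, reference_dn)
```

In practice the pairs would come from co-located pixels of the image being calibrated and the reference image.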
DMSP-OLS Image: Results of Intercalibration • For 2001, images obtained from satellites F14 and F15. RMIT University Slide 12
DMSP-OLS Image: Selection of Average DN image • After calibration, the F15 image differs less from the F12 reference image than the F14 image does • F152001 image selected for further analyses. RMIT University Slide 13
DMSP-OLS Image: Mean and Standard Deviation (summary statistic: district) • Highest mean Average DN: Raigarh • Lowest mean Average DN: Gadchiroli • Highest SD Average DN: Nagpur • Lowest SD Average DN: Gadchiroli • Highest mean brightness: Nagpur and Pune (>20 watts/cm²/µm) • Lowest mean brightness: Bhandara (<10 watts/cm²/µm) • Highest SD brightness: Nagpur • Lowest SD brightness: Bhandara RMIT University Slide 14
Census Data Processing Slide 15 RMIT University
Census Data: Sampling of Districts • Random sampling. • Process of random number generation. • 24 / 35 districts selected randomly, 8 districts withheld for validation. RMIT University Slide 16
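A minimal sketch of a reproducible random split, using the counts quoted on this slide (24 districts for modelling and 8 withheld, out of 35, which leaves three unassigned). The integer district IDs, the function name and the seed are placeholders; the slides do not say how the random numbers were generated.

```python
import random


def split_districts(district_ids, n_model, n_validation, seed):
    """Randomly split districts into disjoint modelling and validation sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(district_ids)
    rng.shuffle(shuffled)
    model = shuffled[:n_model]
    validation = shuffled[n_model:n_model + n_validation]
    return sorted(model), sorted(validation)


# Placeholder IDs standing in for the 35 districts.
district_ids = list(range(1, 36))
model_set, validation_set = split_districts(district_ids, 24, 8, seed=2001)
```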
Census Data: Statistical Tests • Tests for normal distribution – Histogram and Normal-Probability (N-P) plots – Skewness and kurtosis • Bootstrapping and correlation coefficients RMIT University Slide 17
Statistical Tests: Histogram and N-P plot • Histogram distribution and normal probability plot used to test for assumption of normality • Example shown for percentage of households with access to electricity RMIT University Slide 18
Statistical Tests: Skewness and Kurtosis • Kurtosis and skewness: measures of the peakedness and symmetry of the data, respectively. • The closer the values are to zero, the closer the distribution is to normal. • Tests check whether zero lies within the 95% confidence interval of skewness and kurtosis. RMIT University Slide 19
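One way such a check could look, as a sketch: compute moment-based skewness and excess kurtosis (kurtosis minus 3, so a normal distribution scores zero) and test whether zero falls inside an approximate 95% interval. The standard errors sqrt(6/n) and sqrt(24/n) are common large-sample approximations and are an assumption here; the slides do not state how the intervals were obtained. The sample values are invented.

```python
import math


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def zero_within_95ci(estimate, std_error):
    """True if 0 lies inside estimate +/- 1.96 * std_error."""
    return abs(estimate) <= 1.96 * std_error


def normality_check(values):
    n = len(values)
    skew, kurt = skewness_and_excess_kurtosis(values)
    # Large-sample approximations of the standard errors (assumed here).
    se_skew = math.sqrt(6.0 / n)
    se_kurt = math.sqrt(24.0 / n)
    return {
        "skewness": skew,
        "excess_kurtosis": kurt,
        "skewness_ok": zero_within_95ci(skew, se_skew),
        "kurtosis_ok": zero_within_95ci(kurt, se_kurt),
    }


# Invented sample values for illustration only.
result = normality_check([1, 2, 3, 4, 5])
```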
Statistical Tests: Bootstrapping and Correlation • Bootstrapping used to help overcome problems of limited sample size • 1000 bootstrap samples created • Correlation coefficients (at 95% confidence) • Examples shown for two different census variables [Figures: bootstrap distribution of correlation coefficients between population density and average brightness; bootstrap distribution of correlation coefficients between electricity and average brightness] RMIT University Slide 20
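The bootstrap step can be sketched as follows: resample the districts with replacement 1000 times, recompute the Pearson correlation each time, and read a 95% interval off the sorted estimates. The percentile interval is one common choice and is assumed here, since the slides do not name the interval method. The brightness and density values are invented.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(x, y, n_boot=1000, seed=0):
    """Percentile bootstrap of the correlation between x and y."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        # Resample district indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples where a variable has zero variance.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    lower = estimates[round(0.025 * n_boot)]
    upper = estimates[round(0.975 * n_boot) - 1]
    return lower, upper, estimates


# Invented values for ten hypothetical districts.
mean_brightness = [4.2, 6.1, 7.5, 9.0, 11.3, 12.8, 15.4, 18.9, 21.0, 24.6]
pop_density = [110, 150, 210, 260, 300, 420, 480, 610, 700, 820]
lower, upper, estimates = bootstrap_correlation(mean_brightness, pop_density)
```

If the interval between `lower` and `upper` excludes zero, the correlation would be treated as significant at the 95% level.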
Selection of Census Variables: Common Census Metrics • Demographic variables (e.g. population density) • Economic variables (e.g. Per Capita District Domestic Product) • Social variables (e.g. percentage of households with cars, jeeps and vans) RMIT University Slide 21
Final List of Census Metrics • Number of households per square kilometre • Total population density • Urban population density • Female literates per square kilometre • Total number of workers per square kilometre. • Percentage of households with car, jeep and van • Percentage of households with access to electricity as power source • Percentage of households with television • Percentage of permanent houses • Per Capita District Domestic Product RMIT University Slide 22
Models Slide 23 RMIT University
Models: Simple Linear Regression • Models developed using single independent variables obtained from DMSP-OLS images • The independent variables used include: – Mean brightness – Standard deviation of brightness – Mean average DN – Standard deviation of average DN RMIT University Slide 24
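A minimal sketch of a single-predictor model of this kind: ordinary least squares with one image-derived variable and the coefficient of determination r². The function name and the numbers are illustrative, not the study's data.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r squared = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Illustrative values: x could be mean brightness, y a census metric.
intercept, slope, r_squared = simple_linear_regression(
    [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
)
```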
Simple Regression Models: Urban population density per square kilometre [Scatter plots of predicted against observed values] Panels: a) Mean brightness, b) Standard deviation of brightness, c) Mean average DN, d) Standard deviation of average DN. The two brightness panels report r² = 0.79 and r² = 0.93; the two average DN panels report r² = 0.42 and r² = 0.68 (all p<0.05). RMIT University Slide 25
Validation: Simple Regression Models 25% Slide 26 RMIT University
Models: Multiple Linear Regression Both mean and standard deviations considered together in each model 6 types of models examined: – 2 models: Mean and standard deviation of brightness (with and without intercept) – 2 models: Mean and standard deviation of average DN (with and without intercept) – 2 models: all 4 independent variables (with and without intercept). RMIT University Slide 27
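The with-intercept and without-intercept variants differ only in whether a constant column is added to the design matrix. A sketch using the normal equations follows; the helper names and data are hypothetical, and the study's own fitting software is not stated on the slides.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(rows, y, intercept=True):
    """Least-squares coefficients; the intercept comes first when included."""
    # A leading column of ones models the intercept term.
    design = [([1.0] if intercept else []) + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors per district: (mean, SD) of an image variable.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y_with = [2 + 3 * a - b for a, b in rows]
y_without = [3 * a - b for a, b in rows]
coef_with = fit_ols(rows, y_with, intercept=True)
coef_without = fit_ols(rows, y_without, intercept=False)
```

Many statistics packages compute r² for no-intercept models about zero rather than about the mean, so r² values from the two variants are not always directly comparable.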
Multiple Linear Regression Models with intercept: Urban population density per square kilometre [Scatter plots of predicted against observed values] a) Mean and SD of average DN: r² = 0.71, p<0.05 b) Mean and SD of brightness: r² = 0.92, p<0.05 c) Mean and SD of average DN and brightness: r² = 0.92, p<0.05 RMIT University Slide 28
Multiple Linear Regression Models without intercept: Urban population density per square kilometre [Scatter plots of predicted against observed values] Panels: a) Mean and SD of average DN, b) Mean and SD of brightness, c) Mean and SD of average DN and brightness. Panel a) reports r² = 0.68; the two remaining panels report r² = 0.92 and r² = 0.90 (all p<0.05). RMIT University Slide 29
Validation: Multiple Linear Regression Models RMIT University Slide 30
Accepted Models • Two models with the least errors were chosen at the district level – Model with mean and SD of brightness – Model with all four variables, with intercept • Sample equations:
= -108.45 + 16.93 × (mean brightness) - 2.48 × (mean average DN)
Expected urban population / km² = 20.9849033 × (mean brightness) - 22.039980 × (mean average DN) + 4.505353 × (SD average DN)
Expected urban population / km² = -119.12820 + 15.5740684 × (mean brightness)
RMIT University Slide 31
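As a worked illustration, the last sample equation on this slide, read here as Expected urban population / km² = -119.12820 + 15.5740684 × (mean brightness), can be applied to a district's mean brightness. That reading of the garbled equation is an assumption, and the function name and input value are hypothetical.

```python
# Coefficients as read from the slide's last sample equation (assumed reading).
INTERCEPT = -119.12820
SLOPE_MEAN_BRIGHTNESS = 15.5740684


def expected_urban_population_density(mean_brightness):
    """Predicted urban population per square kilometre for one district."""
    return INTERCEPT + SLOPE_MEAN_BRIGHTNESS * mean_brightness


# Hypothetical district with a mean brightness of 10.
prediction = expected_urban_population_density(10.0)
```

With these coefficients, a mean brightness below roughly 7.65 gives a negative prediction, so the equation is only meaningful for brighter districts.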
Final Outputs Slide 32 RMIT University
Spatial Implementation of the Models In addition to creating maps of predicted census metric values, maps showing the error between actual census data and the predicted data can also be generated RMIT University Slide 33
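An error map of this kind needs one value per district. A sketch of computing the absolute and percentage error between predicted and actual census values follows; the district names and figures are placeholders, and joining the result to district polygons is left to whatever GIS software is in use.

```python
def prediction_errors(actual, predicted):
    """Return {district: (predicted - actual, percentage error)}."""
    errors = {}
    for district, observed in actual.items():
        diff = predicted[district] - observed
        # Percentage error is undefined when the observed value is zero.
        pct = 100.0 * diff / observed if observed else float("nan")
        errors[district] = (diff, pct)
    return errors


# Placeholder districts and values for illustration only.
actual = {"district_a": 200.0, "district_b": 50.0}
predicted = {"district_a": 220.0, "district_b": 45.0}
errors = prediction_errors(actual, predicted)
```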
Some other sample maps (b) (a) RMIT University Slide 34
Summary
Key points • The annual composite brightness image predicted census metrics at the sub-national level better than the average DN image. • Variables with absolute normal distribution over the districts (such as sex ratio and education facilities per square kilometre) do not have significant correlations with either brightness or average DN. • Urban population density and Per Capita District Domestic Product (PCDDP) had higher correlations but produced the largest errors in predicted values. • Percentage of households with access to electricity had lower correlation coefficients but was always predicted with the least error. RMIT University Slide 36