How to make R, PostGIS and QGis cooperate for statistical modelling duties: a case study on hedonic regressions

Olivier Bonin, UPE IFSTTAR LVMT
OGRS 2012
Modelling requirements: Hedonic models

In a hedonic model (Rosen, 1974), the price of a product depends on a vector of its characteristics. When applied to housing, three kinds of characteristics must be taken into account (Kain and Quigley, 1970).

$$p_i = \sum_j \alpha_j x_{ij} + \sum_j \beta_j y_{ij} + \sum_j \gamma_j z_{ij} + \varepsilon_i$$

with $x_{ij}$ the structural characteristics, $y_{ij}$ the neighbourhood characteristics, $z_{ij}$ the market location characteristics, and $\varepsilon_i$ a Gaussian error term.
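As a minimal sketch of the price equation above, the Python function below evaluates the three weighted sums for a single dwelling. It is written in Python for illustration only (the talk itself uses R). The function name `hedonic_price`, the characteristic values and the coefficients are all hypothetical and are not taken from the property transactions database.

```python
def hedonic_price(x, y, z, alpha, beta, gamma, error=0.0):
    """Price of one dwelling as a linear combination of its characteristics.

    x, y, z hold the structural, neighbourhood and market location
    characteristics; alpha, beta, gamma hold the matching coefficients.
    """
    if not (len(x) == len(alpha) and len(y) == len(beta) and len(z) == len(gamma)):
        raise ValueError("each characteristic vector needs one coefficient per entry")
    structural = sum(a * v for a, v in zip(alpha, x))
    neighbourhood = sum(b * v for b, v in zip(beta, y))
    location = sum(g * v for g, v in zip(gamma, z))
    return structural + neighbourhood + location + error


# Hypothetical dwelling: every number below is invented for illustration.
price = hedonic_price(
    x=[1.0, 60.0],       # structural: constant term, floor area
    y=[0.5],             # neighbourhood: some amenity score
    z=[12.0],            # market location: distance to a centre
    alpha=[100.0, 2.0],
    beta=[10.0],
    gamma=[-1.5],
)
print(price)
```

In an actual estimation the coefficients are unknown and are fitted from the data; the sketch only shows how a fitted model turns characteristics into a price.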
Modelling requirements: Data

Statistical data (tabular) as well as geographical data:
- several hundred thousand records of residential property transactions
- coordinates of housing locations
- several GIS layers to compute the spatial characteristics of the dwellings: road networks and public transit networks, location of employment centers, of amenities, etc.

Construction of x_ij from the tabular data (property transactions database).
Construction of y_ij and z_ij from spatial analysis.
Modelling requirements: Spatial dimension of the problem

Main difficulty of the spatial analysis: the size of the property transaction database.
Necessity to visualize the error term of models to check for spatial autocorrelation.
Necessity to produce maps on zones rather than on points representing the locations of dwellings.
Software setup: Statistical software

R is the obvious choice. (R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org, 2009)
Extensive set of libraries allowing advanced modelling techniques such as spatial regressions or multi-level modelling (used in Bonin, 2009).
Existing connectors with PostGIS and QGis.
Software setup: Spatial analysis software

GIS or spatially-aware RDBMS? GIS and spatially-aware RDBMS?
PostGIS proved to be necessary, because of the large amount of data to handle and the need for a spatial index.
GIS software seemed useful for data visualization; QGis was selected because of its ability to connect both to PostGIS and to R.
Software setup: How to connect R, QGis and PostGIS?

It is theoretically possible to connect the three pieces of software, with bi-directional connections. Is it simple? Efficient? Useful? Required?

Diagram of the connections:
- R and PostGIS: RODBC, Rdbi
- QGis and PostGIS: native "add PostGIS layer..."
- QGis and R: manageR, spqr
Software setup: R – PostGIS connection

PostGIS on a GNU/Linux server. R on GNU/Linux, Mac OS X and Windows clients.
RODBC: the "straightforward" solution (directly available on CRAN), but it depends on ODBC (an open source solution exists on Mac OS X if you are willing to use the Terminal and compile libraries), and it is very slow.
Rdbi + RdbiPgSQL: hosted on BioConductor, outdated website, but very good performance.
Example: Mapping data in QGis

Figure: Rapid transit network and housing locations in the Ile-de-France region (source: notaries and STIF).
Example: Modelling in R

Import from PostGIS into R: a few tens of seconds to load 125,000 records with 87 columns.

    Linear mixed-effects model fit by REML
     Data: bien
      Subset: condAP
           AIC      BIC    logLik
      568916.1 569206.3 -284427.1

    Random effects:
     Formula: ~1 | dep
            (Intercept)
    StdDev:    5.263306

     Formula: ~1 | code_commn %in% dep
            (Intercept) Residual
    StdDev:    4.043686 6.575219
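As a small sanity check on the summary above, the following Python sketch recovers the number of estimated parameters from the reported AIC and log-likelihood, using the standard definition AIC = -2 logLik + 2k. The function name `implied_parameter_count` is hypothetical. The expected count of 31 assumes the usual way of counting for this kind of model: the 28 fixed-effect coefficients listed in the fixed-effects table of this model, plus two random-intercept standard deviations and one residual standard deviation.

```python
def implied_parameter_count(aic, log_lik):
    """Solve AIC = -2 * logLik + 2 * k for k."""
    return (aic + 2.0 * log_lik) / 2.0


# Values copied from the model summary (both are rounded to one decimal).
k = implied_parameter_count(aic=568916.1, log_lik=-284427.1)

# 28 fixed effects + 2 random-intercept std. deviations + 1 residual std. deviation
expected_k = 28 + 2 + 1

print(k, expected_k)
```

The small gap between the computed value and 31 comes from the rounding of AIC and logLik in the printed output.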
Example

    Fixed effects:
                        Value Std.Error    DF   t-value p-value
    (Intercept)      72.75860 1.9320485 85213  37.65878  0.0000
    surfh            -0.02580 0.0015843 85213 -16.28330  0.0000
    nbppr5           -6.25587 0.2036677 85213 -30.71605  0.0000
    nbppr10          -4.00707 1.0713927 85213  -3.74006  0.0002
    anc1             -1.80266 0.2610076 85213  -6.90653  0.0000
    bi_epoquB        -3.76924 0.1701087 85213 -22.15781  0.0000
    bi_epoquC        -3.64039 0.1726365 85213 -21.08701  0.0000
    bi_epoquD        -4.36626 0.1737003 85213 -25.13676  0.0000
    bi_epoquE        -4.68421 0.1801792 85213 -25.99747  0.0000
    bi_epoquF        -2.77668 0.1922702 85213 -14.44157  0.0000
    bi_epoquG        -0.23061 0.1946496 85213  -1.18472  0.2361
    bi_epoquH        -0.30314 0.3061475 85213  -0.99017  0.3221
    saldb1            1.54548 0.0752853 85213  20.52830  0.0000
    saldb2            2.48280 0.1328440 85213  18.68963  0.0000
    bi_ascenO         0.23147 0.0549959 85213   4.20877  0.0000
    etage1            0.11218 0.0739998 85213   1.51589  0.1296
    etage2            0.50147 0.0750172 85213   6.68471  0.0000
    etage3            0.51211 0.0788536 85213   6.49448  0.0000
    etage4            0.62466 0.0873488 85213   7.15138  0.0000
    etage5            0.43328 0.0804540 85213   5.38544  0.0000
    garag1            1.13419 0.0663346 85213  17.09804  0.0000
    garag2            1.41051 0.1170110 85213  12.05449  0.0000
    access_fer_n2    -1.30453 0.0573154 85213 -22.76062  0.0000
    access_fer_n4    -3.77274 0.0752602 85213 -50.12927  0.0000
    access_metroTRUE  0.45248 0.1287529 85213   3.51429  0.0004
    distc            -0.35556 0.0143153 85213 -24.83802  0.0000
    surfh:nbppr5      0.07497 0.0026233 85213  28.57851  0.0000
    surfh:nbppr10     0.05339 0.0050886 85213  10.49101  0.0000
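To read the table above, it may help to recall that each t-value is the estimate divided by its standard error. The Python sketch below recomputes it for three rows copied from the table. The helper name `t_value` and the tolerance are illustrative choices, and small differences are expected because the printed estimates are rounded.

```python
def t_value(estimate, std_error):
    """t statistic of a coefficient: estimate divided by its standard error."""
    return estimate / std_error


# (Value, Std.Error, reported t-value) copied from three rows of the table.
rows = {
    "(Intercept)": (72.75860, 1.9320485, 37.65878),
    "surfh": (-0.02580, 0.0015843, -16.28330),
    "distc": (-0.35556, 0.0143153, -24.83802),
}

# Absolute gap between the recomputed and the reported t-value, per row.
gaps = {
    name: abs(t_value(value, se) - reported)
    for name, (value, se, reported) in rows.items()
}

for name, gap in gaps.items():
    print(f"{name}: recomputed t-value differs from the table by {gap:.5f}")
```

With 85213 degrees of freedom, a t-value well beyond 2 in absolute value corresponds to the very small p-values shown, while rows such as bi_epoquG or etage1 are not significant at the usual 5% level.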
Example: Cartographic representation of the error term

The model seems correct. Is there any spatial structure in the error term?
As the model is estimated on 250,000 property transactions (possibly with several transactions at the same location), it is necessary to aggregate the error terms on zones to visualize them. We choose here the finest available census level: the IRIS.
We transfer the point dataset into PostGIS, and then use SQL queries to compute average error terms on the IRIS areas. It is easy, but somewhat slow: more than 4 minutes.
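The averaging step described above can be sketched in a few lines of Python. This is not the SQL used in the talk: it assumes each residual has already been matched to the code of the zone that contains it (the spatial part of the work, done in PostGIS), and it only illustrates the per-zone mean. The function name `mean_residual_by_zone`, the zone codes and the residual values are hypothetical.

```python
from collections import defaultdict


def mean_residual_by_zone(records):
    """Average the residuals of all transactions falling in the same zone.

    records is an iterable of (zone_code, residual) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone_code, residual in records:
        totals[zone_code] += residual
        counts[zone_code] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Invented sample: two transactions in one zone, one in another.
sample = [("iris_A", 1.0), ("iris_A", -3.0), ("iris_B", 0.5)]
zone_means = mean_residual_by_zone(sample)
print(zone_means)
```

Zones whose mean residual is far from zero are the ones to look at on the map when checking for spatial structure in the error term.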
Example

Figure: Residuals of the hedonic model on housing, aggregated at the IRIS level in Ile-de-France.
Example: R mapping capabilities

Actually, R can make acceptable maps, with the help of RColorBrewer (for the color palettes) and of the sp and maptools libraries.
To tell the truth: it is much more complicated, but a lot quicker.
Example

Figure: Residuals of the hedonic model on housing, aggregated at the IRIS level in Ile-de-France.
Conclusion: Convergence

R, PostGIS and QGis can be connected, but they also have many common capabilities (e.g. spatial queries with the help of the GEOS library: Geometry Engine, Open Source).
RDBMS moved early towards spatial data (first version of PostGIS in 2001), as did R. R is now able to perform most GIS duties (except data acquisition).
GIS are moving slowly towards software or libraries that could enhance their data processing and modelling capabilities (or are turning into specialized platforms).
Conclusion

Many researchers in social science with quantitative approaches (in my field: geography, regional science, transportation science) rely heavily on software with modelling and analysis capabilities: R for statistical modelling, NetLogo, Repast Simphony or GAMA for agent-based simulation, etc.
All these platforms are moving towards GIS: many libraries in R, the NetLogo GIS extension, Repast GIS support, built-in GIS capabilities in GAMA.
Classical GIS platforms have to react if they want to remain attractive for these researchers. The USM OrbisGIS plugin (Rousseaux et al., 2012) is a very good step in this direction!