Lecture 15 GPs for GLMs + Spatial Data 3/20/2018 1
GPs and GLMs 2
๐ง ๐ โผ Bern (๐ ๐ ) Logistic Regression A typical logistic regression problem uses the following model, logit (๐ ๐ ) = ๐ ๐ธ there is no reason that the linear equation above canโt contain thing like random effects or GPs logit (๐ ๐ ) = ๐ ๐ธ + ๐ฅ(๐ฒ) where ๐ฅ(๐ฒ) โผ ๐ช(0, ฮฃ) 3 ๐ง ๐ โผ Bern (๐ ๐ ) = ๐พ 0 + ๐พ 1 ๐ฆ ๐1 + โฏ + ๐พ ๐ ๐ฆ ๐๐
Logistic Regression A typical logistic regression problem uses the following model, logit (๐ ๐ ) = ๐ ๐ธ there is no reason that the linear equation above canโt contain thing like random effects or GPs logit (๐ ๐ ) = ๐ ๐ธ + ๐ฅ(๐ฒ) where ๐ฅ(๐ฒ) โผ ๐ช(0, ฮฃ) 3 ๐ง ๐ โผ Bern (๐ ๐ ) = ๐พ 0 + ๐พ 1 ๐ฆ ๐1 + โฏ + ๐พ ๐ ๐ฆ ๐๐ ๐ง ๐ โผ Bern (๐ ๐ )
A toy example 4 2 0.75 1 eta 0 p 0.50 โ1 0.25 โ2 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 x x 1.00 0.75 y 0.50 0.25 0.00 0.00 0.25 0.50 0.75 1.00 x
Jags Model โ Sigma[j,i] = Sigma[i,j] }โ l ~ dunif(3/0.5, 3/0.01) tau ~ dgamma(1, 2) sigma2 = 1/tau beta0 ~ dnorm(0, 1) } Sigma[i,i] = sigma2 for (i in 1:length(y)) { } } Sigma[i,j] = sigma2 * exp(- l * d[i,j])) logistic_model = โmodel{ for (j in (i+1):length(y)) { for (i in 1:(length(y)-1)) { eta ~ dmnorm(rep(0,N), inverse(Sigma)) loglik = sum(loglik_i) } loglik_i[i] = y[i] * log(p[i]) + (1-y[i]) * log(1-p[i]) y_hat[i] ~ dbern(p[i]) logit(p[i]) = beta0 + eta[i] y[i] ~ dbern(p[i]) for(i in 1:N) { 5
Model Results - Diagnostics 6 0.5 0.0 (Intercept) โ0.5 โ1.0 โ1.5 โ2.0 20 estimate 15 phi 10 8 6 sigma.sq 4 2 0 2500 5000 7500 10000 .iteration
Model Results - Fit 7 eta p 1.00 3 0.75 estimate 0.95 0 0.50 0.8 0.5 0.25 โ3 0.00 โ6 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 x
Model vs glm g = glm (y~ poly (x,2), data=d, family=โbinomialโ) 8 eta p 2 0.75 0 type estimate 0.50 glm prediction โ2 truth 0.25 โ4 โ6 0.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 x
Count data - Polio cases Polio from the glarma package. This data set gives the monthly number of cases of poliomyelitis in the U.S. for the years 1970โ1983 as reported by the Center for Disease Control. 9 10 cases 5 0 1970 1975 1980 year
Polio Model Model : where ๐ฅ(๐ฎ) โผ ๐ช(0, ฮฃ) Priors : ๐ โผ Unif (3 6, 3 1/12) 10 ๐ง ๐ โผ Pois (๐ ๐ ) log (๐ ๐ ) = ๐พ 0 + ๐ฅ(๐ฎ) {๐ป} ๐๐ = ๐ 2 exp (โ|๐ ๐ ๐๐ |) ๐พ 0 โผ ๐ช(0, 1) 1/๐ 2 โผ Gamma (2, 1)
Model Results - Diagnostics 11 0.50 0.25 (Intercept) 0.00 โ0.25 โ0.50 30 estimate 20 phi 10 0 2.5 2.0 sigma.sq 1.5 1.0 0.5 0 250 500 750 1000 .iteration
12 Model Results - Fit lambda 15 10 estimate 0.95 0.8 0.5 5 0 1970 1975 1980 year
Spatial data in R 13
Analysis of geospatial data in R R has a rich package ecosystem for read/writing, manipulating, and analyzing geospatial data. Some core packages (CRAN - Spatial task view): โข sp - core classes for handling spatial data, additional utility functions. โข rgdal - R interface to gdal (Geospatial Data Abstraction Library) for reading and writing spatial data. โข rgeos - R interface to geos (Geometry Engine Open Source) library for querying and manipulating spatial data. Reading and writing WKT. โข sf - Combines the functionality of sp , rgdal , and rgeos into a single package based on tidy priciples. โข lwgeom - additional functionality for sf using PostGISโ liblwgeom. โข raster - classes and tools for handling raster data. 14
Installing sf This is the hardest part of using the sf package, difficulty comes from is dependence on several external libraries ( geos , gdal , and proj ). โข Windows - installing from source works when Rtools is installed (system requirements are downloaded from rwinlib) โข MacOS - install dependencies via homebrew: gdal2 , geos , proj . โข Linux - Install development pacakages for GDAL (>= 2.0.0), GEOS (>= 3.3.0) and Proj.4 (>= 4.8.0) from your package manager of choice. More specific details are included in the repo readme on github. 15
Simple Features 16 Point Linestring Polygon Polygon w/ Hole(s) Multipoint Multilinestring Multipolygon Multipolygon w/ Hole(s)
Reading, writing, and converting simple features โข sf See sf vignette #2. โข st_as_sfc / as(x, โSpatialโ) - sp โข st_as_sfc / st_as_binary - WKB โข st_as_sfc / st_as_wkt - WKT โข st_read / st_write - Shapefile, GeoJSON, KML, โฆ โข rgeos 17 โข maptools โข readShapePoints / writeShapePoints - Shapefile w/ points โข readShapeLines / writeShapeLines - Shapefile w/ lines โข readShapePoly / writeShapePoly - Shapefile w/ polygons โข readShapeSpatial / writeShapeSpatial - Shapefile โข rgdal โข readOGR / writeOGR - Shapefile, GeoJSON, KML, โฆ โข readWKT / writeWKT - Well Known Text
Geospatial data in the real world 18
Projections 19 Lat/Long (epsg:4326) Google / Web Mercator (epsg:3857) Lambert Conformal Conic: 2.0e+07 20 ยฐ N 40 ยฐ N 60 ยฐ N 80 ยฐ N 6e+06 1.0e+07 2e+06 โ4e+06 0.0e+00 150 ยฐ W 100 ยฐ W 50 ยฐ W โ2.0e+07 โ1.0e+07 0.0e+00 โ5e+06 0e+00 5e+06 Alberts Equal Area Robinson Mollweide 8e+06 10000000 1e+07 4e+06 5e+06 4000000 0e+00 0e+00 โ2000000 โ6e+06 โ5e+06 0e+00 5e+06 โ1.5e+07 โ5.0e+06 0.0e+00 โ1.5e+07 โ5.0e+06
Distance on a Sphere 20
Dateline Want to fly from the Western most point in the US to the Eastern most point? 21 70 ยฐ N 65 ยฐ N 60 ยฐ N 55 ยฐ N 180 ยฐ 160 ยฐ W 140 ยฐ W
22 50 ยฐ N 51 ยฐ N 52 ยฐ N 53 ยฐ N 54 ยฐ N 55 ยฐ N 190 ยฐ W 185 ยฐ W 180 ยฐ 175 ยฐ W 170 ยฐ W
23
Using sf 24
Example data Stokes C~ 37169 37 241. Camden C~ 37029 37 2002. NC 2.11 7 0.0626 ## 456. 2001. NC 8 0.115 1.40 6 0.119 ## 264. Currituc~ 37053 37 2000. NC 4.43 ## 1.46 ## ## 10 0.0925 [degree]> ## # ## # ... with 90 more rows, and 1 more variable: geometry <sf_geometry 356. Hertford~ 37091 37 2005. NC 1.81 551. 2003. NC Northamp~ 37131 37 2004. NC 2.40 9 0.143 ## 444. Warren C~ 37185 37 5 0.0687 342. nc <dbl> 1 0.112 ## <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl> ## Gates Co~ 37073 37 STATE_FIPS SQUARE_MIL FIPS AREA PERIMETER COUNTYP010 STATE COUNTY ## ## # A tibble: 100 x 9 tbl_df (nc) = st_read (โ../data/gis/nc_counties/โ, quiet=TRUE, stringsAsFactors=FALSE) 1.61 1994. NC Ashe Cou~ 37009 37 429. 1999. NC 1.43 4 0.0891 ## 539. Surry Co~ 37171 37 1998. NC 1.77 3 0.140 ## 236. Alleghan~ 37005 37 1996. NC 1.35 2 0.0616 ## 25 air = st_read (โ../data/gis/airports/โ, quiet=TRUE, stringsAsFactors=FALSE) hwy = st_read (โ../data/gis/us_interstates/โ, quiet=TRUE, stringsAsFactors=FALSE)
tbl_df (air) 7 11. AIRPORT KCCR 8 ## MADIS~ 47 MC KELLAR~ JACK~ TN MKL 10. AIRPORT KMKL ## BUCHANAN ~ CONC~ CA COBB 13 COBB COUN~ ATLA~ GA RYY 8. AIRPORT KRYY 6 ## ST LU~ CCR 06 ST LUCIE ~ FORT~ FL CAD CNTL_TWR <chr>, geometry <sf_geometry [degree]> ## # LATITUDE <dbl>, LONGITUDE <dbl>, ELEV <dbl>, ACT_DATE <chr>, ## # ## # ... with 930 more rows, and 8 more variables: FIPS <chr>, TOT_ENP <dbl>, WEXFO~ 26 WEXFORD C~ CADI~ MI 15. AIRPORT KCAD CONTR~ ## 10 LOUDO~ 51 LEESBURG ~ LEES~ VA JYO 13. AIRPORT KJYO 9 ## 12 FPR ## # A tibble: 940 x 17 <chr> ## NEW L~ 09 GROTON-NE~ GROT~ CT GON 0. AIRPORT KGON 1 ## <chr> <chr> <chr> 3. AIRPORT K6S5 <chr> <chr> <chr> <dbl> <chr> ## STATE STATE_FIPS COUNTY AIRPT_NAME CITY IATA AIRPRTX010 FEATURE ICAO ## 2 6S5 7. AIRPORT KFPR ## 5 ## SAN D~ 06 GILLESPIE~ SAN ~ CA SEE 6. AIRPORT KSEE 4 KERN RAVALLI C~ HAMI~ MT 06 MOJAVE AI~ MOJA~ CA MHV 4. AIRPORT KMHV 3 ## RAVAL~ 30 26
tbl_df (hwy) ## 85.3 137. MULTILINESTRING ((680741.7 ... ## 6 I124 1.73 2.79 MULTILINESTRING ((1201467 3... ## 7 I126 3.56 5.72 MULTILINESTRING ((1601502 3... 8 I129 ## 3.10 4.99 MULTILINESTRING ((217446 47... ## 9 I135 96.3 155. MULTILINESTRING ((96922.97 ... ## 10 I15 1436. 2311. MULTILINESTRING ((-882875.7... ## # ... with 223 more rows 5 I12 2.55 MULTILINESTRING ((-1013796 ... ## # A tibble: 233 x 4 3941. ## ROUTE_NUM DIST_MILES DIST_KM geometry ## <chr> <dbl> <dbl> <sf_geometry [m]> ## 1 I10 2449. MULTILINESTRING ((-1881200 ... 1.58 ## 2 I105 20.8 33.4 MULTILINESTRING ((-1910156 ... ## 3 I110 41.4 66.6 MULTILINESTRING ((1054139 3... ## 4 I115 27
Recommend
More recommend