Log-Gaussian Cox Process for London crime data
Jan Povala, with Louis Ellam, Dr Seppo Virtanen, and Prof Mark Girolami
July 24, 2018
Outline
◮ Motivation
◮ Methodology
◮ Results
◮ Current work, Next steps
Aims and Objectives
◮ Modelling of crime and short-term forecasting.
◮ Two stages:
1. Inference: what is the underlying process that generated the observations?
2. Prediction: use the inferred process's properties to forecast future values.
Burglary
[Figure: spatial distribution of burglary counts; colour scale 0–45.]
Theft from the person
[Figure: spatial distribution of theft-from-the-person counts; colour scale 0–160.]
Cox Process
A Cox process is a natural choice for an environmentally driven point process (Diggle et al., 2013).

Definition. A Cox process Y(x) is defined by two postulates:
1. Λ(x) is a nonnegative-valued stochastic process;
2. conditional on a realisation λ(x) of the process Λ(x), the point process Y(x) is an inhomogeneous Poisson process with intensity λ(x).
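To make the definition concrete, here is a minimal simulation sketch, assuming a unit-square window and a user-supplied sampler for Λ (both illustrative choices, not part of the slides): first realise the intensity, then thin a homogeneous Poisson process (Lewis–Shedler thinning).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cox(sample_lambda, lam_max, area=(1.0, 1.0)):
    """Simulate one Cox process realisation on a rectangle via thinning.

    sample_lambda() returns a function lam(x, y) drawn from the stochastic
    intensity Lambda; lam_max is an upper bound on that draw.
    """
    lam = sample_lambda()                        # postulate 1: realise Lambda
    w, h = area
    # Homogeneous Poisson process with rate lam_max on the rectangle.
    n = rng.poisson(lam_max * w * h)
    xs, ys = rng.uniform(0, w, n), rng.uniform(0, h, n)
    # Thinning: keep each point with probability lam(x, y) / lam_max,
    # giving an inhomogeneous Poisson process with intensity lam (postulate 2).
    keep = rng.uniform(0, 1, n) < lam(xs, ys) / lam_max
    return xs[keep], ys[keep]

# Hypothetical random intensity: a Gaussian bump at a random location.
def sample_lambda():
    cx, cy = rng.uniform(0, 1, 2)
    return lambda x, y: 100.0 * np.exp(-((x - cx)**2 + (y - cy)**2) / 0.02)

xs, ys = simulate_cox(sample_lambda, lam_max=100.0)
```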
Log-Gaussian Cox Process
◮ Cox process with intensity driven by a fixed component Z_x^⊤ β and a latent function f(x):

Λ(x) = exp(Z_x^⊤ β + f(x)), where f(x) ~ GP(0, k_θ(·, ·)),

Z_x are socio-economic indicators, and β are their coefficients.
◮ Discretised version of the model (a generative sketch follows below):

y_i ~ Poisson(exp(Z_{x_i}^⊤ β + f(x_i))).
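A minimal sketch of the discretised generative model on a toy grid; the grid size, covariate values, coefficients, and squared-exponential kernel are illustrative assumptions, not the values used in the talk (which uses a 59×46 grid and a product of Matérns):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy grid and covariates (illustrative only).
n_cells, n_covs = 100, 3
Z = rng.normal(size=(n_cells, n_covs))        # socio-economic indicators Z_x
beta = np.array([0.5, -0.3, 0.2])             # fixed-effect coefficients

# Latent GP f ~ GP(0, k_theta) evaluated at the cell centres.
coords = rng.uniform(0, 10, size=(n_cells, 2))
d2 = ((coords[:, None, :] - coords[None, :, :])**2).sum(-1)
K = 0.5 * np.exp(-d2 / (2 * 2.0**2)) + 1e-8 * np.eye(n_cells)
f = rng.multivariate_normal(np.zeros(n_cells), K)

# Discretised LGCP: cell counts are conditionally Poisson.
y = rng.poisson(np.exp(Z @ beta + f))
```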
Inference
We would like to infer the posterior distributions of β, θ, and f:

p(f, β, θ | y) = p(y | f, β) p(f | θ) p(θ) p(β) / p(y),

where

p(y) = ∫ p(y | f, β) p(f | θ) p(β) p(θ) dθ dβ df,

which is intractable.

Solutions:
1. Laplace approximation
2. Markov chain Monte Carlo sampling
3. ...
Markov Chain Monte Carlo (MCMC)
◮ Sampling from the joint posterior distribution

p(f, β, θ | y) ∝ p(y | f, β) p(f | θ) p(θ) p(β)

using Hamiltonian Monte Carlo (HMC); a sketch follows below.
◮ Challenges:
– θ, f, and β are strongly correlated.
– High dimensionality of f: every iteration requires the inverse and the determinant of K.
– Choosing the mass matrix in the HMC algorithm.
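A minimal leapfrog HMC sketch for a generic unnormalised log posterior. The step size, path length, and identity mass matrix are illustrative assumptions; tuning the mass matrix is exactly the challenge noted above.

```python
import numpy as np

rng = np.random.default_rng(2)

def hmc_step(x, log_post, grad_log_post, eps=0.01, n_leapfrog=20):
    """One HMC transition with an identity mass matrix."""
    p = rng.normal(size=x.shape)                      # momentum ~ N(0, I)
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of Hamiltonian dynamics.
    p_new += 0.5 * eps * grad_log_post(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += eps * p_new
        p_new += eps * grad_log_post(x_new)
    x_new += eps * p_new
    p_new += 0.5 * eps * grad_log_post(x_new)
    # Metropolis accept/reject corrects the integration error.
    log_accept = (log_post(x_new) - 0.5 * p_new @ p_new
                  - log_post(x) + 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Usage on a toy 2-D Gaussian target.
log_post = lambda x: -0.5 * x @ x
grad_log_post = lambda x: -x
x = np.zeros(2)
samples = [x := hmc_step(x, log_post, grad_log_post) for _ in range(1000)]
```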
Computation (Flaxman et al., 2015; Saatçi, 2012)
◮ The calculations above require O(n³) operations and O(n²) space.
◮ Cheaper linear algebra is available if separable kernel functions are assumed, e.g. in D = 2 dimensions,

k((x₁, x₂), (x′₁, x′₂)) = k₁(x₁, x′₁) k₂(x₂, x′₂)

implies that K = K₁ ⊗ K₂.
◮ Applying the above properties, the inference can be performed using O(D n^{(D+1)/D}) operations and O(D n^{2/D}) space; a small numerical check of the separability claim follows below.
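A small numerical check that a product kernel on a grid yields a Kronecker-structured gram matrix; the Matérn-5/2 factors, lengthscales, and grid sizes are arbitrary illustrations:

```python
import numpy as np

def matern52(x, xp, ell):
    """1-D Matern-5/2 gram matrix between point sets x and xp."""
    r = np.abs(x[:, None] - xp[None, :]) / ell
    s = np.sqrt(5.0) * r
    return (1 + s + s**2 / 3) * np.exp(-s)

x1, x2 = np.linspace(0, 1, 5), np.linspace(0, 1, 7)
K1, K2 = matern52(x1, x1, 0.3), matern52(x2, x2, 0.5)

# Full gram matrix over the 2-D grid with the product kernel...
grid = np.array([(a, b) for a in x1 for b in x2])
K_full = (matern52(grid[:, 0], grid[:, 0], 0.3)
          * matern52(grid[:, 1], grid[:, 1], 0.5))
# ...equals the Kronecker product of the per-axis gram matrices.
assert np.allclose(K_full, np.kron(K1, K2))
```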
Experiment
Model
◮ Factorisable covariance function (product of two Matérns).
◮ Uninformative prior for θ.
◮ N(0, 10 I) prior for β.
Dataset
◮ Burglary and Theft from the person data for 2016.
◮ Grid: 59×46; one cell is an area of 1 km by 1 km.
◮ Missing locations are treated with a special noise model.
Inferred random variables
◮ Coefficients (β) for various socio-economic indicators.
◮ Two hyperparameters θ: lengthscale (ℓ) and marginal variance (σ²).
◮ Latent field f.
Socio-economic indicators
[Figure: posterior histograms of the β coefficients for burglary and theft-from-the-person: intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, night_economy_places_per_pop.]
Hyperparameters
[Figure: posterior histograms of log variance and log lengthscale for burglary and theft-from-the-person.]
Latent field - Burglary
[Figure: (a) posterior mean and (b) posterior standard deviation of the latent field for burglary.]
Latent field - Theft from the person
[Figure: (c) posterior mean and (d) posterior standard deviation of the latent field for theft from the person.]
Model Fit - RMSE
We compare our model with inferences made using Poisson regression (GLM) using the root-mean-square-error metric (a computation sketch follows below):

          Burglary    Theft from the person
MCMC       6.59224     4.71420
GLM       30.39759    69.61551
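For reference, a minimal sketch of the metric; using the posterior-mean intensity as the point prediction is our assumption of how the comparison was made, as the slides do not spell it out:

```python
import numpy as np

def rmse(y_observed, y_predicted):
    """Root mean square error between observed counts and predicted intensities."""
    y_obs, y_pred = np.asarray(y_observed), np.asarray(y_predicted)
    return np.sqrt(np.mean((y_obs - y_pred)**2))

# e.g. rmse(y, np.exp(Z @ beta_mean + f_mean)) for the LGCP posterior mean.
```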
Discussion
◮ Effects missing in the GLM model are spatially correlated. This suggests two possibilities:
– The model is missing a covariate that is spatially correlated.
– The true process driving criminal activity is spatially correlated.
◮ Socio-economic indicators from the census data are 'static' and might struggle to explain more 'dynamic' crime types, e.g. burglary vs. violence against the person.
Next steps
◮ Benchmark against INLA (Lindgren, Rue, and Lindström, 2011).
◮ Extend the model to the spatio-temporal case.
Bibliography I
Diggle, Peter J. et al. (2013). "Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm". In: Statistical Science 28.4, pp. 542–563. ISSN: 0883-4237. DOI: 10.1214/13-STS441. URL: http://projecteuclid.org/euclid.ss/1386078878.
Flaxman, Seth et al. (2015). "Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods". In: Proceedings of the 32nd International Conference on Machine Learning. Vol. 37. ICML'15. Lille, France: JMLR.org, pp. 607–616.
Bibliography II
Lindgren, Finn, Håvard Rue, and Johan Lindström (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73.4, pp. 423–498. ISSN: 1467-9868. DOI: 10.1111/j.1467-9868.2011.00777.x. URL: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x/abstract.
Saatçi, Yunus (2012). "Scalable inference for structured Gaussian process models". PhD thesis. University of Cambridge.
Wilson, Andrew Gordon et al. (2014). "Fast Kernel Learning for Multidimensional Pattern Extrapolation". In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS'14. Cambridge, MA, USA: MIT Press, pp. 3626–3634. URL: http://dl.acm.org/citation.cfm?id=2969033.2969231.
β traceplots
[Figure: MCMC traceplots for the β coefficients: intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, night_economy_places_per_pop.]
θ traceplots
[Figure: MCMC traceplots (25,000 iterations) for log variance and log lengthscale.]
f traceplots
[Figure: MCMC traceplots (25,000 iterations) for selected components of the latent field f: components 1188, 918, 191, and 775.]
Laplace Approximation (Flaxman et al., 2015)
◮ For simplicity, we assume a non-parametric model (no fixed term) and treat θ as a point estimate obtained by maximising the marginal likelihood.
◮ Approximate the posterior distribution of the latent surface by

p(f | y, θ) ≈ N(f̂, (−∇∇Ψ(f)|_{f̂})^{−1}),

where Ψ(f) := log p(f | y, θ) = log p(y | f, θ) + log p(f | θ) + const is the unnormalised log posterior, and f̂ is the mode of the distribution.
◮ Newton's method is used to find f̂ (a sketch follows below).
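A minimal Newton-iteration sketch for the mode f̂ under the Poisson likelihood with no fixed term, as above; the jitter, iteration cap, and convergence tolerance are implementation assumptions:

```python
import numpy as np

def laplace_mode(y, K, n_iter=50, tol=1e-8):
    """Newton's method for the mode of Psi(f) = log p(y|f) + log p(f),
    with y_i ~ Poisson(exp(f_i)) and f ~ N(0, K)."""
    n = y.size
    f = np.zeros(n)
    K_inv = np.linalg.inv(K + 1e-8 * np.eye(n))   # jitter for stability
    for _ in range(n_iter):
        grad = (y - np.exp(f)) - K_inv @ f        # gradient of Psi
        W = np.diag(np.exp(f))                    # negative Hessian of log p(y|f)
        step = np.linalg.solve(W + K_inv, grad)   # Newton step
        f = f + step
        if np.max(np.abs(step)) < tol:
            break
    return f

# Laplace posterior covariance: (W + K_inv)^(-1) evaluated at the mode f_hat.
```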
Matérn Covariance Function

k(r) = (2^{1−ν} / Γ(ν)) (√(2ν) r / ℓ)^ν K_ν(√(2ν) r / ℓ)

We fix ν = 2.5 as it is difficult to jointly estimate ℓ and ν due to identifiability issues; a closed-form sketch for this case follows below.
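For ν = 5/2 the Bessel function K_ν simplifies to a closed form, which is why this choice is computationally convenient. A sketch, with the marginal variance σ² included as a free parameter to match the hyperparameters on the Experiment slide:

```python
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel function K_nu

def matern(r, ell, nu):
    """General Matern covariance k(r); r may be an array of distances."""
    r = np.maximum(np.asarray(r, dtype=float), 1e-12)   # avoid 0/0 at r = 0
    s = np.sqrt(2 * nu) * r / ell
    return (2**(1 - nu) / gamma(nu)) * s**nu * kv(nu, s)

def matern52(r, ell, sigma2=1.0):
    """Closed form for nu = 5/2: no Bessel evaluation needed."""
    s = np.sqrt(5.0) * np.asarray(r, dtype=float) / ell
    return sigma2 * (1 + s + s**2 / 3) * np.exp(-s)

r = np.linspace(0.01, 3, 5)
assert np.allclose(matern(r, ell=1.0, nu=2.5), matern52(r, ell=1.0))
```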
Kronecker Algebra (Saatçi, 2012)
◮ Matrix-vector multiplication (⊗_d A_d) b in O(n) time and space (a sketch follows below).
◮ Matrix inverse: (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.
◮ Let K_d = Q_d Λ_d Q_d^⊤ be the eigendecomposition of K_d. Then the eigendecomposition of K = ⊗_d K_d is given by Q Λ Q^⊤, where Q = ⊗_d Q_d and Λ = ⊗_d Λ_d. The number of steps required is O(D n^{3/D}).
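A sketch of the Kronecker matrix-vector product via tensor reshaping, a common way to implement Saatçi's algorithm; the numpy formulation is ours, not from the thesis:

```python
import numpy as np

def kron_mvprod(As, b):
    """Compute (A_1 kron ... kron A_D) @ b without forming the full matrix."""
    x = b.reshape([A.shape[1] for A in As])      # vector -> D-way tensor
    for d, A in enumerate(As):
        x = np.tensordot(A, x, axes=([1], [d]))  # contract tensor axis d with A
        x = np.moveaxis(x, 0, d)                 # restore the axis ordering
    return x.ravel()

# Check against the explicit Kronecker product on a small example.
rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(m, m)) for m in (3, 4, 5))
b = rng.normal(size=3 * 4 * 5)
assert np.allclose(kron_mvprod([A, B, C], b),
                   np.kron(np.kron(A, B), C) @ b)
```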
Incomplete grids (Wilson et al., 2014)
We have that y_i ~ Poisson(exp(f_i)). For the points of the grid that are not in the domain D, we let y_i ~ N(f_i, ε^{−1}) and ε → 0. Hence

p(y | f) = ∏_{i∈D} (e^{f_i})^{y_i} e^{−e^{f_i}} / y_i! × ∏_{i∉D} (2πε^{−1})^{−1/2} e^{−ε(y_i − f_i)²/2}.

The log-likelihood is thus

Σ_{i∈D} [y_i f_i − exp(f_i) + const] − ½ Σ_{i∉D} ε(y_i − f_i)².

We now take the gradient of the log-likelihood as

∇ log p(y | f)_i = y_i − exp(f_i) if i ∈ D, and ε(y_i − f_i) if i ∉ D,

and the Hessian of the log-likelihood as

∇∇ log p(y | f)_{ii} = −exp(f_i) if i ∈ D, and −ε if i ∉ D.
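A sketch of the mixed likelihood's gradient and Hessian diagonal as they would enter a Newton or Laplace step; the boolean mask `in_domain` and the small fixed ε are our illustrative stand-ins for the ε → 0 limit:

```python
import numpy as np

def mixed_loglik_grad_hess(y, f, in_domain, eps=1e-6):
    """Gradient and Hessian diagonal of log p(y|f) for an incomplete grid:
    Poisson cells inside the domain D, near-flat Gaussian noise outside."""
    grad = np.where(in_domain, y - np.exp(f), eps * (y - f))
    hess_diag = np.where(in_domain, -np.exp(f), -eps)
    return grad, hess_diag

# Cells outside the domain contribute (almost) nothing to the fit, so the
# Kronecker structure of the full rectangular grid can still be exploited.
```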