Log-Gaussian Cox Process for London crime data



  1. Log-Gaussian Cox Process for London crime data. Jan Povala, with Louis Ellam, Dr Seppo Virtanen, and Prof Mark Girolami. July 24, 2018.

  2. Outline: Motivation; Methodology; Results; Current work, Next steps.

  3. Aims and Objectives ◮ Modelling of crime and short-term forecasting. ◮ Two stages: 1. Inference - what is the underlying process that generated the observations? 2. Prediction - use the inferred process's properties to forecast future values.

  4. Burglary [map of burglary counts per grid cell; colour scale 0-45]

  5. Theft from the person [map of theft counts per grid cell; colour scale 0-160]

  6. Outline: Motivation; Methodology; Results; Current work, Next steps.

  7. Cox Process. The Cox process is a natural choice for an environmentally driven point process (Diggle et al., 2013). Definition: a Cox process $Y(x)$ is defined by two postulates: 1. $\Lambda(x)$ is a nonnegative-valued stochastic process; 2. conditional on the realisation $\lambda(x)$ of the process $\Lambda(x)$, the point process $Y(x)$ is an inhomogeneous Poisson process with intensity $\lambda(x)$.

  8. Log-Gaussian Cox Process ◮ Cox process with intensity driven by a fixed component $Z_x^\top \beta$ and a latent function $f(x)$:
$$\Lambda(x) = \exp\left( Z_x^\top \beta + f(x) \right), \qquad f(x) \sim \mathcal{GP}(0, k_\theta(\cdot, \cdot)),$$
where $Z_x$ are socio-economic indicators and $\beta$ are their coefficients. ◮ Discretised version of the model:
$$y_i \sim \mathrm{Poisson}\left( \exp\left( Z_{x_i}^\top \beta + f(x_i) \right) \right).$$
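As a toy illustration (not part of the talk), the discretised model can be simulated in a few lines. The covariates, coefficients, grid size, and the squared-exponential covariance stand-in below are all made-up values; the talk itself uses a product of Matérn kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small example: n grid cells, p socio-economic covariates.
n, p = 50, 3
X = rng.normal(size=(n, 1))           # cell centroids (1-D for brevity)
Z = rng.normal(size=(n, p))           # socio-economic covariates Z_x
beta = np.array([0.5, -0.3, 0.2])     # fixed-effect coefficients (made up)

# Squared-exponential stand-in for k_theta, plus jitter for stability.
sq_dists = (X - X.T) ** 2
K = np.exp(-0.5 * sq_dists / 0.5**2) + 1e-8 * np.eye(n)

f = rng.multivariate_normal(np.zeros(n), K)   # latent field f ~ GP(0, k)
lam = np.exp(Z @ beta + f)                    # intensity Lambda(x) > 0
y = rng.poisson(lam)                          # discretised LGCP counts
```

Inference then runs this generative story in reverse: given the counts `y` and covariates `Z`, recover `beta`, the kernel hyperparameters, and `f`.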

  9. Inference. We would like to infer the posterior distributions of $\beta$, $\theta$, and $f$:
$$p(f, \beta, \theta \mid y) = \frac{p(y \mid f, \beta)\, p(f \mid \theta)\, p(\theta)\, p(\beta)}{p(y)},$$
where
$$p(y) = \int p(y \mid f, \beta)\, p(f \mid \theta)\, p(\beta)\, p(\theta)\, d\theta\, d\beta\, df,$$
which is intractable. Solutions: 1. Laplace approximation 2. Markov chain Monte Carlo sampling 3. . . .

  10. Markov Chain Monte Carlo (MCMC) ◮ Sampling from the joint posterior distribution:
$$p(f, \beta, \theta \mid y) \propto p(y \mid f, \beta)\, p(f \mid \theta)\, p(\theta)\, p(\beta),$$
using Hamiltonian Monte Carlo (HMC). ◮ Challenges: – $\theta$, $f$, and $\beta$ are strongly correlated. – High dimensionality of $f$: every iteration requires the inverse and the determinant of $K$. – Choosing the mass matrix in the HMC algorithm.

  11. Computation (Flaxman et al., 2015; Saatçi, 2012) ◮ The calculations above require $O(n^3)$ operations and $O(n^2)$ space. ◮ Cheaper linear algebra is available if a separable kernel function is assumed, e.g. in $D = 2$ dimensions,
$$k((x_1, x_2), (x'_1, x'_2)) = k_1(x_1, x'_1)\, k_2(x_2, x'_2)$$
implies that $K = K_1 \otimes K_2$. ◮ Applying these properties, inference can be performed using $O(Dn^{(D+1)/D})$ operations and $O(Dn^{2/D})$ space.
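The saving comes from never forming $K$ explicitly. A minimal sketch of the standard matrix-vector identity $(K_1 \otimes K_2)\,\mathrm{vec}(B) = \mathrm{vec}(K_2 B K_1^\top)$, written for hypothetical small factors:

```python
import numpy as np

def kron_matvec(K1, K2, b):
    """Compute (K1 kron K2) @ b without forming the full n1*n2 x n1*n2 matrix.

    Uses the identity (K1 kron K2) vec(B) = vec(K2 @ B @ K1.T),
    where vec stacks columns; cost is two small matrix products.
    """
    n1, n2 = K1.shape[0], K2.shape[0]
    B = b.reshape(n1, n2).T        # un-vec: column-major unstacking of b
    out = K2 @ B @ K1.T
    return out.T.reshape(-1)       # re-vec back to a flat vector
```

This is the building block that lets each HMC or Newton iteration touch only the small per-dimension matrices $K_1, K_2$ rather than their $n \times n$ Kronecker product.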

  12. Outline: Motivation; Methodology; Results; Current work, Next steps.

  13. Experiment. Model: ◮ Factorisable covariance function (product of two Matérns). ◮ Uninformative prior for $\theta$. ◮ $\mathcal{N}(0, 10I)$ prior for $\beta$. Dataset: ◮ Burglary and Theft from the person data for 2016. ◮ Grid: 59 × 46; each cell is an area of 1 km by 1 km. ◮ Missing locations are treated with a special noise model. Inferred random variables: ◮ Coefficients ($\beta$) for various socio-economic indicators. ◮ Two hyperparameters $\theta$: lengthscale ($\ell$) and marginal variance ($\sigma^2$). ◮ Latent field $f$.

  14. Socio-economic indicators [posterior histograms of $\beta$ for burglary and theft-from-the-person: intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, night_economy_places_per_pop]

  15. Hyperparameters [posterior histograms of log variance and log lengthscale for burglary and theft-from-the-person]

  16. Latent field - Burglary [(a) mean, (b) standard deviation]

  17. Latent field - Theft from the person [(c) mean, (d) standard deviation]

  18. Model Fit - RMSE. We compare our model with inferences made using Poisson regression (GLM), using the root mean square error metric:

      Crime type              MCMC      GLM
      Burglary                6.59224   30.39759
      Theft from the person   4.71420   69.61551
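For reference, the metric behind the table is simply the root mean square error between observed and fitted counts; a one-line sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

For example, `rmse([0, 0], [3, 4])` is the square root of (9 + 16)/2 = 12.5, about 3.54.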

  19. Discussion ◮ Effects missing from the GLM model are spatially correlated. This could imply two possibilities: – The model is missing a covariate that is spatially correlated. – The true process driving criminal activity is spatially correlated. ◮ Socio-economic indicators from the census data are 'static' and might struggle to explain more 'dynamic' crime types, e.g. burglary vs. violence against the person.

  20. Outline: Motivation; Methodology; Results; Current work, Next steps.

  21. Next steps ◮ Benchmark against INLA (Lindgren, Rue, and Lindström, 2011). ◮ Explore extending the model to the spatio-temporal case.

  22. Bibliography I. Diggle, Peter J. et al. (2013). “Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm”. In: Statistical Science 28.4, pp. 542–563. ISSN: 0883-4237. DOI: 10.1214/13-STS441. URL: http://projecteuclid.org/euclid.ss/1386078878. Flaxman, Seth et al. (2015). “Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods”. In: Proceedings of the 32nd International Conference on Machine Learning. Vol. 37. ICML’15. Lille, France: JMLR.org, pp. 607–616.

  23. Bibliography II. Lindgren, Finn, Håvard Rue, and Johan Lindström (2011). “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73.4, pp. 423–498. ISSN: 1467-9868. DOI: 10.1111/j.1467-9868.2011.00777.x. URL: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x/abstract. Saatçi, Yunus (2012). “Scalable inference for structured Gaussian process models”. PhD Thesis. Citeseer. Wilson, Andrew Gordon et al. (2014). “Fast Kernel Learning for Multidimensional Pattern Extrapolation”. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS’14. Cambridge, MA, USA: MIT Press, pp. 3626–3634. URL: http://dl.acm.org/citation.cfm?id=2969033.2969231.

  24. β traceplots [MCMC traces over ~25,000 iterations for: intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, night_economy_places_per_pop]

  25. θ traceplots [MCMC traces of log variance and log lengthscale over 25,000 iterations]

  26. f traceplots [MCMC traces of latent-field components 1188, 918, 191, and 775 over 25,000 iterations]

  27. Laplace Approximation (Flaxman et al., 2015) ◮ For simplicity, we assume a non-parametric model (no fixed term) and treat $\theta$ as a point estimate obtained by maximising the marginal likelihood. ◮ Approximate the posterior distribution of the latent surface by:
$$p(f \mid y, \theta) \approx \mathcal{N}\left( \hat{f},\; -\left( \nabla\nabla \Psi(f)\big|_{\hat{f}} \right)^{-1} \right),$$
where $\Psi(f) := \log p(f \mid y, \theta) = \log p(y \mid f, \theta) + \log p(f \mid \theta) + \text{const}$ is the unnormalised log posterior, and $\hat{f}$ is the mode of the distribution. ◮ Newton’s method is used to find $\hat{f}$.
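A minimal sketch of the Newton iteration for the mode, assuming the simple non-parametric case $y_i \sim \mathrm{Poisson}(\exp(f_i))$ with $f \sim \mathcal{N}(0, K)$. The dense inverse is for clarity only; in practice the Kronecker algebra above avoids it:

```python
import numpy as np

def laplace_mode(K, y, n_iter=20):
    """Newton's method for the mode f_hat of
    Psi(f) = y.T f - sum(exp(f)) - 0.5 f.T K^{-1} f + const."""
    n = len(y)
    f = np.zeros(n)
    K_inv = np.linalg.inv(K)  # dense for clarity; avoid in real code
    for _ in range(n_iter):
        mu = np.exp(f)
        grad = y - mu - K_inv @ f           # gradient of Psi at f
        W = np.diag(mu)                     # -Hessian of the log-likelihood
        f = f + np.linalg.solve(W + K_inv, grad)  # Newton step
    return f
```

At convergence the gradient vanishes, and the negative Hessian $W + K^{-1}$ at $\hat{f}$ supplies the Gaussian approximation's precision matrix.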

  28. Matérn Covariance Function
$$k(r) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, r}{\ell} \right)^{\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\, r}{\ell} \right)$$
We fix $\nu = 2.5$, as it is difficult to jointly estimate $\ell$ and $\nu$ due to identifiability issues.
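For $\nu = 5/2$ the general expression reduces to the well-known closed form $(1 + \sqrt{5}r/\ell + 5r^2/(3\ell^2))\exp(-\sqrt{5}r/\ell)$, which needs no Bessel function. A small sketch (unit marginal variance by default):

```python
import numpy as np

def matern52(r, lengthscale=1.0, variance=1.0):
    """Matern covariance with nu = 5/2, the closed form of the
    general Matern expression at nu = 2.5 (as fixed in the talk)."""
    s = np.sqrt(5.0) * np.abs(r) / lengthscale
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)
```

The talk's factorisable kernel is then a product of two such one-dimensional Matérns, one per spatial coordinate, giving the Kronecker structure $K = K_1 \otimes K_2$.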

  29. Kronecker Algebra (Saatçi, 2012) ◮ Matrix-vector multiplication $(\otimes_d A_d)\, b$ in $O(n)$ time and space. ◮ Matrix inverse: $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$. ◮ Let $K_d = Q_d \Lambda_d Q_d^\top$ be the eigendecomposition of $K_d$. Then the eigendecomposition of $K = \otimes_d K_d$ is given by $Q \Lambda Q^\top$, where $Q = \otimes_d Q_d$ and $\Lambda = \otimes_d \Lambda_d$. The number of steps required is $O(Dn^{3/D})$.
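The eigendecomposition identity can be checked numerically on small, hypothetical factors; note that only the small per-dimension matrices are ever eigendecomposed:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(n):
    """A random symmetric positive-definite matrix for testing."""
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

K1, K2 = random_spd(3), random_spd(4)

# Eigendecompose only the small factors (cost O(D n^{3/D}) overall).
l1, Q1 = np.linalg.eigh(K1)
l2, Q2 = np.linalg.eigh(K2)

# Eigenvectors/values of K1 kron K2 are Kronecker products of the factors'.
Q = np.kron(Q1, Q2)
lam = np.kron(l1, l2)

K = np.kron(K1, K2)
reconstructed = Q @ np.diag(lam) @ Q.T   # should equal K
```

This is what makes the determinant and inverse of $K$ cheap: $\log|K|$ is the sum of the logs of `lam`, and solves reduce to per-factor operations.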

  30. Incomplete grids (Wilson et al., 2014). We have $y_i \sim \mathrm{Poisson}(\exp(f_i))$. For the points of the grid that are not in the domain, we let $y_i \sim \mathcal{N}(f_i, \epsilon^{-1})$ and $\epsilon \to 0$. Hence,
$$p(y \mid f) = \prod_{i \in \mathcal{D}} \frac{e^{f_i y_i}\, e^{-e^{f_i}}}{y_i!} \prod_{i \notin \mathcal{D}} \frac{1}{\sqrt{2\pi \epsilon^{-1}}}\, e^{-\frac{1}{2}\epsilon (y_i - f_i)^2}.$$
The log-likelihood is thus:
$$\sum_{i \in \mathcal{D}} \left[ y_i f_i - \exp(f_i) + \text{const} \right] - \frac{1}{2} \sum_{i \notin \mathcal{D}} \epsilon (y_i - f_i)^2.$$
We now take the gradient of the log-likelihood as
$$\nabla \log p(y \mid f)_i = \begin{cases} y_i - \exp(f_i), & \text{if } i \in \mathcal{D} \\ \epsilon (y_i - f_i), & \text{if } i \notin \mathcal{D} \end{cases}$$
and the Hessian of the log-likelihood as
$$\nabla\nabla \log p(y \mid f)_{ii} = \begin{cases} -\exp(f_i), & \text{if } i \in \mathcal{D} \\ -\epsilon, & \text{if } i \notin \mathcal{D}. \end{cases}$$
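The piecewise gradient and Hessian diagonal above translate directly to code. This sketch uses a hypothetical helper name and a small fixed ε for illustration:

```python
import numpy as np

def loglik_grad_hess_diag(y, f, in_domain, eps=1e-6):
    """Gradient and Hessian diagonal of log p(y|f) for the incomplete-grid
    noise model: Poisson(exp(f_i)) inside the domain (in_domain True),
    N(f_i, 1/eps) outside it."""
    grad = np.where(in_domain, y - np.exp(f), eps * (y - f))
    hess_diag = np.where(in_domain, -np.exp(f), -eps)
    return grad, hess_diag
```

Because the Hessian stays diagonal, the off-grid cells add no coupling, so the Kronecker-structured Newton and HMC updates go through unchanged while the ε-noise effectively ignores the padded cells.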
