Log-Gaussian Cox Process for London crime data
Jan Povala, with Louis Ellam, Dr Seppo Virtanen, and Prof Mark Girolami
July 24, 2018
Outline
◮ Motivation
◮ Methodology
◮ Results
◮ Current work, Next steps
Aims and Objectives
◮ Modelling of crime and short-term forecasting.
◮ Two stages:
1. Inference: what is the underlying process that generated the observations?
2. Prediction: use the inferred process's properties to forecast future values.
Burglary
[Figure: spatial distribution of burglary counts; colour scale 0–45.]
Theft from the person
[Figure: spatial distribution of theft-from-the-person counts; colour scale 0–160.]
Cox Process
A Cox process is a natural choice for an environmentally driven point process (Diggle et al., 2013).

Definition. A Cox process Y(x) is defined by two postulates:
1. Λ(x) is a nonnegative-valued stochastic process;
2. conditional on a realisation λ(x) of the process Λ(x), the point process Y(x) is an inhomogeneous Poisson process with intensity λ(x).
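To make the definition concrete, here is a minimal simulation sketch, assuming a unit-square window and a user-supplied sampler for Λ (both illustrative choices, not part of the slides): first realise the intensity, then thin a homogeneous Poisson process (Lewis–Shedler thinning).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cox(sample_lambda, lam_max, area=(1.0, 1.0)):
    """Simulate one Cox process realisation on a rectangle via thinning.

    sample_lambda() returns a function lam(x, y) drawn from the stochastic
    intensity Lambda; lam_max is an upper bound on that draw.
    """
    lam = sample_lambda()                        # postulate 1: realise Lambda
    w, h = area
    # Homogeneous Poisson process with rate lam_max on the rectangle.
    n = rng.poisson(lam_max * w * h)
    xs, ys = rng.uniform(0, w, n), rng.uniform(0, h, n)
    # Thinning: keep each point with probability lam(x, y) / lam_max,
    # giving an inhomogeneous Poisson process with intensity lam (postulate 2).
    keep = rng.uniform(0, 1, n) < lam(xs, ys) / lam_max
    return xs[keep], ys[keep]

# Hypothetical random intensity: a Gaussian bump at a random location.
def sample_lambda():
    cx, cy = rng.uniform(0, 1, 2)
    return lambda x, y: 100.0 * np.exp(-((x - cx)**2 + (y - cy)**2) / 0.02)

xs, ys = simulate_cox(sample_lambda, lam_max=100.0)
```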
Log-Gaussian Cox Process
◮ Cox process with intensity driven by a fixed component Z_x^⊤ β and a latent function f(x):

Λ(x) = exp(Z_x^⊤ β + f(x)), where f(x) ~ GP(0, k_θ(·, ·)),

Z_x are socio-economic indicators, and β are their coefficients.
◮ Discretised version of the model (a generative sketch follows below):

y_i ~ Poisson(exp(Z_{x_i}^⊤ β + f(x_i))).
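A minimal sketch of the discretised generative model on a toy grid; the grid size, covariate values, coefficients, and squared-exponential kernel are illustrative assumptions, not the values used in the talk (which uses a 59×46 grid and a product of Matérns):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy grid and covariates (illustrative only).
n_cells, n_covs = 100, 3
Z = rng.normal(size=(n_cells, n_covs))        # socio-economic indicators Z_x
beta = np.array([0.5, -0.3, 0.2])             # fixed-effect coefficients

# Latent GP f ~ GP(0, k_theta) evaluated at the cell centres.
coords = rng.uniform(0, 10, size=(n_cells, 2))
d2 = ((coords[:, None, :] - coords[None, :, :])**2).sum(-1)
K = 0.5 * np.exp(-d2 / (2 * 2.0**2)) + 1e-8 * np.eye(n_cells)
f = rng.multivariate_normal(np.zeros(n_cells), K)

# Discretised LGCP: cell counts are conditionally Poisson.
y = rng.poisson(np.exp(Z @ beta + f))
```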
Inference
We would like to infer the posterior distributions of β, θ, and f:

p(f, β, θ | y) = p(y | f, β) p(f | θ) p(θ) p(β) / p(y),

where

p(y) = ∫ p(y | f, β) p(f | θ) p(β) p(θ) dθ dβ df,

which is intractable.

Solutions:
1. Laplace approximation
2. Markov chain Monte Carlo sampling
3. ...
Markov Chain Monte Carlo (MCMC)
◮ Sampling from the joint posterior distribution

p(f, β, θ | y) ∝ p(y | f, β) p(f | θ) p(θ) p(β)

using Hamiltonian Monte Carlo (HMC); a sketch follows below.
◮ Challenges:
– θ, f, and β are strongly correlated.
– High dimensionality of f: every iteration requires the inverse and the determinant of K.
– Choosing the mass matrix in the HMC algorithm.
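A minimal leapfrog HMC sketch for a generic unnormalised log posterior. The step size, path length, and identity mass matrix are illustrative assumptions; tuning the mass matrix is exactly the challenge noted above.

```python
import numpy as np

rng = np.random.default_rng(2)

def hmc_step(x, log_post, grad_log_post, eps=0.01, n_leapfrog=20):
    """One HMC transition with an identity mass matrix."""
    p = rng.normal(size=x.shape)                      # momentum ~ N(0, I)
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of Hamiltonian dynamics.
    p_new += 0.5 * eps * grad_log_post(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += eps * p_new
        p_new += eps * grad_log_post(x_new)
    x_new += eps * p_new
    p_new += 0.5 * eps * grad_log_post(x_new)
    # Metropolis accept/reject corrects the integration error.
    log_accept = (log_post(x_new) - 0.5 * p_new @ p_new
                  - log_post(x) + 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Usage on a toy 2-D Gaussian target.
log_post = lambda x: -0.5 * x @ x
grad_log_post = lambda x: -x
x = np.zeros(2)
samples = [x := hmc_step(x, log_post, grad_log_post) for _ in range(1000)]
```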
Computation (Flaxman et al., 2015; Saatçi, 2012)
◮ The calculations above require O(n³) operations and O(n²) space.
◮ Cheaper linear algebra is available if separable kernel functions are assumed, e.g. in D = 2 dimensions,

k((x₁, x₂), (x′₁, x′₂)) = k₁(x₁, x′₁) k₂(x₂, x′₂)

implies that K = K₁ ⊗ K₂.
◮ Applying the above properties, the inference can be performed using O(D n^{(D+1)/D}) operations and O(D n^{2/D}) space; a small numerical check of the separability claim follows below.
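A small numerical check that a product kernel on a grid yields a Kronecker-structured gram matrix; the Matérn-5/2 factors, lengthscales, and grid sizes are arbitrary illustrations:

```python
import numpy as np

def matern52(x, xp, ell):
    """1-D Matern-5/2 gram matrix between point sets x and xp."""
    r = np.abs(x[:, None] - xp[None, :]) / ell
    s = np.sqrt(5.0) * r
    return (1 + s + s**2 / 3) * np.exp(-s)

x1, x2 = np.linspace(0, 1, 5), np.linspace(0, 1, 7)
K1, K2 = matern52(x1, x1, 0.3), matern52(x2, x2, 0.5)

# Full gram matrix over the 2-D grid with the product kernel...
grid = np.array([(a, b) for a in x1 for b in x2])
K_full = (matern52(grid[:, 0], grid[:, 0], 0.3)
          * matern52(grid[:, 1], grid[:, 1], 0.5))
# ...equals the Kronecker product of the per-axis gram matrices.
assert np.allclose(K_full, np.kron(K1, K2))
```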
Experiment
Model
◮ Factorisable covariance function (product of two Matérns).
◮ Uninformative prior for θ.
◮ N(0, 10 I) prior for β.
Dataset
◮ Burglary and Theft from the person data for 2016.
◮ Grid: 59×46; one cell is an area of 1 km by 1 km.
◮ Missing locations are treated with a special noise model.
Inferred random variables
◮ Coefficients (β) for various socio-economic indicators.
◮ Two hyperparameters θ: lengthscale (ℓ) and marginal variance (σ²).
◮ Latent field f.
Socio-economic indicators
[Figure: posterior histograms of the β coefficients for burglary and theft-from-the-person: intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, night_economy_places_per_pop.]
Hyperparameters
[Figure: posterior histograms of log variance and log lengthscale for burglary and theft-from-the-person.]
Latent field - Burglary
[Figure: (a) posterior mean and (b) posterior standard deviation of the latent field for burglary.]
Latent field - Theft from the person
[Figure: (c) posterior mean and (d) posterior standard deviation of the latent field for theft from the person.]
Model Fit - RMSE
We compare our model with inferences made using Poisson regression (GLM) using the root-mean-square-error metric (a computation sketch follows below):

          Burglary    Theft from the person
MCMC       6.59224     4.71420
GLM       30.39759    69.61551
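For reference, a minimal sketch of the metric; using the posterior-mean intensity as the point prediction is our assumption of how the comparison was made, as the slides do not spell it out:

```python
import numpy as np

def rmse(y_observed, y_predicted):
    """Root mean square error between observed counts and predicted intensities."""
    y_obs, y_pred = np.asarray(y_observed), np.asarray(y_predicted)
    return np.sqrt(np.mean((y_obs - y_pred)**2))

# e.g. rmse(y, np.exp(Z @ beta_mean + f_mean)) for the LGCP posterior mean.
```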
Discussion
◮ Effects missing in the GLM model are spatially correlated. This suggests two possibilities:
– The model is missing a covariate that is spatially correlated.
– The true process driving criminal activity is spatially correlated.
◮ Socio-economic indicators from the census data are 'static' and might struggle to explain more 'dynamic' crime types, e.g. burglary vs. violence against the person.
Next steps
◮ Benchmark against INLA (Lindgren, Rue, and Lindström, 2011).
◮ Extend the model to the spatio-temporal case.
Bibliography I
Diggle, Peter J. et al. (2013). "Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm". In: Statistical Science 28.4, pp. 542–563. ISSN: 0883-4237. DOI: 10.1214/13-STS441. URL: http://projecteuclid.org/euclid.ss/1386078878.
Flaxman, Seth et al. (2015). "Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods". In: Proceedings of the 32nd International Conference on Machine Learning. Vol. 37. ICML'15. Lille, France: JMLR.org, pp. 607–616.
Bibliography II
Lindgren, Finn, Håvard Rue, and Johan Lindström (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73.4, pp. 423–498. ISSN: 1467-9868. DOI: 10.1111/j.1467-9868.2011.00777.x. URL: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x/abstract.
Saatçi, Yunus (2012). "Scalable inference for structured Gaussian process models". PhD thesis. University of Cambridge.
Wilson, Andrew Gordon et al. (2014). "Fast Kernel Learning for Multidimensional Pattern Extrapolation". In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS'14. Cambridge, MA, USA: MIT Press, pp. 3626–3634. URL: http://dl.acm.org/citation.cfm?id=2969033.2969231.
β traceplots
[Figure: MCMC traceplots for the β coefficients: intercept, pop_density, dwelling_houses_proportion, immigrants_proportion, median_household_income, age_median, edu_4_and_above_proportion, night_economy_places_per_pop.]
θ traceplots
[Figure: MCMC traceplots (25,000 iterations) for log variance and log lengthscale.]
f traceplots
[Figure: MCMC traceplots (25,000 iterations) for selected components of the latent field f: components 1188, 918, 191, and 775.]
Laplace Approximation (Flaxman et al., 2015)
◮ For simplicity, we assume a non-parametric model (no fixed term) and treat θ as a point estimate obtained by maximising the marginal likelihood.
◮ Approximate the posterior distribution of the latent surface by

p(f | y, θ) ≈ N(f̂, (−∇∇Ψ(f)|_{f̂})^{−1}),

where Ψ(f) := log p(f | y, θ) = log p(y | f, θ) + log p(f | θ) + const is the unnormalised log posterior, and f̂ is the mode of the distribution.
◮ Newton's method is used to find f̂ (a sketch follows below).
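A minimal Newton-iteration sketch for the mode f̂ under the Poisson likelihood with no fixed term, as above; the jitter, iteration cap, and convergence tolerance are implementation assumptions:

```python
import numpy as np

def laplace_mode(y, K, n_iter=50, tol=1e-8):
    """Newton's method for the mode of Psi(f) = log p(y|f) + log p(f),
    with y_i ~ Poisson(exp(f_i)) and f ~ N(0, K)."""
    n = y.size
    f = np.zeros(n)
    K_inv = np.linalg.inv(K + 1e-8 * np.eye(n))   # jitter for stability
    for _ in range(n_iter):
        grad = (y - np.exp(f)) - K_inv @ f        # gradient of Psi
        W = np.diag(np.exp(f))                    # negative Hessian of log p(y|f)
        step = np.linalg.solve(W + K_inv, grad)   # Newton step
        f = f + step
        if np.max(np.abs(step)) < tol:
            break
    return f

# Laplace posterior covariance: (W + K_inv)^(-1) evaluated at the mode f_hat.
```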
Matérn Covariance Function

k(r) = (2^{1−ν} / Γ(ν)) (√(2ν) r / ℓ)^ν K_ν(√(2ν) r / ℓ)

We fix ν = 2.5 as it is difficult to jointly estimate ℓ and ν due to identifiability issues; a closed-form sketch for this case follows below.
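For ν = 5/2 the Bessel function K_ν simplifies to a closed form, which is why this choice is computationally convenient. A sketch, with the marginal variance σ² included as a free parameter to match the hyperparameters on the Experiment slide:

```python
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel function K_nu

def matern(r, ell, nu):
    """General Matern covariance k(r); r may be an array of distances."""
    r = np.maximum(np.asarray(r, dtype=float), 1e-12)   # avoid 0/0 at r = 0
    s = np.sqrt(2 * nu) * r / ell
    return (2**(1 - nu) / gamma(nu)) * s**nu * kv(nu, s)

def matern52(r, ell, sigma2=1.0):
    """Closed form for nu = 5/2: no Bessel evaluation needed."""
    s = np.sqrt(5.0) * np.asarray(r, dtype=float) / ell
    return sigma2 * (1 + s + s**2 / 3) * np.exp(-s)

r = np.linspace(0.01, 3, 5)
assert np.allclose(matern(r, ell=1.0, nu=2.5), matern52(r, ell=1.0))
```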
Kronecker Algebra (Saatçi, 2012)
◮ Matrix-vector multiplication (⊗_d A_d) b in O(n) time and space (a sketch follows below).
◮ Matrix inverse: (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.
◮ Let K_d = Q_d Λ_d Q_d^⊤ be the eigendecomposition of K_d. Then the eigendecomposition of K = ⊗_d K_d is given by Q Λ Q^⊤, where Q = ⊗_d Q_d and Λ = ⊗_d Λ_d. The number of steps required is O(D n^{3/D}).
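A sketch of the Kronecker matrix-vector product via tensor reshaping, a common way to implement Saatçi's algorithm; the numpy formulation is ours, not from the thesis:

```python
import numpy as np

def kron_mvprod(As, b):
    """Compute (A_1 kron ... kron A_D) @ b without forming the full matrix."""
    x = b.reshape([A.shape[1] for A in As])      # vector -> D-way tensor
    for d, A in enumerate(As):
        x = np.tensordot(A, x, axes=([1], [d]))  # contract tensor axis d with A
        x = np.moveaxis(x, 0, d)                 # restore the axis ordering
    return x.ravel()

# Check against the explicit Kronecker product on a small example.
rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(m, m)) for m in (3, 4, 5))
b = rng.normal(size=3 * 4 * 5)
assert np.allclose(kron_mvprod([A, B, C], b),
                   np.kron(np.kron(A, B), C) @ b)
```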
Incomplete grids (Wilson et al., 2014)
We have that y_i ~ Poisson(exp(f_i)). For the points of the grid that are not in the domain D, we let y_i ~ N(f_i, ε^{−1}) and ε → 0. Hence

p(y | f) = ∏_{i∈D} (e^{f_i})^{y_i} e^{−e^{f_i}} / y_i! × ∏_{i∉D} (2πε^{−1})^{−1/2} e^{−ε(y_i − f_i)²/2}.

The log-likelihood is thus

Σ_{i∈D} [y_i f_i − exp(f_i) + const] − ½ Σ_{i∉D} ε(y_i − f_i)².

We now take the gradient of the log-likelihood as

∇ log p(y | f)_i = y_i − exp(f_i) if i ∈ D, and ε(y_i − f_i) if i ∉ D,

and the Hessian of the log-likelihood as

∇∇ log p(y | f)_{ii} = −exp(f_i) if i ∈ D, and −ε if i ∉ D.
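A sketch of the mixed likelihood's gradient and Hessian diagonal as they would enter a Newton or Laplace step; the boolean mask `in_domain` and the small fixed ε are our illustrative stand-ins for the ε → 0 limit:

```python
import numpy as np

def mixed_loglik_grad_hess(y, f, in_domain, eps=1e-6):
    """Gradient and Hessian diagonal of log p(y|f) for an incomplete grid:
    Poisson cells inside the domain D, near-flat Gaussian noise outside."""
    grad = np.where(in_domain, y - np.exp(f), eps * (y - f))
    hess_diag = np.where(in_domain, -np.exp(f), -eps)
    return grad, hess_diag

# Cells outside the domain contribute (almost) nothing to the fit, so the
# Kronecker structure of the full rectangular grid can still be exploited.
```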