scan statistics using for economical research

V. Jansons, V. Jurenoks, K. Didenko, Riga Technical University, Latvia. Bulgaria, Yundola, 2008.


  1. SCAN STATISTICS USING FOR ECONOMICAL RESEARCH V. Jansons, V. Jurenoks, K. Didenko Riga Technical University, Latvia – Bulgaria, Yundola - 2008

  2. Traditional Statistics Methods – more Appropriate for Local Investigations Taking Into Account Local Factors

  3. Socio-economic and technological practice today requires determining quickly and accurately whether excess events (hot spots, i.e. clusters) are occurring.

  4. Socio-Economic Cluster Detection With Spatial Scan Statistics • Changes in modern urban planning; • Urban planning concepts are changing; • Need for sustainable cities; • Importance of integrating the multi-scale nature of the city. New concepts in urban planning: fusion between – urban nuclei theory; – hierarchical centre/periphery models.

  5. Using Scan Statistics to Determine Marketing Hot Spots – High Demand Density [Figure: areas labelled A, B and C]

  6. Urban Cluster Detection? Why? • Characterize urban space; • Highlight specialized areas in an urban area; • Highlight deficiencies in a certain neighbourhood; • Avoid excessive mobility; • Plan public transport networks.

  7. The Spatial Scan Statistics Have Been Used to Detect and Extract Spatiotemporal Clusters of Service Within the City of Riga

  8. Using Scan Statistics for Latvian Forest (Health) Analysis and Control

  9. Latvian State Institutions Involved in Forest Fire Control: Latvian Air Transport; National Guard; Defence Troops; Fire Fighting Department; Local Government; State Forest Service; Latvian Railway; Communication Service; Medical Service; Civil Defence.

  10. Forest and Forest Fire Prevention System in Latvia – Fire Lookout Towers and Fire Stations. Forests in Latvia cover 45% of the surface of the country. The state is the largest forest owner in Latvia, with control of approximately 50% of the forests. All activities in Latvian forests must be conducted according to the Latvian Forest Law.

  11. Average Number of Forest Fires in Latvia

  12. Space-Time Scan Statistic • Cylinders are used rather than 2D circular zones.
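
A minimal sketch of the cylinder idea, assuming each event is an `(x, y, t)` tuple: the circular base covers space and the height covers a time interval. The function names and the event layout are illustrative assumptions, not taken from the slides.

```python
import math

def in_cylinder(event, center, radius, t_start, t_end):
    """True if the event (x, y, t) lies inside the space-time cylinder."""
    x, y, t = event
    in_circle = math.hypot(x - center[0], y - center[1]) <= radius
    in_interval = t_start <= t <= t_end
    return in_circle and in_interval

def count_in_cylinder(events, center, radius, t_start, t_end):
    """Number of events falling inside the cylinder."""
    return sum(in_cylinder(e, center, radius, t_start, t_end) for e in events)
```

A purely spatial scan is the special case in which the time interval covers the whole study period.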

  13. Evolution of Average Number of Forest Fires in Latvia in Time [Figure: total number of fires in Latvia at time T1 and at time T2]

  14. Example of Satellite-Based Forest Fire Monitoring in the Baltic Region

  15. Monitoring of Aquatic Ecosystems and Groundwater in the River Basin Areas. Network Analysis of Biological Integrity in Freshwater Streams

  16. Water Quality Sampling Stations. Each sampling station monitors water parameters: • Bacteria • Chlorine levels • pH • Inorganic and organic pollutants • Colour, turbidity, odour • Many others

  17. Example of Wireless Sensor Networks for Habitat Monitoring

  18. Scalable Wireless Geo-Telemetry with Miniature Smart Sensors

  19. Data Fusion Hierarchy for Smart Sensor Network with Wireless Geo-Telemetry Capability. Levels as listed on the slide: Decisions; Information Retrieval; Information Analysis; Data Analysis; Data Integration: Sensors, Time, Location; Data Processing: Refinement and Filtering; Signal Acquisition From Sensors.

  20. Forest Health Decision Support System Using Benchmarking and Scan Statistics [Diagram labels: Identification Module; Key Forest Areas; Forest Threat Locations; Data from sensors; Outside factors; Data Processing; Benchmarking Module; Hot Spot; Compare Infected and Non-infected Forest; Benchmarking Sample; Verification; Ground Sensors; Air/Space Platforms; Decision]

  21. The previous slides showed that enough data (information) is available for global statistical control with scan statistics. Modern computing capabilities make it possible to solve real scan statistics problems.

  22. Objectives of the Scan Statistics: • Perform socio-economic surveillance of phenomena to detect areas of significantly high or low rates. Indicates whether there is clustering; • Test whether a phenomenon is randomly distributed over space, over time, or over space and time. Shows us where it is; • Evaluate reported spatial or space-time clusters of a phenomenon to see if they are statistically significant. Produces a relative risk for the cluster; • Perform repeated, time-periodic surveillance for the early detection of outbreaks of a phenomenon.

  23. Spatial Scan Statistic • The purpose of scan statistics is the early detection of clusters; • Two phases: – Identification of the most probable clusters, for which the occurrences of a phenomenon within a region are higher than outside it; – Distinguishing clusters that are significant from those which occurred by chance.

  24. Scanning Window Principle • A scanning window considers every unit and its neighbours in search of overdensities; • The size of the window is then increased.

  25. Scanning Window Principle (Kulldorff, 1997; Neill and Moore, 2005) A circular scanning window is placed at different coordinates, with a radius that varies from 0 to some set upper limit. [Figure: grid points, with circles around a red point and circles around a blue point] For each location and size of window, the statistical criterion (likelihood ratio) is computed, and the window with the maximum value is considered the most likely cluster.
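
The scanning procedure can be sketched as a double loop over grid centres and radii. The sketch below is an illustration under stated assumptions, not the authors' implementation: events are plain `(x, y)` points, the expected count in a window is taken as proportional to its area (a homogeneous Poisson null), and the names `poisson_llr`, `most_likely_cluster` and `region_area` are hypothetical. It also assumes that every window is smaller than the study region, so the expected count stays below the total.

```python
import math

def poisson_llr(c, e, total):
    """Log likelihood ratio for c observed vs e expected events in a window.
    Returns 0 when the window is not denser than expected."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when every event is inside
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def most_likely_cluster(points, centers, max_radius, region_area, n_radii=4):
    """Scan circular windows and return (score, center, radius, points inside)
    for the window with the highest log likelihood ratio."""
    total = len(points)
    best = None
    for cx, cy in centers:
        for k in range(1, n_radii + 1):
            r = max_radius * k / n_radii
            inside = [p for p in points
                      if math.hypot(p[0] - cx, p[1] - cy) <= r]
            # Assumption: expected count is proportional to window area.
            expected = total * math.pi * r * r / region_area
            score = poisson_llr(len(inside), expected, total)
            if best is None or score > best[0]:
                best = (score, (cx, cy), r, inside)
    return best
```

In practice the expected count would come from a population or baseline layer rather than from window area, and the significance of the winning window still has to be tested separately.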

  26. Scanning Window Principle To detect and localize outbreaks, we can search for spatial regions where the counts are significantly higher than expected. Imagine moving a space-time window around the scan area, allowing the window size, shape, and duration to vary.

  30. Scan Statistics. In either case, we find the regions with the highest values of a likelihood ratio statistic, and compute the statistical significance of each region by randomization testing: $L(S) = \frac{\Pr(\text{Data} \mid H_1(S))}{\Pr(\text{Data} \mid H_0)}$. Alternative hypothesis $H_1(S)$: outbreak in region $S$. Null hypothesis $H_0$: no outbreak. Parametric scan statistic approaches assume some parametric model for the distribution of counts, and learn the parameters from historical data. [Figure: maximum region score = 9.5; one region marked "Significant! (p = 0.01)", another marked "Not significant! (p = 0.18)"]

  31. Kulldorff’s Population-based Frequency Model of Cluster – Critical Region $S$. [Figure: $q_{out} = 0.01$, $q_{in} = 0.02$] The figure illustrates a suspected cluster: a region with a high intensity of the phenomenon, $q_{in} = 0.02$. The scan statistic must answer the question: is this cluster real, or is it a “visual illusion”?

  32. The Simplest Model for This Situation Can Be Written as: • Null hypothesis $H_0$ (no clusters in region $S$): $q_i = q_{all}$ everywhere (use the maximum likelihood estimate of $q_{all}$ in $S$); • Alternative hypothesis $H_1$ (cluster in region $S$): $q_i = q_{in}$ inside region $S$, $q_i = q_{out}$ elsewhere (use the maximum likelihood estimates of $q_{in}$ and $q_{out}$, subject to $q_{in} > q_{out}$).
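
A worked sketch of this model, assuming that counts are Poisson with mean equal to rate times population (the slide does not state the distribution). Under that assumption the maximum likelihood estimates are simply cases divided by population, and the rate-times-population terms cancel in the log likelihood ratio. The function name and the population figures in the usage lines are hypothetical.

```python
import math

def rate_model_llr(c_in, n_in, c_out, n_out):
    """Log likelihood ratio of H1 (q_in > q_out) against H0 (one common rate).

    c_in, c_out: case counts inside and outside region S.
    n_in, n_out: populations inside and outside region S.
    Assumes Poisson counts with mean = rate * population.
    """
    q_in = c_in / n_in                      # MLE of the rate inside S
    q_out = c_out / n_out                   # MLE of the rate outside S
    q_all = (c_in + c_out) / (n_in + n_out) # MLE of the common rate under H0
    if q_in <= q_out:                       # constraint q_in > q_out not met
        return 0.0

    def c_log_q(c, q):
        return c * math.log(q) if c > 0 else 0.0

    return (c_log_q(c_in, q_in) + c_log_q(c_out, q_out)
            - c_log_q(c_in + c_out, q_all))

# Hypothetical populations chosen to give q_in = 0.02 and q_out = 0.01.
llr = rate_model_llr(c_in=20, n_in=1000, c_out=90, n_out=9000)
```

A positive value only says that region $S$ fits better with two rates than with one; whether the gain is larger than chance alone would produce is a separate question.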

  33. Likelihood Function and p-Value • The likelihood function is constructed according to the model selected. • The likelihood function is maximized over all window locations and sizes. • The window with the maximum likelihood is the most likely cluster (least likely to have occurred by chance). • The likelihood ratio for this window becomes the maximum likelihood ratio test statistic. • A p-value is obtained for the cluster by Monte Carlo hypothesis testing.
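
Monte Carlo hypothesis testing can be sketched as follows: generate many data sets under the null hypothesis, recompute the maximum scan statistic for each, and rank the observed value among them. The sketch is generic; `simulate_stat` is a hypothetical callback that the user must supply, drawing one null data set with the given random generator and returning its maximum statistic.

```python
import random

def monte_carlo_p_value(observed_stat, simulate_stat, n_sim=999, seed=0):
    """Rank-based Monte Carlo p-value.

    observed_stat: maximum scan statistic of the real data.
    simulate_stat: callable taking a random.Random instance and returning
                   the maximum scan statistic of one data set drawn under H0.
    """
    rng = random.Random(seed)
    # Count simulated statistics at least as extreme as the observed one.
    exceed = sum(simulate_stat(rng) >= observed_stat for _ in range(n_sim))
    # The observed data set counts as one of the n_sim + 1 replicates.
    return (exceed + 1) / (n_sim + 1)
```

With 999 simulations the smallest attainable p-value is 0.001, which is why replicate counts such as 999 or 9999 are a common choice.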

  34. Poisson Model – the Most Popular Model in Scan Statistics • In each area, we assume the data $X$ to be Poisson distributed under the null hypothesis $H_0$, i.e., $X \sim \mathrm{Poi}(\lambda_0)$ • This yields the likelihood function $L_0 = f(x_1, \ldots, x_n \mid w) = \prod_{i=1}^{n} f(x_i \mid w)$

  35. Likelihood Ratio • We then compute the maximum of the function $L_0$; • We also compute the maximum of $L_1$, which is the same function with the parameters unrestricted. • Each zone $Z$ has different parameters, given the heterogeneous population distribution. We want to find the zone which maximizes the likelihood ratio (LR) between the likelihoods $L_1$ and $L_0$: $LR(Z) = \left( \frac{L_1}{L_0} \right)_Z$

  36. Scan Statistic $LR_{st}$ From Likelihood Ratio • The scan statistic $LR_{st}$ is defined as $LR_{st} = \max_Z LR(Z)$ • In the case of a Poisson distributed process, the likelihood ratio takes the following form:
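
The formula itself is missing from this transcript. As an assumption about what the slide showed, the form usually quoted for Kulldorff's Poisson model is given here for reference. The symbols are not taken from the slides: $c_Z$ is the observed count in zone $Z$, $E_Z$ the count expected there under $H_0$, and $C$ the total count over the whole study area.

```latex
LR(Z) =
\begin{cases}
\left( \dfrac{c_Z}{E_Z} \right)^{c_Z}
\left( \dfrac{C - c_Z}{C - E_Z} \right)^{C - c_Z} & \text{if } c_Z > E_Z, \\[2ex]
1 & \text{otherwise.}
\end{cases}
```

The case distinction restricts the search to zones with more events than expected, which corresponds to the constraint $q_{in} > q_{out}$.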
