
Overview: Spatial Data Mining (University of Helsinki)

Spatial Data Mining, Antti Leino, University of Helsinki. Exploratory methods for analysing data with a spatial component; emphasis on point data.


  1. Overview

     HELSINGIN YLIOPISTO / HELSINGFORS UNIVERSITET / UNIVERSITY OF HELSINKI
     Spatial Data Mining: Overview
     Antti Leino <antti.leino@cs.helsinki.fi>
     Department of Computer Science

     Spatial Data Mining
       Exploratory methods for analysing data
         Spatial component
         Emphasis on point data
       Main topics
         Co-location rules
         Spatial clustering
         Spatial modelling

     Administrivia
       Lectures / meetings
         12th March to 26th April 2007
         Mon, Thu 10-12 am, C222
         Introductory lecture for each main topic
         Other times two articles / meeting
           Presentation by a student, c. 20 min
           Discussion
       Exam Thu, 3rd May, 4-7 pm
       Project work by Wed, 16th May
         Exercise in spatial data mining
         Essay on a related topic
       Course diary
       http://www.cs.helsinki.fi/u/leino/opetus/spatial-k07/

     Schedule
       12.3.  Introduction
       15.3.  Co-location patterns
       19.3.  Huang & al., 'Discovering Colocation Patterns from Spatial Data Sets: A General Approach'
              Salmenkivi, 'Efficient Mining of Correlation Patterns in Spatial Point Data'
       22.3.  Yoo & al., 'A Joinless Approach for Mining Spatial Colocation Patterns'
              Huang & al., 'Can We Apply Projection Based Frequent Pattern Mining Paradigm to Spatial Colocation Mining?'
       26.3.  Xiong & al., 'A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects'
              Yoo & al., 'Discovery of Co-evolving Spatial Event Sets'
       29.3.  Spatial clustering
       2.4.   Tung & al., 'Spatial Clustering in the Presence of Obstacles'
              Wang & Hamilton, 'DBRS: A Density-Based Spatial Clustering Method with Random Sampling'
              Easter break
       16.4.  Spatial modelling
       19.4.  Kavouras, 'Understanding and Modelling Spatial Change'
              Kazar & al., 'Comparing Exact and Approximate Spatial Auto-Regression Model Solutions for Spatial Data Analysis'
       23.4.  Shekhar & al., 'A Unified Approach to Detecting Spatial Outliers'
              Hyvönen & al., 'Multivariate Analysis of Finnish Dialect Data - an overview of lexical variation'
       26.4.  Summary

  2. Introductory example: London 1854

     Spatial Data Mining: Introduction

     Introductory example: London 1854
       1854 cholera epidemic
       Hero of the story: John Snow, MD
       Method: plot on a map
         Cholera deaths
         Public water pumps
       Discovery: deaths cluster around one pump
       The epidemic was shut down by removing the handle from the pump

     Nevertheless . . .
       Snow 1849: theory that cholera is transmitted by polluted water
         The data mining experiment was a part of testing the theory
       London had two water companies
         One took its water from the Thames upriver of town, the other downriver
         The polluted pump belonged to the latter
       Follow-up studies to confirm that
         The cholera victims had used the polluted pump
         Those who did not use the pump did not fall ill
       In other words, results were verified by other means

     Rest of the story
       Rather low impact
         The episode involved only one district in London
         The polluted pump was reopened some weeks later
         Snow's theory was finally accepted a couple of decades later
       Snow became famous in 1936
       Hindsight is easy
       Classic examples often have mythical elements

     Data Mining
       Extract new, interesting information from massive amounts of data
       New
         Surprising
         Not too strict prior expectations
       Interesting
         Relevant, useful
         Often requires some knowledge of the application
       Spatial data mining: add a spatial component

     Different kinds of data: Point patterns
       Shape is not relevant
       Each phenomenon represented by a separate point pattern
       Example: Viking-age forts
         Red dots: place names starting with Linna- 'castle'
         Green squares: Viking-age hill forts
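The map-based method of the London example can be imitated in a few lines. The sketch below assigns each death location to its nearest pump and counts deaths per pump. The coordinates and pump names are invented for illustration and are not Snow's data, and nearest-pump assignment by straight-line distance is a simplifying assumption rather than a description of how Snow actually worked.

```python
from collections import Counter
from math import hypot

# Hypothetical coordinates in arbitrary units; NOT John Snow's real data.
pumps = {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (0.0, 5.0)}
deaths = [(0.5, 0.2), (1.0, -0.3), (-0.4, 0.8), (0.2, 1.1), (4.6, 0.3), (0.3, 4.4)]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` (Euclidean distance)."""
    x, y = point
    return min(pumps, key=lambda name: hypot(x - pumps[name][0], y - pumps[name][1]))


def deaths_per_pump(deaths, pumps):
    """Count how many death locations have each pump as their nearest pump."""
    return Counter(nearest_pump(d, pumps) for d in deaths)


counts = deaths_per_pump(deaths, pumps)
for name, n in counts.most_common():
    print(name, n)
```

A pump whose count is far above the others is a candidate explanation, but, as the slides stress, such a finding still has to be verified by other means.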

  3. Different kinds of data

     Different kinds of data: Spatially continuous data
       Describes a spatially continuous phenomenon
         Not possible to measure across the space
         Measurements at distinct points
       Measurement points not interesting as point patterns
       Goal: model the phenomenon in order to predict the values between the measurement points

     Different kinds of data: Area data
       Spatial variation presented as regions
       Example 1: spatially continuous phenomenon
         Breeding certainty of the great crested grebe
         Finland divided into 10 × 10 km squares
       Example 2: distinct area
         Spatial distribution of a dialect word, Aaholli
         Somewhat like a point pattern, but now the shape is meaningful

     Co-location rules
       Typically for point patterns
       Correlation between different point patterns
         'Members of these point patterns often occur close to each other'
       Similar correlations can be established for area data
       Spatial association rules
         'If phenomena A_1, ..., A_n are found near each other, phenomenon B is also likely to be found'
         Cf. frequent sets and association rules in transaction data

     Spatial clustering
       Goal: find clusters in a point pattern
         Areas with high point density
         Separated by areas with low density
       Example: farms
         Green dots: farm locations
         Large-scale clustering: areas divided by lakes
         Smaller-scale clustering: villages

     Spatial modelling
       Generally: find a model that
         describes the phenomenon
           Underlying factors or variables
         can be used for predictions
           Areas that have not been surveyed
           Effect of changes
       Two phases
         Select a suitable model
         Find the parameters for the model
       Example: dialect words
         Principal component analysis
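The co-location idea above ('members of these point patterns often occur close to each other') can be made concrete with a small sketch. It computes, for two point patterns, the fraction of points in one pattern that have a neighbour from the other within a given radius, and takes the minimum of the two fractions as a single score. This follows the spirit of the participation ratio and participation index used in the co-location mining literature, restricted to pairs of patterns; the function names, the radius and the toy coordinates are assumptions made for illustration, not material from the slides.

```python
from math import hypot


def participation_ratio(a_points, b_points, radius):
    """Fraction of points in a_points with at least one b_points neighbour within radius."""
    if not a_points:
        return 0.0
    hits = sum(
        1
        for ax, ay in a_points
        if any(hypot(ax - bx, ay - by) <= radius for bx, by in b_points)
    )
    return hits / len(a_points)


def participation_index(a_points, b_points, radius):
    """Minimum of the two participation ratios; high only if both patterns co-occur."""
    return min(
        participation_ratio(a_points, b_points, radius),
        participation_ratio(b_points, a_points, radius),
    )


# Hypothetical toy patterns (e.g. place names and hill forts), arbitrary units.
names = [(0, 0), (1, 0), (10, 10), (20, 20)]
forts = [(0, 1), (10, 11)]

print(participation_ratio(names, forts, radius=1.5))  # share of names near a fort
print(participation_ratio(forts, names, radius=1.5))  # share of forts near a name
print(participation_index(names, forts, radius=1.5))
```

Taking the minimum keeps the score low when only one of the patterns tends to lie near the other. The brute-force pairwise distance check is quadratic and only suitable for small data sets; a spatial index would be the usual choice for larger ones.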
