Towards Keyword-Based Search over Environmental Data Sources 3rd International KEYSTONE Conference (IKC 2017) Gdańsk Poland, 11-12 September 2017. David Álvarez-Castro, José R.R. Viqueira, Alberto Bugarín Centro Singular de Investigación en Tecnoloxías da Información UNIVERSIDADE DE SANTIAGO DE COMPOSTELA citius.usc.es
Contents Motivation and Objective KEYWORDTERM Architecture Catalog and Index structure Searching process (PoS Data Restrictions) Conclusions and Future Work
Motivation and Objective: Motivation
Relevant metaphor: searching books

Motivation and Objective: Motivation
Example 1: Cholera risk
Query: "High sea surface temperature and rainfall near sea level during monsoon"
  Property of Space (PoS): a conventional value at each point of space. Figure: Sea Surface Temperature (sst), 1/08/2011.
  Fuzzy Linguistic Value (FLV): a fuzzy set of numeric values.
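A Fuzzy Linguistic Value such as "High" can be pictured as a function that maps a numeric value to a membership degree in [0, 1]. The sketch below assumes a trapezoidal shape, which the slides do not specify; the function name `trapezoid` and the breakpoints in degrees Celsius are hypothetical and chosen only for illustration.

```python
def trapezoid(a, b, c, d):
    """Build a membership function: 0 outside (a, d), 1 on [b, c], linear in between."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return mu

# Hypothetical FLV "High" for sea surface temperature (breakpoints are assumptions).
high_sst = trapezoid(24.0, 28.0, 40.0, 45.0)

for temperature in (20.0, 26.0, 30.0):
    print(temperature, high_sst(temperature))
```

With these assumed breakpoints, 20.0 is not "High" at all, 26.0 is "High" to degree 0.5, and 30.0 is fully "High".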
Motivation and Objective: Motivation
Example 1: Cholera risk (continued)
Query: "High sea surface temperature and rainfall near sea level during monsoon"
  Data Restriction: a fuzzy set of spatio-temporal elements. Figures: sst [1/08/2011]; sst mean August [2005-2012].
  Data Restriction: high rainfall.
  Data Restriction: near the coastline and low elevation.
  Spatial Restriction: a Fuzzy Spatial Relationship (FSR) applied to a Geographic Named Entity (GNE), which has Name (time), Geometry (time) and Properties (time). Result: a fuzzy set of spatio-temporal elements.
  Temporal Restriction: a Fuzzy Temporal Relationship (FTR) applied to a Geographic Named Entity (GNE). Result: a fuzzy set of spatio-temporal elements.
Motivation and Objective: Motivation
Example 2: Tourism
Query: "High sea surface temperature near Camping Miño"
  Data Restriction and Spatial Restriction. Result: a fuzzy set of spatio-temporal elements.
Motivation and Objective: Motivation
State of the art
Query: "High sea surface temperature and rainfall near sea level during monsoon"
[Figure: Rainfall, Sea Surface Temperature, Elevation and Coastline data must be discovered and downloaded from separate data catalogs and data sources, then combined in a Geographic Information System or geo data analysis toolkit. Label: "Not Feasible Task".]
Motivation and Objective Objective
KEYWORDTERM Architecture
[Figure: architecture diagram. Labels: Web GUI (Discovery & Search, OGC WMS); Search Engine (Search, Discovery); Index Structure; Catalog; Crawler (Update); GNE Data Source (OGC WFS, OGC WMS); PoS Data Source (Unidata NetCDF Subset, OGC WMS).]
Catalog and Index Structure: Catalog
Properties of Space (PoS)
  Examples: Sea Surface Temperature, Rainfall, Elevation, etc.
  Defined FLVs: High, Normal, Low, etc.
Geographic Named Entity Types (GNET)
  Examples: Accomodation_facility, Municipality, Coastline_feature, etc.
  List of properties: beds of Accomodation_facility, population of Municipality, etc.
  Defined FLVs for each property
Notes: not harmonized vocabulary; one harmonized data source for each PoS/GNET assumed; Semantic Data Integration assumed.
Catalog and Index Structure: Index Structure
Contents
  Properties (of Space and of GNETs)
    Precomputed memberships of all possible primitive data restrictions (defined FLVs)
    Examples: high Sea Surface Temperature, low elevation, many beds, low population, etc.
  GNETs
    Temporal evolution of names and geometries
Crawling data sources registered in the harmonized Catalog
Catalog and Index Structure: Index Structure
Precomputed PoS data restrictions
  Multiresolution spatial and temporal pyramids of raster tiles
[Figure: spatial pyramid and temporal pyramid.]
Catalog and Index Structure: Index Structure
Precomputed PoS data restrictions: generation of membership tiles
  For each FLV (Very low, Low, Normal, High, Very high), a membership raster tile is generated from the Sea Surface Temperature tile. Each cell holds a membership value in [0,1].
  Tiles with all 0's are discarded.
  Example tile (GL2, TL3): 180 x 360 x 20 real values, ~10 MB.
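The generation step can be sketched as follows: each FLV is applied cell by cell to a raster tile, and any resulting membership tile that contains only zeros is dropped. This is a minimal illustration, not the authors' implementation. The linear ramp shapes, the function names and the numeric thresholds are all assumptions, and the tile is a tiny 2 x 2 nested list rather than a real 180 x 360 x 20 raster.

```python
def ramp_up(lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi (assumed shape)."""
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))

def ramp_down(lo, hi):
    """Membership falling linearly from 1 at lo to 0 at hi (assumed shape)."""
    return lambda x: min(1.0, max(0.0, (hi - x) / (hi - lo)))

def membership_tiles(tile, flvs):
    """Return {FLV name: membership tile}, skipping tiles that are all zeros."""
    result = {}
    for name, mu in flvs.items():
        membership = [[mu(value) for value in row] for row in tile]
        if any(m > 0.0 for row in membership for m in row):
            result[name] = membership
    return result

# Hypothetical sea surface temperature tile and FLV thresholds (degrees Celsius).
sst_tile = [[26.0, 27.0], [28.0, 29.0]]
flvs = {"Low": ramp_down(10.0, 18.0), "High": ramp_up(24.0, 28.0)}

tiles = membership_tiles(sst_tile, flvs)
print(sorted(tiles))
print(tiles["High"])
```

In this example every cell has zero membership in "Low", so only the "High" tile is kept.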
Catalog and Index Structure: Index Structure
Precomputed PoS data restrictions: data access structures
  Property Name (Hash) -> FLV (Hash) -> membership raster tiles with spatial/temporal indexing
  Spatial/temporal indexing: R-Tree (Space), B+-Tree (Time)
  Example property names: Sea Surface Temperature, Water Temperature, Humidity, Wind Speed, Population Density, ...
  Example FLVs: Very high, High, Normal, Low, Very Low
Catalog and Index Structure: Index Structure
Precomputed GNET property data restrictions: data access structures
  Property Name (Hash) -> FLV (Hash) -> membership vector zones with spatial/temporal indexing
  Spatial/temporal indexing: R-Tree (Space), B+-Tree (Time)
  Example property names: Sea Surface Temperature, Water Temperature, Humidity, Wind Speed, Population Density, ...
  Example FLVs: High, Normal, Low
  Membership vector zones (Geo, Time, Memb.): ([t1, t2], 0.5); ([t3, t8], 0.7); ...; ([ti, tj], 1)
Catalog and Index Structure: Index Structure
Temporal evolution of GNE data: data access structures
  GNETs: Sport Facilities, Roads, Hotels, Storms, Administrative Divisions, ...
  GNEs (Name, Geo, Time): Camping Miño [t1, t2]; Araguaney [t1, t8]; Virxe da cerca [t5, t9]; ...
  Textual/spatial/temporal indexing: Hash (Text), R-Tree (Space), B+-Tree (Time)
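The textual and temporal parts of this structure can be approximated with a dictionary keyed by name, where each entry lists the versions of an entity together with their validity interval. The sketch below is a simplification under stated assumptions: a Python dict stands in for the hash index, a linear scan stands in for the B+-Tree, the R-Tree is omitted, times are plain integers, and the geometries are placeholder strings rather than real coordinates.

```python
from collections import defaultdict

# name -> list of (t_start, t_end, geometry) versions; the dict acts as the hash index.
gne_index = defaultdict(list)

def add_version(name, t_start, t_end, geometry):
    """Register one version of a named entity, valid during [t_start, t_end]."""
    gne_index[name].append((t_start, t_end, geometry))

def lookup(name, t):
    """Return the geometries of `name` that are valid at time t."""
    return [geom for (start, end, geom) in gne_index.get(name, []) if start <= t <= end]

# Intervals follow the slide ([t1, t2], [t1, t8]); geometries are hypothetical placeholders.
add_version("Camping Miño", 1, 2, "geometry-A")
add_version("Araguaney", 1, 8, "geometry-B")

print(lookup("Camping Miño", 2))
print(lookup("Camping Miño", 5))
```

The second lookup returns an empty list because the only registered version of the entity is no longer valid at time 5.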
Searching process (PoS Data Restrictions)
Phase 1: Accessing relevant raster membership tiles metadata
One data restriction
  Obtain metadata of relevant tiles.
  Result: set of relevant tile metadata.
Two or more data restrictions
  Spatio-temporal join of tile metadata.
  Result: set of tuples of tile metadata.
  If (T1, T2, ..., Tn) is a tuple of tiles of the result, then the intersection of their spatial and temporal extensions must be non-empty.
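The join condition can be illustrated with a small sketch that keeps only those tuples of tiles whose bounding boxes and time intervals share a non-empty intersection. It uses a plain nested-loop join over in-memory lists, not the indexed DBMS join of the actual system. The `TileMeta` type, the tile names `tileA` and `tileB`, the bounding boxes and the integer times are hypothetical, and boxes that only touch at an edge are treated as not intersecting (an assumption).

```python
from itertools import product
from typing import NamedTuple

class TileMeta(NamedTuple):
    tile: str
    bbox: tuple      # (xmin, ymin, xmax, ymax)
    t_start: int
    t_end: int

def common_extent(tiles):
    """Return (bbox, t_start, t_end) shared by all tiles, or None if it is empty."""
    xmin = max(t.bbox[0] for t in tiles)
    ymin = max(t.bbox[1] for t in tiles)
    xmax = min(t.bbox[2] for t in tiles)
    ymax = min(t.bbox[3] for t in tiles)
    t_start = max(t.t_start for t in tiles)
    t_end = min(t.t_end for t in tiles)
    if xmin >= xmax or ymin >= ymax or t_start > t_end:
        return None
    return (xmin, ymin, xmax, ymax), t_start, t_end

def st_join(*restrictions):
    """Nested-loop join: one tile list per data restriction, keep overlapping tuples."""
    return [combo for combo in product(*restrictions)
            if common_extent(combo) is not None]

high_sst = [TileMeta("tile1", (0, 0, 10, 10), 12, 27),
            TileMeta("tile2", (0, 0, 10, 10), 33, 49)]
high_rain = [TileMeta("tileA", (5, 5, 15, 15), 20, 40),
             TileMeta("tileB", (20, 20, 30, 30), 20, 40)]

for combo in st_join(high_sst, high_rain):
    print([t.tile for t in combo], common_extent(combo))
```

Here `tileB` never appears in the result because its bounding box is disjoint from both sea surface temperature tiles.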
Searching process (PoS Data Restrictions)
Phase 1: Accessing relevant raster membership tiles metadata
Implementation: Spatial Relational DBMS (PostgreSQL + PostGIS)
Query form: P1 V1 AND P2 V2
Table PoS:
  PID  GL  TL  FLV     BBox  TimeS  TimeE  Tile
  0    4   2   High    ...   t12    t27    tile1
  0    4   2   High    ...   t33    t49    tile2
  ...  ... ... ...     ...   ...    ...    ...
  0    4   2   Normal  ...   t94    t99    tile23
  ...  ... ... ...     ...   ...    ...    ...
  0    4   2   Low     ...   t7     t85    tile45
  ...  ... ... ...     ...   ...    ...    ...
Indexes: B+-Tree, Hash, Hash, R-Tree
Searching process (PoS Data Restrictions)
Phase 1: Accessing relevant raster membership tiles metadata
Performance
  Real dataset: 8340 tiles, ~80 GB of numeric real data
  Hardware: 2 CPU x 2 cores, 4 GB RAM, 50 GB disk
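The dataset size is consistent with the tile dimensions quoted earlier in the deck (180 x 360 x 20 real values, ~10 MB per tile). The quick check below assumes that each real value takes 8 bytes (double precision) and that every tile has those same dimensions; neither assumption is stated on the slides.

```python
BYTES_PER_VALUE = 8            # assumption: double-precision reals
TILE_COUNT = 8340              # from the slide

values_per_tile = 180 * 360 * 20
tile_bytes = values_per_tile * BYTES_PER_VALUE
dataset_bytes = TILE_COUNT * tile_bytes

print(values_per_tile)                      # number of cells in one tile
print(round(tile_bytes / 2**20, 2))         # tile size in MiB
print(round(dataset_bytes / 2**30, 2))      # dataset size in GiB
```

Under these assumptions one tile is a little under 10 MiB and the whole dataset is roughly 80 GiB, matching the figures on the slides.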
Searching process (PoS Data Restrictions)
Phase 1: Accessing relevant raster membership tiles metadata
Performance
[Figure: performance results for spatio-temporal join queries and for select-only queries.]
Searching process (PoS Data Restrictions)
Phase 2: Tile data access + [Fuzzy intersection of tile tuples]
One data restriction
  Obtain tile data from disk.
  Generate response WMS layers.
Two or more data restrictions
  Perform fuzzy intersection between the tiles of each tuple: minimum membership at each spatio-temporal cell.
  Algorithm 1: tiles with the same spatial and temporal resolution. Hash Join using space and time.
  Algorithm 2: tiles with different spatial and/or temporal resolution. Spatial and/or temporal resampling + Hash Join using space and time.
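Both algorithms can be sketched on sparse tiles stored as dictionaries keyed by (x, y, t) cell. This is an illustrative centralized version, not the authors' code. It assumes that cells with zero membership are simply absent, and that resampling is nearest-neighbour refinement of the spatial grid by an integer factor with time left unchanged; the slides do not say which resampling method is used.

```python
def fuzzy_intersection(tile_a, tile_b):
    """Algorithm 1 sketch: hash join on the (x, y, t) key, minimum membership per cell."""
    smaller, larger = sorted((tile_a, tile_b), key=len)
    result = {}
    for cell, membership in smaller.items():
        other = larger.get(cell)          # hash lookup on space and time
        if other is not None:
            result[cell] = min(membership, other)
    return result

def resample(tile, factor):
    """Assumed resampling: copy each coarse cell to factor x factor finer cells."""
    fine = {}
    for (x, y, t), membership in tile.items():
        for dx in range(factor):
            for dy in range(factor):
                fine[(x * factor + dx, y * factor + dy, t)] = membership
    return fine

# Hypothetical membership tiles at the same resolution (Algorithm 1).
a = {(0, 0, 0): 0.8, (0, 1, 0): 0.4, (1, 1, 0): 1.0}
b = {(0, 0, 0): 0.5, (1, 1, 0): 0.7, (1, 0, 0): 0.9}
print(fuzzy_intersection(a, b))

# A coarser tile is resampled first, then joined (Algorithm 2).
coarse = {(0, 0, 0): 0.6}
print(fuzzy_intersection(a, resample(coarse, 2)))
```

Cells present in only one tile drop out of the result, since a missing cell stands for membership 0 and the minimum would be 0.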
Searching process (PoS Data Restrictions)
Phase 2: Tile data access + [Fuzzy intersection of tile tuples]
Implementation
  Centralized implementation in Python
  Distributed implementation
    Storage: Apache Parquet
      Distributed columnar storage
      Data encodings and compression
    Processing: Apache Spark
      Map/reduce
      Distributed relational operations
      Efficient Hash Join based on Map/Reduce
Searching process (PoS Data Restrictions)
Phase 2: Tile data access + [Fuzzy intersection of tile tuples]
Performance
[Figure: performance results. Setup: 8 executors, 8 GB RAM.]
Searching process (PoS Data Restrictions)
Phase 2: Tile data access + [Fuzzy intersection of tile tuples]
Performance
  Resampling -> more processing
[Figure: two performance charts, each for 20 tuples of tiles. Setup: 8 executors, 8 GB RAM.]