Towards Keyword-Based Search over Environmental Data Sources


  1. Towards Keyword-Based Search over Environmental Data Sources. 3rd International KEYSTONE Conference (IKC 2017), Gdańsk, Poland, 11-12 September 2017. David Álvarez-Castro, José R.R. Viqueira, Alberto Bugarín. Centro Singular de Investigación en Tecnoloxías da Información, Universidade de Santiago de Compostela. citius.usc.es

  2. Contents: Motivation and Objective; KEYWORDTERM Architecture; Catalog and Index Structure; Searching Process (PoS Data Restrictions); Conclusions and Future Work.

  4. Motivation and Objective: Motivation. Relevant metaphor: searching books.

  5. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon".

  6. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon". Property of Space (PoS): conventional value at each point of space. Example: Sea Surface Temperature (sst), 1/08/2011.

  7. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon". Fuzzy Linguistic Value (FLV): fuzzy set of numeric values.
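As a minimal sketch of what "fuzzy set of numeric values" can mean for an FLV such as "High", the function below computes a trapezoidal membership degree. The trapezoidal shape, the function name `trapezoid`, and the breakpoints (24, 28, 32, 36 degrees Celsius) are illustrative assumptions; the slides do not state how the FLVs are defined.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoidal fuzzy set.

    The set has support (a, d) and core [b, c], with a <= b <= c <= d.
    Returns a degree in [0, 1].
    """
    if b <= x <= c:          # inside the core: full membership
        return 1.0
    if x <= a or x >= d:     # outside the support: no membership
        return 0.0
    if x < b:                # rising edge between a and b
        return (x - a) / (b - a)
    return (d - x) / (d - c) # falling edge between c and d


# Hypothetical definition of the FLV "High" for sea surface temperature (deg C).
HIGH_SST = (24.0, 28.0, 32.0, 36.0)

def high_sst(value):
    """Degree to which a temperature counts as 'High' under the assumed breakpoints."""
    return trapezoid(value, *HIGH_SST)
```

With these assumed breakpoints, a temperature of 26 lies halfway up the rising edge, so it belongs to "High" with degree 0.5, while 30 belongs fully.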

  8. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon". Data Restriction: fuzzy set of spatio-temporal elements. Examples: sst [1/08/2011]; sst mean August [2005-2012].

  9. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon". Data Restriction: high rainfall.

  10. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon". Data restriction: near the coastline and low elevation. Fuzzy Spatial Relationship (FSR). Geographic Named Entity (GNE): name (time), geometry (time), properties (time). Spatial Restriction: fuzzy set of spatio-temporal elements.

  11. Motivation and Objective: Motivation. Example 1: Cholera risk. Query: "High sea surface temperature and rainfall near sea level during monsoon". Fuzzy Temporal Relationship (FSR). Geographic Named Entity (GNE). Temporal Restriction: fuzzy set of spatio-temporal elements.

  12. Motivation and Objective: Motivation. Example 2: Tourism. Query: "High sea surface temperature near Camping Miño".

  13. Motivation and Objective: Motivation. Example 2: Tourism. Query: "High sea surface temperature near Camping Miño". Data restriction. Spatial Restriction: fuzzy set of spatio-temporal elements.

  14. Motivation and Objective: Motivation. State of the art. Query: "High sea surface temperature and rainfall near sea level during monsoon". [Figure: data sources (rainfall, sea surface temperature, elevation, coastline) and data catalogs; discover and download steps; Geographic Information System or geo data analysis toolkit; label "Not feasible task".]

  15. Motivation and Objective: Objective.

  17. KEYWORDTERM Architecture. [Architecture diagram: Web GUI (discovery and search, OGC WMS); Search Engine (search, discovery); Catalog; Index Structure; Crawler (update); GNE Data Source (OGC WFS, OGC WMS); PoS Data Source (Unidata NetCDF Subset, OGC WMS).]

  19. Catalog and Index Structure: Catalog. Properties of Space (PoS): examples are Sea Surface Temperature, Rainfall, Elevation, etc.; defined FLVs are High, Normal, Low, etc. Geographic Named Entity Types (GNET): examples are Accomodation_facility, Municipality, Coastline_feature, etc.; list of properties, e.g. beds of Accomodation_facility, population of Municipality, etc.; defined FLVs for each property. Notes: not semantic data integration; harmonized vocabulary assumed; one harmonized data source for each PoS/GNET assumed.

  20. Catalog and Index Structure: Index Structure. Contents. Properties (of Space and of GNETs): precomputed memberships of all possible primitive data restrictions (defined FLVs), e.g. high sea surface temperature, low elevation, many beds, low population, etc. GNETs: temporal evolution of names and geometries. Built by crawling the data sources registered in the harmonized Catalog.

  21. Catalog and Index Structure: Index Structure. Precomputed PoS data restrictions. Multiresolution spatial and temporal pyramids of raster tiles. [Figure: spatial pyramid and temporal pyramid.]

  22. Catalog and Index Structure: Index Structure. Precomputed PoS data restrictions. Generation of membership tiles: for a PoS such as Sea Surface Temperature, one membership raster tile per FLV (Very low, Low, Normal, High, Very high), with membership values in [0,1]. Tiles with all 0's are discarded. Example tile: GL2 TL3, 180 x 360 x 20 real values, ~10 MB.
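The tile generation step on this slide can be sketched as follows: apply each FLV's membership function to every cell of a PoS grid and keep only the tiles that contain some non-zero membership. The ramp-shaped membership functions, their breakpoints, and the names `ramp_up`, `ramp_down`, and `build_tiles` are assumptions for illustration, not the authors' definitions. As a side note, 180 x 360 x 20 values stored as 64-bit floats (also an assumption) would occupy 10,368,000 bytes, which is consistent with the ~10 MB quoted on the slide.

```python
def ramp_up(lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi (assumed shape)."""
    def mu(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


def ramp_down(lo, hi):
    """Membership falling linearly from 1 at lo to 0 at hi (assumed shape)."""
    up = ramp_up(lo, hi)
    return lambda x: 1.0 - up(x)


def membership_tile(values, mu):
    """Apply a membership function cell by cell to a 2-D grid of PoS values."""
    return [[mu(v) for v in row] for row in values]


def build_tiles(values, flvs):
    """Return {flv_name: tile}, discarding tiles whose memberships are all 0."""
    tiles = {}
    for name, mu in flvs.items():
        tile = membership_tile(values, mu)
        if any(m > 0.0 for row in tile for m in row):
            tiles[name] = tile
    return tiles


# Hypothetical FLV definitions for sea surface temperature (deg C).
FLVS = {
    "Low": ramp_down(10.0, 16.0),
    "High": ramp_up(24.0, 28.0),
    "Very high": ramp_up(30.0, 34.0),
}

# A tiny 2 x 2 grid standing in for one time slice of a raster tile.
sst = [[25.0, 26.0], [27.0, 29.0]]
tiles = build_tiles(sst, FLVS)
```

In this toy grid only the "High" tile survives; the "Low" and "Very high" tiles contain only zeros and are dropped, which is the storage saving the slide refers to.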

  23. Catalog and Index Structure: Index Structure. Precomputed PoS data restrictions. Data access structures: Property Name (Hash) -> FLV (Hash) -> membership raster tiles with spatial/temporal indexing, R-Tree (space) and B+-Tree (time). Example property names: Sea Surface Temperature, Water Temperature, Humidity, Wind Speed, Population Density. Example FLVs: Very high, High, Normal, Low, Very low.

  24. Catalog and Index Structure: Index Structure. Precomputed GNET property data restrictions. Data access structures: Property Name (Hash) -> FLV (Hash) -> membership vector zones (Geo, Time, Memb.) with spatial/temporal indexing, R-Tree (space) and B+-Tree (time). Example property names: Sea Surface Temperature, Water Temperature, Humidity, Wind Speed, Population Density. Example FLVs: High, Normal, Low. Example zones: time [t1, t2], membership 0.5; time [t3, t8], membership 0.7; time [ti, tj], membership 1.

  25. Catalog and Index Structure: Index Structure. Temporal evolution of GNE data. Data access structures: GNETs -> GNEs (Name, Geo, Time) with textual/spatial/temporal indexing, Hash (text), R-Tree (space) and B+-Tree (time). Example GNETs: Sport Facilities, Roads, Hotels, Storms, Administrative Divisions. Example GNEs: Camping Miño [t1, t2]; Araguaney [t1, t8]; Virxe da cerca [t5, t9].

  27. Searching Process (PoS Data Restrictions): Phase 1: accessing relevant raster membership tiles metadata. One data restriction: obtain metadata of relevant tiles; result: set of relevant tile metadata. Two or more data restrictions: spatio-temporal join of tile metadata; result: set of tuples of tile metadata. If (T1, T2, ..., Tn) is a tuple of tiles of the result, then the intersection of their spatial and temporal extensions must be non-empty.
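The join condition on this slide can be illustrated with a naive in-memory sketch: form every combination of one tile per data restriction and keep those whose bounding boxes and time intervals share a non-empty intersection. This is a plain nested-loop join, not the index-backed approach of the presentation; the `TileMeta` record, its field layout, and the function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TileMeta:
    tile_id: str
    bbox: tuple  # (min_x, min_y, max_x, max_y)
    time: tuple  # (start, end), both inclusive


def common_extent(tiles):
    """Return (bbox, time) shared by all tiles, or None if it is empty."""
    min_x = max(t.bbox[0] for t in tiles)
    min_y = max(t.bbox[1] for t in tiles)
    max_x = min(t.bbox[2] for t in tiles)
    max_y = min(t.bbox[3] for t in tiles)
    start = max(t.time[0] for t in tiles)
    end = min(t.time[1] for t in tiles)
    # Boxes that only touch along an edge are treated as non-overlapping.
    if min_x >= max_x or min_y >= max_y or start > end:
        return None
    return (min_x, min_y, max_x, max_y), (start, end)


def st_join(*restrictions):
    """Naive n-way spatio-temporal join over lists of tile metadata."""
    return [combo for combo in product(*restrictions)
            if common_extent(combo) is not None]


# Made-up metadata for two data restrictions.
sst_high = [
    TileMeta("sst_a", (0, 0, 10, 10), (1, 5)),
    TileMeta("sst_b", (10, 0, 20, 10), (1, 5)),
]
rain_high = [
    TileMeta("rain_a", (5, 5, 15, 15), (4, 8)),
    TileMeta("rain_b", (5, 5, 15, 15), (9, 12)),
]
pairs = st_join(sst_high, rain_high)
pair_ids = [(a.tile_id, b.tile_id) for a, b in pairs]
```

Here `rain_b` overlaps both sea surface temperature tiles in space but not in time, so it appears in no result tuple. A production system would replace the exhaustive `product` with index lookups to avoid comparing every combination.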

  28. Searching Process (PoS Data Restrictions): Phase 1: accessing relevant raster membership tiles metadata. Implementation: spatial relational DBMS (PostgreSQL + PostGIS). Query form: P1 V1 AND P2 V2. PoS table with columns PID, GL, TL, FLV, BBox, TimeS, TimeE, Tile. Example rows (BBox values not shown): (0, 4, 2, High, t12, t27, tile1); (0, 4, 2, High, t33, t49, tile2); ...; (0, 4, 2, Normal, t94, t99, tile23); ...; (0, 4, 2, Low, t7, t85, tile45); ... Index types shown: Hash, Hash, R-Tree, B+-Tree.

  29. Searching Process (PoS Data Restrictions): Phase 1: accessing relevant raster membership tiles metadata. Performance. Real dataset: 8340 tiles, ~80 GB of numeric real data. Hardware: 2 CPU x 2 cores, 4 GB RAM, 50 GB disk.

  30. Searching Process (PoS Data Restrictions): Phase 1: accessing relevant raster membership tiles metadata. Performance. [Chart labels: spatio-temporal join queries; only select queries.]

  31. Searching Process (PoS Data Restrictions): Phase 2: tile data access + [fuzzy intersection of tile tuples]. One data restriction: obtain tile data from disk; generate response WMS layers. Two or more data restrictions: perform fuzzy intersection between the tiles of each tuple (minimum membership at each spatio-temporal cell). Algorithm 1, for tiles with the same spatial and temporal resolution: hash join using space and time. Algorithm 2, for tiles with different spatial and/or temporal resolution: spatial and/or temporal resampling + hash join using space and time.
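Both algorithms on this slide can be sketched in a few lines, assuming each tile is a list of rows `(x, y, t, membership)` on an integer cell grid. The hash join builds a table on one tile keyed by `(x, y, t)` and probes it with the other, keeping the minimum membership. For the resampling step, the sketch replicates each coarse cell onto a finer spatial grid (nearest-neighbour); the slides do not say which resampling method is used, so this choice and all names below are assumptions.

```python
def fuzzy_intersection(rows_a, rows_b):
    """Hash join on (x, y, t); fuzzy AND taken as the minimum membership."""
    # Build phase: hash table keyed by the spatio-temporal cell.
    table = {(x, y, t): m for x, y, t, m in rows_a}
    out = []
    # Probe phase: only cells present in both tiles survive.
    for x, y, t, m in rows_b:
        other = table.get((x, y, t))
        if other is not None:
            out.append((x, y, t, min(m, other)))
    return out


def refine(rows, factor):
    """Resample a coarse tile to a finer spatial grid.

    Each coarse cell is replicated into factor x factor fine cells
    (nearest-neighbour resampling, an assumed method).
    """
    return [(x * factor + dx, y * factor + dy, t, m)
            for x, y, t, m in rows
            for dx in range(factor)
            for dy in range(factor)]


# Fine 2 x 2 tile for "high sea surface temperature" at time step 1.
sst_high = [(0, 0, 1, 0.8), (0, 1, 1, 0.4), (1, 0, 1, 0.9), (1, 1, 1, 0.2)]
# Coarse tile for "high rainfall": one cell covering the same area.
rain_high_coarse = [(0, 0, 1, 0.5)]

# Algorithm 2: resample first, then join as in Algorithm 1.
result = fuzzy_intersection(refine(rain_high_coarse, 2), sst_high)
```

When both tiles already share a resolution, `fuzzy_intersection` alone corresponds to Algorithm 1. The extra rows produced by `refine` hint at why resampling adds processing cost.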

  32. Searching Process (PoS Data Restrictions): Phase 2: tile data access + [fuzzy intersection of tile tuples]. Implementation. Centralized implementation in Python. Distributed implementation: storage with Apache Parquet (distributed columnar storage; data encodings and compression); processing with Apache Spark (map/reduce; distributed relational operations); efficient hash join based on map/reduce.

  33. Searching Process (PoS Data Restrictions): Phase 2: tile data access + [fuzzy intersection of tile tuples]. Performance. Setup: 8 executors, 8 GB RAM.

  34. Searching Process (PoS Data Restrictions): Phase 2: tile data access + [fuzzy intersection of tile tuples]. Performance. Resampling -> more processing. Setup: 20 tuples of tiles, 8 executors, 8 GB RAM.
