  1. Quadtree-based Resource Description Techniques for Spatial Data in Distributed Databases
     Stefan Kufer and Andreas Henrich (stefan.kufer@uni-bamberg.de)
     University of Bamberg, Media Informatics Group
     Stuttgart, 09.03.2017

  2. Motivation
     - age of social media: creation and distribution of media items → maintained in (personal) media archives
     - large, heterogeneous distributed database of various resources (= nodes in the network) …
     → adequate indexing techniques are needed
     (figure: heterogeneous resources in the distributed database)
     Quadtree-based Resource Description Techniques for Spatial Data in Distributed Databases (p. 2) Stefan Kufer and Andreas Henrich − BTW 2017 in Stuttgart, March 09, 2017

  3. Problem Description
     - search criteria to be addressed:
       - text
       - timestamps
       - content features
       - geographic information
     - retrieval tasks in a distributed environment:
       - resource description problem
       - resource selection problem
       - (result merging)

  4. Search Scenario
     - general preliminaries:
       - a set of resources
       - each resource maintains a set of geotagged media items (figure, resource A: e.g. [lat/y=48.22, lon/x=11.62] and [lat/y=-33.86, lon/x=151.22])
       - plate-carrée projection: lat/lon coordinates = y/x coordinates in a 2-dimensional plane → more general spatial data scenario
     - resource description: summaries of the spatial content of a resource
     - query routing based on these summaries
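
A minimal sketch of the planar setting described above, assuming the projection is applied without any scaling (x = longitude, y = latitude in degrees). The function names are hypothetical. The Euclidean distance it computes is measured in degree units of the projected plane, not in kilometres; that is exactly what the planar scenario trades away for generality.

```python
import math


# Plate-carree mapping as on the slide: longitude becomes x, latitude becomes y.
# No scaling is applied (an assumption for illustration).
def plate_carree(lat: float, lon: float) -> tuple:
    """Map a (lat, lon) pair in degrees to an (x, y) point in the plane."""
    return (lon, lat)


def euclidean(p: tuple, q: tuple) -> float:
    """Euclidean distance between two projected points, in degree units."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# The two example media items shown for resource A.
item_1 = plate_carree(48.22, 11.62)
item_2 = plate_carree(-33.86, 151.22)
print(item_1, item_2)                        # (11.62, 48.22) (151.22, -33.86)
print(round(euclidean(item_1, item_2), 2))   # 161.94
```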

  5. Search Scenario
     - each resource (A, B, C) summarizes its data points into a resource description
     - similarity query with criterion d(q, o), where d = Euclidean distance, q = query object, o = database object
     - resource selection: the resources are ranked (1., 2., 3.) on the basis of their resource descriptions
     (figure legend: query object; resource data point = database object)

  6. Resource Descriptions
     - objective: encoding sets of two-dimensional data points
       - effectiveness → accurate delineation (selectivity)
       - efficiency → compact storage (space efficiency)
     - categories of resource description techniques (previous work: [KBH12], [KBH13], [KH14]):
       - Geometric Approaches
       - Space Partitioning Approaches
       - Hybrid Approaches

  7. Geometric Approaches
     - approaches that organize the data
     - one or several bounding volumes (bv) delimit the set of data points → the extents of the bv are described in the summaries
     - evaluated approaches:
       - MBR (as a comparative baseline)
       - RecMAR with parameter k, k = maximum number of Minimum Area Rectangles
     (figure: MBR vs. RecMAR with k = 2)

  8. Geometric Approaches (content as on slide 7; figure: MBR vs. RecMAR with k = 3)

  9. Geometric Approaches (content as on slide 7; figure: MBR vs. RecMAR with k = 6)
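
The following sketch shows why several rectangles can delimit a point set more selectively than one MBR. `mbr` computes the baseline summary (four numbers per resource). `split_at_largest_gap` is a hypothetical two-way split used only for illustration; it is not the RecMAR algorithm, whose construction is described in the authors' work.

```python
# Geometric resource description: the MBR stores four numbers per resource.
def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of a non-empty point set."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(rect):
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def split_at_largest_gap(points):
    """Hypothetical split (NOT RecMAR): cut along the wider axis of the MBR at the
    largest gap between consecutive coordinates. Expects at least two points."""
    x0, y0, x1, y1 = mbr(points)
    axis = 0 if (x1 - x0) >= (y1 - y0) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    gaps = [ordered[i + 1][axis] - ordered[i][axis] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Two distant clusters: a single MBR covers a lot of empty space.
points = [(0, 0), (1, 1), (0, 1), (10, 10), (11, 11)]
print(mbr(points), area(mbr(points)))        # (0, 0, 11, 11) 121
left, right = split_at_largest_gap(points)
print(area(mbr(left)) + area(mbr(right)))    # 2
```

The total covered area drops from 121 to 2, at the cost of storing eight numbers instead of four; this is the selectivity versus storage trade-off named on slide 6.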

  10. Space Partitioning Approaches
     - approaches that organize the embedding space
     - decompose the space into disjoint subspaces and identify regions (not) containing data points → information about cell occupancy in the summaries (0 = non-occupied, 1 = occupied)
     - evaluated approach: UFS with parameter n, n = number of sites/subspaces (figure: UFS with n = 32)
     - other examples (not evaluated!): uniform grid, kd space partitioning
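
A minimal sketch of an occupancy summary over a global uniform grid, one of the "other examples" above (it does not reproduce UFS). It assumes the plate-carrée plane with bounds (-180, -90, 180, 90) and row-major cell numbering; `grid_cell` and `occupancy_bits` are hypothetical names.

```python
# Occupancy summary over a global uniform grid (not UFS).
WORLD = (-180.0, -90.0, 180.0, 90.0)


def grid_cell(x, y, rows, cols, bounds=WORLD):
    """Row-major index of the grid cell containing (x, y)."""
    min_x, min_y, max_x, max_y = bounds
    col = min(int((x - min_x) / (max_x - min_x) * cols), cols - 1)  # clamp right edge
    row = min(int((y - min_y) / (max_y - min_y) * rows), rows - 1)  # clamp top edge
    return row * cols + col


def occupancy_bits(points, rows, cols):
    """One bit per cell: 1 = occupied by at least one point, 0 = empty."""
    bits = [0] * (rows * cols)
    for x, y in points:
        bits[grid_cell(x, y, rows, cols)] = 1
    return bits


resource_a = [(11.62, 48.22), (151.22, -33.86)]
print("".join(map(str, occupancy_bits(resource_a, rows=2, cols=4))))  # 00010010
```

The summary length depends only on the number of cells, not on how many items the resource holds.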

  11. Space Partitioning Approaches
     - global space partitioning → the same for all resources! (summaries only need to contain information about cell occupancy; figure: resources A, B, C, D)
     - the space partitioning must be adapted to the data distribution of the whole data collection!
     - additional tasks:
       - collect information about the data distribution in the network
       - partition the space and distribute this information in the network
       - (update the information as the data collection changes)
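
A sketch of what "adapted to the data distribution of the whole collection" can mean: a kd partitioning built once from pooled data, after which each resource only reports one occupancy bit per global cell. It assumes median splits that alternate between x and y and a power-of-two cell count n; these choices, and the names `kd_partition` and `occupancy`, are illustrative and not necessarily the construction used in the paper.

```python
# Global kd partitioning fitted to the pooled data of the whole collection
# (assumption: median splits, alternating x and y, n a power of two).
WORLD = (-180.0, -90.0, 180.0, 90.0)


def kd_partition(points, n, bounds=WORLD, depth=0):
    """Return n cell rectangles; every resource uses this same partitioning."""
    if n == 1:
        return [bounds]
    axis = depth % 2  # 0 = split on x, 1 = split on y
    coords = sorted(p[axis] for p in points)
    if coords:
        split = coords[len(coords) // 2]  # median of the pooled data
    else:
        split = (bounds[axis] + bounds[axis + 2]) / 2  # no data: halve the cell
    x0, y0, x1, y1 = bounds
    if axis == 0:
        low, high = (x0, y0, split, y1), (split, y0, x1, y1)
    else:
        low, high = (x0, y0, x1, split), (x0, split, x1, y1)
    below = [p for p in points if p[axis] < split]
    above = [p for p in points if p[axis] >= split]
    return (kd_partition(below, n // 2, low, depth + 1)
            + kd_partition(above, n // 2, high, depth + 1))


def occupancy(points, cells):
    """Per-resource summary, one bit per global cell. Closed intervals, so a point
    on a cell border marks both neighbours (conservative, never misses a point)."""
    return [int(any(c[0] <= x <= c[2] and c[1] <= y <= c[3] for x, y in points))
            for c in cells]


collection = [(-100, 10), (-90, 20), (10, 40), (20, 50),
              (30, -20), (100, -30), (120, -40), (130, 60)]
cells = kd_partition(collection, 4)
print(occupancy([(-95, 15)], cells))              # [1, 0, 0, 0]
print(occupancy([(125, 55), (-100, 10)], cells))  # [1, 0, 0, 1]
```

If the collection changes, `kd_partition` would have to be rerun and the new cells redistributed, which is the update task listed in parentheses on the slide.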

  12. Hybrid Approaches
     - combine properties of two arbitrary resource description techniques
     - method A builds the foundation, method B refines the foundation
     - evaluated approach: KDMBR with n = number of subspaces and b = number of bits per bound (4*b bits for an MBR)
       → summary: binary information about cell occupancy (foundation) plus quantized MBR information for occupied cells (refinement)
     (figure: KDMBR with n = 32, b = 3)
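
A sketch of the refinement step, assuming each MBR bound is stored as one of 2^b evenly spaced positions inside its cell. Lower bounds are rounded down and upper bounds up, so the decoded rectangle always contains the exact MBR and no data point is lost for ranking. The quantization scheme, the function names, and the "5 occupied cells" in the size example are assumptions for illustration.

```python
import math


def quantize_mbr(mbr, cell, b):
    """Encode an MBR (assumed to lie inside its cell) with b bits per bound.
    Lower bounds round down, upper bounds round up (conservative)."""
    steps = 2 ** b - 1  # codes 0..steps fit into b bits
    cx0, cy0, cx1, cy1 = cell
    w, h = cx1 - cx0, cy1 - cy0
    x0, y0, x1, y1 = mbr
    return (math.floor((x0 - cx0) * steps / w), math.floor((y0 - cy0) * steps / h),
            math.ceil((x1 - cx0) * steps / w), math.ceil((y1 - cy0) * steps / h))


def dequantize_mbr(code, cell, b):
    """Decode the 4*b-bit code back into a (slightly enlarged) rectangle."""
    steps = 2 ** b - 1
    cx0, cy0, cx1, cy1 = cell
    w, h = cx1 - cx0, cy1 - cy0
    qx0, qy0, qx1, qy1 = code
    return (cx0 + qx0 * w / steps, cy0 + qy0 * h / steps,
            cx0 + qx1 * w / steps, cy0 + qy1 * h / steps)


def summary_bits(n, occupied_cells, b):
    """Summary size: n occupancy bits plus 4*b bits per occupied cell."""
    return n + occupied_cells * 4 * b


cell = (0.0, 0.0, 70.0, 70.0)
code = quantize_mbr((12.0, 25.0, 33.0, 48.0), cell, b=3)
print(code)                              # (1, 2, 4, 5)
print(dequantize_mbr(code, cell, b=3))   # (10.0, 20.0, 40.0, 50.0)
print(summary_bits(32, 5, 3))            # 92, for a hypothetical 5 occupied cells
```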

  13. Novel Quadtree-based Resource Description Techniques
     - quadtree: recursive division of space into four quadrants
     - regular decomposition (equal-sized cells) → linear storage of quadtrees is possible (memory-efficient representation) [MRJ02]
     - linear quadtree encoding types (cf. paper!):
       - encoding of black nodes only
       - encoding of the whole quadtree structure (all internal nodes + leaves)
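
A minimal sketch of the two encoding types. The node representation, the quadrant order (0 = NW, 1 = NE, 2 = SW, 3 = SE), and the bit codes ('1' internal node, '00' white leaf, '01' black leaf) are hypothetical choices for illustration; the encodings actually used are described in the paper.

```python
# Hypothetical node representation: a leaf is True (black, occupied) or False
# (white, empty); an internal node is a list of four children in the assumed
# quadrant order 0 = NW, 1 = NE, 2 = SW, 3 = SE.


def encode_structure(node):
    """Whole-structure encoding, depth first: '1' = internal node,
    '00' = white leaf, '01' = black leaf (one illustrative bit scheme)."""
    if isinstance(node, list):
        return "1" + "".join(encode_structure(child) for child in node)
    return "01" if node else "00"


def encode_black_nodes(node, path=""):
    """Black-nodes-only encoding: the quadrant path (locational code) of
    every black leaf, read from the root."""
    if isinstance(node, list):
        codes = []
        for quadrant, child in enumerate(node):
            codes += encode_black_nodes(child, path + str(quadrant))
        return codes
    return [path] if node else []


# Root split once; its NE quadrant split again; NE-NW and SE are black.
tree = [False, [True, False, False, False], False, True]
print(encode_structure(tree))    # 1001010000000001
print(encode_black_nodes(tree))  # ['10', '3']
```

Which encoding is smaller depends on how many cells are black and how deep they lie: the structure code pays for every node, the black-node list only for occupied leaves but with one path per leaf.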

  14. Novel Quadtree-based Resource Description Techniques
     - linear quadtrees allow for local space partitioning, adapted to the data distribution of the single resource (figure: resources A, B, C, D)
     - area-driven decomposition of the space, parameters:
       - c → maximum number of subspaces of the quadtree structure (storage-space-oriented stopping criterion)
       - a → threshold area; construction ends once all black cells undercut it (selectivity-oriented stopping criterion)
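
A sketch of an area-driven construction with both stopping criteria. It assumes that c caps the number of leaf cells, that a is an absolute area in squared coordinate units, and that the largest black cell is always split next; these are interpretive assumptions, and the helper names are hypothetical.

```python
WORLD = (-180.0, -90.0, 180.0, 90.0)


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def occupied(r, points):
    # Closed intervals: a point on a border counts for both neighbouring cells.
    return any(r[0] <= x <= r[2] and r[1] <= y <= r[3] for x, y in points)


def quadrants(r):
    """Split a cell into four equal quadrants (NW, NE, SW, SE)."""
    mx, my = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
    return [(r[0], my, mx, r[3]), (mx, my, r[2], r[3]),
            (r[0], r[1], mx, my), (mx, r[1], r[2], my)]


def build_quadtree(points, c, a, bounds=WORLD):
    """Leaf cells of an area-driven quadtree as (rect, is_black) pairs."""
    leaves = [bounds]
    while True:
        candidates = [r for r in leaves if occupied(r, points) and area(r) >= a]
        if not candidates:          # selectivity criterion: all black cells undercut a
            break
        if len(leaves) + 3 > c:     # storage criterion: a split turns 1 cell into 4
            break
        largest = max(candidates, key=area)
        leaves.remove(largest)
        leaves.extend(quadrants(largest))
    return [(r, occupied(r, points)) for r in leaves]


pts = [(11.62, 48.22), (151.22, -33.86)]
by_storage = build_quadtree(pts, c=10, a=0.1)    # stops because of c
by_area = build_quadtree(pts, c=32, a=20000.0)   # stops because of a
print(len(by_storage), sum(b for _, b in by_storage))  # 10 2
print(len(by_area), sum(b for _, b in by_area))        # 4 2
```

Splitting the largest black cell first spends the cell budget c where it most reduces the described area, which is one natural reading of "area-driven".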

  15. Novel Quadtree-based Resource Description Techniques
     - QT (parameters c, a): space partitioning (sp) technique with resource-individual sp (local sp); figure: c = 32, a = 0.1
     - GridQT (parameters c, a, r; r = number of rows, columns = 2*r): hybrid technique, uniform grid (global sp) + qt-structure (local sp); figure: c = 32, a = 0.1, r = 4
     - KDQT (parameters c, a, n): hybrid technique, kd-structure (global sp) + qt-structure (local sp); figure: c = 32, a = 0.1, n = 32
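
A sketch of the global part of GridQT. With r rows and 2*r columns over the 360 x 180 degree plate-carrée plane, every grid cell is a square of side 180/r degrees, presumably so that the local quadtrees start from square cells. The function name is hypothetical, and the local quadtree built inside each occupied cell (the QT construction from slide 14) is omitted.

```python
# Global part of GridQT (sketch): r rows x 2*r columns, square cells of side 180/r.
def grid_qt_cells(points, r):
    """Group a resource's points by global grid cell; each occupied cell would
    then receive its own local quadtree (not built here)."""
    side = 180.0 / r
    cols = 2 * r
    groups = {}
    for x, y in points:
        col = min(int((x + 180.0) // side), cols - 1)  # clamp x = 180
        row = min(int((y + 90.0) // side), r - 1)       # clamp y = 90
        rect = (-180.0 + col * side, -90.0 + row * side,
                -180.0 + (col + 1) * side, -90.0 + (row + 1) * side)
        groups.setdefault(rect, []).append((x, y))
    return groups


for rect, members in grid_qt_cells([(11.62, 48.22), (151.22, -33.86)], r=4).items():
    print(rect, members)
```

With r = 4 the two example items fall into the 45-degree cells (0.0, 45.0, 45.0, 90.0) and (135.0, -45.0, 180.0, 0.0).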

  16. Novel Quadtree-based Resource Description Techniques
     - QTMBR (parameters c, a, b): hybrid technique, qt-structure (local sp) + quantized MBRs (bv); figure: c = 32, a = 0.1, b = 3
     - MBRQT (parameters c, a): hybrid technique, external MBR (bv) + qt-structure (local sp); figure: c = 32, a = 0.1

  17. Resource Selection - Ranking
     - all techniques describe areas containing data points → ranking is based on the minimum distance (mindist) between the areas of a resource and the query point q (cf. paper for details!)
     - example: mindist of the areas described by the summary of resource B < mindist of the areas described by the summary of resource A ⇒ B is ranked higher than A
     (figure legend: resource data point, query point q, resources A and B)
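
A minimal sketch of mindist-based ranking, assuming every summary has already been decoded into a list of axis-aligned rectangles (min_x, min_y, max_x, max_y). For occupancy-only summaries these would be the occupied cells; for MBR-based ones, the (quantized) MBRs. The names and the example summaries are hypothetical.

```python
import math


def mindist(q, rect):
    """Smallest Euclidean distance from point q to an axis-aligned rectangle
    (min_x, min_y, max_x, max_y); 0 if q lies inside it."""
    qx, qy = q
    x0, y0, x1, y1 = rect
    dx = max(x0 - qx, 0.0, qx - x1)
    dy = max(y0 - qy, 0.0, qy - y1)
    return math.hypot(dx, dy)


def rank_resources(q, summaries):
    """Rank resources by the minimum mindist over all areas in their summary."""
    score = {name: min(mindist(q, r) for r in rects) for name, rects in summaries.items()}
    return sorted(score, key=score.get)


summaries = {
    "A": [(5, 5, 10, 10)],
    "B": [(20, 20, 30, 30), (-3, 1, -1, 4)],
    "C": [(-2, -2, 2, 2)],
}
print(rank_resources((0, 0), summaries))  # ['C', 'B', 'A']
```

Because each described area contains all of the resource's points that fall in it, mindist is a lower bound on the distance from q to any of those points.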

  18. Evaluation – Data Collection
     - 406,450 geo-referenced images from Flickr
     - 5,951 different users → 5,951 resources
     - long-tail distribution of data to resources
     - data space: densely populated and unpopulated areas vary
     (figure, log-scaled!: n = 4.0 → 10^4 - 1 = 9,999)
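
Plain arithmetic on the collection statistics above. The axis convention is an assumption: reading the plotted value as log10(n + 1) is consistent with the note that 4.0 corresponds to 10^4 - 1 = 9,999. With a long-tail distribution, the mean is pulled up by a few large resources, so it should not be read as a typical resource size.

```python
import math

# Collection statistics from the slide.
items, resources = 406_450, 5_951
print(round(items / resources, 1))  # 68.3 items per resource on average


# Assumed axis convention: plotted value = log10(n + 1); the +1 keeps n = 0
# finite on a log scale.
def to_log_axis(n):
    return math.log10(n + 1)


def from_log_axis(value):
    return 10 ** value - 1


print(from_log_axis(4.0))  # 9999.0
```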
