Quadtree-based Resource Description Techniques for Spatial Data in Distributed Databases
Stefan Kufer and Andreas Henrich (stefan.kufer@uni-bamberg.de)
University of Bamberg, Media Informatics Group
Stuttgart, 09.03.2017
Motivation
- Age of social media: creation and distribution of media items → maintained in (personal) media archives
- Large, heterogeneous distributed database of various resources (= nodes in the network)
- … → adequate indexing techniques are needed
- Figure: heterogeneous resources in the distributed database
Quadtree-based Resource Description Techniques for Spatial Data in Distributed Databases (p. 2) Stefan Kufer and Andreas Henrich − BTW 2017 in Stuttgart, March 09, 2017
Problem Description
- Search criteria to be addressed: text, timestamps, content features, geographic information
- Retrieval tasks in a distributed environment: resource description problem, resource selection problem, (result merging)
Search Scenario
- General preliminaries: a set of resources; each resource maintains a set of geotagged media items
- Example geotags (figure, resource A): [lat/y = 48.22, lon/x = 11.62] and [lat/y = -33.86, lon/x = 151.22]
- Plate-carrée projection: lat/lon coordinates = y/x coordinates in a 2-dimensional plane → more general spatial data scenario
- Resource description: summaries of the spatial content of a resource
- Query routing based on summaries
Search Scenario
- Resource description: each resource (A, B, C) summarizes its data points
- Similarity query with criterion d(q, o), where d = Euclidean distance, q = query object, o = database object
- Resource selection: resources are ranked based on their summaries; ranking in the example: 1. C, 2. A, 3. B
- Figure legend: query object; resource data point (database object)
Resource Descriptions
- Objective: encoding sets of two-dimensional data points
- Effectiveness → accurate delineation (selectivity)
- Efficiency → compact storage (space efficiency)
- Categories of resource description techniques (previous work: [KBH12], [KBH13], [KH14]): Geometric Approaches, Space Partitioning Approaches, Hybrid Approaches
Geometric Approaches
- Approaches that organize the data
- One or several bounding volumes (bv) delimit the set of data points → the extents of the bv are described in the summaries
- Evaluated approaches: MBR (minimum bounding rectangle, as a comparative baseline) and RecMAR_k, where k = maximum number of Minimum Area Rectangles
- Figures (one per slide build): MBR next to RecMAR_2, RecMAR_3, and RecMAR_6
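The MBR baseline can be illustrated with a minimal sketch. The function name `mbr` and the tuple layout `(x_min, y_min, x_max, y_max)` are assumptions made for this example and are not taken from the talk; RecMAR is not sketched because its construction is not described on the slides.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]               # (x, y) = (lon, lat)
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def mbr(points: Iterable[Point]) -> Rect:
    """Return the minimum bounding rectangle of a non-empty point set."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute the MBR of an empty point set")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# The two example geotags from the search scenario, as (lon, lat) pairs.
resource_a = [(11.62, 48.22), (151.22, -33.86)]
print(mbr(resource_a))
```

Such a summary costs four numbers per resource, but as the example shows, two distant points already produce a rectangle that covers a large, mostly empty area.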
Space Partitioning Approaches
- Approaches that organize the embedding space
- Decompose the space into disjoint subspaces and identify regions (not) containing data points → information about cell occupancy in summaries (0 = non-occupied, 1 = occupied)
- Evaluated approach: UFS_n, where n = number of sites/subspaces
- Other examples (not evaluated!): uniform grid, kd space partitioning
- Figure: UFS_32
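The idea of an occupancy summary can be sketched for the simplest partitioning mentioned above, the uniform grid (which the slide lists as an example, not as an evaluated technique). The function name, the row-major bit order, and the default world bounds are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (x, y) = (lon, lat)


def grid_occupancy(points: Iterable[Point], rows: int, cols: int,
                   x_min: float = -180.0, x_max: float = 180.0,
                   y_min: float = -90.0, y_max: float = 90.0) -> List[int]:
    """Return one bit per grid cell (row-major): 1 = occupied, 0 = non-occupied.

    Points are assumed to lie inside the given bounds.
    """
    bits = [0] * (rows * cols)
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows
    for x, y in points:
        # Clamp so that points on the upper border fall into the last cell.
        col = min(int((x - x_min) / cell_w), cols - 1)
        row = min(int((y - y_min) / cell_h), rows - 1)
        bits[row * cols + col] = 1
    return bits


summary = grid_occupancy([(11.62, 48.22), (151.22, -33.86)], rows=2, cols=4)
print(summary)
```

Because the grid is the same for every resource, the summary needs only the bit vector itself.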
Space Partitioning Approaches
- Global space partitioning → the same for all resources! (summaries only need to contain information about cell occupancy)
- Figure: resources A, B, C, D
- The space partitioning must be adapted to the data distribution of the whole data collection!
- Additional tasks: collect information about the data distribution in the network; partition the space and distribute this information in the network; (update the information as the data collection changes)
Hybrid Approaches
- Combine properties of two arbitrary resource description techniques: method A builds the foundation, method B refines the foundation
- Evaluated approach: KDMBR_n^b, where n = number of subspaces and b = number of bits per bound (4*b for an MBR)
- → Summary: binary information about cell occupancy (foundation), quantized MBR information for occupied cells (refinement)
- Figure: KDMBR_32^3
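How an MBR might be stored with b bits per bound (4*b bits in total) can be sketched as follows. This is an illustrative assumption, not the encoding from the paper: each cell axis is divided into 2^b equal intervals, lower bounds are rounded down and upper bounds are rounded up so that the decoded rectangle still encloses the original MBR. All function names are hypothetical.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Codes = Tuple[int, int, int, int]         # four interval indices in 0 .. 2^b - 1


def quantize_mbr(mbr: Rect, cell: Rect, b: int) -> Codes:
    """Quantize an MBR lying inside `cell` to b bits per bound (assumed scheme)."""
    levels = 1 << b  # number of intervals per axis
    cx0, cy0, cx1, cy1 = cell

    def lower(v: float, c0: float, c1: float) -> int:
        # Index of the interval containing v (rounds the bound down).
        i = math.floor((v - c0) / (c1 - c0) * levels)
        return min(levels - 1, max(0, i))

    def upper(v: float, c0: float, c1: float) -> int:
        # Index of the interval whose upper edge is the first one >= v.
        i = math.ceil((v - c0) / (c1 - c0) * levels) - 1
        return min(levels - 1, max(0, i))

    x0, y0, x1, y1 = mbr
    return (lower(x0, cx0, cx1), lower(y0, cy0, cy1),
            upper(x1, cx0, cx1), upper(y1, cy0, cy1))


def dequantize_mbr(codes: Codes, cell: Rect, b: int) -> Rect:
    """Decode the four codes back into a rectangle enclosing the original MBR."""
    levels = 1 << b
    cx0, cy0, cx1, cy1 = cell
    step_x = (cx1 - cx0) / levels
    step_y = (cy1 - cy0) / levels
    qx0, qy0, qx1, qy1 = codes
    return (cx0 + qx0 * step_x, cy0 + qy0 * step_y,
            cx0 + (qx1 + 1) * step_x, cy0 + (qy1 + 1) * step_y)


cell = (0.0, 0.0, 8.0, 8.0)
codes = quantize_mbr((1.5, 2.2, 4.1, 6.0), cell, b=3)
print(codes, dequantize_mbr(codes, cell, b=3))
```

Rounding outward trades some selectivity for the guarantee that no data point falls outside the decoded rectangle.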
Novel Quadtree-based Resource Description Techniques
- Quadtree: recursive division of space into four quadrants
- Regular decomposition (equal-sized cells) → linear storage of quadtrees is possible (memory-efficient representation) [MRJ02]
- Linear quadtree encoding types: only black nodes; whole quadtree structure (all internal nodes + leaves)
- Encoding: cf. paper!
Novel Quadtree-based Resource Description Techniques
- Linear quadtrees allow for a local space partitioning adapted to the data distribution of the single resource
- Figure: resources A, B, C, D
- Area-driven decomposition of the space, parameters:
- c → maximum number of subspaces of the quadtree structure (storage-space-oriented stopping criterion)
- a → threshold area; if undercut by all black cells: end of construction (selectivity-oriented stopping criterion)
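One possible reading of the area-driven decomposition with the two stopping criteria is sketched below. The details are assumptions: the sketch always splits the largest black (occupied) cell, counts leaves as subspaces for the limit c, and measures a in squared coordinate units. The construction actually used is described in the paper and may differ; the function name is hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def rect_area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def build_quadtree_leaves(points: Iterable[Point], space: Rect,
                          c: int, a: float) -> List[Tuple[Rect, bool]]:
    """Return the leaves as (cell, is_black) pairs; black = contains data points."""
    pts = list(points)
    if not pts:
        return [(space, False)]
    white: List[Rect] = []
    # Max-heap of black leaves ordered by area (negated key); `tie` keeps
    # heap entries comparable without ever comparing the rectangles.
    heap = [(-rect_area(space), 0, space, pts)]
    tie = 1
    while heap:
        n_leaves = len(white) + len(heap)
        largest_black_area = -heap[0][0]
        # Stop if every black cell undercuts a, or a split would exceed c.
        if largest_black_area < a or n_leaves + 3 > c:
            break
        _, _, (x0, y0, x1, y1), cell_pts = heapq.heappop(heap)
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quadrants = {
            (0, 0): (x0, y0, mx, my), (1, 0): (mx, y0, x1, my),
            (0, 1): (x0, my, mx, y1), (1, 1): (mx, my, x1, y1),
        }
        buckets = {key: [] for key in quadrants}
        for x, y in cell_pts:
            buckets[(int(x >= mx), int(y >= my))].append((x, y))
        for key, rect in quadrants.items():
            if buckets[key]:
                heapq.heappush(heap, (-rect_area(rect), tie, rect, buckets[key]))
                tie += 1
            else:
                white.append(rect)
    black = [entry[2] for entry in heap]
    return [(r, True) for r in black] + [(r, False) for r in white]


world = (-180.0, -90.0, 180.0, 90.0)
geotags = [(11.62, 48.22), (151.22, -33.86)]
leaves = build_quadtree_leaves(geotags, world, c=32, a=0.1)
print(len(leaves), sum(1 for _, is_black in leaves if is_black))
```

Splitting only black cells keeps the subdivision budget c focused on the regions where the resource actually holds data.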
Novel Quadtree-based Resource Description Techniques
- QT_{c,a}: space partitioning (sp) technique; resource-individual sp (local sp). Figure: QT_{32,0.1}
- GridQT_r^{c,a}: hybrid technique; uniform grid (global sp) + qt-structure (local sp); r = number of rows (columns = 2*r). Figure: GridQT_4^{32,0.1}
- KDQT_n^{c,a}: hybrid technique; kd-structure (global sp) + qt-structure (local sp). Figure: KDQT_32^{32,0.1}
Novel Quadtree-based Resource Description Techniques
- QTMBR_{c,a}^b: hybrid technique; qt-structure (local sp) + quantized MBRs (bv). Figure: QTMBR_{32,0.1}^3
- MBRQT^{c,a}: hybrid technique; external MBR (bv) + qt-structure (local sp). Figure: MBRQT^{32,0.1}
Resource Selection - Ranking
- All techniques describe areas containing data points → the ranking is based on the minimum distance between the areas of a resource and the query point q (cf. paper for details!)
- Example: mindist of the areas described by the summary of resource B < mindist of the areas described by the summary of resource A ⇒ B is ranked higher than A
- Figure legend: resource data point; query point q; resources A and B
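The mindist-based ranking can be sketched for summaries that decode to axis-aligned rectangles. The function names and the dictionary layout are assumptions for illustration; the ranking details of the individual techniques are given in the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def mindist(q: Point, rect: Rect) -> float:
    """Euclidean distance from q to the closest point of rect (0 if q is inside)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def rank_resources(q: Point, summaries: Dict[str, List[Rect]]) -> List[str]:
    """Order resource ids by the smallest mindist over their described areas.

    Every resource is assumed to describe at least one area.
    """
    scored = [(min(mindist(q, r) for r in rects), rid)
              for rid, rects in summaries.items()]
    return [rid for _, rid in sorted(scored)]


summaries = {
    "A": [(3.0, 4.0, 5.0, 6.0)],
    "B": [(1.0, 0.0, 2.0, 1.0), (10.0, 10.0, 11.0, 11.0)],
}
print(rank_resources((0.0, 0.0), summaries))
```

Since mindist never overestimates the distance to any data point inside an area, a resource whose areas are all far from q can safely be visited late.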
Evaluation – Data Collection
- 406,450 geo-referenced images from Flickr
- 5,951 different users → 5,951 resources
- Long-tail distribution of data to resources
- Data space: densely populated and unpopulated areas vary
- Figure (log-scaled!): n = 4.0 → 10^4 – 1 = 9,999