
The Era of Big Spatial Data, Ahmed Eldawy, Computer Science and Engineering (PowerPoint presentation)



  1. The Era of Big Spatial Data Ahmed Eldawy Computer Science and Engineering

  2. Claudius Ptolemy (AD 90 – AD 168)

  3. Al Idrisi (1099 – 1165)

  4. Cholera cases in the London epidemic of 1854

  5. "I have BIG data. I need HELP..!!" "Cool computer technology..!! Can I use it in my application?" "My pleasure. Here it is." "Oh..!! But, it is not made for me. Can’t make use of it as is."

  6. 1969: "Kindly let me get the technology you have." "Kindly let me understand your needs."

  7. "HELP..!! I have BIG data. Your technology is not helping me." "mmm…Let me check with my good friends there." "Cool Database technology..!! Can I use it in my application?" "My pleasure. Here it is." "Oh..!! But, it is not made for me. Can’t make use of it as is."

  8. "Kindly let me get the technology you have." "Kindly let me understand your needs."

  9. "HELP..!! Again, I have BIG data. Your technology is not helping me." "Sorry, seems like the DBMS technology cannot scale more. Let me check with my other good friends there." "Cool Big Data technology..!! Can I use it in my application?" "My pleasure. Here it is." "Oh..!! But, it’s not made for me. Can’t make use of it as is."

  10. "Kindly let me get the technology you have." "Kindly let me understand your needs."

  11. Big Spatial Data

  12. Tons of spatial data out there: Geotagged Pictures, Geotagged Microblogs, Medical Data, Sensor Networks, Smart Phones, Satellite Images, Traffic Data, VGI

  13. SpatialHadoop: Spatial Data & Hadoop
      Hadoop: points = LOAD 'points' AS (id:int, x:int, y:int); result = FILTER points BY x < xmax AND x >= xmin AND y < ymax AND y >= ymin; (takes 193 seconds)
      SpatialHadoop: points = LOAD 'points' AS (id:int, location:point); result = FILTER points BY Overlap(location, rectangle(xmin, ymin, xmax, ymax)); (finishes in 2 seconds)

  14. SpatialHadoop components: Spatial Language, Spatial Operations, Spatial Indexes, Visualization. 80,000 downloads in one year; conducted more than seven keynotes, tutorials, and invited talks; >500GB public datasets for benchmarking and testing. Used in: Industry, Academia, Students Projects, Collaboration; University of Genova

  15. The Built-in Approach of SpatialHadoop. [Figure: three software stacks compared side by side: the On-top Approach, the From Scratch Approach, and the Built-in Approach (SpatialHadoop). Recoverable layer labels: User Programs; Spatial Modules; Pig Latin; Hadoop Java APIs; Job Monitoring and Scheduling; MapReduce Runtime; Storage (HDFS). Spatial additions marked with "+": Spatial Language, Spatial Operators, Early Pruning, Spatial Indexing.] …

  16. Agenda: The ecosystem of SpatialHadoop; Motivation; Internal system design; Applications; Related work; Performance results; Open research problems

  17. SpatialHadoop Architecture
      Applications: SHAHED [ICDE’15], MNTG [SSTD’13, ICDE’14], TAREEG [SIGMOD’14, SIGSPATIAL’14]
      Language: Pigeon [ICDE’14]; Visualization [VLDB’15, ICDE’16]
      Operations: Basic operations, CG_Hadoop [SIGSPATIAL’13]
      MapReduce: Spatial File Splitter, Spatial Record Reader
      Indexing: Grid, R-tree, R+-tree, Quad tree [VLDB’15]
      Also labelled on the diagram: ST-Hadoop, VLDB’13, ICDE’15

  18. Indexing (section divider; repeats the SpatialHadoop Architecture diagram of slide 17)

  19. Data Loading in Hadoop: Blindly chops a big file into 128MB chunks; values of records are not considered; relevant records are typically assigned to two different blocks; HDFS is too restrictive, since files cannot be modified. [Diagram: an input file split into 128MB blocks across data nodes.]

  20. Spatial Distributed File System: Default Partitioning vs. Spatial Partitioning

  21. Uniform Grid: Works only for uniformly distributed data
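
A minimal sketch of why a uniform grid suits only uniformly distributed data. The function names, the 4x4 grid, and the two toy point sets are assumptions made for illustration; they are not taken from SpatialHadoop.

```python
from collections import Counter

def grid_cell(x, y, mbr, cols, rows):
    """Return the (col, row) of the uniform grid cell that contains point (x, y)."""
    xmin, ymin, xmax, ymax = mbr
    # Clamp so that points on the upper boundary fall into the last cell.
    col = min(int((x - xmin) / (xmax - xmin) * cols), cols - 1)
    row = min(int((y - ymin) / (ymax - ymin) * rows), rows - 1)
    return col, row

def partition_sizes(points, mbr, cols, rows):
    """Count how many points land in each grid cell (one cell = one partition)."""
    return Counter(grid_cell(x, y, mbr, cols, rows) for x, y in points)

mbr = (0.0, 0.0, 4.0, 4.0)
# Hypothetical uniform data: one point at the centre of every cell.
uniform = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
# Hypothetical skewed data: eight points crowded near the origin, one far away.
skewed = [(0.1 * i, 0.1 * i) for i in range(1, 9)] + [(3.5, 3.5)]

uniform_sizes = partition_sizes(uniform, mbr, 4, 4)
skewed_sizes = partition_sizes(skewed, mbr, 4, 4)
```

With the uniform input every cell receives one point, while the skewed input puts eight of its nine points into a single cell and leaves most cells empty, so partitions become badly unbalanced.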

  22. R-tree: Read a sample; bulk load the sample into an R-tree with leaf node capacity $C = \frac{k \cdot B}{|R|\,(1 + \alpha)}$, where k: sample size, B: HDFS block capacity, |R|: input size, α: index overhead; use the MBRs of the leaf nodes as partition boundaries

  23. R-tree (continued): Partition the data

  24. R-tree (continued): Optional: build R-tree local indexes
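
The sample-based steps can be sketched as follows, assuming the slide's leaf-capacity formula reads C = k·B / (|R|·(1 + α)). The slide only says "bulk load"; the STR-style packing used here is an assumption, and all function names and numbers are hypothetical.

```python
import math

def leaf_capacity(k, block_size, input_size, alpha):
    """C = k*B / (|R|*(1 + alpha)): sample points per leaf, so that each leaf
    corresponds to roughly one HDFS block of the full input."""
    return max(1, math.floor(k * block_size / (input_size * (1 + alpha))))

def str_leaf_mbrs(sample, capacity):
    """Pack sample points into leaves of `capacity` points (STR-style packing,
    an assumed bulk-loading method) and return each leaf's MBR."""
    n_leaves = math.ceil(len(sample) / capacity)
    n_strips = math.ceil(math.sqrt(n_leaves))
    strip_size = n_strips * capacity
    by_x = sorted(sample)  # vertical strips: sort by x first
    mbrs = []
    for i in range(0, len(by_x), strip_size):
        # Within a strip, sort by y and cut into leaves.
        strip = sorted(by_x[i:i + strip_size], key=lambda p: p[1])
        for j in range(0, len(strip), capacity):
            leaf = strip[j:j + capacity]
            xs = [p[0] for p in leaf]
            ys = [p[1] for p in leaf]
            mbrs.append((min(xs), min(ys), max(xs), max(ys)))
    return mbrs

# Hypothetical numbers: 1000 sample points, 128 MB blocks, 10 GB input, 20% overhead.
capacity = leaf_capacity(k=1000, block_size=128, input_size=10240, alpha=0.2)
sample = [(i % 10, i // 10) for i in range(100)]
boundaries = str_leaf_mbrs(sample, 10)
```

The returned MBRs play the role of partition boundaries: each record of the full input would then be assigned to the partition whose boundary it falls in.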

  25. R-tree-based Index of a 400 GB road network

  26. Non-indexed Heap File

  27. Operations (section divider; repeats the SpatialHadoop Architecture diagram of slide 17)

  28. Operations Layer: Basic operations, e.g., range query and kNN; spatial join operations; computational geometry operations, e.g., Polygon Union, Voronoi diagram, Delaunay Triangulation, and Convex Hull; user-defined operations, e.g., kNN join

  29. Range Query: Use the global index to prune disjoint partitions; use local indexes to find matching records
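
A minimal sketch of the two steps, assuming the global index is simply a list of (partition MBR, records) pairs. A linear scan stands in for the local index, and all names and data are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(rect, p):
    """True if point p lies inside rectangle rect (boundaries included)."""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]

def range_query(global_index, query):
    """Return (matching points, number of partitions actually read)."""
    results, scanned = [], 0
    for mbr, points in global_index:
        if not overlaps(mbr, query):
            continue  # global index: prune partitions disjoint from the query
        scanned += 1
        # Stand-in for the local index: scan the surviving partition.
        results.extend(p for p in points if contains(query, p))
    return results, scanned

# Hypothetical 2x2 partitioning of the square [0, 10] x [0, 10].
index = [
    ((0, 0, 5, 5), [(1, 1), (4, 4)]),
    ((5, 0, 10, 5), [(6, 1), (9, 4)]),
    ((0, 5, 5, 10), [(1, 6)]),
    ((5, 5, 10, 10), [(9, 9)]),
]
matches, partitions_read = range_query(index, (3, 3, 4.5, 4.5))
```

In this example only one of the four partitions is read; the other three are skipped without touching their records.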

  30. KNN over Indexed Data (k=3): The first iteration runs as before and the result is tested for correctness (here the answer is incorrect). The second iteration processes other blocks that might contain an answer (the answer is then correct).
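
A sketch of the two-iteration idea. The slide does not say how the correctness test works; this sketch assumes a common approach in which the first answer is accepted only if no other partition lies closer to the query than the current k-th neighbour. Names and data are hypothetical.

```python
import math

def min_dist(mbr, q):
    """Smallest distance from point q to rectangle mbr = (xmin, ymin, xmax, ymax)."""
    dx = max(mbr[0] - q[0], 0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0, q[1] - mbr[3])
    return math.hypot(dx, dy)

def knn(partitions, q, k):
    """partitions: list of (mbr, points). Returns (k nearest points, partitions read)."""
    def dist(p):
        return math.dist(p, q)

    # Iteration 1: read only the partition nearest to q.
    order = sorted(range(len(partitions)), key=lambda i: min_dist(partitions[i][0], q))
    answer = sorted(partitions[order[0]][1], key=dist)[:k]
    read = 1

    # Assumed correctness test: any partition closer than the k-th neighbour
    # could still hold a better answer.
    radius = dist(answer[-1]) if len(answer) == k else math.inf
    extra = [i for i in order[1:] if min_dist(partitions[i][0], q) < radius]

    # Iteration 2: process only those partitions and refine the answer.
    if extra:
        candidates = list(answer)
        for i in extra:
            candidates.extend(partitions[i][1])
        answer = sorted(candidates, key=dist)[:k]
        read += len(extra)
    return answer, read

parts = [
    ((0, 0, 5, 5), [(1, 1), (2, 2), (4.5, 4.5)]),
    ((5, 0, 10, 5), [(5.2, 4.4), (9, 1)]),
    ((0, 5, 5, 10), [(1, 9)]),
    ((5, 5, 10, 10), [(9, 9)]),
]
near_border, read_border = knn(parts, (4.8, 4.6), 3)
interior, read_interior = knn(parts, (1, 1), 2)
```

A query near a partition border needs the second iteration, because a closer point sits in the neighbouring partition. A query deep inside a partition is answered after the first iteration.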

  31. Spatial Join: Partition – Join; Join Directly

  32. Spatial Join: Partition – Join; Join Directly. Total of 36 overlapping pairs; only 16 overlapping pairs
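
To illustrate how the number of overlapping partition pairs depends on how two files are partitioned, here is a small sketch. It assumes one file indexed with a 4x4 grid and the other with a 3x3 grid over the same space; this is a hypothetical setup chosen for illustration, not necessarily the one drawn on the slide.

```python
from itertools import product

def grid(n, size=4.0):
    """Partition MBRs of an n x n uniform grid over [0, size] x [0, size]."""
    step = size / n
    return [(i * step, j * step, (i + 1) * step, (j + 1) * step)
            for i, j in product(range(n), repeat=2)]

def interiors_overlap(a, b):
    """True if rectangles a and b share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def overlapping_pairs(parts_a, parts_b):
    """All pairs of partitions, one from each file, that must be joined together."""
    return [(a, b) for a, b in product(parts_a, parts_b) if interiors_overlap(a, b)]

# Differently partitioned files: each 4x4 cell overlaps several 3x3 cells.
direct = overlapping_pairs(grid(4), grid(3))
# After repartitioning the second file to match the first: one-to-one pairs.
aligned = overlapping_pairs(grid(4), grid(4))
```

Under these assumptions, joining the two indexes as they are yields 36 partition pairs, while repartitioning one file to match the other leaves 16.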

  33. CG_Hadoop: Polygon Union, Skyline, Convex Hull, Farthest/closest pair, Delaunay Triangulation, Voronoi Diagram. [Chart: speedup of 1x on a single machine, 29x on Hadoop, 260x on SpatialHadoop.] A. Eldawy, Y. Li, M. F. Mokbel, R. Janardan. “CG_Hadoop: Computational Geometry in MapReduce”, ACM SIGSPATIAL’13

  34. Convex Hull: Find the minimal convex polygon that contains all points. [Figure: input points and output hull.]

  35. Convex Hull in CG_Hadoop (Hadoop, SpatialHadoop): Partition; Pruning; Local hull; Global hull
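
The partition, local hull, and global hull steps can be sketched on a single machine as below. The pruning step is omitted, the round-robin partitioning is a stand-in for real partitioning, and Andrew's monotone chain is used as the hull algorithm; none of these choices is confirmed by the slides.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def partitioned_hull(points, n_parts):
    """Partition the input, compute one local hull per partition, then merge."""
    parts = [points[i::n_parts] for i in range(n_parts)]  # round-robin stand-in
    local_hulls = [convex_hull(part) for part in parts]   # "map" side
    merged = [p for hull in local_hulls for p in hull]
    return convex_hull(merged)                             # "reduce" side

points = [(x, y) for x in range(5) for y in range(5)]
hull = partitioned_hull(points, 4)
```

The result matches a hull computed over all points at once, because every vertex of the global hull is also a vertex of the local hull of whichever partition holds it.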

  36. Advanced Analytics (ongoing work): Partitioning; Local VD; Pruning; Vertical Merge; Pruning; Horizontal Merge; Final output

  37. Applications (section divider; repeats the SpatialHadoop Architecture diagram of slide 17)

  38. SHAHED: A system for querying and visualizing spatio-temporal satellite data. http://shahed.cs.umn.edu/ Visualize animated heat maps or still images; run spatio-temporal selection and aggregate queries. A. Eldawy et al. “SHAHED: A MapReduce-based System for Querying and Visualizing Spatio-temporal Satellite Data”, IEEE ICDE’15 (Best poster runner-up). A. Eldawy et al. “A Demonstration of SHAHED: A MapReduce-based System for Querying and Visualizing Satellite Data”, IEEE ICDE’15

  39. TAREEG: Web-based extractor for OpenStreetMap data using MapReduce. http://tareeg.net/ L. Alarabi, A. Eldawy, R. Alghamdi, M. F. Mokbel. “TAREEG: A MapReduce-Based System for Extracting Spatial Data from OpenStreetMap”, ACM SIGSPATIAL’14. ___ “TAREEG: A MapReduce-Based Web Service for Extracting Spatial Data from OpenStreetMap”, SIGMOD’14

  40. Agenda: The ecosystem of SpatialHadoop; Motivation; Internal system design; Applications; Related work; Performance results; Other research projects; Future work

  41. Other Big Spatial Data Systems: Parallel ESRI Tools for Hadoop. SpatialHadoop is the only extensible system that can be easily expanded by researchers and developers. A. Eldawy and M. Mokbel. “The Era of Big Spatial Data: A Survey”, Foundations and Trends in Databases 2016

  42. Performance Results. [Charts: (a) Throughput of Range Query, Hadoop vs. SpatialHadoop, annotated 500X; (b) Spatial Join, running time (sec) with different indexes, including Hilbert, K-d, and Quad; (c) Speedup of CG_Hadoop, annotated 260X, for Union, Voronoi, Skyline, Convex Hull, Closest Pair, and Farthest Pair; (d) Visualization Speedup, annotated 48X, Baseline vs. HadoopViz.]

  43. Agenda: The ecosystem of SpatialHadoop; Motivation; System design; Applications; Related work; Performance results; Future directions
