
GiST Scan Acceleration Using Coprocessors



  1. Introduction GiST Hardware Abstraction Layer Evaluation Summary
     GiST Scan Acceleration Using Coprocessors
     Felix Beier, Torsten Kilias, Kai-Uwe Sattler
     Ilmenau University of Technology
     05/21/2012
     1 / 18

  2. Outline
     1. Introduction
     2. GiST Hardware Abstraction Layer
     3. Evaluation
     4. Summary

  3. Co-processing Index Searches
     - Ray tracing: many independent point queries (Source: [1])
     - Collision detection (spatial join): many independent range queries (Source: [5])
     - Utilization of massive parallelism offered by modern coprocessors
     → Special index structures carefully tuned for specific hardware

  4. Index Frameworks
     - Various applications require specialized index structures
     - Scientific data
       - Enormous data volumes
       - Unknown data characteristics
     - Costly prototyping
     - Gap: scientists vs. system developers
     → Rapid index development with frameworks like GiST
     Sources: [3], [2]

  5. GiST - Generalized Index Search Tree
     - Framework for implementation of height-balanced search trees
     - Implements common tree operations (insertions, deletions, node splits, height-balancing)
     - Developer specifies key data type and type-specific operations
     - Lookup predicate returns
       - false: entry cannot be found in child subtree
       - true: entry may be found in child subtree
     - Example: R-tree
       - Key type: minimal bounding rectangles
       - Predicate: rectangle intersection test
     [Figure: internal nodes (directory) with entries key1, key2, ... above leaf nodes (linked list). Source: [4]]
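
The R-tree example on this slide can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the `Box` type, the function names, and the sample data are assumptions made for the example (the name `consistent` follows the terminology of the GiST paper [4]).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned minimal bounding rectangle in 3-D (hypothetical type)."""
    lo: Tuple[float, float, float]
    hi: Tuple[float, float, float]


def consistent(key: Box, query: Box) -> bool:
    """Lookup predicate as a rectangle intersection test.

    False: no entry below this key can match the query.
    True:  an entry below this key may match the query.
    """
    return all(
        k_lo <= q_hi and q_lo <= k_hi
        for k_lo, k_hi, q_lo, q_hi in zip(key.lo, key.hi, query.lo, query.hi)
    )


def scan_node(keys: List[Box], query: Box) -> List[int]:
    """Return the slot numbers whose keys may match the query."""
    return [slot for slot, key in enumerate(keys) if consistent(key, query)]


# Made-up node with three slots and one range query.
node = [
    Box((0, 0, 0), (1, 1, 1)),
    Box((2, 2, 2), (3, 3, 3)),
    Box((0.5, 0.5, 0.5), (2.5, 2.5, 2.5)),
]
query = Box((0.9, 0.9, 0.9), (1.5, 1.5, 1.5))
matches = scan_node(node, query)
```

Each predicate test in `scan_node` is independent of the others, which is the property the later slides rely on for parallel scans.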

  6. Motivation
     - Combining the best of both worlds
       - Extensibility of GiST framework
       - Performance improvements through co-processing
     - Challenges
       - Finding fine-grained parallel algorithms
       - Utilizing hardware capabilities
       - Out-of-core implementation
       - Consideration of co-processing overheads

  7. Framework Design
     - Application: applications issue a stream of query batches
     - Index Framework: iterator for matching leaf nodes
     - Control Layer: grouping queries to node batches for better locality
     - Execution Layer
       - Specialized scan implementation for various (co)processors
       - Automatic scheduling to best execution unit
     - Memory Layer: out-of-core implementation
     [Figure: layer diagram. Query stream and matching nodes between application and index framework; node buffer, result sets and scheduler in the control layer; worker threads 1 ... T matching inner nodes and leaf nodes on CPUs, GPUs, ... in the execution layer; buffer pool in the memory layer.]
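
The control-layer idea of grouping queries into node batches can be illustrated with a toy traversal. This is a sketch under stated assumptions, not the framework's implementation: the dictionary-based tree layout, the 1-D interval keys, and the names `TREE`, `LEAVES`, `process_batch` and `run` are all hypothetical.

```python
from collections import defaultdict

# Toy tree (hypothetical layout): node id -> list of (key, child) entries.
# Keys are 1-D intervals to keep the sketch short; leaf children are record ids.
TREE = {
    "root": [((0, 9), "N1"), ((10, 19), "N2")],
    "N1": [((0, 4), "r1"), ((5, 9), "r2")],
    "N2": [((10, 14), "r3"), ((15, 19), "r4")],
}
LEAVES = {"N1", "N2"}


def overlaps(key, query):
    """Interval intersection test, standing in for the lookup predicate."""
    return key[0] <= query[1] and query[0] <= key[1]


def process_batch(node_id, queries):
    """Scan one node for a whole batch of queries; yield (child, query) matches."""
    for key, child in TREE[node_id]:
        for q in queries:
            if overlaps(key, q):
                yield child, q


def run(queries):
    """Stream a query batch through the tree, regrouping by node on each level."""
    results = defaultdict(list)         # query -> matching record ids
    frontier = {"root": list(queries)}  # node id -> batch of queries
    while frontier:
        next_frontier = defaultdict(list)
        for node_id, batch in frontier.items():
            for child, q in process_batch(node_id, batch):
                if node_id in LEAVES:
                    results[q].append(child)
                else:
                    next_frontier[child].append(q)  # regroup by child node
        frontier = next_frontier
    return dict(results)


answers = run([(3, 6), (8, 11), (30, 40)])
```

Because every node is loaded once per batch rather than once per query, all queries that reach the same node share that node's data, which is the locality benefit named on the slide.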

  8. Scan Parallelization
     - Inter-Node Parallelization
       - Independent node batches
       - Pipelining
     - Intra-Node Parallelization
       - Independent predicate tests
       - SIMD features
     [Figure: query queues attached to the nodes of tree layers 0 (root), 1 and 2; within a node (node i, node j, node k), queries are tested against the child-node entries N1 ... Nm, up to the maximum number of matching child nodes.]

  9. GPU Implementation - Processing Model
     - Nvidia CUDA implementation
     - Multiple cores per GPU device
       - Execution of independent subtasks without synchronization
       - Subtask = (node, query batch) pair
     - Multiple thread processors per GPU core
       - Execution of data parallel instructions by separate threads
       - Synchronization possible
       - Instruction = predicate test
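
The two-level decomposition on this slide can be emulated on the CPU to make the index arithmetic concrete. The sketch below is plain Python, not CUDA, and it is not the authors' kernel: a thread pool stands in for the independent GPU cores, a flat index `tid` stands in for a GPU thread id, and the 1-D interval predicate and sample data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def intersects(key, query):
    """Predicate test on 1-D intervals (stand-in for the R-tree rectangle test)."""
    return key[0] <= query[1] and query[0] <= key[1]


def scan_subtask(subtask):
    """One subtask = (node keys, query batch).

    Every (slot, query) cell is an independent predicate test. On a GPU each
    cell would be handled by its own thread; here a loop walks the flat ids.
    """
    keys, queries = subtask
    n_slots, n_queries = len(keys), len(queries)
    hits = [[False] * n_queries for _ in range(n_slots)]
    for tid in range(n_slots * n_queries):
        slot, q = divmod(tid, n_queries)  # flat thread id -> (slot, query)
        hits[slot][q] = intersects(keys[slot], queries[q])
    return hits


# Two made-up subtasks; they share no state, so they need no synchronization.
subtasks = [
    ([(0, 4), (5, 9)], [(3, 6), (8, 11)]),
    ([(10, 14), (15, 19)], [(8, 11)]),
]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(scan_subtask, subtasks))
```

The result of each subtask is a slots-by-queries matrix of booleans; only the cells inside one subtask would ever need to synchronize, for example to compact the matches.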

  10. GPU Implementation - Memory Hierarchy
      - Global main memory on device
        - Explicit transfer from host memory via PCIe bus
        - Caching of node and query data to avoid transfer overhead
        - Scan preparation phase to determine input data offsets
      - Shared memory on each die
        - Two orders of magnitude faster than global memory
        - Software-controlled cache for scan data
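
The slide does not say how the scan preparation phase determines input data offsets. One common way to do this, shown here purely as an assumption, is an exclusive prefix sum over the batch sizes so that all batches can be packed into one flat transfer buffer. The names `input_offsets`, `batches` and `flat` are hypothetical.

```python
from itertools import accumulate


def input_offsets(sizes):
    """Exclusive prefix sum: start offset of each batch in a flat buffer."""
    if not sizes:
        return []
    return [0] + list(accumulate(sizes))[:-1]


# Made-up query batches for three subtasks.
batches = [["q1", "q2", "q3"], ["q4"], ["q5", "q6", "q7", "q8"]]
sizes = [len(b) for b in batches]
offsets = input_offsets(sizes)

# One contiguous buffer means one host-to-device copy instead of one per batch.
flat = [q for batch in batches for q in batch]

# Subtask i reads flat[offsets[i] : offsets[i] + sizes[i]].
third = flat[offsets[2]:offsets[2] + sizes[2]]
```

With offsets known up front, each subtask can locate its slice of the buffer without coordinating with the others.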

  11. Setup
      - 3-D R-tree implementation
      - Generated index nodes
      - Intel Xeon X5690 CPU
      - Nvidia Tesla C2050 GPU
      → Where shall a scan be executed?
      → How can the performance be improved with hybrid processing?

  12. Tree Parameters
      - Generated index nodes
      - Generated predicates
      - Full processor utilization
      - Overheads included for GPU measurements
      [Figure: surface plot of speedup = CPU time / GPU time (tick labels from 0.4 to 1.8) over the number of slots and the number of queries per task.]

  13. Workload Simulation
      - CPU for small batches
      - GPU for large batches
      - How do batch sizes change when queries are streamed through the tree?
      - Simulation for full R-tree
        - 96 slots per node
        - 5 layers → 8 billion indexed entries
        - 10,000 root queries
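
The figure of 8 billion indexed entries follows from the tree parameters: a full tree with 96 slots per node and 5 layers indexes 96^5 entries. The sketch below checks that arithmetic and adds a batch-size scheduling rule in the spirit of the first two bullets. The `threshold` value is a made-up placeholder, not a number from the talk, and `choose_unit` is a hypothetical name.

```python
SLOTS_PER_NODE = 96
LAYERS = 5

# Full tree: every slot is used on every layer.
indexed_entries = SLOTS_PER_NODE ** LAYERS


def choose_unit(batch_size, threshold=64):
    """Hypothetical scheduling rule: small batches on the CPU, large on the GPU.

    The threshold is a placeholder; a real break-even point would have to be
    measured on the target hardware.
    """
    return "GPU" if batch_size >= threshold else "CPU"


small = choose_unit(8)
large = choose_unit(512)
```

Here `indexed_entries` evaluates to 8,153,726,976, which the slide rounds to 8 billion.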

  14. Workload Simulation - Parameter Correlation
      [Figure: "Runtime Improvement" plot for the index scan (96 slots, 25% selectivity, 10000 root queries, max batch size 512). x-axis: correlation; y-axes: relative runtime and runtime improvement. Series: CPU only, GPU only, real, ideal, ideal vs CPU, ideal vs GPU, ideal vs real, real vs CPU, real vs GPU.]

  15. Workload Simulation - Parameter Selectivity
      [Figure: "Runtime Improvement" plot for the index scan (96 slots, 25% duplicates, 10000 root queries, max batch size 512). x-axis: selectivity; y-axes: relative runtime and runtime improvement. Series: CPU only, GPU only, real, ideal, ideal vs CPU, ideal vs GPU, ideal vs real, real vs CPU, real vs GPU.]

  16. Conclusion & Outlook
      - Conclusion
        - Extended GiST with hardware abstraction layer
        - Performance improvements are possible
        - Overheads are not negligible!
      - Next steps
        - Prototype improvements
        - Specialization for other tree types
        - Full support for all GiST operations

  17. References
      [1] http://en.wikipedia.org/wiki/File:Ray_trace_diagram.svg
      [2] Science and Technology Review, June 2007. "Virtual Dams Subjected to Strong Earthquakes".
      [3] Chourasia, A., Olsen, K., Cui, Y., Lee, K., Zhou, J., Ely, G., Small, P., Roten, D., Day, S., Maechling, P., Jordan, T., Panda, D. K., and Levesque, J. Ground motion visualization of M8 earthquake simulation using height field. In SciDAC (2011). Available at http://www.mcs.anl.gov/uploads/cels/papers/scidac11/
      [4] Hellerstein, J. M., Naughton, J. F., and Pfeffer, A. Generalized Search Trees for Database Systems. In VLDB (1995).
      [5] Kavan, L., and Zara, J. Fast Collision Detection for Skeletally Deformable Models. Computer Graphics Forum 24, 3 (2005), 363–372.

  18. Discussion
      Thanks for your attention! Questions?
