
Main Memory Adaptive Indexing for Multi-core Systems

SIGMOD DaMoN, 23.06.2014
Felix Martin Schuhknecht, Victor Alvarez, Jens Dittrich, Stefan Richter
Information Systems Group, Saarland University
https://infosys.uni-saarland.de/

Problem:


  1. Single-threaded algorithms. [Figure: accumulated query response time [s] over the query sequence (1 to 10000, log scale); 1 thread, 100 million elements. Methods: Standard Cracking (SC), Hybrid Crack Sort (HCS), Coarse-granular Index (CGI), Radix Sort (RS), STL std::sort (STL-S). Annotations: "750 Queries" and "> 10000 Queries".] [The Uncracked Pieces in Database Cracking. F. M. Schuhknecht, A. Jindal, J. Dittrich. In PVLDB 2013]
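For readers who want a concrete picture of the Standard Cracking (SC) baseline named on this slide, here is a minimal single-threaded sketch. It is not the authors' implementation: the class name `CrackerColumn`, the two sorted lists standing in for the cracker index, and the half-open query range `[low, high)` are all assumptions made for illustration.

```python
import bisect

class CrackerColumn:
    """Hypothetical sketch: each range query cracks the column at its bounds."""

    def __init__(self, values):
        self.data = list(values)   # cracker column (a copy of the base column)
        self.pivots = []           # sorted crack values (stand-in for the cracker index)
        self.positions = []        # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Make sure a crack exists at `pivot`; return the first index with value >= pivot."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]          # already cracked here, nothing to reorganize
        # The piece that contains `pivot` lies between the neighbouring cracks.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        # Crack-in-two: move values < pivot to the left part of the piece, in place.
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
first = column.range_query(5, 13)    # cracks at 5 and 13
second = column.range_query(2, 8)    # reuses earlier cracks, adds cracks at 2 and 8
```

Each query only reorganizes the pieces that contain its bounds, so the column becomes more ordered as a side effect of query processing.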

  4. Multi-threaded environments? Multi-threaded algorithms!

  5. Multi-threaded algorithms: Parallel Standard Cracking (P-SC). [Figure: queries Q1 and Q2 run on threads T1 and T2; a table of requested locks, read (R) and write (W), per query; requests are marked as granted (✓) or conflicting (⚡).] Inter-query parallelism. Lock contention. Underutilizes resources (T3, T4, T5, ...). [Concurrency control for adaptive indexing. G. Graefe, F. Halim, S. Idreos, H. Kuno, S. Manegold. In PVLDB 2013]
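The lock table on this slide can be read as a compatibility check between the requests of two concurrent queries. The sketch below is only an illustration of that check: the piece numbering, the function name `conflicting_pieces`, and the assumption that a range query write-locks the two boundary pieces it cracks and read-locks the pieces in between are not taken from the slides.

```python
def conflicting_pieces(requests_q1, requests_q2):
    """Return the pieces on which two queries cannot proceed concurrently.

    Each argument maps a piece id to the requested lock mode, "R" or "W".
    Two requests on the same piece are compatible only if both are reads.
    """
    shared = requests_q1.keys() & requests_q2.keys()
    return sorted(
        piece for piece in shared
        if not (requests_q1[piece] == "R" and requests_q2[piece] == "R")
    )


# Hypothetical request sets: Q1 spans pieces 0..3, Q2 spans pieces 2..4.
q1 = {0: "W", 1: "R", 2: "R", 3: "W"}
q2 = {2: "W", 3: "R", 4: "W"}

blocked = conflicting_pieces(q1, q2)
print(blocked)  # [2, 3]: each piece has one reader and one writer
```

With only two queries in flight, any further threads (T3, T4, T5, ...) have nothing to do, which is the underutilization the slide points out.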

  14. Multi-threaded algorithms: Parallel-chunked Standard Cracking (P-CSC). [Figure: the column is split into k chunks; each chunk has its own cracker index and its own thread (T1, T2, T3, ..., Tk); the query is sent to every chunk and each thread produces a local result.] Complete independence. Fully utilize resources. No consecutive result.
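The chunked scheme can be sketched as follows. This is a simplified illustration rather than the presented implementation: `Chunk`, `make_chunks`, and `p_csc_query` are hypothetical names, the partitioning step uses temporary lists instead of an in-place crack for brevity, and CPython threads only show the structure (the interpreter lock prevents real parallel speedup).

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

class Chunk:
    """One chunk of the column with its own private cracker index."""

    def __init__(self, values):
        self.data = list(values)
        self.pivots, self.positions = [], []

    def crack(self, pivot):
        i = bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        # Not in place: rebuilt from two temporary lists to keep the sketch short.
        self.data[lo:hi] = smaller + [v for v in piece if v >= pivot]
        self.pivots.insert(i, pivot)
        self.positions.insert(i, lo + len(smaller))
        return lo + len(smaller)

    def query(self, low, high):
        """Values v with low <= v < high inside this chunk (assumes low <= high)."""
        start = self.crack(low)
        end = self.crack(high)
        return self.data[start:end]


def make_chunks(column, k):
    """Split a non-empty column into at most k contiguous chunks."""
    size = -(-len(column) // k)  # ceiling division
    return [Chunk(column[i:i + size]) for i in range(0, len(column), size)]


def p_csc_query(chunks, low, high, pool):
    """Every chunk answers the same query; no locks, since no state is shared."""
    return list(pool.map(lambda chunk: chunk.query(low, high), chunks))


column = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11]
chunks = make_chunks(column, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    local_results = p_csc_query(chunks, 5, 13, pool)
```

The answer arrives as k separate local results instead of one contiguous range, which corresponds to the "no consecutive result" drawback on the slide.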

  25. Micro Benchmark: reading 1% from k locations using one thread. [Figure: time [s] over the number of chunks (k), 1 to 1000000, log scale.] No problem for realistic k.

  27. Multi-threaded algorithms: Parallel Coarse-Granular Index (P-CGI). [Figure: column A and the index on A, Index(A), with a cracker index; the query takes read (R) and write (W) locks.] 1. Range-partition while copying (1024 partitions). 2. Perform P-SC. Like starting after 1000 cracks. Reduces lock contention of P-SC. Reduces variance. Adds (small) initialization time. How to do this?

  37. Multi-threaded algorithms: Parallel Coarse-Granular Index (P-CGI): Parallel Range Partitioning. [Figure: source A and destination B; # elements: n; # threads: k; regions labelled t1, t2, ..., tk are assigned to Thread 1, Thread 2, ..., Thread k in both source and destination.] Range-partition: 1. Build histogram. 2. Copy entries. No locks required. Fully utilize resources. NUMA-fragmented memory.
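The two steps on this slide (build histogram, copy entries) can be sketched as below. This is one common way to partition without locks, written under several assumptions that the slides do not state: integer keys in a known range `[key_min, key_max]`, equal-width partitions, and the hypothetical function name `parallel_range_partition`. In CPython the threads illustrate the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_range_partition(src, num_partitions, num_threads, key_min, key_max):
    """Range-partition src into a new array; returns (dst, partition start offsets)."""
    n = len(src)
    width = -(-(key_max - key_min + 1) // num_partitions)   # keys per partition (ceiling)
    bounds = [n * t // num_threads for t in range(num_threads + 1)]
    dst = [None] * n

    def partition_of(v):
        return (v - key_min) // width

    def build_histogram(t):
        # Step 1: each thread counts, per partition, the entries in its source slice.
        counts = [0] * num_partitions
        for v in src[bounds[t]:bounds[t + 1]]:
            counts[partition_of(v)] += 1
        return counts

    def copy_entries(t, write_pos):
        # Step 2: each thread copies its slice into destination slots reserved for it.
        for v in src[bounds[t]:bounds[t + 1]]:
            p = partition_of(v)
            dst[write_pos[p]] = v
            write_pos[p] += 1

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        histograms = list(pool.map(build_histogram, range(num_threads)))
        # Prefix sum over (partition, thread) gives every thread a private write
        # region inside every partition, so the copy phase needs no locks.
        write_pos = [[0] * num_partitions for _ in range(num_threads)]
        starts, pos = [], 0
        for p in range(num_partitions):
            starts.append(pos)
            for t in range(num_threads):
                write_pos[t][p] = pos
                pos += histograms[t][p]
        list(pool.map(copy_entries, range(num_threads), write_pos))
    return dst, starts


src = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3]
dst, starts = parallel_range_partition(src, num_partitions=4, num_threads=3,
                                       key_min=0, key_max=19)
print(dst)     # [4, 2, 1, 3, 9, 7, 13, 12, 16, 19]
print(starts)  # [0, 4, 6, 8]
```

Because every thread writes into all partitions of the destination, its writes are spread across the whole array, which is one plausible reading of the "NUMA-fragmented memory" remark.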

  45. Multi-threaded algorithms: Parallel-chunked Coarse-Granular Index (P-CCGI). [Figure: k chunks, each with its own thread (T1, T2, T3, ..., Tk) and cracker index; each chunk is range-partitioned; the query is sent to every chunk and each thread produces a local result.] P-CCGI = P-CSC + Range Partitioning.

  51. Multi-threaded algorithms: Parallel Range-Partitioned Radix Sort (P-RPRS). [Figure: column A and Index(A); annotation: "shared with P-CCGI".] 1. Range-partition while copying (1024 partitions). 2. Perform in-place radix sort on each partition. Result: fully sorted.
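A small sketch of the two P-RPRS steps follows. It rests on assumptions not found in the slides: non-negative integer keys of `key_bits` bits, a power-of-two number of partitions chosen by the top key bits, a sequential partitioning step for brevity, and a binary radix exchange sort as a simple stand-in for whichever in-place radix sort the authors use. The names `radix_exchange_sort` and `p_rprs` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def radix_exchange_sort(a, lo, hi, bit):
    """Sort a[lo:hi] in place on bits bit..0 (binary MSD radix sort)."""
    if hi - lo < 2 or bit < 0:
        return
    mask = 1 << bit
    i, j = lo, hi - 1
    while i <= j:
        if a[i] & mask:
            a[i], a[j] = a[j], a[i]   # bit set: move the value to the right part
            j -= 1
        else:
            i += 1
    radix_exchange_sort(a, lo, i, bit - 1)
    radix_exchange_sort(a, i, hi, bit - 1)


def p_rprs(src, num_partitions, key_bits, pool):
    """Range-partition while copying, then radix-sort every partition independently."""
    shift = key_bits - (num_partitions.bit_length() - 1)   # top bits select the partition
    # Step 1: range-partition while copying (sequential here to keep the sketch short).
    counts = [0] * num_partitions
    for v in src:
        counts[v >> shift] += 1
    starts = [0] * (num_partitions + 1)
    for p in range(num_partitions):
        starts[p + 1] = starts[p] + counts[p]
    cursor = starts[:-1]
    dst = [0] * len(src)
    for v in src:
        p = v >> shift
        dst[cursor[p]] = v
        cursor[p] += 1
    # Step 2: partitions cover disjoint ranges of dst, so they can be sorted in
    # parallel without locks; only the lower `shift` bits remain to be sorted.
    list(pool.map(
        lambda p: radix_exchange_sort(dst, starts[p], starts[p + 1], shift - 1),
        range(num_partitions),
    ))
    return dst


src = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3]
with ThreadPoolExecutor(max_workers=4) as pool:
    index = p_rprs(src, num_partitions=4, key_bits=5, pool=pool)
print(index)  # [1, 2, 3, 4, 7, 9, 12, 13, 16, 19]
```

Since the partitions are already ordered relative to each other, sorting each one yields a fully sorted index without any merge step.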
