Single-threaded algorithms
1 Thread, 100 Million Elements
[Figure: accumulated query response time [s] (axis 0 to 25) versus query sequence (ticks at 1, 10, 100, 1000, 10000) for Standard Cracking (SC), Hybrid Crack Sort (HCS), Coarse-granular Index (CGI), Radix Sort (RS), and STL std::sort (STL-S). Plot annotations: "750 Queries" and "> 10000 Queries".]
[The Uncracked Pieces in Database Cracking. F. M. Schuhknecht, A. Jindal, J. Dittrich. In PVLDB 2013]

Multi-threaded environments? Multi-threaded algorithms!

Multi-threaded algorithms: Parallel Standard Cracking (P-SC)
[Figure: queries Q1 and Q2, executed by threads T1 and T2, request read (R) and write (W) locks on the column; each requested lock is marked as granted (✓) or conflicting (⚡).]
Inter-query parallelism
Lock contention
Underutilize resources (T3, T4, T5, ...)
[Concurrency control for adaptive indexing. G. Graefe, F. Halim, S. Idreos, H. Kuno, S. Manegold. In PVLDB 2013]

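The lock contention named on this slide can be illustrated with a minimal Python sketch of standard cracking. It rests on loud assumptions: the class `CrackedColumn` and its methods are hypothetical names, and a single mutex per column stands in for the finer-grained read/write locks shown in the figure (Python's standard library has no reader-writer lock). It is not the implementation from the cited paper.

```python
import bisect
import threading


class CrackedColumn:
    """Standard cracking on one column, serialized by ONE lock (simplifying assumption)."""

    def __init__(self, values):
        self.data = list(values)  # copy of the column that queries reorganize
        self.keys = []            # sorted crack keys (the cracker index)
        self.pos = []             # pos[i]: first position holding a value >= keys[i]
        self.lock = threading.Lock()

    def _crack(self, key):
        """Reorder one piece so values < key precede values >= key; return the split."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pos[i]    # this key was already cracked by an earlier query
        lo = self.pos[i - 1] if i > 0 else 0
        hi = self.pos[i] if i < len(self.pos) else len(self.data)
        left, right = lo, hi - 1  # partition only the piece [lo, hi) containing key
        while left <= right:
            if self.data[left] < key:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.keys.insert(i, key)
        self.pos.insert(i, left)
        return left

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking as a side effect."""
        with self.lock:           # every other query waits here: lock contention
            start = self._crack(low)
            end = self._crack(high)
            return self.data[start:end]


column = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
col = CrackedColumn(column)
results = {}


def run(name, low, high):
    results[name] = sorted(col.range_query(low, high))


# Two queries on two threads (inter-query parallelism); the lock serializes them.
threads = [threading.Thread(target=run, args=("Q1", 5, 10)),
           threading.Thread(target=run, args=("Q2", 10, 15))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With only as many active threads as concurrent queries, any further threads (T3, T4, ...) stay idle in this scheme, which matches the slide's "underutilize resources" remark.
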
Multi-threaded algorithms: Parallel-chunked Standard Cracking (P-CSC)
[Figure: the column is divided into k chunks, each with its own cracker index. Threads T1, T2, T3, ..., Tk each answer the query on one chunk and produce a local result.]
Complete independence
Fully utilize resources
No consecutive result

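The chunked scheme can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: `Chunk`, `make_chunks`, and `parallel_chunked_query` are hypothetical names, the per-chunk cracking uses a simple out-of-place partition for brevity, and CPython threads show the structure rather than a real speedup.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


class Chunk:
    """One chunk of the column with its private cracker index."""

    def __init__(self, values):
        self.data = list(values)
        self.keys = []  # sorted crack keys of this chunk
        self.pos = []   # pos[i]: first position in data with a value >= keys[i]

    def crack(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pos[i]
        lo = self.pos[i - 1] if i > 0 else 0
        hi = self.pos[i] if i < len(self.pos) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < key]
        # Rewrite only the affected piece: smaller values first, the rest after.
        self.data[lo:hi] = smaller + [v for v in piece if v >= key]
        self.keys.insert(i, key)
        self.pos.insert(i, lo + len(smaller))
        return lo + len(smaller)

    def range_query(self, low, high):
        start = self.crack(low)
        end = self.crack(high)
        return self.data[start:end]


def make_chunks(column, k):
    size = max(1, -(-len(column) // k))  # ceil(len / k)
    return [Chunk(column[i:i + size]) for i in range(0, len(column), size)]


def parallel_chunked_query(chunks, low, high, pool):
    # Each chunk is touched by exactly one task per query, so no locks are needed.
    return list(pool.map(lambda c: c.range_query(low, high), chunks))


column = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
chunks = make_chunks(column, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    local_results = parallel_chunked_query(chunks, 5, 10, pool)
```

The answer arrives as one local result per chunk rather than as one contiguous region of the column, which is the "no consecutive result" drawback noted on the slide.
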
Micro Benchmark
Reading 1% from k locations using one thread
[Figure: time [s] (axis 0 to 10) versus number of chunks k (ticks from 1 to 1000000).]
No problem for realistic k

Multi-threaded algorithms: Parallel Coarse-Granular Index (P-CGI)
1. Range-partition column A into 1024 partitions while copying it into Index(A), the index on A
2. Perform P-SC on Index(A): queries take read (R) and write (W) locks and maintain the cracker index
Like starting after 1000 cracks
Reduces lock contention of P-SC
Reduces variance
Adds (small) initialization time
How to do this?

Multi-threaded algorithms: Parallel Coarse-Granular Index (P-CGI): Parallel Range Partitioning
# Elements: n, # Threads: k
[Figure: source column A, split across Thread 1, Thread 2, ..., Thread k, is range-partitioned into destination column B, whose regions are labeled t1, t2, ..., tk by writing thread.]
1. Build histogram
2. Copy entries
No locks required
Fully utilize resources
NUMA-fragmented memory

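The two steps on this slide can be sketched in Python. Everything below is an assumption for illustration: the function name `parallel_range_partition` is hypothetical, partitions are equal-width ranges over integer values, and the slides do not specify either choice. The point is that per-thread histograms plus prefix sums give every thread a private write region in each partition, so the copy step needs no locks.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_range_partition(source, num_partitions, num_threads):
    """Range-partition integer values into a new list; returns (destination, starts)."""
    n = len(source)
    lo, hi = min(source), max(source)
    span = hi - lo + 1

    def part(v):
        # Equal-width range partitioning (assumed scheme).
        return (v - lo) * num_partitions // span

    # Thread t owns the contiguous source slice [bounds[t], bounds[t + 1]).
    bounds = [n * t // num_threads for t in range(num_threads + 1)]

    def histogram(t):
        counts = [0] * num_partitions
        for v in source[bounds[t]:bounds[t + 1]]:
            counts[part(v)] += 1
        return counts

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Step 1: every thread builds a histogram over its own slice.
        hists = list(pool.map(histogram, range(num_threads)))

        # Prefix sums: offsets[t][p] is where thread t starts writing in partition p.
        offsets = [[0] * num_partitions for _ in range(num_threads)]
        starts = []
        cursor = 0
        for p in range(num_partitions):
            starts.append(cursor)
            for t in range(num_threads):
                offsets[t][p] = cursor
                cursor += hists[t][p]

        destination = [None] * n

        def copy(t):
            write = list(offsets[t])  # private write cursors of thread t
            for v in source[bounds[t]:bounds[t + 1]]:
                p = part(v)
                destination[write[p]] = v  # regions are disjoint: no lock needed
                write[p] += 1

        # Step 2: every thread copies its entries into its reserved regions.
        list(pool.map(copy, range(num_threads)))

    return destination, starts


source = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
destination, starts = parallel_range_partition(source, num_partitions=4, num_threads=3)
```

Here `starts[p]` is the first position of partition p in the destination, so the partition boundaries can seed a cracker index before the first query arrives.
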
Multi-threaded algorithms: Parallel-chunked Coarse-Granular Index (P-CCGI)
P-CSC + Range Partitioning
[Figure: k chunks, each range-partitioned and equipped with its own cracker index. Threads T1, T2, T3, ..., Tk each answer the query on one chunk and produce a local result.]

Multi-threaded algorithms: Parallel Range-Partitioned Radix Sort (P-RPRS)
1. Range-partition A into 1024 partitions while copying it into Index(A) (shared with P-CCGI)
2. Perform in-place radix sort on each partition
Result: Index(A) is fully sorted

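A compact Python sketch of the two steps follows. The names `range_partition`, `radix_sort`, and `range_partitioned_radix_sort` are hypothetical, values are assumed to be non-negative integers, the range partitioning is done sequentially for brevity, and an out-of-place LSD radix sort stands in for the in-place radix sort named on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def range_partition(source, num_partitions):
    """Step 1 (sequential here): copy values into equal-width range partitions."""
    lo, hi = min(source), max(source)
    span = hi - lo + 1
    parts = [[] for _ in range(num_partitions)]
    for v in source:
        parts[(v - lo) * num_partitions // span].append(v)
    return parts


def radix_sort(values, bits=8):
    """LSD radix sort for non-negative ints (out-of-place stand-in, see lead-in)."""
    if not values:
        return []
    mask = (1 << bits) - 1
    top = max(values)
    shift = 0
    while (top >> shift) > 0:
        buckets = [[] for _ in range(1 << bits)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)  # stable per-digit distribution
        values = [v for bucket in buckets for v in bucket]
        shift += bits
    return values


def range_partitioned_radix_sort(source, num_partitions, num_threads):
    parts = range_partition(source, num_partitions)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Step 2: partitions are independent, so each can be sorted by its own task.
        sorted_parts = list(pool.map(radix_sort, parts))
    # Partitions are already ordered by range, so concatenation is fully sorted.
    return [v for part in sorted_parts for v in part]


source = [513, 16, 4, 9, 2, 300, 7, 1, 1019, 3, 14, 770, 8, 6]
index = range_partitioned_radix_sort(source, num_partitions=4, num_threads=4)
```

Because the partitions cover disjoint, ordered value ranges, no merge step is needed after the per-partition sorts.
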