Rethinking SIMD Vectorization for In-Memory Databases (presented by Sri Harshal Parimi)


  1. Rethinking SIMD Vectorization for In-Memory Databases Sri Harshal Parimi

2. Motivation
- Need for fast analytical query execution in systems where the database is mostly resident in main memory.
- Architectures with SIMD capabilities, such as MIC (Many Integrated Core), use a large number of low-powered cores with advanced instruction sets and larger registers.

3. SIMD (Single Instruction, Multiple Data)
- Multiple processing elements perform the same operation on multiple data points simultaneously.

4. Vectorization
- A program that performs operations on vectors (1-D arrays):

X + Y = Z
(x1, x2, ..., xn) + (y1, y2, ..., yn) = (x1 + y1, x2 + y2, ..., xn + yn)

for (i = 0; i < n; i++) {
    Z[i] = X[i] + Y[i];
}
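A minimal compilable version of the loop above (the function name `vec_add` is illustrative). With optimization enabled (e.g. `-O3` in GCC or Clang), compilers typically auto-vectorize this loop into SIMD instructions:

```c
/* Element-wise vector addition: Z[i] = X[i] + Y[i].
 * `restrict` promises the compiler the arrays do not alias,
 * which is what allows it to emit SIMD code for the loop. */
static void vec_add(const int *restrict x, const int *restrict y,
                    int *restrict z, int n) {
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}
```

Adding (8, 7, 6, 5, 4, 3, 2, 1) and (1, 1, 1, 1, 1, 1, 1, 1) with `vec_add` sums every lane in one pass, matching the example on the next slide.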

5. Vectorization (Example)
[Diagram: two 128-bit SIMD registers with eight lanes each, X = (8, 7, 6, 5, 4, 3, 2, 1) and Y = (1, 1, 1, 1, 1, 1, 1, 1); a single SIMD ADD produces Z = (9, 8, 7, 6, 5, 4, 3, 2).]

6. Advantages of Vectorization
- Full vectorization: from O(f(n)) scalar operations to O(f(n)/W) vector operations, where W is the length of the vector (lanes per register).
- Reuse fundamental operations across multiple vectorized operators.
- Vectorize basic database operators:
  - Selection scans
  - Hash tables
  - Partitioning

7. Fundamental Operations
- Selective Load
- Selective Store
- Selective Gather
- Selective Scatter

8. Selective Load / Selective Store
[Diagram: Selective Load reads consecutive values from memory into only the vector lanes selected by the mask, leaving the other lanes unchanged; Selective Store writes only the masked lanes of a vector to consecutive memory locations. E.g. with vector (A, B, C, D), memory (U, V, W, X, Y) and mask (0, 1, 0, 1): selective load yields (A, U, C, V); selective store writes (B, D), giving memory (B, D, W, X, Y).]

9. Selective Gather / Selective Scatter
[Diagram: Gather uses an index vector to load one value per lane from arbitrary memory locations, vec[i] = mem[index[i]]; Scatter writes each lane to the memory location named by its index, mem[index[i]] = vec[i]. E.g. with memory (U, V, W, X, Y, Z) and index vector (2, 1, 5, 3), gather yields (W, V, Z, X); scattering values (A, B, C, D) with the same indexes gives memory (U, B, A, D, Y, C).]
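The four primitives can be stated precisely as short scalar loops; real hardware maps each to one instruction (e.g. AVX-512 compress/expand stores and gather/scatter). A sketch with all names illustrative, using integer masks for clarity:

```c
/* Selective load: fill only the masked lanes of `vec` with
 * consecutive values from `mem`; other lanes keep their contents. */
static void selective_load(int *vec, const int *mem,
                           const int *mask, int lanes) {
    int j = 0;
    for (int i = 0; i < lanes; i++)
        if (mask[i]) vec[i] = mem[j++];
}

/* Selective store: write the masked lanes of `vec` to consecutive
 * memory locations; returns how many lanes were written. */
static int selective_store(const int *vec, int *mem,
                           const int *mask, int lanes) {
    int j = 0;
    for (int i = 0; i < lanes; i++)
        if (mask[i]) mem[j++] = vec[i];
    return j;
}

/* Gather: vec[i] = mem[index[i]] for every lane. */
static void gather(int *vec, const int *mem,
                   const int *index, int lanes) {
    for (int i = 0; i < lanes; i++)
        vec[i] = mem[index[i]];
}

/* Scatter: mem[index[i]] = vec[i] for every lane. */
static void scatter(const int *vec, int *mem,
                    const int *index, int lanes) {
    for (int i = 0; i < lanes; i++)
        mem[index[i]] = vec[i];
}
```

Note that scatter is only well-defined when the indexes within a vector are distinct; the partitioning slides later deal with exactly this conflict problem.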

10. Selection Scans

SELECT * FROM table WHERE key >= "O" AND key <= "U"

Scalar (Branching):
i = 0
for t in table:
    if (t.key >= "O" && t.key <= "U"):
        copy(t, output[i])
        i = i + 1

Scalar (Branchless):
i = 0
for t in table:
    key = t.key
    m = (key >= "O" ? 1 : 0) & (key <= "U" ? 1 : 0)
    copy(t, output[i])   # unconditional write
    i = i + m
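The branchless variant above always writes to the output slot and advances the cursor by a 0/1 predicate, so there is no data-dependent branch for the CPU to mispredict. A compilable sketch over integer keys (the function name and range parameters are illustrative):

```c
/* Branchless selection scan: copy every key in [lo, hi] to `out`.
 * The comparison result (0 or 1) is what advances the output
 * cursor; the write itself happens unconditionally. */
static int select_branchless(const int *keys, int n,
                             int lo, int hi, int *out) {
    int i = 0;
    for (int t = 0; t < n; t++) {
        int key = keys[t];
        out[i] = key;                       /* unconditional write */
        i += (key >= lo) & (key <= hi);     /* 0/1 predicate */
    }
    return i;                               /* number of matches */
}
```

Non-matching keys are simply overwritten by the next iteration, so `out` must have room for one extra speculative write beyond the match count.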

11. Selection Scans (Vectorized)

i = 0
for Vt in table:                                   # W tuples per step
    simdLoad(Vt.key, Vk)
    Vm = (Vk >= "O" ? 1 : 0) & (Vk <= "U" ? 1 : 0)  # SIMD compare
    if (Vm != false):
        simdStore(Vt, Vm, output[i])               # selective store
        i = i + |Vm != false|

[Diagram: key vector (J, O, Y, S, U, X) compared against the range yields mask (0, 1, 0, 1, 1, 0); of all offsets 0-5, the selective store compacts the matched offsets (1, 3, 4) to the output.]
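The vectorized scan combines a lane-wise compare with the selective store from slide 8. A scalar emulation sketch (W is the lane count; names illustrative, and n is assumed to be a multiple of W for brevity):

```c
#define W 4   /* lanes per vector, e.g. 4 x 32-bit keys in 128 bits */

/* Vectorized selection scan (scalar emulation): compare W keys at
 * once to build a mask, then compact the matching lanes to the
 * output exactly as a selective store would. */
static int select_vectorized(const int *keys, int n,
                             int lo, int hi, int *out) {
    int i = 0;
    for (int t = 0; t < n; t += W) {
        int mask[W];
        for (int l = 0; l < W; l++)              /* SIMD compare */
            mask[l] = (keys[t + l] >= lo) & (keys[t + l] <= hi);
        for (int l = 0; l < W; l++)              /* selective store */
            if (mask[l]) out[i++] = keys[t + l];
    }
    return i;
}
```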

  12. Performance Comparison: Selection Scans

13. Hash Tables – Probing (Scalar)
[Diagram: linear probing hash table with key and payload columns; the input key k1 is hashed to index h1, and buckets are checked one at a time (k9, k3, ...) until k1 is found.]
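A minimal scalar linear-probing table matching the slide (a sketch, not the paper's code; the `EMPTY` sentinel, table size, and function names are assumptions):

```c
#define EMPTY    (-1)
#define NBUCKETS 8        /* power of two, so hashing is a cheap mask */

typedef struct { int key; int payload; } bucket_t;

static int hash_key(int key) { return key & (NBUCKETS - 1); }

/* Build: place the key at the first free bucket from hash(key). */
static void build(bucket_t *table, int key, int payload) {
    int h = hash_key(key);
    while (table[h].key != EMPTY)
        h = (h + 1) & (NBUCKETS - 1);
    table[h].key = key;
    table[h].payload = payload;
}

/* Scalar probe: start at hash(key) and walk forward one bucket at
 * a time until the key or an empty bucket is found. */
static int probe(const bucket_t *table, int key) {
    int h = hash_key(key);
    while (table[h].key != EMPTY) {
        if (table[h].key == key)
            return table[h].payload;
        h = (h + 1) & (NBUCKETS - 1);   /* linear probing step */
    }
    return -1;   /* not found */
}
```

Each probe step depends on the previous one, which is exactly why the following slides vectorize across probes rather than within one.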

14. Hash Tables – Probing (Horizontal Vectorization)
[Diagram: bucketized linear probing hash table where each bucket holds W keys (k9, k3, k8, k1); the input key k1 is hashed to one bucket, and a single SIMD compare checks all keys in that bucket at once.]

15. Hash Tables – Probing (Vertical Vectorization)
[Diagram: W different input keys (k1, k2, k3, k4) are processed in parallel, one per lane; each lane hashes its key, gathers the bucket key at that index, and a SIMD compare yields a match mask (1, 0, 0, 1 here: k1 and k4 hit, k2 and k3 must keep probing).]

16. Hash Tables – Probing (Vertical Vectorization, Continued)
[Diagram: finished lanes are refilled with fresh input keys (k5, k6) via selective load, while the unmatched lanes (k2, k3) advance to the next bucket (h2+1, h3+1); all lanes then probe again in lockstep.]
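The two slides above can be pulled together in one scalar-emulated sketch: W probes stay in flight, each iteration gathers one bucket per lane, matched or missed lanes retire and are refilled with new keys, and unfinished lanes just step forward. All names are illustrative, and the table is assumed never full:

```c
#define W  4
#define NB 16
#define EMPTY (-1)

typedef struct { int key; int payload; } bucket_t;
static int hash_key(int k) { return k & (NB - 1); }

/* Vertically vectorized probe (scalar emulation of the SIMD flow):
 * lane_pos < 0 marks a free lane awaiting a selective load. */
static int probe_vertical(const bucket_t *table,
                          const int *keys, int n, int *payload_out) {
    int lane_key[W], lane_pos[W];
    int next = 0, active = 0, out = 0;

    for (int l = 0; l < W; l++) lane_pos[l] = -1;

    while (next < n || active > 0) {
        for (int l = 0; l < W; l++) {
            if (lane_pos[l] < 0) {              /* selective load */
                if (next >= n) continue;        /* no input left */
                lane_key[l] = keys[next++];
                lane_pos[l] = hash_key(lane_key[l]);
                active++;
            }
            int bkey = table[lane_pos[l]].key;  /* gather */
            if (bkey == lane_key[l]) {          /* hit: emit, retire */
                payload_out[out++] = table[lane_pos[l]].payload;
                lane_pos[l] = -1; active--;
            } else if (bkey == EMPTY) {         /* miss: retire */
                lane_pos[l] = -1; active--;
            } else {                            /* keep probing */
                lane_pos[l] = (lane_pos[l] + 1) & (NB - 1);
            }
        }
    }
    return out;   /* number of keys found */
}
```

The point of the structure is that no lane ever waits for another: divergent probe chains only cost extra iterations on the lanes that need them.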

  17. Performance Comparison: Hash Tables

18. Partitioning – Histogram
[Diagram: a SIMD radix computes the partition index for each key in the vector (k1 to h1, k2 to h2, k3 to h3, k4 to h4), then a SIMD add increments the corresponding histogram counters.]

19. Partitioning – Histogram (Continued)
[Diagram: the histogram is replicated, one copy per lane, so that a SIMD scatter can increment counters without conflicts when several lanes map to the same partition; the copies are merged afterwards.]

20. Joins
- No partitioning:
  - Build one shared hash table using atomics
  - Partially vectorized
- Min partitioning:
  - Partition the building table
  - Build a hash table per thread
  - Fully vectorized
- Max partitioning:
  - Partition both tables repeatedly
  - Build and probe cache-resident hash tables
  - Fully vectorized
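All three variants lean on the same radix-partitioning pass: histogram, prefix sum to compute partition start offsets, then scatter each key to its partition's cursor. A scalar sketch of one pass (names illustrative):

```c
#define NPART 4

/* One radix-partitioning pass: afterwards `out` holds the keys
 * grouped by partition (key & (NPART - 1)), preserving input
 * order within each partition. */
static void radix_partition(const int *keys, int n, int *out) {
    int hist[NPART] = {0}, off[NPART];

    for (int i = 0; i < n; i++)                 /* 1. histogram */
        hist[keys[i] & (NPART - 1)]++;

    for (int p = 0, sum = 0; p < NPART; p++) {  /* 2. prefix sum */
        off[p] = sum;
        sum += hist[p];
    }

    for (int i = 0; i < n; i++)                 /* 3. scatter */
        out[off[keys[i] & (NPART - 1)]++] = keys[i];
}
```

Min partitioning runs this once over the build side; max partitioning applies it repeatedly to both sides until each partition's hash table fits in cache.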

  21. Joins

22. Main Takeaways
- Vectorization is essential for OLAP queries
- Impact on hardware design
  - Improved power efficiency for analytical databases
- Impact on software design
  - Vectorization favors cache-conscious algorithms
  - Partitioned hash join >> non-partitioned hash join, if vectorized
- Vectorization is independent of other optimizations
  - Both buffered and unbuffered partitioning benefit from the vectorization speedup

23. Comparisons with Trill
- Trill uses a similar bit-mask technique for applying the filter clause during selections.
- While Trill targets a streaming query model, this paper offers algorithms that improve the throughput of database operators, which could also be extended to a streaming model by operating over buffered data.
- Trill uses dynamic HLL (high-level language) code generation to operate over columnar data; this paper instead relies on SIMD vectorization, using the hardware's instruction set to process many data points simultaneously with constant-time vector operations.

  24. Questions?
