Rethinking SIMD Vectorization for In-Memory Databases
Sri Harshal Parimi
Motivation
- Need for fast analytical query execution in systems where the database is mostly resident in main memory.
- Architectures with SIMD capabilities, such as MIC (Many Integrated Core), use a large number of low-powered cores with advanced instruction sets and wider registers.
SIMD (Single Instruction, Multiple Data)
- Multiple processing elements perform the same operation on multiple data points simultaneously.
Vectorization
- A program that performs operations on whole vectors (1-D arrays) at a time.

  X + Y = Z, where
  X = (x1, x2, ..., xn), Y = (y1, y2, ..., yn), Z = (x1 + y1, x2 + y2, ..., xn + yn)

    for (i = 0; i < n; i++) {
        Z[i] = X[i] + Y[i];
    }
Vectorization (Example)
- A 128-bit SIMD register holds 8 elements per vector: X = (8, 7, 6, 5, 4, 3, 2, 1) and Y = (1, 1, 1, 1, 1, 1, 1, 1).
- A single SIMD ADD produces Z = X + Y = (9, 8, 7, 6, 5, 4, 3, 2) in one instruction; a sketch with intrinsics follows below.
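A minimal sketch of this 8-lane addition using 128-bit SSE2 intrinsics, assuming 16-bit integer elements (8 lanes per register) to match the example; the function name vec_add and the array layout are illustrative, not taken from the paper.

    #include <emmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    void vec_add(const int16_t *x, const int16_t *y, int16_t *z, size_t n)
    {
        size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));      /* load 8 lanes of X */
            __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));      /* load 8 lanes of Y */
            _mm_storeu_si128((__m128i *)(z + i), _mm_add_epi16(vx, vy)); /* one SIMD ADD */
        }
        for (; i < n; i++)                                               /* scalar tail */
            z[i] = x[i] + y[i];
    }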
Advantages of Vectorization
- Full vectorization: from O(f(n)) scalar operations to O(f(n)/W) vector operations, where W is the length of the vector (number of lanes).
- Reuse of a few fundamental operations across many vectorized algorithms.
- Vectorize basic database operators:
  - Selection scans
  - Hash tables
  - Partitioning
Fundamental Operations
- Selective load
- Selective store
- Selective gather
- Selective scatter
Selective Load / Selective Store
- Selective load: vector = [A B C D], mask = [0 1 0 1], memory = [U V W X Y ...]. Consecutive memory values are loaded only into the lanes where the mask is 1, giving the result vector [A U C V].
- Selective store: vector = [A B C D], mask = [0 1 0 1], memory = [U V W X Y ...]. Only the lanes where the mask is 1 are written contiguously to memory, giving memory [B D W X Y ...].
Selective Gather / Selective Scatter
- Gather: index vector = [2 1 5 3], memory = [U V W X Y Z]. Each lane reads memory[index], giving the value vector [W V Z X].
- Scatter: value vector = [A B C D], index vector = [2 1 5 3], memory = [U V W X Y Z]. Each lane writes its value to memory[index], giving memory [U B A D Y C].
- A sketch of all four operations with intrinsics follows below.
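For concreteness, here is a hedged sketch of how the four fundamental operations map onto AVX-512 intrinsics for 32-bit elements; the mask value, array names, and wrapper function are hypothetical, and the paper's MIC target exposes analogous instructions.

    #include <immintrin.h>
    #include <stdint.h>

    void fundamental_ops(int32_t *mem, const int32_t *vals, const int32_t *indices)
    {
        __m512i vec   = _mm512_loadu_si512(vals);
        __m512i index = _mm512_loadu_si512(indices);
        __mmask16 m   = 0x5A5A;                        /* example lane mask */

        /* Selective load: lanes with mask = 1 receive consecutive values from mem,
           lanes with mask = 0 keep their old contents. */
        __m512i loaded = _mm512_mask_expandloadu_epi32(vec, m, mem);

        /* Selective store: lanes with mask = 1 are written contiguously to mem. */
        _mm512_mask_compressstoreu_epi32(mem, m, vec);

        /* Gather: lane i reads mem[index[i]]. */
        __m512i gathered = _mm512_i32gather_epi32(index, mem, 4);

        /* Scatter: lane i writes vec[i] to mem[index[i]]. */
        _mm512_i32scatter_epi32(mem, index, vec, 4);

        (void)loaded; (void)gathered;                  /* silence unused warnings */
    }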
Selection Scans

SELECT * FROM table WHERE key >= "O" AND key <= "U"

Scalar (branching):
    i = 0
    for t in table:
        if (t.key >= "O" && t.key <= "U"):
            copy(t, output[i])
            i = i + 1

Scalar (branchless):
    i = 0
    for t in table:
        key = t.key
        m = (key >= "O" ? 1 : 0) & (key <= "U" ? 1 : 0)
        copy(t, output[i])      # write unconditionally
        i = i + m               # advance the cursor only on a match
Selection Scans (Vectorized)

    i = 0
    for each vector of tuples V_t in table:
        simdLoad(V_t.key, V_k)
        V_m = (V_k >= "O" ? 1 : 0) & (V_k <= "U" ? 1 : 0)
        if (V_m != false):
            simdStore(V_t, V_m, output[i])
            i = i + |V_m != false|

Example: key vector = [J O Y S U X] (table IDs 1-6). The SIMD compare yields the mask [0 1 0 1 1 0]; applying the mask to the offsets [0 1 2 3 4 5] with a selective SIMD store leaves the matched offsets [1 3 4]. A sketch with intrinsics follows below.
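A sketch of the vectorized scan with AVX-512 intrinsics, assuming 32-bit keys and that only the qualifying keys (rather than full tuples) are written out; select_range, lo, hi, and out are illustrative names, not the paper's interface.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    size_t select_range(const int32_t *keys, size_t n,
                        int32_t lo, int32_t hi, int32_t *out)
    {
        __m512i vlo = _mm512_set1_epi32(lo);
        __m512i vhi = _mm512_set1_epi32(hi);
        size_t out_pos = 0, i = 0;
        for (; i + 16 <= n; i += 16) {
            __m512i k = _mm512_loadu_si512(keys + i);              /* load 16 keys */
            __mmask16 m_lo = _mm512_cmpge_epi32_mask(k, vlo);      /* key >= lo */
            __mmask16 m_hi = _mm512_cmple_epi32_mask(k, vhi);      /* key <= hi */
            __mmask16 m = (__mmask16)(m_lo & m_hi);                /* combined predicate */
            /* selective store: compress the qualifying keys into the output */
            _mm512_mask_compressstoreu_epi32(out + out_pos, m, k);
            out_pos += (size_t)_mm_popcnt_u32(m);                  /* advance by match count */
        }
        for (; i < n; i++)                                         /* scalar tail */
            if (keys[i] >= lo && keys[i] <= hi) out[out_pos++] = keys[i];
        return out_pos;
    }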
Performance Comparison: Selection Scans
Hash Tables – Probing (Scalar)
- Linear probing hash table (key, payload): the input key k1 is hashed to bucket index h1; the scalar code then compares one bucket key at a time (k9, k3, ...) and walks forward until it finds k1 or hits an empty bucket.
Hash Tables – Probing (Horizontal Vectorization)
- Linear probing bucketized hash table: each bucket stores several keys (e.g., K9, K3, K8, K1) with their payloads. A single input key k1 is hashed to bucket h1 and compared against all keys of that bucket with one SIMD compare.
Hash Tables – Probing (Vertical Vectorization)
- W input keys are probed at once: the key vector (K1, K2, K3, K4) is hashed to the index vector (H1, H2, H3, H4), the table keys at those indices are gathered (K1, K99, K88, K4), and a SIMD compare against the key vector produces the match mask (1, 0, 0, 1).
Hash Tables – Probing (Vertical Vectorization Continued)
- Matched lanes emit their payloads and are refilled with fresh input keys (K5, K6) via a selective load; unmatched lanes keep their keys (K2, K3), advance their hash indices (H2+1, H3+1), and continue probing in the next iteration. A sketch of one probe step follows below.
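One probe step of the vertical scheme, sketched with AVX-512 gather; the flat key array, the omission of wrap-around at the end of the table, and the names probe_step, table_keys, key_vec, hash_vec are assumptions for illustration only.

    #include <immintrin.h>
    #include <stdint.h>

    /* Gather the table keys at the current bucket indices, compare them with the
       probe keys, and advance the indices of the lanes that did not match
       (linear probing; wrap-around is omitted for brevity). */
    __mmask16 probe_step(const int32_t *table_keys, __m512i key_vec, __m512i *hash_vec)
    {
        __m512i gathered = _mm512_i32gather_epi32(*hash_vec, table_keys, 4); /* gather bucket keys */
        __mmask16 match  = _mm512_cmpeq_epi32_mask(gathered, key_vec);       /* SIMD compare */
        /* unmatched lanes move to the next bucket (index + 1) */
        *hash_vec = _mm512_mask_add_epi32(*hash_vec, (__mmask16)~match,
                                          *hash_vec, _mm512_set1_epi32(1));
        return match;
    }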
Performance Comparison: Hash Tables
Partitioning – Histogram
- The key vector (K1, K2, K3, K4) is mapped to partition indices (H1, H2, H3, H4) with a SIMD radix computation, and the histogram counters at those indices are incremented with a SIMD add.
Partitioning – Histogram (Continued)
- Replicated histogram: to avoid conflicts when several lanes map to the same partition, each vector lane updates its own copy of the histogram. The SIMD radix produces the index vector and a SIMD scatter increments the per-lane counters; the copies are summed afterwards. A sketch follows below.
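A sketch of the replicated-histogram build with AVX-512 gather/scatter, assuming 32-bit keys, radix partitioning on the low bits, and 16 interleaved histogram copies (one per lane); the function name, layout, and parameters are illustrative, not the paper's exact code.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* hist must have (1 << radix_bits) * 16 zero-initialized counters:
       bucket b of lane l lives at hist[b * 16 + l], so lanes never collide. */
    void build_histogram(const int32_t *keys, size_t n, int radix_bits, int32_t *hist)
    {
        const int32_t fanout = 1 << radix_bits;
        __m512i mask = _mm512_set1_epi32(fanout - 1);
        __m512i lane = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                         8, 9, 10, 11, 12, 13, 14, 15);
        size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            __m512i k   = _mm512_loadu_si512(keys + i);
            __m512i h   = _mm512_and_si512(k, mask);                 /* SIMD radix */
            __m512i idx = _mm512_add_epi32(_mm512_slli_epi32(h, 4),  /* bucket * 16 */
                                           lane);                    /* + lane id   */
            __m512i cnt = _mm512_i32gather_epi32(idx, hist, 4);      /* gather counters */
            cnt = _mm512_add_epi32(cnt, _mm512_set1_epi32(1));       /* SIMD add (+1) */
            _mm512_i32scatter_epi32(hist, idx, cnt, 4);              /* scatter back */
        }
        for (; i < n; i++)                                           /* scalar tail */
            hist[(keys[i] & (fanout - 1)) * 16]++;
        /* the 16 per-lane copies are summed per bucket afterwards (not shown) */
    }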
Joins
- No partitioning:
  - Build one shared hash table using atomics
  - Partially vectorized
- Min partitioning:
  - Partition the building table
  - Build a hash table per thread
  - Fully vectorized
- Max partitioning:
  - Partition both tables repeatedly
  - Build and probe cache-resident hash tables
  - Fully vectorized
Joins
Main Takeaways
- Vectorization is essential for OLAP queries.
- Impact on hardware design:
  - Improved power efficiency for analytical databases
- Impact on software design:
  - Vectorization favors cache-conscious algorithms
  - Partitioned hash join >> non-partitioned hash join, if vectorized
- Vectorization is independent of other optimizations:
  - Both buffered and unbuffered partitioning benefit from the vectorization speedup
Comparisons with Trill
- Trill uses a similar bit-mask technique for applying the filter clause during selections.
- While Trill targets a query model for streaming data, this paper offers algorithms that improve the throughput of database operators and that could also be extended to a streaming model by leveraging buffered data.
- Trill relies on dynamic HLL code generation to operate over columnar data, whereas this paper relies on SIMD vectorization, which processes data points simultaneously and exploits a rich hardware instruction set to apply the same operation across vector lanes.
Questions?