ADVANCED DATABASE SYSTEMS
Lecture #21: Vectorization vs. Compilation
@Andy_Pavlo // 15-721 // Spring 2019

CMU 15-721 (Spring 2019)

OBSERVATION
Vectorization can speed up query performance.
Compilation can speed up query performance.
We have not discussed which approach is better and under what conditions.

VECTORWISE PRECOMPILED PRIMITIVES
Pre-compiles thousands of “primitives” that perform basic operations on typed data.
→ Using simple kernels for each primitive means that they are easier to vectorize.
The DBMS then executes a query plan that invokes these primitives at runtime.
→ Function calls are amortized over multiple tuples.
MICRO ADAPTIVITY IN VECTORWISE (SIGMOD 2013)
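A minimal sketch of the primitive idea, assuming a selection-vector design. The function names below are hypothetical and are not Vectorwise's actual primitive names. Each primitive is a simple loop over one typed column, and the plan calls it once per vector rather than once per tuple.

```python
from typing import List


def sel_gt_int_col_const(col: List[int], const: int, sel: List[int]) -> List[int]:
    """Selection primitive (hypothetical name): keep positions i in sel where col[i] > const."""
    return [i for i in sel if col[i] > const]


def map_add_int_col_col(a: List[int], b: List[int], sel: List[int]) -> List[int]:
    """Projection primitive (hypothetical name): a[i] + b[i] for every selected position."""
    return [a[i] + b[i] for i in sel]


def run_plan(age: List[int], base: List[int], bonus: List[int]) -> List[int]:
    """Interpret a tiny plan: filter age > 20, then project base + bonus.

    Each primitive call handles the whole vector, so the per-call overhead
    is paid once per vector instead of once per tuple.
    """
    sel = list(range(len(age)))              # initially every position is selected
    sel = sel_gt_int_col_const(age, 20, sel)
    return map_add_int_col_col(base, bonus, sel)
```

For example, `run_plan([18, 25, 40], [1, 2, 3], [10, 20, 30])` keeps the second and third tuples and adds their two columns. A real engine would pre-compile one such kernel per operation and type combination.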

HYPER JIT QUERY COMPILATION
Compile queries in-memory into native code using the LLVM toolkit.
Organizes query processing in a way to keep a tuple in CPU registers for as long as possible.
→ Bottom-to-top / push-based query processing model.
→ Not vectorizable (as originally described).
EFFICIENTLY COMPILING EFFICIENT QUERY PLANS FOR MODERN HARDWARE (VLDB 2011)

Vectorization vs. Compilation
Relaxed Operator Fusion

VECTORIZATION VS. COMPILATION
Single test-bed system to analyze the trade-offs between vectorized execution and query compilation.
Implements the same high-level algorithms in each engine but varies the implementation details.
→ Example: Murmur2 vs. CRC Hash Functions
EVERYTHING YOU ALWAYS WANTED TO KNOW ABOUT COMPILED AND VECTORIZED QUERIES BUT WERE AFRAID TO ASK (VLDB 2018)

IMPLEMENTATIONS
Approach #1: Tectorwise
→ Break operations into pre-compiled primitives.
→ Have to materialize the output of primitives at each step.
Approach #2: Typer
→ Push-based processing model with JIT compilation.
→ Process a single tuple up the entire pipeline without materializing the intermediate results.

TPC-H WORKLOAD
Q1: Fixed-point arithmetic, 4-group aggregation
Q6: Selective filters
Q3: Join (build: 147K tuples / probe: 3.2M tuples)
Q9: Join (build: 320K tuples / probe: 1.5M tuples)
Q18: High-cardinality aggregation (1.5M groups)
TPC-H ANALYZED: HIDDEN MESSAGES AND LESSONS LEARNED FROM AN INFLUENTIAL BENCHMARK (TPCTC 2013)

SINGLE-THREADED PERFORMANCE
[Figure]
Source: Timo Kersten

SINGLE-THREADED PERFORMANCE
Query  System  Cycles  IPC  Instr.  L1 Miss  LLC Miss  Bran. Miss
Q1     Typer   34      2.0  68      0.6      0.57      0.01
Q1     TW      59      2.8  162     2.0      0.57      0.03
Q6     Typer   11      1.8  20      0.3      0.35      0.06
Q6     TW      11      1.4  15      0.2      0.29      0.01
Q3     Typer   25      0.8  21      0.5      0.16      0.27
Q3     TW      24      1.8  42      0.9      0.16      0.08
Q9     Typer   74      0.6  42      1.7      0.46      0.34
Q9     TW      56      1.3  76      2.1      0.47      0.39
Q18    Typer   30      1.6  46      0.8      0.19      0.16
Q18    TW      48      2.1  102     1.9      0.18      0.37
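The counters on this slide are related: IPC is instructions per cycle, so Instr. / IPC should roughly reproduce the Cycles column. The check below is a small sketch that assumes all three columns share the same normalization (the slide does not state units) and that any gap comes from rounding in the published figures.

```python
# (query, system) -> (cycles, ipc, instructions), copied from the slide's counters.
ROWS = {
    ("Q1", "Typer"): (34, 2.0, 68),
    ("Q1", "TW"): (59, 2.8, 162),
    ("Q6", "Typer"): (11, 1.8, 20),
    ("Q6", "TW"): (11, 1.4, 15),
    ("Q3", "Typer"): (25, 0.8, 21),
    ("Q3", "TW"): (24, 1.8, 42),
    ("Q9", "Typer"): (74, 0.6, 42),
    ("Q9", "TW"): (56, 1.3, 76),
    ("Q18", "Typer"): (30, 1.6, 46),
    ("Q18", "TW"): (48, 2.1, 102),
}


def implied_cycles(ipc: float, instructions: float) -> float:
    """Cycles implied by the definition IPC = instructions / cycles."""
    return instructions / ipc


def max_relative_gap(rows) -> float:
    """Largest relative difference between reported and implied cycles."""
    return max(
        abs(implied_cycles(ipc, instr) - cycles) / cycles
        for cycles, ipc, instr in rows.values()
    )


if __name__ == "__main__":
    for (query, system), (cycles, ipc, instr) in ROWS.items():
        print(query, system, cycles, round(implied_cycles(ipc, instr), 1))
```

The same arithmetic shows why Tectorwise can execute more than twice the instructions of Typer on Q1 (162 vs. 68) without taking twice the cycles: its higher IPC (2.8 vs. 2.0) absorbs part of the difference.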

MAIN FINDINGS
Both models are efficient and achieve roughly the same performance.
Data-centric compilation (Typer) is better for computation-heavy queries with few cache misses.
Vectorization (Tectorwise) is slightly better at hiding cache-miss latencies.

SIMD PERFORMANCE
Evaluate vectorized branchless selection and hash probe in Tectorwise.
They use AVX-512 because it includes new instructions that make it easier to implement algorithms using vertical vectorization.
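As a rough illustration of branchless selection, the sketch below contrasts a branching filter with one that always writes the candidate position and advances the output cursor by the predicate result. This is a scalar sketch with hypothetical function names, not the AVX-512 code evaluated in the paper. The Python interpreter itself still branches internally, so only the control-flow shape carries over.

```python
from typing import List


def select_gt_branching(col: List[int], const: int) -> List[int]:
    """Selection with a data-dependent branch per tuple."""
    out = []
    for i, v in enumerate(col):
        if v > const:                # taken or not depending on the data
            out.append(i)
    return out


def select_gt_branchless(col: List[int], const: int) -> List[int]:
    """Selection without a data-dependent branch in the loop body."""
    out = [0] * len(col)             # preallocated output selection vector
    n = 0                            # number of qualifying positions so far
    for i, v in enumerate(col):
        out[n] = i                   # always write the candidate position
        n += int(v > const)          # advance the cursor only on a match
    return out[:n]
```

Both functions return the same positions, for example `[1, 2, 4]` for the column `[5, 30, 21, 7, 40]` and the constant `20`. The branchless form trades extra writes for predictable control flow, which is also the shape that maps naturally onto SIMD compare-and-compress instructions.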

SIMD EVALUATION
[Figure panels: Hashing, Gather, Join]
Source: Timo Kersten

AUTO-VECTORIZATION
Measure how well the compiler is able to vectorize the Vectorwise primitives.
→ Targets: GCC v7.2, Clang v5.0, ICC v18
ICC was able to vectorize the most primitives using AVX-512:
→ Vectorized: Hashing, Selection, Projection
→ Not Vectorized: Hash Table Probing, Aggregation

AUTO-VECTORIZATION
Intel Core i9-7900X (10 cores × 2HT), Compiler: ICC v18
[Bar chart: reduction of instructions (%) for Q1, Q6, Q3, Q9, and Q18 under Auto, Manual, and Auto+Manual vectorization; bar labels range from -1.01 to 82.9.]
Source: Timo Kersten

AUTO-VECTORIZATION
Intel Core i9-7900X (10 cores × 2HT), Compiler: ICC v18
[Bar chart: reduction of time (%) for Q1, Q6, Q3, Q9, and Q18 under Auto, Manual, and Auto+Manual vectorization; bar labels range from -14.6 to 21.6.]
Source: Timo Kersten

OBSERVATION
The paper (partially) assumes that vectorization and compilation are mutually exclusive.
HyPer fuses operators together so that they work on a single tuple at a time to maximize CPU register reuse and minimize cache misses.

VECTORIZATION VS. COMPILATION
[Figure]
Source: Timo Kersten

PIPELINE PERSPECTIVE
Each pipeline fuses operators together into a loop.
Each pipeline is a tuple-at-a-time process.

    def plan(state):
        agg = dict()
        for t in A:                        # Pipeline #1: Scan
            if t.age > 20:                 # Filter
                agg[t.city]['count'] += 1  # Agg
        for t in agg:                      # Pipeline #2
            emit(t)                        # Emit
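A runnable version of the slide's pseudocode might look like the following sketch. The `Person` record type and the choice to return a list instead of calling `emit` are assumptions made so that the example is self-contained.

```python
from collections import defaultdict, namedtuple

# Hypothetical row type standing in for tuples of table A.
Person = namedtuple("Person", ["age", "city"])


def plan(table):
    """Fused, tuple-at-a-time plan: count people older than 20 per city."""
    agg = defaultdict(lambda: {"count": 0})

    # Pipeline #1: Scan, Filter, and Agg are fused into one loop, so each
    # tuple is pushed through all three operators before the next is read.
    for t in table:
        if t.age > 20:
            agg[t.city]["count"] += 1

    # Pipeline #2: scan the aggregation table and emit one row per group.
    return sorted((city, state["count"]) for city, state in agg.items())
```

The aggregation is a pipeline breaker: the second loop cannot start until the first has consumed every input tuple, which is why the plan splits into two pipelines.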

FUSION PROBLEMS
Fusion inhibits some optimizations:
→ Unable to look ahead in tuple stream.
→ Unable to overlap computation and memory access.

    def plan(state):
        agg = dict()
        for t in A:                        # Scan
            if t.age > 20:                 # Filter: cannot SIMD
                agg[t.city]['count'] += 1  # Agg: cannot prefetch
        for t in agg:
            emit(t)

RELAXED OPERATOR FUSION
Vectorized processing model designed for query compilation execution engines.
Decompose pipelines into stages that operate on vectors of tuples.
→ Each stage may contain multiple operators.
→ Stages communicate through cache-resident buffers.
→ Stages are the granularity of vectorization + fusion.
RELAXED OPERATOR FUSION FOR IN-MEMORY DATABASES: MAKING COMPILATION, VECTORIZATION, AND PREFETCHING WORK TOGETHER AT LAST (VLDB 2017)

ROF EXAMPLE
[Diagram: the Scan, Filter, Agg, Emit plan. The Filter is marked as the vectorization candidate; a stage buffer placed after it separates Stage #1 (Scan, Filter) from Stage #2 (Agg).]

ROF EXAMPLE

    def plan(state):
        agg = dict()
        for t in A step 1024:                  # Stage #1: Scan
            out = simd_cmp_gt(t, 20, 1024)     # Filter; out is the stage buffer
            for ft in out:                     # Stage #2
                agg[ft.city]['count'] += 1     # Agg
        for t in agg:
            emit(t)                            # Emit
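The staged plan can be approximated in plain Python as in the sketch below. Here `cmp_gt` is a scalar stand-in for the slide's `simd_cmp_gt`, and the `Person` type, the `vector_size` parameter, and the returned list are assumptions for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical row type standing in for tuples of table A.
Person = namedtuple("Person", ["age", "city"])

VECTOR_SIZE = 1024


def cmp_gt(values, threshold):
    """Scalar stand-in for simd_cmp_gt: positions where values[i] > threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def plan(table, vector_size=VECTOR_SIZE):
    """Relaxed-operator-fusion style plan: filter per vector, then aggregate."""
    agg = defaultdict(lambda: {"count": 0})

    for start in range(0, len(table), vector_size):
        vec = table[start:start + vector_size]

        # Stage #1: Scan + Filter over a whole vector. The result `out` plays
        # the role of the stage buffer, small enough to stay cache-resident.
        out = cmp_gt([t.age for t in vec], 20)

        # Stage #2: Agg consumes only the qualifying positions in the buffer.
        for i in out:
            agg[vec[i].city]["count"] += 1

    # Final pipeline: emit one row per group.
    return sorted((city, state["count"]) for city, state in agg.items())
```

The result does not depend on `vector_size`. The parameter only controls how many tuples Stage #1 filters before handing the buffer to Stage #2.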

ROF SOFTWARE PREFETCHING
The DBMS can tell the CPU to grab the next vector while it works on the current batch.
→ Prefetch-enabled operators define the start of a new stage.
→ Hides the cache miss latency.
Any prefetching technique is suitable.
→ Group prefetching, software pipelining, AMAC.
→ Group prefetching works and is simple to implement.
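The loop structure of group prefetching can be sketched as follows. Python has no prefetch instruction, so `prefetch` here is a hypothetical no-op marking where a compiled engine would issue the hint. The hash table layout, `GROUP_SIZE`, and the function names are likewise assumptions.

```python
GROUP_SIZE = 4  # illustrative value; a real engine would tune this


def prefetch(buckets, slot):
    """Hypothetical stand-in for a hardware prefetch hint; a no-op in Python."""
    return None


def build(keys, n_buckets=8):
    """Build a simple chained hash table: a list of buckets holding keys."""
    buckets = [[] for _ in range(n_buckets)]
    for k in keys:
        buckets[hash(k) % n_buckets].append(k)
    return buckets


def probe_group_prefetch(keys, buckets, group_size=GROUP_SIZE):
    """Probe the hash table in groups, splitting each group into two phases."""
    n_buckets = len(buckets)
    matches = []
    for start in range(0, len(keys), group_size):
        group = keys[start:start + group_size]

        # Phase 1: hash every key in the group and request its bucket early,
        # so the memory accesses of the whole group can overlap.
        slots = [hash(k) % n_buckets for k in group]
        for s in slots:
            prefetch(buckets, s)

        # Phase 2: visit the buckets, which in a compiled engine would
        # ideally be in cache by the time they are read.
        for k, s in zip(group, slots):
            if k in buckets[s]:
                matches.append(k)
    return matches
```

Splitting the probe into a hashing phase and a lookup phase is what lets the engine look ahead in the tuple stream, which a fully fused tuple-at-a-time loop cannot do.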

ROF EVALUATION
Dual Socket Intel Xeon E5-2630v4 @ 2.20GHz, TPC-H 10 GB Database
[Bar chart: execution time (ms) of LLVM vs. LLVM + ROF for Q1, Q3, Q13, Q14, and Q19; one group of queries is annotated "SIMD/Prefetch Does Not Help" and another "SIMD/Prefetch Does Help".]
Source: Prashanth Menon
