PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Self Introduction ▌ Name: KaiGai Kohei ▌ Company: NEC ▌ Mission: Software architect & Intrepreneur ▌ Background: Linux kernel development (2003~?) PostgreSQL development (2006~) SAP alliance (2011~2013) PG-Strom development & productization (2012~) ▌ PG-Strom Project: In-company startup of NEC Also, an open source software project P.2 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
What is PG-Strom ▌ An Extension of PostgreSQL ▌ Off-loads CPU intensive SQL workloads to GPU processors ▌ Major Features ① Automatic and just-in-time GPU code generation from SQL ② Asynchronous and concurrent query executor PG-Strom GPU code Query Custom Frontend on the fly SQL Query Planner Planner command Query Custom Executor Executor Async- Execution database P.3 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Concept ▌ No Pain Looks like a traditional PostgreSQL database from standpoint of applications, thus, we can utilize existing tools, drivers, applications. ▌ No Tuning Massive computing capability by GPGPU kills necessity of database tuning by human. It allows engineering folks to focus on the task only human can do. ▌ No Complexity No need to export large data to external tools from RDBMS, because its computing performance is sufficient to run the workloads nearby data. P.4 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Technology Trend SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013 ▌ Movement to CPU/GPU integrated architecture rather than multicore CPU ▌ Free lunch for SW by HW evolution will finish soon Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability. Page. 5 DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL
Background: How SQL is executed postgres# EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1 WHERE t0.z like '%abc%' GROUP BY cat; QUERY PLAN ------------------------------------------------------------------ HashAggregate (cost=6629.88..6629.89 rows=1 width=12) Group Key: t0.cat -> Hash Join (cost=1234.00..6619.77 rows=2020 width=12) Hash Cond: (t0.aid = t1.aid) -> Seq Scan on t0 (cost=0.00..5358.00 rows=2020 width=16) Filter: (z ~~ '%abc%'::text) -> Hash (cost=734.00..734.00 rows=40000 width=4) -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4) (8 rows) ▌ Planner constructs query execution plan based on cost estimation ▌ SQL never defines how to execute the query, but what shall be returned P.6 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Background: Custom-Plan Interface SELECT cat, avg(x) FROM t1, t2 WHERE t1.id = t2.id AND y > 100 GROUP BY cat; Aggregate • Hash Join key: cat • Merge Join • Nested Loop • Custom Join “ GpuHashJoin ” Join t1.id = t2.id • Seq Scan • Index Scan IndexScan on t1 Scan on t1 “ BulkLoad ” on t1 Scan on t2 • Index-Only Scan • Tid Scan y > 100 • Custom Scan t1 t2 P.7 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
PG-Strom Features ▌ Logics GpuScan ... Parallel evaluation of scan qualifiers GpuHashJoin ... Parallel multi-relational join GpuPreAgg ... Two phase aggregation GpuSort ... GPU + CPU Hybrid Sorting GpuNestedLoop (in develop) ▌ Data Types Integer, Float, Date/Time, Numeric, Text ▌ Function and Operators Equality and comparison operators Arithmetic operators and mathematical functions Aggregates: count, min/max, sum, avg, std, var, corr, regr P.8 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Automatic GPU code generation postgres=# SET pg_strom.show_device_kernel = on ; postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10 ; QUERY PLAN -------------------------------------------------------------------------------- Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77) Output: id, cat, aid, bid, cid, did, eid, x, y, z Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision) Features: likely-tuple-slot Kernel Source: #include "opencl_common.h “ : static pg_bool_t gpuscan_qual_eval(__private cl_int *errcode, __global kern_parambuf *kparams, __global kern_data_store *kds, __global kern_data_store *ktoast, size_t kds_index) { pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0); pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index); pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index); return pgfn_float8lt(errcode, pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0); } P.9 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Implementation (1/3) – GpuScan Result Output Stream DMA DMA Send Recv DMA DMA Send Recv Chunk DMA DMA Send (16~64MB) Recv Execution of Input auto-generated Stream GPU code Table P.10 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Software Architecture SQL Query ※ Current version based on OpenCL Breaks down Query Message Queue the query to Parser parse tree Custom-Plan APIs GpuScan Query PG-Strom Makes query GpuHashJoin Optimizer OpenCL execution plan Server GpuPreAgg Query GpuSort Run the query Executor GPU Program Storage Manager Manager GPU Code Generator Shared Storage Buffer PG-Strom PostgreSQL Page. 11 DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL
Implementation (2/3) – GpuHashJoin vanilla Hash-Join GpuHashJoin Next stage Next stage CPU just Sequential references Materialization materialized by CPU results Parallel Materialization Hash-Table Search by CPU Hash Table Hash Table Parallel Hash-Table Search Inner Outer Inner Outer relation relation relation relation Page. 12 PG-Strom Preview Feb-2015
Benchmark result (1/2) – simple tables join Simple Tables Join Benchmark 300.00 263.51 Query Response Time [sec] 250.00 233.31 207.85 200.00 179.62 155.10 150.00 132.21 109.73 87.73 100.00 64.27 56.38 49.57 43.22 37.64 50.00 26.69 23.66 21.04 19.45 18.19 0.00 2 3 4 5 6 7 8 9 10 number of tables joined PG-Strom PostgreSQL ▌ Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....]; ▌ Environment: t0 has 100million rows (13GB), t1 - t9 has 40,000 rows for each, all-data pre-loaded CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1 P.13 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Implementation (3/3) – GpuPreAgg 1 st Stage 2 nd Stage Reduction Reduction Chunk (16~64MB) Table P.14 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Benchmark result (2/2) – Star Schema Model Typical Reporting Queries on Retail / Star-Schema Data 2000.00 1800.00 Query Response Time [sec] 1600.00 1400.00 1200.00 1000.00 800.00 600.00 400.00 200.00 0.00 Q.01 Q.02 Q.03 Q.04 Q.05 Q.06 Q.07 Q.08 Q.09 Q.10 Q.11 Q.12 Q.13 Q.14 Q.15 Q.16 Q.17 Q.18 Q.19 Q.20 Q.21 Q.22 Q.23 Q.24 Q.25 Q.26 Q.27 Q.28 Q.29 Q.30 Q.31 Q.32 Q.33 Q.34 Q.35 Q.36 Q.37 Q.38 Q.39 Q.40 PG-Strom PostgreSQL ▌ 40 typical reporting queries ▌ 100GB of retail / start-schema data, all pre-loaded ▌ Environment CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1 P.15 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Expected Scenario – Reduction of ETL BI SCM CRM ERP BI Replication ETL OLTP OLAP database database Master / Fact Tables OLAP Cubes PG-Strom Optimized to Optimized to transaction analytic workloads workloads Replica of Master / Fact Tables ▌ ETL – Its design is human centric task Sufficient to analytic ▌ Replication – much automatous task workloads also P.16 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Direction of PG-Strom P.17 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Development Plan Current version: PG- Strom β + PostgreSQL v9.5devel Migration of OpenCL to CUDA Add support of GpuNestedLoop Add support multi-functional kernel Standardization of custom-join interface ...and more...? Short term target: PostgreSQL v9.5 timeline (2015) Integration with funnel executor Investigation to SSD/NvRAM utilization Custom-sort/aggregate interface Add support for spatial data types (?) Middle term target: PostgreSQL v9.6 timeline (2016) P.18 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Let’s try – Deployment on AWS Search by “ strom ” ! AWS GPU Instance (g2.2xlarge) CPU Xeon E5-2670 (8 xCPU) RAM 15GB GPU NVIDIA GRID K2 (1536core) Storage 60GB of SSD Price $0.898/hour, $646.56/mon (*) Price for on-demand instance on Tokyo region at Nov-2014 Page. 19 The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL
Welcome your involvement! ▌ How to be involved? as a user as a developer as a business partner check it out! ▌ Source code https://github.com/pg-strom/devel ▌ Contact US e-mail: kaigai@ak.jp.nec.com twitter: @kkaigai ...or, catch me in the Convention Center P.20 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -
Recommend
More recommend