pg strom
play

PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU - PowerPoint PPT Presentation

PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> Self Introduction Name: KaiGai Kohei Company: NEC Mission: Software


  1. PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>

  2. Self Introduction ▌ Name: KaiGai Kohei ▌ Company: NEC ▌ Mission: Software architect & Intrepreneur ▌ Background:  Linux kernel development (2003~?)  PostgreSQL development (2006~)  SAP alliance (2011~2013)  PG-Strom development & productization (2012~) ▌ PG-Strom Project:  In-company startup of NEC  Also, an open source software project P.2 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  3. What is PG-Strom ▌ An Extension of PostgreSQL ▌ Off-loads CPU intensive SQL workloads to GPU processors ▌ Major Features ① Automatic and just-in-time GPU code generation from SQL ② Asynchronous and concurrent query executor PG-Strom GPU code Query Custom Frontend on the fly SQL Query Planner Planner command Query Custom Executor Executor Async- Execution database P.3 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  4. Concept ▌ No Pain  Looks like a traditional PostgreSQL database from standpoint of applications, thus, we can utilize existing tools, drivers, applications. ▌ No Tuning  Massive computing capability by GPGPU kills necessity of database tuning by human. It allows engineering folks to focus on the task only human can do. ▌ No Complexity  No need to export large data to external tools from RDBMS, because its computing performance is sufficient to run the workloads nearby data. P.4 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  5. Technology Trend SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013 ▌ Movement to CPU/GPU integrated architecture rather than multicore CPU ▌ Free lunch for SW by HW evolution will finish soon  Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability. Page. 5 DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL

  6. Background: How SQL is executed postgres# EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1 WHERE t0.z like '%abc%' GROUP BY cat; QUERY PLAN ------------------------------------------------------------------ HashAggregate (cost=6629.88..6629.89 rows=1 width=12) Group Key: t0.cat -> Hash Join (cost=1234.00..6619.77 rows=2020 width=12) Hash Cond: (t0.aid = t1.aid) -> Seq Scan on t0 (cost=0.00..5358.00 rows=2020 width=16) Filter: (z ~~ '%abc%'::text) -> Hash (cost=734.00..734.00 rows=40000 width=4) -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4) (8 rows) ▌ Planner constructs query execution plan based on cost estimation ▌ SQL never defines how to execute the query, but what shall be returned P.6 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  7. Background: Custom-Plan Interface SELECT cat, avg(x) FROM t1, t2 WHERE t1.id = t2.id AND y > 100 GROUP BY cat; Aggregate • Hash Join key: cat • Merge Join • Nested Loop • Custom Join “ GpuHashJoin ” Join t1.id = t2.id • Seq Scan • Index Scan IndexScan on t1 Scan on t1 “ BulkLoad ” on t1 Scan on t2 • Index-Only Scan • Tid Scan y > 100 • Custom Scan t1 t2 P.7 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  8. PG-Strom Features ▌ Logics  GpuScan ... Parallel evaluation of scan qualifiers  GpuHashJoin ... Parallel multi-relational join  GpuPreAgg ... Two phase aggregation  GpuSort ... GPU + CPU Hybrid Sorting  GpuNestedLoop (in develop) ▌ Data Types  Integer, Float, Date/Time, Numeric, Text ▌ Function and Operators  Equality and comparison operators  Arithmetic operators and mathematical functions  Aggregates: count, min/max, sum, avg, std, var, corr, regr P.8 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  9. Automatic GPU code generation postgres=# SET pg_strom.show_device_kernel = on ; postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10 ; QUERY PLAN -------------------------------------------------------------------------------- Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77) Output: id, cat, aid, bid, cid, did, eid, x, y, z Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision) Features: likely-tuple-slot Kernel Source: #include "opencl_common.h “ : static pg_bool_t gpuscan_qual_eval(__private cl_int *errcode, __global kern_parambuf *kparams, __global kern_data_store *kds, __global kern_data_store *ktoast, size_t kds_index) { pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0); pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index); pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index); return pgfn_float8lt(errcode, pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0); } P.9 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  10. Implementation (1/3) – GpuScan Result Output Stream DMA DMA Send Recv DMA DMA Send Recv Chunk DMA DMA Send (16~64MB) Recv Execution of Input auto-generated Stream GPU code Table P.10 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  11. Software Architecture SQL Query ※ Current version based on OpenCL Breaks down Query Message Queue the query to Parser parse tree Custom-Plan APIs GpuScan Query PG-Strom Makes query GpuHashJoin Optimizer OpenCL execution plan Server GpuPreAgg Query GpuSort Run the query Executor GPU Program Storage Manager Manager GPU Code Generator Shared Storage Buffer PG-Strom PostgreSQL Page. 11 DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL

  12. Implementation (2/3) – GpuHashJoin vanilla Hash-Join GpuHashJoin Next stage Next stage CPU just Sequential references Materialization materialized by CPU results Parallel Materialization Hash-Table Search by CPU Hash Table Hash Table Parallel Hash-Table Search Inner Outer Inner Outer relation relation relation relation Page. 12 PG-Strom Preview Feb-2015

  13. Benchmark result (1/2) – simple tables join Simple Tables Join Benchmark 300.00 263.51 Query Response Time [sec] 250.00 233.31 207.85 200.00 179.62 155.10 150.00 132.21 109.73 87.73 100.00 64.27 56.38 49.57 43.22 37.64 50.00 26.69 23.66 21.04 19.45 18.19 0.00 2 3 4 5 6 7 8 9 10 number of tables joined PG-Strom PostgreSQL ▌ Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....]; ▌ Environment:  t0 has 100million rows (13GB), t1 - t9 has 40,000 rows for each, all-data pre-loaded  CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1 P.13 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  14. Implementation (3/3) – GpuPreAgg 1 st Stage 2 nd Stage Reduction Reduction Chunk (16~64MB) Table P.14 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  15. Benchmark result (2/2) – Star Schema Model Typical Reporting Queries on Retail / Star-Schema Data 2000.00 1800.00 Query Response Time [sec] 1600.00 1400.00 1200.00 1000.00 800.00 600.00 400.00 200.00 0.00 Q.01 Q.02 Q.03 Q.04 Q.05 Q.06 Q.07 Q.08 Q.09 Q.10 Q.11 Q.12 Q.13 Q.14 Q.15 Q.16 Q.17 Q.18 Q.19 Q.20 Q.21 Q.22 Q.23 Q.24 Q.25 Q.26 Q.27 Q.28 Q.29 Q.30 Q.31 Q.32 Q.33 Q.34 Q.35 Q.36 Q.37 Q.38 Q.39 Q.40 PG-Strom PostgreSQL ▌ 40 typical reporting queries ▌ 100GB of retail / start-schema data, all pre-loaded ▌ Environment  CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1 P.15 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  16. Expected Scenario – Reduction of ETL BI SCM CRM ERP BI Replication ETL OLTP OLAP database database Master / Fact Tables OLAP Cubes PG-Strom Optimized to Optimized to transaction analytic workloads workloads Replica of Master / Fact Tables ▌ ETL – Its design is human centric task Sufficient to analytic ▌ Replication – much automatous task workloads also P.16 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  17. Direction of PG-Strom P.17 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  18. Development Plan Current version: PG- Strom β + PostgreSQL v9.5devel  Migration of OpenCL to CUDA  Add support of GpuNestedLoop  Add support multi-functional kernel  Standardization of custom-join interface  ...and more...? Short term target: PostgreSQL v9.5 timeline (2015)  Integration with funnel executor  Investigation to SSD/NvRAM utilization  Custom-sort/aggregate interface  Add support for spatial data types (?) Middle term target: PostgreSQL v9.6 timeline (2016) P.18 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

  19. Let’s try – Deployment on AWS Search by “ strom ” ! AWS GPU Instance (g2.2xlarge) CPU Xeon E5-2670 (8 xCPU) RAM 15GB GPU NVIDIA GRID K2 (1536core) Storage 60GB of SSD Price $0.898/hour, $646.56/mon (*) Price for on-demand instance on Tokyo region at Nov-2014 Page. 19 The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL

  20. Welcome your involvement! ▌ How to be involved?  as a user  as a developer  as a business partner check it out! ▌ Source code  https://github.com/pg-strom/devel ▌ Contact US  e-mail: kaigai@ak.jp.nec.com  twitter: @kkaigai ...or, catch me in the Convention Center P.20 PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -

Recommend


More recommend