Are Databases Fit for Hybrid Workloads on GPUs? A Storage Engine's Perspective
Marcus Pinnecke, David Broneske, Gabriel Campero Durand, Gunter Saake
Database and Software Engineering Group, University of Magdeburg
HardBD 2017, San Diego, April 22, 2017
Hybrid Transaction and Analytic Processing (HTAP)

HTAP database systems run both OLTP & OLAP
• HyPer, Peloton, HANA, …

benefit is larger business value, through:
• less latency for analysis
• less synchronization effort

related challenges
• different data access pattern
• adapt record layout (NSM, DSM, …)
• interference between query types
• contradicting optimization goals
• different types of parallelism
• hot and cold data

[Figure: database storage engine for analytical and transactional workloads. Physical record layout: OLTP optimized, OLAP optimized, HTAP optimized (re-organization). Compute device: main processor only, co-processor only, co-processor accelerated (re-assignment).]
Database Systems on Heterogeneous Platforms

heterogeneous systems use co-processors
• host (CPU), and device (e.g., GPU)
• CoGaDB, GPUTx, Ocelot, …

benefit is exploiting compute capacities
• overcome limitations of power wall
• special jobs for specialized processors

related challenges
• data transfer costs for I/O
• different programming models
• device limitations (e.g., memory capacity)
• data and operator placement

[Figure: database storage engine for analytical and transactional workloads. Physical record layout: OLTP optimized, OLAP optimized, HTAP optimized (re-organization). Compute device: main processor only, co-processor only, co-processor accelerated (re-assignment).]
Motivation
Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?

[Figure: the storage engine diagram (physical record layout vs. compute device) shown twice, labeled "HTAP Database Systems" and "Heterogeneous Database Systems".]

measured effort, TPC-C Benchmark Dataset

"OLTP" query      | "HTAP" query                      | "OLAP" query
materialization   | aggregation of some               | aggregation of all
select *          | select sum(c_bought_item.price)   | select sum(price)
from customers    | from customers ⨝ … ⨝ item         | from item
where 150 customers | where true                      | where 150 items
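The measured queries contrast two access patterns: materializing whole records and aggregating a single field. The following is a minimal sketch of how these patterns touch a row-store (NSM) and a column-store (DSM) layout. The toy table, its field names, and all function names are assumptions made for illustration; they are not the TPC-C schema or the authors' storage engine.

```python
# Hypothetical toy table: (id, name, price). Not the TPC-C schema.
rows = [(i, f"name{i}", i * 10) for i in range(5)]  # NSM: one tuple per record

# DSM: one contiguous array per field.
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "price": [r[2] for r in rows],
}


def materialize_nsm(table, positions):
    """Fetch whole records: one lookup per record in a row-store."""
    return [table[p] for p in positions]


def materialize_dsm(cols, positions):
    """Fetch whole records from a column-store: every column is touched
    once per record to stitch the tuple back together."""
    return [tuple(cols[field][p] for field in cols) for p in positions]


def sum_nsm(table, field_index):
    """Aggregate one field in a row-store: every record is visited even
    though only one of its fields is needed."""
    return sum(record[field_index] for record in table)


def sum_dsm(cols, field):
    """Aggregate one field in a column-store: only that column is scanned."""
    return sum(cols[field])


selected = [1, 3]
print(materialize_nsm(rows, selected))
print(materialize_dsm(columns, selected))
print(sum_nsm(rows, 2), sum_dsm(columns, "price"))
```

Both layouts return the same answers; what differs is how much unrelated data each operation has to touch, which is why record materialization tends to favor NSM and single-field aggregation tends to favor DSM.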
Hybridization of HTAP and Heterogeneous Computing

First: Is there performance potential?

"OLTP" query: materialization (materialize 150 customers)

[Figure: throughput [records/s] over #records in customer table (5M to 85M); higher values are better. Two throughput axes, with ticks up to 0.12M and up to 150M. Series: row-store / host & single-threaded, row-store / host & multi-threaded, column-store / host & single-threaded, column-store / host & multi-threaded.]

Setup
• TPC-C benchmark: customer record 96B (21 fields) / item record 20B + 8B (4 fields + price field)
• system configuration: operator-at-a-time processing w/ late materialization
• host: max. 8 4 threads, blockwise partitioning
• device: optimized parallel reduction kernel (>= 1024 blocks w/ 512 threads), final reduction on 1 block w/ 1024 threads
• effort for join processing not incl.
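The setup names blockwise partitioning on the host and a two-stage parallel reduction on the device (many blocks produce partial results, one block performs the final reduction). A minimal sketch of that scheme follows, written in Python for readability rather than as a GPU kernel. The function names, the block count, and the use of a thread pool are assumptions for illustration; the pairwise reduction runs sequentially here and only mimics the access pattern of a block-level reduction.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_blockwise(n, parts):
    """Split range(n) into `parts` contiguous [start, end) blocks of
    near-equal size."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def tree_reduce(values):
    """Pairwise (tree) sum: each step adds the upper half onto the lower
    half until a single value remains."""
    vals = list(values)
    if not vals:
        return 0
    while len(vals) > 1:
        half = (len(vals) + 1) // 2
        vals = [
            vals[i] + (vals[i + half] if i + half < len(vals) else 0)
            for i in range(half)
        ]
    return vals[0]


def two_stage_sum(prices, blocks=4):
    """Stage 1: one partial sum per block. Stage 2: reduce the partials."""
    bounds = partition_blockwise(len(prices), blocks)
    with ThreadPoolExecutor(max_workers=blocks) as pool:
        partials = list(
            pool.map(lambda b: tree_reduce(prices[b[0]:b[1]]), bounds)
        )
    return tree_reduce(partials)


prices = list(range(1, 101))
print(partition_blockwise(10, 3))
print(two_stage_sum(prices, blocks=4))
```

Because of Python's global interpreter lock the thread pool does not speed up this computation; it only shows how work is assigned to contiguous blocks before the final reduction.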