The Glass Half Full Using Programmable Hardware Accelerators in - PowerPoint PPT Presentation

The Glass Half Full Using Programmable Hardware Accelerators in Analytical Databases Zsolt István IMDEA Software Institute

IMDEA Software Institute • 16 Faculty in the areas of: • Program Analysis and Verification • Languages and Compilers • Security and Privacy • Theoretical Computer Science • Distributed Systems and Databases • ~10 Post-docs, ~25 PhD Students, ~10 Interns • Located in UPM Montegancedo Campus, Madrid • We are hiring! https://software.imdea.org/

Context: Analytical Databases ▪ OLAP – Online Analytical Processing ▪ Large datasets – up to TBs ▪ Ad-hoc querying to extract insight, recurring reporting – Possibly complex operations ▪ Read-mostly workloads, updates in batches ▪ OLTP – Online Transaction Processing ▪ Smaller datasets ▪ Queries known, relate to business actions ▪ Makes heavy use of indexes ▪ Reads and updates intermixed

Databases were a $25 billion market in 2018… Could we specialize machines for them? https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/

Database Computer – ’70s “The first goal is to design it with the capability of handling a very large on-line database of 10^10 bytes or beyond since special-purpose machines are not likely to be cost-effective for small databases.” ▪ Fully custom machine for databases ▪ Processors – special ISA microprocessors ▪ Memory – magnetic bubbles and CCDs ▪ Semiconductor technology and general purpose CPUs took over Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases. IEEE Trans. Computers 28(6): 414-429 (1979)

Gamma Machine – ’80s ▪ Based on VAX multiprocessor system ▪ By the time the software and hardware were developed, CPUs had become much faster ▪ Couldn’t keep up with Moore’s law David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237

Specialized Hardware Revival ▪ Data/Compute Gap ▪ CPU Scaling ▪ Commodity in Cloud

Renewed interest in Specialized Hardware: CPUs, FPGAs, ASICs

Re-programmable Specialized Hardware Field Programmable Gate Array (FPGA) ▪ Free choice of architecture ▪ Fine-grained pipelining, communication, distributed memory ▪ Tradeoff: all “code” occupies chip space ▪ Evolving platform: larger chips, more heterogeneity [Figure: operators Op 1, Op 2, Op 3 laid out on the chip]

Integration Options 1) On the side 2) In data-path 3) Co-processor [Figure: placement of the accelerator relative to the data for each option]

In the Cloud Today ▪ Accelerator ▪ Amazon F1 ▪ In data path ▪ Microsoft Catapult ▪ Co-processor ▪ Intel Xeon+FPGA [Figure: Intel Xeon+FPGA Gen.1 (CPU in Socket1, FPGA in Socket2) and Intel Xeon+FPGA Gen.2 (CPU and FPGA in Socket1)]

The Glass Half Empty…

The Glass Half Empty… ▪ 1) On the side acceleration introduces overhead ▪ Much related work offers no real speedup once we factor in data movement, transformation, software overhead… [Figure: query execution time for Software vs. With Acceleration, broken down into Compute and Data Movement; annotated 2x]

The Glass Half Empty… ▪ 2) “All or nothing” behavior makes query planning difficult ▪ Example: fixed capacity hash table on FPGA ▪ Constant time access for reads and writes ▪ What happens if data doesn’t fit? ▪ Can’t always know the number of keys a priori

The Glass Half Empty… ▪ 3) Analytical databases becoming more optimized / not much compute in core SQL ▪ X100 [CIDR05] showed that <10% of compute time is spent on SQL operators +,-,*,SUM,AVG in analytical queries ▪ Columnar stores often memory bound (10s of GB/s)
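As a back-of-the-envelope illustration of the third point, Amdahl's law bounds the end-to-end gain when only the SQL-operator share of the runtime is offloaded. The 10% share is the upper bound quoted on the slide; the function name and the accelerator factors below are hypothetical and chosen only for illustration.

```python
def amdahl_speedup(accelerated_fraction: float, accel_factor: float) -> float:
    """End-to-end speedup when only part of the runtime is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    # Untouched share of the runtime plus the shrunken accelerated share.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / accel_factor
    return 1.0 / remaining


# Assumption: 10% of compute time is in SQL operators (the slide says "<10%").
for factor in (2, 10, 1000):
    overall = amdahl_speedup(0.10, factor)
    print(f"{factor}x faster operators -> {overall:.3f}x overall")
```

Even an arbitrarily fast operator cannot push the overall speedup past 1 / 0.9, roughly 1.11x, under this assumption, and the estimate ignores any data-movement cost.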

The Glass Half Empty… ▪ On the side acceleration introduces overhead ▪ “All or nothing” behavior makes query planning difficult ▪ Analytical databases becoming more optimized / not much compute in core SQL

The Glass Half Full… ▪ On the side acceleration introduces overhead ✓ Reduce data movement bottlenecks

Processing in data path: Smart Flash ▪ IBEX: Database storage engine with processing offload ▪ Filter and pre-aggregate for analytic workloads → Larger bandwidth, more IOPS (Samsung YourSQL, MIT BlueDBM) ▪ Opportunity to extend SSDs/Flash with complex offload [Figure: Database Server connected to IBEX and its SSD; Samsung “smart” SSD] L. Woods, Z. István and G. Alonso: Ibex – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. VLDB’14

Processing in data path: Distributed Processing Caribou: Distributed storage with processing • Specialized HW nodes • 10Gbps access • 25W power cons. + Provisioning + Scalability [Figure: Workers (Compute) on top of Storage] Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.

Smart Storage in Databases: Filter push-down SELECT … FROM customer WHERE age<35 AND purchases>2 AND address LIKE “%PO. Box 123%” ▪ Challenge: guarantee that filtering never slows down retrieval ▪ Algorithms can be re-imagined to become bandwidth-bound instead of compute-bound ▪ Extend the state of the art: parameterization without re-programming [FCCM16] ▪ Many options: Regular expressions, comparisons, decompression, … [Figure: comparison against the Intel Hyperscan library (Xeon E5-2680 v2); annotated 2.8x] [FCCM16] Zs. István, D. Sidler, G. Alonso: Runtime Parameterizable Regular Expression Operators for Databases. FCCM’16
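The idea of filter push-down can be sketched in software as follows. This is only a minimal model, assuming the storage side scans rows and forwards only those that satisfy the WHERE clause; `make_predicate`, `storage_scan`, and the sample rows are hypothetical names and data, not part of Caribou or IBEX. The predicate is built from runtime parameters, loosely mirroring the "parameterization without re-programming" point.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def make_predicate(max_age: int, min_purchases: int,
                   address_substring: str) -> Callable[[Row], bool]:
    """Build the WHERE clause from runtime parameters; the scan code stays fixed."""
    def predicate(row: Row) -> bool:
        return (row["age"] < max_age
                and row["purchases"] > min_purchases
                # LIKE '%text%' is modeled as plain substring containment.
                and address_substring in row["address"])
    return predicate


def storage_scan(rows: Iterable[Row],
                 predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Stands in for the storage node: only matching rows are sent onward."""
    for row in rows:
        if predicate(row):
            yield row


# Hypothetical sample data for the query on the slide.
customers = [
    {"name": "Ana", "age": 29, "purchases": 5, "address": "PO. Box 123, Madrid"},
    {"name": "Ben", "age": 41, "purchases": 7, "address": "PO. Box 123, Zurich"},
    {"name": "Cleo", "age": 22, "purchases": 1, "address": "PO. Box 123, Madrid"},
    {"name": "Dan", "age": 30, "purchases": 3, "address": "12 Main St"},
]

pred = make_predicate(max_age=35, min_purchases=2, address_substring="PO. Box 123")
result = list(storage_scan(customers, pred))
print([row["name"] for row in result])
```

Only the rows that pass all three conditions cross the link to the database server, which is where the reduction in data movement comes from.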

The Glass Half Full… ✓ Reduce data movement bottlenecks ▪ “All or nothing” behavior makes query planning difficult ✓ Hybrid processing

IBEX’s Hybrid Group-by ▪ Group-by: Compute aggregate function over categories ▪ select avg(salary) from employees group by department [Figure: Ibex with SW-only Group-By: input table → Selection and Projection in Ibex → filtered data → Group-by on the CPU → final groups]

IBEX’s Hybrid Group-by ▪ Group-by: Compute aggregate function over categories ▪ select avg(salary) from employees group by department [Figure: Ibex with HW-only Group-By: input table → Selection, Projection and Group-by in Ibex → final groups delivered to the CPU]

IBEX’s Hybrid Group-by ▪ Group-by: Compute aggregate function over categories ▪ select avg(salary) from employees group by department ▪ If number of groups does not fit on FPGA? ▪ Send partial aggregates – finalize in SW ▪ Worst case: same as no acceleration ▪ Best case: All in HW! [Figure: Ibex with Hybrid Group-by (partial groups from Ibex, final Group-by on the CPU) next to Ibex with HW-only Group-By (final groups from Ibex)] Challenge: How to split across accelerator and software?
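The hybrid scheme above can be sketched as a two-stage aggregation. This is a simplified software model, not IBEX's actual design: `hw_group_by` imitates a fixed-capacity hash table, and the overflow policy (forwarding a tuple as a single-row partial aggregate when the table is full) is an assumption made for illustration. All names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Partial = Tuple[str, float, int]  # (group key, partial sum, partial count)


def hw_group_by(rows: Iterable[Tuple[str, float]], capacity: int) -> List[Partial]:
    """Models the accelerator: aggregates at most `capacity` groups in place."""
    table: Dict[str, Tuple[float, int]] = {}
    out: List[Partial] = []
    for key, value in rows:
        if key in table:
            s, c = table[key]
            table[key] = (s + value, c + 1)
        elif len(table) < capacity:
            table[key] = (value, 1)
        else:
            # Table full (assumed policy): pass the tuple through unaggregated.
            out.append((key, value, 1))
    out.extend((key, s, c) for key, (s, c) in table.items())
    return out


def sw_finalize(partials: Iterable[Partial]) -> Dict[str, float]:
    """Software stage: merge partial sums and counts, then compute AVG."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, s, c in partials:
        sums[key] += s
        counts[key] += c
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical (department, salary) rows.
employees = [("eng", 100.0), ("ops", 60.0), ("eng", 120.0),
             ("sales", 80.0), ("ops", 70.0), ("sales", 90.0)]

for capacity in (0, 2, 10):
    partials = hw_group_by(employees, capacity)
    print(capacity, len(partials), sw_finalize(partials))
```

With capacity 0 every row reaches software (the "same as no acceleration" case); with enough capacity the software stage only sees one partial per group. The result is identical in all cases because AVG is carried as a (sum, count) pair until the final division.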

The Glass Half Full ✓ Reduce data movement bottlenecks ✓ Hybrid Processing ▪ Analytical databases becoming more optimized / not much compute in core SQL ✓ Emerging compute-intensive workloads

The Rise of Machine Learning ▪ Databases adopting new ways of analyzing the data ▪ SAP Hana, Oracle, SQL Server, etc. ▪ Specialized hardware can help with both model building [Kara18] and inference [Owaida18] ▪ Benefits for “classical” algorithms as well [Kara18] [Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018) [Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300

doppioDB: a hybrid database engine ▪ Goal: extend the capabilities of analytical databases ▪ FPGA works on the same data as software (cache-coherent access) ▪ Can combine SW and HW operators inside the same query ▪ Challenge: ensure high utilization of FPGA, use in many queries [Figure: DRAM (DB Tables) shared by the FPGA co-processor, which hosts hardware operators, and the CPU, which runs the database engine (MonetDB) with software operators; no data copy, transformation, partitioning, etc.]

K-means – Algorithm ◼ Goal: partition unlabeled data into several clusters, where the number of clusters is the “k” in k-means. ◼ Two steps in each iteration: ◼ Assignment: assign data points to the closest centroid according to a distance metric ◼ Centroid update: the centroids are re-calculated by averaging all the data points within each cluster ◼ Long process if the data set and number of iterations are large

Design – Execution Walk-Through 1) Receives K-Means parameters 2) Fetches the initial centroids and the data 3) Calculates the distance between a data point and all the centroids and assigns it to the closest centroid 4) Accumulates data points per cluster and counts how many data points are assigned to each cluster 5) Collects partial results from each pipeline 6) Division for updating the new centroids 7) Writes back the final results [Figure: operator block diagram with steps 1 to 7 marked, reading from and writing to DRAM (DB Tables)] Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018
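The walk-through can be mirrored in a short software sketch of one K-means iteration. It is only an illustration of the dataflow, not the operator's implementation: the squared Euclidean distance, the round-robin split of the data across pipelines, and the handling of empty clusters are all assumptions, and every function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def closest_centroid(point: Point, centroids: Sequence[Point]) -> int:
    """Step 3: distance to every centroid, return the index of the nearest."""
    best, best_dist = 0, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, centroid))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def pipeline_accumulate(chunk: Sequence[Point], centroids: Sequence[Point]
                        ) -> Tuple[List[List[float]], List[int]]:
    """Step 4: one pipeline's per-cluster coordinate sums and point counts."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = closest_centroid(point, centroids)
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += point[d]
    return sums, counts


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point],
                     num_pipelines: int = 2) -> List[List[float]]:
    # Step 2: distribute the data (round-robin split is an assumption).
    chunks = [points[i::num_pipelines] for i in range(num_pipelines)]
    partials = [pipeline_accumulate(chunk, centroids) for chunk in chunks]
    dims = len(centroids[0])
    new_centroids: List[List[float]] = []
    for k, old in enumerate(centroids):
        # Step 5: collect the partial results of all pipelines.
        count = sum(counts[k] for _, counts in partials)
        if count == 0:
            new_centroids.append(list(old))  # assumption: keep empty clusters
            continue
        totals = [sum(sums[k][d] for sums, _ in partials) for d in range(dims)]
        # Step 6: division yields the updated centroid.
        new_centroids.append([t / count for t in totals])
    return new_centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
start = [(1.0, 1.0), (9.0, 9.0)]
print(kmeans_iteration(points, start, num_pipelines=2))
```

Because each pipeline only produces sums and counts, the merge in step 5 is cheap and the result does not depend on how many pipelines the data was split across.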

Uses of Parallelism ▪ K-Means algorithm ▪ FPGA outperforms several cores of the CPU ▪ Can use parallelism in two ways – cover more queries [Figure: two panels, “Need to determine K (Elbow method)” and “K is known / Centroids known”]
