Adaptive FPGA-based Database Accelerators – Achievements, Possibilities, and Challenges Daniel Ziener and Jürgen Teich
Database Acceleration – Overview
Idea: Translate each SQL query into an FPGA-based accelerator circuit through run-time assembly of dynamically reconfigurable hardware modules.
SQL query: SELECT Price, Volume FROM Trades WHERE Symbol = "UBSN" INTO UBSTrades
[Figure: the SQL query is mapped onto modules from a hardware module library (a WHERE module with condition a: Symbol = "UBSN", followed by a SELECT module for Price and Volume). On the FPGA (DynSoC), the Trades stream enters the assembled circuit and UBSTrades leaves it.]
Daniel Ziener | 07.03.2017 | Dagstuhl | FPGA-based Database Accelerators – Achievements, Possibilities, and Challenges
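The idea of assembling a query from reusable modules can be modelled in software. The following is a minimal sketch, not the authors' hardware design: the names `restriction`, `projection`, and `assemble` are hypothetical, the sample rows are invented for illustration, and Python generators stand in for streaming hardware modules.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Module = Callable[[Iterator[Row]], Iterator[Row]]


def restriction(predicate: Callable[[Row], bool]) -> Module:
    """Hypothetical WHERE module: forwards only rows that satisfy the predicate."""
    def module(rows: Iterator[Row]) -> Iterator[Row]:
        return (row for row in rows if predicate(row))
    return module


def projection(columns: List[str]) -> Module:
    """Hypothetical SELECT module: keeps only the requested attributes."""
    def module(rows: Iterator[Row]) -> Iterator[Row]:
        return ({c: row[c] for c in columns} for row in rows)
    return module


def assemble(modules: List[Module]) -> Callable[[Iterable[Row]], Iterator[Row]]:
    """Chain modules so that each one consumes the output stream of the previous one."""
    def pipeline(rows: Iterable[Row]) -> Iterator[Row]:
        stream = iter(rows)
        for module in modules:
            stream = module(stream)
        return stream
    return pipeline


# Invented sample data for the Trades table.
trades = [
    {"Symbol": "UBSN", "Price": 15.2, "Volume": 100},
    {"Symbol": "NESN", "Price": 72.0, "Volume": 50},
    {"Symbol": "UBSN", "Price": 15.3, "Volume": 200},
]

# SELECT Price, Volume FROM Trades WHERE Symbol = "UBSN" INTO UBSTrades
query = assemble([
    restriction(lambda row: row["Symbol"] == "UBSN"),
    projection(["Price", "Volume"]),
])
ubs_trades = list(query(trades))
```

Because every module only sees a stream, a different query is a different list passed to `assemble`, which mirrors the run-time assembly idea at a very coarse level.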
Database Acceleration – Architecture
Example queries:
SELECT * FROM table WHERE age > 20
SELECT * FROM table WHERE salary > 10000 AND year < 1990
[Figure: the host is connected via PCIe to the FPGA. The FPGA contains a reconfiguration manager, a module library, and reconfigurable areas. Each reconfigurable area holds a chain built from IN, comparator (>, <), AND, and OUT modules that implements one of the queries.]
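One way to picture how a WHERE clause could turn into a module chain is sketched below. This is an illustration under assumptions: the function `where_to_chain` is hypothetical, it handles only conjunctions of simple `<`, `>`, `=` comparisons, and the module ordering (all comparators first, then the AND modules) is a guess, since the slide does not spell out the placement.

```python
import re
from typing import List


def where_to_chain(where_clause: str) -> List[str]:
    """Translate a conjunction of simple comparisons into a list of module names.

    Assumed layout: IN, one comparator module per predicate,
    one AND module per conjunction, OUT.
    """
    terms = re.split(r"\s+AND\s+", where_clause.strip(), flags=re.IGNORECASE)
    chain = ["IN"]
    for term in terms:
        match = re.fullmatch(r"(\w+)\s*(<|>|=)\s*(\w+)", term.strip())
        if match is None:
            raise ValueError(f"unsupported predicate: {term!r}")
        chain.append(match.group(2))  # the comparator selects the module type
    chain.extend(["AND"] * (len(terms) - 1))
    chain.append("OUT")
    return chain


chain_a = where_to_chain("age > 20")
chain_b = where_to_chain("salary > 10000 AND year < 1990")
```

For the two example queries this yields `["IN", ">", "OUT"]` and `["IN", ">", "<", "AND", "OUT"]`.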
Database Acceleration – Overview Module Library

| Module | Operator Coverage | Number of Slots | Throughput |
|---|---|---|---|
| Restriction | Arithmetic (+,-, ), Comparators (<,>,=,≠), Bitwise functions (AND, OR, NOT, XOR, ...) | 2 | 1 Sample/Cycle |
| Aggregation | SUM(), MIN(), MAX(), COUNT() | 2 | 1 Sample/Cycle |
| Reorder | Reorder attributes of a tuple | 4 | 1 Sample/Cycle |
| Join | Hash and Merge Join | - | 1 Sample/Cycle |
| Sort line | Sorting 2 KB (64 KB) data | 16 | 1 Sample/Cycle |
| Sort tree | Merges sorted blocks | - | 1 Sample/Cycle |

● Each reconfigurable area consists of 16 slots
● 4 reconfigurable areas available on our prototype
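The slot counts suggest a simple budget check before a module chain is placed. The sketch below assumes, which the slide does not state, that slot costs simply add up along a chain and that a chain must fit into a single 16-slot area. Join and sort tree are left out because no slot count is listed for them, and the function names are hypothetical.

```python
from typing import Dict, List

SLOTS_PER_AREA = 16  # from the slide: each reconfigurable area has 16 slots

# Slot counts as listed in the module library table.
MODULE_SLOTS: Dict[str, int] = {
    "Restriction": 2,
    "Aggregation": 2,
    "Reorder": 4,
    "Sort line": 16,
}


def slots_needed(chain: List[str]) -> int:
    """Total slots for a chain, assuming slot costs are purely additive."""
    return sum(MODULE_SLOTS[name] for name in chain)


def fits_in_one_area(chain: List[str]) -> bool:
    """True if the whole chain fits into a single reconfigurable area."""
    return slots_needed(chain) <= SLOTS_PER_AREA


filter_and_aggregate = ["Restriction", "Restriction", "Reorder", "Aggregation"]
sort_then_filter = ["Sort line", "Restriction"]
```

Under these assumptions `filter_and_aggregate` needs 10 slots and fits, while `sort_then_filter` needs 18 slots and would have to be split across areas.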
Database Acceleration – Lessons Learned
● High processing throughput achievable
  ● Pipelined modules have a throughput of 2 GByte/s per reconfigurable area (125 MHz x 16 Bytes)
  ● The throughput is independent of the number of concatenated modules
  ● New architecture: 12.8 GByte/s and 64 Bytes per clock cycle
● I/O turns out to define the bottleneck
  ● PCIe Gen2 x4: 1.7 GByte/s
  ● Only one interface to feed all reconfigurable areas
  ● New architecture: DDR3 memory with 12.8 GByte/s
● Flexibility is the key feature
  ● For each query, different decisions can be taken at run-time (hash join or merge join, row-based or column-based)
  ● All processing alternatives can be executed on the same static system
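The throughput figures can be checked with a few lines of arithmetic. The sketch below uses only numbers from the slides (125 MHz, 16 Bytes, 1.7 GByte/s, 12.8 GByte/s, 64 Bytes, 4 reconfigurable areas). Two points are assumptions: GByte is taken as 10^9 Bytes, and the 200 MHz clock is inferred from 12.8 GByte/s at 64 Bytes per cycle rather than stated by the authors.

```python
MHZ = 1_000_000
GBYTE = 1_000_000_000  # assumption: decimal gigabytes


def throughput_gbyte_per_s(clock_hz: int, bytes_per_cycle: int) -> float:
    """Streaming throughput of a pipeline that accepts one word per clock cycle."""
    return clock_hz * bytes_per_cycle / GBYTE


# Old architecture: one reconfigurable area at 125 MHz with 16-Byte words.
area_rate = throughput_gbyte_per_s(125 * MHZ, 16)

# Four areas could consume four times that rate, but share one PCIe link.
num_areas = 4
pcie_rate = 1.7
aggregate_rate = num_areas * area_rate
pcie_share = pcie_rate / aggregate_rate  # fraction of module capacity PCIe can feed

# New architecture: clock implied by 12.8 GByte/s at 64 Bytes per cycle.
implied_clock_mhz = 12_800_000_000 / 64 / MHZ

print(f"per area: {area_rate} GByte/s, all areas: {aggregate_rate} GByte/s")
print(f"PCIe feeds about {pcie_share:.0%} of the aggregate module capacity")
print(f"implied clock of the new architecture: {implied_clock_mhz} MHz")
```

The gap between the aggregate module rate and the single PCIe link is what the slide refers to as the I/O bottleneck; matching the datapath to the DDR3 bandwidth removes that mismatch on the memory side.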
Database Acceleration – New High-Performance Architecture
[Figure: incoming queries, host, database tables, and an FPGA with a configuration manager and a reconfigurable area. The reconfigurable area contains an alignment unit, a Bloom filter, a hash join, comparator (>, =), arithmetic (+), and aggregation modules. A timeline for host and FPGA shows query analysis and filter configuration alongside repeated data processing phases.]
Database Acceleration – Results (FPT’15)
● Comparing energy/power consumption of an Intel Core i7 with our approach based on an embedded Xilinx Zynq SoC
● Analysis of an example query based on the TPC-DS benchmark (1 GB scale), including restrictions, aggregations, and joins

| | Accl@Zynq | ARM – MySQL | Intel i7 – MySQL |
|---|---|---|---|
| Execution time | 44.2 ms | 6900 ms | 420 ms |
| Overall energy | 190 mJ | 1.47 J | 5.33 J |
| Improvement (execution time) | | 156 | 9.5 |
| Improvement (energy) | | 7.72 | 27.97 |
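The improvement factors follow from dividing each software baseline by the accelerator value. The sketch below recomputes them. The assignment of values to systems (44.2 ms and 190 mJ for the accelerator, 6900 ms and 1.47 J for the ARM, 420 ms and 5.33 J for the i7) is inferred from the reported factors of 156 and 9.5 and should be treated as an assumption. The dictionary keys are hypothetical names.

```python
# Values from the slide; the mapping to systems is inferred, not stated explicitly.
measurements = {
    "accl_zynq": {"time_ms": 44.2, "energy_j": 0.190},
    "arm_mysql": {"time_ms": 6900.0, "energy_j": 1.47},
    "i7_mysql": {"time_ms": 420.0, "energy_j": 5.33},
}


def improvement(baseline: str, metric: str) -> float:
    """Ratio of a baseline measurement to the accelerator measurement."""
    return measurements[baseline][metric] / measurements["accl_zynq"][metric]


speedup_arm = improvement("arm_mysql", "time_ms")
speedup_i7 = improvement("i7_mysql", "time_ms")
energy_gain_arm = improvement("arm_mysql", "energy_j")
energy_gain_i7 = improvement("i7_mysql", "energy_j")

for name, value in [
    ("speedup vs ARM", speedup_arm),
    ("speedup vs i7", speedup_i7),
    ("energy gain vs ARM", energy_gain_arm),
    ("energy gain vs i7", energy_gain_i7),
]:
    print(f"{name}: {value:.2f}")
```

The recomputed time ratios match the reported 156 and 9.5. The energy ratios come out slightly above the reported 7.72 and 27.97, presumably because the slide shows rounded energy values.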
More Information:
[1] D. Ziener, F. Bauer, A. Becher, C. Dennl, K. Meyer-Wegener, U. Schürfeld, J. Teich, J. Vogt and H. Weber. FPGA-Based Dynamically Reconfigurable SQL Query Processing. ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 9, no. 4, Article 25, July 2016.
[2] A. Becher, D. Ziener, K. Meyer-Wegener and J. Teich. A Co-Design Approach for Accelerated SQL Query Processing via FPGA-based Data Filtering. In Proceedings of 2015 International Conference on Field-Programmable Technology (FPT '15), Queenstown, New Zealand, December 7-9, 2015.
Current Database Management Systems
● Database management systems are multi-user systems
  ● Queries of differing complexity have to be processed on different data at the same time
  ● Response time is very important
● Many different kinds of operations
  ● Query processing
  ● Sorting
  ● Data analytics
  ● Data update
● Changing load scenarios over time
  ● E.g., day: query processing; night: data analytics
HW Accelerators for Big Data Applications
● Current software solutions
  ● Multi-core server systems with many nodes
  ● On each core, data processing is done with data or time slices
  ● Advantages:
    ● OS support (task switching, mapping onto processing resources)
    ● Easy to extend with new operators or analytic functions
Question: How can we achieve such flexibility for HW-based accelerators?
[Figure: a host, an FPGA with a module library (>, <, &) and external memory, and SSDs, connected via PCIe/CAPI and SATA.]