  1. Hardware Acceleration of Database Operations
     Jared Casper and Kunle Olukotun
     Pervasive Parallelism Laboratory, Stanford University

  2. Database machines
     - Database machines from the late 1970s
       - Put some compute on the disk track/head/unit
       - Processors got faster; I/O performance did not
       - The processor could keep up with the disk: no performance left on the table
     - Today's database machines
       - Made up of general-purpose components
       - Massive amounts of memory
       - Very high-speed interconnect
       - Tables, even whole databases, fit entirely within memory

  3. Database Operation Acceleration
     - Processors cannot keep up with memory
       - Join performance is at 100s of millions of tuples per second
       - With 64-bit tuples, that is 2-3 GB/s; chips can get over 100 GB/s
       - Performance is being left on the table
     - Follow the 10x10 rule: build accelerators
     - Three acceleration blocks: selection, merge join, sort
       - Combine these to do a sort-merge join
     - Goal is to "keep up with memory"

  4. Select
     [diagram: a bitmask selects elements from an input column, packing the
      selected values contiguously into the output]
     - Software implementation uses SIMD
       - Read data into a SIMD register
       - Use a SIMD shuffle operation to move the selected data to one end of the register
       - The mask is used as an index into a table of shuffle values
       - An unaligned write appends the result to the output
     - Limited by SIMD width and the number of SIMD registers
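The mask-indexed shuffle scheme on this slide can be sketched in scalar C++. This is our own illustration, not the talk's code: a precomputed table maps each 4-bit predicate mask to the lane offsets that pack the surviving elements to the front, mirroring the SIMD shuffle-then-append approach (names like `select_by_mask` are ours).

```cpp
#include <array>
#include <cstdint>
#include <vector>

// For each of the 16 possible 4-bit masks, precompute which input lanes
// survive, in order — the scalar analogue of the SIMD shuffle-value table.
inline std::array<std::array<uint8_t, 4>, 16> make_shuffle_table() {
    std::array<std::array<uint8_t, 4>, 16> table{};
    for (int mask = 0; mask < 16; ++mask) {
        int out = 0;
        for (uint8_t lane = 0; lane < 4; ++lane)
            if (mask & (1 << lane)) table[mask][out++] = lane;
    }
    return table;
}

// Apply one predicate mask per 4-element group: "shuffle" the selected
// values to the front of the group, then append them to the output.
std::vector<int64_t> select_by_mask(const std::vector<int64_t>& col,
                                    const std::vector<uint8_t>& masks) {
    static const auto table = make_shuffle_table();
    std::vector<int64_t> out;
    for (std::size_t g = 0; g * 4 < col.size(); ++g) {
        uint8_t m = masks[g];
        int n = 0;                                  // survivors in this group
        for (int b = 0; b < 4; ++b) n += (m >> b) & 1;
        for (int i = 0; i < n; ++i)
            out.push_back(col[g * 4 + table[m][i]]);
    }
    return out;
}
```

A real SIMD version does the same per 16-byte register with a shuffle instruction; the scalar loop only shows the data movement.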

  5. Select
     [figure: select block datapath example]

  6. Merge Join
     - Scan two sorted columns, output matching values
       - Can have associated values or record IDs
       - Output the cross product when a key appears multiple times
     - Generally viewed as the "free" step after sorting
       - More an indication of how slow sorting is
     - Software implementations have bad branching behaviour
       - Limits the IPC, making it hard to keep up with memory
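The merge-join scan described above, including the cross product over runs of equal keys, can be sketched as plain scalar code (our illustration; the branchy inner loop is exactly what limits IPC in software implementations):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Scan two sorted key columns; for each run of equal keys, emit the
// cross product of matching row IDs as (left_row, right_row) pairs.
std::vector<std::pair<std::size_t, std::size_t>>
merge_join(const std::vector<int64_t>& left,
           const std::vector<int64_t>& right) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] < right[j]) ++i;          // advance the smaller side
        else if (left[i] > right[j]) ++j;
        else {
            int64_t key = left[i];            // find both equal-key runs
            std::size_t i_end = i, j_end = j;
            while (i_end < left.size() && left[i_end] == key) ++i_end;
            while (j_end < right.size() && right[j_end] == key) ++j_end;
            for (std::size_t a = i; a < i_end; ++a)      // cross product
                for (std::size_t b = j; b < j_end; ++b)
                    out.emplace_back(a, b);
            i = i_end;
            j = j_end;
        }
    }
    return out;
}
```

Every iteration takes a data-dependent branch, which is the bad branching behaviour the slide refers to; the hardware block avoids it with wide parallel comparisons.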

  7. Merge Join
     - Output is a bitmask of equal keys with the corresponding values
       - Ready for input into the select block

  8. Merge Sort
     Input:     4 8 | 2 1 | 5 5 | 7 0
     1st pass:  4 8 | 1 2 | 5 5 | 0 7
     2nd pass:  1 2 4 8 | 0 5 5 7
     3rd pass:  0 1 2 4 5 5 7 8
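The passes on this slide are a bottom-up merge sort: pass k merges sorted runs of length 2^(k-1) into runs of length 2^k, so n values need ceil(log2 n) passes (in the hardware design, one merge level per pass). A minimal software sketch of the same schedule, in our own code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Bottom-up merge sort: each outer iteration is one "pass" from the slide,
// doubling the length of the sorted runs until the whole array is one run.
void bottom_up_merge_sort(std::vector<int64_t>& v) {
    std::vector<int64_t> buf(v.size());
    for (std::size_t run = 1; run < v.size(); run *= 2) {
        for (std::size_t lo = 0; lo < v.size(); lo += 2 * run) {
            std::size_t mid = std::min(lo + run, v.size());
            std::size_t hi  = std::min(lo + 2 * run, v.size());
            // Merge the two adjacent sorted runs [lo,mid) and [mid,hi).
            std::merge(v.begin() + lo, v.begin() + mid,
                       v.begin() + mid, v.begin() + hi,
                       buf.begin() + lo);
        }
        std::copy(buf.begin(), buf.end(), v.begin());
    }
}
```

Running this on the slide's input {4, 8, 2, 1, 5, 5, 7, 0} reproduces the three passes shown, ending at {0, 1, 2, 4, 5, 5, 7, 8}.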

  9. Merge Sort Level
     [figure: one level of the merge sort tree]

  10. High Bandwidth Sort Merge Node
      [figure: high-bandwidth sort merge node architecture]

  11. Sort Merge Join
      - The sort, merge join, and select blocks are combined to perform a full
        sort-merge join in hardware
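The composition of blocks on this slide can be illustrated end to end in software. This is our own fused sketch, not the hardware design: both key columns are sorted, then streamed through a merge-join step (duplicates matched one-to-one here for brevity; the full join emits the cross product, as on slide 6).

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Sort both inputs (the sort blocks), then scan them together
// (the merge-join block), emitting matched key pairs.
std::vector<std::pair<int64_t, int64_t>>
sort_merge_join(std::vector<int64_t> l, std::vector<int64_t> r) {
    std::sort(l.begin(), l.end());
    std::sort(r.begin(), r.end());
    std::vector<std::pair<int64_t, int64_t>> out;
    std::size_t i = 0, j = 0;
    while (i < l.size() && j < r.size()) {
        if (l[i] < r[j]) ++i;
        else if (l[i] > r[j]) ++j;
        else { out.emplace_back(l[i], r[j]); ++i; ++j; }
    }
    return out;
}
```

In the hardware version the three blocks run as a pipeline, so the join streams at the rate of the slowest block rather than running the phases back to back.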

  12. Prototyping Platform: Maxeler

  13. Select Throughput
      [plot: throughput (GB/s) and % of line bandwidth vs. selection
       cardinality (%); the memory system is saturated]
      - Software achieved 7 GB/s (33% of line bandwidth)
      - STREAM achieved 12 GB/s (57%)

  14. Select Resources
      [plot: resource counts (ROM bits, 16:1 muxes, 4:1 muxes, registers) vs.
       throughput in bytes/clock, i.e. 24-132 GB/s at 400 MHz]

  15. Merge Join Throughput
      [plot: throughput (GB/s) and % of total line throughput vs. output
       ratio, for m = 1, 2, 3, 8]
      - Resources required are a quadratic function of the desired bandwidth
        - All in comparison logic; routing was the limiting factor
      - Above a 1.5x output ratio, write bandwidth dominates
        - Throughput above is input consumed

  16. Sort Throughput
      [plot: millions of values sorted per second vs. input size (375K to
       50B values), for 2 passes, 3 passes, and 3 passes (projected)]
      - Resources required are a linear function of the desired input size
        - Dominated by the memory required to hold working sets
      - Recent CPU/GPU numbers are ~300M 32-bit values per second

  17. Sort Merge Join
      - Performance limited by the intra-FPGA link
      - Total throughput is 800 million tuples/second (~6.5 GB/s)
      - 8x previous work on software joins

  18. Conclusions
      - FPGAs can be used to saturate memory bandwidth in ways that
        processors cannot
        - Make the most of every byte read
      - In some cases, address bandwidth is just as important as raw data
        bandwidth
      - Scaling a design to high bandwidths can greatly influence the
        architecture
        - Think streaming
      - Next step is to interact with the rest of the system

  19. Questions?
