BitWeaving: Fast Scans for Main Memory Data Processing
Yinan Li and Jignesh M. Patel
University of Wisconsin-Madison
Motivation - Example: TPC-H Query 6
  SELECT SUM(l_extendedprice * l_discount)
  FROM lineitem
  WHERE l_shipdate BETWEEN Date AND Date + 1 year
    AND l_discount BETWEEN Discount - 0.01 AND Discount + 0.01
    AND l_quantity < Quantity
• Main memory analytics DBMSs convert native column values to codes.
• Code sizes in this query: l_shipdate 12 bits, l_discount 4 bits, l_quantity 6 bits.
• Word size: 64 bits. SIMD word size: 256 bits. Code size: 4-12 bits.
• A CPU register that holds a single code underutilizes the processor word!
• A CPU register that holds several codes at once: intra-cycle parallelism!
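The gap between code size and word size can be made concrete with a little arithmetic. The following is a minimal sketch, not the talk's actual storage format: the helper names `codes_per_word` and `pack` are hypothetical, and the packing simply places codes side by side with no padding.

```python
WORD_BITS = 64  # word size quoted on the slide


def codes_per_word(code_bits, word_bits=WORD_BITS):
    """How many whole codes of `code_bits` bits fit into one word."""
    return word_bits // code_bits


def pack(codes, code_bits, word_bits=WORD_BITS):
    """Pack codes side by side into words, first code in the lowest bits."""
    per_word = codes_per_word(code_bits, word_bits)
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for slot, code in enumerate(codes[start:start + per_word]):
            if not 0 <= code < (1 << code_bits):
                raise ValueError("code does not fit in code_bits")
            word |= code << (slot * code_bits)
        words.append(word)
    return words


# Code sizes from the slide: 12 bits (l_shipdate), 4 bits (l_discount), 6 bits (l_quantity).
for bits in (12, 4, 6):
    print(bits, "bit codes per 64-bit word:", codes_per_word(bits))
```

With one code per register, a 4-bit code uses 4 of 64 bits; packed this way, the same register could carry 16 of them.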
BitWeaving
• In this talk, we introduce BitWeaving
  – A fast scan method for column-oriented databases
• Fully exploits intra-cycle parallelism
• How: By “gainfully” using every bit in every processor word.
BitWeaving: Two Flavors
• BitWeaving/H (Horizontal bit organization)
• BitWeaving/V (Vertical bit organization)
[Figure: for each flavor, how the bits of each code are arranged within processor words]
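The difference between the two bit organizations can be illustrated on a handful of small codes. This is a simplified sketch under stated assumptions: the function names are hypothetical, the horizontal variant omits any padding or delimiter bits a full implementation might add, and the vertical variant assumes that word i holds bit i of every code, most significant bit first.

```python
def horizontal_layout(codes, code_bits):
    """Place whole codes side by side in one word, first code in the high bits."""
    word = 0
    for code in codes:
        word = (word << code_bits) | code
    return word


def vertical_layout(codes, code_bits):
    """Return `code_bits` words; word i holds bit i (MSB first) of every code."""
    words = []
    for i in range(code_bits):
        shift = code_bits - 1 - i
        word = 0
        for code in codes:
            word = (word << 1) | ((code >> shift) & 1)
        words.append(word)
    return words


codes = [0b011, 0b001, 0b110, 0b100]  # four illustrative 3-bit codes
print(format(horizontal_layout(codes, 3), "012b"))
print([format(w, "04b") for w in vertical_layout(codes, 3)])
```

In the horizontal form each code stays contiguous; in the vertical form a single word carries one bit position from many codes.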
Framework
• Targets single-table scans
• Column-scalar scan: scan on a single column
  – produce a result bit vector, with one bit for each input tuple to indicate the matching tuples
• Complex predicates in the scan: logical AND and OR operations on these result bit vectors
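The framework above can be sketched in a few lines. This is an illustration only: it evaluates the predicate one code at a time rather than with the word-parallel BitWeaving scan, the function name `column_scalar_scan` is hypothetical, and the column values are made up. A Python integer stands in for the result bit vector, with bit i set when tuple i matches.

```python
def column_scalar_scan(codes, predicate):
    """Scan one column and return a result bit vector (bit i = tuple i matches)."""
    bits = 0
    for i, code in enumerate(codes):
        if predicate(code):
            bits |= 1 << i
    return bits


# Made-up codes for two columns of a four-tuple table.
quantity = [5, 30, 12, 8]
discount = [3, 4, 5, 4]

q_bits = column_scalar_scan(quantity, lambda c: c < 24)
d_bits = column_scalar_scan(discount, lambda c: 3 <= c <= 4)

both = q_bits & d_bits    # complex predicate: AND of the two result bit vectors
either = q_bits | d_bits  # complex predicate: OR of the two result bit vectors
print(format(both, "04b"), format(either, "04b"))
```

Because every scan yields the same kind of bit vector, conjunctions and disjunctions reduce to bitwise operations on those vectors.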
Framework – Example
  SELECT SUM(l_discount * l_price)
  FROM lineitem
  WHERE l_shipdate BETWEEN Date AND Date + 1 year
    AND l_discount BETWEEN Discount - 0.01 AND Discount + 0.01
    AND l_quantity < Quantity
[Figure: query plan]
• Scans on l_quantity, l_shipdate, and l_discount each produce a result bit vector.
• The result bit vectors are combined with AND into a single result bit vector.
• Convert to a RID list (RID List: 9, 15).
• Fetch codes from the projection columns (l_price, l_discount).
• Aggregation.
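The last steps of the plan, converting the final result bit vector to a RID list and aggregating over the projection columns, can be sketched as follows. The function names are hypothetical, the column contents are invented, and the sketch assumes the fetched values are already decoded integers rather than codes.

```python
def bitvector_to_rids(bits):
    """Return the positions of the set bits in ascending order."""
    rids = []
    rid = 0
    while bits:
        if bits & 1:
            rids.append(rid)
        bits >>= 1
        rid += 1
    return rids


def aggregate(rids, price, discount):
    """SUM(l_discount * l_price) over the matching tuples only."""
    return sum(discount[r] * price[r] for r in rids)


# A result bit vector with tuples 9 and 15 matching, as in the slide's RID list.
result_bits = (1 << 9) | (1 << 15)
rids = bitvector_to_rids(result_bits)

# Invented projection columns for a 16-tuple table.
l_price = list(range(100, 116))
l_discount = [2] * 16
print(rids, aggregate(rids, l_price, l_discount))
```

Only the tuples named in the RID list are fetched from the projection columns, so the cost of the aggregation depends on how selective the scan is.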
Framework – Example: BitWeaving/V columns
• In the same query plan, every column is stored as a BitWeaving/V column: the scanned columns (l_quantity, l_shipdate, l_discount) and the projection columns (l_price, l_discount).
Framework – Example: Mixing of BitWeaving/V and BitWeaving/H columns
• In the same query plan, some columns are stored as BitWeaving/V columns and others as BitWeaving/H columns.
• Both produce the same result format (a result bit vector), so their scans can be combined.
Outline
• Motivation & Overview
• BitWeaving/V
• BitWeaving/H
• Evaluations
• Conclusion