GPU-Accelerated Analytics on your Data Lake.
Data Lake @blazingdb
Data Swamp
ETL Hell
[Diagram: DATA LAKE]
COMMON DATA LAYER
Simplify Data Storage
[Diagram labels: SCHEMA, METADATA, DATA]
SQL Warehouse on Data Lake
BlazingDB – How it works
• Compression/Decompression
• Filtering (Predicate Pushdown)
• Aggregations
• Transformations
• Joins
• Sorting/Ordering
Caching
• RAM Cache (Hot)
• Disk Cache (Medium)
Storage
• Local Disk (HDD, SSD)
• HDFS
• AWS S3
[Diagram: DATA LAKE]
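As a rough illustration of the Hot / Medium / Cold tiers named on this slide, the Python sketch below reads a block through a RAM cache, then a disk cache, then the data lake. Every name in it (`TieredReader`, `fetch_from_lake`, the dict-backed tiers) is hypothetical and chosen for illustration only; it is not BlazingDB's implementation or API.

```python
from typing import Callable, Dict, Tuple


class TieredReader:
    """Hypothetical read-through cache: RAM (hot) -> disk (medium) -> data lake (cold)."""

    def __init__(self, fetch_from_lake: Callable[[str], bytes]) -> None:
        self.ram: Dict[str, bytes] = {}    # stands in for the RAM cache
        self.disk: Dict[str, bytes] = {}   # stands in for the local disk cache
        self.fetch_from_lake = fetch_from_lake  # stands in for an HDFS / S3 read

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return the block and the name of the tier that served it."""
        if key in self.ram:
            return self.ram[key], "hot"
        if key in self.disk:
            data = self.disk[key]
            self.ram[key] = data           # promote the block to RAM
            return data, "medium"
        data = self.fetch_from_lake(key)   # cold read from the data lake
        self.disk[key] = data
        self.ram[key] = data
        return data, "cold"

    def drop_ram(self) -> None:
        """Empty the RAM tier, leaving only the disk cache populated."""
        self.ram.clear()
```

The first read of a block is cold, a repeat read is hot, and a read after `drop_ram()` is served from the disk tier, which mirrors the "Medium (Disk cache only)" condition used in the demo timings.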
BlazingDB Multi-nodal Cluster
Shared Data Architecture
[Diagram: DATA LAKE]
The Nays
• No Ingest
• No Duplication
• No BlazingDB Specific ETL
• No Consistency Management
• No Vendor Lock-in
The Yays
• Incredibly Scalable, Fast SQL Data Warehouse
• Multi-Terabyte On Demand Queries
• Data Sharing (Across Clusters And Other Tools)
• High Concurrency
DEMO
Demo - Architecture
• HDFS on Azure
• Azure GPU Servers NC24 V1
• 4 Servers
Queries: BlazingDB 4 Node Query times (Lower is better)
[Bar chart: query time in SECONDS for Query 1 through Query 5, each run Cold, Medium (Disk cache only), and Hot. Bar labels as extracted: 380.5, 281.1, 251.8, 154.1, 142.1, 135.5, 73.6, 73.8, 72, 63.1, 46, 46.3, 14.9, 14, 12.2]
Query 1
    select l_returnflag, l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_disc_price,
        sum(l_extendedprice*(1-l_discount)) as sum_base_price,
        sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(l_quantity) as count_order
    from lineitem
    where l_shipdate <= '1995-06-01'
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus;
Data Points Query 1
• 6 billion row table
• Many aggregations/transformations
[Chart: query time in SECONDS for Cold, Medium (Disk cache only), and Hot runs]
Query 2
    select lineitem.l_orderkey,
        sum(lineitem.l_extendedprice*(1-lineitem.l_discount)) as revenue,
        orders.o_orderdate, orders.o_shippriority
    from customer
        inner join orders on customer.c_custkey = orders.o_custkey
        inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
    where customer.c_mktsegment = 'BUILDING'
        and orders.o_orderdate < '1995-03-15'
        and lineitem.l_shipdate > '1995-03-15'
    group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
    order by revenue desc, orders.o_orderdate;
Data Points Query 2
• Join 6B rows to 1.5B rows to 150M rows
• Many aggregations/transformations
• Order (sorting)
[Chart: query time in SECONDS for Cold, Medium (Disk cache only), and Hot runs]
Query 3
    select nation.name,
        sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue
    from customer
        inner join orders on customer.cust_key = orders.o_custkey
        inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
        inner join supplier on lineitem.l_suppkey = supplier.s_suppkey
        inner join nation on supplier.s_nationkey = nation.nation_key
        inner join region on nation.region_key = region.r_regionkey
    where supplier.s_nationkey = nation.nation_key
        and region.r_name = 'ASIA'
        and orders.o_orderdate >= '19940101'
        and orders.o_orderdate < '19950101'
    group by nation.name
    order by revenue desc
Data Points Query 3
• Join 6B rows to 1.5B rows to 150M rows (and many small joins)
• Multiple aggregations/transformations
• Order (sorting)
[Chart: query time in SECONDS for Cold, Medium (Disk cache only), and Hot runs]
Query 4
    select sum(l_extendedprice) as sum_exprice,
        sum(l_discount) as sum_discount
    from lineitem
    where l_shipdate >= '19940101'
        and l_shipdate < '19950101'
        and l_discount >= 0.05 and l_discount <= 0.07
        and l_quantity < 24
Data Points Query 4
• 6B row table
• Multiple aggregations/transformations
[Chart: query time in SECONDS for Cold, Medium (Disk cache only), and Hot runs]
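To make the filter-then-aggregate shape of Query 4 concrete, here is a minimal sketch that runs the same SQL against a five-row toy `lineitem` table. It uses Python's built-in `sqlite3` on the CPU purely as a stand-in: it says nothing about how BlazingDB executes the query or about the timings on this slide. The helper `run_query_4` and the toy rows are assumptions made for illustration.

```python
import sqlite3

# Query 4 as shown on the slide.
QUERY_4 = """
select sum(l_extendedprice) as sum_exprice,
       sum(l_discount) as sum_discount
from lineitem
where l_shipdate >= '19940101'
  and l_shipdate < '19950101'
  and l_discount >= 0.05 and l_discount <= 0.07
  and l_quantity < 24
"""

# Toy rows: (l_extendedprice, l_discount, l_quantity, l_shipdate)
TOY_ROWS = [
    (100.0, 0.06, 10, "19940315"),  # passes every predicate
    (200.0, 0.05, 23, "19941231"),  # passes every predicate
    (300.0, 0.08, 10, "19940601"),  # rejected: discount above 0.07
    (400.0, 0.06, 30, "19940601"),  # rejected: quantity not below 24
    (500.0, 0.06, 10, "19950101"),  # rejected: ship date outside 1994
]


def run_query_4(rows):
    """Load rows into an in-memory SQLite table and run Query 4 on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table lineitem ("
        "l_extendedprice real, l_discount real, "
        "l_quantity integer, l_shipdate text)"
    )
    conn.executemany("insert into lineitem values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY_4).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_query_4(TOY_ROWS))
```

Only the first two rows survive the predicates, so the sums cover two of the five rows. Because the ship dates are stored as fixed-width `YYYYMMDD` strings, plain string comparison orders them chronologically.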
Query 5
    select supplier.s_acctbal, supplier.s_suppkey, nation.name,
        part.p_partkey, part.p_mfgr, supplier.s_address,
        supplier.s_phone, supplier.s_comment
    from supplier
        inner join partsupp on supplier.s_suppkey = partsupp.ps_suppkey
        inner join nation on supplier.s_nationkey = nation.nation_key
        inner join region on nation.region_key = region.r_regionkey
        inner join part on part.p_partkey = partsupp.ps_partkey
    where part.p_size = 15
        and part.p_type in (
            'ECONOMY ANODIZED BRASS', 'ECONOMY BRUSHED BRASS',
            'ECONOMY BURNISHED BRASS', 'ECONOMY PLATED BRASS',
            'ECONOMY POLISHED BRASS', 'LARGE ANODIZED BRASS',
            'LARGE BRUSHED BRASS', 'LARGE BURNISHED BRASS',
            'LARGE PLATED BRASS', 'LARGE POLISHED BRASS',
            'SMALL ANODIZED BRASS', 'SMALL BRUSHED BRASS',
            'SMALL BURNISHED BRASS', 'SMALL PLATED BRASS',
            'SMALL POLISHED BRASS', 'STANDARD ANODIZED BRASS',
            'STANDARD BRUSHED BRASS', 'STANDARD BURNISHED BRASS',
            'STANDARD PLATED BRASS', 'STANDARD POLISHED BRASS')
        and region.r_name = 'EUROPE'
    order by supplier.s_acctbal desc, supplier.s_suppkey, nation.name, part.p_partkey
Data Points Query 5
• Join multiple tables
• Many aggregations/transformations
• String comparisons
[Chart: query time in SECONDS for Cold, Medium (Disk cache only), and Hot runs]
Data Pipeline Coming Soon
[Diagram labels: INGEST, Common Data Layer, STORAGE (Data Lake), GPU Data Frame, Apache Arrow]
Questions?