GPU-Accelerated Analytics on your Data Lake @blazingdb


  1. GPU-Accelerated Analytics on your Data Lake.

  2. Data Lake @blazingdb

  3. Data Swamp @blazingdb

  4. ETL Hell (diagram: many data streams flowing into and out of the DATA LAKE) @blazingdb

  5. COMMON DATA LAYER @blazingdb

  6. Simplify Data Storage SCHEMA METADATA DATA @blazingdb

  7. SQL Warehouse on Data Lake @blazingdb

  8. BlazingDB – How it works • Compression/Decompression • Filtering (Predicate Pushdown) • Aggregations • Transformations • Joins • Sorting/Ordering • RAM Cache (Hot) • Disk Cache (Medium): HDD, SSD • DATA LAKE: Local Disk, HDFS, AWS S3 @blazingdb
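
The RAM cache (hot), disk cache (medium), and data lake tiers listed on this slide suggest a layered read path. The following is a minimal Python sketch of that idea, not BlazingDB's implementation: the `TieredCache` class, its method names, and the `lineitem/part-0` key are all hypothetical, and plain dictionaries stand in for RAM, local disk, and the data lake.

```python
from typing import Callable, Dict, Tuple


class TieredCache:
    """Toy three-tier read path: RAM (hot), disk (medium), data lake (cold)."""

    def __init__(self, read_from_lake: Callable[[str], bytes]) -> None:
        self.ram: Dict[str, bytes] = {}    # stands in for the RAM cache
        self.disk: Dict[str, bytes] = {}   # stands in for the local disk cache
        self.read_from_lake = read_from_lake

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return (data, tier), promoting the data to faster tiers on the way."""
        if key in self.ram:
            return self.ram[key], "hot"
        if key in self.disk:
            data = self.disk[key]
            self.ram[key] = data           # promote to RAM
            return data, "medium"
        data = self.read_from_lake(key)    # slowest path: go to the data lake
        self.disk[key] = data
        self.ram[key] = data
        return data, "cold"

    def drop_ram(self) -> None:
        """Simulate losing the RAM cache while the disk cache survives."""
        self.ram.clear()


# Hypothetical data lake holding one compressed chunk.
lake = {"lineitem/part-0": b"compressed column chunk"}
cache = TieredCache(lake.__getitem__)

tiers = [cache.get("lineitem/part-0")[1]]      # first read comes from the lake
tiers.append(cache.get("lineitem/part-0")[1])  # repeat read is served from RAM
cache.drop_ram()
tiers.append(cache.get("lineitem/part-0")[1])  # disk cache only
print(tiers)  # ['cold', 'hot', 'medium']
```

The three reads correspond to the Cold, Hot, and Medium (disk cache only) conditions used in the benchmark slides.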

  9. BlazingDB Multi-nodal Cluster @blazingdb

  10. Shared Data Architecture DATA LAKE @blazingdb

  11. The Nays • No Ingest • No Duplication • No BlazingDB-Specific ETL • No Consistency Management • No Vendor Lock-in @blazingdb

  12. The Yays • Incredibly Fast SQL Data Warehouse • Scalable, On Demand • Multi-Terabyte Queries • Data Sharing (Across Clusters And Other Tools) • High Concurrency @blazingdb

  13. DEMO @blazingdb

  14. Demo - Architecture HDFS on Azure Azure GPU Servers NC24 V1 • 4 Servers @blazingdb

  15. Queries: BlazingDB 4 Node Query times (Lower is better). Bar chart: SECONDS per query (Query 1 to Query 5) for Cold, Medium (Disk cache only), and Hot runs. Bar labels as extracted: 380.5, 281.1, 251.8, 154.1, 142.1, 135.5, 73.6, 73.8, 72, 63.1, 46, 46.3, 14.9, 14, 12.2. @blazingdb
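
To get a rough feel for how much caching helps, the bar labels can be turned into cold-to-hot ratios. The Python sketch below rests on an assumption the slide text does not confirm: that each consecutive group of three labels is one query's Cold, Medium, and Hot time, in that order. It deliberately does not say which query each group belongs to.

```python
# Bar labels from the slide, in the order they appear in the text.
labels = [380.5, 281.1, 251.8, 154.1, 142.1, 135.5,
          73.6, 73.8, 72, 63.1, 46, 46.3, 14.9, 14, 12.2]

# ASSUMPTION: consecutive triples are (cold, medium, hot) for one query each.
triples = [tuple(labels[i:i + 3]) for i in range(0, len(labels), 3)]

# Ratio > 1 means the hot run was faster than the cold run.
ratios = [cold / hot for cold, _, hot in triples]

for (cold, medium, hot), ratio in zip(triples, ratios):
    print(f"cold={cold:>6} medium={medium:>6} hot={hot:>6} cold/hot={ratio:.2f}")
```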

  16. Query 1 @blazingdb

          select l_returnflag, l_linestatus,
                 sum(l_quantity) as sum_qty,
                 sum(l_extendedprice) as sum_base_price,
                 sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
                 sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
                 avg(l_quantity) as avg_qty,
                 avg(l_extendedprice) as avg_price,
                 avg(l_discount) as avg_disc,
                 count(l_quantity) as count_order
          from lineitem
          where l_shipdate <= '1995-06-01'
          group by l_returnflag, l_linestatus
          order by l_returnflag, l_linestatus;

      Data Points Query 1 • 6 billion row table • Many aggregations/transformations

      Chart: SECONDS for Cold, Medium (Disk cache only), and Hot runs
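
One way to sanity-check the shape of Query 1 is to run it against a toy `lineitem` table. The sketch below uses Python's built-in `sqlite3`, not BlazingDB. The four sample rows are made up for illustration, and the column spellings and alias order follow the standard TPC-H `lineitem` schema, which is an assumption about the schema used in the demo.

```python
import sqlite3

QUERY_1 = """
select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
       sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(l_quantity) as count_order
from lineitem
where l_shipdate <= '1995-06-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
"""

# Illustrative rows only (not from the 6 billion row benchmark table):
# (returnflag, linestatus, quantity, extendedprice, discount, tax, shipdate)
sample_rows = [
    ("A", "F", 10, 100.0, 0.1, 0.0, "1995-01-01"),
    ("A", "F", 20, 200.0, 0.0, 0.0, "1995-05-01"),
    ("N", "O", 5, 50.0, 0.0, 0.1, "1995-06-01"),
    ("N", "O", 7, 70.0, 0.0, 0.0, "1996-01-01"),  # removed by the date filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table lineitem (l_returnflag text, l_linestatus text, "
    "l_quantity integer, l_extendedprice real, l_discount real, "
    "l_tax real, l_shipdate text)"
)
conn.executemany("insert into lineitem values (?, ?, ?, ?, ?, ?, ?)", sample_rows)

rows = conn.execute(QUERY_1).fetchall()
for row in rows:
    print(row)  # one row per (l_returnflag, l_linestatus) group
conn.close()
```

Storing dates as ISO `YYYY-MM-DD` text keeps the string comparison in the `where` clause chronologically correct in SQLite.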

  17. Query 2 @blazingdb

          select lineitem.l_orderkey,
                 sum(lineitem.l_extendedprice*(1-lineitem.l_discount)) as revenue,
                 orders.o_orderdate, orders.o_shippriority
          from customer
          inner join orders on customer.c_custkey = orders.o_custkey
          inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
          where customer.c_mktsegment = 'BUILDING'
            and orders.o_orderdate < '1995-03-15'
            and lineitem.l_shipdate > '1995-03-15'
          group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
          order by revenue desc, orders.o_orderdate;

      Data Points Query 2 • Join 6B rows to 1.5B rows to 150M rows • Many aggregations/transformations • Order (sorting)

      Chart: SECONDS for Cold, Medium (Disk cache only), and Hot runs

  18. Query 3 @blazingdb

          select nation.name,
                 sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue
          from customer
          inner join orders on customer.cust_key = orders.o_custkey
          inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
          inner join supplier on lineitem.l_suppkey = supplier.s_suppkey
          inner join nation on supplier.s_nationkey = nation.nation_key
          inner join region on nation.region_key = region.r_regionkey
          where supplier.s_nationkey = nation.nation_key
            and region.r_name = 'ASIA'
            and orders.o_orderdate >= '19940101'
            and orders.o_orderdate < '19950101'
          group by nation.name
          order by revenue desc

      Data Points Query 3 • Join 6B rows to 1.5B rows to 150M rows (and many small joins) • Multiple aggregations/transformations • Order (sorting)

      Chart: SECONDS for Cold, Medium (Disk cache only), and Hot runs

  19. Query 4 @blazingdb

          select sum(l_extendedprice) as sum_exprice,
                 sum(l_discount) as sum_discount
          from lineitem
          where l_shipdate >= '19940101'
            and l_shipdate < '19950101'
            and l_discount >= 0.05 and l_discount <= 0.07
            and l_quantity < 24

      Data Points Query 4 • 6B row table • Multiple aggregations/transformations

      Chart: SECONDS for Cold, Medium (Disk cache only), and Hot runs
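
Query 4 is a natural fit for the "Filtering (Predicate Pushdown)" step named on the "How it works" slide. The slides do not say how BlazingDB implements it, so the Python sketch below shows one common form, min/max chunk skipping: each chunk of rows carries the minimum and maximum `l_shipdate` it contains, and chunks whose range cannot satisfy the date predicate are never scanned. The `Chunk` class, `make_chunk`, `query4`, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (l_shipdate as 'YYYYMMDD', l_discount, l_quantity, l_extendedprice)
Row = Tuple[str, float, int, float]


@dataclass
class Chunk:
    """A block of lineitem rows plus min/max statistics on l_shipdate."""
    rows: List[Row]
    min_shipdate: str
    max_shipdate: str


def make_chunk(rows: List[Row]) -> Chunk:
    dates = [r[0] for r in rows]
    return Chunk(rows, min(dates), max(dates))


def query4(chunks: List[Chunk], lo: str = "19940101", hi: str = "19950101"):
    """Return (sum_exprice, sum_discount, chunks_scanned) for lo <= l_shipdate < hi."""
    sum_exprice = 0.0
    sum_discount = 0.0
    scanned = 0
    for chunk in chunks:
        # Pushdown step: skip chunks whose date range cannot match the filter.
        if chunk.max_shipdate < lo or chunk.min_shipdate >= hi:
            continue
        scanned += 1
        for shipdate, discount, quantity, exprice in chunk.rows:
            if lo <= shipdate < hi and 0.05 <= discount <= 0.07 and quantity < 24:
                sum_exprice += exprice
                sum_discount += discount
    return sum_exprice, sum_discount, scanned


# Made-up sample data: one chunk per year.
chunks = [
    make_chunk([("19930310", 0.06, 10, 1000.0)]),
    make_chunk([
        ("19940215", 0.06, 10, 1000.0),   # matches
        ("19940720", 0.02, 5, 500.0),     # discount too low
        ("19941105", 0.05, 30, 3000.0),   # quantity too high
        ("19940901", 0.07, 23, 2300.0),   # matches
    ]),
    make_chunk([("19950102", 0.06, 1, 100.0)]),
]

sum_exprice, sum_discount, scanned = query4(chunks)
print(sum_exprice, round(sum_discount, 2), f"scanned {scanned} of {len(chunks)} chunks")
```

Because the dates are fixed-width `YYYYMMDD` strings, plain string comparison orders them chronologically, which is what makes the min/max test valid here.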

  20. Query 5 @blazingdb

          select supplier.s_acctbal, supplier.s_suppkey, nation.name,
                 part.p_partkey, part.p_mfgr, supplier.s_address,
                 supplier.s_phone, supplier.s_comment
          from supplier
          inner join partsupp on supplier.s_suppkey = partsupp.ps_suppkey
          inner join nation on supplier.s_nationkey = nation.nation_key
          inner join region on nation.region_key = region.r_regionkey
          inner join part on part.p_partkey = partsupp.ps_partkey
          where part.p_size = 15
            and part.p_type in (
              'ECONOMY ANODIZED BRASS', 'ECONOMY BRUSHED BRASS',
              'ECONOMY BURNISHED BRASS', 'ECONOMY PLATED BRASS',
              'ECONOMY POLISHED BRASS', 'LARGE ANODIZED BRASS',
              'LARGE BRUSHED BRASS', 'LARGE BURNISHED BRASS',
              'LARGE PLATED BRASS', 'LARGE POLISHED BRASS',
              'SMALL ANODIZED BRASS', 'SMALL BRUSHED BRASS',
              'SMALL BURNISHED BRASS', 'SMALL PLATED BRASS',
              'SMALL POLISHED BRASS', 'STANDARD ANODIZED BRASS',
              'STANDARD BRUSHED BRASS', 'STANDARD BURNISHED BRASS',
              'STANDARD PLATED BRASS', 'STANDARD POLISHED BRASS')
            and region.r_name = 'EUROPE'
          order by supplier.s_acctbal desc, supplier.s_suppkey,
                   nation.name, part.p_partkey

      Data Points Query 5 • Join multiple tables • Many aggregations/transformations • String comparisons

      Chart: SECONDS for Cold, Medium (Disk cache only), and Hot runs

  21. Data Pipeline Coming Soon Common Data Layer STORAGE GPU Data Frame (Data Lake) Apache Arrow INGEST @blazingdb

  22. Questions? @blazingdb
