Leveraging Customer Behavioral Data to Drive Revenue the GPU way @arnon86 S7456 1
Hi!
Arnon Shimoni, Senior Solutions Architect
• I like hardware and parallel / concurrent stuff
• In my 4th year at SQream Technologies
• Send gifs to @arnon86 or arnon@sqream.com
tl;dr
• GPUs are good number crunchers, which makes them good for data processing
• SQream DB with GPUs is fast
• Rethink current solutions, the GPU can help
• Simple hardware is good enough. Let’s avoid throwing lots of hardware at issues; there is no need to shovel money at the problem!
SQream DB – an SQL database powered by GPUs
Powered by GPUs
• Massively parallel engine
• Relies on GPUs for power, not RAM
Fast
• Columnar storage
• Always-on compression
• 2 TB / hour / GPU ingest speed
Scalable
• 10 TB to 1 PB with ease
SQL database
• Familiar ANSI SQL
• Standard connectors (ODBC, JDBC)
Extensible for AI
• Python, Jupyter, etc.
• Data science
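
To make the "columnar storage" and "always-on compression" bullets a little more concrete, here is a minimal Python sketch. It does not describe SQream DB's actual storage format: the sample records, the column names, and the choice of run-length encoding are all assumptions made purely for illustration.

```python
from itertools import groupby

# Hypothetical usage records in row-oriented form: (date, network, megabytes).
rows = [
    ("2017-05-01", "4G", 120),
    ("2017-05-01", "4G", 95),
    ("2017-05-01", "3G", 40),
    ("2017-05-02", "3G", 30),
    ("2017-05-02", "4G", 210),
]


def to_columns(records):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return [list(column) for column in zip(*records)]


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


dates, networks, megabytes = to_columns(rows)

# Columns with long runs of repeated values shrink to a few pairs.
print(rle_encode(dates))
print(rle_encode(networks))
```

Storing values column by column keeps similar values next to each other, which is what gives simple schemes like this one something to compress.
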
This story starts at MWC last year
That’s my ear!
SQream knows telecoms
We’ve helped operators with
• Better analysis of network events
• Speeding up CDR preparations
• More history with security management (SIEM)
• And now: customer behaviour
There is a lot of data about customers in telecoms
• Where and when they wake up and where they spend their days (daily grinders)
• When and where they were Instagramming (when and where data was used)
• How frustrated they got (what the network experience was in each location)
• What modes of transport they use
• How close they are to competitor locations
But are operators actually using this data? Are they getting anything actionable? Are they looking at the entire customer base, and not just a single customer?
“You know, Telefonica has this multi-million dollar product based on Hadoop for selling this customer behaviour data to 3rd party companies. Have you thought about maybe getting the same solution for your company, but much simpler?”
“Oh, and we’ll do it for you with a single machine”
Why their current setup wasn’t good enough for this
• Data scientists and BI professionals have only short windows of time to run queries, because of overloaded systems
• Windows cut even shorter due to long overnight loading
• Queries take hours, and iterations become painful
Long queries, coffee breaks, bathroom breaks, unhappy managers, unhappy everyone
Databases that displease data scientists
• When data scientists or BI professionals want to ask questions that no one has asked before, these systems tend to ‘break’ and not deliver what’s expected
• They’re just not designed for ad-hoc querying
• Legacy databases require indexing and a lot of manual tuning
• Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
• Distributed databases don’t perform well when JOIN operations are necessary
• In-memory databases are very painful on the wallet if you need more than a couple of terabytes
Picking the wrong databases will cause pain!
Just some of what we saw
• Cloudera – for the BI team
• Teradata – for the marketing team
• Oracle Exadata – transactional, for CDR collection and customer records
• Vertica, Netezza – for financial
• Lots of Greenplum – to collect from many sources, for marketing and BI
Chanel says racks are fashionable. Our customers think otherwise
SQream DB software in a standard 2U server
Configured with 96 GB RAM and a single Tesla K80 for a $4,000 total investment. Designed to handle ~40 TB of telecom data
Sample dashboards generated
Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night, …).
• Larger circles represent more data throughput
• Colour becomes darker as the day progresses
• Dark-outline circles mean more night-time traffic
Dashboard aggregates directly off SQream DB, with no intermediate steps. Represents a 3-table join (3.3B rows ⋈ 40M rows ⋈ 300K rows)
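
The kind of three-table join and time-of-day aggregation behind such a dashboard could be sketched as follows. This uses Python's built-in sqlite3 module only so that the example runs anywhere; it is not SQream DB, and the table names (usage_events, subscribers, cells), the columns, the hour boundaries for each part of the day, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three tiny, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cells (cell_id INTEGER PRIMARY KEY, area TEXT);
        CREATE TABLE subscribers (sub_id INTEGER PRIMARY KEY, plan TEXT);
        CREATE TABLE usage_events (
            sub_id INTEGER, cell_id INTEGER, hour INTEGER, mb REAL);
        INSERT INTO cells VALUES (1, 'Downtown'), (2, 'Suburb');
        INSERT INTO subscribers VALUES (10, 'prepaid'), (11, 'postpaid');
        INSERT INTO usage_events VALUES
            (10, 1, 8, 50.0), (11, 1, 13, 20.0), (10, 2, 20, 70.0),
            (11, 2, 23, 30.0), (11, 1, 9, 25.0);
    """)
    return conn


# Join the large event table to two smaller dimension tables, bucket each
# event into a part of the day, and sum the data volume per bucket.
QUERY = """
    SELECT c.area,
           s.plan,
           CASE WHEN e.hour BETWEEN 6 AND 11 THEN 'Morning'
                WHEN e.hour BETWEEN 12 AND 14 THEN 'Lunch'
                WHEN e.hour BETWEEN 15 AND 21 THEN 'Evening'
                ELSE 'Night' END AS day_part,
           SUM(e.mb) AS total_mb
    FROM usage_events AS e
    JOIN subscribers AS s ON s.sub_id = e.sub_id
    JOIN cells AS c ON c.cell_id = e.cell_id
    GROUP BY c.area, s.plan, day_part
    ORDER BY c.area, s.plan, day_part
"""


def throughput_by_day_part(conn):
    """Return (area, plan, day_part, total_mb) rows for the dashboard."""
    return conn.execute(QUERY).fetchall()


conn = build_demo_db()
result = throughput_by_day_part(conn)
for row in result:
    print(row)
```

A dashboard tool would issue a query of this shape directly against the database and map total_mb to circle size and day_part to colour.
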
Saving hours on reporting with SQream DB
Augmenting legacy MPP with a faster, easier to use GPU-powered analytics database
[Diagram: data sources (CDR 4G, CDR 3G, non-CDR) feed an ETL and aggregation process that produces dozens of reports]
• 80-node system: 5 hours
• SQream DB with direct loading (2 TB/h ingest rate): 20 minutes, 15x faster
The cost of performance

                  80 nodes, 5 full racks        SQream DB v1.9.6                    Improvement
System            80 nodes, 5 full racks        HP DL380g9 with NVIDIA Tesla K80
Resources         960 CPU cores, 5.12 TB RAM    96 GB RAM + 6 TB storage
ETL time          300 m                         20 m                                15x faster
Reporting time    120 m                         10 m                                12x faster
TCO w/license     $10,000,000                   $200,000                            50x more cost effective
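
The improvement factors quoted above are simple ratios of the baseline figure to the SQream DB figure. A minimal Python sketch of that arithmetic, using only the numbers from this slide (the function and dictionary names are illustrative, not from the talk):

```python
def improvement_factor(baseline, new):
    """Return how many times smaller `new` is than `baseline`."""
    if new <= 0:
        raise ValueError("new value must be positive")
    return baseline / new


# (baseline, SQream DB) pairs taken from the comparison above.
comparison = {
    "ETL time (minutes)": (300, 20),
    "Reporting time (minutes)": (120, 10),
    "TCO with license (USD)": (10_000_000, 200_000),
}

factors = {
    name: improvement_factor(baseline, new)
    for name, (baseline, new) in comparison.items()
}

for name, factor in factors.items():
    print(f"{name}: {factor:.0f}x")
```
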
That wasn’t an anomaly
We’ve done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop-based systems.

                                Netezza                         SQream DB v1.9.7
Hardware                        8 full 42U racks, 56 S-Blades   Dell C4130 with 4x NVIDIA Tesla K80
Memory and storage              7 TB RAM                        512 GB RAM + iSCSI JBOD (20TB)
Average query time (seconds)    33.70                           31.70
Processing units                56 S-Blades                     4 GPUs
Compression ratio               4.0                             4.7
Cost of ownership               $12,000,000                     $500,000
Find out more about SQream’s high-performance GPU-driven database software
www.sqream.com or arnon@sqream.com