
MapD #mapd @datarefined www.map-d.com 180 Sansome St. Todd Mostak - PowerPoint PPT Presentation



  1. MapD. Todd Mostak (todd@map-d.com, @datarefined). #mapd, www.map-d.com. 180 Sansome St., San Francisco, CA 94104

  2. MapD? A super-fast database built into GPU memory. Do? World’s fastest real-time big data analytics and interactive visualization. Demo? Twitter analytics platform: 1 billion+ tweets, millisecond response time.

  3. The importance of interactivity
     People have struggled for a long time to build interactive visualizations of big data that can deliver insight.
     Interactivity means:
     • Hypothesis testing can occur at “speed of thought”
     How interactive is interactive enough?
     • According to a study by Jeffrey Heer and Zhicheng Liu, “an injected delay of half a second per operation adversely affects user performance in exploratory data analysis.”
     • Some types of latency are more detrimental than others. For example, linking and brushing are more sensitive than zooming.

  4. The Arrival of In-Memory Systems
     • Traditional RDBMSs used to be too slow to serve as a back-end for interactive visualizations.
     • Queries over a billion records could take minutes, if not hours.
     • In-memory systems can execute such queries in a fraction of the time.
     • This holds for both full DBMS and “pseudo”-DBMS solutions.
     • But they are still often too slow.

  5. Enter Map-D

  6. the technology

  7. Core Innovation
     SQL-enabled column store database built into the memory architecture on GPUs and CPUs.
     Code developed from scratch to take advantage of:
     • Memory and computational bandwidth of multiple GPUs
     • Heterogeneous architectures (CPUs and GPUs)
     • Fast RDMA between GPUs on different nodes
     • GPU graphics pipeline
     Two-level buffer pool across GPU and CPU memory.
     Shared scans: multiple queries of the same data can share memory bandwidth.
     System can scan data at > 2 TB/sec per node, with > 10 TB/sec per node logical throughput with shared scans.
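
The shared-scan idea on this slide (several queries over the same data sharing one pass through memory) can be illustrated with a minimal Python sketch. This is not MapD's implementation: the function name `shared_scan`, the predicate-dictionary interface, and the sample tweets are all hypothetical, and the loop runs on the CPU rather than a GPU. It only shows why logical throughput can exceed physical scan rate when each value is read once and tested against every pending query.

```python
from typing import Callable, Dict, List

Predicate = Callable[[str], bool]


def shared_scan(column: List[str], queries: Dict[str, Predicate]) -> Dict[str, int]:
    """Count matching rows for every query in a single pass over the column.

    Hypothetical illustration: each value is fetched once and tested
    against all pending queries, instead of one full scan per query.
    """
    counts = {name: 0 for name in queries}
    for value in column:                      # one read of the data
        for name, predicate in queries.items():
            if predicate(value):
                counts[name] += 1
    return counts


# Made-up sample data standing in for a text column.
tweets = ["Rain in Boston", "sunny day", "more rain", "snow"]

result = shared_scan(
    tweets,
    {
        "rain": lambda t: "rain" in t.lower(),
        "snow": lambda t: "snow" in t.lower(),
    },
)
print(result)  # {'rain': 2, 'snow': 1}
```

With two queries sharing the pass, the column is read once but two queries' worth of rows are logically processed, which is the sense in which shared scans raise logical throughput above the raw scan rate.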

  8. The Hardware
     [Diagram: two nodes, Node 0 and Node 1, joined through a switch over IB links. Each node contains GPU 0 to GPU 3 attached over PCI, CPU 0 and CPU 1 linked by QPI, and a RAID controller with S1 to S4.]

  9. The Two-Level Buffer Pool
     [Diagram: GPU memory, CPU memory, SSD]
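
A two-level buffer pool over the tiers named on this slide could look roughly like the Python sketch below. The slide gives no details of eviction policy or chunk placement, so everything here is an assumption made for illustration: the class name `TwoLevelBufferPool`, least-recently-used eviction, chunks living in exactly one tier at a time, and a plain dictionary standing in for the SSD.

```python
from collections import OrderedDict


class TwoLevelBufferPool:
    """Hypothetical sketch: small 'GPU' tier, larger 'CPU' tier, 'SSD' backing store.

    Assumptions (not from the slides): LRU eviction, and a chunk is held
    in only one memory tier at a time.
    """

    def __init__(self, gpu_slots, cpu_slots, storage):
        self.gpu = OrderedDict()   # fastest tier, fewest slots
        self.cpu = OrderedDict()   # second tier
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.storage = storage     # dict standing in for the SSD
        self.ssd_reads = 0

    def get(self, chunk_id):
        if chunk_id in self.gpu:               # hit in the top tier
            self.gpu.move_to_end(chunk_id)
            return self.gpu[chunk_id]
        if chunk_id in self.cpu:               # hit in the second tier
            data = self.cpu.pop(chunk_id)
        else:                                  # miss in both: read from "SSD"
            data = self.storage[chunk_id]
            self.ssd_reads += 1
        self._put_gpu(chunk_id, data)
        return data

    def _put_gpu(self, chunk_id, data):
        self.gpu[chunk_id] = data
        if len(self.gpu) > self.gpu_slots:
            # Demote the least recently used GPU chunk to the CPU tier.
            old_id, old_data = self.gpu.popitem(last=False)
            self.cpu[old_id] = old_data
            if len(self.cpu) > self.cpu_slots:
                self.cpu.popitem(last=False)   # drop; still on "SSD"


storage = {i: f"chunk-{i}" for i in range(4)}
pool = TwoLevelBufferPool(gpu_slots=1, cpu_slots=2, storage=storage)
for chunk_id in [0, 1, 0, 2, 3, 0]:
    pool.get(chunk_id)
print(pool.ssd_reads, list(pool.gpu), list(pool.cpu))  # 4 [0] [2, 3]
```

In this run, two of the six requests are served from the CPU tier instead of the SSD, which is the benefit a second memory tier is meant to provide.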

  10. Shared Nothing Processing
      Multiple GPUs, with data partitioned between them.
      [Diagram: Node 1, Node 2, and Node 3 each apply the same filter, text ILIKE 'rain', to their own partition.]
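
The shared-nothing pattern on this slide, where every partition applies the same filter independently and the partial results are combined afterwards, can be sketched in Python as follows. Threads stand in for GPU nodes, and the helper names (`partition`, `filter_partition`, `ilike_exact`) and the sample rows are hypothetical. The filter follows the PostgreSQL reading of `ILIKE 'rain'`, where a pattern without wildcards is a case-insensitive whole-string match; the slide's system may interpret it differently.

```python
from concurrent.futures import ThreadPoolExecutor


def ilike_exact(value, pattern):
    """Case-insensitive whole-string match (ILIKE with no wildcards)."""
    return value.lower() == pattern.lower()


def partition(rows, n):
    """Round-robin split of rows into n partitions (an assumed scheme)."""
    return [rows[i::n] for i in range(n)]


def filter_partition(rows, pattern):
    """Work done by one node: filter only its own partition."""
    return [r for r in rows if ilike_exact(r, pattern)]


# Made-up sample data for the "text" column.
rows = ["rain", "Rain", "sun", "RAIN", "snow", "drain"]
parts = partition(rows, 3)

# Each worker sees only its partition; nothing is shared between workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(lambda p: filter_partition(p, "rain"), parts))

# Combine the per-partition results.
matches = [r for part in partials for r in part]
print(sorted(matches))  # ['RAIN', 'Rain', 'rain']
```

Because no worker needs another worker's data to evaluate the filter, the only coordination is the final merge of partial results.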

  11. the product

  12. Product
      GPU powered end-to-end big data analytics and visualization platform.
      [Diagram labels: GPU in-memory SQL database; OpenCL and CUDA; SQL compiler; Shared scans; User defined functions; Hybrid GPU/CPU execution; Scale to cluster of GPU nodes; Simple; Complex Analytics; Image processing; Machine learning; Graph analytics; Visualization; OpenGL; H.264/VP8 streaming; GPU pipeline; License; # of GPUs; Mobile/server versions]

  13. MapD hardware architecture
      [Diagram: tiers labeled Big Data, Large Data, and Small Data]
      • Single GPU: 12GB memory. Map-D code runs on GPU + CPU memory.
      • GPU memory: Map-D code integrated into GPU memory. 8 cards = 4U box. 36U rack: ~400GB GPU, ~12TB CPU.
      • Single CPU: 768GB memory.
      • CPU memory: Map-D code integrated into CPU memory. 4 sockets = 4U box.
      • Next Gen Flash: 40TB, 100GB/s.
      • Small Data, Mobile (NVIDIA TEGRA): Map-D running small datasets. Mobile chip, 4GB memory. Map-D code integrated into chip memory. Native app or web-based service.

  14. MapD www.map-d.com @datarefined info@map-d.com
