Caribou: Intelligent Distributed Storage, a presentation by Zsolt István, David Sidler, and Gustavo Alonso

  1. Caribou: Intelligent Distributed Storage
     Zsolt István, David Sidler, Gustavo Alonso
     Systems Group, Department of Computer Science, ETH Zurich

  2. Rack-scale thinking
     - In the cloud: compute and storage nodes are connected through a top-of-rack (ToR) switch. Pros: provisioning, independent scalability. Con: data movement bottleneck.
     - In an appliance: the storage nodes sit inside a single dedicated system.

  3. Storage design options
     - Compute > Bandwidth (Oracle Exadata, IBM PureData, Deuteronomy, …): full-fledged, with features similar to software, but with SW+HW overhead and a large footprint
     - Compute < Bandwidth (Samsung YourSQL, Wisconsin SmartSSD, Kinetic Drives, BlueCache, …): no-overhead access, but requires outside management
     - Compute ~ Bandwidth: balanced design with a small footprint

  4. What is Caribou?
     - Intelligent distributed storage with FPGAs
     - Easy integration on a commodity network
     - Random access to tuples and in-storage scans
     - Selection predicate pushdown
     - Data replicated consistently to nodes
     - Extensible (open-source) design
     Diagram: clients and Caribou nodes connected through a 10 Gbps switch.

  5. FPGA 101: Field Programmable Gate Array
     - Reprogrammable hardware
     - Large number of configurable logic blocks
     - Tight integration, massive parallelism
     - Network/application co-design
     - Innovation…

  6. Inside a Caribou node
     - The pipeline runs at the same speed as the network (line rate)
     - Software clients use a key-value interface (single-key lookup or scanning)
     - Network (TCP/IP): 1000s of connections, SW clients
     - Replication: primary/backup, atomic broadcast
     - Key-value management: cuckoo hash table, slab memory allocation, bitmap indexes
     - Processing: conditionals, regex, decompression
     Diagram: clients reach the Caribou nodes through a 10 Gbps switch, and each node keeps its data in DRAM.
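
The key-value management module is built around a cuckoo hash table. The following minimal Python sketch illustrates the general technique only; it is not Caribou's hardware design. It makes several assumptions: there are two tables, each with one salted hash function; each insert is limited to a fixed number of displacements (`MAX_KICKS`); and a small overflow stash holds any entry that cannot be placed. The class and all names in it are hypothetical.

```python
# Hypothetical sketch: generic cuckoo hashing, not Caribou's FPGA implementation.
from typing import Any, Hashable, List, Optional, Tuple

Entry = Tuple[Hashable, Any]


class CuckooTable:
    MAX_KICKS = 32  # assumed bound on displacements per insert

    def __init__(self, size: int = 8) -> None:
        self.size = size
        # Two tables; each slot holds a (key, value) entry or None.
        self.tables: List[List[Optional[Entry]]] = [[None] * size, [None] * size]
        # Overflow stash for entries that could not be placed (an assumption).
        self.stash: List[Entry] = []

    def _slot(self, which: int, key: Hashable) -> int:
        # Salt the key with the table index to get two different hash functions.
        return hash((which, key)) % self.size

    def get(self, key: Hashable) -> Optional[Any]:
        # A lookup probes exactly one slot per table, then the stash.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        for k, v in self.stash:
            if k == key:
                return v
        return None

    def put(self, key: Hashable, value: Any) -> bool:
        """Insert or update; returns False if some entry had to live in the stash."""
        # Update in place if the key already exists.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return True
        for i, (k, _) in enumerate(self.stash):
            if k == key:
                self.stash[i] = (key, value)
                return False
        item: Optional[Entry] = (key, value)
        which = 0
        for _ in range(self.MAX_KICKS):
            idx = self._slot(which, item[0])
            # Place the item and pick up whatever occupied the slot before.
            item, self.tables[which][idx] = self.tables[which][idx], item
            if item is None:
                return True
            # The evicted entry moves to its alternative slot in the other table.
            which = 1 - which
        # Still homeless after MAX_KICKS displacements: keep it in the stash.
        self.stash.append(item)
        return False
```

A lookup touches at most one slot per table, plus the stash. That bounded work is one reason cuckoo hashing is attractive for hardware pipelines. The stash is just one common way to handle failed insertions, and it was chosen here for simplicity.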

  7. Throughput of random access to storage

  8. Random access response times
     - Response times are comparable to those of software systems running on InfiniBand, even though Caribou uses commodity networking
     Chart: response time [µs] vs. value size [B] for Get, Put/Update, and Put/Update (Replicated)

  9. Operator push-down
     - The filtering circuits are parameterized at runtime, with no overhead. Example query:
         SELECT … FROM customer
         WHERE age < 35 AND purchases > 2
           AND address LIKE '%Luzern%CH%'
     - Multiple comparisons to constants (conjunction)
     - Substring or regular expression matching [1]
     - Can filter compressed data (LZ77)
     - Extensible pipeline design
     [1] D. Sidler, Zs. István, M. Ewaida, G. Alonso. Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures. 2017 ACM SIGMOD/PODS Conference (SIGMOD'17).
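
To illustrate what "parameterized at runtime" means, here is a minimal Python sketch of a filter whose logic stays fixed while each query supplies the comparison operators, constants, and LIKE pattern. It is applied to the predicate of the example query. The names (`Filter`, `like_to_regex`, `OPS`) and the sample rows are hypothetical. Translating LIKE into a regular expression is an assumption made for this sketch, not a description of Caribou's circuits.

```python
# Hypothetical sketch of a runtime-parameterized selection filter.
import operator
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Supported comparison operators; each query picks among them at runtime.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


@dataclass
class Filter:
    comparisons: List[Tuple[str, str, Any]]  # (field, operator, constant), ANDed
    like_field: str
    like_pattern: "re.Pattern[str]"

    def keep(self, row: Dict[str, Any]) -> bool:
        # Conjunction of comparisons to constants ...
        for field, op, const in self.comparisons:
            if not OPS[op](row[field], const):
                return False
        # ... and the LIKE pattern must match the entire field value.
        return self.like_pattern.fullmatch(row[self.like_field]) is not None


# Parameters for: WHERE age < 35 AND purchases > 2 AND address LIKE '%Luzern%CH%'
query = Filter([("age", "<", 35), ("purchases", ">", 2)],
               "address", like_to_regex("%Luzern%CH%"))

customers = [  # made-up rows for illustration
    {"name": "A", "age": 30, "purchases": 5, "address": "Seestrasse 1, Luzern, CH"},
    {"name": "B", "age": 40, "purchases": 5, "address": "Seestrasse 2, Luzern, CH"},
    {"name": "C", "age": 25, "purchases": 1, "address": "Seestrasse 3, Luzern, CH"},
    {"name": "D", "age": 28, "purchases": 3, "address": "Bahnhofstrasse 1, Zurich, CH"},
]
selected = [row["name"] for row in customers if query.keep(row)]
```

Only the `Filter` arguments change from query to query, while `keep` stays the same. This mirrors the idea of fixed filtering logic driven by runtime parameters.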

  10. Exploiting parallelism
      Diagram: values read from DRAM pass through processing cores built from chained units: LZ77 (transform, producing Value'), comparison (predicate), and regex (regular expressions). Each core emits a keep bit per value (Keep? 1/0). Axes: complexity (units chained within a core) and throughput (cores replicated side by side).
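
The diagram can be approximated in software as a minimal Python sketch, under several assumptions. Stored values use a toy LZ77 token format invented here: literal strings plus (distance, length) back-references. Each core chains a decompression (transform) stage, a predicate, and a regex search, and emits a keep bit. Values are assigned round-robin to identical cores. `lz77_decode`, `make_core`, and `scan` are hypothetical names, and Caribou's actual units are hardware circuits.

```python
# Hypothetical sketch of the core structure; the token format is invented for illustration.
import re
from typing import Callable, List, Sequence, Tuple, Union

Token = Union[str, Tuple[int, int]]  # literal text or (distance, length) back-reference


def lz77_decode(tokens: Sequence[Token]) -> str:
    """Transform stage: expand literals and back-references (toy LZ77 format)."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)  # literal characters
        else:
            distance, length = tok
            start = len(out) - distance
            # Copy one character at a time so overlapping references work.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


def make_core(predicate: Callable[[str], bool],
              pattern: str) -> Callable[[Sequence[Token]], int]:
    """Chain transform -> predicate -> regex; the core returns a keep bit (1 or 0)."""
    regex = re.compile(pattern)

    def core(tokens: Sequence[Token]) -> int:
        value = lz77_decode(tokens)              # Value -> Value'
        if not predicate(value):                 # predicate stage
            return 0
        return 1 if regex.search(value) else 0   # regular-expression stage

    return core


def scan(values: Sequence[Sequence[Token]],
         cores: List[Callable[[Sequence[Token]], int]]) -> List[int]:
    """Assign values round-robin to replicated cores and collect keep bits in order."""
    keep = [0] * len(values)
    for c, core in enumerate(cores):
        # Core c handles every len(cores)-th value; in hardware the cores run concurrently.
        for i in range(c, len(values), len(cores)):
            keep[i] = core(values[i])
    return keep


cores = [make_core(lambda v: len(v) > 3, r"ab+c") for _ in range(4)]
values = [["ab", (2, 4), "c"], ["xyz"], ["hello", (5, 5)], ["a", (1, 3), "bbc"]]
bits = scan(values, cores)
```

In this reading, adding stages inside `make_core` corresponds to the complexity axis of the diagram, and adding entries to `cores` corresponds to the throughput axis. Because this sketch runs sequentially, extra cores only change which core handles which value, not the speed.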

  11. Scan and filter
      - Neither the choice of filter nor the value size affects the scan rate
      - The scan rate in GB/s is the same regardless of value size
      Chart regions: bound by the network/client; bound by the filter performance
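
The two chart regions can be read through a simple bottleneck model. This is an assumption, not the authors' analysis. The filter consumes stored data at a fixed rate, while the network only has to carry the data that passes the filter. With selectivity s (the fraction of scanned bytes returned), the scan rate is roughly min(filter rate, network rate / s). The function and parameter names are hypothetical. The constant 1.25 GB/s is simply 10 Gbit/s divided by 8, ignoring protocol overhead. The filter rate used in any call is a placeholder, not a measured value.

```python
def scan_rate_gb_per_s(filter_rate: float, network_rate: float, selectivity: float) -> float:
    """Approximate scan rate in GB/s under a simple bottleneck model (an assumption).

    filter_rate:  rate at which the filter can consume stored data (GB/s)
    network_rate: rate at which results can be sent to clients (GB/s)
    selectivity:  fraction of scanned bytes that pass the filter (0 < s <= 1)
    """
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    # Only selected data crosses the network, so the network caps the scan rate
    # at network_rate / selectivity; the filter caps it at filter_rate.
    return min(filter_rate, network_rate / selectivity)


TEN_GBIT_IN_GB_PER_S = 10 / 8  # 1.25 GB/s, ignoring protocol overhead
```

When most of the data is returned (s close to 1), the network/client term dominates. When few values pass the filter, the filter term dominates. This matches the two labeled regions in this simplified reading.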

  12. Near-data processing without surprises
      - Filtering can also be combined with random-access reads
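
The following minimal sketch shows what combining filtering with a random-access read could look like. It assumes a plain dictionary stands in for the node's hash table and that the filter is an arbitrary predicate on the stored value. `filtered_get` and `in_luzern` are hypothetical names. Returning `None` both for a missing key and for a filtered-out value is a simplification of this sketch.

```python
from typing import Callable, Dict, Optional


def filtered_get(store: Dict[str, bytes], key: str,
                 keep: Callable[[bytes], bool]) -> Optional[bytes]:
    """Hypothetical near-data read: look up one key, apply the pushed-down filter
    next to the data, and only return the value if it passes."""
    value = store.get(key)
    if value is None or not keep(value):
        return None  # missing key or filtered out: no value is sent back
    return value


def in_luzern(value: bytes) -> bool:
    # Example filter (hypothetical): substring match on the stored value.
    return b"Luzern" in value


store = {"a": b"Seestrasse 1, Luzern, CH", "b": b"Bahnhofstrasse 1, Zurich, CH"}
```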

  13. “The Times They Are A-Changin’”
      - In-storage processing
        - Stand-alone boards, MPSoC (ARM+FPGA)
        - Add NVMe flash, non-volatile memory
        - Explore different KVS (memcached, Redis, …)
      - In-network processing
        - Microsoft Catapult NICs
        - Work on streaming data
        - Distributed service in the cloud
      - Accelerator
        - Intel Xeon+FPGA
        - Offload computation without partitioning or copying data

  14. Time to explore…
      - Data movement bottleneck on many levels
      - Caribou: intelligent distributed storage
        - Software-like service in a small footprint
        - Balanced design with the "right amount" of compute
      - Caribou: platform to explore near-data processing
        - Open source, modular and portable
        - Data processing operators applicable to other HW platforms
      https://github.com/fpgasystems/caribou
      https://www.systems.ethz.ch/fpga/
      zsolt.istvan@inf.ethz.ch
