
ScyllaDB: Achieving No-Compromise Performance, Avi Kivity, CTO


  1. ScyllaDB: Achieving No-Compromise Performance Avi Kivity, CTO @AviKivity (Hiring!)

  2. Agenda Background Goals Methods Conclusion

  3. Non-Agenda ● Docker ● Orchestration ● Microservices ● JVM GC Tuning ● Node.js ● JSON over HTTP ● Docker ● Docker

  4. More Non-Agenda ● Cache lines, coherency protocols ● NUMA ● Algorithms are the only thing that matters, everything else is implementation detail ● Docker

  5. Background - ScyllaDB ● Clustered NoSQL database compatible with Apache Cassandra ● ~10X performance on same hardware ● Low latency, esp. higher percentiles ● Self tuning ● C++14, fully asynchronous; Seastar!

  6. YCSB Benchmark: 3 node Scylla cluster vs 3, 9, 15, 30 Cassandra machines [Figure: benchmark charts; panel labels: 3 Scylla vs 3 Cassandra, 3 Scylla vs 30 Cassandra]

  7. Log-Structured Merge Tree [Figure: timeline of SStable 1, SStable 2, SStable 3, SStable 4, SStable 5 and a merged SStable 1+2+3; labels: Time, Foreground Job, Background Job]
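
The background merge of SStables 1, 2 and 3 into one can be sketched roughly as follows. This is a minimal illustration, not Scylla's compaction code: the `compact` function, the list-of-pairs table layout and the "newest table wins" rule are assumptions made for the example, and tombstones and timestamps are left out.

```python
import heapq

def compact(sstables):
    """Merge sorted SSTables into one; the newest value for each key wins.

    `sstables` is ordered oldest to newest. Each table is a list of
    (key, value) pairs sorted by key, with unique keys (assumed layout).
    """
    # Tag each entry with the negated table index so that, for equal keys,
    # the entry from the newest table sorts first.
    runs = (
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    )
    merged = []
    for key, _, value in heapq.merge(*runs):
        # Keep only the first (newest) entry seen for each key.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged

sstable_1 = [("a", 1), ("c", 3)]
sstable_2 = [("a", 10), ("b", 2)]
sstable_3 = [("c", 30)]
merged_1_2_3 = compact([sstable_1, sstable_2, sstable_3])
```

The merge reads each input sequentially and writes one sorted output, which is why it suits a background job that runs alongside foreground writes.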

  8. High-level Goals ● Efficiency: ○ Make the most out of every cycle ● Utilization: ○ Squeeze every cycle from the machine ● Control ○ Spend the cycles on what we want, when we want

  9. Characterizing the problem ● Large numbers of small operations ○ Make coordination cheap ● Lots of communications ○ Within the machine ○ With disk ○ With other machines

  10. Asynchrony, Everywhere

  11. General Architecture ● Thread-per-core design ○ Never block ● Asynchronous networking ● Asynchronous file I/O ● Asynchronous multicore

  12. Scylla has its own task scheduler ● Traditional stack ○ Thread is a function pointer ○ Stack is a byte array from 64k to megabytes ○ Context switch cost is high. Large stacks pollute the caches ● Scylla's stack ○ Promise is a pointer to an eventually computed value ○ Task is a pointer to a lambda function ○ No sharing, millions of parallel events [Figure: threads with stacks on one side, promises and tasks on the other, each above one scheduler per CPU]
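
To make the promise/task vocabulary concrete, here is a toy single-core run queue. It is a sketch only and does not reproduce Seastar's API: the `Scheduler` and `Promise` classes and their methods are hypothetical names chosen for this example.

```python
from collections import deque

class Scheduler:
    """One run queue per core; a task is a plain callable run to completion."""

    def __init__(self):
        self.tasks = deque()

    def schedule(self, task):
        self.tasks.append(task)

    def run(self):
        # No threads, no stacks to switch: just pop and call.
        while self.tasks:
            self.tasks.popleft()()

class Promise:
    """Holds a value that will be computed eventually."""

    def __init__(self, scheduler):
        self._scheduler = scheduler
        self._ready = False
        self._value = None
        self._continuations = []

    def then(self, fn):
        # Run fn(value) as a task once the value is available.
        if self._ready:
            self._scheduler.schedule(lambda: fn(self._value))
        else:
            self._continuations.append(fn)

    def set_value(self, value):
        self._ready, self._value = True, value
        for fn in self._continuations:
            self._scheduler.schedule(lambda fn=fn: fn(value))
        self._continuations.clear()

sched = Scheduler()
log = []
p = Promise(sched)
p.then(lambda v: log.append(v * 2))      # continuation waits for the value
sched.schedule(lambda: p.set_value(21))  # e.g. an I/O completion
sched.run()
```

Each pending operation costs one small object rather than a whole thread stack, which is what makes very large numbers of parallel events affordable.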

  13. The Concurrency Dilemma

  14. Fundamental performance equation Concurrency = Throughput * Latency

  15. Fundamental performance equation Throughput = Concurrency / Latency

  16. Fundamental performance equation Latency = Concurrency / Throughput
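
A short worked example of the three forms of the equation. The function names and the sample numbers (100,000 operations per second, 1 ms latency, and so on) are illustrative assumptions, not figures from the talk.

```python
def required_concurrency(throughput_ops_per_s, latency_s):
    """Concurrency = Throughput * Latency."""
    return throughput_ops_per_s * latency_s

def achievable_throughput(concurrency, latency_s):
    """Throughput = Concurrency / Latency."""
    return concurrency / latency_s

def resulting_latency(concurrency, throughput_ops_per_s):
    """Latency = Concurrency / Throughput."""
    return concurrency / throughput_ops_per_s

# Sustaining 100,000 ops/s at 1 ms each needs about 100 requests in flight.
in_flight = required_concurrency(100_000, 0.001)

# With 8 requests in flight at 0.2 ms each, about 40,000 ops/s is possible.
ops_per_s = achievable_throughput(8, 0.0002)

# Pushing 1,000 concurrent requests at a system that completes
# 100,000 ops/s yields about 10 ms of latency per request.
seconds = resulting_latency(1000, 100_000)
```

The last case shows the tension: adding concurrency beyond what the system can absorb does not raise throughput, it only raises latency.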

  17. Lower bounds for concurrency ● Disks want minimum iodepth for full throughput (heads/chips) ● Remote nodes need concurrency to hide network latency and their own min. concurrency ● Compute wants work for each core

  18. Results of Mathematical Analysis ● Want high concurrency (for throughput) ● Want low concurrency (for latency) ● Resources require concurrency for full utilization

  19. Sources of concurrency ● Users ○ Reduce concurrency / add nodes ● Internal processes ○ Generate as much concurrency as possible ○ Schedule

  20. Resource Scheduling [Figure: queues feeding a Scheduler in front of Storage; labels: 30 User read, 12 User write, 50 Compaction (internal), 50 Streaming (internal), 8 Storage]
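
One way to picture such a scheduler is a share-based queue that caps the number of requests in flight to storage. This is a sketch under assumptions: the `FairQueue` class, the share values and the reading of the slide's numbers as queue depths and a storage concurrency limit are all hypothetical, not taken from Scylla's I/O scheduler.

```python
from collections import deque

class FairQueue:
    """Dispatch queued requests by class shares, capped by storage concurrency."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.classes = {}

    def add_class(self, name, shares):
        self.classes[name] = {"shares": shares, "consumed": 0.0, "queue": deque()}

    def submit(self, name, request):
        self.classes[name]["queue"].append(request)

    def dispatch(self):
        """Return the requests that may be sent to storage right now."""
        sent = []
        while self.in_flight < self.max_in_flight:
            backlog = [c for c in self.classes.values() if c["queue"]]
            if not backlog:
                break
            # Serve the class that has consumed the least relative to its shares.
            cls = min(backlog, key=lambda c: c["consumed"])
            sent.append(cls["queue"].popleft())
            cls["consumed"] += 1.0 / cls["shares"]
            self.in_flight += 1
        return sent

    def complete(self):
        """Call when storage finishes one request, freeing a slot."""
        self.in_flight -= 1

fq = FairQueue(max_in_flight=8)
fq.add_class("user_read", shares=4)    # hypothetical share values
fq.add_class("compaction", shares=1)
for i in range(30):
    fq.submit("user_read", ("read", i))
for i in range(50):
    fq.submit("compaction", ("compact", i))
first_batch = fq.dispatch()
```

Internal work can queue as deeply as it likes; only the scheduler decides what reaches the disk and in what proportion.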

  21. Why not the Linux I/O scheduler? ● Can only communicate priority by originating thread ● Will reorder/merge like crazy ● Disable

  22. Figuring out optimal disk concurrency [Figure: plot annotated "Max useful disk concurrency"]

  23. Cache design Cache files or objects?

  24. Using the kernel page cache ● Exists ● Hundreds of hacker-years ● Handling lots of edge cases ● 4k granularity ● Thread-safe ● Synchronous APIs ● General-purpose ● Lack of control (1) ● Lack of control (2)

  25. Unified cache [Figure: Cassandra vs Scylla. Cassandra side: Key cache, Row cache (On-heap / Off-heap), Linux page cache, SSTables; annotations: Parasitic rows, Page faults, Tuning; Your data (300b) inside an SSTable page (4k). Page-fault path labels: App thread, Map page, Page fault, Suspend thread, Kernel, Initiate I/O, Context switch, SSD, I/O completes, Interrupt, Context switch, Resume thread. Scylla side: Unified cache, SSTables]

  26. Workload Conditioning

  27. Workload Conditioning ● Internal feedback loops to balance competing loads [Figure labels: Commitlog, Memtable, Compaction, Query, Repair; Memory, CPU, SSD, WAN; Seastar Scheduler; Compaction Backlog Monitor; Memory Monitor; Adjust priority]
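
A feedback loop of this kind can be as simple as a proportional controller that turns a measured backlog into a scheduling priority. The function below is only a sketch: the name `compaction_shares`, the linear mapping and the share bounds are assumptions for illustration, not Scylla's actual controller.

```python
def compaction_shares(backlog, backlog_limit, min_shares=10, max_shares=1000):
    """Map the compaction backlog to scheduler shares: more backlog, more priority.

    All parameters are hypothetical; `backlog` and `backlog_limit` share a unit.
    """
    pressure = min(backlog / backlog_limit, 1.0)  # clamp to [0, 1]
    return round(min_shares + pressure * (max_shares - min_shares))

idle = compaction_shares(0, 100)
halfway = compaction_shares(50, 100)
saturated = compaction_shares(200, 100)
```

Run periodically, a loop like this lets compaction speed up when it falls behind and yield to queries when it is caught up, without the operator choosing a fixed rate.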

  28. Replacing the system memory allocator

  29. System memory allocator problems ● Thread safe ● Allocation back pressure

  30. Seastar memory allocator ● Non-Thread safe! ○ Each core gets a private memory pool ● Allocation back pressure ○ Allocator calls a callback when low on memory ○ Scylla evicts cache in response
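
The callback-driven back pressure can be sketched like this. It is a simplified model, not Seastar's allocator: `CorePool`, `Cache`, the low watermark and the byte counts are hypothetical, and real memory management is replaced by simple accounting.

```python
from collections import OrderedDict

class CorePool:
    """Private pool for one core; no other core touches it, so no locks."""

    def __init__(self, capacity, low_watermark, reclaim):
        self.capacity = capacity
        self.low_watermark = low_watermark
        self.reclaim = reclaim  # callback: reclaim(pool, bytes_wanted)
        self.used = 0

    def free_bytes(self):
        return self.capacity - self.used

    def allocate(self, size):
        # Back pressure: ask the owner to give memory back when running low.
        if self.free_bytes() - size < self.low_watermark:
            self.reclaim(self, size)
        if size > self.free_bytes():
            raise MemoryError("pool exhausted")
        self.used += size

    def release(self, size):
        self.used -= size

class Cache:
    """Cache that evicts its oldest entries when the allocator asks."""

    def __init__(self):
        self.entries = OrderedDict()  # key -> size, oldest first

    def insert(self, pool, key, size):
        pool.allocate(size)
        self.entries[key] = size

    def evict(self, pool, bytes_wanted):
        freed = 0
        while self.entries and freed < bytes_wanted:
            _, size = self.entries.popitem(last=False)
            pool.release(size)
            freed += size

cache = Cache()
pool = CorePool(capacity=100, low_watermark=10, reclaim=cache.evict)
for key in ("a", "b", "c"):
    cache.insert(pool, key, 40)
```

Inserting the third entry would leave the pool below its watermark, so the allocator calls back into the cache, which evicts the oldest entry to make room.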

  31. One allocator is not enough

  32. Remaining problems with malloc/free ● Memory gets fragmented over time ○ If workload changes sizes of allocated objects ● Allocating a large contiguous block requires evicting most of cache

  33. OOM :( [Figure labeled "Memory"]

  34. Log-structured memory allocation ● The cache ○ Large majority of memory allocated ○ Small subset of allocation sites ● Teach allocator how to move allocated objects around ○ Updating references
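
The idea of moving objects and updating references can be shown with a toy allocator. This is a sketch under simplifying assumptions: `Handle` and `LogAllocator` are hypothetical names, every object occupies one fixed-size slot, and each object has exactly one reference that the allocator knows about.

```python
class Handle:
    """The single reference to an object; the allocator rewrites it on a move."""

    def __init__(self, segment, offset):
        self.segment = segment
        self.offset = offset

class LogAllocator:
    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.segments = [[]]  # each slot is (handle, payload) or None when freed

    def _append(self, handle, payload):
        if len(self.segments[-1]) == self.segment_size:
            self.segments.append([])
        segment = self.segments[-1]
        handle.segment = len(self.segments) - 1
        handle.offset = len(segment)
        segment.append((handle, payload))

    def alloc(self, payload):
        handle = Handle(0, 0)
        self._append(handle, payload)
        return handle

    def free(self, handle):
        self.segments[handle.segment][handle.offset] = None

    def read(self, handle):
        return self.segments[handle.segment][handle.offset][1]

    def compact(self):
        """Copy live objects into fresh segments and update their handles."""
        live = [slot for seg in self.segments for slot in seg if slot is not None]
        self.segments = [[]]
        for handle, payload in live:
            self._append(handle, payload)

allocator = LogAllocator(segment_size=2)
a, b, c, d = (allocator.alloc(name) for name in "abcd")
allocator.free(a)
allocator.free(c)           # two half-empty segments: fragmentation
allocator.compact()         # live objects b and d now share one segment
```

Because the allocator can relocate live objects, a large contiguous region can be freed by compacting segments instead of evicting most of the cache.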

  35. Log-structured memory allocation Fancy Animation

  36. Future Improvements

  37. Userspace TCP/IP stack ● Thread-per-core design ● Use DPDK to drive hardware ● Present as experimental mode ○ Needs more testing and productization

  38. Query Compilation to Native Code ● Use LLVM to JIT-compile CQL queries ● Embed database schema and internal object layouts into the query

  39. Conclusions ● Full control of the software stack can generate big payoffs ● Careful system design can maximize throughput ● Without sacrificing latency ● Without requiring endless end-user tuning ● While having a lot of fun

  40. How to interact ● Download: http://www.scylladb.com ● Twitter: @ScyllaDB ● Source: http://github.com/scylladb/scylla ● Mailing lists: scylladb-user @ groups.google.com ● Company site & blog: http://www.scylladb.com

  41. THE SCYLLA IS THE LIMIT Thank you.
