ScyllaDB: Achieving No-Compromise Performance Avi Kivity, CTO @AviKivity (Hiring!)
Agenda Background Goals Methods Conclusion
Non-Agenda ● Docker ● Orchestration ● Microservices ● JVM GC Tuning ● Node.js ● JSON over HTTP ● Docker ● Docker
More Non-Agenda ● Cache lines, coherency protocols ● NUMA ● Algorithms are the only thing that matters, everything else is implementation detail ● Docker
Background - ScyllaDB ● Clustered NoSQL database compatible with Apache Cassandra ● ~10X performance on same hardware ● Low latency, esp. higher percentiles ● Self-tuning ● C++14, fully asynchronous; Seastar!
[chart: YCSB Benchmark — throughput of a 3-node Scylla cluster vs. 3, 9, 15, and 30 Cassandra machines]
Log-Structured Merge Tree [diagram: SSTables 1-5 written over time by foreground jobs; SSTables 1+2+3 merged into a new SSTable by a background compaction job]
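The merge (compaction) step above can be sketched in a few lines. The following is an illustrative C++ sketch, not Scylla's implementation: several sorted SSTables are merged into one, with the newest write for each key winning.

```cpp
// Minimal sketch of LSM-tree compaction (illustrative only): merge several
// sorted SSTables into one, keeping the newest value for each key.
#include <iostream>
#include <map>
#include <string>
#include <vector>

using SSTable = std::map<std::string, std::string>;  // sorted key -> value

// Merge tables ordered oldest-first; later (newer) tables overwrite earlier entries.
SSTable compact(const std::vector<SSTable>& tables) {
    SSTable merged;
    for (const auto& t : tables) {
        for (const auto& kv : t) {
            merged[kv.first] = kv.second;  // newer value wins
        }
    }
    return merged;
}

int main() {
    SSTable s1{{"a", "1"}, {"b", "2"}};   // older
    SSTable s2{{"b", "3"}, {"c", "4"}};   // newer
    SSTable merged = compact({s1, s2});
    for (const auto& kv : merged) {
        std::cout << kv.first << "=" << kv.second << "\n";  // a=1 b=3 c=4
    }
}
```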
High-level Goals ● Efficiency: ○ Make the most out of every cycle ● Utilization: ○ Squeeze every cycle from the machine ● Control: ○ Spend the cycles on what we want, when we want
Characterizing the problem ● Large numbers of small operations ○ Make coordination cheap ● Lots of communication ○ Within the machine ○ With disk ○ With other machines
Asynchrony, Everywhere
General Architecture ● Thread-per-core design ○ Never block ● Asynchronous networking ● Asynchronous file I/O ● Asynchronous multicore
Scylla has its own task scheduler [diagram comparing the two stacks]
Traditional stack: a thread is a function pointer plus a stack (a byte array from 64k to megabytes). Context switch cost is high, and large stacks pollute the caches.
Scylla's stack: a task is a pointer to a lambda function; a promise is a pointer to an eventually computed value. No sharing between cores, millions of parallel events, one scheduler per CPU.
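A minimal sketch of the tasks/promises model using Seastar's public future/promise API (header paths and exact signatures vary across Seastar versions; treat this as an outline rather than Scylla code):

```cpp
// Sketch of Seastar-style futures: continuations are tiny tasks queued on the
// per-core scheduler -- no dedicated thread, no large stack, no context switch.
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <iostream>

// A future is a handle to an eventually computed value.
seastar::future<int> compute() {
    return seastar::make_ready_future<int>(42);
}

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        // The lambda passed to .then() is the "task": a pointer-sized
        // continuation scheduled on this core when the value is ready.
        return compute().then([](int value) {
            std::cout << "computed: " << value << "\n";
        });
    });
}
```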
The Concurrency Dilemma
Fundamental performance equation Concurrency = Throughput * Latency
Fundamental performance equation Throughput = Concurrency / Latency
Fundamental performance equation Latency = Concurrency / Throughput
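As an illustrative worked example (numbers assumed, not from the talk): an SSD with 100 µs access latency that needs a queue depth of 32 for full throughput can sustain

Throughput = Concurrency / Latency = 32 / 100 µs = 320,000 ops/s

Pushing concurrency beyond that point no longer buys throughput; by the same equation it only inflates latency.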
Lower bounds for concurrency ● Disks want minimum iodepth for full throughput (heads/chips) ● Remote nodes need concurrency to hide network latency and their own min. concurrency ● Compute wants work for each core
Results of Mathematical Analysis ● Want high concurrency (for throughput) ● Want low concurrency (for latency) ● Resources require concurrency for full utilization
Sources of concurrency ● Users ○ Reduce concurrency / add nodes ● Internal processes ○ Generate as much concurrency as possible ○ Schedule
Resource Scheduling [diagram: a scheduler in front of storage, with per-class shares — user read 30, user write 12, storage 8, compaction (internal) 50, streaming (internal) 50]
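The shares above can be realized with a simple weighted scheme. Below is an illustrative C++ sketch (not Scylla's actual scheduler): each class accumulates virtual time inversely proportional to its shares, and the class with the smallest virtual time is dispatched next, so a 50-share class gets roughly four times the service of a 12-share class.

```cpp
// Minimal shares-based scheduler sketch (illustrative only, not Scylla's code):
// each class is charged 1/shares of virtual time per unit of work; the class
// with the lowest virtual time runs next.
#include <iostream>
#include <string>
#include <vector>

struct IoClass {
    std::string name;
    unsigned shares;          // e.g. user read = 30, compaction = 50
    double virtual_time = 0;
};

// Pick the next class to dispatch and charge it for one unit of work.
IoClass& pick_next(std::vector<IoClass>& classes) {
    IoClass* best = &classes.front();
    for (auto& c : classes) {
        if (c.virtual_time < best->virtual_time) best = &c;
    }
    best->virtual_time += 1.0 / best->shares;  // higher shares -> charged less
    return *best;
}

int main() {
    std::vector<IoClass> classes = {
        {"user read", 30}, {"user write", 12},
        {"compaction", 50}, {"streaming", 50},
    };
    for (int i = 0; i < 10; ++i) {
        std::cout << pick_next(classes).name << "\n";
    }
}
```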
Why not the Linux I/O scheduler? ● Can only communicate priority by originating thread ● Will reorder/merge like crazy ● Disable it and schedule I/O in userspace
Figuring out optimal disk concurrency [chart marking the maximum useful disk concurrency]
Cache design Cache files or objects?
Using the kernel page cache
Pros: ● Exists ● Hundreds of hacker-years ● Handles lots of edge cases
Cons: ● 4k granularity ● Thread-safe ● Synchronous APIs ● General-purpose ● Lack of control (1) ● Lack of control (2)
Unified cache [diagram comparing Cassandra and Scylla]
Cassandra: on-heap / off-heap key cache and row cache, plus the Linux page cache of 4k SSTable pages — the app thread maps a page, page-faults and suspends, the kernel initiates I/O, an interrupt and context switches resume the thread; your data (~300b) drags in parasitic rows, and every layer needs tuning.
Scylla: a single unified row cache in front of the SSTables on SSD.
Workload Conditioning
Workload Conditioning ● Internal feedback loops to balance competing loads [diagram: memory, compaction backlog, CPU and WAN monitors feed the Seastar scheduler, which adjusts priorities for commitlog, memtable, compaction, query, repair and SSD I/O]
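One such feedback loop, sketched with an assumed controller shape and assumed numbers (not Scylla's actual code): compaction's I/O shares grow with its backlog, so internal work cannot fall behind indefinitely, while user traffic still dominates when the backlog is small.

```cpp
// Illustrative feedback loop: map compaction backlog to scheduler shares.
// The 50-share floor echoes the scheduling slide above; the ceiling and the
// linear shape are assumptions for illustration.
#include <algorithm>
#include <cstdint>
#include <iostream>

unsigned compaction_shares(uint64_t backlog_bytes, uint64_t backlog_limit) {
    const unsigned min_shares = 50;     // baseline when there is little backlog
    const unsigned max_shares = 1000;   // assumed ceiling under heavy backlog
    double pressure = std::min(1.0, double(backlog_bytes) / double(backlog_limit));
    return min_shares + unsigned(pressure * (max_shares - min_shares));
}

int main() {
    std::cout << compaction_shares(1ull << 30, 16ull << 30) << "\n";   // light backlog
    std::cout << compaction_shares(15ull << 30, 16ull << 30) << "\n";  // heavy backlog
}
```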
Replacing the system memory allocator
System memory allocator problems ● Thread safe (pays for synchronization a thread-per-core design doesn't need) ● No allocation back pressure
Seastar memory allocator ● Non-Thread safe! ○ Each core gets a private memory pool ● Allocation back pressure ○ Allocator calls a callback when low on memory ○ Scylla evicts cache in response
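An illustrative sketch of both ideas (assumed names, not Seastar's actual allocator API): a per-core pool that never takes a lock, plus a reclaim callback that lets the cache give memory back before an allocation has to fail.

```cpp
// Sketch of a per-core allocator with allocation back pressure (illustrative
// only): no locking, and a reclaim hook invoked when the pool runs low so the
// cache can be evicted instead of failing the allocation.
#include <cstddef>
#include <cstdlib>
#include <functional>

class CorePool {
    size_t free_bytes_;
    std::function<void(size_t)> reclaim_hook_;   // e.g. "evict cache entries"
public:
    explicit CorePool(size_t capacity) : free_bytes_(capacity) {}

    void set_reclaim_hook(std::function<void(size_t)> hook) {
        reclaim_hook_ = std::move(hook);
    }

    void* allocate(size_t n) {
        if (n > free_bytes_ && reclaim_hook_) {
            reclaim_hook_(n - free_bytes_);      // ask the cache to give memory back
        }
        if (n > free_bytes_) return nullptr;     // still not enough: fail
        free_bytes_ -= n;
        return std::malloc(n);                   // no locks: one pool per core
    }

    void deallocate(void* p, size_t n) {
        std::free(p);
        free_bytes_ += n;
    }
};
```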
One allocator is not enough
Remaining problems with malloc/free ● Memory gets fragmented over time ○ If workload changes sizes of allocated objects ● Allocating a large contiguous block requires evicting most of cache
OOM :( [diagram: fragmented memory — a large contiguous allocation fails even though total free memory is sufficient]
Log-structured memory allocation ● The cache ○ Large majority of memory allocated ○ Small subset of allocation sites ● Teach allocator how to move allocated objects around ○ Updating references
Log-structured memory allocation Fancy Animation
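An illustrative sketch of the "movable objects" idea (assumed design, not Scylla's code): each allocation records where its owning pointer lives, so a compaction pass can copy the object and fix up the reference — which is what makes defragmentation possible.

```cpp
// Sketch of movable allocations (illustrative only): every allocation knows
// the single pointer that owns it, so the allocator can relocate the object
// and update that reference.
#include <cstdlib>
#include <cstring>
#include <vector>

struct Allocation {
    void** owner;   // the pointer that refers to this object
    size_t size;
};

class LogStructuredPool {
    std::vector<Allocation> live_;
public:
    // Allocate and remember who points at the object.
    void* allocate(void** owner, size_t size) {
        *owner = std::malloc(size);
        live_.push_back({owner, size});
        return *owner;
    }

    // "Compaction": copy every live object elsewhere and update the owning
    // pointer. A real allocator would move objects out of a fragmented
    // segment so the segment becomes one contiguous free block.
    void compact() {
        for (auto& a : live_) {
            void* moved = std::malloc(a.size);
            std::memcpy(moved, *a.owner, a.size);
            std::free(*a.owner);
            *a.owner = moved;
        }
    }
};
```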
Future Improvements
Userspace TCP/IP stack ● Thread-per-core design ● Use DPDK to drive hardware ● Present as experimental mode ○ Needs more testing and productization
Query Compilation to Native Code ● Use LLVM to JIT-compile CQL queries ● Embed database schema and internal object layouts into the query
Conclusions ● Full control of the software stack can generate big payoffs ● Careful system design can maximize throughput ● Without sacrificing latency ● Without requiring endless end-user tuning ● While having a lot of fun
How to interact ● Download: http://www.scylladb.com ● Twitter: @ScyllaDB ● Source: http://github.com/scylladb/scylla ● Mailing lists: scylladb-user @ groups.google.com ● Company site & blog: http://www.scylladb.com
THE SCYLLA IS THE LIMIT Thank you.