The State of Databases in 2019 Dinesh A. Joshi @dineshjoshi dinesh.joshi@gatech.edu apache cassandra
About Me • Senior Software Engineer • Apache Cassandra Committer • > 10 YoE in Distributed Systems • MS CS (Distributed Systems), Georgia Tech, Atlanta, USA
Data Trends 📋
Data Growth Source: https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf
Data Criticality Source: https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf
Data Growth Fuel ⛽ • Embedded Devices Time Series! • IoT • Sensors • Wearables
Source: https://www.datameer.com/blog/big-data-ecosystem/
Database Landscape 2019
Choices? 🧑 350+ !!!
Operators & Developers • Developers • Operators • Both
Not always aligned!
Cascading Costs • $ UI / Presentation • Services (REST, gRPC) • Access Layer • DB $$$
Polyglot Persistence "Polyglot persistence is the concept of using different data storage technologies to handle different data storage needs within a given software application" – Wikipedia Source: https://en.wikipedia.org/wiki/Polyglot_persistence
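As an illustration of the definition above, here is a minimal sketch of one application using two storage technologies for two different needs. The classes KeyValueStore, TimeSeriesStore and Application are hypothetical in-memory stand-ins invented for this example; they are not real database clients.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStore:
    """Hypothetical stand-in for a key-value store (e.g. session data)."""
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


@dataclass
class TimeSeriesStore:
    """Hypothetical stand-in for a time series store (e.g. sensor readings)."""
    points: list = field(default_factory=list)

    def append(self, timestamp, value):
        self.points.append((timestamp, value))

    def range(self, start, end):
        # Return all points whose timestamp falls in [start, end].
        return [(t, v) for t, v in self.points if start <= t <= end]


class Application:
    """One application, two storage technologies, each matched to a need."""

    def __init__(self):
        self.sessions = KeyValueStore()   # point lookups by key
        self.metrics = TimeSeriesStore()  # append-heavy, range queries by time

    def login(self, user, token):
        self.sessions.put(user, token)

    def record_reading(self, timestamp, value):
        self.metrics.append(timestamp, value)


app = Application()
app.login("alice", "token-123")
for ts, reading in [(1, 20.5), (2, 21.0), (3, 21.4)]:
    app.record_reading(ts, reading)

print(app.sessions.get("alice"))
print(app.metrics.range(2, 3))
```

In a real system each stand-in would be replaced by a client for an actual database, which is where the operational cost of running several storage systems comes in.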
Polyglot Persistence Source: https://www.infoq.com/presentations/microservices-polyglot-persistence
Database Landscape 2019
Landscape 2019 • Relational • Time Series • NoSQL • Document Stores • NewSQL • Search Engines • Graph • In Memory
Relational Databases
Relational Databases • Data is Relational • Joins • Transactions • SQL is well known • Dataset fits
NoSQL Databases
NoSQL Databases • Key-Value Stores • RDF Stores • Wide Column Stores • Native XML DBMS • Document Stores • Content Stores • Graph DBMS • Search Engines
NoSQL Databases LevelDB
Industry Trends
SQL Source: Google Trends
Relational Source: https://db-engines.com/en/ranking_categories
Graph vs Relational Source: https://db-engines.com/en/ranking_categories
Time Series, Wide Column DBs Source: https://db-engines.com/en/ranking_categories
Popularity Trends Source: https://db-engines.com/en/ranking_categories
Popularity Trends (All) Source: https://db-engines.com/en/ranking_categories
Apache Cassandra
Manage massive amounts of data, fast, without losing sleep! Source: http://cassandra.apache.org/
1500+
What is MASSIVE Scale? DURABLE • 75000+ nodes • 10+ PBs of data • Over 1 Trillion requests / day Source: http://cassandra.apache.org/
What is FAST? LINEAR SCALABILITY Source: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
RELIABILITY? • No SPoFs • Decentralized • Shared-nothing architecture
Cassandra Origins • BigTable • Dynamo
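One idea commonly associated with Dynamo-style, decentralized designs is consistent hashing: every node can compute which nodes own a key, so no central coordinator is needed. The sketch below is a simplified illustration under stated assumptions, not Cassandra's implementation: the Ring class, the token helper, the one-token-per-node layout and the use of MD5 are all choices made for this example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring with one token per node."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, replication_factor: int = 1):
        """Walk clockwise from the key's position and collect distinct nodes."""
        start = bisect.bisect_right(self._tokens, token(key))
        count = min(replication_factor, len(self._entries))
        size = len(self._entries)
        return [self._entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.replicas("user:42", replication_factor=2))
```

Because placement depends only on the hash positions, removing a node moves only the keys that node owned; keys owned by other nodes stay where they are.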
CAP Theorem • Consistency • Availability • Partition Tolerance
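The slide names only the three CAP properties. As a hedged illustration of how replicated stores trade consistency against availability, the sketch below checks the general quorum rule: with N replicas, a read of R replicas is guaranteed to see the latest acknowledged write to W replicas when R + W > N. The function names and the N = 3 example are assumptions made for this sketch.

```python
from itertools import combinations


def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """Quorum rule: any R readers and W writers must share a replica."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check: every read set intersects every write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


n = 3
for r in range(1, n + 1):
    for w in range(1, n + 1):
        # The arithmetic rule and the exhaustive check should always agree.
        assert overlap_guaranteed(n, r, w) == always_intersect(n, r, w)

print(quorum(n), overlap_guaranteed(n, quorum(n), quorum(n)))
```

Smaller R and W keep requests succeeding when replicas are unreachable, at the cost of possibly reading stale data; larger values do the opposite.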
Apache Cassandra 4.0 What's new?
Cassandra 4.0 Changes 300+!
Reliability & Stability 🐏 • Checksummed Transport • Checksummed Storage
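To make the idea of a checksummed transport concrete, here is a minimal sketch of framing a payload with a CRC32 so the receiver can detect corruption. The frame layout (4-byte length, 4-byte CRC32, then payload) and the frame/unframe names are assumptions for illustration; the slides do not describe Cassandra 4.0's actual wire format.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32 (4-byte big-endian each)."""
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload


def unframe(data: bytes) -> bytes:
    """Return the payload, or raise ValueError if it fails verification."""
    length, expected = struct.unpack(">II", data[:8])
    payload = data[8:8 + length]
    if len(payload) != length or zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: payload corrupted")
    return payload


message = frame(b"INSERT row 42")
assert unframe(message) == b"INSERT row 42"

# Flip one bit in transit; the receiver should reject the frame.
damaged = bytearray(message)
damaged[-1] ^= 0x01
try:
    unframe(bytes(damaged))
except ValueError as err:
    print("rejected:", err)
```

The same pattern applies to storage: write a checksum next to each block and verify it on every read, so silent bit flips surface as errors instead of bad data.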
Scalability • Zero Copy Data Streaming • New internode messaging • Transient Replication
Throughput vs Cluster Size [Figure: Throughput (RPS) against Cluster Size (# of nodes), annotated at ~1000 nodes]
Time to recover (4.0 vs 3.x) [Figure: Time to recover (minutes) by AWS Instance Type (i3.2xl, i3.4xl, i3.8xl), trunk vs 3.x] Source: https://issues.apache.org/jira/browse/CASSANDRA-14765
Time to recover (4.0 vs 3.x) Source: https://issues.apache.org/jira/browse/CASSANDRA-14765
Netty OpenSSL vs JDK SSL Source: https://speakerdeck.com/normanmaurer/netty-one-framework-to-rule-them-all?slide=29
Cassandra Networking (4.0 vs Pre 4.0) • Lower Latencies (40% lower avg, 60% lower p99) • Memory Efficiency (~10x reduction) • Scalable internode encryption (~4x throughput) • Better throughput & response times (~2x vs 3.0)
Contribute • https://cassandra.apache.org • dev@cassandra.apache.org, user@cassandra.apache.org • #cassandra-dev (irc.freenode.net)
Questions?