
The State of Databases in 2019 Dinesh A. Joshi @dineshjoshi


  1. The State of Databases in 2019 Dinesh A. Joshi @dineshjoshi dinesh.joshi@gatech.edu apache cassandra

  2. About Me • Senior Software Engineer • Apache Cassandra Committer • > 10 YoE in Distributed Systems • MS CS (Distributed Systems), Georgia Tech, Atlanta, USA

  3. Data Trends 📋

  4. Data Growth Source: https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf

  5. Data Criticality Source: https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf

  6. Data Growth Fuel ⛽ • Embedded Devices Time Series! • IoT • Sensors • Wearables

  7. apache cassandra Source: https://www.datameer.com/blog/big-data-ecosystem/

  8. Database Landscape 2019

  9. Choices? 🧑 350+ !!!

  10. Operators & Developers

  11. Operators & Developers [Venn diagram: Developers, Operators, Both]

  12. Not always aligned!

  13. Cascading Costs [Diagram, from $ to $$$: UI / Presentation, Services (REST, gRPC), Access Layer, DB]

  14. Polyglot Persistence "Polyglot persistence is the concept of using different data storage technologies to handle different data storage needs within a given software application" – Wikipedia Source: https://en.wikipedia.org/wiki/Polyglot_persistence
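
The definition above can be illustrated with a minimal sketch in which one service writes to two different stores, each picked for a different storage need. Every name here (`KeyValueStore`, `TimeSeriesStore`, `DeviceService`, the device ID and token) is hypothetical, and both stores are in-memory stand-ins rather than real database clients.

```python
from typing import Dict, List, Optional, Tuple


class KeyValueStore:
    """In-memory stand-in for a key-value store, e.g. for session lookups."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class TimeSeriesStore:
    """In-memory stand-in for a time series store, e.g. for sensor readings."""

    def __init__(self) -> None:
        self._points: Dict[str, List[Tuple[int, float]]] = {}

    def append(self, series: str, timestamp: int, value: float) -> None:
        self._points.setdefault(series, []).append((timestamp, value))

    def between(self, series: str, start: int, end: int) -> List[Tuple[int, float]]:
        # Inclusive range scan over a single series.
        return [(t, v) for t, v in self._points.get(series, []) if start <= t <= end]


class DeviceService:
    """Hypothetical service that routes each kind of data to a suitable store."""

    def __init__(self, sessions: KeyValueStore, readings: TimeSeriesStore) -> None:
        self.sessions = sessions
        self.readings = readings

    def login(self, device_id: str, token: str) -> None:
        # Point lookups by key: a key-value store fits.
        self.sessions.put(device_id, token)

    def record(self, device_id: str, timestamp: int, value: float) -> None:
        # Append-heavy, time-ordered data: a time series store fits.
        self.readings.append(device_id, timestamp, value)


service = DeviceService(KeyValueStore(), TimeSeriesStore())
service.login("wearable-1", "token-abc")
service.record("wearable-1", 100, 36.6)
service.record("wearable-1", 200, 36.9)
```

The service depends only on the two small store interfaces, so either stand-in could in principle be swapped for a real client without touching the other; that separation is also where the operational cost of running several databases comes from.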

  15. Polyglot Persistence Source: https://www.infoq.com/presentations/microservices-polyglot-persistence

  16. Polyglot Persistence Source: https://www.infoq.com/presentations/microservices-polyglot-persistence

  17. Database Landscape 2019

  18. Landscape 2019 • Relational • Time Series • NoSQL • Document Stores • NewSQL • Search Engines • Graph • In Memory

  19. Relational Databases

  20. Relational Databases

  21. Relational Databases • Data is Relational • Joins • Transactions • SQL is well known • Dataset fits

  22. NoSQL Databases

  23. NoSQL Databases • Key-Value Stores • RDF Stores • Wide Column Stores • Native XML DBMS • Document Stores • Content Stores • Graph DBMS • Search Engines

  24. NoSQL Databases LevelDB

  25. Industry Trends

  26. SQL Source: Google Trends

  27. Relational Source: https://db-engines.com/en/ranking_categories

  28. Graph vs Relational Source: https://db-engines.com/en/ranking_categories

  29. Time Series, Wide Column DBs Source: https://db-engines.com/en/ranking_categories

  30. Popularity Trends Source: https://db-engines.com/en/ranking_categories

  31. Popularity Trends (All) Source: https://db-engines.com/en/ranking_categories

  32. Apache Cassandra

  33. Manage massive amounts of data, fast, without losing sleep! Source: http://cassandra.apache.org/

  34. 1500+

  35. What is MASSIVE Scale? • 75000+ nodes DURABLE • 10+ PBs of data • Over 1 Trillion requests / day Source: http://cassandra.apache.org/

  36. What is FAST? LINEAR SCALABILITY Source: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

  37. RELIABILITY? • No SPoFs • Decentralized • Shared-nothing architecture

  38. Cassandra Origins • BigTable • Dynamo

  39. CAP Theorem • Consistency • Availability • Partition Tolerance

  40. Apache Cassandra 4.0: What's new?

  41. Cassandra 4.0 Changes 300+!

  42. Reliability & Stability 🐏 • Checksummed Transport • Checksummed Storage
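
The idea behind checksumming data in transit or at rest can be sketched as follows: the writer stores a checksum next to the payload, and the reader recomputes it and rejects any mismatch. This is a generic illustration, not Cassandra's actual wire or on-disk format; the 8-byte header layout and the `frame` / `unframe` function names are assumptions made for this example.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # assumed layout: payload length, CRC32 (big-endian)


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32 checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unframe(data: bytes) -> bytes:
    """Return the payload, or raise ValueError if it is truncated or corrupted."""
    if len(data) < HEADER.size:
        raise ValueError("frame too short")
    length, expected = HEADER.unpack(data[:HEADER.size])
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload


def flip_bit(data: bytes, index: int) -> bytes:
    """Simulate corruption by flipping the lowest bit of one byte."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)
```

A clean frame round-trips through `unframe(frame(b"..."))`, while a frame passed through `flip_bit` on a payload byte raises `ValueError` instead of silently returning bad data.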

  43. Scalability • Zero Copy Data Streaming • New internode messaging • Transient Replication

  44. Throughput vs Cluster Size [Chart: Throughput (RPS) vs Cluster Size (# of nodes), annotated ~1000 nodes]

  45. Time to recover (4.0 vs 3.x) [Chart: Time to recover (minutes) by AWS Instance Type (i3.2xl, i3.4xl, i3.8xl), trunk vs 3.x] Source: https://issues.apache.org/jira/browse/CASSANDRA-14765

  46. Time to recover (4.0 vs 3.x) Source: https://issues.apache.org/jira/browse/CASSANDRA-14765

  47. Netty OpenSSL vs JDK SSL Source: https://speakerdeck.com/normanmaurer/netty-one-framework-to-rule-them-all?slide=29

  48. Cassandra Networking (4.0 vs Pre 4.0) • Lower Latencies (40% lower avg, 60% lower p99) • Memory Efficiency (~10x reduction) • Scalable internode encryption (~4x throughput) • Better throughput & response times (~2x vs 3.0)

  49. Contribute • https://cassandra.apache.org • dev@cassandra.apache.org, user@cassandra.apache.org • #cassandra-dev (irc.freenode.net)

  50. Questions?
