architecting a low latency schemaless sql engine
play

Architecting a Low-Latency Schemaless SQL Engine Igor Canadi, - PowerPoint PPT Presentation

Architecting a Low-Latency Schemaless SQL Engine Igor Canadi, Rockset About Rockset Igor Search and analytics engine Rockset Enables data-driven Facebook applications RocksDB GraphQL 2 Overview Hardware and people


  1. Architecting a Low-Latency Schemaless SQL Engine Igor Canadi, Rockset

  2. About Rockset Igor ● Search and analytics engine ● Rockset ● Enables data-driven ● Facebook applications ● RocksDB ● GraphQL 2

  3. Overview ● Hardware and people efficiency ● Designing systems for people efficiency ○ Schemaless SQL ○ Converged indexing ○ Serverless architecture

  4. Hardware Efficiency

  5. Hardware Efficiency ● Faster databases ~= less hardware ● How much hardware do I need? ● Important, but not the only thing that matters

  6. People Efficiency

  7. People Efficiency ● How many people do I need? ● How much time do I need?

  8. People Efficiency - Configuration “My query is slow” “Do you have an index?” “What’s your partition key?” “What’s your buffer size?” “You should hire a DBA”

  9. People Efficiency - Organizational Friction ● Pre-cloud era: Application developers blocked on provisioning ● Data scientists blocked on data engineers

  10. People Efficiency - Pipelines

  11. Hardware vs. People Efficiency ● Hardware is frequently cheaper than people ● Increase hardware efficiency - spend less money ● Increase people efficiency - spark creativity

  12. Designing Systems for People Efficiency

  13. Rockset ● Search and analytics engine ● “Shortest path from data to applications” ● Connect to data sources or streams ● Execute fast queries

  14. Schemaless SQL

  15. Choosing the Query Language ● SQL is the obvious choice ● Maximize usefulness ● Existing knowledge ● Ecosystem of tools

  16. Querying existing data sources OLTP Data Lake Social SQL Sensors Logs Email/Docs Files Web/Mobile

  17. Querying existing data sources OLTP Data Lake Social SQL Sensors Logs Email/Docs ...but first, let me define a schema Files Web/Mobile

  18. SQL Schema ● Drag on people efficiency ● Messy data ● Complex ETL jobs

  19. Schemaless SQL ● “Smart schema” ● Frictionless data onboarding ● Data scientists no longer blocked on data engineers ● Performance overhead? https://rockset.com/blog/using-smart-schema-to-accelerate-insights-from-nested-json/

  20. Schemaless SQL - Storage Data Schema name: String age: Int Strict schema John 35 Schemaless “name”: S “John” “age”: I 35 Schemaless name: 0 age: 1 0: S “John” 1: I 35 (with field interning)

  21. Schemaless SQL - Query Execution Rows 1 10 7 4 5 Strict schema Columns a b c d e I 1 I 10 I 7 I 4 I 5 Schemaless Columns S a S b I 3 I 5 S e I 1 10 7 4 5 Schemaless Columns (with type hoisting) M S a S b I 3 I 5 S e

  22. Schemaless SQL ● Superior user experience ● Field interning reduces storage overhead ● Type hoisting reduces query execution overhead

  23. Converged indexing

  24. Converged Indexing ● “Query is slow because of the missing index”

  25. Converged Indexing ● “Query is slow because of the missing index” Index all the fields!

  26. Background on Indexing ● Columnar storage ● Search indexing

  27. Columnar Storage Store each column separately ● Great compression ● Only fetch columns the query needs ● 29

  28. Columnar Storage Store each column separately ● “name” Great compression ● Only fetch columns the query needs 0 Igor ● 1 Dhruba <doc 0> “interests” { “name”: “Igor”, 0.0 databases “interests”: [“databases”, “snowboarding”], 0.1 snowboarding “last_active”: 2019/3/15 } 1.0 cars 1.1 databases <doc 1> { “last_active” “name”: “Dhruba”, 0 2019/3/15 “interests”: [“cars”, “databases”], “last_active”: 2019/3/22 1 2019/3/22 } 30

  29. Columnar Storage Advantages Disadvantages Cost effective High write latency ● ● Narrow queries, wide tables High minimum read latency ● ● Scan queries Not suitable for online ● ● Analytical queries applications ● 31

  30. Search Indexing For each value, store documents containing that value (posting list) ● Quickly retrieve a list of document IDs that match a predicate ● 32

  31. Search Indexing For each value, store documents containing that value (posting list) ● Quickly retrieve a list of document IDs that match a predicate ● “name” <doc 0> Dhruba 1 { “name”: “Igor”, Igor 0 “interests”: [“databases”, “snowboarding”], “interests” “last_active”: 2019/3/15 } databases 0.0; 1.1 cars 1.0 <doc 1> { snowboarding 0.1 “name”: “Dhruba”, “interests”: [“cars”, “databases”], “last_active” “last_active”: 2019/3/22 2019/3/15 0 } 2019/3/22 1 33

  32. Search Indexing Advantages Disadvantages High selectivity queries Slower analytical queries ● ● Low latency queries ● Suitable for online applications ● 34

  33. Converged Indexing Columnar and search indexes in the same system ● Built on top of key-value store abstraction ● Each document maps to many key-value pairs ● 35

  34. Converged Indexing Columnar and search indexes in the same system ● Built on top of key-value store abstraction ● Each document maps to many key-value pairs ● Key Value Row Store R.0.name Igor <doc 0> R.1.name Dhruba { “name”: “Igor” C.name.0 Igor Column Store } C.name.1 Dhruba <doc 1> { Search index S.name.Dhruba.1 “name”: “Dhruba” } S.name.Igor.0 36

  35. Converged Indexing - Queries Fast analytical queries + fast search queries ● Optimizer picks between columnar store or search index ● 37

  36. Converged Indexing - Queries Fast analytical queries + fast search queries ● Optimizer picks between columnar store or search index ● SELECT * SELECT keyword, count(*) FROM search_logs FROM search_logs WHERE keyword = ‘datacouncil’ GROUP BY keyword AND locale = ‘en’ ORDER BY count(*) DESC Search index Columnar store 38

  37. Converged Indexing - Writes One document write results in many ● key-value store writes Use write-optimized key-value store - ● RocksDB 39

  38. Converged Indexing - Writes new keys One document write results in many ● key-value store writes Memory Manager Use write-optimized key-value store - ● Memory Buffer RocksDB Storage SST 1 SST 2 background compaction SST 3 SST 4 40

  39. Converged Indexing ● Fast queries out of the box ● Real-time index writes Less efficient More efficient ● Database configuration ● Storage ● Queries ● Writes 41

  40. Serverless Architecture

  41. Serverless Architecture ● Rockset is a cloud service ● No need to manage hardware ● Seamless autoscale 43

  42. Storage in the Cloud Rockset SQL API Distributed Data is sharded across leaves ● Log Aggregator Aggregator Leaf Leaf Leaf RocksDB RocksDB RocksDB 44

  43. Storage in the Cloud Rockset SQL API Distributed Data is sharded across leaves ● Log Aggregator Aggregator RocksDB-Cloud keeps consistent ● copy in cloud object storage Leaf Leaf Leaf RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB RocksDB RocksDB SST files SST files SST files Object Storage (AWS S3, GCS, Minio, ...) 45

  44. Adding new read replica Rockset SQL API Distributed Copy data to a new leaf ● Log Aggregator Aggregator Leaf Leaf Leaf Leaf RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB RocksDB RocksDB RocksDB SST files SST files SST files Object Storage (AWS S3, GCS, Minio, ...) 46

  45. Adding new read replica Rockset SQL API Distributed Copy data to a new leaf ● Log Aggregator Aggregator Tail new updates from log ● Leaf Leaf Leaf Leaf RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB RocksDB RocksDB RocksDB SST files SST files SST files Object Storage (AWS S3, GCS, Minio, ...) 47

  46. Adding new read replica Rockset SQL API Distributed Copy data to a new leaf ● Log Aggregator Aggregator Tail new updates from log ● Able to serve more queries ● Leaf Leaf Leaf Leaf RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB-Cloud RocksDB RocksDB RocksDB RocksDB SST files SST files SST files Object Storage (AWS S3, GCS, Minio, ...) 48

  47. Conclusion

  48. Conclusion No need to configure... ● Schemaless SQL ...schema ● Converged indexing ...indexes ● Serverless architecture ...servers 50

  49. Conclusion ● Rockset - “shortest path from data to applications” ● Making workflows easy catalyzes creativity 51

  50. Thank you

Recommend


More recommend