scaling the relational database for the cloud age
play

Scaling the Relational Database for the Cloud Age Sumedh Pathak, - PowerPoint PPT Presentation

Scaling the Relational Database for the Cloud Age Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 @moss_toss | @citusdata About Me Co-Founder & VP Engineering at Citus Data Amazon Shopping


  1. Scaling the Relational Database for the Cloud Age Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 @moss_toss | @citusdata

  2. About Me • Co-Founder & VP Engineering at Citus Data • Amazon Shopping Cart 
 (former) • Amazon Supply Chain & Order Fulfillment (former) • Stanford Computer Science Sumedh Pathak | Citus Data | DataEngConf 2018

  3. Why RDBMS? Sumedh Pathak | Citus Data | DataEngConf 2018

  4. Because your architecture could be simpler SELECT ... SQL Application Database Map Streaming Storage NoSQL Application Reduce Sumedh Pathak | Citus Data | DataEngConf 2018

  5. An RDBMS is a general-purpose 
 data platform Fast writes High throughput Data consistency Real-time & bulk High concurrency Query optimizers Sumedh Pathak | Citus Data | DataEngConf 2018

  6. Startups Are Choosing Postgres % database job posts mentioning each database, across 20K+ job posts PostgreSQL MySQL MongoDB SQL Server + Oracle Source: Hacker News, https://news.ycombinator.com 


  7. Data Trends in the “Cloud Age” - Explosion of Data - Higher Volume - Higher Velocity ** SCALE **

  8. but RDBMS’s 
 don’t scale! Sumedh Pathak | Citus Data | DataEngConf 2018

  9. but RDBMS’s don’t scale RDBMS’s are hard to scale Sumedh Pathak | Citus Data | DataEngConf 2018

  10. What exactly needs to Scale? - Tables (Data) 1 - Partitioning, Co-location, Reference Tables - SQL (Reads) 2 - How do we express and optimize distributed SQL - Transactions (Writes) 3 - Cross Shard updates/deletes, Global Atomic Transactions Sumedh Pathak | Citus Data | DataEngConf 2018

  11. Scaling Tables Sumedh Pathak | Citus Data | DataEngConf 2018

  12. Data Partitioning - Pick a column - Date - Id (customer_id, cart_id) - Pick a method - Hash - Range

  13. Partition data across nodes Coordinator Node Shards Worker R 1 R 2 R 3 R 4 R 5 R 6 R 7 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018

  14. Worker → RDBMS, Shard → Table

  15. Reference Tables Coordinator Node Copies of same table Worker N 1 N 1 N 1 N 1 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018

  16. Co-Location Coordinator Node Explicit Co-Location API. 
 E.g. Partition by Tenant Worker R 1 S 1 R 2 S 2 R 3 S 3 R 4 S 4 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018

  17. What about Foreign Keys?

  18. The key to scaling tables... - Use relational databases as a building block - Understand semantics of application—to be smart about partitioning - Multi-tenant applications

  19. Scaling SQL

  20. SQL ↔ Relational Algebra FROM table R Project x (R) � R’ SELECT x Filter f(x) (R) � R’ WHERE f(x) R × S � R’ … JOIN … Sumedh Pathak | Citus Data | DataEngConf 2018

  21. Distributed Relational Algebra Collect(R 1 ,R 2 ,...) � R FROM sharded_table C R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018

  22. Commutativity A + B = B + A Sumedh Pathak | Citus Data | DataEngConf 2018

  23. Commutative property Project x (Collect(R 1 ,R 2 ,...)) = Collect(Project x (R 1 ), Project x (R 2 )...) P x C C P x P x R 1 R 2 R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018

  24. Distributivity A*(B + C) = A*B + A*C Sumedh Pathak | Citus Data | DataEngConf 2018

  25. Distributive property Collect(R 1 ,R 2 ,...) x Collect(S 1 ,S 2 ,...) = Collect(R 1 × S 1 ,R 2 × S 2, ...) × C C C × × R 1 S 1 R 2 S 2 R 1 S 1 R 2 S 2 X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018

  26. Associativity A + B + C = (A + B) + C = A + (B + C) Sumedh Pathak | Citus Data | DataEngConf 2018

  27. Associative property SUM(x)(Collect(R 1 ,R 2 ,...)) = SUM(Collect(SUM(R 1 ), SUM(R 2 )...)) Sum x Sum x C C Sum x Sum x R 1 R 2 R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018

  28. SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND orders.date >= '2012-01-01' AND nation.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018

  29. Sumedh Pathak | Citus Data | DataEngConf 2018

  30. Volcano style processing Data flows from 
 bottom 
 to top Sumedh Pathak | Citus Data | DataEngConf 2018

  31. Sumedh Pathak | Citus Data | DataEngConf 2018

  32. Parallelize Aggregate Push Joins & Filters below collect. Run in parallel across all nodes Filters & Projections done before Join Sumedh Pathak | Citus Data | DataEngConf 2018

  33. Sumedh Pathak | Citus Data | DataEngConf 2018

  34. SELECT sum(intermediate_col) 
 FROM <concatenated results>; SELECT sum(price) SELECT sum(price) FROM orders_2 JOIN nation_2 
 FROM orders_2 JOIN nation_1 
 ON (orders_2.name = ON (orders_2.name = nation_2.name) nation_1.name) WHERE WHERE orders_2.date >= '2017-01-01' orders_2.date >= '2017-01-01' AND AND nation_2.region = 'Asia'; nation_2.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018

  35. Executing Distributed SQL SELECT sum(price) FROM <results>; orders SELECT sum(price) SELECT sum(price) FROM orders_1 o JOIN nation_1 nation FROM orders_2 o JOIN nation_1 ON (o.name = n.name) ON (o.name = n.name) WHERE o.date >= '2017-01-01' WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; AND n.region = 'Asia'; orders_1 orders_2 nation_1 nation_1 SQL database SQL database

  36. The key to scaling SQL... - New relational algebra operators for distributed processing - Relational Algebra Properties to optimize tree: Commutativity, Associativity, & Distributivity - Map / Reduce operators Sumedh Pathak | Citus Data | DataEngConf 2018

  37. Scaling Transactions

  38. Money Transfer, as an example BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018

  39. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  40. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  41. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  42. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; BEGIN; UPDATE accounts SET balance = UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; balance - 100 WHERE id = ‘ALICE’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  43. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; BEGIN; Coordinator UPDATE accounts SET balance = BEGIN; balance + 100 WHERE id = ‘BOB’; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  44. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  45. What happens during PREPARE? Locks are 
 State of transaction stored maintained on a durable store & Sumedh Pathak | Citus Data | DataEngConf 2018

  46. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator ROLLBACK TRANSACTION ‘citus_... PREPARE TRANSACTION ‘citus_...98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  47. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  48. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator COMMIT PREPARED ‘citus_... COMMIT PREPARED ‘citus_... 98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

  49. BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator COMMIT PREPARED ‘citus_... COMMIT PREPARED ‘citus_... 98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018

Recommend


More recommend