Scaling the Relational Database for the Cloud Age Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 @moss_toss | @citusdata
About Me • Co-Founder & VP Engineering at Citus Data • Amazon Shopping Cart (former) • Amazon Supply Chain & Order Fulfillment (former) • Stanford Computer Science Sumedh Pathak | Citus Data | DataEngConf 2018
Why RDBMS? Sumedh Pathak | Citus Data | DataEngConf 2018
Because your architecture could be simpler SELECT ... SQL Application Database Map Streaming Storage NoSQL Application Reduce Sumedh Pathak | Citus Data | DataEngConf 2018
An RDBMS is a general-purpose data platform Fast writes High throughput Data consistency Real-time & bulk High concurrency Query optimizers Sumedh Pathak | Citus Data | DataEngConf 2018
Startups Are Choosing Postgres % database job posts mentioning each database, across 20K+ job posts PostgreSQL MySQL MongoDB SQL Server + Oracle Source: Hacker News, https://news.ycombinator.com
Data Trends in the “Cloud Age” - Explosion of Data - Higher Volume - Higher Velocity ** SCALE **
but RDBMS’s don’t scale! Sumedh Pathak | Citus Data | DataEngConf 2018
but RDBMS’s don’t scale RDBMS’s are hard to scale Sumedh Pathak | Citus Data | DataEngConf 2018
What exactly needs to Scale? - Tables (Data) 1 - Partitioning, Co-location, Reference Tables - SQL (Reads) 2 - How do we express and optimize distributed SQL - Transactions (Writes) 3 - Cross Shard updates/deletes, Global Atomic Transactions Sumedh Pathak | Citus Data | DataEngConf 2018
Scaling Tables Sumedh Pathak | Citus Data | DataEngConf 2018
Data Partitioning - Pick a column - Date - Id (customer_id, cart_id) - Pick a method - Hash - Range
Partition data across nodes Coordinator Node Shards Worker R 1 R 2 R 3 R 4 R 5 R 6 R 7 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018
Worker → RDBMS, Shard → Table
Reference Tables Coordinator Node Copies of same table Worker N 1 N 1 N 1 N 1 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018
Co-Location Coordinator Node Explicit Co-Location API. E.g. Partition by Tenant Worker R 1 S 1 R 2 S 2 R 3 S 3 R 4 S 4 Nodes Sumedh Pathak | Citus Data | DataEngConf 2018
What about Foreign Keys?
The key to scaling tables... - Use relational databases as a building block - Understand semantics of application—to be smart about partitioning - Multi-tenant applications
Scaling SQL
SQL ↔ Relational Algebra FROM table R Project x (R) � R’ SELECT x Filter f(x) (R) � R’ WHERE f(x) R × S � R’ … JOIN … Sumedh Pathak | Citus Data | DataEngConf 2018
Distributed Relational Algebra Collect(R 1 ,R 2 ,...) � R FROM sharded_table C R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018
Commutativity A + B = B + A Sumedh Pathak | Citus Data | DataEngConf 2018
Commutative property Project x (Collect(R 1 ,R 2 ,...)) = Collect(Project x (R 1 ), Project x (R 2 )...) P x C C P x P x R 1 R 2 R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018
Distributivity A*(B + C) = A*B + A*C Sumedh Pathak | Citus Data | DataEngConf 2018
Distributive property Collect(R 1 ,R 2 ,...) x Collect(S 1 ,S 2 ,...) = Collect(R 1 × S 1 ,R 2 × S 2, ...) × C C C × × R 1 S 1 R 2 S 2 R 1 S 1 R 2 S 2 X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018
Associativity A + B + C = (A + B) + C = A + (B + C) Sumedh Pathak | Citus Data | DataEngConf 2018
Associative property SUM(x)(Collect(R 1 ,R 2 ,...)) = SUM(Collect(SUM(R 1 ), SUM(R 2 )...)) Sum x Sum x C C Sum x Sum x R 1 R 2 R 1 R 2 Sumedh Pathak | Citus Data | DataEngConf 2018
SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND orders.date >= '2012-01-01' AND nation.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018
Sumedh Pathak | Citus Data | DataEngConf 2018
Volcano style processing Data flows from bottom to top Sumedh Pathak | Citus Data | DataEngConf 2018
Sumedh Pathak | Citus Data | DataEngConf 2018
Parallelize Aggregate Push Joins & Filters below collect. Run in parallel across all nodes Filters & Projections done before Join Sumedh Pathak | Citus Data | DataEngConf 2018
Sumedh Pathak | Citus Data | DataEngConf 2018
SELECT sum(intermediate_col) FROM <concatenated results>; SELECT sum(price) SELECT sum(price) FROM orders_2 JOIN nation_2 FROM orders_2 JOIN nation_1 ON (orders_2.name = ON (orders_2.name = nation_2.name) nation_1.name) WHERE WHERE orders_2.date >= '2017-01-01' orders_2.date >= '2017-01-01' AND AND nation_2.region = 'Asia'; nation_2.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018
Executing Distributed SQL SELECT sum(price) FROM <results>; orders SELECT sum(price) SELECT sum(price) FROM orders_1 o JOIN nation_1 nation FROM orders_2 o JOIN nation_1 ON (o.name = n.name) ON (o.name = n.name) WHERE o.date >= '2017-01-01' WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; AND n.region = 'Asia'; orders_1 orders_2 nation_1 nation_1 SQL database SQL database
The key to scaling SQL... - New relational algebra operators for distributed processing - Relational Algebra Properties to optimize tree: Commutativity, Associativity, & Distributivity - Map / Reduce operators Sumedh Pathak | Citus Data | DataEngConf 2018
Scaling Transactions
Money Transfer, as an example BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator BEGIN; BEGIN; UPDATE accounts SET balance = UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; balance - 100 WHERE id = ‘ALICE’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; BEGIN; Coordinator UPDATE accounts SET balance = BEGIN; balance + 100 WHERE id = ‘BOB’; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
What happens during PREPARE? Locks are State of transaction stored maintained on a durable store & Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator ROLLBACK TRANSACTION ‘citus_... PREPARE TRANSACTION ‘citus_...98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator COMMIT PREPARED ‘citus_... COMMIT PREPARED ‘citus_... 98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Coordinator COMMIT PREPARED ‘citus_... COMMIT PREPARED ‘citus_... 98’; 98’; A 1 A 3 A 2 A 4 Sumedh Pathak | Citus Data | DataEngConf 2018
Recommend
More recommend