Split your database Nicolai Plum Booking.com Database Engineering
Before 2
After 3
Why? ● Size ● Query time, query latency ● … conflicting workloads ● Business or architecture reasons ● Regulatory compliance ● Easier PCI DSS, GDPR compliance, PII... 4
Choosing tables to split out ● Minimise joins ● Maximise gains ● Split out biggest, fastest-growing, busiest, most- regulated, … ● Size and activity: ps-top ● https://github.com/sjmudd/ps-top ● Design requirements, current system structure ● Based on code analysis, application knowledge 5
ps-top ps-top 0.7.5 - 18:44:30 db-101 / 5.7.19-log, up 31d 9h 39m 2 Table (table_io_waits_summary_by_table) 44 rows Ops %| Fetch Insert Update Delete|Table Name 13.49 k 50.6%| 99.8% 0.1% 0.1%|bdb.Inventory 6.97 k 26.1%| 2.6% 94.8% 2.5% 0.1%|bdb.Catalogue 1.36 k 5.1%| 50.9% 24.5% 24.5%|bdb.Transaction 963 3.5%| 66.7% 33.3% |bdb.Delivery 597 2.2%| 51.1% 25.3% 23.6% |bdb.Customer 396 1.4%| 4.5% 94.4% 1.0%|bdb.Reviews 6
Application and code ● Application queries must go to different master for some tables after split ● Code change ● Proxy magic (untested!) ● (almost certainly) Need to change your code ● Try to separate read and write database handles in code ● Try to use application feature switch, A/B experiment framework, or other runtime new-old switch mechanism ● Implement in your test environment first 7
Joins & code changes ● Writes: ● Scan (statement) binlog ● Preferably get rid of update clauses with joins altogether ● Code analysis ● Reads ● Code analysis ● Query sampling ● Performance_Schema, proxies ● Perfection is difficult, you may have fixes after split 8
Foreign keys, triggers, stored procedures ● Foreign Keys between the schemas will not work ● Remove them and refactor code ● Move relational integrity checks and enforcement to application layers ● Reflect upon your life and give up using FKs ● Check stored procedures and triggers for use of both schemas ● Implement triggers and stored procedures on product schema as necessary ● No Percona Online Schema change during split 9
Replication topology 10
Translation Master ● Schema translation ● replicate-rewrite-db=’Sales->Product' ● Replication filter ● replicate-wild-do-table = Product.Catalogue ● replicate-wild-do-table = Product.Inventory ● replicate-wild-do-table = Product.Reviews 11
Schema name translation 12
Scheme name translation 13
Preparing Translation master ● Make a slave, with binlog enabled ● Shut down MySQL ● Configuration: Add ● skip-slave-start ● replicate-rewrite-db=’sales->product’ ● replicate-wild-do-table… 14
Preparing Translation master ● Create new schema ( product ) ● Rename all tables that are being split to the new schema ● Drop old schema ( sales ) from translation master ● Start replication slave ● Remove skip-slave-start ● If replication stops with missing table errors, find the bad query, fix up (skip) or make a new translation master again 15
Preparing new replication chain ● Clone the translation master to new ( product ) master ● Remove the database translate and replication filter statements ● Clone more slave databases as needed 16
Client connections ● Sales clients ● Product clients ● Write connection to ● Write connection to sales master sales master ● Read connection to ● Read connetion to sales slaves product slaves 17
Clients 18
So, how do clients see it? ● Sales clients ● Product clients ● Write sales schema in ● Write sales schema in sales master sales master ● Read sales schema in ● Read product schema sales slaves in product slaves 19
Better client access ● Create a view on the sales master of the tables in a product schema ● Simple view is updatable, so UPDATE/ INSERT/DELETE succeed ● product client code can be changed to use new schema name before split ● Change product clients to use product on sales master before split 20
Schemas with views 21
Creating views ● Sys schema tool use sys; call create_synonym_db(’sales', ’product'); ● Cannot handle views in sales ● View of view is not possible ● product views must be built on the same real sales tables as sales views ● Drop views of tables not moving ● Or create database product then create view… 22
Views on slaves also ● Updates are executed properly in SBR ● Slave can be promoted to master ● Using Orchestrator or similar 23
So, how do clients see it now? ● Sales clients ● Product clients ● Write sales schema in ● Write product schema sales master in sales master ● Read sales schema in ● Read product schema sales slaves in product slaves 24
Decreased redundancy ● Failover of the translation master is hard ● Keep the intermediate stage short-lived ● Probably best to prepare to repoint slaves in a hurry if translation master dies ● Keep sufficient capacity in parent chain for total load 25
Grants ● MySQL users of the sales tables will need privileges on product tables on the sales master ● Separate grants for sales and product makes the split easier and safer ● Grants (mysql schema) do not replicate through translation master ● product master will need matching grants 26
Client database server configuration ● Need a way to change server name quickly ● DNS CNAME or A record ● Preferably without application restart 27
Client access while split is ongoing ? 28
Choice ● Disconnect ALL ● Allow some product product clients from clients to connect to sales master product master while BEFORE connecting others still use sales to product master master ● More downtime ● Less downtime ● More data integrity ● Less data integrity 29
Choice ● Disconnect first ● Disconnect later ● No chance of conflicting ● Some chance of transactions conflicting transactions ● Auto-increment counter ● Fixed or external counter columns and id sources ● Complex less- ● More-understood understood application application ● Low risk appetite ● Higher risk appetite ● Lower cost of downtime ● High cost of downtime 30
Disconnect first ● Disable product client access to sales master – use separate user grants if possible, lock the user ● Kill all connections from product clients ● Wait for translation master replication pointer to catch up past the point where product clients were blocked ● Stop replication on the translation master ● Change product client configuration to use product master ● (going back is now hard) 31
Disconnect first 32
Disconnect first 33
Disconnect first 34
Disconnect later ● Change product client configuration to use product master ● Wait for product clients to make new connections to product master ● Disable product client access to sales master – use separate user grants if possible, lock the user ● Kill all connections from product clients ● Wait for translation master replication pointer to catch up ● Stop replication on the translation master ● (going back is now hard) 35
Disconnect later 36
Disconnect later 37
Disconnect later 38
After split 39
Cleanup ● Remove translation master ● Drop views and grants for product on the sales master ● Rename, then drop tables from the sales master 40
Final state 41
? nicolai.plum@booking.com 42
Recommend
More recommend