MySQL at Scale at Square Daniel Nichter Percona Live 2018 1 April 23, 2018
● An honest financial network for everyone Square ● Global: USA, Canada, UK, Japan, Australia ● Payment transaction data stored in MySQL ● We are hiring (square.com/careers) 2 About Square
The Numbers ● > 4,000 unique MySQL instances ○ Percona Server 5.6, RBR, GTID ● 99.95% SLA ○ 43s day, 5m week, 21m month, 4h 23m year ● ~ 1 PB of data ● ~ 800 physical servers ● ~ 300 microservices (apps) 3 MySQL at Square by the Numbers
Theme Pebbles, Not Boulders 4 Theme: Pebbles, Not Boulders
Theme Pebbles, Not Boulders 5 Theme: Pebbles, Not Boulders
Topics 1. Setup and Replication 2. High Availability 3. Backups 4. Access and Security 5. Monitoring and Alerting 6. Advice 6 Topics of Discussion
1. Setup and Replication 2. High Availability 3. Backups 4. Access and Security 5. Monitoring and Alerting 6. Advice
A-side B-side 01 02 US West US East 03 04 8 Production Database Cluster
A-side B-side ● It works ● Simple, fewest moving parts ● Balance cost, redundancy, complexity 01 02 ● Can add more read replicas US West ● Inactive side for: ○ Backups ○ Maintenance ○ Upgrade/patch US East 03 04 ○ Ad hoc queries by humans ○ No SLA ● Shard and scale out (app) 9 Benefits of this Topology
Theme Pebbles, Not Boulders A-side B-side 01 02 US West 03 04 US East 10 Theme: Pebbles, Not Boulders
1. Setup and Replication 2. High Availability 3. Backups 4. Access and Security 5. Monitoring and Alerting 6. Advice
A-side B-side RW CNAME RW SIP 01 02 03 04 12 RW CNAME and SIP
A-side B-side RW SIP RW CNAME 01 02 03 04 13 RW CNAME and SIP flipped
A-side B-side RW CNAME RW SIP 01 02 Clone from 03 03 04 Clone from 02 14 Node 02 Failure
A-side B-side RW CNAME RW SIP 01 02 03 04 15 B-side Rebuilt
A-side B-side 01 02 Clone from 02 Clone from 04 RW CNAME RW SIP 03 04 GTID 16 Primary DC Failure
A-side B-side 01 02 RW CNAME RW SIP 03 04 17 Cluster Running In DR DC
A-side B-side RW CNAME RW SIP 01 02 03 04 18 DR Successover Back To Primary DC
Scaling Out 01 02 01 02 01 02 03 04 03 04 03 04 01 02 01 02 01 02 03 04 03 04 03 04 01 02 01 02 01 02 03 04 03 04 03 04 19 9 Production MySQL Clusters
Theme Pebbles, Not Boulders 01 02 01 02 03 04 03 04 01 02 01 02 03 04 03 04 20 Theme: Pebbles, Not Boulders
1. Setup and Replication 2. High Availability 3. Backups 4. Access and Security 5. Monitoring and Alerting 6. Advice
Backups Percona XtraBackup 22 Percona XtraBackup for Backups
Backups ● Percona XtraBackup ● Full daily on all inactive nodes ● Encrypted ● Monitored (we know if backups do not run) ● Most stored locally on separate RAID array; some uploaded to cloud ● Ruby script to wrap XtraBackup, monitoring, etc. ● Restore verified via rebuilding failed nodes 23 Backups at Square
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 24 innobackupex command
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 25 innobackupex command
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 26 innobackupex command
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 27 innobackupex command
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 28 innobackupex command
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 29 innobackupex command
ulimit -n 262144 && umask 0227 && \ innobackupex \ --defaults-extra-file=#{backup_my.cnf} \ --slave-info \ --safe-slave-backup \ --databases=<db list file> \ --stream=xbstream \ --parallel=4 \ #{tmpdir} \ | pigz --fast -p 1 -c \ | gpg2 --batch --no-options -o #{dst} \ | tee >(md5sum > #{file}.md5) \ | sudo -u mysql split -a 2 --bytes=4096m - #{file}- 30 innobackupex command
1. Setup and Replication 2. High Availability 3. Backups 4. Access and Security 5. Monitoring and Alerting 6. Advice
Access and Security ● App-specific MySQL clusters 32 Access and Security
Access and Security ● App-specific MySQL clusters App: foo 01 02 01 02 01 02 03 04 03 04 03 04 foo-001 foo-002 bar-001 33 Access and Security
Access and Security ● App-specific MySQL clusters ● TLS for all connections and auth (no passwords) 34 Access and Security
Access and Security ● App-specific MySQL clusters ● TLS for all connections and auth (no passwords) GRANT ALL PRIVILEGES ON *.* TO 'daniel'@'%' REQUIRE ISSUER '/CN=Square/O=Square Inc./ST=California/L=San Francisco' SUBJECT '/CN=daniel/serialNumber=1' 35 Access and Security
Access and Security ● App-specific MySQL clusters ● TLS for all connections and auth (no passwords) ● Rotating accounts: human, app, system 36 Access and Security
Access and Security ● App-specific MySQL clusters ● TLS for all connections and auth (no passwords) ● Rotating accounts: human, app, system ● Isolated security zones: directional, no cross-talk, no hardware reuse 37 Access and Security
Access and Security ● App-specific MySQL clusters ● TLS for all connections and auth (no passwords) ● Rotating accounts: human, app, system ● Isolated security zones: directional, no cross-talk, no hardware reuse Blue Zone Red Zone INBOUND TCP 3306 OUTBOUND 01 02 01 02 03 04 03 04 38 Access and Security
Recommend
More recommend