large databases lots of servers on premises in the cloud
play

Large databases lots of servers on premises in the cloud GET THEM - PowerPoint PPT Presentation

Large databases lots of servers on premises in the cloud GET THEM ALL! Flavio Gurgel pgDay Paris 2019 DBA leboncoin Mars 12, 2019 A common stack The Internet Border, DMZ http Application Replication Master Standby A common stack


  1. Large databases lots of servers on premises in the cloud GET THEM ALL! Flavio Gurgel pgDay Paris 2019 DBA leboncoin Mars 12, 2019

  2. A common stack The Internet Border, DMZ http Application Replication Master Standby

  3. A common stack leboncoin circa 2009 The Internet Border, DMZ http Application Replication Master Standby

  4. A growing stack leboncoin circa 2013 The Internet Border, DMZ http Application Standby Replication Standby Standby Master (spare)

  5. A growing company leboncoin still at 2013 The Internet Internal users Border, DMZ http Application BI Stuff (ETL, OLAP, web frontend, etc) Standby (short) Replication Standby Standby Master (long) (spare) OLAP DB

  6. 28,1 million unique visitors* more than 27 million classifield ads online 800 000 new ads every day 73 categories *Source Médiamétrie Net Ratings, avril 2018

  7. Stack? Incomplete chart - 2 DCs + AWS leboncoin at 2019

  8. 2 >20 300M A strong Tech team of « open-source » Datacenters Gbits/s 2000 images more than culture: 50k & 200 PostgreSQL, Go, outflows & a Virtual React, Python, 1 database of machines req/s on people Hadoop, 3 To leboncoin Kubernetes Cloud provider …

  9. To handle all that: automation

  10. Availability - is replication enough? ● Hardware ● Warranty ● Power ● RAID 10 ● Battery ● ECC RAM ● Network ● Fans ● Alerting

  11. Replication minimum requirements And let’s think about load-balancing too ● Standby ● Streaming ● Replication slots ● Geographic distribution ● Path ● Load balancing ● Spare

  12. Getting critical DC A DC B Read-write endpoint for applications Standby Master (spare) Standby Standby HAProxy Read-only endpoint for applications

  13. pg_dump ● Nightly ● Archiving DBs not dumped ● Custom mode ● Directory mode (from 300 GB) ● Encrypted ● Sent to the cloud ● Retention -> GDPR

  14. pg_dump (and restore) strategy On premise S3 Databases bucket All DCs Encryption Dump happens here server NFS mount AWS AWS Storage Gateway Storage Gateway DC B DC A Local cache disk Local cache disk

  15. Testing pg_dumps ● Mandatory ● Corruption ● Procedures ● Bugs ● Time to restore

  16. Physical backups ● Barman ● Basebackups (based on WAL/day) ○ Daily -> from 1 TB ○ Twice a week -> between 100 GB and 1 TB ○ Twice a month -> up to 100 GB ● PITR ● Tests

  17. Barman tips ● Postgres method ○ pg_receivewal ○ Pg_basebackup ○ Replication slot ● Geographic distribution ● Archive ● Disk space ● Retention

  18. Barman strategy On premise Databases All DCs Barman 1 Barman 2 Barman 1 Barman 1 DC B DC B DC A DC A

  19. Monitoring and Alerting pgwatch2

  20. Data flows

  21. Minor version upgrades ● Release notes ● DBA + SRE + Developper ● Standby -> Master ● Automation ● 1h total ● Site always up ● Services cut for seconds

  22. Major version upgrades ● Same version everywhere ● Current is 10 ● Decide ● QA + Staging ● Production ● pg_upgrade ● 3h total ● Site always up

  23. Major version (new generation) upgrades ● Near zero downtime ● Logical replication to 11 ● Stop origin on 10 ● Update sequences ● Point to new origin ● Start production ● New physical replica

  24. How it was to migrate from 9.3 to 10? ● row_to_json ● Parallel query ● Plan ● Auto-analyze ● Function ● Replication lag ● DDL locks ● Replication slots

  25. Applying DDL

  26. Incidents we faced ● Replication lag ● Changing execution plans ● Sqitch ● Unattended upgrade

  27. Cloud? (speaking only of databases) ● Instance types ● On premises cost ● Variables ● Physical backups/replicas ● Lock-in ● New scenarios ● Small, elastic, internal services ● Decommissioned DBs

  28. Other DB engine? NoSQL? ● NoSQL ○ InfluxDB ○ Elasticsearch ○ Redis ● Other engines ○ PostgreSQL - more than 70 servers ○ MySQL - some servers ○ MSSQL - one server ● Ditch PostgreSQL for (generic NoSQL here) ○ Never

  29. An app (iOS + Android) 50,6 is on device mobile *Source Médiamétrie Net Ratings, avril 2018 **Source Baromètre de satisfaction BVA novembre 2017

  30. Let’s keep in touch… leboncoin github.com/leboncoin @leboncoinEng Engineering blog

Recommend


More recommend