100% Containers Powered Carpooling
Maxime Fouilleul - Database Reliability Engineer
Today's agenda
- BlaBlaCar - Facts & Figures
- Infrastructure Ecosystem - 100% containers powered carpooling
- Stateful Services into containers - MariaDB as an example
- Next challenges - Kubernetes, the Cloud
BlaBlaCar Facts & Figures
Facts and Figures
- 60 million members
- 30 million mobile app downloads (iPhone and Android)
- Founded in 2006
- 15 million travellers per quarter
- 1 million tonnes less CO2 in the past year
- Currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.
Our prod data ecosystem
- MariaDB - Transactional
- ElasticSearch - Search
- Redis - Volatile
- Cassandra - Distributed
- PostgreSQL - Spatial
- Kafka - Stream
Infrastructure Ecosystem 100% containers powered carpooling
Why containers?
Homogeneous Hardware - From this: one dedicated server per service (srv_001 runs svc_001, srv_002 runs svc_002, ... srv_014 runs svc_014).
Homogeneous Hardware - To that: fewer identical servers, each hosting several services (svc_001 to svc_014 spread across srv_001 to srv_008).
Homogeneous Hardware - “Pets vs Cattle”
- Easier to replace broken hardware
- Cost effective
- Easier to manage
Homogeneous Deployment

trip-meeting-point application:

cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml
---
containers:
- aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
- aci.blbl.cr/aci-go-synapse:15-40
- aci.blbl.cr/aci-go-nerve:21-27
- aci.blbl.cr/aci-logshipper:27
nodes:
- hostname: trip-meeting-point1
  gelf:
    level: INFO
  fleet:
  - MachineMetadata=rack=110
  - Conflicts=*trip-meeting-point*
- hostname: trip-meeting-point2
  fleet:
  - MachineMetadata=rack=210
  - Conflicts=*trip-meeting-point*
- hostname: trip-meeting-point3
  fleet:
  - MachineMetadata=rack=310
  - Conflicts=*trip-meeting-point*

ggn prod-dc1 trip-meeting-point update -y

redis (for trip-meeting-point):

cat ./prod-dc1/services/redis-meeting-point/service-manifest.yml
---
containers:
- aci.blbl.cr/aci-redis:4.0.2-1
- aci.blbl.cr/aci-redis-dictator:20
- aci.blbl.cr/aci-go-nerve:21-27
- aci.blbl.cr/aci-prometheus-redis-exporter:0.12.2-1
nodes:
- hostname: redis-meeting-point1
  fleet:
  - MachineMetadata=rack=110
  - Conflicts=*redis-meeting-point*
- hostname: redis-meeting-point2
  fleet:
  - MachineMetadata=rack=210
  - Conflicts=*redis-meeting-point*
- hostname: redis-meeting-point3
  fleet:
  - MachineMetadata=rack=310
  - Conflicts=*redis-meeting-point*

ggn prod-dc1 redis-meeting-point update -y
Volatile by design

trip-meeting-point dependencies (diagram: the trip-meeting-point pod is built from aci-trip-meeting-point, aci-go-synapse, aci-go-nerve and aci-logshipper, which in turn build on base images such as aci-java, aci-hindsight, aci-debian and aci-common):

cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml
---
containers:
- aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
- aci.blbl.cr/aci-go-synapse:15-41
- aci.blbl.cr/aci-go-nerve:21-27
- aci.blbl.cr/aci-logshipper:27
[...]

cat ./aci-trip-meeting-point/aci-manifest.yml
---
name: aci.blbl.cr/aci-trip-meeting-point:{{.version}}
aci:
  dependencies:
  - aci.blbl.cr/aci-java:1.8.181-2
[...]

cat ./aci-java/aci-manifest.yml
---
name: aci.blbl.cr/aci-java:1.8.181-2
aci:
  dependencies:
  - aci.blbl.cr/aci-debian:9.5-9
  - aci.blbl.cr/aci-common:7
Volatile - When should I redeploy?
- A change in my own app/container: “immutable”
- A change in a sidecar container or its dependencies
- Noisy neighbours: “mutualization”
When you are ready for instability, you are HA.
How?
Infrastructure Ecosystem
(Diagram) Example pods on the hosts: front1 (nginx, php, go-nerve, go-synapse, monitoring) and mysql-main1 (mysqld, go-nerve, go-synapse, monitoring), registered in the Service Discovery (zookeeper).
Workflow: dgr builds the service from its codebase and stores the image in the Container Registry; ggn creates the service on the fleet cluster (the “distributed init system”, backed by etcd), which runs the pods with rkt.
Hardware: one type of bare-metal server, CoreOS as the host OS, three disk profiles.
Infrastructure Ecosystem (next step)
(Diagram) The same workflow, with kubernetes replacing the fleet cluster as the “distributed init system” and helm replacing ggn; the pods, the service discovery and the bare-metal CoreOS hosts (one type of hardware, three disk profiles) stay the same.
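To make the build-ship-run flow concrete, here is a rough sketch of the commands involved (the ggn line is the one from the deployment slide; the dgr sub-commands are assumed and may differ from the exact invocation used at BlaBlaCar):

dgr build                                   # build the service image from its aci-manifest.yml
dgr push                                    # store the image in the container registry (aci.blbl.cr)
ggn prod-dc1 trip-meeting-point update -y   # (re)schedule the service's pods on the fleet cluster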
Service Discovery
- go-nerve, on the backend pod, does health checks and reports to Zookeeper in service keys (e.g. /database/node1).
- go-synapse, on the client pod, watches the Zookeeper service keys (/database) and reloads HAProxy if changes are detected.
- Applications hit their local HAProxy to access backends.
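A quick way to picture this chain from a client pod, reusing the /database path from the diagram (the zookeepercli invocation and the mysql port are illustrative assumptions):

zookeepercli -c lsr /database               # go-nerve registered one key per healthy backend;
                                            # go-synapse watches this path and reloads the local HAProxy
mysql -h 127.0.0.1 -P 3306 -e 'SELECT 1;'   # the app only ever talks to its local HAProxy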
Stateful Services into containers MariaDB as an example
“Stateful” and “volatile by design”?
The recipe/prerequisites/pillars to succeed:
- Abolish Slavery: “For a given service, every node has the same role”
- Be Quiet!: “A node should be able to restart without impacting the app”
- Build Smart: “Services can be operated by any SRE”
MariaDB as an example
Abolish Slavery “For a given service, every node has the same role”
Asynchronous vs. Synchronous
(Diagram: asynchronous replication - one Master and its Slaves - vs. a synchronous MariaDB Cluster where every node takes part in wsrep replication.)
MariaDB Cluster means:
- No Single Point of Failure
- No Replication Lag
- Auto State Transfers
- As fast as the slowest node
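Since every node accepts reads and writes, the standard Galera status variables can be checked on any node to confirm membership and sync state (a hedged example; the local_mon credentials are the ones used by the Nerve check later in the deck):

mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"

wsrep_cluster_size should report the number of nodes, and wsrep_local_state_comment should read Synced on a healthy node.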
The Target
(Diagram: a MariaDB Cluster of wsrep nodes, each running in containers.)
- Writes go to one node
- Reads are balanced on the others
How to hit the target? Service Discovery
Nerve - Track and report service status

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
    - name: "mysql-main"
      port: 3306
      reporters:
      - {type: zookeeper, path: /services/mysql/main}
      checks:
      - type: sql
        driver: mysql
        datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"

# zookeepercli -c lsr /services/mysql/main
mysql-main1_192.168.1.2_ba0f1f8b
mysql-main2_192.168.1.3_734d63da
mysql-main3_192.168.1.4_dde45787

# zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b3
{
  "available":true,
  "host":"192.168.1.2",
  "port":3306,
  "name":"mysql-main1",
  "weight":255,
  "labels":{
    "host":"r10-srv4"
  }
}
Synapse - Service discovery router

# cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
---
override:
  synapse:
    services:
    - name: mysql-main_read
      path: /services/mysql/main
      port: 3307
    - name: mysql-main_write
      path: /services/mysql/main
      port: 3308
      serverOptions: backup
      serverSort: date

# cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
---
override:
  tripsearch:
    database:
      read:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_rd
        port: 3307
      write:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_wr
        port: 3308
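With the ports defined above, the read/write split can be observed from a tripsearch pod roughly like this (a sketch: passwords are omitted, localhaproxy resolves to the local HAProxy, and @@hostname is only used to show which backend answered):

mysql -h 127.0.0.1 -P 3307 -u tripsearch_rd -p -e 'SELECT @@hostname;'   # reads: balanced across the nodes
mysql -h 127.0.0.1 -P 3308 -u tripsearch_wr -p -e 'SELECT @@hostname;'   # writes: one node, the others are marked "backup"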
Be Quiet! “A node should be able to restart without impacting the app”
Nerve - “Readiness Probe”

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
    - name: "mysql-main"
      port: 3306
      reporters:
      - {type: zookeeper, path: /services/mysql/main}
      checks:
      - type: sql
        driver: mysql
        request: "SELECT 1"
        datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"

Timeline when starting Pod mysql-main1:
- Starting Pod → Nerve check is KO
- Starting MySQL → Nerve check is KO
- MySQL is syncing (IST/SST) → Nerve check is KO
- MySQL is ready → Nerve check is OK

The check is equivalent to:
mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -P 3306 -e 'SELECT 1;'
Nerve - “Grace Period”

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
    - name: "mysql-main"
      port: 3306
      reporters:
      - {type: zookeeper, path: /services/mysql/main}
      checks:
      - type: sql
        driver: mysql
        datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
      disableCommand: "/report_remaining_processes.sh"
      disableMaxDurationInMilli: 180000

Timeline when stopping the Pod:
- Stop Pod → call /disable on Nerve's API
- Set weight to 0 = no more new sessions will go to the service
- Wait: the remaining sessions finish their job
  SELECT COUNT(1) FROM processlist WHERE user LIKE 'app_%';
- Pod Stopped: the service can be shut down without risk.
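The slides do not show the body of /report_remaining_processes.sh; a minimal sketch, reusing the query from the slide and the local_mon credentials from the Nerve check, could look like this:

#!/bin/sh
# Count the application sessions still connected (query from the slide, run against
# information_schema); print the count and exit 0 only once none remain.
REMAINING=$(mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -N -B -e \
  "SELECT COUNT(1) FROM information_schema.processlist WHERE user LIKE 'app_%';")
echo "remaining app sessions: ${REMAINING}"
[ "${REMAINING}" -eq 0 ]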
Build Smart “Services can be operated by any SRE”
Example: Use Service Discovery to find peers
Use Service Discovery to find peers
E.g. the wsrep_cluster_address attribute in Galera Cluster.
Description: the addresses of cluster nodes to connect to when starting up. Good practice is to specify all possible cluster nodes, in the form gcomm://<node1 or ip:port>,<node2 or ip2:port>,<node3 or ip3:port>. Specifying an empty ip (gcomm://) will cause the node to start a new cluster.

Ask the Service Discovery for the mysql-main peers before starting:
- No peer found → wsrep_cluster_address = gcomm:// (bootstrap a new cluster)
- node1 found → wsrep_cluster_address = gcomm://node1
- node1, node2 found → wsrep_cluster_address = gcomm://node1,node2
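A sketch of how an entrypoint could derive this from the service discovery, using the /services/mysql/main keys shown earlier (illustrative only, not the actual BlaBlaCar start script):

# Keys look like mysql-main1_192.168.1.2_ba0f1f8b -> keep the IP field
PEERS=$(zookeepercli -c lsr /services/mysql/main | cut -d_ -f2 | paste -sd, -)
# Empty list -> gcomm:// bootstraps a new cluster; otherwise join the discovered peers
exec mysqld --wsrep_cluster_address="gcomm://${PEERS}"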