Shopify’s Architecture to Handle 80K RPS Celebrity Sales
Simon Eskildsen – @Sirupsen
Production Engineering Lead, Shopify
Shopify handles some of the largest sales in the world, from Kylie Jenner, Kanye, the Super Bowl, and others
“We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.”
– Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales
• $5.8B processed (Q2, 2017)
• 500K merchants powered
• 40+ daily deploys
• 80K peak RPS
• 2000+ employees
• Ruby on Rails since 2006
[Diagram: Traffic on top of two regions, Region A and Region B, each with an Application layer and a Data layer]
Traffic
• Global Routing
• OpenResty
• Bots
• Cache hits
• Checkout Throttling
[Diagram: walrusser.myshopify.com resolves to 23.227.38.64. Region A and Region B each send BGP ANNOUNCE 23.227.38.0/24 to their surrounding ISPs.]
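As a small worked check of the diagram above, the shop's address has to fall inside the prefix that both regions announce. The sketch below uses only the two values shown on the slide; the helper name `covered_by` is made up for illustration.

```python
import ipaddress

# Values taken from the slide.
shop_ip = ipaddress.ip_address("23.227.38.64")
announced_prefix = ipaddress.ip_network("23.227.38.0/24")


def covered_by(ip, prefix):
    """Return True if the address falls inside the announced prefix."""
    return ip in prefix


print(covered_by(shop_ip, announced_prefix))
```

Because both regions announce the same prefix, either one can receive traffic for this address.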
OpenResty allows Lua scripting of your load balancers. It’s been one of the most impactful additions to our stack in recent memory.
[Diagram: Nginx with OpenResty, running Rule Banner, Kafka Logging, Edgecache, and Checkout Throttle]
https://github.com/openresty/openresty
worker_processes 1;
error_log logs/error.log;
events {
    worker_connections 1024;
}
http {
    server {
        listen 8080;
        location / {
            default_type text/html;
            content_by_lua '
                ngx.say("<p>hello, world</p>")
            ';
        }
    }
}
Bot Squasher analyzes the Kafka stream of incoming requests to ban bots with a rule banner module.
[Diagram: Nginx with OpenResty (Rule Banner, Kafka Logger) -> Kafka -> Bot Squasher -> BAN; example request: POST /checkout from 23.227.38.178]
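The slide does not say how Bot Squasher decides that a client is a bot. As a minimal sketch, assume a sliding-window count of `POST /checkout` requests per IP. The class name, the threshold, and the window length are illustrative assumptions, not Shopify's actual rules.

```python
from collections import defaultdict, deque


class BotSquasher:
    """Hypothetical detector: ban an IP that posts to /checkout too often."""

    def __init__(self, max_hits=5, window_seconds=10.0):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent checkout POSTs
        self.banned = set()             # stands in for the rule banner's ban list

    def observe(self, timestamp, ip, method, path):
        """Consume one logged request; return True if the IP is banned."""
        if ip in self.banned:
            return True
        if method == "POST" and path == "/checkout":
            window = self.hits[ip]
            window.append(timestamp)
            # Drop timestamps that have fallen out of the sliding window.
            while window and timestamp - window[0] > self.window_seconds:
                window.popleft()
            if len(window) > self.max_hits:
                self.banned.add(ip)
        return ip in self.banned
```

In this sketch the request log is consumed out of band, so detection adds no latency to the request path; only the resulting ban list needs to be checked at the load balancer.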
Edgecache can serve full-page cache hits out of the load balancers in microseconds.
[Diagram: GET /collections/walruses -> Nginx with OpenResty (Edgecache); labels: HIT, MISS, FILL, Memcached, Web Process]
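The HIT, MISS, and FILL labels suggest a read-through cache. The sketch below assumes that reading: a plain dict stands in for Memcached and a callable stands in for the web process. `EdgeCache` and its methods are hypothetical names, and expiry and invalidation are left out.

```python
class EdgeCache:
    """Hypothetical read-through full-page cache in front of a web process."""

    def __init__(self, backend):
        self.backend = backend  # callable: path -> page body (the web process)
        self.store = {}         # stands in for Memcached

    def get(self, path):
        """Return (status, body), where status is "HIT" or "MISS"."""
        if path in self.store:
            return "HIT", self.store[path]
        body = self.backend(path)  # MISS: fall through to the web process
        self.store[path] = body    # FILL: store the page for later requests
        return "MISS", body


calls = []


def web_process(path):
    calls.append(path)
    return "<html>" + path + "</html>"


cache = EdgeCache(web_process)
first = cache.get("/collections/walruses")
second = cache.get("/collections/walruses")
```

The second request is answered from the store, so the web process is invoked only once.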
Checkout Throttle throttles the number of customers in the processing-heavy checkout path.
[Diagram: GET /checkout -> Nginx with OpenResty (Checkout Throttle) -> /wait_area or /checkout; Throttle Queue]
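One way to read the diagram: a fixed number of customers may be in checkout at once, and everyone else is sent to `/wait_area` and queued in arrival order. The sketch below assumes exactly that. The capacity value and all names are illustrative.

```python
from collections import deque


class CheckoutThrottle:
    """Hypothetical throttle: at most `capacity` customers in checkout at once."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()   # customers currently allowed into checkout
        self.queue = deque()  # customers waiting, in arrival order

    def request_checkout(self, customer):
        """Return the path the customer should be sent to."""
        if customer in self.active:
            return "/checkout"
        # Admit directly only if there is room and nobody is already waiting.
        if len(self.active) < self.capacity and not self.queue:
            self.active.add(customer)
            return "/checkout"
        if customer not in self.queue:
            self.queue.append(customer)
        return "/wait_area"

    def finish_checkout(self, customer):
        """Free a slot and admit waiting customers in FIFO order."""
        self.active.discard(customer)
        while self.queue and len(self.active) < self.capacity:
            self.active.add(self.queue.popleft())
```

A waiting customer who polls `request_checkout` again is let through once a slot has been handed to them.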
A Pod is an isolated unit of one or more shops
[Diagram: Data in Region A. Shops are grouped into Pod 14, Pod 2, and Pod 7.]
[Diagram: Each Pod in Region A. Pod 7, Pod 2, and Pod 14 each have their own Redis, Memcache, MySQL, and Cron.]
[Diagram: the same three pods, with Shared Workers]
[Diagram: the same three pods, with Shared Load Balancing]
Genghis is our load-testing tool to test scale
Pod Balancer balances shops between pods with minimal downtime to keep load and size even
[Diagram sequence: Pod Balancer. Shops are moved between Pod 14, Pod 2, and Pod 7 as new shops arrive; a new Pod 74 is then added and shops are rebalanced onto it.]
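The slides do not describe how Pod Balancer chooses which shops to move. As a rough sketch of the idea of keeping load even, the greedy loop below repeatedly moves one shop from the heaviest pod to the lightest pod while that narrows the gap. The function name, the load numbers, and the greedy strategy are all assumptions.

```python
def rebalance(pods, load):
    """Greedy sketch: pods maps pod name -> set of shop ids, load maps shop id -> load.

    Mutates `pods` and returns the list of (shop, source, target) moves.
    """
    moves = []

    def total(pod):
        return sum(load[shop] for shop in pods[pod])

    while True:
        heavy = max(pods, key=total)
        light = min(pods, key=total)
        gap = total(heavy) - total(light)
        # Only shops with 0 < load < gap make the two pods more even.
        candidates = [s for s in sorted(pods[heavy]) if 0 < load[s] < gap]
        if not candidates:
            return moves
        # Prefer the shop whose load is closest to half the gap.
        shop = min(candidates, key=lambda s: abs(load[s] - gap / 2))
        pods[heavy].remove(shop)
        pods[light].add(shop)
        moves.append((shop, heavy, light))


pods = {"pod7": {1, 2, 3, 4, 5, 6}, "pod74": set()}
load = {shop: 10 for shop in range(1, 7)}
moves = rebalance(pods, load)
```

Each move strictly reduces the imbalance between the two pods involved, so the loop terminates. A real balancer would also have to weigh the cost of each move.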
[Diagram sequence: moving a shop from Source (Pod 9: Redis, MySQL) to Target (Pod 23: Redis, MySQL)]
• COPY SHOP: SELECT * FROM products WHERE shop_id = 38493; SELECT * FROM orders WHERE shop_id = 38493
• NEW CHECKOUT while the copy runs: INSERT INTO CHECKOUTS …
• Bin Log: REPLICATE SHOP_ID 238 (CHECKOUT id: 383293), alongside COPY SHOP_ID 238 (SELECT * FROM products WHERE shop_id = 238; SELECT * FROM orders WHERE shop_id = 238)
• LOCK SHOP_ID 238
• Routing: UPDATE SHOP_ID 238 pod_id=23
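The copy, replicate, lock, and routing-update steps above can be sketched as a toy in-memory simulation. Everything here is a stand-in: `Pod` models MySQL as a row list plus an append-only log playing the role of the binlog, and `ShopMover` and its method names are made up for illustration.

```python
class Pod:
    """Stand-in for one pod's MySQL: a row list plus an append-only binlog."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.binlog = []

    def insert(self, shop_id, table, row_id):
        row = (shop_id, table, row_id)
        self.rows.append(row)
        self.binlog.append(row)


class ShopMover:
    def __init__(self, pods, routing, shop_id, source, target):
        self.pods = pods        # pod name -> Pod
        self.routing = routing  # shop_id -> pod name
        self.shop_id = shop_id
        self.source = pods[source]
        self.target = pods[target]
        self.locked = False
        self.binlog_pos = 0

    def write(self, table, row_id):
        """A write from the application, sent to whichever pod routing names."""
        if self.locked:
            raise RuntimeError("shop is locked during cutover")
        self.pods[self.routing[self.shop_id]].insert(self.shop_id, table, row_id)

    def copy_shop(self):
        # Remember the binlog position first, then bulk-copy the shop's rows.
        self.binlog_pos = len(self.source.binlog)
        self.target.rows.extend(r for r in self.source.rows if r[0] == self.shop_id)

    def replicate(self):
        # Replay this shop's writes that landed on the source after the copy.
        pending = self.source.binlog[self.binlog_pos:]
        self.target.rows.extend(r for r in pending if r[0] == self.shop_id)
        self.binlog_pos = len(self.source.binlog)

    def cut_over(self):
        # Lock the shop, drain the remaining binlog entries, then flip routing.
        self.locked = True
        try:
            self.replicate()
            self.routing[self.shop_id] = self.target.name
        finally:
            self.locked = False
```

The lock is held only for the final drain and the routing update, which is how a move like this can keep downtime short in principle.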
Sorting Hat routes requests for a shop to the region the pod is active in
[Diagram: GET /products with Host: sneakershop.com enters Traffic. Routing asks Sorting Hat to ROUTE sneakershop.com and gets shop238, pod2:B. Pod 7: Active in Region A, Inactive in Region B. Pod 2: Inactive in Region A, Active in Region B. Pod 14: Active in Region A, Inactive in Region B.]
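The lookup in the diagram can be sketched as three table lookups: host to shop, shop to pod, and pod to active region. The table contents below mirror the slide; the table names and the `route` function are hypothetical.

```python
# Values mirror the diagram; the table names are illustrative.
HOST_TO_SHOP = {"sneakershop.com": "shop238"}
SHOP_TO_POD = {"shop238": "pod2"}
ACTIVE_REGION = {"pod7": "A", "pod2": "B", "pod14": "A"}


def route(host):
    """Return (shop, "pod:region") for the request's Host header."""
    shop = HOST_TO_SHOP[host]
    pod = SHOP_TO_POD[shop]
    return shop, pod + ":" + ACTIVE_REGION[pod]


print(route("sneakershop.com"))
```

Because the active region is stored per pod rather than per shop, changing one entry in `ACTIVE_REGION` redirects every shop in that pod.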
Pod Mover moves pods between regions with minimal downtime
[Diagram, before the move: Traffic -> Sorting Hat. Pod 7: Active in Region A, Inactive in Region B. Pod 2: Inactive in Region A, Active in Region B. Pod 14: Active in Region A, Inactive in Region B.]
[Diagram, after the move: Pod 2 is now Active in Region A and Inactive in Region B. Pod 7 and Pod 14 are unchanged.]
• Disable cron in both regions
• Update Routing for pod to target region (pod2:b -> pod2:a)
• Sorting Hat routes requests to target region
• Fail over MySQL to target region
• Enable cron in both regions
• Transfer jobs to target region
What about errors while the database fails over?
Pauser will pause requests in the middle of failovers to avoid serving errors.
[Diagram: POST /checkout (during failover) -> Nginx with OpenResty (Pauser, Throttle Queue) -> HTTP 200 (seconds later)]
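The pausing behaviour can be sketched without any networking: while paused, requests are parked instead of being forwarded, and they are answered once the failover is over. `Pauser` here is a hypothetical in-memory stand-in for the load-balancer module, not its real interface.

```python
class Pauser:
    """Hypothetical sketch: hold requests during a failover, then release them."""

    def __init__(self, backend):
        self.backend = backend  # callable: request -> response
        self.paused = False
        self.parked = []        # requests held while paused

    def handle(self, request):
        """Forward the request, or park it (returning None) while paused."""
        if self.paused:
            self.parked.append(request)
            return None
        return self.backend(request)

    def pause(self):
        self.paused = True

    def resume(self):
        """Unpause and answer every parked request in arrival order."""
        self.paused = False
        parked, self.parked = self.parked, []
        return [self.backend(request) for request in parked]
```

From the client's point of view, a request parked this way takes a few seconds longer and still receives a normal response.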
• Disable cron in both regions
• Update Routing for pod to target region (pod2:b -> pod2:a)
• Sorting Hat routes requests to target region and pauses requests
• Fail over MySQL to target region
• Resume requests
• Enable cron in both regions
• Transfer jobs to target region
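The ordering of the steps above is the important part, so here is a minimal sketch that runs them in sequence against a plain dictionary of state. The state keys, the `fail_over_mysql` callback, and the `try`/`finally` around the pause are assumptions added for illustration.

```python
def move_pod(pod, target_region, state, fail_over_mysql):
    """Run the pod move steps in order; returns the list of steps performed."""
    steps = []

    state["cron_enabled"] = False
    steps.append("disable cron")

    state["routing"][pod] = target_region
    steps.append("update routing")

    state["paused"] = True
    steps.append("pause requests")
    try:
        fail_over_mysql(pod, target_region)
        steps.append("fail over mysql")
    finally:
        # Resume even if the failover raised, so requests are not held forever.
        state["paused"] = False
        steps.append("resume requests")

    state["cron_enabled"] = True
    steps.append("enable cron")

    state["jobs_region"][pod] = target_region
    steps.append("transfer jobs")
    return steps


state = {
    "cron_enabled": True,
    "paused": False,
    "routing": {"pod2": "b"},
    "mysql_primary": {"pod2": "b"},
    "jobs_region": {"pod2": "b"},
}


def fail_over(pod, region):
    state["mysql_primary"][pod] = region


steps = move_pod("pod2", "a", state, fail_over)
```

Requests are paused only around the MySQL failover, which keeps the window in which customers wait as short as the failover itself.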
Cloud Migration with the Pods Architecture
[Diagram: shops being moved from Region A to Cloud Region C]
Thanks! @Sirupsen