
Shopify’s Architecture to Handle 80K RPS Celebrity Sales


  1. Shopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify

  2. Shopify handles some of the largest sales in the world, from Kylie Jenner, Kanye, the Super Bowl, and others

  3. “We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.” – Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales

  4. $5.8B processed (Q2 2017) • 500K merchants powered • 40+ daily deploys • 80K peak RPS • 2000+ employees • Rails: Ruby on Rails since 2006

  5. Overview diagram: Traffic, Application, and Data layers in Region A and Region B

  6. Overview diagram: Traffic, Application, and Data layers in Region A and Region B

  7. Traffic • Global Routing • OpenResty • Bots • Cache hits • Checkout Throttling

  8. Global routing diagram: walrusser.myshopify.com → 23.227.38.64; Region A and Region B each send BGP ANNOUNCE 23.227.38.0/24 to their ISPs

  9. OpenResty allows Lua scripting of your load balancers; it’s been one of the most impactful additions to our stack in recent memory. Diagram: Nginx with OpenResty, running Rule Banner, Kafka Logging, Edgecache, and Checkout Throttle. https://github.com/openresty/openresty

  10. OpenResty “hello, world” configuration:

          worker_processes 1;
          error_log logs/error.log;
          events {
              worker_connections 1024;
          }
          http {
              server {
                  listen 8080;
                  location / {
                      default_type text/html;
                      content_by_lua '
                          ngx.say("<p>hello, world</p>")
                      ';
                  }
              }
          }

  11. Bot squasher analyzes the Kafka stream of incoming requests to ban bots with a rule banner module. Diagram labels: Nginx with OpenResty (Rule Banner, Kafka Logger), Kafka, Bot Squasher; example ban: BAN POST /checkout 23.227.38.178
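The slide does not say how bots are detected. As a rough sketch only, one common approach is a sliding-window request counter per client and endpoint. The `BotSquasher` class, its thresholds, and the in-memory event feed below are assumptions for illustration, not Shopify's implementation; a real version would consume the Kafka stream and push bans to the load balancers.

```python
from collections import defaultdict, deque


class BotSquasher:
    """Bans (ip, method, path) keys that exceed a request budget in a time window.

    The thresholds are hypothetical; the slides do not state real values.
    """

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # key -> timestamps inside the window
        self.bans = set()

    def observe(self, ip, method, path, ts):
        """Record one request event and return True if the key is banned."""
        key = (ip, method, path)
        window = self._seen[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > self.window_seconds:
            window.popleft()
        if len(window) > self.max_requests:
            self.bans.add(key)
        return key in self.bans
```

Feeding it four `POST /checkout` events from the same address within the window, with `max_requests=3`, bans that key on the fourth event while other clients stay unaffected.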

  12. Edgecache can serve full page cache hits out of the load-balancers in microseconds. Diagram labels: GET /collections/walruses; Nginx with OpenResty (Edgecache); Memcached; Web Process; paths HIT, MISS, FILL
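The HIT, MISS, and FILL paths on this slide can be sketched as a read-through cache. This is a minimal illustration under assumptions: the `EdgeCache` class, the TTL, and the plain dictionary standing in for Memcached are all hypothetical, and the real Edgecache runs as Lua inside the load balancer.

```python
class EdgeCache:
    """Read-through full-page cache: serve a HIT, or on a MISS render and FILL."""

    def __init__(self, render, ttl_seconds=60.0):
        self.render = render            # callable(path) -> body; stands in for the web process
        self.ttl_seconds = ttl_seconds  # hypothetical expiry, not from the slides
        self._store = {}                # path -> (body, expires_at); stands in for Memcached

    def get(self, path, now):
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return "HIT", entry[0]
        body = self.render(path)                             # MISS: ask the web process
        self._store[path] = (body, now + self.ttl_seconds)   # FILL the cache
        return "MISS", body
```

The first `get("/collections/walruses", now=0)` returns a MISS and renders the page; a second call before the TTL expires returns a HIT without touching the renderer.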

  13. Checkout Throttle throttles the number of customers in the processing heavy checkout path. Diagram labels: GET /checkout, /wait_area, /checkout; Nginx with OpenResty (Checkout Throttle); Throttle Queue
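One way to picture the throttle is a fixed-capacity admission gate with a first-come queue behind it. The sketch below is an assumption-laden illustration: the `CheckoutThrottle` class, its capacity, and the in-process queue are hypothetical, and the slides do not describe how the real queue is stored or drained.

```python
from collections import deque


class CheckoutThrottle:
    """Admits up to `capacity` customers into checkout; the rest wait in order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()   # customers currently in the checkout path
        self.queue = deque()  # customers parked in the wait area

    def request_checkout(self, customer):
        """Return the path the customer should be sent to."""
        if customer in self.active:
            return "/checkout"
        if len(self.active) < self.capacity and not self.queue:
            self.active.add(customer)
            return "/checkout"
        if customer not in self.queue:
            self.queue.append(customer)
        return "/wait_area"

    def finish_checkout(self, customer):
        """Free a slot and admit waiting customers in arrival order."""
        self.active.discard(customer)
        while self.queue and len(self.active) < self.capacity:
            self.active.add(self.queue.popleft())
```

With a capacity of two, a third customer is sent to `/wait_area` and is admitted to `/checkout` once one of the first two finishes.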

  14. Overview diagram: Traffic, Application, and Data layers in Region A and Region B

  15. Pod is an isolated unit of one or more shops

  16. Diagram (Data in Region A): shops distributed across Pod 14, Pod 2, and Pod 7

  17. Diagram (Each Pod in Region A): Pod 7, Pod 2, and Pod 14 each contain Redis, Memcache, MySQL, and Cron

  18. Diagram (Shared Workers): Pod 7, Pod 2, and Pod 14 each contain Memcache, Redis, MySQL, and Cron

  19. Diagram (Shared Load Balancing): Pod 7, Pod 2, and Pod 14 each contain Memcache, Redis, MySQL, and Cron

  20. Genghis is our load-testing tool to test scale

  21. Pod Balancer balances shops between pods with minimal downtime to keep load and size even

  22. Diagram (Pod Balancer): shops distributed across Pod 14, Pod 2, and Pod 7

  23. Diagram (Pod Balancer): shops distributed across Pod 14, Pod 2, and Pod 7

  24. Diagram (Pod Balancer): shops distributed across Pod 14, Pod 2, and Pod 7

  25. Diagram (Pod Balancer): shops distributed across Pod 14, Pod 2, and Pod 7

  26. Diagram (Pod Balancer): shops distributed across Pod 74, Pod 14, Pod 2, and Pod 7

  27. Diagram (Pod Balancer): shops distributed across Pod 74, Pod 14, Pod 2, and Pod 7
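The slides do not describe how the Pod Balancer decides which shops to move. Purely as an illustration, a greedy planner might repeatedly move one shop from the most loaded pod to the least loaded one. The `plan_moves` function, the per-shop load numbers, and the greedy heuristic are all assumptions, not Shopify's algorithm.

```python
def plan_moves(pods, loads):
    """Return a list of (shop, source_pod, target_pod) moves that even out load.

    pods:  dict of pod name -> set of shop ids (hypothetical layout)
    loads: dict of shop id -> positive load estimate (hypothetical metric)
    """
    pods = {name: set(shops) for name, shops in pods.items()}  # work on a copy

    def total(name):
        return sum(loads[shop] for shop in pods[name])

    moves = []
    while True:
        source = max(pods, key=total)
        target = min(pods, key=total)
        gap = total(source) - total(target)
        # A move shrinks the gap only if the shop's load is positive and below the gap.
        candidates = [s for s in sorted(pods[source]) if 0 < loads[s] < gap]
        if not candidates:
            return moves
        # Prefer the shop whose load is closest to half the gap.
        shop = min(candidates, key=lambda s: abs(loads[s] - gap / 2))
        pods[source].remove(shop)
        pods[target].add(shop)
        moves.append((shop, source, target))
```

Each planned move strictly lowers the sum of squared pod loads, so the loop terminates. Carrying out a move is a separate problem, covered by the shop copy slides that follow.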

  28. COPY SHOP from Source (Pod 9: Redis, MySQL) to Target (Pod 23: Redis, MySQL): SELECT * FROM products WHERE shop_id = 38493; SELECT * FROM orders WHERE shop_id = 38493

  29. NEW CHECKOUT on the source during the copy: INSERT INTO CHECKOUTS … COPY SHOP from Source (Pod 9: Redis, MySQL) to Target (Pod 23: Redis, MySQL): SELECT * FROM products WHERE shop_id = 38493; SELECT * FROM orders WHERE shop_id = 38493

  30. Bin Log: REPLICATE SHOP_ID 238 (CHECKOUT id: 383293). COPY SHOP_ID 238 from Source (Pod 9: Redis, MySQL) to Target (Pod 23: Redis, MySQL): SELECT * FROM products WHERE shop_id = 238; SELECT * FROM orders WHERE shop_id = 238

  31. LOCK SHOP_ID 238 on Source (Pod 9: Redis, MySQL) and Target (Pod 23: Redis, MySQL); Routing: UPDATE SHOP_ID 238 pod_id=23
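The four slides above suggest a sequence of copy, replicate, lock, and reroute. The following in-memory simulation is a sketch of that sequence under assumptions: the `Pod` class, the `move_shop` function, and the list-based "binlog" are hypothetical stand-ins for MySQL and its binary log, and the real tool is not shown in the slides.

```python
class Pod:
    """In-memory stand-in for one pod's MySQL (hypothetical)."""

    def __init__(self, pod_id):
        self.pod_id = pod_id
        self.rows = []    # (shop_id, table, row) tuples
        self.binlog = []  # append-only record of writes

    def write(self, shop_id, table, row):
        record = (shop_id, table, row)
        self.rows.append(record)
        self.binlog.append(record)


def move_shop(shop_id, source, target, routing, writes_during_copy=()):
    """Move one shop's rows from source to target and repoint routing."""
    # 1. COPY SHOP: note the binlog position, then copy the shop's existing rows.
    position = len(source.binlog)
    target.rows.extend(r for r in source.rows if r[0] == shop_id)

    # Simulated traffic: writes that land on the source while the copy runs.
    for table, row in writes_during_copy:
        source.write(shop_id, table, row)

    # 2. REPLICATE: replay the shop's binlog entries written since the copy began.
    for record in source.binlog[position:]:
        if record[0] == shop_id:
            target.rows.append(record)

    # 3. LOCK SHOP_ID: a real system would block writes to the shop here;
    #    this single-threaded sketch has no concurrent writers to block.
    # 4. UPDATE routing so the shop's requests go to the target pod.
    routing[shop_id] = target.pod_id
```

For example, moving shop 238 from `Pod(9)` to `Pod(23)` while a checkout row is written mid-copy leaves the target with the products, orders, and checkout rows, and sets `routing[238]` to 23.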

  32. Overview diagram: Traffic, Application, and Data layers in Region A and Region B

  33. Sorting Hat routes requests for a shop to the region the pod is active in

  34. Diagram: GET /products with Host: sneakershop.com enters Traffic; Sorting Hat sends ROUTE sneakershop.com to Routing, which returns shop238 pod2:B. Pod 7: Active in Region A, Inactive in Region B. Pod 2: Inactive in Region A, Active in Region B. Pod 14: Active in Region A, Inactive in Region B.
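The lookup on this slide can be read as three chained mappings: host to shop, shop to pod, pod to active region. The sketch below assumes exactly that; the dictionary names and the `route` function are hypothetical, and only the example values (sneakershop.com, shop 238, pod 2 active in Region B) come from the slide.

```python
# Hypothetical routing tables; the values mirror the slide's example.
DOMAIN_TO_SHOP = {"sneakershop.com": 238}
SHOP_TO_POD = {238: 2}
ACTIVE_REGION = {7: "A", 2: "B", 14: "A"}


def route(host):
    """Return (shop_id, pod_id, region) for a Host header, or None if unknown."""
    shop = DOMAIN_TO_SHOP.get(host)
    if shop is None:
        return None
    pod = SHOP_TO_POD[shop]
    return shop, pod, ACTIVE_REGION[pod]
```

Here `route("sneakershop.com")` yields shop 238 on pod 2 in Region B, matching the `shop238 pod2:B` label in the diagram.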

  35. Overview diagram: Traffic, Application, and Data layers in Region A and Region B

  36. Pod Mover moves pods between regions with minimal downtime

  37. Diagram: Traffic and Sorting Hat. Pod 7: Active in Region A, Inactive in Region B. Pod 2: Inactive in Region A, Active in Region B. Pod 14: Active in Region A, Inactive in Region B.

  38. Diagram: Traffic and Sorting Hat. Pod 7: Active in Region A, Inactive in Region B. Pod 2: Active in Region A, Inactive in Region B. Pod 14: Active in Region A, Inactive in Region B.

  39. Disable cron in both regions • Update Routing for pod to target region (pod2:b -> pod2:a) • Sorting Hat routes requests to target region • Fail over MySQL to target region • Enable cron in both regions • Transfer jobs to target region

  40. What about errors while the database fails over?

  41. Pauser will pause requests in the middle of failovers to avoid serving errors. Diagram labels: POST /checkout (during failover); HTTP 200 (seconds later); Nginx with OpenResty (Pauser); Throttle Queue
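A simplified model of the Pauser is a gate that holds requests while paused and serves them once resumed. The `Pauser` class below is a hypothetical sketch: it parks requests in a list, whereas the real module presumably keeps each connection open inside the load balancer until the failover completes.

```python
class Pauser:
    """Holds requests while paused and serves them after resume."""

    def __init__(self, handler):
        self.handler = handler  # callable(request) -> response
        self.paused = False
        self.held = []          # requests that arrived during the pause

    def pause(self):
        self.paused = True

    def handle(self, request):
        """Serve the request now, or hold it and return None if paused."""
        if self.paused:
            self.held.append(request)
            return None
        return self.handler(request)

    def resume(self):
        """Unpause and return responses for the held requests, in arrival order."""
        self.paused = False
        held, self.held = self.held, []
        return [self.handler(request) for request in held]
```

A `POST /checkout` that arrives during the pause gets no error; it is answered by `resume()` once the failover has finished.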

  42. Disable cron in both regions • Update Routing for pod to target region (pod2:b -> pod2:a) • Sorting Hat routes requests to target region and pauses requests • Fail over MySQL to target region • Resume requests • Enable cron in both regions • Transfer jobs to target region
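The step list above can be sketched as a small runbook function. Everything here is an assumption for illustration: the `move_pod` function, the shape of the `state` dictionary, and the use of `try`/`finally` so that requests resume even if the failover raises. The slides only give the order of the steps.

```python
def move_pod(pod, target_region, state):
    """Run the pod-move steps in slide order against a hypothetical state dict."""
    log = []

    state["cron_enabled"] = False
    log.append("disable cron in both regions")

    state["routing"][pod] = target_region
    log.append("update routing for pod to target region")

    state["paused"].add(pod)
    log.append("route requests to target region and pause requests")
    try:
        state["mysql_primary"][pod] = target_region
        log.append("fail over MySQL to target region")
    finally:
        # Resume even if the failover step raised, so requests are not held forever.
        state["paused"].discard(pod)
        log.append("resume requests")

    state["cron_enabled"] = True
    log.append("enable cron in both regions")

    state["jobs_region"][pod] = target_region
    log.append("transfer jobs to target region")
    return log
```

Calling `move_pod(2, "A", state)` on a state where pod 2 is active in Region B mirrors the `pod2:b -> pod2:a` example and returns the seven steps in order.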

  43. Cloud Migration with the Pods Architecture

  44. Diagram: shops in Region A and in Cloud Region C

  45. Thanks! @Sirupsen
