andrew godwin

Andrew Godwin Django core developer Senior Software Engineer at - PowerPoint PPT Presentation



  1. Hi, I'm Andrew Godwin Django core developer Senior Software Engineer at Used to complain about migrations a lot

  2. Distributed Systems

  3. c = 299,792,458 m/s

  4. Early CPUs c = 60m propagation distance 5 MHz ~2cm Clock

  5. Modern CPUs c = 10cm propagation distance 3 GHz
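
The 60 m and 10 cm figures on slides 4 and 5 look like the distance light covers in one clock cycle, d = c / f. A minimal sketch of that arithmetic, assuming propagation in a vacuum (real signals in silicon travel slower); the function name is illustrative:

```python
# Distance light travels in a vacuum during one clock cycle: d = c / f.
C = 299_792_458  # speed of light in m/s, as given on slide 3


def distance_per_cycle(clock_hz: float) -> float:
    """Return the metres light travels during one clock period."""
    return C / clock_hz


early = distance_per_cycle(5e6)   # 5 MHz clock (slide 4)
modern = distance_per_cycle(3e9)  # 3 GHz clock (slide 5)

print(f"{early:.1f} m per cycle at 5 MHz")          # about 60 m
print(f"{modern * 100:.1f} cm per cycle at 3 GHz")  # about 10 cm
```

At 3 GHz a signal cannot even cross a desk within one cycle, which is why components that are physically apart cannot behave as one synchronous system.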

  6. Distributed systems are made of independent components

  7. They are slower and harder to write than synchronous systems

  8. But they can be scaled up much, much further

  9. Trade-offs

  10. There is never a perfect solution.

  11. Fast Cheap Good

  12. Load Balancer WSGI WSGI WSGI Worker Worker Worker

  13. Load Balancer WSGI WSGI WSGI Worker Worker Worker Cache

  14. Load Balancer WSGI WSGI WSGI Worker Worker Worker Cache Cache Cache

  15. Load Balancer WSGI WSGI WSGI Worker Worker Worker Database

  16. CAP Theorem

  17. Partition Tolerant Available Consistent

  18. PostgreSQL: CP Consistent everywhere Handles network latency/drops Can't write if main server is down

  19. Cassandra: AP Can read/write to any node Handles network latency/drops Data can be inconsistent

  20. It's hard to design a product that might be inconsistent

  21. But if you take the tradeoff, scaling is easy

  22. Otherwise, you must find other solutions

  23. Read Replicas (often called master/slave) Load Balancer WSGI WSGI WSGI Worker Worker Worker Replica Replica Main

  24. Replicas scale reads forever... But writes must go to one place

  25. If a request writes to a table it must be pinned there, so later reads do not get old data
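
One way to picture the pinning rule on slide 25 is a per-request router that remembers which tables were written and sends later reads of those tables to the main server. This is a sketch in plain Python, not any framework's routing API; the class name, method names, and database aliases are all hypothetical:

```python
import random


class PinningRouter:
    """Picks a database alias for each query within a single request."""

    def __init__(self, main="main", replicas=("replica1", "replica2")):
        self.main = main
        self.replicas = list(replicas)
        self.pinned_tables = set()  # tables written during this request

    def route_write(self, table):
        # Every write goes to the main server; remember the table so
        # later reads in this request are pinned there too.
        self.pinned_tables.add(table)
        return self.main

    def route_read(self, table):
        # A replica may not have received the write yet, so reads of a
        # table written earlier in the request stay on the main server.
        if table in self.pinned_tables:
            return self.main
        return random.choice(self.replicas)


router = PinningRouter()
first_read = router.route_read("tickets")   # any replica
router.route_write("tickets")               # main
later_read = router.route_read("tickets")   # pinned to main
other_read = router.route_read("events")    # still any replica
```

Pinning per table rather than per request keeps most reads on the replicas; how long the pin should outlive the request depends on replication lag and is left out here.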

  26. When your write load is too high, you must then shard

  27. Vertical Sharding Users Tickets Events Payments

  28. Horizontal Sharding Users Users Users Users 0 - 2 3 - 5 6 - 8 9 - A
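
Horizontal sharding needs a stable rule that maps each row to one shard. A minimal sketch, assuming the key is hashed and the first hex digit of the hash selects one of four shards. The shard names are hypothetical, and the four equal digit ranges used here are an assumption that differs from the range labels on the slide:

```python
import hashlib

# Hypothetical shard names; each owns a contiguous range of hex digits.
SHARDS = ["users_0", "users_1", "users_2", "users_3"]


def shard_for(user_id: str) -> str:
    """Map a user ID to a shard using the first hex digit of its hash."""
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    first_digit = int(digest[0], 16)  # 0..15
    return SHARDS[first_digit // 4]   # 0-3, 4-7, 8-b, c-f


for uid in ("alice", "bob", "carol"):
    print(uid, "->", shard_for(uid))
```

Hashing spreads keys evenly, but changing the number of shards moves most keys, so the shard count is worth choosing with growth in mind.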

  29. Both Users Users Users Users 0 - 2 3 - 5 6 - 8 9 - A Events Events Events Events 0 - 2 3 - 5 6 - 8 9 - A Tickets Tickets Tickets Tickets 0 - 2 3 - 5 6 - 8 9 - A

  30. Both plus caching Users Users Users Users User 0 - 2 3 - 5 6 - 8 9 - A Cache Events Events Events Events Event 0 - 2 3 - 5 6 - 8 9 - A Cache Tickets Tickets Tickets Tickets Ticket 0 - 2 3 - 5 6 - 8 9 - A Cache

  31. Teams have to scale too; nobody should have to understand everything in a big system.

  32. Services allow complexity to be reduced - for a tradeoff of speed

  33. User Service Users Users Users Users User 0 - 2 3 - 5 6 - 8 9 - A Cache Event Service Events Events Events Events Event 0 - 2 3 - 5 6 - 8 9 - A Cache Ticket Service Tickets Tickets Tickets Tickets Ticket 0 - 2 3 - 5 6 - 8 9 - A Cache

  34. User Service WSGI Server Event Service Ticket Service

  35. Each service is its own, smaller project, managed and scaled separately.

  36. But how do you communicate between them?

  37. Direct Communication Service 1 Service 3 Service 2

  38. Service 5 Service 1 Service 4 Service 3 Service 2

  39. Service 7 Service 8 Service 6 Service 1 Service 5 Service 2 Service 4 Service 3

  40. Message Bus Service 1 Service 2 Service 3 Service 1 Service 2 Service 3
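
The growth shown across slides 37 to 39 can be put into numbers. A rough sketch, assuming the worst case where every service may need to talk to every other one directly, compared with one connection per service to a shared bus:

```python
def direct_links(services: int) -> int:
    """Connections needed if every service talks to every other one."""
    return services * (services - 1) // 2


def bus_links(services: int) -> int:
    """Connections needed if each service only talks to the bus."""
    return services


for n in (3, 5, 8):
    print(f"{n} services: {direct_links(n)} direct links, {bus_links(n)} bus links")
```

For 3, 5, and 8 services this gives 3, 10, and 28 direct links against 3, 5, and 8 bus links.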

  41. A single point of failure is not always bad - if the alternative is multiple, fragile ones

  42. Channels and ASGI provide a standard message bus built with certain tradeoffs

  43. Django Channels Library Django Channels Project ASGI (Channel Layer) Backing Store e.g. Redis, RabbitMQ

  44. Pure Python ASGI (Channel Layer) Backing Store e.g. Redis, RabbitMQ

  45. Failure Mode At most once Messages either do not arrive, or arrive once. At least once Messages arrive once, or arrive multiple times
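
The two failure modes on slide 45 come down to when a message is considered delivered: before the handler runs, or only after it succeeds. The following is an in-memory simulation to show the difference, not how Channels or any particular backing store implements delivery; all names are illustrative:

```python
from collections import deque


def deliver_at_most_once(messages, handler):
    """Remove each message before handling it; a failure loses it."""
    queue = deque(messages)
    while queue:
        msg = queue.popleft()
        try:
            handler(msg)
        except RuntimeError:
            pass  # the message is already gone, so there is no retry


def deliver_at_least_once(messages, handler, max_attempts=5):
    """Treat a message as done only if the handler returns; else redeliver."""
    queue = deque((msg, 1) for msg in messages)
    while queue:
        msg, attempt = queue.popleft()
        try:
            handler(msg)
        except RuntimeError:
            if attempt < max_attempts:
                queue.append((msg, attempt + 1))


def make_handler(received, fail_before_processing):
    """Build a handler that crashes once, on its first sight of "b"."""
    crashed = set()

    def handler(msg):
        if msg == "b" and msg not in crashed:
            crashed.add(msg)
            if fail_before_processing:
                raise RuntimeError("crashed before processing")
            received.append(msg)
            raise RuntimeError("crashed before acknowledging")
        received.append(msg)

    return handler


lost = []
deliver_at_most_once(["a", "b", "c"], make_handler(lost, True))

duplicated = []
deliver_at_least_once(["a", "b", "c"], make_handler(duplicated, False))

print(lost)        # "b" never arrives
print(duplicated)  # "b" arrives twice
```

With at-least-once delivery, handlers need to tolerate duplicates, for example by being idempotent.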

  46. Guarantees vs. Latency Low latency Messages arrive very quickly but go missing more Low loss rate Messages are almost never lost but arrive slower

  47. Queuing Type First In First Out Consistent performance for all users First In Last Out Hides backlogs but makes them worse
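
A small simulation can show why First In Last Out hides a backlog. It assumes an overloaded worker: two requests arrive per tick and only one is served. The function and variable names are illustrative:

```python
from collections import deque


def simulate(lifo: bool, ticks: int = 6, arrivals_per_tick: int = 2):
    """Serve one request per tick; return wait times and what is left over."""
    queue = deque()
    waits = {}    # request id -> ticks spent waiting before being served
    next_id = 0
    for tick in range(ticks):
        for _ in range(arrivals_per_tick):
            queue.append((next_id, tick))
            next_id += 1
        # LIFO takes the newest request, FIFO takes the oldest.
        req_id, arrived = queue.pop() if lifo else queue.popleft()
        waits[req_id] = tick - arrived
    unserved = [req_id for req_id, _ in queue]
    return waits, unserved


fifo_waits, fifo_unserved = simulate(lifo=False)
lifo_waits, lifo_unserved = simulate(lifo=True)

print("FIFO waits:", fifo_waits, "unserved:", fifo_unserved)
print("LIFO waits:", lifo_waits, "unserved:", lifo_unserved)
```

Under FIFO the wait grows steadily for everyone, so the overload is visible. Under LIFO every served request shows zero wait, while the oldest requests, starting with the very first, are never served at all.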

  48. Queue Sizing Finite Queues Sending can fail Infinite queues Makes problems even worse
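
With a finite queue, the sender has to handle the case where sending fails. A minimal sketch using the standard library's `queue.Queue` with a `maxsize`; the job names and the reject-on-full policy are assumptions for illustration:

```python
import queue

# A bounded queue: once two items are waiting, further sends fail fast.
bounded = queue.Queue(maxsize=2)

accepted, rejected = [], []
for job in ["a", "b", "c"]:
    try:
        bounded.put_nowait(job)  # raises queue.Full instead of blocking
        accepted.append(job)
    except queue.Full:
        rejected.append(job)     # the sender decides: drop, retry, or report

print("accepted:", accepted)
print("rejected:", rejected)
```

The failure is immediate and visible to the sender. An unbounded queue would accept all three jobs and let the backlog grow until memory or latency becomes the problem.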

  49. You must understand what you are making (This is surprisingly uncommon)

  50. Design as much as possible around shared-nothing

  51. Per-machine caches On-demand thumbnailing Signed cookie sessions
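
Signed cookie sessions are shared-nothing because the session data travels with the request and any server holding the secret can verify it. Below is a sketch with the standard library's `hmac` module. It is not Django's implementation, and `SECRET_KEY`, `sign_session`, and `load_session` are hypothetical names:

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # hypothetical; every server shares this secret


def sign_session(data: dict) -> str:
    """Serialise session data and append an HMAC-SHA256 signature."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def load_session(cookie: str):
    """Return the session data, or None if the signature does not match."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(
        SECRET_KEY, payload.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how much of it matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = sign_session({"user_id": 42})
tampered = cookie[:-1] + ("0" if cookie[-1] != "0" else "1")

print(load_session(cookie))    # the original data
print(load_session(tampered))  # None
```

The data is signed, not encrypted, so the client can read it. The cookie also cannot be revoked server-side without adding shared state back in.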

  52. Has to be shared? Try to split it

  53. Has to be shared? Try sharding it.

  54. Django's job is to be slowly replaced by your code

  55. Just make sure you match the API contract of what you're replacing!

  56. Don't try to scale too early; you'll pick the wrong tradeoffs.

  57. Thanks. Andrew Godwin @andrewgodwin channels.readthedocs.io
