How we scaled push messaging for millions of Netflix devices


  1. How we scaled push messaging for millions of Netflix devices Susheel Aroskar Cloud Gateway

  2. Why do we need push?

  3. How I spend my time in Netflix application...

  4. ● What is push?

  5. ● What is push? ● How you can build it

  6. ● What is push? ● How you can build it ● How you can operate it

  7. ● What is push? ● How you can build it ● How you can operate it ● What can you do with it

  8. Susheel Aroskar Senior Software Engineer Cloud Gateway saroskar@netflix.com github.com/raksoras @susheelaroskar

  9. PUSH: Persist Until Something Happens

  10. PUSH: Persist Until Something Happens

  11. Zuul Push Architecture

  12. Zuul Push Servers

  13. Zuul Push Servers WebSockets / SSE

  14. Diagram: User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  15. Diagram: User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  16. Diagram: Push Library; User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  17. Diagram: Push Library → Push Message Queue; User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  18. Diagram: Push Library → Push Message Queue → Message Processor; User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  19. Diagram: Push Library → Push Message Queue → Message Processor; User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  20. Diagram: Push Library → Push Message Queue → Message Processor → Push Registry (Lookup server); User ↔ Zuul Push Servers (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)

  21. Diagram: Push Library → Push Message Queue → Message Processor → Push Registry (Lookup server); Message Processor → Zuul Push Servers (Deliver message) → User (WebSockets / SSE); Zuul Push Servers → Push Registry (Register)
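The lookup-then-deliver path in the diagram can be sketched with in-memory stand-ins. This is only an illustration: `MessageRouter`, its maps, and the server IDs are hypothetical names, and dropping messages for users who are not connected is an assumption of the sketch, not something the slides specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of the lookup-then-deliver path. */
public class MessageRouter {
    // Stand-in for the Push Registry: user ID -> ID of the push server holding the connection.
    private final Map<String, String> registry = new HashMap<>();
    // Stand-in for the push servers: server ID -> messages handed to that server.
    private final Map<String, List<String>> delivered = new HashMap<>();

    /** What a push server would do when a client connects. */
    public void register(String userId, String serverId) {
        registry.put(userId, serverId);
    }

    /** What a message processor would do per queued message. Returns the server used, if any. */
    public Optional<String> route(String userId, String message) {
        String serverId = registry.get(userId);
        if (serverId == null) {
            return Optional.empty(); // assumption: no connection on record, so drop the message
        }
        delivered.computeIfAbsent(serverId, k -> new ArrayList<>()).add(userId + ":" + message);
        return Optional.of(serverId);
    }

    /** Messages handed to the given server so far. */
    public List<String> deliveredVia(String serverId) {
        return delivered.getOrDefault(serverId, new ArrayList<>());
    }
}
```

For example, after `register("alice", "push-1")`, a call to `route("alice", "hello")` selects `push-1`, while routing to an unregistered user returns an empty result.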

  22. Zuul Push server: Handling millions of persistent connections

  23. C10K challenge

  24. Thread per Connection [Diagram: Thread-1 and Thread-2 each own one socket and perform its reads and writes]

  25. Async I/O vs. Thread per Connection [Diagram: with async I/O, a single thread serves many sockets through read and write callbacks; with thread per connection, Thread-1 and Thread-2 each handle the reads and writes of one socket]
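The single-thread model on this slide can be sketched with plain `java.nio` instead of Netty. Everything here is illustrative: `SelectorEchoServer`, the echo behaviour, and the `echoOnce` helper are hypothetical and only show one thread multiplexing sockets through a `Selector`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical echo server: one thread, one Selector, any number of sockets. */
public class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public SelectorEchoServer() {
        try {
            selector = Selector.open();
            server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select(); // sleeps until at least one socket is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        if (client.read(buf) < 0) { // peer closed the connection
                            client.close();
                            continue;
                        }
                        buf.flip();
                        while (buf.hasRemaining()) {
                            client.write(buf); // fine for a sketch; real code would wait for OP_WRITE
                        }
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Any I/O error or shutdown ends the loop in this sketch.
        }
    }

    public void close() {
        try {
            selector.close();
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a server, sends one message over loopback, and returns the echoed bytes. */
    public static String echoOnce(String message) {
        SelectorEchoServer srv = new SelectorEchoServer();
        Thread loop = new Thread(srv, "event-loop");
        loop.setDaemon(true);
        loop.start();
        byte[] out = message.getBytes(StandardCharsets.UTF_8);
        try (Socket client = new Socket("127.0.0.1", srv.port())) {
            client.getOutputStream().write(out);
            client.getOutputStream().flush();
            byte[] in = new byte[out.length];
            new DataInputStream(client.getInputStream()).readFully(in);
            return new String(in, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            srv.close();
        }
    }
}
```

Idle connections cost this loop nothing beyond a registered key, which is the property that makes the model attractive when most connections are idle.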

  26. Netty [Diagram: a Channel Pipeline attached to a socket; Channel Inbound Handlers run from head to tail, Channel Outbound Handlers run from tail to head]

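  27. protected void addPushHandlers(ChannelPipeline pl) {
          pl.addLast(new HttpServerCodec());
          // Constructor arguments are elided on the slide; Netty requires a max content length here.
          pl.addLast(new HttpObjectAggregator());
          pl.addLast(getPushAuthHandler());
          pl.addLast(new WebSocketServerCompressionHandler());
          // Constructor arguments are elided on the slide; Netty requires the WebSocket path here.
          pl.addLast(new WebSocketServerProtocolHandler());
          pl.addLast(getPushRegistrationHandler());
      }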

  28. Plug in your custom authentication policy: authenticate by cookies, JWT or any other custom scheme

  29. Push Registry: Tracking clients’ connection metadata in real time

  30. public class MyRegistration extends PushRegistrationHandler {
          @Override
          protected void registerClient(
                  ChannelHandlerContext ctx, PushUserAuth auth,
                  PushConnection conn, PushConnectionRegistry registry) {
              super.registerClient(ctx, auth, conn, registry);
              // storeInRedis is application-defined; the task is queued on the channel's executor.
              ctx.executor().submit(() -> storeInRedis(auth));
          }
      }

  31. Push registry features checklist

  32. Push registry features checklist ● Low read latency

  33. Push registry features checklist ● Low read latency ● Record expiry

  34. Push registry features checklist ● Low read latency ● Record expiry ● Sharding

  35. Push registry features checklist ● Low read latency ● Record expiry ● Sharding ● Replication
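The "record expiry" item can be illustrated with a small in-memory registry whose entries carry a time-to-live. This is a sketch under assumptions: `ExpiringRegistry`, the TTL value, and the injected clock are hypothetical, and the likely motivation (records for connections that vanished without a clean deregistration should not linger) is an inference rather than something stated on the slide.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical registry: user ID -> server ID, with a per-record time-to-live. */
public class ExpiringRegistry {
    private static final class Entry {
        final String serverId;
        final long expiresAtMillis;

        Entry(String serverId, long expiresAtMillis) {
            this.serverId = serverId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> records = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    public ExpiringRegistry(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Records (or refreshes) which server holds the user's connection. */
    public void register(String userId, String serverId) {
        records.put(userId, new Entry(serverId, clock.getAsLong() + ttlMillis));
    }

    /** Returns the server for the user unless the record is missing or has expired. */
    public Optional<String> lookup(String userId) {
        Entry e = records.get(userId);
        if (e == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            records.remove(userId, e); // lazily purge the stale record
            return Optional.empty();
        }
        return Optional.of(e.serverId);
    }
}
```

A store with built-in key expiry provides the same behaviour without application code; the sketch only makes the semantics concrete.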

  36. What we use: Dynomite (https://github.com/Netflix/dynomite) = Redis + Auto-sharding + Read/Write quorum + Cross-region replication

  37. Message Processing: Queue, Route, Deliver

  38. We use Kafka message queues to decouple message senders from receivers

  39. Fire and Forget

  40. Cross-region Replication

  41. Different queues for different priorities
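One way to picture separate queues per priority is the in-memory sketch below. It is not the Kafka setup the slides describe: `PriorityQueues`, the three priority levels, and the strict highest-first drain order are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical model: one FIFO queue per priority, drained highest priority first. */
public class PriorityQueues {
    /** Declaration order doubles as drain order in this sketch. */
    public enum Priority { HIGH, NORMAL, LOW }

    private final Map<Priority, Queue<String>> queues = new EnumMap<>(Priority.class);

    public PriorityQueues() {
        for (Priority p : Priority.values()) {
            queues.put(p, new ArrayDeque<>());
        }
    }

    public void enqueue(Priority priority, String message) {
        queues.get(priority).add(message);
    }

    /** Returns the next message from the highest-priority non-empty queue, if any. */
    public Optional<String> poll() {
        for (Priority p : Priority.values()) {
            String message = queues.get(p).poll();
            if (message != null) {
                return Optional.of(message);
            }
        }
        return Optional.empty();
    }
}
```

Keeping the queues separate means a backlog of low-priority messages cannot delay a high-priority one, at the cost of possibly starving low priorities under sustained load.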

  42. We run multiple message processor instances in parallel to scale our message processing throughput.

  43. Operating Zuul Push: Different than REST of them

  44. Persistent connections make Zuul Push server stateful: long-lived stable connections

  45. Persistent connections make Zuul Push server stateful: long-lived stable connections ○ Great for client efficiency

  46. Persistent connections make Zuul Push server stateful: long-lived stable connections ○ Great for client efficiency ○ Terrible for quick deploy/rollback

  47. If you love your clients set them free... Tear down connections periodically

  48. Randomize each connection’s lifetime

  49. [Chart: Effect of randomizing connection lifetime on reconnect peaks; x-axis: Time, y-axis: # reconnects]
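Randomizing each connection's lifetime can be as simple as adding uniform jitter around a base value, so that connections opened together do not all reconnect together. The sketch below is illustrative: `ConnectionLifetime`, the uniform distribution, and the 30 minute ± 5 minute figures in the usage note are assumptions, not values from the talk.

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical helper that picks a per-connection lifetime with jitter. */
public class ConnectionLifetime {
    /** Returns a lifetime drawn uniformly from [base - jitter, base + jitter], in whole milliseconds. */
    public static Duration randomized(Duration base, Duration jitter, Random rng) {
        if (jitter.isNegative() || jitter.compareTo(base) > 0) {
            throw new IllegalArgumentException("jitter must be between zero and base");
        }
        long jitterMs = jitter.toMillis();
        // nextDouble() is in [0, 1), so the product truncates to an integer in [0, 2 * jitterMs].
        long offsetMs = (long) (rng.nextDouble() * (2 * jitterMs + 1)) - jitterMs;
        return base.plusMillis(offsetMs);
    }
}
```

Calling `ConnectionLifetime.randomized(Duration.ofMinutes(30), Duration.ofMinutes(5), new Random())` when a connection is accepted yields a lifetime between 25 and 35 minutes, after which the server would ask that client to reconnect.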

  50. Ask client to close its connection.

  51. How to optimize push server: Most connections are idle!

  52. BIG Server, tons of connections
      ulimit -n 262144
      net.ipv4.tcp_rmem="4096 87380 16777216"
      net.ipv4.tcp_wmem="4096 87380 16777216"

  53. Goldilocks strategy

  54. Optimize for cost, NOT instance count

  55. How to auto-scale?

  56. How to auto-scale? RPS? CPU??

  57. How to auto-scale? RPS? CPU?? Open Connections
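Scaling on open connections comes down to dividing the fleet-wide connection count by a per-instance target. The arithmetic is sketched below; `ConnectionAutoscaler`, the per-instance target, and the minimum fleet size are hypothetical, and in practice the policy would normally live in the cloud provider's auto-scaling configuration rather than in application code.

```java
/** Hypothetical sizing rule: enough instances to keep each one at or under a connection target. */
public class ConnectionAutoscaler {
    public static int desiredInstances(long openConnections, long targetPerInstance, int minInstances) {
        if (openConnections < 0 || targetPerInstance <= 0 || minInstances < 0) {
            throw new IllegalArgumentException("counts must be non-negative and the target positive");
        }
        // Ceiling division: a partly filled instance still needs to exist.
        long needed = (openConnections + targetPerInstance - 1) / targetPerInstance;
        return Math.toIntExact(Math.max(needed, (long) minInstances));
    }
}
```

With an assumed target of 80,000 connections per instance, 1,000,000 open connections call for 13 instances; RPS or CPU would say little here because most of those connections are idle.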

  58. Amazon Elastic Load Balancers (Classic ELB) cannot proxy WebSockets in HTTP mode.

  59. Solution - Run ELB as a TCP load balancer (Layer 4) instead of an HTTP load balancer (Layer 7), so the WebSocket Upgrade HTTP request passes through to the push server. [Diagram: OSI 7 network layers (7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data link, 1 Physical) next to the conceptual HTTP over TCP/IP stack (HTTP, TCP, IP, Ethernet)]

  60. Managing push cluster - a quick recap ● Recycle connections after tens of minutes

  61. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize each connection’s lifetime

  62. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize connection’s lifetime ● Many smaller servers >> few BIG servers

  63. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize connection’s lifetime ● Many smaller servers >> few BIG servers ● Auto-scale on number of open connections per box

  64. Managing push cluster - a quick recap ● Recycle connections after tens of minutes ● Randomize connection’s lifetime ● Many smaller servers >> few BIG servers ● Auto-scale on number of open connections per box ● WebSocket aware vs TCP load balancer

  65. If you build it, They will push

  66. On-demand diagnostics

  67. Remote recovery

  68. User messaging

  69. WHAT WILL YOU USE IT FOR?

  70. Call to action

  71. PULL!

  72. PULL! https://github.com/Netflix/zuul

  73. In conclusion, push can make you

  74. In conclusion, push can make you rich (in functionality),

  75. In conclusion, push can make you rich (in functionality), thin (by getting rid of polling)

  76. In conclusion, push can make you rich (in functionality), thin (by getting rid of polling) and happy!

  77. Thank you.

  78. Questions? Susheel Aroskar Senior Software Engineer Cloud Gateway saroskar@netflix.com github.com/raksoras @susheelaroskar

  79. More: Rich, exciting apps; efficient systems; battle-tested Zuul Push; easy to operate; easy to customize
