Scaling Uber with Node.js, Amos Barreto (@amos_barreto): PowerPoint presentation

  1. Scaling Uber with Node.js, Amos Barreto (@amos_barreto)

  2. Uber is everyone’s private driver.
 • REQUEST: Tap to select location
 • RIDE: Sit back and relax, tell your driver your destination
 • RATE: Help us maintain a quality service by rating your experience

  3. YOUR DRIVERS

  4. Your Drivers
 • UBER QUALIFIED: Uber only partners with drivers who have a keen eye for customer service and a passion for the trade.
 • RIDER RATED: Tell us what you think. Your feedback helps us work with drivers to constantly improve the Uber experience.
 • LICENSED & INSURED: From insurance to background checks, every driver meets or beats local regulations.

  5. LOGISTICS

  6. #OMGUBERICECREAM

  7. UberChopper #OMGUBERCHOPPER

  8. #UBERVALENTINES

  9. #ICANHASUBERKITTENS

  10. Trip State Machine (Simplified): Request → Dispatch → Accept → Arrive → Begin → End

  11. Trip State Machine (Extended): Request → Dispatch (1) → (Expire / Reject) → Dispatch (2) → Accept → Arrive → Begin → End
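The two trip state machines can be expressed as a transition table. This is a minimal JavaScript sketch, not Uber's code: it assumes Expire and Reject are events that send a dispatched trip back to Dispatch so another driver can be tried (one reading of the extended diagram), and the event names, `TRANSITIONS`, `Trip`, and `dispatchAttempts` are hypothetical.

```javascript
// Hypothetical transition table for the trip lifecycle on slides 10-11.
// Keys are states; each maps an event name to the next state.
const TRANSITIONS = {
  Request: { dispatch: 'Dispatch' },
  // Reject and Expire return the trip to Dispatch so another driver is tried.
  Dispatch: { accept: 'Accept', reject: 'Dispatch', expire: 'Dispatch' },
  Accept: { arrive: 'Arrive' },
  Arrive: { begin: 'Begin' },
  Begin: { end: 'End' },
  End: {},
};

class Trip {
  constructor() {
    this.state = 'Request';
    this.dispatchAttempts = 0; // how many times the trip has entered Dispatch
  }

  // Apply an event; throws if the event is not valid in the current state.
  apply(event) {
    const edges = TRANSITIONS[this.state];
    // hasOwnProperty avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(edges, event)) {
      throw new Error(`Event "${event}" is not allowed in state ${this.state}`);
    }
    const next = edges[event];
    if (next === 'Dispatch') this.dispatchAttempts += 1;
    this.state = next;
    return next;
  }
}
```

Keeping the legal moves in one table makes an illegal transition (for example, a driver accepting a trip that was never dispatched) fail loudly instead of silently corrupting trip state.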

  12. OUR STORY

  13. Version 1
 • PHP dispatch
 • Outsourced to remote PHP contractors in the Midwest
 • Half the code in Spanish
 • Lifetime: 6-9 months
 [Diagram labels: Cron, flat file]

  14. [no text on slide]

  15. “I read an article on HackerNews about a new framework called Node.js” (Jason Roberts)

  16. Tradeoffs
 • Learning curve
 • Database drivers
 • Scalability
 • Documentation
 • Performance
 • Monitoring
 • Library ecosystem
 • Production operations

  17. Version 2
 • Lifetime: 9 months
 • Developed in-house
 • Node.js application
 • Prototyped on 0.2
 • Launched in production with 0.4
 • MongoDB datastore

  18. “I really don’t see dispatch changing much in the next three years”

  19. Expect the unexpected

  20. Version 3
 • Mongo did not scale with the volume of GPS logs (global write lock)
 • Swapped Mongo for Redis and flat files
 [Diagram labels: CN; SF, NYC, SEA, CHI]

  21. Decoupling storage of different types of data
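Slide 20 describes moving high-volume GPS logs out of the database that holds dispatch state. A hypothetical JavaScript sketch of that split: GPS pings are appended to a newline-delimited JSON flat file, while current status lives in a key-value store. A `Map` stands in for Redis, and `DecoupledStore` and its methods are illustrative names only. Synchronous `fs` calls keep the sketch short; a real server would use streams or async I/O.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical router that sends each kind of data to a store suited to it.
class DecoupledStore {
  constructor(logPath) {
    this.logPath = logPath;
    this.state = new Map(); // stand-in for Redis: small, frequently mutated state
  }

  // Append-only GPS log: no read-modify-write, so writers never
  // contend for a lock shared with dispatch state.
  logGpsPing(ping) {
    fs.appendFileSync(this.logPath, JSON.stringify(ping) + '\n');
  }

  // Point updates of current state, keyed by entity id.
  setStatus(id, status) {
    this.state.set(id, status);
  }

  getStatus(id) {
    return this.state.get(id);
  }

  // Replay the flat file, e.g. for offline analysis.
  readGpsLog() {
    if (!fs.existsSync(this.logPath)) return [];
    return fs
      .readFileSync(this.logPath, 'utf8')
      .split('\n')
      .filter(Boolean)
      .map((line) => JSON.parse(line));
  }
}
```

The design point is that a burst of location writes now touches only the log file, so it cannot stall reads and writes of driver and rider status.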

  22. Version 3 (continued)
 • The Node.js Mongo client failed to recognize replica set topology changes
 [Diagram labels: CN; SF, NYC, SEA, CHI]

  23. Be wary of immature client libraries

  24. Commits to client modules over time

  25. Version 3 (continued) [Diagram labels: SF, NYC, SEA, CHI, BOS, PAR]

  26. Focus on driving business value

  27. [no text on slide]

  28. [no text on slide]

  29. Capacity planning, forecasting, and load testing are your friends

  30. Measure everything

  31. Version 4
 • Nickname: The Grid
 • Multi-process dispatch
 • Peer assignment
 • Redis is now considered the source of truth
 • Use the Lua interpreter for atomic operations
 • Fan out to all city peers to find nearby cars
 [Diagram labels: CN; SF, NYC, SEA, CHI]
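Fanning out to every city peer to find nearby cars is a scatter-gather query. A minimal sketch, assuming each peer object exposes a hypothetical async `findNearby(lat, lng, limit)` that returns cars tagged with a `distanceKm`; the per-peer timeout and the empty-result fallback on error are assumptions added here, not details from the slides.

```javascript
// Resolve with `fallback` if `promise` has not settled within `ms`.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Query all peers in parallel, then merge and keep the `limit` closest cars.
async function findNearbyAcrossPeers(peers, lat, lng, limit, timeoutMs = 200) {
  const perPeer = await Promise.all(
    peers.map((peer) =>
      withTimeout(
        // Wrapping in .then() also turns a synchronous throw into a rejection.
        Promise.resolve()
          .then(() => peer.findNearby(lat, lng, limit))
          .catch(() => []), // a failed peer contributes no cars
        timeoutMs,
        [], // a slow peer contributes no cars
      ),
    ),
  );
  return perPeer
    .flat()
    .sort((a, b) => a.distanceKm - b.distanceKm)
    .slice(0, limit);
}
```

Bounding each peer's wait means one slow or crashed process degrades the result set instead of blocking every ride request in the city.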

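  32. Dispatch script (Lua, executed atomically by Redis)

    -- Hypothetical binding: the slide does not show where these names come from;
    -- here they are assumed to be passed in as KEYS and ARGV.
    local clientHash = KEYS[1]
    local driverHash = KEYS[2]
    local clientAssignmentHash = KEYS[3]
    local countKey = KEYS[4]
    local clientToken = ARGV[1]
    local driverPeerId = ARGV[2]

    -- Locals: Redis rejects scripts that create global variables.
    local clientStatus = redis.call('hget', clientHash, 'status')
    local driverStatus = redis.call('hget', driverHash, 'status')

    -- Only dispatch if the rider is still waiting and the driver is still free.
    if clientStatus == 'WaitingForPickup' and driverStatus == 'Open' then
      local clientPeerId = redis.call('hget', clientAssignmentHash, clientToken)
      -- Reserve the driver and move the rider to the driver's peer.
      redis.call('hset', driverHash, 'status', 'DispatchPending')
      redis.call('hset', clientAssignmentHash, clientToken, driverPeerId)
      -- Keep the per-peer assignment counts in the sorted set in step.
      if clientPeerId then
        redis.call('zincrby', countKey, -1, clientPeerId)
      end
      redis.call('zincrby', countKey, 1, driverPeerId)
      return redis.status_reply('SUCCESS')
    else
      return redis.error_reply('ERROR - clientStatus: ' .. tostring(clientStatus) ..
        ', driverStatus: ' .. tostring(driverStatus))
    end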
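To reason about what the dispatch script guarantees without a Redis server, here is an in-memory JavaScript model of the same check-and-set. It is an illustrative sketch, not a client for the real script: `Map`s stand in for the Redis hashes and the sorted set, and `bucket`, `dispatchAtomically`, and the key names are hypothetical. Because the function is synchronous, no other callback on the Node.js event loop can run between its reads and writes, which loosely mirrors how Redis runs a Lua script without interleaving other commands.

```javascript
// Get (or lazily create) the Map standing in for a Redis hash or sorted set.
function bucket(db, key) {
  if (!db.has(key)) db.set(key, new Map());
  return db.get(key);
}

// Mirrors the Lua script: check both statuses, then apply every update together.
function dispatchAtomically(db, keys, clientToken, driverPeerId) {
  const driver = bucket(db, keys.driverHash);
  const clientStatus = bucket(db, keys.clientHash).get('status');
  const driverStatus = driver.get('status');
  if (clientStatus !== 'WaitingForPickup' || driverStatus !== 'Open') {
    return {
      ok: false,
      error: `ERROR - clientStatus: ${clientStatus}, driverStatus: ${driverStatus}`,
    };
  }
  const assignments = bucket(db, keys.clientAssignmentHash);
  const counts = bucket(db, keys.countKey); // member -> score, like ZINCRBY
  const clientPeerId = assignments.get(clientToken);
  driver.set('status', 'DispatchPending');
  assignments.set(clientToken, driverPeerId);
  if (clientPeerId !== undefined) {
    counts.set(clientPeerId, (counts.get(clientPeerId) || 0) - 1);
  }
  counts.set(driverPeerId, (counts.get(driverPeerId) || 0) + 1);
  return { ok: true };
}
```

A second dispatch of the same driver fails, because the first call already moved the driver to `DispatchPending`; that is the double-booking the atomic script exists to prevent.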

  36. Version 4 (continued) [Diagram labels: SF1-SF3, NY1-NY4, SEA1-SEA4, CHI1-CHI3, PAR1, BOS1-BOS2]

  37. Version 5 [Diagram labels: SF (×9)]

  38. Version 5 [Chart: max # of location queries vs. # of nodes]

  39. Version 5 [Diagram labels: CN, ncar; SF, NYC, SEA, CHI]

  40. Break out services as needed

  41. Understand V8 to optimize Node.js applications

  42. [Diagram labels: SF1-SF3, NY1-NY4, SEA1-SEA4, CHI1-CHI3, PAR1, BOS1-BOS2]

  43. Don’t take vacation ;)

  44. Don’t live in Chicago!

  45. Stateless applications… No single points of failure… Replicated data stores… Dynamic application topology…

  46. Version 6 [Diagram labels: Grid Manager (×3); SF1-SF3, NY1-NY4, SEA1-SEA4, CHI1-CHI3, PAR1, BOS1-BOS2]

  47. Version 7 [Diagram label: haproxy]

  48. Do the obvious

  49. Pros
 • Every application is horizontally scalable
 • Flexible, partially dynamic topology
 • Failure recovery is manual in the worst case
 • Supports the primary business case very well
 • Conservative estimates: 1-2 years of runway

  50. Never be satisfied

  51. Cons
 • What happens when a city outscales the capacity of a single Redis instance?
 • Who wants to wake up in the middle of the night for server crashes?
 • What about future business use cases?

  52. #WORLDCLASS

  53. World Class
 • City-agnostic dispatch application
 • “Stateless” applications
 • Scale to 100x current load
 • Flexible data model

  54. Every now and then it’s okay to bend the rules

  55. Realtime Analytics

  56. Realtime Analytics

  57. So why did we stick with Node.js?
 • JavaScript is easy to learn
 • Simple interface with thorough documentation
 • Lends itself to fast prototyping
 • Asynchronous, nimble
 • Avoids concurrency challenges
 • Increasingly mature module ecosystem

  58. How to win with Node.js?
 • Measure everything, particularly response times and event loop lag
 • Learn to take heap dumps to debug memory issues
 • strace, perf, and flame graphs are necessary tools for improving performance
 • Build small, reusable components to reduce duplication
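Event loop lag can be measured with Node.js's built-in `perf_hooks.monitorEventLoopDelay()` histogram (available since Node.js 11.10). A minimal sketch: the `startLagMonitor` wrapper, its 10 ms default resolution, and the choice of p50/p99/max are assumptions for illustration, not something the talk prescribes. The histogram reports nanoseconds, so values are converted to milliseconds.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Start sampling event loop delay; `resolutionMs` is the sampling interval.
function startLagMonitor(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return {
    // Current lag statistics in milliseconds (histogram stores nanoseconds).
    snapshot() {
      return {
        p50: histogram.percentile(50) / 1e6,
        p99: histogram.percentile(99) / 1e6,
        max: histogram.max / 1e6,
      };
    },
    // Stop sampling when the monitor is no longer needed.
    stop() {
      histogram.disable();
    },
  };
}
```

In a service, `snapshot()` would typically be called on an interval and the numbers shipped to a metrics system alongside response times, so a CPU-bound handler shows up as a lag spike before it shows up as user-facing latency.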

  59. The Human Factor

  60. Thank you. Questions?
