
Mesos at Yelp: Building a production-ready PaaS. Rob Johnson (PowerPoint presentation)



  1. Mesos at Yelp: Building a production-ready PaaS. Rob Johnson, robj@yelp.com / @rob_johnson_

  2. Who am I:
     - Rob Johnson
     - Operations team at Yelp
     - Spend most of my time working on PaaSTA

  3. Yelp’s Mission: Connecting people with great local businesses.

  4. Yelp stats (as of Q2 2015): 83M, 83M, 68%, 32

  5. PaaSTA

  6. Yelp’s homegrown Platform-as-a-Service

  7. What’s the problem we’re trying to solve here?

  8. - Yelp’s monolith is ~3 million LoC as of 28/09/2015 (and that’s just the Python).
     - The number of developers keeps increasing.

  9. - Code deployments become increasingly difficult to coordinate.
     - The surface area a single bug can affect greatly increases.

  10. What’s the solution?

  11. SOA

  12. Solves everything, right?

  13. SOA: Round 1

  14. - A statically defined list of hosts to deploy each service on.
      - The operations team decides which hosts to deploy to.

  15. - Nagios is configured manually for each service.
      - A manual deployment system: lots of rsync wrappers to push code around.

  16. This doesn’t scale well.

  17. PaaSTA

  18. - Built on the shoulders of established tools.
      - ‘Glue code’ that coordinates these tools.

  19. Components

  20. Mesos

  21. Marathon

  22. Chronos (almost)

  23. My work here is done, right?

  24. Not Quite.

  25. Services != Production

  26. What makes a service production ready?


  31. - easy deployment for developers
      - discovery
      - monitoring
      - highly available
      - operational support

  32. Production-ready checklist: easy deployment for developers

  33. Services at Yelp tend to be HTTP APIs written in Python and served with uWSGI.

  34. We want to be stack-agnostic; developers shouldn’t be constrained by the dependencies installed on a server.

  35. - PaaSTA only runs Docker containers.
      - Developers own the creation of the image.

  36. PaaSTA currently has Java, Golang and Python apps in production.

  37. PaaSTA provides tooling to automate the build and deployment of images via Jenkins.

  38. PaaSTA uses Git as its control plane.

  39. Deployment pipeline: git push → make itest → push to registry → performance check → deploy to dev (repeated for each dev environment) → manual intervention → prod
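The pipeline above is a strict sequence with a human gate before production. The sketch below models only that ordering; `run_pipeline`, `succeed` and the dev environment names are hypothetical stand-ins, not PaaSTA's actual Jenkins jobs.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], approved_for_prod: bool) -> List[str]:
    """Run stages in order, stopping at the first failure or at the manual prod gate."""
    completed: List[str] = []
    for name, action in stages:
        if name == "prod" and not approved_for_prod:
            break  # production waits for a human to sign off
        if not action():
            break  # a failed stage blocks everything after it
        completed.append(name)
    return completed


def succeed() -> bool:
    return True  # stand-in for a real build, test or deploy step


# The whole run is triggered by a `git push`; dev environment names are hypothetical.
dev_envs = ["dev-a", "dev-b"]
stages: List[Stage] = [
    ("make itest", succeed),
    ("push to registry", succeed),
    ("performance check", succeed),
]
stages += [(f"deploy to {env}", succeed) for env in dev_envs]
stages.append(("prod", succeed))

print(run_pipeline(stages, approved_for_prod=False))
```

Without approval the run stops after the last dev deploy; flipping `approved_for_prod` plays the role of the manual intervention step.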

  40. Once a given image is marked for deployment in production, PaaSTA ‘bounces’ the app, gracefully upgrading the version.
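One way to bounce gracefully is a crossover: start instances of the new version first, then stop the same number of old ones, so capacity never drops below the target. This is a minimal sketch assuming new instances become healthy immediately; a real scheduler would wait for health checks, and `crossover_bounce` is a hypothetical name, not PaaSTA's implementation.

```python
from typing import List


def crossover_bounce(running: List[str], new_version: str, step: int = 1) -> List[List[str]]:
    """Replace every instance with new_version, `step` instances at a time.

    Assumes all entries in `running` are old-version instances. New instances
    are started before old ones are stopped, so the count never falls below
    len(running). Returns the running versions after each step.
    """
    target = len(running)
    current = list(running)
    history: List[List[str]] = []
    replaced = 0
    while replaced < target:
        batch = min(step, target - replaced)
        # Step 1: scale up with the new version (assumed healthy here).
        current.extend([new_version] * batch)
        # Step 2: scale down the same number of old-version instances.
        for _ in range(batch):
            idx = next(i for i, v in enumerate(current) if v != new_version)
            current.pop(idx)
        replaced += batch
        history.append(list(current))
    return history


for snapshot in crossover_bounce(["v1", "v1", "v1"], "v2"):
    print(snapshot)
```

Each printed snapshot still holds three instances; only the mix of versions changes.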

  41. - Reduces the operational overhead of deploying a service.
      - Removes the bottleneck of going through operations to deploy.

  42. Production-ready checklist: discovery

  43. Smartstack

  44. - Originally written by Airbnb.
      - Yelp now has maintainers working on it.

  45–48. [Diagram, built up over four slides: three hosts, each running services s1–s4 alongside Synapse (S), Nerve (N) and HAProxy (H), plus ZooKeeper (ZK).]
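The diagram's moving parts can be modelled in a few lines: Nerve keeps a healthy instance registered in ZooKeeper, and Synapse turns the registry into local HAProxy configuration. This is a heavily simplified sketch under stated assumptions: a dict stands in for ZooKeeper, `nerve_report` and `synapse_render` are hypothetical names, and the emitted HAProxy section is illustrative, not what Synapse actually generates.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# In-memory stand-in for ZooKeeper: service name -> registered "host:port" entries.
registry: DefaultDict[str, Set[str]] = defaultdict(set)


def nerve_report(service: str, host: str, port: int, healthy: bool) -> None:
    """Nerve's role: keep the local instance registered only while it passes health checks."""
    entry = f"{host}:{port}"
    if healthy:
        registry[service].add(entry)
    else:
        registry[service].discard(entry)


def synapse_render(service: str, proxy_port: int) -> str:
    """Synapse's role: turn the registry contents into an HAProxy section on each host."""
    lines = [
        f"listen {service}",
        # Bind address and port are assumptions echoing values elsewhere in this deck.
        f"    bind 169.254.255.254:{proxy_port}",
        "    mode http",
    ]
    for i, entry in enumerate(sorted(registry[service])):
        # `check` turns on HAProxy's own health checking for each backend server.
        lines.append(f"    server {service}_{i} {entry} check")
    return "\n".join(lines)


nerve_report("s1", "10.0.0.1", 31000, healthy=True)
nerve_report("s1", "10.0.0.2", 31001, healthy=True)
nerve_report("s1", "10.0.0.2", 31001, healthy=False)  # this instance fails its check
print(synapse_render("s1", proxy_port=20603))
```

Because every host runs its own HAProxy from the last rendered config, a client can keep routing even if ZooKeeper, Synapse or Nerve goes away.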

  49. There’s no place like 127.0.0.1 (or rather, 169.254.255.254)

  50. Why Smartstack?

  51. - ZooKeeper, Synapse or Nerve dying doesn’t wipe us out.
      - HAProxy has its own health-checking system we can fall back to.

  52. - HAProxy is a proven load balancer and HTTP proxy.
      - We can use Smartstack with non-PaaSTA services.

  53. Zero-downtime HAProxy reloads: http://bit.ly/1RsctGi

  54. Production-ready checklist: monitoring

  55. - An API allows us to send event data.
      - Flexibility to assign alerts to service authors, rather than forcing them on the operations team.

  56. ```
      $ cat monitoring.yaml
      ---
      team: search_infra
      notification_email: search@yelp.com
      page: true
      runbook: 'y/rb-myservice'
      alert_after: 5m
      realert_every: 10m
      tip: 'The federator service is in the critical path for search, you should be fixing this'
      ```
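Two of the keys above control timing. The sketch below shows one plausible reading, assuming `alert_after` delays the first notification and `realert_every` throttles repeats; it is not the monitoring system's actual logic. The config is inlined as a dict so only the standard library is needed, and `to_seconds` / `should_notify` are hypothetical helpers.

```python
import re
from typing import Optional

# Values copied from the monitoring.yaml above (YAML parsing omitted).
config = {
    "team": "search_infra",
    "page": True,
    "alert_after": "5m",
    "realert_every": "10m",
}

_UNITS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '5m' or '30s' to seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if not match:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def should_notify(failing_for: int, since_last_alert: Optional[int]) -> bool:
    """Alert once a check has failed for alert_after, then at most every realert_every."""
    if failing_for < to_seconds(config["alert_after"]):
        return False
    if since_last_alert is None:  # no alert sent yet for this incident
        return True
    return since_last_alert >= to_seconds(config["realert_every"])


print(should_notify(120, None))  # False: failing for 2m, below the 5m threshold
print(should_notify(360, None))  # True: past alert_after, first alert
print(should_notify(900, 300))   # False: only 5m since the last alert
```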

  57. ./check_marathon_services_replication
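A replication check boils down to comparing how many instances are actually available with how many were requested. The sketch below is a hypothetical version of that comparison; the 50% critical threshold and the WARNING/CRITICAL split are assumptions, not the behaviour of the script named above. It returns a status instead of exiting, but the conventional Nagios exit codes are included for reference.

```python
from typing import Tuple

# Conventional Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def check_replication(available: int, expected: int, crit_percent: float = 50.0) -> Tuple[str, str]:
    """Compare healthy, running instances of a service with the number it should have."""
    if expected == 0:
        return "OK", "no instances expected"
    percent = 100.0 * available / expected
    message = f"{available}/{expected} instances available ({percent:.0f}%)"
    if percent < crit_percent:
        return "CRITICAL", message
    if available < expected:
        return "WARNING", message
    return "OK", message


status, message = check_replication(available=1, expected=4)
print(status, message)    # CRITICAL 1/4 instances available (25%)
print(EXIT_CODES[status])  # 2
```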

  58. ./check_hung_setup_marathon_jobs

  59. Production-ready checklist: highly available

  60. Yelp organises machines into latency zones.

  61. Latency zones, from broadest to most specific: superregion → region → habitat

  62. ```
      $ cat smartstack.yaml
      ---
      main:
        advertise: [superregion]
        discover: superregion
        proxy_port: 20603
      ```

  63. By choosing a more specific latency zone, service owners optimize for round-trip time (RTT) over availability.
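The `discover` level determines which backends a client can reach: a narrower zone means fewer, closer instances. The sketch below models that trade-off under assumptions: `Location`, `discoverable` and the zone names are hypothetical, and each instance is assumed to be tagged with all three zone levels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Location:
    habitat: str
    region: str
    superregion: str


def discoverable(client: Location, backends: List[Location], discover: str) -> List[Location]:
    """Return the backends a client would route to for a given discover level."""
    if discover not in ("habitat", "region", "superregion"):
        raise ValueError(f"unknown latency zone: {discover}")
    # A backend is visible only if it shares the client's zone at that level.
    return [b for b in backends if getattr(b, discover) == getattr(client, discover)]


client = Location("hab1", "r1", "sr1")
backends = [
    Location("hab1", "r1", "sr1"),
    Location("hab2", "r1", "sr1"),
    Location("hab3", "r2", "sr1"),
]
for level in ("habitat", "region", "superregion"):
    print(level, len(discoverable(client, backends, level)))
```

Discovering at `habitat` leaves a single nearby backend (low RTT, little redundancy), while `superregion` exposes all three.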

  64. - By being aware of these latency zones, PaaSTA can make smarter decisions on how to constrain applications.

  65. Without this coupling, Marathon wouldn’t balance apps evenly amongst the latency zones.
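Marathon's `GROUP_BY` constraint operator asks it to spread an app's tasks evenly across the distinct values of an agent attribute. The sketch below builds such an app-definition fragment and shows the even spread it aims for. It assumes Mesos agents carry a `habitat` attribute; the app id and helper names are hypothetical, and the source does not show the exact constraints PaaSTA generates.

```python
import json
from collections import Counter
from typing import Dict, List


def app_with_zone_constraint(app_id: str, instances: int, zone_attr: str, zone_count: int) -> Dict:
    """Marathon app definition fragment asking for tasks spread across zone_attr values."""
    return {
        "id": app_id,
        "instances": instances,
        # GROUP_BY spreads tasks across distinct attribute values; the optional
        # third element tells Marathon how many distinct values to expect.
        "constraints": [[zone_attr, "GROUP_BY", str(zone_count)]],
    }


def even_spread(instances: int, zones: List[str]) -> Counter:
    """Round-robin placement: the balanced outcome GROUP_BY aims for."""
    return Counter(zones[i % len(zones)] for i in range(instances))


print(json.dumps(app_with_zone_constraint("/example-service", 6, "habitat", 3)))
print(even_spread(6, ["hab1", "hab2", "hab3"]))
```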

  66. Production-ready checklist: operational support

  67. PaaSTA comes with a CLI for managing PaaSTA services.

  68. Production-ready checklist, recap: easy deployment for developers, discovery, monitoring, highly available, operational support

  69. Questions?

  70. YelpEngineers · @YelpEngineering · engineeringblog.yelp.com · github.com/yelp
