
Arrested Development: The awkward adolescence of a microservices-based application (PowerPoint presentation)



  1. Arrested Development: The awkward adolescence of a microservices-based application. EuroPython 2015, Scott Triglia

  2. The Company

  3. 77M reviews, 142M monthly unique users

  4. Your Speaker: Scott Triglia (@scott_triglia), 4 years with Yelp; Search, ML, Services

  5. The Product: Yelp Transaction Platform

  6. The Product: Yelp Transaction Platform (or just “Platform”)

  7. Microservices: That Hot Trend

  8. “…an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms…” http://martinfowler.com/articles/microservices.html

  9. (clarkmaxwell via Flickr; CC BY-NC-ND 2.0)

  10. Monolithic Python code resisted decoupling

  11. Monolithic Python code catered to the lowest common denominator

  12. Monolithic Python code was anti-agile

  13. Services Time

  14. Pinterest Gingerbread House

  15. Pinterest Gingerbread House

  16. API complexity increases

  17. coupling rises

  18. interactions get murky

  19. process does not scale

  20. So what’s an engineer to do?

  21. • Decoupling • Defining • Understanding Production • Staying Agile

  22. • Decoupling • Defining • Understanding Production • Staying Agile

  23. Old, boring problem: monolithic spaghetti code

  24. Solution: microservices!

  25. New, exciting problem: how to share concepts across services

  26. New, exciting problem: distributed tech debt

  27. service_type

  28. service_type: What product does your business provide, and how does it provide it?

  29. service_type: pickup, delivery

  30. service_type: pickup, delivery, booking_at_customer, booking_at_business

  31. service_type: pickup, delivery, booking_at_customer, booking_at_business, hotel_reservation, goods_at_business, goods_at_customer

  32. • Confusing • Pervasive • Convenient, but not designed

  33. Draw boundaries, introduce domain-specific concepts tied to functionality
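To make the idea concrete, here is a minimal sketch, assuming a hypothetical ordering service and a hypothetical booking service that previously both branched on the shared service_type string. Each service owns a small enum tied to its own functionality, and the legacy value is translated once at the service boundary. The names FulfillmentMethod, BookingLocation, and the *_from_legacy helpers are illustrative, not Yelp's code, and the mapping of legacy values is an assumption. The two enums deliberately repeat some structure, trading a little DRYness for cleaner boundaries.

```python
from enum import Enum
from typing import Optional


# Hypothetical ordering service: only cares how goods reach the customer.
class FulfillmentMethod(Enum):
    PICKUP = "pickup"
    DELIVERY = "delivery"


# Hypothetical booking service: only cares where the appointment happens.
class BookingLocation(Enum):
    AT_BUSINESS = "at_business"
    AT_CUSTOMER = "at_customer"


# Assumed translation of the old, overloaded service_type values.
# It lives at the boundary instead of every service re-interpreting
# the shared string on its own.
_LEGACY_FULFILLMENT = {
    "pickup": FulfillmentMethod.PICKUP,
    "goods_at_business": FulfillmentMethod.PICKUP,
    "delivery": FulfillmentMethod.DELIVERY,
    "goods_at_customer": FulfillmentMethod.DELIVERY,
}

_LEGACY_BOOKING = {
    "booking_at_business": BookingLocation.AT_BUSINESS,
    "booking_at_customer": BookingLocation.AT_CUSTOMER,
}


def fulfillment_from_legacy(service_type: str) -> Optional[FulfillmentMethod]:
    """Return the ordering-service concept, or None if it does not apply."""
    return _LEGACY_FULFILLMENT.get(service_type)


def booking_location_from_legacy(service_type: str) -> Optional[BookingLocation]:
    """Return the booking-service concept, or None if it does not apply."""
    return _LEGACY_BOOKING.get(service_type)
```

A value such as "hotel_reservation" simply maps to None in the ordering service: that service has no concept for it, rather than inheriting every value the business ever invented.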

  34. Lessons

  35. Interfaces are the sum of APIs, shared libraries, and the data that flows through them

  36. Sacrificing DRYness can be the best choice for overall design

  37. Service interfaces are a great opportunity to intentionally decouple systems

  38. • Decoupling • Defining • Understanding Production • Staying Agile

  39. Have you ever needed to understand a system and been told to “go read the source”?

  40. What about a system which only validates half its interface?

  41. Coming from a Python monolith, strong interfaces were quite rare

  42. def checkout(order, price, **kwargs):
          """Process an order."""
          validate_order(order)
          charge_credit_card(order.user, price)
          notify_user(order, **kwargs)

  43. Client side: Yelp/bravado

          from bravado.client import SwaggerClient

          client = SwaggerClient.from_url("http://www.myservice.com/swagger.json")
          pet = client.pet.getPetById(petId=42).result()

  44. Server side: striglia/pyramid_swagger

          # In your Pyramid webapp.py
          config.include('pyramid_swagger')
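pyramid_swagger can validate both incoming requests and outgoing responses against a service's Swagger spec. As a library-free illustration of validating both halves of an interface (the problem raised on slide 40), here is a minimal sketch in plain Python. The schema format (field name to required type), the validated decorator, and the get_user_info handler are all hypothetical; this is not pyramid_swagger's API.

```python
from functools import wraps
from typing import Any, Callable, Dict

Schema = Dict[str, type]  # hypothetical: field name -> required Python type


class ValidationError(ValueError):
    pass


def check(payload: Dict[str, Any], schema: Schema, where: str) -> None:
    """Reject missing fields, unexpected fields, and wrongly typed fields."""
    missing = schema.keys() - payload.keys()
    extra = payload.keys() - schema.keys()
    if missing or extra:
        raise ValidationError(
            f"{where}: missing={sorted(missing)} unexpected={sorted(extra)}"
        )
    for field, expected in schema.items():
        if not isinstance(payload[field], expected):
            raise ValidationError(f"{where}: {field!r} should be {expected.__name__}")


def validated(request: Schema, response: Schema) -> Callable:
    """Validate what comes into a handler and what it sends back."""
    def decorator(handler: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Callable:
        @wraps(handler)
        def wrapper(payload: Dict[str, Any]) -> Dict[str, Any]:
            check(payload, request, "request")
            result = handler(payload)
            check(result, response, "response")  # the often-forgotten half
            return result
        return wrapper
    return decorator


@validated(request={"user_id": int}, response={"user_id": int, "name": str})
def get_user_info(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical handler; a real one would look the user up somewhere.
    return {"user_id": payload["user_id"], "name": "example user"}
```

Checking the response as well as the request means a service that drifts from its own documented contract fails loudly in testing, instead of surprising its clients later.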

  45. Lessons

  46. Interfaces should be intentional

  47. Interfaces should be explicit
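Applied to the checkout example from slide 42, an explicit interface names and types every input instead of forwarding an open-ended **kwargs. A minimal sketch, assuming hypothetical Order and Notification types; the helper functions are stubs standing in for real validation, payment, and notification code.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Notification(Enum):
    # Hypothetical: the notification channels a caller may choose from.
    EMAIL = "email"
    SMS = "sms"


@dataclass(frozen=True)
class Order:
    order_id: int
    user_id: int


def validate_order(order: Order) -> None:
    # Stub standing in for real order validation.
    if order.order_id <= 0:
        raise ValueError("order_id must be positive")


def charge_credit_card(user_id: int, price: Decimal) -> None:
    # Stub standing in for a call to a payment service.
    if price <= 0:
        raise ValueError("price must be positive")


def notify_user(order: Order, channel: Notification) -> str:
    # Stub: return the message that would have been sent.
    return f"order {order.order_id} confirmed via {channel.value}"


def checkout(order: Order, price: Decimal, notification: Notification) -> str:
    """Process an order: validate it, charge the user, then notify them.

    Every parameter is named and typed, so callers can see the whole
    interface without reading the function body.
    """
    validate_order(order)
    charge_credit_card(order.user_id, price)
    return notify_user(order, notification)
```

With this signature, an unknown keyword argument such as notify_by_fax=True fails with a TypeError at the call site instead of silently flowing through to notify_user.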

  48. Find the mechanical things which don’t scale and automate them mercilessly

  49. • Decoupling • Defining • Understanding Production • Staying Agile

  50. Real customer bug report: “We’re seeing 504s talking to the /user_info API”

  51. Ancient times: Use logic and whatever logs happen to exist

  52. (drbethsnow via Flickr; CC BY-NC-ND 2.0)

  53. Better: Log all incoming API requests to any service
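Pyramid apps are WSGI applications, so one way to log every incoming request is a small WSGI middleware wrapped around each service. This is a minimal sketch using only the standard library; the AccessLogMiddleware name and the log format (method, path, status, latency) are assumptions for illustration, not Yelp's actual setup.

```python
import logging
import time
from typing import Callable, Dict, Iterable

log = logging.getLogger("access")


class AccessLogMiddleware:
    """Wrap any WSGI app and emit one log line per request."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        start = time.monotonic()
        seen: Dict[str, str] = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember the numeric status code, e.g. "504" from "504 Gateway Timeout".
            seen["status"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, capturing_start_response)
        finally:
            # Latency covers producing the response iterable, not streaming it.
            # "-" means the app had not called start_response yet (or raised).
            log.info(
                "%s %s %s %.3fs",
                environ.get("REQUEST_METHOD", "-"),
                environ.get("PATH_INFO", "-"),
                seen.get("status", "-"),
                time.monotonic() - start,
            )
```

In a Pyramid service, wrapping could look like app = AccessLogMiddleware(config.make_wsgi_app()), together with logging.basicConfig(level=logging.INFO) so the INFO-level lines are actually emitted.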

  54. (spam via Flickr; CC BY 2.0)

  55. Best: Every service has a detailed access/error log and tooling to examine them
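“Tooling to examine them” can start very small. A minimal sketch, assuming access-log lines of the form METHOD PATH STATUS LATENCY, for example GET /user_info 504 2.500s (an assumed format, not one from the talk); it groups requests by path and reports server errors and latency, which is the kind of question a mystery 504 raises.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List


def summarize(lines: Iterable[str]) -> Dict[str, dict]:
    """Group access-log lines by path; report request count, 5xx count, latency.

    Assumes each line looks like: 'GET /user_info 504 2.500s'.
    """
    latencies: Dict[str, List[float]] = defaultdict(list)
    errors: Dict[str, int] = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        _method, path, status, latency = parts
        latencies[path].append(float(latency.rstrip("s")))
        if status.startswith("5"):
            errors[path] += 1
    return {
        path: {
            "requests": len(values),
            "server_errors": errors[path],
            "median_s": statistics.median(values),
            "max_s": max(values),
        }
        for path, values in latencies.items()
    }
```

Comparing median and maximum latency per endpoint quickly shows whether a timeout is a rare outlier or a systematic slowdown.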

  56. So what about that customer with the mystery 504?

  57. [Figure: response times, 2.5 s vs. 0.15 s]

  58. Realistically: Don’t require the customer to report issues in the first place

  59. es_host: elasticsearch-hostname
      es_port: 14900
      index: logstash-errors-%G.%V
      type: frequency
      num_events: 20
      timeframe:
        minutes: 2
      alert:
        - "modules.sensu_alert.SensuAlerter"
      sensu:
        team: platform
        tip: "This alert indicates a large number of errors across the Platform product. See <link to Kibana> for details."
        page: true
        status: 2  # CRITICAL
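The rule above uses ElastAlert's frequency type, which matches when at least num_events matching documents arrive within timeframe. A minimal sketch of that windowing logic in plain Python, assuming error timestamps arrive sorted; it illustrates the rule's semantics, is not ElastAlert's implementation, and the reset-after-firing behavior is an assumption of this sketch.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, List


def frequency_alerts(
    timestamps: Iterable[datetime],
    num_events: int = 20,
    timeframe: timedelta = timedelta(minutes=2),
) -> List[datetime]:
    """Return the times at which a frequency-style rule would fire.

    Fires when at least num_events events fall within timeframe of the
    latest event. Timestamps must be sorted ascending.
    """
    window: Deque[datetime] = deque()
    fired: List[datetime] = []
    for ts in timestamps:
        window.append(ts)
        # Drop events that are now older than the window.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            fired.append(ts)
            window.clear()  # this sketch starts counting afresh after an alert
    return fired
```

With the values from the rule, a burst of 20 errors inside two minutes pages the team, while the same 20 errors spread over a longer period never fill the window.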


  62. Lessons

  63. Logging is a superpower. Use it wisely and constantly.

  64. But raw data is not enough! Visualize and monitor actively.

  65. These approaches make a world of difference: • Incident response from days to minutes • Investigations from ∞ to minutes

  66. • Decoupling • Defining • Understanding Production • Staying Agile

  67. Uncomfortable conversation: “Customers had their orders interrupted. How are you preventing it going forward?”

  68. Understandable response: “Deploy more carefully”

  69. Understandable response: “Expand oncall”

  70. How do we ensure the team stays agile as our services grow in complexity?

  71. Pain point: The testing environment is {broken, flaky, not like prod}

  72. Pain point: Tests passed but production broke

  73. Production monitoring is the natural extension of excellent pre-deploy testing.
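One way to treat production monitoring as an extension of pre-deploy testing is to run the same kind of assertions against the live system right after every deploy. A minimal sketch, assuming each check is a hypothetical zero-argument callable (for example, one that calls a service's health endpoint); the names run_checks, CheckResult, and deploy_is_healthy are illustrative.

```python
import time
from typing import Callable, Dict, List, NamedTuple


class CheckResult(NamedTuple):
    name: str
    ok: bool
    seconds: float
    error: str


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[CheckResult]:
    """Run every check, recording failures instead of stopping at the first.

    A check passes if it returns without raising; any exception counts
    as a failure and is recorded with its type and message.
    """
    results = []
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            ok, error = True, ""
        except Exception as exc:  # report everything, keep going
            ok, error = False, f"{type(exc).__name__}: {exc}"
        results.append(CheckResult(name, ok, time.monotonic() - start, error))
    return results


def deploy_is_healthy(results: List[CheckResult]) -> bool:
    return all(result.ok for result in results)
```

The same check functions could run in CI against the testing environment and, after a deploy, against production, with an unhealthy result triggering a rollback or a page.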

  74. Pain point: No clue how much time we spend fixing production issues

  75. Pain point: Tough to argue which changes will make things more robust

  76. And as with everything else, this must eventually be automated
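Slides 74 and 75 ask for numbers: how much time goes to production issues, and which fixes would pay off most. A minimal sketch of automating that tally, assuming incidents are recorded with a start time, a resolution time, and a cause label; the Incident type and the labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Incident:
    started: datetime
    resolved: datetime
    cause: str  # hypothetical label, e.g. "bad deploy" or "flaky dependency"


def time_by_cause(incidents: Iterable[Incident]) -> List[Tuple[str, timedelta]]:
    """Total time spent per cause, largest first."""
    totals: Counter = Counter()
    for incident in incidents:
        totals[incident.cause] += (incident.resolved - incident.started).total_seconds()
    return [(cause, timedelta(seconds=secs)) for cause, secs in totals.most_common()]
```

Ranking causes by total time gives a concrete argument for which robustness work to prioritize.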

  77. Lessons
