Mature microservices and how to operate them (Sarah Wells)


  1. Mature microservices and how to operate them. Sarah Wells, Technical Director for Operations & Reliability, The Financial Times @sarahjwells

  2. https://www.ft.com/stream/c47f4dfc-6879-4e95-accf-ca8cbe6a1f69 @sarahjwells

  3. https://www.ft.com/companies @sarahjwells

  4. Problem: we’d set up a redirect to a page which didn’t exist @sarahjwells

  5. We weren’t sure how to fix the data via the url management tool @sarahjwells

  6. We got it fixed @sarahjwells

  7. Polyglot architectures are great - until you need to work out how *this* database is backed up @sarahjwells

  8. Microservices are more complicated to operate and maintain @sarahjwells

  9. Why bother? @sarahjwells

  10. “Experiment” for most organizations really means “try” (Linda Rising, Experiments: the Good, the Bad and the Beautiful) @sarahjwells

  11. Overlap tests by componentising the barrier

  12. Releasing changes frequently doesn’t just ‘happen’ @sarahjwells

  13. Done right, microservices enable this @sarahjwells

  14. The team that builds the system *has* to operate it too @sarahjwells

  15. What happens when teams move on to new projects? @sarahjwells

  16. Your next legacy system will be microservices not a monolith @sarahjwells

  17. Optimising for speed; Operating microservices; When people move on @sarahjwells

  18. Optimising for speed @sarahjwells

  19. Measure (High performers): Delivery lead time

  20. Measure (High performers): Delivery lead time (Less than one hour). “How long would it take you to release a single line of code to production?”

  21. Measure (High performers): Delivery lead time (Less than one hour); Deployment frequency

  22. Measure (High performers): Delivery lead time (Less than one hour); Deployment frequency (On demand)

  23. Measure (High performers): Delivery lead time (Less than one hour); Deployment frequency (On demand); Time to restore service

  24. Measure (High performers): Delivery lead time (Less than one hour); Deployment frequency (On demand); Time to restore service (Less than one hour)

  25. Measure (High performers): Delivery lead time (Less than one hour); Deployment frequency (On demand); Time to restore service (Less than one hour); Change fail rate

  26. Measure (High performers): Delivery lead time (Less than one hour); Deployment frequency (On demand); Time to restore service (Less than one hour); Change fail rate (0-15%)
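
The four measures on the slide above can be computed from a simple record of deployments. The following is a minimal sketch, not the method used at the FT: the `Deployment` record, its field names, and the choice of medians are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are illustrative only.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_measures(deployments, period_days):
    """Summarise the four measures for deployments made over period_days."""
    if not deployments:
        return None
    lead_minutes = [
        (d.deployed_at - d.committed_at).total_seconds() / 60
        for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    restore_minutes = [
        (d.restored_at - d.deployed_at).total_seconds() / 60
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "median_lead_time_minutes": median(lead_minutes),
        "deployments_per_day": len(deployments) / period_days,
        "change_fail_rate": len(failures) / len(deployments),
        "median_time_to_restore_minutes":
            median(restore_minutes) if restore_minutes else None,
    }
```

Comparing the result against the "high performers" column (lead time and time to restore under one hour, change fail rate of 0-15%) gives a rough sense of where a team stands.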

  27. High-performing organisations release changes frequently @sarahjwells

  28. Continuous delivery is the foundation @sarahjwells

  29. “If it hurts, do it more frequently, and bring the pain forward.”

  30. Our old build and deployment process was very manual… @sarahjwells

  31. You can’t experiment when you do 12 releases a year @sarahjwells

  32. 1. An automated build and release pipeline @sarahjwells

  33. 2. Automated testing, integrated into the pipeline @sarahjwells

  34. 3. Continuous integration @sarahjwells

  35. If you aren’t releasing multiple times a day, consider what is stopping you @sarahjwells

  36. You’ll probably have to change the way you architect things @sarahjwells

  37. Zero downtime deployments: sequential deployments; schemaless databases @sarahjwells
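
One way to read "sequential deployments" is to update instances one at a time, so the remaining instances keep serving traffic. The sketch below assumes that reading; `deploy` and `is_healthy` are hypothetical callables supplied by the caller, not part of any real deployment tool.

```python
def rolling_deploy(instances, deploy, is_healthy):
    """Deploy to each instance in turn, stopping at the first unhealthy one.

    deploy(instance) and is_healthy(instance) are hypothetical hooks.
    Instances not yet reached are left untouched on the old version.
    """
    updated = []
    for instance in instances:
        deploy(instance)
        if not is_healthy(instance):
            # Stop here so the untouched instances keep serving requests.
            return {"completed": False, "updated": updated, "failed": instance}
        updated.append(instance)
    return {"completed": True, "updated": updated, "failed": None}
```

Stopping at the first unhealthy instance limits the impact of a bad release to a single instance, which is what keeps the deployment free of downtime for users.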

  38. In-hours releases mean the people who can help are there @sarahjwells

  39. You need to be able to test and deploy your changes independently @sarahjwells

  40. You need systems - and teams - to be loosely coupled @sarahjwells

  41. Done right, microservices are loosely coupled @sarahjwells

  42. Processes also have to change @sarahjwells

  43. Often there is ‘process theatre’ around things and this can safely be removed @sarahjwells

  44. Change approval boards don’t reduce the chance of failure @sarahjwells

  45. Filling out a form for each change takes too long @sarahjwells

  46. How fast are we moving? @sarahjwells

  47. Releasing 250 times as often @sarahjwells

  48. Changes are small, easy to understand, independent and reversible @sarahjwells

  49. <1% failure rate; ~16% failure rate

  50. Optimising for speed; Operating microservices @sarahjwells

  51. There are patterns and approaches that help @sarahjwells

  52. DevOps is essential for success @sarahjwells

  53. You can’t hand things off to another team when they change multiple times a day @sarahjwells

  54. High-performing teams get to make their own decisions about tools and technology @sarahjwells

  55. Delegating tool choice to teams makes it hard for central teams to support everything @sarahjwells

  56. Make it someone else’s problem @sarahjwells

  57. https://medium.com/wardleymaps

  58. Buy rather than build, unless it’s critical to your business @sarahjwells

  59. Work out what level of risk you’re comfortable with @sarahjwells

  60. “We’re not a hospital or a power station” @sarahjwells

  61. We value releasing often so we can experiment frequently @sarahjwells

  62. Accept that you will generally be in a state of ‘grey failure’ @sarahjwells

  63. Retry on failure: back off before retrying; give up if it’s taking too long @sarahjwells
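
The two rules on the slide above can be sketched as a small retry helper. This is an illustrative sketch only: the function name, the default limits, and the use of exponential backoff with jitter are assumptions, not details from the talk.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=2.0, deadline=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds, backing off between attempts.

    Gives up by re-raising the last error when max_attempts is reached,
    or when the next wait would take us past the overall deadline (seconds).
    """
    start = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: give up
            # Exponential backoff, capped at max_delay, with random jitter
            # so that many clients do not all retry at the same moment.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
            if clock() - start + delay > deadline:
                raise  # taking too long: give up
            sleep(delay)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting; callers would normally leave them at their defaults.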

  64. Mitigate now, fix tomorrow @sarahjwells

  65. How do you know something’s wrong? @sarahjwells

  66. Concentrate on the business capabilities @sarahjwells

  67. Synthetic monitoring @sarahjwells

  68. No data fixtures required @sarahjwells

  69. Also helps us know things are broken even if no user is currently doing anything @sarahjwells
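
A synthetic check can be as small as a probe that exercises a real capability on a schedule and records whether it worked. The sketch below is one hedged way to structure that; `http_probe`, `run_synthetic_check`, and the result fields are hypothetical names, and the URL to probe is left to the caller.

```python
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Example probe: True if the URL answers with HTTP 200.

    urlopen raises for 4xx/5xx responses and for network errors;
    run_synthetic_check below treats any exception as a failure.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200


def run_synthetic_check(probe, clock=time.monotonic):
    """Run one probe and report the outcome, even when no user is active."""
    started = clock()
    try:
        ok = bool(probe())
        error = None
    except Exception as exc:
        ok = False
        error = repr(exc)
    return {"ok": ok, "latency_s": clock() - started, "error": error}
```

A scheduler would call `run_synthetic_check(lambda: http_probe(some_url))` at a fixed interval and alert when results stop being `ok`; the scheduling and alerting parts are out of scope for this sketch.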

  70. Make sure you know whether *real* things are working in production @sarahjwells

  71. Our editorial team is inventive @sarahjwells

  72. What does it mean for a publish to be ‘successful’? @sarahjwells

  73. Build observability into your system @sarahjwells

  74. Observability: can you infer what’s going on in the system by looking at its external outputs? @sarahjwells

  75. Log aggregation @sarahjwells

  76. Metrics @sarahjwells

  77. Keep it simple: request rate, latency, error rate @sarahjwells
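
The three metrics named above can be derived from a window of request records. This is a minimal sketch under assumptions: the `Request` record, treating status codes of 500 and above as errors, and reporting latency as a nearest-rank 95th percentile are illustrative choices, not ones stated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one handled request.
    duration_ms: float
    status: int


def summarise(requests, window_seconds):
    """Return request rate, error rate and p95 latency for one time window."""
    total = len(requests)
    if total == 0:
        return {"request_rate": 0.0, "error_rate": 0.0, "p95_latency_ms": None}
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile: ceil(0.95 * total), using integer arithmetic.
    rank = (95 * total + 99) // 100
    return {
        "request_rate": total / window_seconds,   # requests per second
        "error_rate": errors / total,             # fraction of requests failing
        "p95_latency_ms": durations[rank - 1],
    }
```

A percentile is used for latency rather than a mean because a small number of very slow requests can hide behind a healthy-looking average.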
