The SRE I aspire to be Yaniv Aknin // @aknin #VelocityConf San Jose

  1. The SRE I aspire to be Yaniv Aknin // @aknin #VelocityConf San Jose 2019

  2. Who is this guy ● Google SRE since 2013: most recently GCP's Quantitative Reliability Lead ● Jack of all trades: equal parts SRE, dev, and /pro(duct|ject) manager/ ● Opinions my own, but I owe a lot here to others

  3. Who is this guy ● Google SRE since 2013: most recently GCP's Quantitative Reliability Lead ● Jack of all trades: equal parts SRE*, dev, and /pro(duct|ject) manager/ ● Opinions my own, but I owe a lot here to others. * NB: what does "SRE" really mean?

  4. Wikipedia says Engineering is "using scientific principles to design and build $THINGS" (https://en.wikipedia.org/wiki/Engineering)

  5. Wikipedia says Engineering is "using scientific principles to design and build $THINGS" (https://en.wikipedia.org/wiki/Engineering). Imagine THINGS="Reliability" ... how do we apply science to that?

  6. Innovation (engineering, proactive, change) vs. Reliability (support, reactive, preserve)

  7. Reliability (support, reactive, preserve) ? Innovation (engineering, proactive, change)

  8. Reliability (engineering, proactive, change) and Innovation (engineering, proactive, change), joined by: The Error Budget

  9. Measurably optimise reliability vs cost

  10. “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, your knowledge is of a meagre and unsatisfactory kind.” William Thomson (Lord Kelvin), President of the Royal Society. Lecture on "Electrical Units of Measurement", published in "Popular Lectures", Vol. 1, 1883 (abridged to fit slide)

  11. MTBF/MTTR; "9s" (e.g. "99.95% uptime") [figure labels: MTBF, MTTR, 99.9%, 99.99%] ● Challenge: fungible definition of "failure" ● Challenge: aggregating individual events into business-credible 9s

  12. Why is this hard? ● Scope ● Difficulty ● Cost++ ● Misconceptions

  13. Why is this hard? ● Scope ● Difficulty ● Cost++ ● Misconceptions. And why is it good? ● Leverage ● Precision ● Cost--

  14. On ops, user harm, and tradeoffs [chart with axes "Ops" and "User happiness"; marker: "Your product is here."]

  18. You need "better quality" 9s! [2x2 chart] Vertical axis: from 99% ("Whatever I happened to ship") up to 99.999% ("I spent time making my metrics hit certain thresholds"). Horizontal axis: from Misaligned ("Whatever I happened to measure") to Aligned ("I spent time ensuring 9s correlate with customer pain").

  19. First move right, then move up [2x2 chart] Vertical axis: from 99% ("Whatever I happened to ship") up to 99.999% ("I spent time making my metrics hit certain thresholds"). Horizontal axis: from Misaligned ("Whatever I happened to measure") to Aligned ("I spent time ensuring 9s correlate with customer pain"). Quadrants: Wasted Effort (99.999%, Misaligned); Happy Customers (99.999%, Aligned); Unknown Problem (99%, Misaligned); Known Problem (99%, Aligned).

  20. SRE team: a recipe ● Obvious: Monitoring, Alerting, Capacity planning, CI/CD & Rollouts, Load Balancing

  21. SRE team: a recipe ● Obvious: Monitoring, Alerting, Capacity planning, CI/CD & Rollouts, Load Balancing ● Less Obvious: System Architecture, Distributed Algorithms, Networking, Operating Systems

  22. SRE team: a recipe ● Obvious: Monitoring, Alerting, Capacity planning, CI/CD & Rollouts, Load Balancing ● Less Obvious: System Architecture, Distributed Algorithms, Networking, Operating Systems ● Least Obvious: Product Management, Data Science, Business Acumen, (nose for) UX, Research

  23. Litmus test of SRE ● Have a measurement of reliability ● When unreliable, resource allocation changes ● When reliable, you don't do ops

  24. Litmus test of SRE* ● Have a measurement of reliability ● When unreliable, resource allocation changes ● When reliable, you don't do ops. * Please remember this is my litmus test... tell me yours?

  25. Thank you! Yaniv Aknin // @aknin. Art credits: "Lord Kelvin", Messrs. Dickinson, London, goo.gl/RHF61Z [cropped]; Yin Yang, https://openclipart.org/detail/276316/ying-yang
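
Illustrative sketch (not from the talk): slide 11 names MTBF/MTTR and "9s" as two ways to put a number on reliability, and slide 8 names the error budget. The block below relates them using the standard steady-state formula availability = MTBF / (MTBF + MTTR) and the common convention that an error budget is 1 minus the target (SLO). All function names, the 30-day window, and the sample figures are assumptions made for illustration; the deck itself does not prescribe any of them.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def nines(avail: float) -> float:
    """Express availability as a count of 'nines' (0.999 is about 3)."""
    return -math.log10(1.0 - avail)


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime that a target (SLO) permits over the window, in minutes."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    if not 0.0 < slo < 1.0:
        # A 100% target leaves no budget to divide by; treat it as invalid here.
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical service: fails on average every 999 h, takes 1 h to repair.
    a = availability(mtbf_hours=999.0, mttr_hours=1.0)
    print(f"availability={a:.4f} nines={nines(a):.2f}")
    print(f"downtime allowed at 99.95%: {allowed_downtime_minutes(0.9995):.1f} min")
    print(f"budget remaining: {budget_remaining(0.999, 999_500, 1_000_000):.2f}")
```

Note that this arithmetic sidesteps both challenges listed on slide 11: it assumes "failure" is already well defined, and it counts every event equally rather than weighting by customer pain.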
