
How to invest in technical infrastructure, Will Larson, 2019



  1. How to invest in technical infrastructure Will Larson 2019 @lethain

  2. Prioritizing infrastructure investment...

  3. ...in a high autonomy environment...

  4. ...within a rapidly scaling business.

  5. How can infrastructure teams...

  6. ...be surprisingly impactful...

  7. ...without burning out?

  8. What is technical infrastructure?

  9. Technical infrastructure: Someone’s biggest problem they dislike.

  10. Technical infrastructure: Tools used by 3+ teams for business critical workloads.

  11. Examples of technical infrastructure: Developer tools; Data infrastructure; Core libraries and frameworks; Model training and evaluation

  12. Introduction; 1. Fundamentals; 2. Escaping the firefight; 3. Learning to innovate; 4. Navigating breadth; 5. Unifying approach; Closing

  13. Forced: Scale MongoDB; Lower AWS costs; GDPR. Discretionary: Sorbet; Monolith -> µservices; Deep learning.

  14. Short-term: Critical remediation; Scale for holidays; Support launch. Long-term: QoS strategy; “Bend the cost curve”; Rewrite monolith.

  15. Where is your team now?

  16. Where do you want to be?

  17. Introduction; 1. Fundamentals; 2. Escaping the firefight; 3. Learning to innovate; 4. Navigating breadth; 5. Unifying approach; Closing

  18. Even Stripe...

  19. MongoDB

  20. Shared replsets
      ● Easy to provision :-)
      ● Don’t cost much :-)
      ● Shared everything :-\
      ● Joint ownership :-/
      ● Limited isolation :-(
      ● Big blast radius :-(

  21. More time on incidents

  22. Incident impact increasing

  23. When things aren’t getting better, they are getting worse

  24. How to fix?

  25. Ok, so what’s the firefighting playbook?

  26. Finish something

  27. Reduce concurrent work

  28. Automate

  29. Eliminate categories of problems

  30. Are you seeing signs of progress?

  31. No? You’ve gotta hire

  32. Once there’s progress, stay the course!

  33. btw, don’t fall in love with firefighting

  34. Introduction; 1. Fundamentals; 2. Escaping the firefight; 3. Learning to innovate; 4. Navigating breadth; 5. Unifying approach; Closing

  35. Rare opportunity in infrastructure

  36. Rare also means inexperienced

  37. tl;dr Talk to your users more

  38. tl;dr Talk to your users more

  39. tl;dr Listen to your users more

  40. Ways innovation goes wrong...

  41. Problem: Making the most intuitive fix

  42. Problem: AKA fixating on your local maxima

  43. Discover

  44. Discover: Benchmark with peer companies; Coffee chats with users; SLOs; Surveys

  45. “Ruby is a terrible language.”

  46. Problem: Infinite possibilities, what to pick?

  47. Prioritization

  48. Prioritization: Order by return on investment; Don’t try without users in the room; Long-term vision

  49. “The critical business outcome is me learning Elixir.”

  50. Problem: Right opportunity with wrong solution

  51. Validation

  52. Validation: Cheaply disprove approach; Try hardest cases early; Embed with owners

  53. “Monster is too unreliable and slow!”

  54. “Let’s just rewrite monster.”

  55. “Let’s just rewrite monster. Again.”

  56. “Let’s just ~~rewrite~~ harden monster.”

  57. “Can we provide a unified interface for task, cronjob and service orchestration?”

  58. Kubernetes

  59. Kubernetes Chronos Railyard Services

  60. tl;dr Listen to your users more

  61. Be valuable or go back to firefighting

  62. Introduction; 1. Fundamentals; 2. Escaping the firefight; 3. Learning to innovate; 4. Navigating breadth; 5. Unifying approach; Closing

  63. Fool me once, shame on you

  64. Fool me twice, shame on me

  65. Fool me every year on exact same date?

  66. “Convert unplanned scalability work into planned scalability work.”

  67. Schedule manual load tests

  68. Schedule automated load tests

  69. Run continuous load tests

  70. Solved out of a job

  71. Great technology fix, but what’s the organizational fix?

  72. Infrastructure properties

  73. Stripe’s infrastructure properties: Security, Reliability, Usability, Efficiency, Latency

  74. Lightly ordered but not stack ranked

  75. More a portfolio: invest in each

  76. Baselines!

  77. Invest to maintain your baselines

  78. Maintain across timeframes
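
Slides 72 to 78 describe infrastructure properties as a portfolio, each with a baseline to maintain. One way that idea might be written down is sketched below in Python. Everything concrete here is an assumption made for illustration: the `Baseline` type, the metric names, the thresholds and the measured values do not come from the talk; only the property names (Reliability, Latency, Efficiency) are taken from slide 73.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    prop: str               # property name, taken from slide 73
    metric: str             # hypothetical metric name
    threshold: float        # hypothetical acceptable limit
    higher_is_better: bool  # direction in which the metric improves


def breaches(baselines, measurements):
    """Return the properties whose measured value is on the wrong side of its baseline."""
    breached = []
    for b in baselines:
        value = measurements[b.metric]
        if b.higher_is_better:
            ok = value >= b.threshold
        else:
            ok = value <= b.threshold
        if not ok:
            breached.append(b.prop)
    return breached


# Illustrative numbers only; none of these figures appear in the slides.
BASELINES = [
    Baseline("Reliability", "availability_pct", 99.9, True),
    Baseline("Latency", "p99_latency_ms", 250.0, False),
    Baseline("Efficiency", "cost_per_1k_requests_usd", 0.05, False),
]

MEASURED = {
    "availability_pct": 99.95,
    "p99_latency_ms": 310.0,
    "cost_per_1k_requests_usd": 0.04,
}

print(breaches(BASELINES, MEASURED))
```

With these made-up numbers only the latency baseline is breached, which under the "invest to maintain your baselines" framing would be the property to fund first.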

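Slide 48 suggests ordering candidate work by return on investment. A minimal sketch of that arithmetic follows, assuming ROI is simply estimated impact divided by estimated cost. The `Project` type and all impact and cost figures are hypothetical; the project names are borrowed from other slides purely as labels.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    impact: float      # estimated benefit in arbitrary units (assumed)
    cost_weeks: float  # estimated engineer-weeks (assumed)

    @property
    def roi(self) -> float:
        # Simplest possible definition: benefit per engineer-week.
        return self.impact / self.cost_weeks


def rank_by_roi(projects):
    """Return projects sorted from highest to lowest return on investment."""
    return sorted(projects, key=lambda p: p.roi, reverse=True)


# Illustrative estimates only; the talk gives no numbers.
CANDIDATES = [
    Project("Rewrite monolith", impact=40, cost_weeks=80),
    Project("Run continuous load tests", impact=30, cost_weeks=10),
    Project("Lower AWS costs", impact=24, cost_weeks=12),
]

for project in rank_by_roi(CANDIDATES):
    print(f"{project.name}: {project.roi:.1f}")
```

The estimates are the hard part, which is presumably why the same slide warns against prioritizing without users in the room; the sort itself is trivial once the inputs are agreed.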