
Architectures that Scale Deep: Regaining Control in Deep Systems

Architectures that Scale Deep: Regaining Control in Deep Systems. Ben Sigelman (@el_bhs, bhs@lightstep.com). Co-founder & CEO: LightStep. Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch. QCon SF, November 2019.


  1. Architectures that Scale Deep: Regaining Control in Deep Systems. Ben Sigelman (@el_bhs, bhs@lightstep.com). Co-founder & CEO: LightStep. Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch. QCon SF, November 2019.

  2. Part I. Scaling, and Deep Systems

  3. What is scale, anyway?

  4. Scaling wide

  5. Scaling wide

  6. Scaling wide

  7. Scaling wide

  8. Scaling wide

  9. Scaling deep

  10. Scaling deep

  11. Scaling deep

  12. Scaling deep

  13. Scaling deep

  14. How does this look for software?

  15. Software: Scaling wide

  16. Software: Scaling deep

  17. How do real-world systems look?

  18. Microservices at scale aren’t just wide systems, they’re deep systems

  19. Deep Systems: Architectures with ≥ 4 layers of independently operated services (including external/cloud dependencies)
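The definition above can be checked mechanically once the call graph is known. The following is a minimal sketch, assuming the dependencies are available as a simple adjacency map; the service names, the `system_depth` and `is_deep_system` helpers, and the example graph are all hypothetical and not taken from the talk. Depth is counted as the number of services on the longest call path, including external/cloud dependencies.

```python
def system_depth(calls: dict[str, list[str]], entry: str) -> int:
    """Count the services on the longest call path starting at `entry`."""
    memo: dict[str, int] = {}
    visiting: set[str] = set()

    def depth(service: str) -> int:
        if service in memo:
            return memo[service]
        if service in visiting:
            # A cycle has no well-defined depth; fail loudly instead of looping.
            raise ValueError(f"dependency cycle at {service!r}")
        visiting.add(service)
        deepest_child = max((depth(dep) for dep in calls.get(service, [])), default=0)
        visiting.discard(service)
        memo[service] = 1 + deepest_child
        return memo[service]

    return depth(entry)


def is_deep_system(calls: dict[str, list[str]], entry: str, threshold: int = 4) -> bool:
    """Apply the slide's threshold of at least 4 layers (an adjustable parameter here)."""
    return system_depth(calls, entry) >= threshold


# Hypothetical call graph: each key calls the services in its list.
example_calls = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "payments": ["cloud-kms"],  # external/cloud dependency counts as a layer
    "search": ["index"],
}

print(system_depth(example_calls, "web"))
print(is_deep_system(example_calls, "web"))
```

Whether each node is "independently operated" is an organizational fact that the graph alone cannot tell you, so in practice the map would have to be built at team or service-owner granularity.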

  20. What do deep systems sound like?

  21. What do deep systems sound like? “Don’t deploy on Fridays”

  22. What do deep systems sound like? “Where’s Chris?! I’m dealing with a P0 and they’re the only one who knows how to debug this.”

  23. What do deep systems sound like? “It can’t be our fault, our dashboard says we’re healthy”

  24. What do deep systems sound like? “Kafka is on fire”

  25. What do deep systems sound like? “I need 100% availability from your team. One hundred percent.”

  26. What do deep systems sound like? “I didn’t know I depended on that region”

  27. What do deep systems sound like? “That was on a dashboard but I can’t find it”

  28. What do deep systems sound like? Lots of challenges: people management, security, multi-tenancy, “big-customer” success, performance, observability

  29. Part II. Control Theory: TL;DR Edition

  30. Why do we care so much about observability, anyway?

  31. [Diagram: Inputs → A System → Outputs] … and its state vector,

  32. Observability: How well can you infer internal state using only the outputs? [Diagram: Inputs → A System → Outputs] … and its state vector,

  33. Controllability: How well can you control internal state using only the inputs? [Diagram: Inputs → A System → Outputs] … and its state vector,

  34. Controllability is the dual of Observability

  35. Controllability is the dual of Observability
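The slides state the duality without formulas. As a hedged illustration, here is the standard textbook formulation for a linear time-invariant system; the choice of an LTI model and the symbols $A$, $B$, $C$, $D$, $x$, $u$, $y$, $n$ are assumptions for this sketch, not notation from the talk.

```latex
% Assumed model: state x in R^n, input u, output y.
\begin{align}
  \dot{x}(t) &= A\,x(t) + B\,u(t), & y(t) &= C\,x(t) + D\,u(t) \\[4pt]
  % Controllability: the inputs can steer the state anywhere.
  \mathcal{C}(A,B) &= \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix},
    & \operatorname{rank}\,\mathcal{C}(A,B) &= n \\[4pt]
  % Observability: the outputs determine the state.
  \mathcal{O}(A,C) &= \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix},
    & \operatorname{rank}\,\mathcal{O}(A,C) &= n \\[4pt]
  % Duality: transposing one matrix yields the other.
  \mathcal{C}(A,B)^{\top} &= \mathcal{O}(A^{\top}, B^{\top})
\end{align}
```

Because a matrix and its transpose have the same rank, the pair $(A, B)$ is controllable exactly when the pair $(A^{\top}, B^{\top})$ is observable, which is the sense in which one property is the dual of the other.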

  36. Part III. What Deep Systems Mean for Observability

  37. Architectural evolution [Chart labels: "developers per service", "# of services", "Pure Monoliths", "Deep Systems"]

  38. Stress (n): responsibility without control [Diagram labels: "Stress", "what you can control", "what you are responsible for"]

  39. Observability: Shrink This Gap

  40. Mental models [Diagram: A System]

  41. Managing Deep Systems: Services must have SLOs (“Service Level Objectives”: latency, errors, etc.). For effective service management, only three things matter: (0) releasing service functionality, (1) gradually improving SLOs, (2) rapidly restoring SLOs. In a deep system, we must control the entire “triangle” to maintain our SLOs.
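To make "latency, errors, etc." concrete, here is a minimal sketch of an SLO check over a window of requests. The `SLO` fields, the nearest-rank percentile method, the thresholds, and the sample data are all assumptions chosen for illustration; the talk does not prescribe how an SLO is encoded or evaluated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO: a latency percentile bound plus an error-rate bound."""
    latency_percentile: float    # e.g. 99.0 for p99
    latency_threshold_ms: float  # the percentile must not exceed this
    max_error_rate: float        # fraction of failed requests allowed


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(slo: SLO, requests: list[tuple[float, bool]]) -> bool:
    """`requests` holds (latency_ms, succeeded) pairs for one evaluation window."""
    if not requests:
        raise ValueError("cannot evaluate an SLO over an empty window")
    latencies = [latency for latency, _ in requests]
    failures = sum(1 for _, ok in requests if not ok)
    error_rate = failures / len(requests)
    latency_ok = percentile(latencies, slo.latency_percentile) <= slo.latency_threshold_ms
    return latency_ok and error_rate <= slo.max_error_rate


# Hypothetical window: ten requests, one slow outlier, no failures.
window = [(12, True), (15, True), (11, True), (240, True), (14, True),
          (13, True), (16, True), (12, True), (15, True), (14, True)]

print(meets_slo(SLO(90.0, 100.0, 0.01), window))  # p90 ignores the single outlier
print(meets_slo(SLO(99.0, 100.0, 0.01), window))  # p99 is dominated by it
```

The same window passes or fails depending on which percentile the objective names, which is one reason the objective itself has to be stated explicitly before it can be improved or restored.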

  42. There’s that word again… Controllability == Observability

  43. Observability: “The Conventional Wisdom”. Observing microservices is hard. Google and Facebook solved this (right???). They used Metrics, Logging, and Distributed Tracing… So we should, too.

  44. 3 Pillars, 3 Experiences: Metrics, Logs, Traces

  45. Three Pillars? Two giant pipes… Without Traces: Cognitive Load ≈ O(depth²) [Diagram labels: Metrics, Logs]

  46. Three Pillars? Two giant pipes… [Diagram labels: Metrics, Logs]

  47. Two giant pipes… Without Traces: Cognitive Load ≈ O(depth²) [Diagram labels: Metrics, Logs]

  48. Traces

  49. Traces provide Context

  50. Traces provide Context And context rules out invalid hypotheses

  51. Two giant pipes and a filter [Diagram labels: Metrics, Context (from traces), Logs]

  52. Context reduces cognitive load. With Traces: Cognitive Load ≈ O(depth) [Diagram labels: Relevant Metrics, Context (from traces), Relevant Logs]
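The "filter" idea can be sketched as follows: a trace identifies which services a request actually touched, and that context is used to discard log lines that cannot be relevant. This is an illustrative sketch only; the `Span` and `LogLine` shapes, the field names, and the sample data are hypothetical and do not reflect any particular tracing product's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span: one service's share of one traced request."""
    trace_id: str
    service: str
    duration_ms: float


@dataclass(frozen=True)
class LogLine:
    """Hypothetical log line, assumed to carry the trace ID it was emitted under."""
    trace_id: str
    service: str
    message: str


def services_on_trace(trace_id: str, spans: list[Span]) -> set[str]:
    """The trace's context: which services this request actually passed through."""
    return {span.service for span in spans if span.trace_id == trace_id}


def relevant_logs(trace_id: str, spans: list[Span], logs: list[LogLine]) -> list[LogLine]:
    """Keep only log lines from this trace and from services on its path."""
    on_path = services_on_trace(trace_id, spans)
    return [line for line in logs if line.trace_id == trace_id and line.service in on_path]


spans = [
    Span("t1", "web", 950.0),
    Span("t1", "checkout", 900.0),
    Span("t1", "payments", 870.0),
    Span("t2", "web", 40.0),
    Span("t2", "search", 25.0),
]
logs = [
    LogLine("t1", "payments", "retrying upstream call"),
    LogLine("t2", "search", "cache miss"),
    LogLine("t1", "checkout", "waiting on payments"),
    LogLine("t3", "inventory", "restock job finished"),
]

for line in relevant_logs("t1", spans, logs):
    print(line.service, line.message)
```

For the slow request `t1`, only the `payments` and `checkout` lines survive; the unrelated `search` and `inventory` lines are ruled out without anyone having to read them.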

  53. Observability: Shrink This Gap

  54. Let’s Review

  55. Microservices don’t just scale wide, they scale deep. Recognize deep systems.

  56. Stress (n): responsibility without control [Diagram labels: "Stress", "what you can control", "what you are responsible for"]

  57. “Controllability” (of SLOs) depends on observability

  58. “The Three Pillars of Observability” is a lousy metaphor … and traces are not sprinkles

  59. Tracing can reduce cognitive load from O(depth²) to O(depth)

  60. Tracing is the backbone of simple observability in deep systems

  61. Thank You. Play with LightStep, for free, anytime (no email address required!): lightstep.com/play. Feedback always welcome: twitter → @el_bhs, the emails → bhs@lightstep.com
