
Architecting for Failure in a Containerized World, Tom Faulhaber



  1. Architecting for Failure in a Containerized World. Tom Faulhaber, Infolace

  2. How can container tech help us build robust systems?

  3. Key takeaway: an architectural toolkit for building robust systems with containers

  4. The Rules: Decomposition; Orchestration and Synchronization; Managing Stateful Apps

  5. Simplicity

  6. Simple means: “Do one thing!”

  7. The opposite of simple is complex

  8. Complexity exists within components

  9. Complexity exists between components

  10. Example: a counter [Diagram: a Counter Service returning the values 0, 1, 2, 3, 4, 5, …]

  11. Example: a counter [Diagram: a Load Balancer in front of Counter Service instances returning the values 0, 1, 2, 3, 4, 5, …]

  12. State + composition = complexity
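One possible reading of the counter example, sketched in Python. The class names `CounterService` and `RoundRobinBalancer` are hypothetical and the round-robin policy is an assumption; the slides do not say how the balancer distributes requests. The sketch only illustrates how a stateful component that is simple on its own can become complex once it is composed with another component.

```python
from itertools import cycle


class CounterService:
    """A stateful service: each call returns the next integer."""

    def __init__(self) -> None:
        self._next = 0

    def increment(self) -> int:
        value = self._next
        self._next += 1
        return value


class RoundRobinBalancer:
    """Hypothetical balancer that alternates between replicas (assumed policy)."""

    def __init__(self, replicas) -> None:
        self._replicas = cycle(replicas)

    def increment(self) -> int:
        return next(self._replicas).increment()


# A single instance hands out each value exactly once.
single = CounterService()
single_values = [single.increment() for _ in range(6)]

# Two independent instances behind a balancer each keep their own state,
# so callers see repeated values.
balanced = RoundRobinBalancer([CounterService(), CounterService()])
balanced_values = [balanced.increment() for _ in range(6)]

print(single_values)    # [0, 1, 2, 3, 4, 5]
print(balanced_values)  # [0, 0, 1, 1, 2, 2]
```

Neither component is wrong in isolation; the duplicates come from combining per-instance state with composition.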

  13. Part 1: Decomposition

  14. Rule: Decompose vertically

  15. [Diagram: App Server; Service #1, Service #2, Service #3]

  16. App Server

  17. Rule: Separation of concerns

  18. Example: Logging [Diagram: App (Core Code, Logging Driver, Config); Logging Server]

  19. Example: Logging [Diagram: App (Core Code, StdOut); Logger (Logging Driver, Config); Logging Server]
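A minimal sketch of the application side of the logging example, assuming the application writes log lines to standard output and leaves shipping them to a separate logger component. The function name `build_logger` and the log format are illustrative choices, not taken from the slides.

```python
import logging
import sys


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes only to standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    logger.propagate = False  # keep output on the one stdout handler
    return logger


log = build_logger()
log.info("order accepted")
```

The core code carries no logging driver and no logging server configuration, so the destination of the logs can change without rebuilding the application.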

  20. Aspect-oriented programming

  21. Rule: Constrain state

  22. Session Store; Relational DB

  23. Rule: Battle-tested tools

  24. Redis; MySQL

  25. Rule: High code churn → Easy restart

  26. Rule: No start-up order!

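  27. [Diagram: components a, b, c, d on a time axis]

  28. [Diagram: components a, b, c, d on a time axis, with one x mark before d]

  29. [Diagram: components a, b, c, d on a time axis, each preceded by an x mark]

  30. [Diagram: components a, b, c, d on a time axis, each preceded by an x mark]

  31. [Diagram: components a, b, c, d on a time axis]

  32. [Diagram: components a, b, c, d on a time axis]

  33. [Diagram: components a, b, c, d on a time axis]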
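One common way to avoid depending on start-up order is for each component to retry its dependencies until they become available. The sketch below assumes exponential backoff; `connect_with_retry` and `FlakyDependency` are hypothetical names introduced for illustration, and the slides do not prescribe this particular technique.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and let the caller (or orchestrator) react
            sleep(delay)
            delay *= 2
    raise ValueError("attempts must be at least 1")


class FlakyDependency:
    """Hypothetical dependency that is not ready for the first two calls."""

    def __init__(self, failures_before_ready: int = 2) -> None:
        self.calls = 0
        self._failures = failures_before_ready

    def connect(self) -> str:
        self.calls += 1
        if self.calls <= self._failures:
            raise ConnectionError("dependency not ready yet")
        return "connected"


dependency = FlakyDependency()
waits = []  # record the delays instead of sleeping, to keep the demo fast
result = connect_with_retry(dependency.connect, sleep=waits.append)
print(result, dependency.calls, waits)  # connected 3 [0.1, 0.2]
```

With this pattern a component can start before, after, or at the same time as the services it depends on, and it recovers the same way when a dependency restarts later.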

  34. Rule: Consider higher-order failure

  35. The Rules
      • Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
      • Orchestration and Synchronization
      • Managing Stateful Apps

  36. Part 2: Orchestration and Synchronization

  37. Rule: Use Framework Restarts

  38. Framework restarts by platform:
      • Mesos: Marathon always restarts
      • Kubernetes: RestartPolicy=Always
      • Docker: Swarm always restarts

  39. Rule: Create your own framework

  40. [Diagram: Mesos Master with a Framework Driver; three Mesos Agents, each running a Framework Executor]

  41. Rule: Use Synchronized State

  42. Synchronized State
      • Tools: zookeeper; etcd; consul
      • Patterns: leader election; shared counters; peer awareness; work partitioning
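To make the leader election pattern concrete, here is a small sketch built on a time-limited lease. `InMemoryLeaseStore` is a hypothetical in-process stand-in for a synchronized store and is not the API of zookeeper, etcd, or consul; a real deployment would rely on the atomic operations those tools provide. Time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class InMemoryLeaseStore:
    """Hypothetical stand-in for a synchronized store; not a real client API."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, candidate: str, now: float, ttl: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == candidate:
            self._lease = Lease(candidate, now + ttl)
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current leader, or None if the lease has lapsed."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


store = InMemoryLeaseStore()
a_wins = store.try_acquire("node-a", now=0.0, ttl=10.0)         # lease is free
b_blocked = store.try_acquire("node-b", now=5.0, ttl=10.0)      # node-a still holds it
b_takes_over = store.try_acquire("node-b", now=12.0, ttl=10.0)  # node-a's lease expired
print(a_wins, b_blocked, b_takes_over, store.leader(now=13.0))
```

If the leader stops renewing its lease, for example because its container died, another node takes over once the lease expires, with no manual intervention.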

  43. Rule: Minimize Synchronized State

  44. Even battle-tested state management is a headache. (Source: http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/)

  45. The Rules
      • Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
      • Orchestration and Synchronization: Use framework restarts; Create your own framework; Use synchronized state; Minimize synchronized state
      • Managing Stateful Apps

  46. Part 3: Managing Stateful Apps

  47. Rule (repeat!): Always use battle-tested tools! (State is the weak point)

  48. Rule: Choose the DB architecture

  49. Option 1: External DB [Diagram: Execution cluster; Database cluster]

  50. Option 1: External DB
      • Pros: Somebody else’s problem!; Can use a DB designed for clustering directly; Can use DB as a service
      • Cons: Not really somebody else’s problem!; Higher latency/no reference locality; Can’t leverage orchestration, etc.

  51. Option 2: Run on Raw HW [Diagram: three nodes, each with App, Marathon, Mesos, and HDFS]

  52. Option 2: Run on Raw HW
      • Pros: Use existing recipes; Have local data; Manage a single cluster
      • Cons: Orchestration doesn’t help with failure; Increased management complexity

  53. Option 3: In-memory DB [Diagram: three nodes, each with App, MemSQL, Marathon, and Mesos]

  54. Option 3: In-memory DB
      • Pros: No need for volume tracking; Fast; Have local data; Manage a single cluster
      • Cons: Bets all machines won’t go down; Bets on orchestration framework

  55. Option 4: Use Orchestration [Diagram: three nodes, each with Mesos, App, Marathon, and Cassandra]

  56. Option 4: Use Orchestration
      • Pros: Orchestration manages volumes; One model for all programs; Have local data; Single cluster
      • Cons: Currently the least mature; Not well supported by vendors

  57. Option 5: Roll Your Own [Diagram: Mesos Master and Framework; three nodes, each with Mesos, App, Marathon, and ImageMgr]

  58. Option 5: Roll Your Own
      • Pros: Very precise control; You decide whether to use containers; Have local data; Can be system aware
      • Cons: You’re on your own!; Wedded to a single orchestration platform; Not battle tested

  59. Rule: Have replication

  60. The Rules
      • Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
      • Orchestration and Synchronization: Use framework restarts; Create your own framework; Use synchronized state; Minimize synchronized state
      • Managing Stateful Apps: Battle-tested tools; Choose the DB architecture; Have replication

  61. Fin

  62. References
      • Rich Hickey, “Are We There Yet?” (https://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey)
      • Rich Hickey, “Simple Made Easy” (https://www.infoq.com/presentations/Simple-Made-Easy-QCon-London-2012)
      • David Greenberg, Building Applications on Mesos, O’Reilly, 2016
      • Joe Johnston, et al., Docker in Production: Lessons from the Trenches, Bleeding Edge Press, 2015

  63. The Rules
      • Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
      • Orchestration and Synchronization: Use framework restarts; Create your own framework; Use synchronized state; Minimize synchronized state
      • Managing Stateful Apps: Battle-tested tools; Choose the DB architecture; Have replication
