building high available systems

Building High-available systems Chander Damodaran - Collabera - PowerPoint PPT Presentation



  1. Building High-available systems Chander Damodaran - Collabera

  2. Agenda • Introduction • Key Concepts • Approach • Availability Index • Key HA Design Principles

  3. Sample Business Scenarios
     Scenarios: • IRCTC • RPO • Mid-size Company • GitHub
     Root causes: • Single point of failure • System used beyond design limits • Software error • Human error • Wrong design assumptions

  4. Key Concepts
     • Availability: Availability is the measure of how often or how long a service or a system component is available for use.
       Availability = Uptime / (Uptime + Downtime)
     • Reliability: Reliability is the measure of fault avoidance.
     • Serviceability: Serviceability is a measurement that expresses how easily a system is serviced or repaired.
     • Disaster Recovery: Disaster recovery is the ability to continue with services in the case of major outages, often with reduced capabilities or performance.
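The availability ratio on this slide can be worked through numerically. The following is a minimal sketch, not part of the original deck: the function names, the 365-day year, and the sample uptime and downtime figures are all assumptions chosen for illustration.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = Uptime / (Uptime + Downtime), as a fraction between 0 and 1."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def yearly_downtime_hours(avail: float, hours_per_year: float = 8760.0) -> float:
    """Downtime per year implied by an availability fraction (365-day year assumed)."""
    return (1.0 - avail) * hours_per_year


if __name__ == "__main__":
    # Hypothetical figures: 8751.24 h up and 8.76 h down over one year.
    a = availability(8751.24, 8.76)
    print(f"availability = {a:.4%}")
    print(f"downtime per year at 99.9% = {yearly_downtime_hours(0.999):.2f} h")
```

Reading the ratio in both directions is useful in practice: a target such as "three nines" only becomes concrete once it is translated into hours of tolerated downtime.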

  5. Approach
     • List vulnerabilities
     • Evaluate scenarios, and determine their probability
     • Map scenarios to requirements
     • Design solution
     • Review the solution, and check its behaviour against failure scenarios

     VULNERABILITY     | LIKELIHOOD (1-5) | IMPACT (1-5) | LEVEL OF CONCERN | SOLUTION
     Failed disk       | 5                | 1            | 5                | Implement mirrored disks
     Application crash | 5                | 4            | 20               | Distributed application, failover, clustering
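The vulnerability table on this slide can be treated as a small scoring exercise. The sketch below assumes that the level of concern is the product of likelihood and impact; the slide does not state the rule, but both rows (5 x 1 = 5 and 5 x 4 = 20) are consistent with it. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vulnerability:
    name: str
    likelihood: int  # scale 1-5, as on the slide
    impact: int      # scale 1-5, as on the slide
    solution: str

    @property
    def level_of_concern(self) -> int:
        # Assumption: concern = likelihood * impact (matches both slide rows).
        return self.likelihood * self.impact


def rank(vulns):
    """Return vulnerabilities ordered from highest to lowest level of concern."""
    return sorted(vulns, key=lambda v: v.level_of_concern, reverse=True)


# The two rows given on the slide.
VULNS = [
    Vulnerability("Failed disk", 5, 1, "Implement mirrored disks"),
    Vulnerability("Application crash", 5, 4,
                  "Distributed application, failover, clustering"),
]

if __name__ == "__main__":
    for v in rank(VULNS):
        print(f"{v.level_of_concern:>3}  {v.name}: {v.solution}")
```

Ranking by the score gives a rough order in which to spend design effort; here the application crash outranks the failed disk even though both are equally likely.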

  6. Availability Index
     (Figure: availability plotted against investment; levels listed from highest to lowest)
     • Disaster Recovery
     • Replication
     • Failovers
     • Services and Applications
     • Client Management
     • Local Environment
     • Networking
     • Disk and Volume Management
     • Reliable Backups
     • Good System Administration Practices
     Source: Blueprints for High Availability

  7. Components, failures & protection mechanism

     Component category         | Typical failure              | Fault protection
     User environment           | Data deletion or corruption  | Disaster-recovery processes
     Administration environment | Data deletion or corruption  | Disaster-recovery processes
     Application                | Crashes, data corruption     | Distributed application, failover, clustering
     Middleware                 | Crashes, memory leaks        | Clustering
     (Network) infrastructure   | Connection loss              | Independent high-availability architecture
     Operating system           | Crash, device driver errors  | Clustering
     Hardware                   | Device defect                | Redundant components, hot-spare disks, maintenance contracts
     Physical environment       | Power outage, fire, floods   | UPS, backup data center

  8. Key High Availability Design Principles • Assume Nothing • Remove Single Points of Failure (SPOFs) • Plan Ahead & Design for Growth • One Problem, One Solution • Choose Mature, Reliable Hardware • Choose Mature Software • Learn from History • Separate Your Environments • Test Everything • Employ Service Level Agreements • Document Everything • Enforce Change Control • Watch Your Speed • Consolidate Your Servers • Enforce Security • Don’t Be Cheap
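The principle "Remove Single Points of Failure" can be illustrated with the standard series and parallel availability formulas. This is a sketch under stated assumptions rather than anything taken from the deck: component failures are assumed to be independent, the 99% figures are hypothetical, and the function names are invented for the example.

```python
from math import prod


def serial(avails):
    """Every component is required, so each one is a single point of failure."""
    return prod(avails)


def redundant(avail: float, copies: int) -> float:
    """The tier is up if at least one of `copies` independent replicas is up."""
    return 1.0 - (1.0 - avail) ** copies


if __name__ == "__main__":
    # Hypothetical two-tier system: a web server and a database, each 99% available.
    single = serial([0.99, 0.99])
    # The same system with every tier duplicated.
    doubled = serial([redundant(0.99, 2), redundant(0.99, 2)])
    print(f"no redundancy:  {single:.4%}")
    print(f"two per tier:   {doubled:.4%}")
```

Chaining required components multiplies their availabilities and so lowers the total, while duplicating a component multiplies its unavailability and so raises it. The independence assumption rarely holds fully in practice (shared power, shared network, shared software bugs), which is one reason the deck also lists "Assume Nothing" and "Test Everything".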

  9. QUESTIONS? ChanderD@Collabera.com
