
Emerging trends for High Availability



  1. Emerging trends for High Availability. Asim Zuberi, Senior Consultant, Collective Technologies; Ayaz Mudarris, Senior Consultant, Collective Technologies

  2. Module 1: Concepts…

  3. What is Downtime? – If a user cannot get the job done on time, the system is down and downtime is incurred.

  4. Causes of Downtime!

  5. What is Availability? A = MTBF / (MTBF + MTTR), where: A is the degree of availability, expressed as a percentage; MTBF is the mean time between failures (uptime); MTTR is the mean time to repair (downtime).

  6. Availability Equation (A closer look): A = MTBF / (MTBF + MTTR). Case-I: As MTTR approaches zero, A increases toward 100%.

  7. Availability Equation (A closer look): A = MTBF / (MTBF + MTTR). Case-I: As MTTR approaches zero, A increases toward 100%. Case-II: As MTBF gets larger, MTTR has less impact on A.
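The two cases can be checked numerically. The following is a minimal sketch, assuming MTBF and MTTR are given in the same unit (hours here); the function name `availability` and the sample figures are illustrative and do not come from the slides.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return A = MTBF / (MTBF + MTTR) as a fraction between 0 and 1."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Case I: hold MTBF fixed and shrink MTTR; A climbs toward 100%.
for mttr in (10.0, 1.0, 0.1):
    print(f"MTBF=1000 h, MTTR={mttr} h -> A={availability(1000.0, mttr):.4%}")

# Case II: hold MTTR fixed and grow MTBF; the same repair time matters less.
for mtbf in (1_000.0, 10_000.0, 100_000.0):
    print(f"MTBF={mtbf} h, MTTR=10 h -> A={availability(mtbf, 10.0):.4%}")
```

Multiply the returned fraction by 100 to get the percentage form used on the slides.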

  8. Increasing Availability • Key is to minimize downtime • As downtime approaches zero, availability approaches 100% [Chart: availability plotted on axes running from 0 to 100]

  9. The Rule of 9’s
       % Uptime           Annual Downtime
       100                0 hours
       99.99 (4 – 9’s)    52.6 minutes
       99.98              1 hour 45 minutes
       99.90 (3 – 9’s)    8 hours 45 minutes
       99.80              17 hours 30 minutes
       99.70              26 hours 17 minutes
       99.50              43 hours 48 minutes
       99.00 (2 – 9’s)    87 hours 36 minutes
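The downtime column follows from simple arithmetic on the uptime percentage. Below is a small sketch, assuming a 365-day (8760-hour) year; the helper names `annual_downtime_minutes` and `format_downtime` are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def annual_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1.0 - uptime_percent / 100.0) * HOURS_PER_YEAR * 60.0


def format_downtime(minutes: float) -> str:
    """Render a downtime in whole hours and minutes."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} h {mins} min"


for pct in (100.0, 99.99, 99.9, 99.0):
    print(f"{pct}% uptime -> {format_downtime(annual_downtime_minutes(pct))} per year")
```

Small differences from published tables usually come from rounding or from using 365.25 days per year.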

  10. Why do you need Availability? – Issues which have caused problems or concerns with computer availability… • Terrorist attacks • Satellite Outages • Attacks by computer viruses • Emergence of internet as viable force

  11. Levels of Availability • Level 1: Regular Availability (Do Nothing Special) • Level 2: Increased Availability (Protect the Data) • Level 3: High Availability (Protect the System) • Level 4: Disaster Recovery (Protect the Organization)

  12. Twenty Key System Design Principles 20) Spend Money…but not blindly 19) Assume Nothing 18) Remove/Identify SPOFs 17) Maintain Tight Security 16) Consolidate Your Servers 15) Automate Common Tasks 14) Document Everything 13) Establish Service Level Agreements 12) Plan Ahead 11) Test Everything 10) Maintain Separate Environments 9) Invest in Failure Isolation 8) Examine the History of the System 7) Build for Growth 6) Choose Mature Software 5) Select Reliable and Serviceable Hardware 4) Reuse Configurations 3) Exploit External Resources 2) One Problem, One Solution 1) KISS: Keep It Simple Simple

  13. End-to-end Availability Measurement [Diagram: E-E-A at the centre, surrounded by Application, Network Infrastructure, System, Software, Operating System, Hardware]
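One common way to turn per-layer figures into an end-to-end number is to treat the layers as a chain in which every layer must be up. The slide does not state a formula, so this is an assumption: layers fail independently and the end-to-end availability is the product of the layer availabilities. The layer names loosely follow the slide and the values are invented.

```python
from math import prod


def end_to_end_availability(layers: dict[str, float]) -> float:
    """Product of per-layer availabilities (fractions), assuming a serial
    chain of independently failing layers."""
    for name, value in layers.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"availability of {name!r} must be in [0, 1]")
    return prod(layers.values())


# Hypothetical figures, for illustration only.
stack = {
    "application": 0.999,
    "network infrastructure": 0.9995,
    "software": 0.999,
    "operating system": 0.9998,
    "hardware": 0.9999,
}
print(f"end-to-end availability: {end_to_end_availability(stack):.4%}")
```

Under this model the end-to-end figure is never better than the weakest layer, which is why measuring only the hardware overstates what the user sees.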

  14. Modeling Availability • Complex as system comprises many components • Most common techniques – Monte Carlo principle – Markov techniques • Basically state diagrams
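Both techniques can be illustrated on the smallest possible model: one component that alternates between an "up" and a "down" state. This is a sketch under stated assumptions, not the authors' model: failure and repair times are taken to be exponentially distributed, and the function names are hypothetical. The Markov view gives a closed-form steady state, and the Monte Carlo view estimates the same quantity by simulating random failures and repairs.

```python
import random


def markov_steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state probability of the 'up' state in a two-state
    (up/down) Markov model: repair_rate / (failure_rate + repair_rate)."""
    return repair_rate / (failure_rate + repair_rate)


def monte_carlo_availability(mtbf: float, mttr: float, horizon: float, seed: int = 0) -> float:
    """Estimate availability by simulating alternating up and down periods
    with exponentially distributed lengths until `horizon` is reached."""
    rng = random.Random(seed)
    clock = 0.0
    uptime = 0.0
    while clock < horizon:
        time_to_failure = rng.expovariate(1.0 / mtbf)
        uptime += min(time_to_failure, horizon - clock)  # clip at the horizon
        clock += time_to_failure
        if clock >= horizon:
            break
        clock += rng.expovariate(1.0 / mttr)  # repair period, counted as downtime
    return uptime / horizon


analytic = markov_steady_state_availability(1.0 / 1000.0, 1.0 / 10.0)
simulated = monte_carlo_availability(mtbf=1000.0, mttr=10.0, horizon=1_000_000.0)
print(f"Markov steady state: {analytic:.5f}")
print(f"Monte Carlo estimate: {simulated:.5f}")
```

With rates 1/MTBF and 1/MTTR the Markov result reduces to MTBF / (MTBF + MTTR); simulation becomes the more practical tool once a system has many interacting components and the state diagram grows too large to solve by hand.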

  15. State Diagram

  16. What Does It Mean to Us? How do you minimize downtime? [Diagram: Replication, Backups, Clustering, Snapshot and Mirroring placed on a time scale marked 24 hrs, 12 hrs, 1 hr, 10 min, 1 min]

  17. Trinity of TTs

  18. Module 2: Storage Area Networks…

  19. Why Storage Area Networks? • Management of distant configurations. • Soft recabling. • Storage consolidation. • Heterogeneous connectivity. • Data sharing. • Massive configurations. • LAN-less and/or server-less backup.

  20. Why Fibre Channel? • Reliable Communication – Removes the performance barriers of legacy LANs. – Support for other, typically "non-network" protocols, such as SCSI. • Low-latency message passing • High bandwidth transfer – Connection and connectionless data delivery. – Sustained data transfer rates at 90 Mbps – Variable length (0-2 KB) frames. – Highly effective for protocol frames of less than 100 bytes, as well as bulk data transfer • Scalable networks.

  21. SAN Components • Host Bus Adapter (HBA) • Channel • Switch/Hub/Bridge

  22. HBA • Fibre Channel Cards – Every device on the SAN, including each HBA, has a World Wide Name (WWN) – 64 bit, assigned by IEEE – Similar to the way MAC addresses are assigned to Network Interface Cards (NICs). • Vendors – JNI for Solaris – Emulex for NT
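Because a WWN is a 64-bit value, it is conventionally written like a long MAC address: eight hexadecimal bytes separated by colons. The sketch below converts between the integer and text forms; the helper names and the sample value are illustrative and do not identify any real device.

```python
def format_wwn(value: int) -> str:
    """Render a 64-bit WWN as eight colon-separated hex bytes."""
    if not 0 <= value < 2**64:
        raise ValueError("a WWN must fit in 64 bits")
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(8, "big"))


def parse_wwn(text: str) -> int:
    """Parse the colon-separated form back into an integer."""
    parts = text.split(":")
    if len(parts) != 8:
        raise ValueError("expected eight colon-separated bytes")
    return int.from_bytes(bytes(int(part, 16) for part in parts), "big")


example = 0x10000000C9123456  # made-up value for illustration
print(format_wwn(example))
```

Normalising WWNs to one textual form like this makes it easier to compare zoning or inventory lists that come from different tools.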

  23. Channel • Medium – Copper 30m – Fibre optics • Multimode 500m • Single mode 10km • Buffer to buffer copy • Transmission isolated from control • FC-0 through FC-4

  24. Topologies • Point-to-point – Two Nodes

  25. Topologies • Arbitrated loop – 126 nodes – In practice, even fewer [Diagram label: Hub]

  26. Topologies • Fabric – 16 million nodes [Diagram labels: Enterprise Switch, Switch, Switch, Hub, Bridge, Array, Array, JBOD]

  27. Switches/Hubs/Bridges • Workgroup switches – 8 or 16 port – Redundant power supplies – Hot-swappable GBICs • Enterprise switches – 64 port – Everything redundant, everything hot swappable • Hubs – Connect FC-AL to FC-SW • FC/SCSI bridges – Reuse old JBOD or SCSI tape drives

  28. NAS vs SAN • NAS devices are storage appliances: big, single-purpose servers that you plug into your network. • These appliances perform one task, and they perform it well: they serve files very fast. • The difference between how a NAS appliance and a SAN function is subtle. • NAS is a defined product that sits between your application server and your file system. • SAN is a defined architecture that sits between your file system and your underlying physical storage. • NAS is network-centric. • A SAN is data-centric.

  29. The Final Conflict • NAS appliances offer – performance and reliability at a low cost. – excellent devices for collaboration and data storage, especially in heterogeneous computing environments. – Yet, NAS appliances can send only files, not data blocks, which limits their ability. • SAN promises to free your network of bottlenecks. – traffic relief comes at a high price.

  30. Third level of High Availability • 85% of storage on Unix servers is unprotected! • RAID, replication and snapshots can protect you when disaster strikes. • New emerging concepts – Business Continuance Volumes (BCV) – Shared Storage Option (SSO)/Smart Media – SAN over WAN – iSCSI

  31. Business Continuance Volumes [Diagram: uses of a BCV] • Backup/Restore: high speed, tapeless, offsite • Test Environment: software lifecycles, Y2K/Euro currency • Decision Support: reporting, data warehouse

  32. BCV • sync-split-mount sequence – Directly from disk to internal cache and then BCV – Works at volume group level • Block-by-block copy • Only changed tracks copied at next sync. • Instantaneous fallback.
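The "only changed tracks copied at next sync" behaviour can be pictured with a toy model. This is a heavily simplified sketch, not a description of any vendor's implementation: the class `BcvPair`, its methods, and the idea of tracking changes in a Python set are all assumptions made for illustration.

```python
class BcvPair:
    """Toy model of a production volume paired with a BCV copy."""

    def __init__(self, tracks):
        self.standard = list(tracks)                    # production volume
        self.bcv = [None] * len(self.standard)          # the BCV copy
        self.changed = set(range(len(self.standard)))   # all tracks stale at first
        self.established = False                        # True while the pair is synced

    def write(self, track, data):
        """Write to production; mirror it if synced, else remember the track."""
        self.standard[track] = data
        if self.established:
            self.bcv[track] = data
        else:
            self.changed.add(track)

    def sync(self):
        """Copy only the tracks that changed since the last split."""
        copied = len(self.changed)
        for track in self.changed:
            self.bcv[track] = self.standard[track]
        self.changed.clear()
        self.established = True
        return copied

    def split(self):
        """Freeze the BCV so it can be mounted elsewhere (backup, testing)."""
        self.established = False


pair = BcvPair(["a", "b", "c"])
print("first sync copied", pair.sync(), "tracks")   # full copy
pair.split()
pair.write(1, "B")                                  # BCV keeps the old data
print("next sync copied", pair.sync(), "track(s)")  # only the changed track
```

The point of the model is the cost profile: the first sync is a full block-by-block copy, while later syncs are proportional to the amount of change since the split.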

  33. Sharing Tape Libraries • Tape Drives are shared – Heterogeneous connectivity. – Reduces cost – Increases availability [Diagram labels: SUN, NT, Tru64, Enterprise Switch, Switch, Bridge]

  34. Module 3: High Availability trends for SAN…

  35. Switch/Switch components [Diagram labels: MIRRORING, CLUSTER, FC Switch, SPOF]

  36. Switch/Switch components [Diagram labels: FC Switch, MIRRORING, CLUSTER, FC Switch, SPOF]

  37. [Diagram labels: MIRRORING, CLUSTER, Enterprise FC Switch, SPOF]

  38. [Diagram labels: Drives, Drives, 50%, Enterprise FC Switch, Tape Drives, SPOF]

  39. Module 4: Design Issues for Clustering…

  40. Design Issues • Objectives • Understand design issues of high-availability • Understand trade-offs of design issues

  41. Design Suggestions • Keep it simple – Complexity hurts long term maintenance and manageability • Know all single points of failure (SPOFs) • Avoid failover if possible

  42. Know and Document ALL Single Points of Failure • Look for all SPOFs in both hardware and software • Look for SPOFs both on hosts and on cluster as a whole • Could the failure of any single component prevent a client from accessing a vital service?
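The question on this slide can be made mechanical once the access paths are written down. A minimal sketch, assuming each path from a client to the service is recorded as the set of components it depends on: any component present in every path is a single point of failure. The function name and the component names are hypothetical.

```python
def single_points_of_failure(paths):
    """Return the components that appear in every client-to-service path.

    `paths` is a list of sets; each set names the components one
    alternative path depends on.
    """
    if not paths:
        return set()
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)
    return common


# Two redundant network paths that still share one power feed.
paths = [
    {"nic-a", "hub-1", "power-feed"},
    {"nic-b", "hub-2", "power-feed"},
]
print(single_points_of_failure(paths))
```

In this example the NICs and hubs are duplicated, yet the shared power feed would still take the service down, which is exactly the kind of hidden SPOF that documenting every component is meant to expose.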

  43. A Typical Layout [Diagram labels: Ethernet heartbeat, Hub, Hub, links, NICs, OS Disks, SCSI1, SCSI2, Service Network, FC0, FC1, Bridge] Components by area: NETWORK: HBAs, routers, switches, hubs, power source. DISKS: HBAs, drives, power source. HOSTS: critical file systems (e.g. / and /usr), power source. SAN: HBAs, routers, switches, hubs, power source. TAPE: HBAs, drives, robots, power source.

  44. SPOF: Hosts • critical file systems, e.g. / and /var • power source [Diagram labels: Ethernet heartbeat, Hub, link, NICs, SCSI1, Service Network]

  45. SPOF: Disks • controllers • drives • power source [Diagram labels: Ethernet heartbeat, Hub, link, NICs, OS Disks, SCSI1, SCSI2, Service Network]

  46. SPOF: Heartbeat (network) • NICs • switches • hubs • power source [Diagram labels: Ethernet heartbeat, Hub, Hub, links, NICs, SCSI1, SCSI2, Service Network]

  47. SPOF: Storage/SAN • HBAs • cabling • switches • hubs • power source [Diagram labels: Ethernet heartbeat, Hub, Hub, links, NICs, SCSI1, SCSI2, Service Network, FC0, FC1]

  48. SPOF: Tape • HBAs • drives • robots • power source [Diagram labels: Ethernet heartbeat, Hub, Hub, links, NICs, SCSI1, SCSI2, Service Network, FC0, FC1, Bridge]

  49. SPOF: Network Switching [Diagram labels: System A, System B, Switch, Clients]
