The Océano Project
Intelligent eUtilities Infrastructure
A Self-Managed Server Farm
Germán Goldszmidt (gsg@us.ibm.com) and The Océano Team
IBM Research
September 2001
IBM Confidential, 09/24/01 11:45 AM

Océano - Presentation Outline
  Motivation
  Sample Scenario
  Architecture
  Components
  Status

Multi-Customer Farms: Today
  Independent Islands
  Problems
    Non-shared, dedicated hardware for each customer
    Over-provisioning (peak loads 10:1)
    Lack of rapid response to demand
    High TCO
    Administration
  [Figure: today's farm as independent islands, one each for WXY, XYZ, SportsWeb, and Macys]

Océano Farms: Future
  Océano Characteristics
    Provisioning Platform
    Shared Infrastructure
    Isolation for each realm
    Peaks covered (autonomic)
    Rapid allocation of resources
    Virtualize the hardware
    Unified management
    Automation to reduce administration cost
  [Figure: future farm in which WXY, XYZ, Macys, and SportsWeb share one infrastructure]

Océano Objectives
  Efficient infrastructure for eUtilities
  Multi-customer hosting on a virtualized collection of resources
  Drive down people management costs via automation
  Handle spiky workloads; provide capacity on demand
  Automated, fast add/remove of [clean, secure] servers, bandwidth, storage
  Create Infrastructure SLA (ISLA) contracts
    Support a dynamic resource allocation model
    ISLA monitoring and enforcement
  Scalable and highly available
  Technology applies to several environments: NetGen SPs, Enterprises, ...

Flow of Requests into Server Farm
  [Figure: Internet requests enter through the Router and the Network Dispatcher (SND) and reach web servers (WS). Servers are marked either Macys or Free. Other labels in the figure: Océano Resource Control, Admin, WES, Dolphins, Whales.]

ISLA Monitoring
  [Figure: Internet requests enter through the Router and SND and reach web servers (WS) marked Macys or Free. Monitors (M) report performance metrics to the SLA Monitor (Yemanja), which escalates events to Neptune. Other labels in the figure: Océano Resource Control, Admin.]

ISLA-based resource reallocation
  Analysis: Resource Allocation (Neptune)
  Decision: Macy's +2
  Selection: Server Manager (e-Clams)
  Prime Di, Dj for Macy's
  Preparation: Priming (Khnum)
  [Figure: two servers from the Free pool are marked Macys]

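The reallocation flow on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `Farm` class, the function names, and the server names D1..D4 are hypothetical, and priming is reduced to recording ownership. It is not the Océano implementation.

```python
from dataclasses import dataclass


@dataclass
class Farm:
    # Hypothetical model: server name -> owning customer, or None if free.
    owner: dict

    def free_servers(self):
        return sorted(s for s, c in self.owner.items() if c is None)


def select_servers(farm, count):
    """Selection step (the e-Clams role on the slide): pick free servers."""
    free = farm.free_servers()
    if len(free) < count:
        raise RuntimeError("not enough free servers")
    return free[:count]


def prime(farm, server, customer):
    """Priming step (the Khnum role); here it only records the new owner."""
    farm.owner[server] = customer


def reallocate(farm, customer, delta):
    """Carry out a decision such as "Macys +2": select, then prime."""
    chosen = select_servers(farm, delta)
    for server in chosen:
        prime(farm, server, customer)
    return chosen


farm = Farm({"D1": "Macys", "D2": None, "D3": None, "D4": None})
added = reallocate(farm, "Macys", 2)
print(added, farm.free_servers())
```

A real priming step would install the customer's application and data before the server takes traffic; that part is left out here.
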
After the addition of 2 servers
  [Figure: same farm as in "Flow of Requests into Server Farm", with two more servers marked Macys and fewer marked Free. Requests pass through the Router, the Network Dispatcher, and SND + HarborMaster. Other labels in the figure: Océano Resource Control, Admin, Dolphins, Whales.]

Océano Components
  Policy
    Resource Allocation (Fortuna)
    Resource Coordination (Neptune)
    ISLA Contracts (Salmon)
    Event Correlation (Yemanja)
    ISLA Monitor (Yemanja)
  Resource Control
    GUI (Pismo)
    Server Management (e-Clams)
    Priming/FS Configuration (Khnum)
    Traffic Throttling (HarborMaster)
  Infrastructure Monitors
    Server monitor, App. monitor, HA/HB/Topology/VLANs (Nautilus, GulfStream, Kelp)
  Existing Components
    DB/2, WES, Network Dispatcher, AFS
  Hardware
    Netfinity/Linux, RS6K/AIX, LAN Switch

Policy Layer Components
  Salmon
    Contract definition, pricing, billing
  Yemanja
    ISLA monitor
    Problem determination (event correlation)
  Neptune
    Resource coordination
  Fortuna
    Intelligent proactive allocation

Salmon - Service Agreement Levels for Monitoring Océano coNtracts
  GUI Interface
    ISLA contract definition
  ISLA Manager
    Contract Builder
    Contract Evaluator
    Response Automation
    Pricing Engine
    Violation Detection
    Reports
    DB
  Contract definition fields: Violator, Grace Period, Action/Penalty
  Pricing engine: flat-rate, usage-based, and penalties for violation
  Standard equations
    Charges: flat-rate, usage-based, sub-contract addition, penalty per violation
  Contract Monitors (Yemanja)
  Off-line activities and prediction queries
  On-line activities
  Futures and Options
  [Figure: numbered flow (steps 1 to 11) between the GUI, the ISLA Manager components, the DB, and Yemanja]

Yemanja - Event Correlation
  Problem determination
    Hierarchical event correlation
    Hardware faults, application faults, ISLA performance violations
  Policy monitoring and violation detection
    Integrate detection with performance monitoring and problem determination
  Automated violation handling
    Alert resource manager (Neptune)
    Open problem records

Yemanja - Problems to be addressed
  Difficult to capture complex problem scenarios
  Need a system that integrates high-level SLA violations with low-level network monitoring
  Need a method to propagate problems to all affected systems
    Recognize affected components
    Resist hard coding of dependency information
      Hard to anticipate all affected components
      Component models become large and unmanageable; adding new components can affect preexisting component models
    Cancel dependent problems when the initial problem is fixed
  Simultaneous faults
  Uncertainty in causal implications
  Lost and spurious alarms
  Need for integrated testing
  Scenario waits
  Dynamic system configuration changes

Yemanja Features
  SLA violation detection integrated into correlation rules
  Rules can contain a mix of methods and events
    This allows the collection of additional data, or the analysis of state information, before the complete set of required events has arrived
  Associate and rank rules that represent alternate solutions to the same set of events
  Events propagate through the abstract component dependency chain using publish-subscribe semantics
  Built-in problem database
    Canceling a root problem cancels its dependent problems automatically
  Flexible way to collapse multiple events of the same type into a single set-based event specification
    Can require that some % of resources in a resource set generate the selected event

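Two of these features lend themselves to a small sketch: the percentage condition on a resource set, and the cascade cancel in the problem database. The names `set_event_fires` and `ProblemDB`, and the problem identifiers, are hypothetical; the slides do not describe Yemanja's actual data structures.

```python
def set_event_fires(resource_set, reporting, min_fraction):
    """True when at least min_fraction of resource_set reported the event."""
    if not resource_set:
        return False
    hits = len(set(resource_set) & set(reporting))
    return hits / len(resource_set) >= min_fraction


class ProblemDB:
    """Assumed model: each open problem records the problem that caused it."""

    def __init__(self):
        self.open = {}  # problem id -> id of the causing problem (None = root)

    def open_problem(self, pid, caused_by=None):
        self.open[pid] = caused_by

    def cancel(self, pid):
        """Cancel pid and every problem that transitively depends on it."""
        cancelled = []
        stack = [pid]
        while stack:
            current = stack.pop()
            if current not in self.open:
                continue
            children = [p for p, parent in self.open.items() if parent == current]
            del self.open[current]
            cancelled.append(current)
            stack.extend(children)
        return cancelled


servers = ["ws1", "ws2", "ws3", "ws4"]
half_slow = set_event_fires(servers, ["ws1", "ws3"], 0.5)

db = ProblemDB()
db.open_problem("switch-down")
db.open_problem("ws1-unreachable", caused_by="switch-down")
db.open_problem("isla-violation", caused_by="ws1-unreachable")
db.open_problem("disk-full")
gone = db.cancel("switch-down")
print(half_slow, sorted(gone), db.open)
```

Fixing the root problem (`switch-down`) removes the two problems derived from it and leaves the unrelated `disk-full` record open.
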
Neptune
  Reactive resource allocation
    Plan based
    Allocates servers, bandwidth
  Reacts to
    Performance problems
    Component failures
    ISLA violations
  Activated by Yemanja

Fortuna - Resource Allocation Strategy
  Goals: Improve Performance + Maximize Revenue
  Planned + Reactive
    Planned: use prediction of periodic traffic patterns
      Construct a resource allocation plan (e.g. for the next 24 hours)
    Reactive: (de)allocation based on current load
      Correct the initial plan
      Give feedback to improve the prediction/analysis
  Operate in a fully reactive mode for a new customer or if the system observes unexpected behavior

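A minimal sketch of the planned + reactive idea, under loud assumptions: load is a single number per interval, every server has the same capacity, and the feedback step is plain exponential smoothing. None of these choices come from the slides; Fortuna's actual prediction method is not described here.

```python
import math


def build_plan(predicted_load, per_server_capacity):
    """Planned mode: servers needed in each interval of the forecast."""
    return [math.ceil(load / per_server_capacity) for load in predicted_load]


def reactive_correction(planned_servers, current_load, per_server_capacity):
    """Reactive mode: servers to add (positive) or release (negative)."""
    needed = math.ceil(current_load / per_server_capacity)
    return needed - planned_servers


def update_prediction(predicted, observed, weight=0.5):
    """Feedback step (assumed): blend the forecast with the observed load."""
    return (1 - weight) * predicted + weight * observed


plan = build_plan([100, 250, 400], per_server_capacity=100)
delta = reactive_correction(plan[1], current_load=520, per_server_capacity=100)
next_forecast = update_prediction(250, 520)
print(plan, delta, next_forecast)
```

For a new customer with no history, the plan would be empty and every interval would be handled by `reactive_correction` alone, which corresponds to the fully reactive mode on the slide.
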
Preliminary Example of a Layered ISLA
  [Figure: "How many units of server capacity to allocate?" The customer ISLA and the current load (from the Océano server monitor) are plotted against a required capacity range on a scale of 1 to 16 units. Levels of guarantee: always, strong, weak, best effort.]

ISLAs and Revenues
  Layered ISLA
    Current state depends on the required server capacity and on the state parameters (layer i):
      max_i servers: the layer's boundary
      C_i: charge for capacity, per time unit
      P_i: penalty for a violation in this layer; its size depends on the level of guarantee
  Options
    Exercised implicitly according to measured load
    Price of an option depends on the level of guarantee and the capacity (max_i), and can also depend on the expected usage

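The layer parameters above suggest a simple per-time-unit settlement. The sketch below is one possible reading, not the Salmon pricing engine: it assumes layer boundaries are cumulative, that the customer is charged C_i for each served server in layer i, and that one penalty P_i is due whenever a layer receives fewer servers than it requires. The `Layer` class, `settle_time_unit`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    max_servers: int  # max_i: upper boundary of the layer (cumulative)
    charge: float     # C_i: charge per server per time unit
    penalty: float    # P_i: penalty for one violation in this layer


def settle_time_unit(layers, required, allocated):
    """Return (charge, penalty) for one time unit under the assumptions above."""
    served = min(required, allocated)
    total_charge = 0.0
    total_penalty = 0.0
    lower = 0
    for layer in layers:
        # Servers of this layer that the load requires, and that were served.
        need = max(0, min(required, layer.max_servers) - lower)
        got = max(0, min(served, layer.max_servers) - lower)
        total_charge += got * layer.charge
        if got < need:
            total_penalty += layer.penalty
        lower = layer.max_servers
    return total_charge, total_penalty


# Strongest guarantee first: highest charge and highest penalty.
layers = [Layer(4, 10.0, 50.0), Layer(8, 6.0, 20.0), Layer(10, 3.0, 5.0)]
charge, penalty = settle_time_unit(layers, required=7, allocated=5)
print(charge, penalty)
```

With 7 servers required and 5 allocated, the first layer is fully served, the second is short by two servers and incurs its penalty, and the third is not reached.
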
Scenario
  Active Scenario: Scenario 1: {[Server_Set(4, 4, 2)], 00:00 Dec/01/2000, 23:59 Dec/31/2001, 1}
  Definition Level
    Allocation on the 3 levels of guarantee
    [Figure: number of servers (0 to 10) with three bands: Floor, Guaranteed-Scalability, Ceiling]
  Monitoring Level
    Over Provisioning, Violation
    [Figure: Resources Requested versus Resources Allocated (X), number of servers over time T1 to T4]
  Charging Level
    Usage-Based Charge Calculation

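The monitoring level can be illustrated with a short classifier. It rests on an interpretation that the slide does not state outright: that `Server_Set(4, 4, 2)` gives the sizes of the Floor, Guaranteed-Scalability, and Ceiling bands, so the cumulative boundaries are 4, 8, and 10 servers. The function names and the sample series are hypothetical.

```python
from itertools import accumulate


def boundaries(server_set):
    """Cumulative band boundaries, e.g. (4, 4, 2) -> [4, 8, 10]."""
    return list(accumulate(server_set))


def classify(requested, allocated, server_set):
    """Label one interval as violation, over-provisioned, or ok."""
    ceiling = boundaries(server_set)[-1]
    entitled = min(requested, ceiling)  # requests above the ceiling are not owed
    if allocated < entitled:
        return "violation"
    if allocated > requested:
        return "over-provisioned"
    return "ok"


server_set = (4, 4, 2)
requested = [3, 6, 9, 12]  # T1..T4
allocated = [4, 6, 7, 10]
labels = [classify(r, a, server_set) for r, a in zip(requested, allocated)]
print(boundaries(server_set), labels)
```
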
Resource Control Layer
  eClams
    Server allocation/reclamation
  Khnum
    Application and data priming
  HarborMaster
    Bandwidth management (request throttling)
  Pismo-Beach
    GUI