Develop Quantitative Reliability Roadmap to Meet Market’s Expectations
Xuemei Zhang
Alcatel-Lucent
April 27, 2007
Introduction
Gaps between a product’s target and current-release availability can arise…
  • … in early releases of new products
  • … when a product is deployed in a new scenario, such as supporting VoIP or IPTV on a traditionally IP data-only product
  • … when significant software features or hardware/architecture changes are made
Reliability roadmapping is the best practice for managing closure of an availability gap
Product management owns product roadmaps; reliability roadmaps are a key input to overall product roadmaps
This presentation details what a reliability roadmap is, how to construct one, and how to use that roadmap to manage closure of an availability gap
2 | Reliability Roadmap | April 2007
Outline
The Business Problem and Solution
  • New Product Reliability Risk
  • New Deployment Scenario Reliability Risk
  • New Feature Reliability Risk
  • Reliability Roadmap as a Solution
Reliability Roadmap Elements
Availability Improving Features
Connecting-the-Dots Roadmapping
End-to-End Solution Availability
Recommendations for Product Managers
Business Problem – New Product Reliability Risk
Market expects 99.999% availability for most of Lucent’s products
  • Best practice for assessing market’s availability expectation given in a companion presentation
Significant risk in achieving 99.999% availability in initial (or early) releases of most products because:
  1. Some availability features may have been deferred from initial product release(s) in favor of higher-priority features
  2. High availability system configurations (e.g., N+K, duplex controllers) may not be supported in initial release(s) (note: high availability configurations may be required in RFx’s, but not actually be purchased and hence not reflected in business cases)
  3. Software may not be sufficiently mature to have low enough failure rate
  4. Software may not be sufficiently mature to have sufficiently effective and efficient automatic failure detection, isolation, alarming and recovery mechanisms
Business Problem – New Deployment Scenario Reliability Risk
As existing products are deployed in new scenarios, they may encounter different availability expectations, thus exposing a gap; for example:
  • Network element availability expectations for VoIP and IPTV may be higher than for data-only deployments
  • Basestation availability expectations for wireless local-loop may be higher than for typical mobility deployments
Business Problem – New Feature Reliability Risk
As existing products evolve, large, availability-impacting features may be added, such as:
  • Adding VoIP or other major capability
  • Expanding architecture/configuration (e.g., adding duplex controllers)
  • Changing blades or major hardware elements
Significant changes to existing products risk adding software downtime by:
  1. “Degrowing” software reliability (increasing the failure rate), or
  2. Reducing the system’s ability to effectively detect and isolate failures (lowering the coverage factor), or
  3. Adding latency to recovery/restart times
Note: hardware downtime for a particular element typically changes little from release to release, so release-by-release roadmapping of hardware elements is less common
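The three risks above can be tied together with a simple downtime model. The sketch below is an illustration only, not the model used in this presentation: it assumes each failure is either covered (detected and recovered automatically) or uncovered (recovered manually), and every name and number in it is hypothetical.

```python
def software_downtime_minutes(failures_per_year: float,
                              coverage: float,
                              auto_recovery_min: float,
                              manual_recovery_min: float) -> float:
    """Annual software downtime under a simplified two-outcome model.

    Assumption: a fraction `coverage` of failures is detected and recovered
    automatically in `auto_recovery_min` minutes; the remainder needs manual
    recovery taking `manual_recovery_min` minutes.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    minutes_per_failure = (coverage * auto_recovery_min
                           + (1.0 - coverage) * manual_recovery_min)
    return failures_per_year * minutes_per_failure


# Hypothetical baseline release: 2 failures/yr, 95% coverage,
# 1 minute automatic recovery, 30 minutes manual recovery.
baseline = software_downtime_minutes(2.0, 0.95, 1.0, 30.0)

# Each risk in the list above moves one input in the wrong direction.
higher_failure_rate = software_downtime_minutes(3.0, 0.95, 1.0, 30.0)
lower_coverage = software_downtime_minutes(2.0, 0.90, 1.0, 30.0)
slower_recovery = software_downtime_minutes(2.0, 0.95, 2.0, 30.0)

for label, value in [("baseline", baseline),
                     ("higher failure rate", higher_failure_rate),
                     ("lower coverage", lower_coverage),
                     ("slower recovery", slower_recovery)]:
    print(f"{label}: {value:.2f} min/yr")
```

In this toy model a small drop in coverage can cost more downtime than a comparable rise in failure rate, because uncovered failures carry the long manual recovery time.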
Business Solution: Reliability Roadmap
The risk in purchasing a release of a system that doesn’t currently meet a customer’s availability expectations can be reduced by providing a credible, concrete plan for closing the availability gap in an upcoming release, a.k.a. a “reliability roadmap”
Key elements of a reliability roadmap
  1. An ‘ultimate’ quantitative system availability goal (or goals) and its definition
  2. An availability estimate of the current release and system configuration
  3. A target release and system configuration to meet a specific availability level
  4. Per-release availability budgets to plausibly close the gap between current release performance and the specific availability goal in the target release
  5. A by-release enumeration of features and/or factors that will support this availability growth
Outline
The Business Problem and Solution
Reliability Roadmap Elements
  1. Ultimate Availability Goal
  2. Estimate Availability of Current Release
  3. Specific Release Identified to Meet Goal
  4. Per-Release Availability Improvement Targets
  5. Per-Release Availability Improvement Features
  • Graphical Example
Availability Improving Features
Connecting-the-Dots Roadmapping
End-to-End Solution Availability
Recommendations for Product Managers
Roadmap Element 1: Set “Ultimate” Availability Goal
Availability goals are typically set for annualized minutes of unplanned, supplier-attributable “total” system unavailability (meaning greater than 90% capacity lost)
  • Includes both hardware and software downtime, but may exclude planned/scheduled downtime for upgrades, updates, growth, etc.
  • Market expectation for most telecom products is about 5.25 down-minutes per year (99.999% availability)
Partial-capacity-loss events are quite common, and thus sophisticated customers may have availability expectations for pro-rated partial-capacity-loss availability
  • TL-9000 defines partial-capacity-loss to be greater than 10% capacity loss, but less than 90% capacity loss
Planned unavailability includes system downtime for upgrades, updates, reconfiguration, growth, degrowth, and so on. Sophisticated customers may have clear planned downtime expectations
  • Some sophisticated customers (e.g., Nextel) explicitly define their 99.999% availability requirement to include planned events, as well as unplanned events
Quantitatively define exactly what the ‘ultimate’ objective is
Example: Availability goal for Product A is 99.999% unplanned, supplier-attributable, (partial) pro-rated availability
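The link between an availability percentage and annual down-minutes is simple arithmetic. The sketch below assumes a 365-day year (525,600 minutes), under which 99.999% works out to roughly 5.26 minutes per year. The pro-rating helper assumes a partial outage is weighted by the fraction of capacity lost; that weighting and all function names are illustrative assumptions, not taken from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def annual_downtime_minutes(availability: float) -> float:
    """Annualized down-minutes implied by a fractional availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(down_minutes: float) -> float:
    """Fractional availability implied by annualized down-minutes."""
    return 1.0 - down_minutes / MINUTES_PER_YEAR


def prorated_outage_minutes(duration_minutes: float,
                            capacity_lost_fraction: float) -> float:
    """Weight an outage by the fraction of capacity lost (assumed rule)."""
    if not 0.0 <= capacity_lost_fraction <= 1.0:
        raise ValueError("capacity_lost_fraction must be between 0 and 1")
    return duration_minutes * capacity_lost_fraction


for availability in (0.999, 0.9999, 0.99999):
    minutes = annual_downtime_minutes(availability)
    print(f"{availability:.5%} available: {minutes:.2f} down-min/yr")

# A hypothetical 30-minute event that took out 40% of capacity.
print(f"pro-rated: {prorated_outage_minutes(30.0, 0.4):.1f} min")
```

Stating the goal in down-minutes rather than "nines" makes it easier to budget across releases and to add pro-rated partial outages to total outages.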
Roadmap Element 2: Estimate Availability of Current Release
Estimating the availability of the current release of a product provides the baseline availability and helps identify the gap with the market’s availability expectation
The availability of a baseline release can be estimated from:
  • Field data, if the release is out in the field and reliable data exists
  • Lab data, via system reliability modeling
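For the field-data case, one common way to express a baseline is total outage minutes divided by system-years of service. The sketch below assumes that normalization; the function name, the fleet size, and the outage durations are hypothetical and only show the shape of the calculation.

```python
from typing import Sequence


def annualized_downtime(outage_minutes: Sequence[float],
                        systems_in_service: int,
                        observation_years: float) -> float:
    """Average down-minutes per system per year from field outage records.

    Assumption: every system was in service for the whole observation
    period, so exposure is simply systems multiplied by years.
    """
    system_years = systems_in_service * observation_years
    if system_years <= 0:
        raise ValueError("exposure must be positive")
    return sum(outage_minutes) / system_years


# Hypothetical field data: three outages across 100 systems over 6 months.
baseline = annualized_downtime([10.0, 20.0, 30.0],
                               systems_in_service=100,
                               observation_years=0.5)
print(f"estimated baseline: {baseline:.2f} down-min/system/yr")
```

With few systems or a short observation window the estimate is noisy, which is one reason a lab-based reliability model may be preferred for an early release.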
Roadmap Element 3: Set Specific Release to Meet Target
As with any business objective, explicitly setting a clear scheduled completion goal is essential
Since products are typically planned and managed on a release basis (rather than a calendar basis), we recommend setting a target release rather than a target date
Roadmap Element 4: Set By-Release Improvement Targets
Based on the availability of the baseline release and the release planned to meet the market expectation, by-release reliability improvement targets can be set to plan the reliability growth.
[Figure: Product A Reliability Roadmap. Annual Downtime (min/yr) by Release, RX through R(X+4); legend: Linear Growth, Actual Release X Downtime]
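The linear growth line in the figure can be reproduced by interpolating between the baseline downtime and the goal. This is a minimal sketch: the 21.25 min/yr baseline is a made-up number, and the 5.25 min/yr goal is the five-nines figure quoted earlier in the deck.

```python
from typing import List


def linear_downtime_budgets(baseline_minutes: float,
                            target_minutes: float,
                            releases_to_target: int) -> List[float]:
    """Per-release downtime budgets on a straight line from baseline to target.

    Index 0 is the current release (RX); the last entry is the target release.
    """
    if releases_to_target < 1:
        raise ValueError("need at least one release after the baseline")
    step = (baseline_minutes - target_minutes) / releases_to_target
    return [baseline_minutes - i * step for i in range(releases_to_target + 1)]


# Hypothetical Product A: 21.25 min/yr today, 5.25 min/yr four releases out.
budgets = linear_downtime_budgets(21.25, 5.25, releases_to_target=4)
labels = ["RX"] + [f"R(X+{i})" for i in range(1, 5)]
for label, minutes in zip(labels, budgets):
    print(f"{label}: {minutes:.2f} min/yr")
```

A straight line is only one possible budget shape. A roadmap could also front-load the improvement if the largest availability features land in early releases.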
Roadmap Element 5: Set By-Release Feature Investments
Investing in reliability-improving features is often required to achieve high availability in a timely manner.
Example: Product A reliability roadmap
[Figure: Release by Release Reliability Feature Sets, RX through R(X+4)]
Roadmap Example
[Figure: Product A Reliability Roadmap. Annual Downtime (min/yr) by Release, RX through R(X+4), annotated with the five roadmap elements:]
  Element 1: Set ultimate availability goal
  Element 2: Estimate current availability
  Element 3: Pick a release to achieve availability goal
  Element 4: Set rough per-release targets
  Element 5: Set per-release feature investments to achieve availability goal
Outline
The Business Problem and Solution
Reliability Roadmap Elements
Availability Improving Features
Connecting-the-Dots Roadmapping
End-to-End Solution Availability
Recommendations for Product Managers
Availability Improving Features
Product availability improves in 3 general ways
  • Maturation of software and support (both service provider and Lucent) reduces software failure rates, shortens outage durations for manually-recovered events, and improves reliability of manual maintenance activities
      • This growth is fairly slow, often not keeping pace with reliability degrowth from addition of new features
  • Investment in reliability/availability improving features. Broadly, these features address one or more of the following:
      1. Reduce failure rates
      2. Reduce impact of failures
      3. Improve efficiency of failure detection, isolation, alarming and recovery
      4. Shorten recovery latency
      5. Improve Design-for-Serviceability (DfS)
      6. Reduce planned downtime
      7. Policy and other items
  • Technology change: products can undergo significant changes in architecture, configuration, hardware or software which can significantly affect availability.
      • Often managed via product’s feature roadmap