BIO PRESENTATION SUPPLEMENTAL T15, September 22, 2005, 1:30 PM
Defect Prediction with Reliability Growth Modeling
Michael Allegra, GXS
Better Software Conference & Expo 2005, September 19-22, 2005
Hyatt Regency San Francisco Airport, San Francisco, California, USA
Michael Allegra
Michael Allegra has devoted most of his 14-year career in software testing to leading and managing teams at IBM's Global Services division. He was responsible for delivering high-quality software for Fortune 500 customers and managing a billing test group responsible for $250 million in annual revenue. Michael has led process improvement activities for his organization involving CMM and industry best practices. He recently left IBM and joined GXS, where he manages a test group for the leading provider of messaging applications and services.
Defect Prediction with Reliability Growth Modeling
Michael Allegra
IBM/GXS
Agenda
• How will this technology help me?
• Overview of reliability growth models
• Describe a model applicable to most software development
• CASRE tool setup (it's free!)
• Entering and formatting the data with the provided templates
• Interpreting the results
• Summary
Why should I care?
• Software reliability models can tell us:
– The number of latent defects
– When testing is 'good enough'
– The effectiveness of the current test cycle
– Whether service-level/availability targets are valid
– Help desk staffing needs
Reliability Growth Modeling
Typical formulas
Reliability Growth Modeling
Typical formulas
Fahgetaboutit! We aren't covering that!
Basic Concepts
• Skip the rocket science stuff!
• Reliability Growth Modeling (RGM) is a subset of the Software Reliability Engineering (SRE) methodology.
• SRE covers a wide range of technical topics that cannot be covered in this brief tutorial.
• RGM can be useful without all the SRE steps by following some basic principles.
Model Types
• Defect prediction models are static or dynamic
– Static
• Defects per KLOC, function point, class, etc.
• Based on historical projects
• Example: a test team historically finds 5 defects/FP, and 1 defect escapes for every 7 found in test; so 21 FP in a release means 105 defects found in the test phase and 15 escapes, or latent defects (see the sketch after this slide).
• Useful for ballpark resource planning
• Not always accurate due to assumptions
• Proven best for module-level predictions
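A minimal sketch of the static-model arithmetic from the example above. All numbers are the slide's illustrative values, not real data; substitute your own historical ratios:

```python
# Ballpark static prediction from historical ratios.
function_points = 21
defects_per_fp = 5            # historical defects found per FP in test
escapes_per_found = 1 / 7     # one escape per seven defects found in test

found_in_test = function_points * defects_per_fp           # 105
latent_escapes = round(found_in_test * escapes_per_found)  # 15
print(f"Found in test: {found_in_test}, latent escapes: {latent_escapes}")
```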
Dynamic Models
• Use defect data from the actual project as it moves through the development lifecycle
• Best suited for product- or service-level reliability, not modules
• Two basic models:
– Full life cycle: Rayleigh model
– Test (qualification) phase: reliability growth models
Dynamic Models: Rayleigh
• A type of Weibull distribution (shape parameter of 2)
• Models the entire development life cycle
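A minimal sketch of the Rayleigh cumulative-defect curve. The total defect count K and the time t_peak at which the discovery rate peaks are hypothetical values chosen only for illustration:

```python
import numpy as np

# Rayleigh cumulative-defect curve: a Weibull CDF with shape parameter 2.
# K = total defects injected over the lifecycle (hypothetical),
# t_peak = time at which the defect-discovery rate peaks (hypothetical).
def rayleigh_cumulative(t, K, t_peak):
    return K * (1.0 - np.exp(-((t / t_peak) ** 2) / 2.0))

months = np.arange(1, 13)  # a hypothetical 12-month development lifecycle
print(rayleigh_cumulative(months, K=100, t_peak=5.0).round(1))
```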
Dynamic Models: Reliability Growth
• Reliability growth is present when failure intensity decreases as test time increases
• Used during the independent test phase
– Function test, system test, etc.
– When testing most closely resembles end-user activity
• Two types of reliability growth models:
– Time between failures (when is the next failure?)
– Fault count (how many failures exist?)
Recommended Model
• Yamada S-shaped model
• Fault detection rate starts flat and builds, reflecting the tester learning curve and test environment stability
[Chart: Yamada S-shaped model, cumulative defects (0 to 80) vs. test interval (1 to 21), showing an S-shaped growth curve]
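The Yamada (delayed S-shaped) model's mean value function is m(t) = a * (1 - (1 + b*t) * e^(-b*t)), where a is the expected total defect count and b is the defect detection rate. A minimal sketch with illustrative parameter values:

```python
import numpy as np

# Yamada delayed S-shaped mean value function:
#   m(t) = a * (1 - (1 + b*t) * exp(-b*t))
# a = expected total defects, b = defect detection rate.
def yamada_mean(t, a, b):
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

t = np.arange(1, 22)  # 21 test intervals, as in the chart above
print(yamada_mean(t, a=75.0, b=0.25).round(1))  # slow start, then S-shaped growth
```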
Model Assumptions
• Software is tested in a manner similar to how customers will use it
• The number of defects at the start of formal testing is unknown
• Fixes are clean (they do not introduce new defects)
• Discovered defects are fixed quickly and are not present in future test intervals
• There is a finite number of faults
Recommended Project Types
• At least 10 test intervals
• Expecting at least 50 defects
• One month or more of test time
• For iterative development:
– New defects must be allocated to the newly delivered functions
– See John Musa's book for details
Team must agree on a defect definition!
• The IBM definition of a defect is 'any non-conformance to the agreed requirements, acceptance criteria, specifications, plans, standards or other input requirements.'
• Defects that are 'opened' and subsequently 'cancelled' by a tester should not be included.
• Any behavior of the product that would result in a customer reporting a defect after the product is released should be considered a defect.
• Consideration should also be given to the following types of defects:
– Documentation
– Build
– Packaging
– Installation
– Performance
– Usability
CASRE
• Computer Aided Software Reliability Estimation
• Developed by the Jet Propulsion Laboratory
• Yes, it's free!
• Primarily developed for scholars, researchers, etc.
• The demo will cover a basic scenario step by step
CASRE
• Requires MS Windows
• Download CASRE 3.0 from:
– www.openchannelfoundation.org/projects/CASRE_3.0
– You will need to register and fax back a license agreement.
• Installation instructions are included on the web site.
• Verify the software starts without error.
– Note: make sure the 'casrev3.ini' file is in your Windows directory (c:\winnt)
Process Flow, High Level
1) Update data file per test interval (Excel)
2) Import to CASRE
3) Select/run model (CASRE)
4) Import results to Excel, then format
5) Interpret results/prediction
Create the Data File
• Start this after you have about 5 test intervals.
• Before you begin, gather a defect list that contains the following information:
– Date each defect was opened (sort the list by date)
– Number of defects per severity per date
– Interval size per increment (hours/day, days/week, etc.)
• Format the data file as shown in the template
• Save as type 'interval#.dat' (e.g., Oct18.dat)
• Isolate yourself from interruption while creating it!
Create Defect Data File
[Screenshot of the template data file; callouts indicate the test interval, test interval time, number of defects, test time, and severity fields]
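A sketch of how such a file might be generated from a defect log. The weekly grouping, the file name, and the whitespace-separated row layout (interval number, defects found, test time) are assumptions for illustration; verify the exact column order and format against the provided templates and the CASRE user guide:

```python
from collections import Counter
from datetime import date

# Hypothetical defect log: (date_opened, severity). In practice, export
# this from your defect tracker, sorted by date.
defects = [
    (date(2004, 9, 27), 2), (date(2004, 9, 28), 1),
    (date(2004, 10, 4), 3), (date(2004, 10, 5), 2),
]

# Group defects into weekly test intervals starting 9/26.
start = date(2004, 9, 26)
counts = Counter((d - start).days // 7 + 1 for d, _ in defects)

hours_per_interval = 40  # test time per interval; adjust to your effort data
with open("Oct18.dat", "w") as f:
    for interval in range(1, max(counts) + 1):
        # One row per interval: interval number, defects found, test time.
        f.write(f"{interval} {counts.get(interval, 0)} {hours_per_interval}\n")
```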
CASRE
• Start CASRE and open the .dat file
• Verify the data imported correctly
• Set up and execute the model:
– Select the data range
– Select the future prediction intervals
– Choose the model: 'Yamada S-Shaped'
• Select the model results
• Check 'Goodness of Fit'
CASRE
DEMO IN CASRE
Import and Format Results
Model Results: Predicted vs. Actual Defects
[Chart: 'Total Defects Predicted vs. Actual', predicted defects uncovered with 5 additional test days; predicted cumulative failures (56) vs. actual cumulative failures (53), plotted by test day from 9/26 to 12/5]
If there are significant differences past the first 1/3 of the test cycle, then the testing effectiveness must be reevaluated:
• If the actuals continue to run well below the prediction, the test coverage is probably insufficient, or there are not enough people testing given the current rate of execution.
• If the actuals are above the prediction, you may need to re-check the parameters in your data file and in CASRE.
Model Results: Predicted Failure Rate
[Chart: 'Predicted Failure Rate by Test Interval', failures per test hour (0.00 to 0.50), estimated at each test iteration from 9/26 to 12/5]
Push for additional test time if the predicted failure rate:
• is still increasing
• has not started to decline significantly
• remains at a level that is unacceptable to the team and business organization
Consider reducing test time if the predicted failure rate:
• is at a level that is acceptable to the business, i.e., 'we will not find enough defects to justify the cost of the remaining test intervals'
• has leveled off at a low rate and the test coverage has hit all major functions, so the rate is not likely to increase
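The predicted failure rate is the derivative of the Yamada mean value function: lambda(t) = a * b^2 * t * e^(-b*t). A sketch that flags the intervals where the predicted rate has dropped below a hypothetical acceptability threshold (all parameter and threshold values here are illustrative):

```python
import numpy as np

# Failure intensity of the Yamada delayed S-shaped model: the derivative
# of m(t) = a*(1 - (1 + b*t)*exp(-b*t)) is lambda(t) = a * b**2 * t * exp(-b*t).
def yamada_intensity(t, a, b):
    return a * b**2 * t * np.exp(-b * t)

t = np.arange(1, 22)
rate = yamada_intensity(t, a=75.0, b=0.25)  # illustrative parameters
acceptable = 1.0  # hypothetical threshold, failures per test interval
print(t[rate < acceptable])  # intervals where the predicted rate is acceptable
```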
Model Results: Total Defects Predicted
[Chart: 'Estimated Total Defects in Product as Testing Progressed', 95% confidence interval and most likely estimate at each test iteration: high 79, most likely 61, low 53]
After a flattening of the estimated total defects, if the rate begins to rise again, it can mean the scope of the testing is not sufficient and test coverage should be increased.
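CASRE produces these estimates and intervals itself; as a rough outside check, one can fit the Yamada curve to the cumulative defect counts and read an approximate interval off the fit. A sketch using least squares (a simplification of CASRE's maximum-likelihood approach; the data below is made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def yamada_mean(t, a, b):
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

# Hypothetical cumulative defect counts per test interval.
t = np.arange(1, 12)
cum_defects = np.array([2, 6, 13, 22, 31, 39, 44, 48, 50, 52, 53])

# Least-squares fit of the mean value function to the observed counts.
(a_hat, b_hat), pcov = curve_fit(yamada_mean, t, cum_defects, p0=[60.0, 0.3])
half_width = 1.96 * np.sqrt(pcov[0, 0])  # rough 95% interval on total defects
print(f"Total defects: {a_hat:.0f} "
      f"(95% CI roughly {a_hat - half_width:.0f} to {a_hat + half_width:.0f})")
```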
Model Results: Defects Predicted vs. Actual
[Chart: 'Actual Latent Defects', high, most likely, and low bands (45 to 80 defects), post-GA defects plotted from 12/16/03 to 1/27/04]
• Exited test with 53 uncovered defects
• Model predicted 61 defects most likely
• 62 defects total after 2 months of release
• No further defects reported to date
The model proved to be very accurate!
Recommendations

More Recommendations