CL1 23: Getting it right and getting it wrnog
Monday 13/11/2006

Introduction
• Bugs
  – Definition
  – Examples
• Algorithms
  – Foundation of computer programs
  – All applications are programs
• Software design
  – Minimising the impact of bugs
  – Minimising human error!

Bugs: Ariane 5 flight 501
• Cost
  – $500 million of satellites on board
• The bug
  – “Type conversion error” (Jargon!)
  – A 64-bit number was converted to a 16-bit number
  – The value of the horizontal position was lost
  – Ariane self-destructs correctly
• The error
  – Code not meant for that flight?

Ariane 501
A cautionary tale

A happy F-16

Well, almost …
In simulation, software inverted aircraft as it crossed the equator
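To make the Ariane 501 “type conversion error” above less of a jargon term, here is a minimal Python sketch of what can go wrong when a 64-bit value is squeezed into 16 bits. It is an illustration only: the flight software was not written in Python, and the function names and the sample value 40000.0 are assumptions made for this example. The first function silently keeps only the low 16 bits; the second refuses values that do not fit.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a signed 16-bit integer


def to_int16_unchecked(value: float) -> int:
    """Keep only the low 16 bits of the integer part (silent narrowing)."""
    raw = int(value) & 0xFFFF
    # Reinterpret the 16-bit pattern as a signed number.
    return raw - 0x10000 if raw >= 0x8000 else raw


def to_int16_checked(value: float) -> int:
    """Refuse any value that does not fit in a signed 16-bit integer."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return n


if __name__ == "__main__":
    reading = 40000.0  # hypothetical sensor value, larger than 32767
    print(to_int16_unchecked(reading))
    try:
        to_int16_checked(reading)
    except OverflowError as err:
        print("rejected:", err)
```

Here `to_int16_unchecked(40000.0)` yields -25536, a plausible-looking but meaningless number, while `to_int16_checked(40000.0)` raises `OverflowError`, so the caller has to decide what to do next.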
Less dramatic but..
• On August 28, 1993, 2 a.m. clocks in some PCs in Israel suddenly lost an hour.
• On October 24, 1993, at 2 a.m. some PCs in the UK did not lose an hour. Unfortunately everyone else was turning back their clocks that morning.

Computer Bug
• Unwanted property of program code or hardware
• Especially when it causes a malfunction
• Bugs are common
  – “In Windows 98 Microsoft supposedly fixed 3000 bugs.” PC Computing, Sept. 1998
  – Bugs can be unwanted security holes

Remember..?
• Ariane: Programme was doing the right thing in the wrong rocket
  – error in requirement
• Summertime: Programme was correctly doing the wrong thing
  – error in specification
• F-16, Mariner: Programme(r) made a mistake
  – error in implementation

First Bug?
• Moth found in the Mark II computer by Admiral Grace Hopper in 1947

Software design process
• Requirements: statement of the problem
  – Validation (fails: Ariane 501)
• Specification: statement of what to do
  – Verification (fails: date error)
• Implementation: doing it
  – Design, Testing (fails: F-16 (nearly))
• Note: the F-16 bug was the only one caught

Early bug: IEFBR14
• IEFBR14: one line of code for an IBM mainframe computer used in the 1970s
• Instruction of code:
  – “Do nothing” (i.e. wait for a short time)
• Contained a bug!
  – Forgot to prepare the memory for the next instruction
  – Subsequent instructions went wrong
• Fixed code increased code size to 4 bytes!
Bugs: Patriot missile
• Error calculating the time since the computer booted
  – Binary representation of 0.1 seconds limited to 24 bits
• Once activated, navigation system drifts
• In the Gulf War 1991
  – Caused a Patriot missile to fail to intercept a Scud missile
  – 28 killed, 100 injured

A few other causes
• Evolutionary bugs (requirement drift)
  – Ariane, Patriot missile
• Human interactions
  – USS Yorktown (data entry error), HMS Sheffield (operational errors)
• Communication
  – Mars Orbiter: mixed imperial and metric units
• Most major failures have multiple causes

Computer programs
• Computers are excellent at following instructions
  – Identify how to solve the problem
  – Use a computer!
• Major difficulties are
  – Expressing problems that can be solved using efficient algorithms
  – Giving the computer the correct instructions
  – Making the program user-friendly

Bugs in programs
• Memory leak
  – Forget to release memory after it has been used
• Other easy/common mistakes
  – Variable not set to the right initial value
  – Divide by zero: answer is infinity!
  – Get a number wrong by 1
  – Loops that never end
• Spelling mistakes
  – Usually prevented by the code not compiling
  – Not always! (Mariner 1)

Fault tolerant systems
• Creating fault free systems
  – Difficult and time-consuming
• Fault tolerant systems operate successfully despite faults
• Hardware: back-up systems
• Software:
  – Keep multiple copies of (back-up) the data
  – Identify and monitor critical variables
  – Checkpointing: reset system to a stored set of values

Mariner 1:
• Failed “because the line
  – DO 10 I=1.100
  should have read
  – DO 10 I=1,100”
• There’s rather more to it than that..
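The Patriot clock drift described above can be estimated with a few lines of arithmetic. The sketch below is a rough model, not the missile system's code: it assumes the clock counts ticks of 0.1 seconds, that 0.1 is stored truncated to a fixed number of binary fraction bits, and that 23 of the 24 bits hold the fraction. The function names, the 23-bit default and the 100-hour uptime are all assumptions chosen for illustration.

```python
from fractions import Fraction


def truncated_tenth(fraction_bits: int) -> Fraction:
    """Return 0.1 chopped to the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    # scale // 10 is floor(0.1 * scale), i.e. the bits that survive truncation.
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours_up: int, fraction_bits: int = 23) -> float:
    """Estimate accumulated clock error after hours_up hours of 0.1 s ticks."""
    ticks = hours_up * 3600 * 10  # ten ticks per second
    error_per_tick = Fraction(1, 10) - truncated_tenth(fraction_bits)
    return float(ticks * error_per_tick)


if __name__ == "__main__":
    for hours in (1, 8, 100):  # hypothetical uptimes
        print(f"{hours:>3} h up: clock is off by {clock_drift_seconds(hours):.4f} s")
```

With these assumptions `clock_drift_seconds(100)` comes out at roughly a third of a second. The error in a single tick is tiny (under a ten-millionth of a second), which is why a bug like this can pass short tests and only show up after the system has been running for days.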
Example: Aircraft failure rates
• Fatal accident rate
  1 death in 1,000,000 flying hours
• System causes 10% of accidents
• 100 critical systems in an aircraft
• Rate of failure
  1,000,000 hours × 100 systems / 10% = 1 fatal fault in 1,000,000,000 system flying hours
Good enough?

Software design: Waterfall model
Analyse the problem
→ Design solution architecture
→ Design solution details
→ Write program code
→ Test code
→ Maintain code
• Problems:
  – Original analysis is difficult
  – Problems identified at the end can be expensive to fix

Defensive programming
• Anticipate possible circumstances
• Trust nothing
  – Check what you are being told e.g.
    • angles between 0 and 359º
    • day-of-month is between 1 and 31
  – Check what you are telling others
  – Sanity checks on actions taken
• Fail in predictable manner if fault occurs
• Layered protection including hardware ‘back-stops’

Iterative design model
• At each stage
  – Design → Prototype → Evaluate → Redesign
  – All stages developed concurrently, with feedback between all stages
• Advantages
  – User-defined from the start
  – Performance can be measured much earlier
• Problems
  – Time consuming
  – Requires good management

Beta testing
• Refers to the 2nd phase of software testing
  – Sample of the intended audience tests the product
  – It works for the programmer, does it work for the user?
  – Provides a “preview” of software: it’s free! Buggy!
• Emerging software: look for “Beta” versions
  – At Google http://labs.google.com/
  – At MSN now: http://beta.search.msn.com/
  – Dedicated Web site www.betanews.com/
• [Beta is the second letter in the Greek alphabet. “Alpha” testing refers to the first phase: checking it works for the programmer]

In the news …
• Cost of Child Support Agency’s new computer system:
  – £456 million
  (Scottish parliament building: £431 million)
• Unable to cope with the work load
  – Backlog of 30,000 cases per month
• How could this happen?
• Source: http://news.bbc.co.uk/1/hi/uk_politics/4020399.stm
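The “trust nothing” checks listed under Defensive programming above might look like this in practice. This is a minimal sketch, assuming whole-number inputs; `check_range`, `set_heading` and `set_day_of_month` are hypothetical names invented for the example. Each check rejects bad input with a clear error instead of carrying on with a nonsense value, which is one way to “fail in a predictable manner”.

```python
def check_range(name: str, value: int, low: int, high: int) -> int:
    """Return value unchanged if it is a whole number in [low, high]."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be a whole number, got {value!r}")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def set_heading(angle_degrees: int) -> int:
    """Accept a heading only if it is an angle between 0 and 359 degrees."""
    return check_range("angle", angle_degrees, 0, 359)


def set_day_of_month(day: int) -> int:
    """Accept a day-of-month only if it is between 1 and 31."""
    return check_range("day of month", day, 1, 31)


if __name__ == "__main__":
    print(set_heading(270))
    try:
        set_day_of_month(32)
    except ValueError as err:
        print("rejected:", err)
```

A range check like this only catches values that are impossible in general. It would still accept 31 February, so a fuller check would also need to know the month and year.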
IT systems development
• Difficult initial problem analysis
  – IT systems supplement existing practice
  – Easy to be over-ambitious
  – Goals can change
  – Practical difficulty of establishing user’s goals
• Changing technology
  – Technology is quickly obsolete
  – Limited experience with new technology
• Complexity:
  – Large programs use ~100,000 lines of code
  – High staff turnover

Reporting problems
• Relevant details:
  – Username
  – Date, Time
  – Location
  – Computing environment (Operating system…)
• The fault:
  – Observations
  – And separately, any inferences

Key Points
• Computers solve problems using algorithms
• Bugs result from human-computer interactions
• Techniques exist to try and control the effects of bugs