


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 29, NO. 6, JUNE 2003 567

Comments on “Formal Methods Application: An Empirical Tale of Software Development”

Daniel M. Berry and Walter F. Tichy

Abstract: We comment on the experimental design and the result of the paper mentioned in the title. Our purpose is to show interested readers examples of what can go wrong with experiments in software research and how to avoid the attending problems.

1 INTRODUCTION

Empirical studies and controlled experiments in particular have become an important tool for understanding the nature and efficacy of software methods and tools. A positive trend in recent years has been that the number of papers with empirical data published in IEEE Transactions on Software Engineering (TSE) and elsewhere has been increasing. This trend is motivated in part by the realization that, unlike in the early days of software research, mere demonstration of a new tool or method is not enough. There is a bewildering variety of software engineering methods and the relative merits of competing approaches are poorly understood. Furthermore, the methods and their interactions with the real world of software development are too complicated to be understood by theory alone. Actual observation of programmers in realistic settings is beginning to go hand in hand with the development of new methods and techniques, thus putting software research on a firm footing.

In this vein, it is heartening to see the experiment by Sobel and Clarkson [5]. The experiment collected evidence that “formal methods students had increased complex problem solving skills” and that “the use of formal analysis during software development produces ‘better’ programs.” Formal methods have a long history of theoretical research, but rigorous, empirical evaluation is scarce. Pfleeger and Hatton published a case study [3] on formal methods with inconclusive results; their paper points to additional case studies in this area. Sobel and Clarkson report on the first quasi experiment on formal methods.

Unfortunately, the paper contains several subtle problems. The reader unfamiliar with the basic principles of experimental psychology may easily miss them and interpret the results incorrectly. Not only do we wish to point out these problems, but we also aim to illustrate what to look for when drawing conclusions from controlled experiments. We thus hope to help both experimenters and readers of empirical software research to become more astute in regards to meaningful experimentation in software research.

Much has been written about experimental methodology; a classic text is the book by Christensen [2]. The book covers a wide range of experimental principles, including control, experimental design, data collection, validity, ethics, and hypothesis testing. However, since the book is written for psychologists, it may appear dry and inaccessible to software researchers and practitioners. However, by using the experiment by Sobel and Clarkson as a concrete example, many of these principles come into sharp focus. Never have we read Christensen with more interest than in the context of the Sobel and Clarkson experiment! We hope that this note will help motivate computer scientists to study with renewed interest the body of knowledge about experimental design.

2 OVERVIEW OF PUBLISHED PAPER

In the following discussion, the published paper [5] by Sobel and Clarkson is referred to as “the TSE paper.” The personal pronoun “we” refers to the authors of the present note, i.e., Berry and Tichy, while the phrase “the investigators” refers to Sobel and Clarkson.¹

In the TSE paper, the investigators describe an experiment in which two groups of mostly two-person teams of university students were asked to develop running programs to meet the requirements of a given problem, an elevator simulation problem. One group of teams developed formal specifications, the other did not. The investigators observe that the formal methods group’s solutions are “far more correct than the nonformal solutions.” Additional details appear in a second paper, hereafter called the “Inroads paper,” authored by only Sobel [4].

The formal methods group consisted of undergraduate students who had voluntarily participated in a formal methods curriculum. This curriculum consisted of a course on formal program derivation and a course on the axiomatic semantics of data structures, both taught using a first-order-logic specification language, plus a course on object-oriented design including UML. The other group, the control group, consisted of undergraduate students whose training differed in that they did not take part in the program derivation course, took a data structures course covering the same topics as the formal group except for the axiomatic semantics, and took the same course on OO-design. The elevator programming task was an assignment in the OO-design course. There were additional courses to be taken later.

Both curricula taught the same material, in the same sequence, by the same instructors, using the same examples, the same programming assignments, and the same exams, except for formal methods. Thus, the investigators have tried to maintain the equivalence of the two groups except for the experimental treatment, the continual exposure to formal methods by the members of the formal methods group.

The programming task used to assess the two groups was the development of an elevator simulation. Each group divided into teams, each with two members on average. Each team was to develop a running solution as a homework assignment. Six teams of the formal methods group and 11 teams of the control group handed in solutions that compiled properly.² Each team was encouraged to submit UML diagrams of its design. Each formal methods team was asked to submit a formal specification of its solution.

The investigators found that 100 percent of the programs produced by the formal method group teams passed all of a set of six test cases, while only 45.5 percent of the programs produced by the control group teams passed all of the same set of test cases. This is the main result of the experiment and seen as strong evidence of the power of formal methods.

Standardized ACT tests found no statistical difference between the abilities of students in the two groups at the beginning of the curricula. The investigators conclude, therefore, that the two populations, the students in the two groups, are alike in all aspects except for the training in formal methods.

1. This is a simplification because Clarkson was actually a student participating in the experiment.

2. There is a discrepancy regarding the number of formal methods teams: it is four in the Inroads paper but six in the TSE paper. In personal communication, Clarkson asserted that the correct number is six.

D.M. Berry is with the School of Computer Science, University of Waterloo, 200 University Ave., West Waterloo, Ontario N2L 3G1, Canada. E-mail: dberry@haifa.math.uwaterloo.ca.

W.F. Tichy is with the Department of Informatics, University of Karlsruhe, 76128 Karlsruhe, Germany. E-mail: tichy@ira.uka.de.

Manuscript received 12 July 2002; accepted 20 Feb. 2003. Recommended for acceptance by D. Rosenblum. For information on obtaining reprints of this article, please send e-mail to: tse@computer.org, and reference IEEECS Log Number 116936.
