Experimental Evaluation in Computer Science: A Quantitative Study
Paul Lukowicz, Ernst A. Heinz, Lutz Prechelt and Walter F. Tichy
Journal of Systems and Software, January 1995

Outline
• Motivation
• Related Work
• Methodology
• Observations
• Accuracy
• Conclusions
• Future work!

Introduction
• Large part of CS research new designs
  – systems, algorithms, models
• Objective study needs experiments
• Hypothesis
  – Experimental study often neglected in CS
• If accepted, CS inferior to natural sciences, engineering and applied math
• Paper ‘scientifically’ tests hypothesis

Related Work
• 1979 surveys say experiments lacking
  – 1994 say experimental CS under funded
• 1980, Denning defines experimental CS
  – “Measuring an apparatus in order to test a hypothesis”
  – “If we do not live up to traditional science standards, no one will take us seriously”
• Articles on role of experiments in various CS disciplines
• 1990 experimental CS seen as growing, but 1994
  – “Falls short of science on all levels”
• No systematic attempt to assess research

Methodology
• Select Papers
• Classify
• Results
• Analysis
• Dissemination (this paper)

Select CS Papers
• Sample broad set of CS publications (200 papers)
  – ACM Transactions on Computer Systems (TOCS), volumes 9-11
  – ACM Transactions on Programming Languages and Systems (TOPLAS), volumes 14-15
  – IEEE Transactions on Software Engineering (TSE), volume 19
  – Proceedings of 1993 Conference on Programming Language Design and Implementation
• Random Sample (50 papers)
  – 74 titles by ACM via INSPEC (24 discarded) + 30 refereed

Select Comparison Papers
• Neural Computing (72 papers)
  – Neural Computation, volume 5
  – Interdisciplinary: bio, CS, math, medicine …
  – Neural networks, neural modeling …
  – Young field (1990) and CS overlap
• Optical Engineering (75 papers)
  – Optical Engineering, volume 33, no 1 and 3
  – Applied optics, opto-mech, image proc.
  – Contributors from: ee, astronomy, optics…
  – Applied, like CS, but longer history

Classify
• Same person read most
• Two read all, save NC

Major Categories
• Formal Theory
  – Formally tractable: theorems and proofs
• Design and Modeling
  – Systems, techniques, models
  – Cannot be formally proven → require experiments
• Empirical Work
  – Analyze performance of known objects
• Hypothesis Testing
  – Describe hypotheses and test
• Other
  – Ex: surveys

Subclasses of Design and Modeling
• Amount of physical space for experiments
  – Setups, Results, Analysis
• 0-10%, 11-20%, 21-50%, 51%+
• Too shallow? Assumptions:
  – Amount of space proportional to importance by authors and reviewers
  – Amount of space correlated to importance to research
• Also, concerned with those that had no experimental evaluation at all

Assessing Experimental Evaluation
• Look for execution of apparatus, techniques or methods, models validated
• Tables, graphs, section headings…
• No assessment of quality
• But count only ‘true’ experimental work
  – Repeatable
  – Objective (ex: benchmark)
• No demonstrations, no examples
• Some simulations
  – Supplies data for other experiments
  – Trace driven

Outline
• Motivation
• Related Work
• Methodology
• Observations
• Accuracy
• Conclusions
• Future work!
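The space-based subclasses of Design and Modeling can be illustrated with a small sketch. This is not the authors' tooling: the function name `space_class`, the separate "none" class for papers with no experimental pages, and the handling of values that fall between the listed bins (for example 10.5%) are assumptions made for illustration, and the page counts are invented toy data.

```python
from collections import Counter

def space_class(experiment_pages: float, total_pages: float) -> str:
    """Map a paper to one of the space bins named on the slide.

    Bin labels come from the slide (0-10%, 11-20%, 21-50%, 51%+).
    The extra "none" class and the exact boundary handling are assumptions.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if experiment_pages <= 0:
        return "none"  # no experimental evaluation at all
    percent = 100.0 * experiment_pages / total_pages
    if percent <= 10:
        return "0-10%"
    if percent <= 20:
        return "11-20%"
    if percent <= 50:
        return "21-50%"
    return "51%+"

def summarize(papers):
    """Count how many papers fall into each bin."""
    return Counter(space_class(exp, total) for exp, total in papers)

# Hypothetical (experiment pages, total pages) pairs, not data from the study.
toy_sample = [(0, 12), (1, 20), (3, 20), (6, 20), (12, 20)]
print(summarize(toy_sample))
```

Counting pages rather than judging quality matches the slide's stated assumption that the space given to experiments is a proxy for their importance.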

Observation of Major Categories
• Majority is design and modeling
• The CS samples have lower percentage of empirical work than OE and NC
• Hypothesis testing is rare (4 articles out of 403!)
• Combine hypothesis testing with empirical

Observation of Design Sub-Classes
• Higher percentage with no evaluation for CS vs. NC+OE (43% vs. 14%)
• Software engineering (TSE and TOPLAS) worse than random
• Shows percentage of papers that give 20% or more of their space to experimental evaluation
• Many more NC+OE with 20%+ than in CS

Groupwork: How Experimental is WPI CS?
• Take 2 papers: KDDRG, PEDS, SERG, DSRG, AIDG, GTRG
• Read abstract, flip through
• Categorize:
  – Formal Theory
  – Design and Modelling + Count pages for experiments
  – Empirical
  – Hypothesis Testing
  – Other
• Swap with another group

Outline
• Motivation
• Related Work
• Methodology
• Observations
• Accuracy
• Conclusions
• Future work

Accuracy of Study
• Deals with humans, so subjective
• Psychology techniques to get objective measure
  – Large number of users → Beyond resources (and a lot of work!)
  – Provide papers, so others can provide data
• Systematic errors
  – Classification errors
  – Paper selection bias

Systematic Error: Classification
• Classification differences between 468 article classification pairs
• Classification ambiguity
  – Large between Theory and Design-0% (26%)
  – Design-0% and Other (10%)
  – Design-0% with simulations (20%)
• Counting inaccuracy
  – 15% from counting experiment space differently

Systematic Error: Paper Selection
• Journals may not be representative of CS
  – PLDI proceedings is a ‘case study’ of conferences
• Random sample may not be “random”
  – Influenced by INSPEC database holdings
  – Further influenced by library holdings
• Statistical error if selection within journals does not represent journals

Overall Accuracy (Maximize Distortion)
[Figure panels: No Experimental Evaluation; 20%+ Space for Experiments]
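The idea of comparing classification pairs can be sketched as follows. The slides do not say how the 468 pairs were formed or scored, so this assumes that a pair is two readers' labels for the same article and that a difference is any mismatch between those labels. The function name `disagreement_rate`, the reader names, and the labels are hypothetical toy values, not data from the study.

```python
from itertools import combinations

def disagreement_rate(labels_by_reader):
    """Return (share of differing pairs, number of pairs).

    labels_by_reader maps a reader name to a dict of article id -> class label.
    A pair is formed for every article that two readers both classified
    (an assumption; the slides do not define the pairing).
    """
    pairs = 0
    differing = 0
    for reader_a, reader_b in combinations(sorted(labels_by_reader), 2):
        labels_a = labels_by_reader[reader_a]
        labels_b = labels_by_reader[reader_b]
        # Only articles read by both readers can be compared.
        for article in labels_a.keys() & labels_b.keys():
            pairs += 1
            if labels_a[article] != labels_b[article]:
                differing += 1
    if pairs == 0:
        raise ValueError("no article was classified by two readers")
    return differing / pairs, pairs

# Hypothetical labels for four articles read by two readers.
toy_labels = {
    "reader_1": {"a1": "Theory", "a2": "Design-0%", "a3": "Empirical", "a4": "Other"},
    "reader_2": {"a1": "Design-0%", "a2": "Design-0%", "a3": "Empirical", "a4": "Other"},
}
rate, n_pairs = disagreement_rate(toy_labels)
print(f"{rate:.0%} of {n_pairs} pairs differ")
```

In this toy input the only mismatch is Theory versus Design-0%, the kind of ambiguity the slide reports as the largest.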

Conclusion
• 40% of CS design articles lack experiments
  – Non-CS around 10%
• 70% of CS have less than 20% space
  – NC and OE around 40%
• CS conferences no worse than journals!
• Youth of CS is not to blame
• Experiment difficulty not to blame
  – Harder in physics
  – Psychology methods can help
• Field as a whole neglects importance

Guidelines
• Higher standards for design papers
• Recognize empirical as first class science
• Need more publicly available benchmarks
• Need rules for how to conduct repeatable experiments
• Tenure committees and funding orgs need to recognize work involved in experimental CS
• Look in the mirror
