Software Engineering Janet Siegmund 1 Why Experiments? - PowerPoint PPT Presentation

Controlled Experiments in Software Engineering, Janet Siegmund

Why Experiments? • Programmers spend most of their time comprehending code • In general: [Pie chart: share of programmers' time per activity, with labels including understanding, reading comments, reading documentation, searching by tool, notes, organizational tasks, and human factors]

What are Experiments? • Systematic research study • One or more factors intentionally varied • Everything else held constant • Result of the systematic variation is observed • Here: with human participants

Stages of Experiments • Objective Definition -> Design -> Conduct -> Analysis -> Interpretation • Outputs along the way: hypotheses; independent & dependent variables; experimental design; confounding variables; data; accepted/rejected hypotheses

Outline • Discuss each stage with a running example • Discuss problems and solutions • Goal: – Get a feeling for the design of experiments

//Comments in Source Code • Do they make code more comprehensible? • Do they make code more maintainable? • Do they reduce maintenance costs? • Do they increase development time?

Objective Definition

Independent Variable • Also called factor or predictor (variable) • Intentionally varied • Influences the dependent variable • Example: comments

Operationalization • Finding an operational definition • Define methods and operations to measure the variable • Levels (alternatives), e.g.: – Presence/absence of comments – Good/bad/useless comments

Dependent Variable • Also called response variable • Outcome of the experiment • What is measured • Example: program comprehension

Operationalization • Specify a measure • Program comprehension: – Subjective rating – Solutions to tasks (correctness? response time?) – Think aloud

Hypotheses • Expectations about the outcome • Based on theory or practice -> expectations must be justified • If there are reasons both for and against an outcome, state a research question instead

Hypotheses – Example • Bad comments are bad for program comprehension • Good comments are good for program comprehension

Good/Bad Hypotheses • What are good/bad comments? • What does "good/bad for program comprehension" mean? -> Slower? More errors? By how much? • A hypothesis must be falsifiable – Karl Popper. The Logic of Scientific Discovery. Routledge, 1959.

Better Hypotheses • Comments describing each statement of the source code have no effect on the response time for understanding the source code • Comments containing wrong information about statements slow down comprehension • Comments describing the purpose of statements speed up comprehension

Why Hypotheses? • Why not just measure and see what the result is? – Hypotheses influence the experimental design – Without them, we risk fishing for results

Experimental Design

Validity • Do we measure what we want to measure? • Internal: – Degree to which the value of the dependent variable can be attributed to the manipulation of the independent variable • External: – Degree to which the results of one experiment can be generalized to other participants and settings

Confounding Parameters • Influence the dependent variable in addition to the variation of the independent variable

Confounding Parameters • Problem-solving ability • Programming experience • Culture • Ability • Data consistency • Comprehension model • Education • Occupation • Evaluation apprehension • Color blindness • Attitude • Intelligence • Knowledge • Hawthorne effect • Ordering • Familiarity with study object • Motivation • Content of study object • Fatigue • Familiarity with tools • Instrumentation • Reading time • Treatment • Working memory capacity • Gender • Learning effects • Preference

Controlling for Confounding Variables 1. Randomization 2. Matching 3. Keep the confounding parameter constant 4. Use the confounding parameter as an independent variable 5. Analyze the influence of the confounding parameter on the result

Randomization • Use a random number generator • Roll a die • Toss a coin • …
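The random-number-generator option can be sketched in Java as follows. This is a minimal illustration, assuming ten hypothetical participants P1 to P10 split into two equally sized groups; the class `RandomAssignment` and the seed value are illustrative choices, not part of the original material. It relies only on the standard-library `Collections.shuffle(List, Random)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomAssignment {
    // Shuffles the participants, then puts the first half into group A and the rest into group B.
    static List<List<String>> assign(List<String> participants, long seed) {
        List<String> shuffled = new ArrayList<>(participants);
        // A fixed seed makes the assignment reproducible and easy to document.
        Collections.shuffle(shuffled, new Random(seed));
        int half = (shuffled.size() + 1) / 2;
        List<List<String>> groups = new ArrayList<>();
        groups.add(new ArrayList<>(shuffled.subList(0, half)));               // group A
        groups.add(new ArrayList<>(shuffled.subList(half, shuffled.size()))); // group B
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical participant IDs P1..P10.
        List<String> participants = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            participants.add("P" + i);
        }
        List<List<String>> groups = assign(participants, 42L);
        System.out.println("Group A: " + groups.get(0));
        System.out.println("Group B: " + groups.get(1));
    }
}
```

Recording the seed lets others reproduce the assignment; using `new Random()` without a seed gives a fresh assignment on every run.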

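Matching • Balancing (odd-even-even-odd, ABBA): sort participants by value, then assign them in the order A, B, B, A, A, B, B, A, … • Participants sorted by value: P5 (65), P9 (56), P3 (42), P4 (34), P10 (24), P6 (23), P7 (21), P8 (16), P2 (12), P1 (5) • Group A: 65, 34, 24, 16, 12 • Group B: 56, 42, 23, 21, 5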
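The ABBA scheme from the matching slide can be sketched in Java: sort the participants by the matching value (highest first) and assign them in blocks of four as A, B, B, A. The participant IDs and values come from the slide; the class `AbbaMatching`, the record `Participant`, and reading the value as something like a pretest score are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AbbaMatching {
    // A participant and the value of the matching variable (e.g., a pretest score; an assumption).
    record Participant(String id, int value) {}

    // Sorts by value (highest first) and assigns in blocks of four: A, B, B, A.
    static List<List<Participant>> assign(List<Participant> participants) {
        List<Participant> sorted = new ArrayList<>(participants);
        sorted.sort((p, q) -> Integer.compare(q.value(), p.value()));
        List<Participant> groupA = new ArrayList<>();
        List<Participant> groupB = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            int position = i % 4; // positions 0 and 3 -> A, positions 1 and 2 -> B
            if (position == 0 || position == 3) {
                groupA.add(sorted.get(i));
            } else {
                groupB.add(sorted.get(i));
            }
        }
        return List.of(groupA, groupB);
    }

    public static void main(String[] args) {
        List<Participant> participants = List.of(
                new Participant("P1", 5), new Participant("P2", 12), new Participant("P3", 42),
                new Participant("P4", 34), new Participant("P5", 65), new Participant("P6", 23),
                new Participant("P7", 21), new Participant("P8", 16), new Participant("P9", 56),
                new Participant("P10", 24));
        List<List<Participant>> groups = assign(participants);
        System.out.println("Group A: " + groups.get(0));
        System.out.println("Group B: " + groups.get(1));
    }
}
```

With these values, group A receives P5, P4, P10, P8, P2 (sum 151) and group B receives P9, P3, P6, P7, P1 (sum 147), so the two groups end up with similar totals.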

Keep Parameter Constant • Programming experience – Recruit only students as participants (e.g., only undergraduates or only graduates) – Or recruit only programming experts • Intelligence – Recruit only participants with certain grades

Use Parameter as Independent Variable • Reminder: 2 levels of the independent variable (comment/no comment) • Example: add 2 levels of programming experience, giving 4 combinations: – Comment/low experience – Comment/high experience – No comment/low experience – No comment/high experience
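Turning the confounding parameter into a second factor means crossing its levels with those of the original factor. A minimal Java sketch of that crossing, with `FactorialConditions` as a hypothetical class name, produces the four combinations listed on this slide:

```java
import java.util.ArrayList;
import java.util.List;

class FactorialConditions {
    // Combines every level of the first factor with every level of the second (fully crossed design).
    static List<String> cross(List<String> factorA, List<String> factorB) {
        List<String> conditions = new ArrayList<>();
        for (String a : factorA) {
            for (String b : factorB) {
                conditions.add(a + "/" + b);
            }
        }
        return conditions;
    }

    public static void main(String[] args) {
        List<String> comments = List.of("Comment", "No comment");
        List<String> experience = List.of("low experience", "high experience");
        // Prints the 2 x 2 = 4 conditions, one per line.
        cross(comments, experience).forEach(System.out::println);
    }
}
```

Adding a third two-level factor would double the number of conditions again, which is why each extra factor also raises the number of participants needed.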

Analyze Influence of Parameter on Result • When we cannot assign participants to groups, for example, when comparing two companies • When something happened during the experiment, e.g., a power failure in one session but not in another

Validity • Internal and external validity have competing requirements: – Internal: control everything – External: a broad setting, so that we can generalize • First, maximize internal validity • Then, increase external validity step by step

Experimental Designs • One-factorial designs: – One group, two sessions: Session 1: comment; Session 2: no comment -> ordering effects – Two groups, one session: Group A: comment; Group B: no comment -> requires comparable groups – Two groups, two sessions: Group A: comment, then no comment; Group B: no comment, then comment -> learning effects, mortality (participants dropping out)

Experimental Designs • Two-factorial designs; group assigned to each condition in Sessions 1, 2, 3, 4: – Comment/low experience: Group A, Group D, Group C, Group B – Comment/high experience: Group B, Group A, Group D, Group C – No comment/low experience: Group C, Group B, Group A, Group D – No comment/high experience: Group D, Group C, Group B, Group A
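The rotation in the two-factorial design is a cyclic Latin square: in session j, condition i is assigned group (i − j) mod 4, so every group receives every condition exactly once and every session covers all four conditions. A minimal Java sketch that reproduces the table above, with `LatinSquare` as a hypothetical class name:

```java
class LatinSquare {
    // Row i = condition, column j = session; entry = group letter.
    // Group (i - j) mod n yields a cyclic Latin square: each group appears once per row and per column.
    static char[][] build(int n) {
        char[][] square = new char[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                square[i][j] = (char) ('A' + Math.floorMod(i - j, n));
            }
        }
        return square;
    }

    public static void main(String[] args) {
        String[] conditions = {
            "Comment/low experience", "Comment/high experience",
            "No comment/low experience", "No comment/high experience"
        };
        char[][] square = build(conditions.length);
        for (int i = 0; i < conditions.length; i++) {
            // Groups for sessions 1 to 4, e.g. "Comment/low experience: ADCB".
            System.out.println(conditions[i] + ": " + new String(square[i]));
        }
    }
}
```

A cyclic square balances which condition each group gets in each session, but each condition is always followed by the same next condition; fully counterbalancing carry-over effects would need a different square.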

Conduct

What Can Go Wrong? • Everything! • Conduct pilot tests • Test the material, tools, and data storage • Tell participants exactly what they have to do • Check that participants do what they were instructed to do • Make backups of the data

Ethics • Be nice to your participants; they invest their time for you voluntarily • Guarantee anonymity • Make sure that the benefit for science is worth the participants' effort • When in doubt, talk to your local ethics committee

Analysis

Experimental Data
Response times [s]:
• Group A (no comment): 42, 60, 30, 77, 58, 49, 38
• Group B (comment): 48, 48, 26, 30, 50, 34
Code shown to Group A (no comment):
    public static void main(String[] args) {
        String word = "Hello";
        String result = new String();
        for (int j = word.length() - 1; j >= 0; j--)
            result = result + word.charAt(j);
        System.out.println(result);
    }
Code shown to Group B (comment):
    public static void main(String[] args) {
        String word = "Hello";
        String result = new String();
        //reverse character order
        for (int j = word.length() - 1; j >= 0; j--)
            result = result + word.charAt(j);
        System.out.println(result);
    }

Descriptive Statistics • What do we do with these data? • Look at the data • Mean/average (= arithmetic mean) • Median • Standard deviation • Boxplots

Median • Group A (no comment), sorted: 30, 38, 42, 49, 58, 60, 77 -> median: 49 • Group B (comment), sorted: 26, 30, 34, 48, 48, 50 -> even number of values: – Median (variant 1, mean of the two middle values): (34 + 48)/2 = 41 – Median (variant 2, lower middle value): 34
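Both median variants can be sketched in Java as follows; `Median`, `medianAverage`, and `medianLower` are hypothetical names. For an odd number of values both variants return the middle value, so they differ only for even-sized groups such as Group B.

```java
import java.util.Arrays;

class Median {
    // Variant 1: with an even number of values, average the two middle values.
    static double medianAverage(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Variant 2: with an even number of values, take the lower of the two middle values.
    static double medianLower(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        return s[(s.length - 1) / 2];
    }

    public static void main(String[] args) {
        double[] groupA = {42, 60, 30, 77, 58, 49, 38}; // no comment, seconds
        double[] groupB = {48, 48, 26, 30, 50, 34};     // comment, seconds
        System.out.println("A: " + medianAverage(groupA));
        System.out.println("B, variant 1: " + medianAverage(groupB));
        System.out.println("B, variant 2: " + medianLower(groupB));
    }
}
```

For the response times, both functions return 49 for Group A; for Group B, `medianAverage` returns 41 and `medianLower` returns 34.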

Standard Deviation
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
• Group A: $\bar{x} = 50.6$, $s = 15.8$ • Interval $[\bar{x} - s;\ \bar{x} + s]$: 34.8 – 66.4 • Figure: http://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg
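A minimal Java sketch of the mean and the sample standard deviation for Group A; `StandardDeviation` is a hypothetical class name. It divides by n − 1, the convention statistics tools use by default and the one behind the pooled standard deviation in the t test and Cohen's d; dividing by n instead gives about 14.6 for Group A.

```java
import java.util.Locale;

class StandardDeviation {
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Sample standard deviation: squared deviations from the mean, summed and divided by n - 1.
    static double sd(double[] x) {
        double m = mean(x);
        double squares = 0;
        for (double v : x) squares += (v - m) * (v - m);
        return Math.sqrt(squares / (x.length - 1));
    }

    public static void main(String[] args) {
        double[] groupA = {42, 60, 30, 77, 58, 49, 38}; // no comment, seconds
        double m = mean(groupA);
        double s = sd(groupA);
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.printf(Locale.ROOT, "mean = %.1f, s = %.1f, [mean - s; mean + s] = [%.1f; %.1f]%n",
                m, s, m - s, m + s);
    }
}
```

This prints a mean of 50.6, s = 15.8, and the interval [34.8; 66.4].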

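Boxplot • Box: middle 50% of all values • Line: median • Whiskers: upper and lower 25% of the data (excluding outliers) • Dots: – Outliers (= values that deviate too much from the mean/median) – What is too much? – 1.5/2 standard deviations (standard boxplots use 1.5 × the interquartile range beyond the box)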
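A Java sketch of the values behind a boxplot. Note that it swaps in the 1.5 × interquartile-range rule (Tukey's fences) commonly used for boxplots rather than a standard-deviation rule, and that it computes the quartiles as medians of the lower and upper halves of the data; statistics tools use several slightly different quartile definitions. The class `BoxplotStats` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BoxplotStats {
    // Median of the sorted sub-range s[from..to).
    static double median(double[] s, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return n % 2 == 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
    }

    // Returns {Q1, median, Q3}; the overall median is excluded from both halves when n is odd.
    static double[] quartiles(double[] values) {
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;
        return new double[] {median(s, 0, n / 2), median(s, 0, n), median(s, (n + 1) / 2, n)};
    }

    // Values more than 1.5 x IQR below Q1 or above Q3 are drawn as outlier dots.
    static List<Double> outliers(double[] values) {
        double[] q = quartiles(values);
        double iqr = q[2] - q[0];
        double low = q[0] - 1.5 * iqr;
        double high = q[2] + 1.5 * iqr;
        List<Double> result = new ArrayList<>();
        for (double v : values) {
            if (v < low || v > high) result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] groupA = {42, 60, 30, 77, 58, 49, 38}; // no comment, seconds
        System.out.println("Q1, median, Q3: " + Arrays.toString(quartiles(groupA)));
        System.out.println("Outliers: " + outliers(groupA));
    }
}
```

For Group A, the quartiles are 38, 49, and 60, and no value lies outside the fences (5 and 93), so the boxplot would show no dots.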

Statistical Tests • When is a difference real rather than coincidental? – Mean of A: 50.57 s – Mean of B: 39.33 s • Assumption: both means are the same (= null hypothesis, H0) • Conditional probability: the probability of the observed result (or a more extreme one), assuming that the means are the same • If this probability is low, we reject the assumption – Typical thresholds: 1%, 5% – Possible: 10%
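The idea of a probability computed under H0 can be made concrete with an exact permutation test, sketched in Java below. This is a swapped-in technique for illustration, not one of the tests on these slides: if H0 holds, the group labels are interchangeable, so the p value is the share of all possible regroupings of the 13 response times whose mean difference is at least as large as the observed one (about 11.2 s). The class `PermutationTest` is a hypothetical name.

```java
class PermutationTest {
    // Exact two-sided permutation test for a difference in means.
    // Enumerates all 2^n bit masks, so it is only practical for small samples (n well below 31).
    static double pValue(double[] a, double[] b) {
        int nA = a.length;
        int n = a.length + b.length;
        double[] pooled = new double[n];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);

        double total = 0;
        for (double v : pooled) total += v;
        double sumA = 0;
        for (double v : a) sumA += v;
        double observed = Math.abs(meanDiff(sumA, total, nA, n));

        long extreme = 0;
        long splits = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            // Only masks that select exactly nA values form a valid relabelling.
            if (Integer.bitCount(mask) != nA) continue;
            double s = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) s += pooled[i];
            }
            splits++;
            // Small tolerance so that splits tying the observed difference count as extreme.
            if (Math.abs(meanDiff(s, total, nA, n)) >= observed - 1e-9) extreme++;
        }
        return (double) extreme / splits;
    }

    // Mean of the selected group minus the mean of the remaining values.
    static double meanDiff(double sumA, double total, int nA, int n) {
        return sumA / nA - (total - sumA) / (n - nA);
    }

    public static void main(String[] args) {
        double[] a = {42, 60, 30, 77, 58, 49, 38}; // group A (no comment), seconds
        double[] b = {48, 48, 26, 30, 50, 34};     // group B (comment), seconds
        System.out.println("p = " + pValue(a, b));
    }
}
```

The observed grouping is itself one of the enumerated splits, so the p value can never be zero; with 13 values there are 1716 ways to form a group of 7.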

Common Tests • t test: – Metric data (e.g., response time) – Normally distributed data • Mann-Whitney U test: – Ordinal data (e.g., rankings, grades) – Metric data that are not normally distributed • χ² test: – Nominal data (e.g., gender, party membership)
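For the Mann-Whitney U test, the test statistic can be sketched in Java: rank all values together (tied values get the average of their ranks), sum the ranks of group A, and subtract n_A(n_A + 1)/2. The sketch stops at the statistic; the p value then comes from tables or a normal approximation in a statistics tool. `MannWhitneyU` is a hypothetical class name.

```java
import java.util.Arrays;

class MannWhitneyU {
    // Returns {U_A, U_B}. Tied values receive the average of the ranks they occupy.
    static double[] uStatistics(double[] a, double[] b) {
        double[] pooled = new double[a.length + b.length];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);
        Arrays.sort(pooled);
        double rankSumA = 0;
        for (double v : a) rankSumA += averageRank(pooled, v);
        double uA = rankSumA - a.length * (a.length + 1) / 2.0;
        double uB = (double) a.length * b.length - uA;
        return new double[] {uA, uB};
    }

    // Average 1-based rank of v in a sorted array.
    static double averageRank(double[] sorted, double v) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) {
                if (first < 0) first = i;
                last = i;
            }
        }
        return (first + last) / 2.0 + 1;
    }

    public static void main(String[] args) {
        double[] a = {42, 60, 30, 77, 58, 49, 38}; // group A (no comment), seconds
        double[] b = {48, 48, 26, 30, 50, 34};     // group B (comment), seconds
        double[] u = uStatistics(a, b);
        System.out.println("U_A = " + u[0] + ", U_B = " + u[1] + ", U = " + Math.min(u[0], u[1]));
    }
}
```

For the response times, U_A = 30.5 and U_B = 11.5 (they always sum to n_A · n_B = 42); the smaller value, 11.5, is the one usually reported as U.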

t Test • Values of interest: – p value: smaller or larger than 0.05? – t value and degrees of freedom (df): needed when you report the test • p > 0.05 -> no significant difference • p ≤ 0.05 -> significant difference
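The t value and degrees of freedom can be computed with a short Java sketch, assuming Student's two-sample t test with pooled variance (i.e., equal variances in both groups). The p value needs the cumulative distribution function of the t distribution, which the Java standard library does not provide, so that step is left to a statistics tool. `TTest` is a hypothetical class name.

```java
import java.util.Locale;

class TTest {
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Sum of squared deviations from the mean.
    static double sumOfSquares(double[] x, double mean) {
        double ss = 0;
        for (double v : x) ss += (v - mean) * (v - mean);
        return ss;
    }

    static int degreesOfFreedom(double[] a, double[] b) {
        return a.length + b.length - 2;
    }

    // Student's two-sample t statistic with pooled variance (assumes equal variances).
    static double tStatistic(double[] a, double[] b) {
        double ma = mean(a);
        double mb = mean(b);
        double pooledVariance = (sumOfSquares(a, ma) + sumOfSquares(b, mb)) / degreesOfFreedom(a, b);
        double standardError = Math.sqrt(pooledVariance * (1.0 / a.length + 1.0 / b.length));
        return (ma - mb) / standardError;
    }

    public static void main(String[] args) {
        double[] a = {42, 60, 30, 77, 58, 49, 38}; // group A (no comment), seconds
        double[] b = {48, 48, 26, 30, 50, 34};     // group B (comment), seconds
        System.out.printf(Locale.ROOT, "t(%d) = %.2f%n", degreesOfFreedom(a, b), tStatistic(a, b));
    }
}
```

For the response times this prints t(11) = 1.48. The two-sided critical value for α = 0.05 and df = 11 is about 2.20, so p > 0.05 here and the difference is not significant.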

Interpretation of the t Test • If p > 0.05: we reject the hypothesis that comments speed up comprehension • If p ≤ 0.05: – We have not confirmed the hypothesis – We just did not find any evidence against it – Hence, we do not say that we confirmed the hypothesis, but that we can accept it – (More precisely: we can reject the null hypothesis)

Effect Size • Is a difference of 11 seconds a large effect? • The appropriate measure depends on the data • Metric data (e.g., response time): Cohen's d
$$ d = \frac{\bar{x}_a - \bar{x}_b}{s_{\text{pooled}}} = 0.82 $$
• 0.2 – 0.5: weak effect • 0.5 – 0.8: medium effect • > 0.8: large effect
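A Java sketch of Cohen's d for the response times, assuming the usual pooled standard deviation $s_{\text{pooled}} = \sqrt{((n_a-1)s_a^2 + (n_b-1)s_b^2)/(n_a+n_b-2)}$. The class `CohensD` and its `label` helper are hypothetical; the thresholds follow this slide, and how values exactly at 0.5 or 0.8 are labelled is an assumption.

```java
import java.util.Locale;

class CohensD {
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Sum of squared deviations from the mean, i.e. (n - 1) * s^2.
    static double sumOfSquares(double[] x, double mean) {
        double ss = 0;
        for (double v : x) ss += (v - mean) * (v - mean);
        return ss;
    }

    // d = (mean_a - mean_b) / s_pooled
    static double d(double[] a, double[] b) {
        double ma = mean(a);
        double mb = mean(b);
        double pooledSd = Math.sqrt((sumOfSquares(a, ma) + sumOfSquares(b, mb)) / (a.length + b.length - 2));
        return (ma - mb) / pooledSd;
    }

    // Thresholds from the slide; boundary handling is an assumption.
    static String label(double d) {
        double m = Math.abs(d);
        if (m > 0.8) return "large";
        if (m >= 0.5) return "medium";
        if (m >= 0.2) return "weak";
        return "very weak (< 0.2)";
    }

    public static void main(String[] args) {
        double[] a = {42, 60, 30, 77, 58, 49, 38}; // group A (no comment), seconds
        double[] b = {48, 48, 26, 30, 50, 34};     // group B (comment), seconds
        double effect = d(a, b);
        System.out.printf(Locale.ROOT, "d = %.2f (%s)%n", effect, label(effect));
    }
}
```

For the response times this prints d = 0.82 (large): a large effect, even though the t test above does not reach significance with these small groups.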
