We need more coverage, stat! Classroom experience with the Software ICU
Philip Johnson, Shaoxuan Zhang (University of Hawaii)
Presentation by Sandro Heinzelmann
Software Engineering Seminar 2010, April 27, 2010
Teaching software measurement - Motivation
• Not easy
• Tradeoff between too much work and too little insight
• Personal Software Process (PSP) / Team Software Process (TSP) versus a simple literature review
• → Find a balance by using automation tools: the Software ICU, built on Hackystat and Hudson
Hackystat
• Open-source project initiated by Philip Johnson
• Collection of services
• Enables subtle, unobtrusive data collection in various development tools (Eclipse, Ant, ...)
• Notion of sensors integrated into applications
  – Keep track of work, send data to the Hackystat SensorBase (see the sketch after this slide)
• Layer of analysis modules
• Web interface to display data
code.google.com/p/hackystat/
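To make the sensor idea concrete, below is a minimal sketch of what such a sensor conceptually does: record a development event and ship it to the SensorBase over HTTP. This is an assumption-laden illustration, not the real sensor SDK; the endpoint path, payload shape, and all names are made up.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch of a Hackystat-style sensor. Endpoint, XML layout, and names
// are illustrative assumptions only.
public class ToySensor {

    private static final String SENSORBASE = "http://localhost:9876/sensorbase"; // assumed host/port

    static void send(String owner, String tool, String type) throws Exception {
        // Hypothetical payload; real sensors use Hackystat's SensorData schema.
        String xml = "<SensorData>"
                + "<Owner>" + owner + "</Owner>"
                + "<Tool>" + tool + "</Tool>"
                + "<SensorDataType>" + type + "</SensorDataType>"
                + "<Timestamp>" + java.time.OffsetDateTime.now() + "</Timestamp>"
                + "</SensorData>";
        HttpURLConnection conn =
                (HttpURLConnection) new URL(SENSORBASE + "/sensordata/" + owner).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("SensorBase responded: " + conn.getResponseCode());
    }

    public static void main(String[] args) throws Exception {
        // e.g. an IDE plugin would call this when a file is edited
        send("student@hawaii.edu", "Eclipse", "DevEvent");
    }
}

A real sensor would additionally buffer events and authenticate; the sketch only shows the send-on-event idea behind the "unobtrusive collection" claim above.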
Hackystat in the past
• Continuously improved over time
• Used in case studies in 2003 and 2006, with varying success
• Hard to install; confusion about the various measurements and their interpretation
• New approach with a medical metaphor: the Software Intensive Care Unit (ICU)
Software health metaphor
• Terminology of "health"
• Not new: "runtime health" of life-critical hardware-software systems (NASA)
• Here the focus is on health during development
• Notion of vital signs and their normal ranges
  – Normal or improving → healthy
  – Interpreted as a whole
Software health metaphor
• High-level characteristics of a healthy project
  – High efficiency, high effectiveness, high quality
• "Healthy programmer behavior"
  – Work consistently, contribute equally, commit regularly, no last-minute rushes, ...
Vital signs
• Coverage
• Commits
• Complexity
• Unit tests
• Coupling
• Size
• Churn
• Dev time
• Builds
→ Research hypotheses
Vital sign interpretation
• Normal ranges and coloring defined by the current value as well as trends
• Thresholds and methods can be configured (a sketch of one such rule follows below)

  Vital sign | Healthy when
  Coverage   | high or increasing
  Dev time   | ≥ 50% of the members commit, with commits on ≥ 50% of the days in the project interval
  Size       | no interpretation (color white)
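A minimal sketch of how such a configurable coloring rule could look, here for the "high or increasing" coverage rule. The threshold value, the crude endpoint-based trend test, and the class names are assumptions for illustration, not Hackystat's actual algorithm.

import java.util.List;

// Illustrative vital-sign coloring: green when the latest value is above
// a configurable threshold or the trend over the interval is upward.
public class VitalSignColoring {
    public enum Color { GREEN, YELLOW, RED, WHITE }

    // dailyValues: one value per day, e.g. coverage percentages
    // healthyThreshold: assumed configurable threshold, e.g. 80.0
    public static Color colorOf(List<Double> dailyValues, double healthyThreshold) {
        if (dailyValues.isEmpty()) {
            return Color.WHITE; // no data: no interpretation
        }
        double latest = dailyValues.get(dailyValues.size() - 1);
        double first = dailyValues.get(0);
        boolean high = latest >= healthyThreshold;
        boolean increasing = latest > first; // crude trend: compare endpoints
        if (high || increasing) {
            return Color.GREEN; // "high or increasing" => healthy
        }
        return latest < first ? Color.RED : Color.YELLOW; // declining vs. flat but low
    }

    public static void main(String[] args) {
        System.out.println(colorOf(List.of(55.0, 60.0, 72.0), 80.0)); // GREEN (increasing)
        System.out.println(colorOf(List.of(70.0, 65.0, 40.0), 80.0)); // RED (declining)
    }
}

In the real ICU these thresholds and trend methods are configurable per project; the hard-coded values here stand in for that configuration.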
ICU display
• Current value as well as trend lines
[Screenshot: code.google.com/p/hackystat/]
Drill-downs
• Detailed, per-member view of vital signs
Research questions
• What are the strengths and weaknesses of the medical ICU metaphor for teaching software measurement in a classroom setting?
• How appropriate were the choices of "vital signs"?
• How effective were the algorithms for coloring the vital signs?
• How does this approach compare to previous uses of Hackystat to teach software metrics in a classroom setting?
Study setting
• 18 students in a senior-level undergraduate software engineering course
• Course about open-source development in Java
• ICU introduced in the final 4 weeks
• Hackystat log data
• Online survey during the last week, 17 questions:
  – Installation overhead
  – Overhead of sensor use
  – Problems encountered during use
  – Frequency of use
  – Privacy
  – Useful vital signs
  – Usefulness in an industrial setting
Results - misc
• Privacy: mixed but generally positive feelings (from "no problem" to "hacky-stalk")
• Overhead: easier than in earlier versions, though varying from tool to tool; sensor sending sometimes slow
• Frequency of use
Results – vital signs
[Bar chart: "Vital sign usefulness" per vital sign, scale 0-20]
• Coloring generally seen as accurate, with some general drawbacks
• ICU and the drill-downs in particular were useful for reacting to poor health and managing the team
Results – industrial possibilities
• Generally considered a good idea
• But:
  – Does not include non-IDE work (like reading a technical book)
  – Algorithms can never fully judge the health of a program in all contexts
Discussion and conclusions
• Significantly better results than in previous Hackystat studies
• ICU metaphor is useful for interpreting and understanding measurements
  – No more "pretty squiggly lines"
  – Coloring encourages thinking about validity
• ICU provides a layer of abstraction
  – Normal ranges must be chosen carefully!
  – Too lenient an interpretation leads to oversight
  – Too strict an interpretation leads to "boy who cried wolf" syndrome
• Vital sign ranges need further tweaking
• Dangerous weakness: measurement dysfunction
Measurement dysfunction
Gaming measures competitively in order to look good in a performance evaluation
• Individual measurements did not contribute to the grade
• Data was visible only to the assistant; the professor had only anonymized data and saw the survey only after the semester
• And yet: at least one group had major problems
  "I need more dev time because I need an A"
  "oh if he ups his stats more than mine, tomorrow I'm gonna hack all day"
• → compromised work as a team
Threats to validity
In the paper:
• Small sample size (18 students)
• Short duration, small project size
• Subjects with very similar backgrounds (senior computer science students)
• Wrong demographic for the "industry" questions
Personal:
• Relatively short survey
• Students unfamiliar with software measurement
Future directions
• Refine vital signs and ranges
  – More research
  – "Crowd-sourcing"
• Use in more environments
  – Industry
  – Different project types/languages/IDEs
• Game-based approach
  – "Devcathlon"
Personal:
• Comparative studies versus other measurement techniques (PSP/TSP)
Questions?
Appendix - Hudson
• Continuous integration tool developed by Kohsuke Kawaguchi
• Builds and tests projects after every commit
• Used in the Software ICU for the coverage, coupling, and complexity measurements
Appendix: PSP
• "Disciplined, data-driven procedure"
• Level-based approach: PSP0 to PSP2.1
• Use "historical" data/experience from previous levels to detect repeated defects
• Requires programmers to log their activities (a lot of manual data collection required, even with tool support)
• Many measures collected and derived: estimation accuracy (size/time), prediction intervals (size/time), time-in-phase distribution, defect injection distribution, defect removal distribution, productivity, reuse percentage, cost performance index, planned value, earned value, etc. (a toy calculation of two of these follows below)
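To make two of these derived measures concrete, here is a toy calculation. The formulas follow the common textbook PSP-style definitions (an assumption; the exact course variants may differ), and all names and numbers are made up.

// Toy illustration of two PSP-style derived measures:
// size estimation error and earned value.
public class PspMeasures {

    // Estimation error: how far the actual size/time deviated from the plan.
    static double estimationErrorPercent(double estimated, double actual) {
        return 100.0 * (actual - estimated) / estimated;
    }

    // Earned value: each task is worth its share of total planned hours;
    // a task "earns" its value only when it is 100% complete.
    static double earnedValuePercent(double[] plannedHours, boolean[] completed) {
        double total = 0, earned = 0;
        for (double h : plannedHours) total += h;
        for (int i = 0; i < plannedHours.length; i++) {
            if (completed[i]) earned += plannedHours[i];
        }
        return 100.0 * earned / total;
    }

    public static void main(String[] args) {
        // Estimated 200 LOC, wrote 260: overran the estimate by 30%.
        System.out.printf("size error: %.0f%%%n", estimationErrorPercent(200, 260));
        // Three tasks planned at 4h, 2h, 6h; the first two are done: 50% earned.
        double ev = earnedValuePercent(new double[]{4, 2, 6}, new boolean[]{true, true, false});
        System.out.printf("earned value: %.0f%%%n", ev);
    }
}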
Appendix: Complexity
• Authors hint at 2 methods: Halstead complexity measures and McCabe's cyclomatic complexity
• Judging from the configuration site, the ICU uses JavaNCSS, which computes cyclomatic complexity:
  – Uses the flow graph of the program
  – Counts the number of linearly independent paths through the program (basis path testing)
  – M = E − N + 2P, where
      M = cyclomatic complexity
      E = number of edges of the graph
      N = number of nodes of the graph
      P = number of connected components
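A small worked example (the method and its name are made up for illustration): a single if/else yields two linearly independent paths.

public class CyclomaticExample {
    // Flow graph of classify (P = 1): N = 4 nodes (condition, then-branch,
    // else-branch, exit) and E = 4 edges (condition->then, condition->else,
    // then->exit, else->exit), so M = E - N + 2P = 4 - 4 + 2 = 2.
    // Equivalently: one decision point + 1 = 2 independent paths.
    static String classify(int coverage) { // hypothetical example method
        if (coverage >= 80) {
            return "healthy";
        } else {
            return "at risk";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(85)); // healthy
        System.out.println(classify(40)); // at risk
    }
}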