
Context: Defect Detection Task
Alessio Ferrari, ISTI-CNR, Pisa, Italy
alessio.ferrari@isti.cnr.it

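Context
Task T: defect detection in natural language requirements is a classification problem (many, actually).
The problems differ along two dimensions: the type of classification problem (binary or multi-class) and the output granularity (the whole requirement R, or chunks of R).
- Binary, requirement level: R is classified as defective or not defective.
- Multi-class, requirement level: R is classified as anaphoric ambiguity, coordination ambiguity, vagueness, or not defective.
- Binary, chunk level: the chunks of R are classified as defective chunks or not defective chunks.
- Multi-class, chunk level: the chunks of R are classified as anaphoric ambiguity chunk, coordination ambiguity chunk, vagueness chunk, or not defective chunks.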
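To make the relation between the label spaces concrete, here is a minimal sketch. The names `DefectLabel`, `to_binary` and `requirement_label` are hypothetical, and the rule "R is defective if any of its chunks is defective" is an assumption made for illustration, not something stated in the slides.

```python
from enum import Enum


class DefectLabel(Enum):
    """Multi-class labels from the slide (hypothetical encoding)."""
    ANAPHORIC_AMBIGUITY = "anaphoric ambiguity"
    COORDINATION_AMBIGUITY = "coordination ambiguity"
    VAGUENESS = "vagueness"
    NOT_DEFECTIVE = "not defective"


def to_binary(label: DefectLabel) -> bool:
    """Collapse a multi-class label into the binary defective / not defective one."""
    return label is not DefectLabel.NOT_DEFECTIVE


def requirement_label(chunk_labels: list[DefectLabel]) -> bool:
    """Assumed aggregation from chunk level to requirement level:
    R is defective if at least one of its chunks is defective."""
    return any(to_binary(label) for label in chunk_labels)


print(requirement_label([DefectLabel.NOT_DEFECTIVE, DefectLabel.VAGUENESS]))
```

Other aggregation rules are possible; the point is only that a multi-class, chunk-level output can always be reduced to the binary, requirement-level one, but not the other way round.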


Recall vs Precision
Of course, recall counts more than precision for T (β > 1). But how much more? The answer depends on the relative cost of the two kinds of error, which should take into account the time needed to discard false positives, the impact of false negatives on the development process, etc.
Let's imagine I managed to compute β = 1.7 for T with the overview method, which focuses on time aspects.
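For reference, the standard F-measure combines P and R so that recall is weighted β times as much as precision: β = 1 gives the usual F1, and β > 1 favours recall, as assumed for T.

```latex
F_\beta = \frac{(1 + \beta^2)\, P \, R}{\beta^2 P + R},
\qquad \beta = 1.7 \;\Rightarrow\; \beta^2 = 2.89
```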


My tool t for T
I develop my tool t for T, and I find that t has P = 0.6, R = 0.9, F_1.7 = 0.8.
What can I say? Is t GOOD or BAD?
Let's say I have a Gold Standard of 100 requirements, 60 of which are defective. If we do the math for t, we have TP = 54, FP = 36, FN = 6, TN = 4.
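The arithmetic above can be checked with a short sketch. The helper names `f_beta` and `confusion_from_pr` are hypothetical; the code simply inverts the definitions of recall (TP / (TP + FN)) and precision (TP / (TP + FP)) for the given Gold Standard sizes.

```python
def f_beta(p: float, r: float, beta: float) -> float:
    """F-measure weighting recall beta times as much as precision."""
    if p == 0 and r == 0:
        return 0.0  # convention: F is taken as 0 when both P and R are 0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def confusion_from_pr(p: float, r: float, n: int, positives: int) -> dict:
    """Recover TP, FP, FN, TN from P, R and the Gold Standard sizes."""
    tp = round(r * positives)   # recall = TP / (TP + FN)
    fn = positives - tp
    flagged = round(tp / p)     # precision = TP / (TP + FP)
    fp = flagged - tp
    tn = (n - positives) - fp   # remaining non-defective requirements
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


counts = confusion_from_pr(p=0.6, r=0.9, n=100, positives=60)
print(counts)                                # {'TP': 54, 'FP': 36, 'FN': 6, 'TN': 4}
print(round(f_beta(0.6, 0.9, beta=1.7), 2))  # 0.8
```

Rounding is used only to absorb floating-point noise; it assumes P and R were computed from integer counts in the first place.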

What about a tool that returns all requirements as defective?
Consider another, imaginary tool t′, called "All Defects", that returns all requirements as defective. On the same Gold Standard (100 requirements, 60 of which are defective), t′ has P = 0.6, R = 1, F_1.7 = 0.85 → my tool t (F_1.7 = 0.8) is BAD!
Evaluation depends on the GOLD STANDARD.
Evaluation is useless if I do not consider other BASELINES.
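A minimal sketch of the comparison, starting from the confusion counts of the two tools on the 100-requirement Gold Standard. The helper names are hypothetical; only the counts and β = 1.7 come from the slides.

```python
def f_beta(p: float, r: float, beta: float) -> float:
    """F-measure weighting recall beta times as much as precision."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from confusion counts (0/0 taken as 0)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


tools = {
    "t": precision_recall(tp=54, fp=36, fn=6),
    # "All Defects" flags all 100 requirements: every defect is found,
    # and all 40 non-defective requirements become false positives
    "All Defects": precision_recall(tp=60, fp=40, fn=0),
}

for name, (p, r) in tools.items():
    print(f"{name}: P={p:.2f} R={r:.2f} F1.7={f_beta(p, r, 1.7):.2f}")
# t: P=0.60 R=0.90 F1.7=0.80
# All Defects: P=0.60 R=1.00 F1.7=0.85
```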

Baseline: "All Defects"
Equivalent to doing the task manually: I have to check all the requirements.
P = defective / all, R = defective / defective = 1
Baseline: "No Defect"
Equivalent to not doing the task at all: I assume that the requirements are correct.
P = 0 (strictly 0/0, taken as 0), R = 0
...to compare a tool for T with this baseline, the F-measure is not sufficient, although not doing the task is an option! (Ask me later, I have hidden slides.)
Other baselines are possible, e.g., HAHR, a random predictor, existing tools.
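The baselines can be computed directly from the Gold Standard sizes. This is a sketch under stated assumptions: the function names are hypothetical, and the random predictor is assumed to flag each requirement independently with probability `q`, so its figures are derived from expected counts rather than from one actual run.

```python
def precision_recall(tp: float, fp: float, fn: float) -> tuple[float, float]:
    """Precision and recall; 0/0 is taken as 0, as for the "No Defect" baseline."""
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    return p, r


def baselines(n: int, defective: int, q: float = 0.5) -> dict[str, tuple[float, float]]:
    clean = n - defective
    return {
        # flags every requirement: all defects found, every clean one is a fp
        "All Defects": precision_recall(tp=defective, fp=clean, fn=0),
        # flags nothing: every defect is a fn
        "No Defect": precision_recall(tp=0, fp=0, fn=defective),
        # hypothetical random predictor, expected counts with flag probability q
        "Random": precision_recall(tp=q * defective, fp=q * clean,
                                   fn=(1 - q) * defective),
    }


for name, (p, r) in baselines(n=100, defective=60).items():
    print(f"{name}: P={p:.2f} R={r:.2f}")
```

Since P = R = 0 for "No Defect", its F-measure is 0 for every β, so F alone cannot say when doing nothing would actually be cheaper; the random predictor's expected precision equals the share of defective requirements, whatever `q` is.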

What do they do in NLP?
Shared task: a competition in which the datasets are provided by the organisers.
Shared tasks have been held at CoNLL (Conference on Computational Natural Language Learning, CORE rank A) since 1999. They address fundamental NLP tasks, ranging from chunking (NP, VP) to discourse parsing (relations).
Example: Shallow Discourse Parsing (CoNLL 2015), with three sets of data:
- Training: the one you should use to train your system.
- Development: to tune the system; closer to the blind test set.
- Blind test: you deploy the system on the remote machine, and the organisers run it on this blind test set for the final ranking.

Evaluation Measures?
The winning tool is the one with the highest F-measure on the blind test set.
For some tasks, e.g., grammatical error correction (CoNLL 2014), they used F_0.5, which weights precision twice as much as recall (β = 0.5).
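How much does the choice of β change a ranking? A small sweep over β (the values are chosen only for illustration) for tool t (P = 0.6, R = 0.9) and the "All Defects" baseline (P = 0.6, R = 1) suggests that here it does not change at all: with equal precision, F_β grows with recall for every β > 0, so "All Defects" wins whether one uses F_0.5, F_1 or F_1.7.

```python
def f_beta(p: float, r: float, beta: float) -> float:
    """F-measure weighting recall beta times as much as precision."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


TOOL_T = (0.6, 0.9)       # P, R of tool t
ALL_DEFECTS = (0.6, 1.0)  # P, R of the "All Defects" baseline

for beta in (0.5, 1.0, 1.5, 1.7, 2.0):
    f_t = f_beta(*TOOL_T, beta)
    f_all = f_beta(*ALL_DEFECTS, beta)
    winner = "All Defects" if f_all > f_t else "t"
    print(f"beta={beta}: F(t)={f_t:.3f}  F(All Defects)={f_all:.3f}  -> {winner}")
```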

My Humble Opinion
The choice of β does not matter that much if you have a shared Gold Standard against which different tools can be evaluated.
As long as we do not have a shared Gold Standard for defect detection, it is useful to build up knowledge with industrial case studies and to try to increase P and R as much as possible.
Choose β = 1.5, if you really need it.

My Humble Opinion
Provide lessons learned instead of numbers only, since there are several contextual factors:
- People learn new defects when using a tool.
- The tool often performs only part of the defect detection task.
- The tool may not be qualified → manual inspection is needed.
- Defects require different vetting effort.
- Different defects may have different costs.

Hidden Slide: Cost-based Evaluation...

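What if I do not have the data to compute β?
I assume that the COST of a fn is N times the cost of a fp. How large must N be to make T preferable to the baselines?
Cost of each outcome (V = cost of a requirement flagged as defective by the tool, whether tp or fp):
- Gold Standard defective, tool says defective (tp): V
- Gold Standard defective, tool says not defective (fn): N × V
- Gold Standard not defective, tool says defective (fp): V
- Gold Standard not defective, tool says not defective (tn): 0
$$C = (fp + tp)\,V + fn\,(N \times V), \qquad \frac{C}{V} = fp + tp + fn \times N$$
Example (costs in units of V): $fp_T = 10$, $tp_T = 30$, $fn_T = 5$, $tn_T = 35$, i.e., 80 requirements, 35 of which are defective.
$$C_T = 10 + 30 + 5N = 40 + 5N$$
T is preferable if $C_T < C_{\text{ALL-DEFECT}}$ and $C_T < C_{\text{NO-DEFECT}}$:
$$C_{\text{ALL-DEFECT}} = 45 + 35 + 0 \times N = 80 > C_T \;\Rightarrow\; N < 8$$
$$C_{\text{NO-DEFECT}} = 0 + 0 + 35N > C_T \;\Rightarrow\; N > \tfrac{4}{3} \approx 1.33$$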
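The cost comparison can be reproduced with exact fractions. In this sketch the names `cost` and `n_bounds` are hypothetical; costs are in units of V, every flagged requirement costs 1 and every fn costs N, exactly as in the cost table above. The bounds assume the tool has at least one fn and one tp.

```python
from fractions import Fraction


def cost(tp: int, fp: int, fn: int, n: Fraction) -> Fraction:
    """Cost in units of V: each flagged requirement costs 1, each fn costs N."""
    return tp + fp + n * fn


def n_bounds(tp: int, fp: int, fn: int, defective: int, clean: int):
    """Range (lower, upper) of N for which the tool beats both baselines.

    All Defects costs defective + clean, so tp + fp + N*fn < it  ->  N < upper
    No Defect costs N * defective,   so tp + fp + N*fn < it  ->  N > lower
    """
    upper = Fraction(defective + clean - tp - fp, fn)
    lower = Fraction(tp + fp, defective - fn)
    return lower, upper


TP, FP, FN, TN = 30, 10, 5, 35   # counts for T from the slide
DEFECTIVE, CLEAN = TP + FN, FP + TN

print(*n_bounds(TP, FP, FN, DEFECTIVE, CLEAN))  # 4/3 8

for n in (Fraction(1), Fraction(2), Fraction(10)):
    costs = {
        "T": cost(TP, FP, FN, n),
        "All Defects": cost(DEFECTIVE, CLEAN, 0, n),
        "No Defect": cost(0, 0, DEFECTIVE, n),
    }
    print(n, min(costs, key=costs.get))
```

With these counts, N = 1 favours No Defect, N = 2 favours T, and N = 10 favours All Defects, consistent with the interval 4/3 < N < 8.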

1.33 < N < 8 means that:
IF the cost of a fn is more than about 1.33 times the cost of a fp (i.e., a fn costs slightly more than a fp)
AND IF the cost of a fn is less than 8 times the cost of a fp
→ it is better to use T rather than:
- doing the task manually (All Defects baseline)
- doing nothing (No Defect baseline)
