METHODS FOR TESTING UNIFORMITY STATISTICS
KRISHNA PATEL AND ROBERT M. HIERONS
BRUNEL UNIVERSITY – BRUNEL SOFTWARE ENGINEERING LAB (BSEL)
DAVID CLARK AND HECTOR D. MENENDEZ
UNIVERSITY COLLEGE LONDON – CREST CENTRE
Definitions
• Uniform Distribution: A sample is said to adhere to a uniform distribution if every element in the sample has an equal chance of being randomly selected.
• Uniformity Statistic: A Uniformity Statistic is a means of measuring the extent to which a sample conforms to a uniform distribution.
• The Uniformity Statistics considered in our research produce lower values for samples that adhere more strongly to a uniform distribution.
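To make the definition concrete, here is a minimal sketch of one representative uniformity statistic, a Kolmogorov-Smirnov-style D statistic built from the D_n+ and D_n- quantities that appear among the subject programs later. The implementation and the example samples are illustrative assumptions, not the subject programs used in the study.

```python
import numpy as np

def kolmogorov_d(sample):
    """Kolmogorov-Smirnov D statistic against the uniform [0, 1] CDF.

    Lower values indicate closer adherence to a uniform distribution,
    matching the convention described above.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - x)         # D_n+
    d_minus = np.max(x - (i - 1) / n)  # D_n-
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
print(kolmogorov_d(rng.uniform(size=1000)))     # small value: close to uniform
print(kolmogorov_d(rng.beta(5, 1, size=1000)))  # larger value: skewed sample
```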
Problem Definition
• Uniformity Statistics suffer from the oracle problem, because it is very difficult to predict the statistic value that a given sample should produce.
• We investigated three different approaches for alleviating the oracle problem for uniformity statistics.
Intuition
• The standard deviation of a sample is a measure of the spread of values in that sample.
• A higher standard deviation indicates that the values in the sample are more spread out, and thus that the sample should adhere more strongly to a uniform distribution.
• The standard deviation is therefore intrinsically linked to uniformity.
• All of our oracles are based on this observation.
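A small illustration of this intuition; the samples below are hypothetical examples, not data from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
clustered = rng.normal(loc=0.5, scale=0.05, size=1000).clip(0, 1)  # values bunched around 0.5
uniform = rng.uniform(size=1000)                                   # values spread over [0, 1]

# The clustered sample has a much smaller standard deviation (~0.05) than the
# uniform one (~0.29), illustrating the link between spread and uniformity
# that all three oracles exploit.
print(np.std(clustered), np.std(uniform))
```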
Intuition Behind a Metamorphic Relation
• The source and follow-up test cases are two samples, one with a lower standard deviation and one with a higher standard deviation.
• The uniformity statistic is applied to both samples, giving statistic value (A) for the higher-SD sample and statistic value (B) for the lower-SD sample.
• Compare the two values: the relation passes if A < B and fails if B < A.
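A minimal sketch of this metamorphic relation as a test oracle, assuming `statistic` is any of the uniformity statistics under test and that the two samples differ only in their spread; the function name and interface are illustrative, not the authors' implementation:

```python
def metamorphic_relation_oracle(statistic, sample_lower_sd, sample_higher_sd):
    """Metamorphic relation (MR): the higher-SD (more spread-out) sample should
    receive a lower, i.e. more uniform, statistic value than the lower-SD sample.
    Returns True for a pass and False for a failure.
    """
    value_a = statistic(sample_higher_sd)  # statistic value (A) in the slide
    value_b = statistic(sample_lower_sd)   # statistic value (B) in the slide
    return value_a < value_b
```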
Intuition Behind Regression Model Oracles (1)
• For each uniformity statistic, we performed a Regression Analysis to learn the precise nature of the relationship between the standard deviation and the test statistic value.
• For a given test statistic, the Regression Analysis enabled us to derive a mathematical formula that accepts a standard deviation value as input and outputs a predicted test statistic value.
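A sketch of how such a formula could be derived, assuming a polynomial regression fitted to (standard deviation, statistic value) pairs collected from randomly generated samples; the poster does not state the model family or the sampling scheme, so both are assumptions here:

```python
import numpy as np

def fit_regression_model(statistic, n_samples=10000, sample_size=100, degree=3):
    """Fit a model that maps a standard deviation to a predicted statistic value.

    Training data: draw random samples with varying spread, record each sample's
    standard deviation and its statistic value, then fit a polynomial to the
    (SD, statistic) pairs.
    """
    rng = np.random.default_rng(42)
    sds, values = [], []
    for _ in range(n_samples):
        # Exponentiation skews the sample, which varies its spread (an assumption).
        sample = rng.uniform(size=sample_size) ** rng.uniform(0.5, 3.0)
        sds.append(np.std(sample))
        values.append(statistic(sample))
    coefficients = np.polyfit(sds, values, degree)
    return np.poly1d(coefficients)  # callable: model(sd) -> predicted statistic value
```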
Intuition Behind Regression Model Oracles (2)
• [Figure: statistic (black) and model (grey) plotted against standard deviation, based on 10000 samples.]
• We applied one Mann-Whitney U Test per subject program to compare the statistic and the model, and applied the Benjamini-Hochberg correction to these tests. 14/18 of the statistics did not report a significant result.
• Most models are therefore statistically indistinguishable from the statistics they predict.
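A sketch of the statistical comparison described above, assuming SciPy and statsmodels are used for the Mann-Whitney U tests and the Benjamini-Hochberg correction (the poster does not name the libraries):

```python
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

def compare_statistics_to_models(statistic_values, model_values):
    """One Mann-Whitney U test per subject program, followed by a
    Benjamini-Hochberg (FDR) correction across all of the tests.

    Both arguments are dicts mapping a subject program's name to the observed
    statistic values and the corresponding model predictions, respectively.
    """
    names = list(statistic_values)
    p_values = [mannwhitneyu(statistic_values[name], model_values[name]).pvalue
                for name in names]
    reject, corrected, _, _ = multipletests(p_values, method="fdr_bh")
    return {name: (rej, p) for name, rej, p in zip(names, reject, corrected)}
```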
Intuition Behind Regression Model Oracles (3)
• The sample is passed to the uniformity statistic, producing a statistic value, and its standard deviation is passed to the regression model, producing a model value.
• Compare the two values: the oracle passes if they are similar enough and fails if they are too dissimilar.
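A minimal sketch of the Regression Model Oracle, assuming `model` is a fitted regression model as in the previous slides and `tolerance` is the tuned threshold for "similar enough"; the poster states that tuning is required but does not give the threshold, so it is an assumption here:

```python
import numpy as np

def regression_model_oracle(statistic, model, sample, tolerance):
    """Regression Model Oracle (RMO): the statistic value for a sample should be
    close to the value the regression model predicts from the sample's standard
    deviation. Returns True for a pass and False for a failure.
    """
    statistic_value = statistic(sample)
    model_value = model(np.std(sample))
    return abs(statistic_value - model_value) <= tolerance
```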
Intuition Behind Metamorphic Regression Model Oracles
• Two samples, one with a lower standard deviation and one with a higher standard deviation, are passed to the uniformity statistic, and the absolute difference between the two statistic values is computed.
• The standard deviations of the same two samples are passed to the regression model, and the absolute difference between the two model values is computed.
• Compare the two differences: the oracle passes if they are similar enough and fails if they are too dissimilar.
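A minimal sketch of the Metamorphic Regression Model Oracle, under the same assumptions as the RMO sketch above (fitted `model`, tuned `tolerance`):

```python
import numpy as np

def metamorphic_regression_model_oracle(statistic, model,
                                        sample_lower_sd, sample_higher_sd,
                                        tolerance):
    """Metamorphic Regression Model Oracle (MRMO): the absolute difference between
    the statistic values of the two samples should be close to the absolute
    difference between the corresponding model predictions.
    Returns True for a pass and False for a failure.
    """
    statistic_diff = abs(statistic(sample_higher_sd) - statistic(sample_lower_sd))
    model_diff = abs(model(np.std(sample_higher_sd)) - model(np.std(sample_lower_sd)))
    return abs(statistic_diff - model_diff) <= tolerance
```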
Experimental Design – Subject Programs
• Subject Programs: 18 Uniformity Statistics – D_n+, D_n-, V_n, W_n^2, U_n^2, C_n+, C_n-, C_n, K_n, T_1, T_2, T_1', T_2', G(n), Q, S_n(m), A*(n), E_m,n
• Code Reuse:
  • V_n reuses D_n+ and D_n-
  • U_n^2 reuses W_n^2
  • C_n reuses C_n+ and C_n-
  • K_n reuses C_n
  • Q reuses G(n)
Experimental Design – Mutants
• Mutants were generated with the Mutmut mutation testing tool.
• Equivalent mutants were removed.
• Mutants that crashed were removed.
• 196 mutants in total.
Experimental Design – Test Suites
• Mutation Testing Test Suites:
  • We generated one test suite per oracle, by random testing.
  • Each test suite consists of 100 test cases.
  • Test cases in these test suites either deterministically report false positives or deterministically do not.
  • The Metamorphic Regression Model Oracle had one test case that deterministically reported a false positive; this test case was replaced to prevent false positives from confounding the results.
• False Positive Rate Test Suites:
  • We generated one test suite per oracle, by random testing.
  • Each test suite consists of 1000 test cases.
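A sketch of how such test suites could be generated by random testing; the sampling scheme below is an assumption, since the poster only states that the suites were generated randomly:

```python
import numpy as np

def generate_test_suite(n_test_cases, sample_size=100, seed=0):
    """Random-testing test suite: each test case is a randomly generated sample
    whose spread varies from case to case (an assumed sampling scheme).
    """
    rng = np.random.default_rng(seed)
    return [rng.uniform(size=sample_size) ** rng.uniform(0.5, 3.0)
            for _ in range(n_test_cases)]

mutation_testing_suite = generate_test_suite(100)              # 100 test cases per oracle
false_positive_rate_suite = generate_test_suite(1000, seed=1)  # 1000 test cases per oracle
```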
Results and Discussion – Mutation Score
• Mutation scores: MR – 77/196, RMO – 159/196, and MRMO – 119/196.
• Fisher's Exact Tests with Benjamini-Hochberg correction showed that these differences are significant.
• MRMO is probably more effective than MR because its check is tighter.
• RMO is probably more effective than MRMO because:
  • RMO was less aggressively tuned.
  • MRMO is blind to faults that cause the same level of difference between the source and follow-up test case, whilst RMO is not.
Results and Discussion – Failure Detection Rate
• RMO obtained an FDR of 100% for 137/159 killed mutants.
• MR obtained an FDR of 100% for 52/77 killed mutants.
• MRMO obtained an FDR of 100% for 40/119 killed mutants.
• Mann-Whitney U Tests with Benjamini-Hochberg correction showed that these differences are significant.
• Interestingly, MR is more effective than MRMO in terms of FDR.
Results and Discussion – False Positive Rate
• False positives arise from two sources:
  • The statistics themselves can make errors, which can result in false positives.
  • The models used in the RMO and MRMO oracles can make inaccurate predictions.
• MR reported 0 false positives in all subject programs.
• The largest false positive rate observed across all subject programs was 0.40% for RMO and 0.40% for MRMO.
Future Work
• A Genetic Algorithm based test case selection methodology that attempts to maximise the difference between the statistic and the models for the RMO oracle.
• The RMO and MRMO oracles both require tuning before they can be used. A method that circumvents this requirement would improve the usability of these techniques.
Thank you for listening. Are there any questions?