Feedback-controlled Random Test Generation
Kohsuke Yatoh 1*, Kazunori Sakamoto 2, Fuyuki Ishikawa 2, Shinichi Honiden 1,2
1: University of Tokyo, 2: National Institute of Informatics
* Currently affiliated with Google Inc., Japan. All of this work was done at the University of Tokyo and is unrelated to Google.
My First Motivation
Software testing:
• Very important
• Tedious, labor-intensive, and error-prone
I want someone ELSE to write tests for me! → Automatic test generation
Two Sides of Automated Test Generation
1. Input generation (data): generating interesting test data for the system under test (this paper)
2. Output verification (assertions): oracles, i.e. specifications and domain-specific knowledge
Background
Feedback-directed random test generation (FDRT) [Pacheco.07]: random test generation for OOP languages.
Classes under test → FDRT → random method sequences
Usage:
• Test by contracts [Pacheco.07]
• Test by property [Yatoh.14]
• Regression test generation [Robinson.11]
• Specification mining [Pradel.12]
• Combination with other automated test generation [Garg.13, Zhang.14]
Example
Input: Class list

    class AddressBook {
      AddressBook(int capacity) {
        assert capacity >= 0;
        ...
      }
      void add(Person person) {...}
    }

    class Person {
      Person(String name) {
        assert name != null;
        ...
      }
    }

Output: Method sequences

    AddressBook a1 = new AddressBook(10);
    Person p1 = new Person("foo");
    a1.add(p1);
    // AddressBook a2 = new AddressBook(-1);
    // Person p2 = new Person(null);
    Person p3 = new Person("bar");
    a1.add(p3);
    a1.add(p1);
FDRT Pros & Cons
Good: applicable to a wider range of SUTs than other methods such as symbolic execution.
Bad: coverage of generated tests is low and unstable → less chance of detecting faults.
Our Contributions
1. Analyzed the characteristics of FDRT and found one cause of the low and unstable coverage
2. Proposed a new method to mitigate the low coverage (Feedback-controlled Random Test Generation) → 2x-3x coverage for utility libraries
FDRT Algorithm
Classes under test:

    class Person {
      Person(String name) {...}
      boolean equals(Person p) {...}
    }

Value pool: the pool of candidate arguments, initialized with random primitives: "foo", "bar", 1, -1, true, false, ...

Step example 1: 1. Choose method: Person(). 2. Choose argument: "foo". 3. Save return value: p1.
Generated statement: Person p1 = new Person("foo");
Pool is now: "foo", "bar", 1, -1, true, false, p1, ...

Step example 2: 1. Choose method: Person(). 2. Choose argument: "bar". 3. Save return value: p2.
Generated statement: Person p2 = new Person("bar");
Pool is now: "foo", "bar", 1, -1, true, false, p1, p2, ...

Step example 3: 1. Choose method: equals(). 2. Choose arguments: p1, p2. 3. Save return value: b1.
Generated statement: boolean b1 = p1.equals(p2);

The feedback: every saved return value (p1, p2, b1, ...) goes back into the value pool, so later calls can reuse the results of earlier calls as receivers and arguments.
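To make the loop concrete, here is a minimal, self-contained Java sketch of the three steps above (choose method, choose arguments, save the return value into the pool). It is an illustration only, hard-wired to the Person example; Randoop's real implementation is far more general.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Minimal sketch of the FDRT loop, hard-wired to the Person example.
    public class FdrtSketch {
        static class Person {
            final String name;
            Person(String name) { assert name != null; this.name = name; }
            @Override public boolean equals(Object o) {
                return o instanceof Person && ((Person) o).name.equals(name);
            }
        }

        // Pick a random pool value assignable to the required parameter type.
        static Object pickOfType(List<Object> pool, Class<?> t, Random rng) {
            List<Object> xs = new ArrayList<>();
            for (Object o : pool) if (t.isInstance(o)) xs.add(o);
            return xs.isEmpty() ? null : xs.get(rng.nextInt(xs.size()));
        }

        public static void main(String[] args) {
            Random rng = new Random();
            // Value pool, initialized with random primitives.
            List<Object> pool =
                new ArrayList<>(List.of("foo", "bar", 1, -1, true, false));
            for (int step = 0; step < 20; step++) {
                if (rng.nextBoolean()) {
                    // 1. Chose Person(String). 2. Choose a String argument.
                    Object a = pickOfType(pool, String.class, rng);
                    // 3. Save the return value: this is the feedback step.
                    if (a != null) pool.add(new Person((String) a));
                } else {
                    // 1. Chose Person.equals. 2. Choose two Person arguments.
                    Object p = pickOfType(pool, Person.class, rng);
                    Object q = pickOfType(pool, Person.class, rng);
                    // 3. Save the return value back into the pool.
                    if (p != null && q != null) pool.add(((Person) p).equals(q));
                }
            }
            System.out.println("pool grew to " + pool.size() + " values");
        }
    }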
Problems When Applying to Real Libraries
1. Low test coverage
2. Unstable dependency on the random seed
[Figure: branch coverage (%) over elapsed time (seconds) on Commons Collections 4.0, varying widely across seeds]
Cause of Low and Unstable Coverage
Positive feedback loop of FDRT ⇒ bias grows in the pool ⇒ less diversity in the generated tests.
Bias in the pool is amplified by feedback (e.g. List values: lists derived from [a], such as [a,b], [a,c,d], [a,c,a], gradually crowd out lists derived from [b]).
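The amplification is easy to reproduce in a toy Polya-urn-style simulation (an illustration of the mechanism, not an experiment from the paper): each "returned" value inherits the kind of the pool value it was derived from, so an early random imbalance keeps growing.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Toy simulation of the positive feedback loop: derived values go
    // back into the pool, so an early imbalance between 'a'-derived and
    // 'b'-derived values is amplified rather than averaged out.
    public class FeedbackBias {
        public static void main(String[] args) {
            Random rng = new Random();
            List<Character> pool = new ArrayList<>(List.of('a', 'b'));
            for (int i = 0; i < 1000; i++) {
                // "Call a method" on a random pool value; the result
                // inherits its input's kind and is saved to the pool.
                char parent = pool.get(rng.nextInt(pool.size()));
                pool.add(parent);
            }
            long as = pool.stream().filter(c -> c == 'a').count();
            System.out.printf("a-derived: %d, b-derived: %d%n",
                              as, pool.size() - as);
            // Typical runs end far from 50/50, and the ratio varies a lot
            // between seeds, mirroring the low and unstable coverage.
        }
    }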
Proposed Method
Feedback-controlled Random Test Generation
• Keep diversity by multiple pools
  - Hold multiple pools at the same time
  - Use multiple pools concurrently
• Promote diversity by manipulating pools
  1. Select pool  2. Add pool  3. Delete pool  4. Global reset
Keep Diversity by Multiple Pools
• Hold multiple pools at the same time: each pool may be biased, but the set keeps diversity as a whole
• Use multiple pools concurrently (in turn): enables the pool manipulation described later
[Diagram: the original method uses a single pool; the proposed method uses a set of pools]
Promote Diversity by Manipulating Pools
1. Select pool: prioritize pools by a 'score' function (high priority for pools that are likely to achieve higher coverage)
2. Add pool: add new pools dynamically
3. Delete pool: delete similar pools using a 'uniqueness' function
4. Global reset: reset all pools and restart the JVM
See the paper for the definitions of the score and uniqueness functions. A sketch of how the four operations fit together follows below.
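The following Java skeleton shows one plausible shape for the controller loop. Pool, score(), and uniqueness() are placeholders standing in for the paper's real definitions; the 1-second add interval comes from the appendix, while MAX_POOLS = 10 is an assumed value for the threshold the paper mentions.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Skeleton of the feedback-controlled generation loop (sketch only).
    public class PoolController {
        static final int MAX_POOLS = 10;  // assumed threshold, not from the paper

        static class Pool {
            double score() { return Math.random(); }                    // placeholder
            double uniqueness(List<Pool> all) { return Math.random(); } // placeholder
            void generateOneTest() { /* one FDRT step against this pool */ }
        }

        public static void main(String[] args) {
            List<Pool> pools = new ArrayList<>(List.of(new Pool()));
            long start = System.currentTimeMillis();
            long lastAdd = start;
            while (System.currentTimeMillis() - start < 3_600_000) {
                // 1. Select: prefer the pool most likely to raise coverage.
                Pool p = pools.stream()
                    .max(Comparator.comparingDouble(Pool::score)).get();
                p.generateOneTest();
                // 2. Add: spawn a fresh pool periodically (every second).
                if (System.currentTimeMillis() - lastAdd > 1000) {
                    pools.add(new Pool());
                    lastAdd = System.currentTimeMillis();
                }
                // 3. Delete: when over the limit, drop the least unique pool.
                if (pools.size() > MAX_POOLS) {
                    List<Pool> snapshot = List.copyOf(pools);
                    pools.remove(pools.stream()
                        .min(Comparator.comparingDouble(q -> q.uniqueness(snapshot)))
                        .get());
                }
            }
            // 4. Global reset: in the real tool all pools are discarded and
            //    the JVM is restarted; this sketch omits it.
        }
    }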
Evaluation
Compared 3 methods:
• baseline: FDRT, one run
• reset: FDRT, reset every 100 sec.
• control: the proposed method
SUT: 8 popular Java libraries from MVNRepository
Configuration: generate tests for 3600 sec. and record the coverage of the generated tests; experiments repeated with 30 different random seeds
Environment: Xeon X5650 (2.67 GHz), 100 GB RAM, CentOS 7.0, isolated by Docker (Ubuntu 14.04 with OpenJDK 1.7)
Results after 3600 seconds
[Figure: branch coverage (%) for 8 libraries x 3 methods (baseline, reset, control), grouped into patterns (1), (2), and (3)]
(1) Large Utility Libraries
4 utility libraries with 50K-200K LOC. Random testing is semantically well suited to this kind of library.
Large improvement in both the average and the variance of coverage.
[Figures: Commons Collections, Commons Lang]
(2) Small Libraries
2 libraries with ~10K LOC. Small improvement, as the original FDRT already does very well; improvement in how quickly coverage increases.
[Figures: Gson, Commons Codec]
(3) Configuration-intensive Libraries
2 libraries (a database and a web server). No improvement; very low coverage. These libraries need careful configuration to work properly.
[Figures: H2, Jetty Server Core]
Summary
Problem: low and unstable coverage of FDRT. Cause: bias in the pool due to a positive feedback loop.
Method: Feedback-controlled Random Test Generation
• Keep diversity by multiple pools
• Promote diversity by pool manipulation
Result: three result patterns depending on the SUT
• Large utility libraries: large improvement
• Small libraries: small improvement; less time to reach a fixed coverage
• Configuration-intensive libraries: no change
Appendix
Bias and Limited Diversity
Example: black or non-black stones

    class Stone {
      boolean black;
      Stone(boolean black) {...}
      boolean isBlack() {...}
      Stone clone() {...}
    }

[Figure: number of generated stones per color; each round of feedback turns a small initial bias into a larger one]
1. Select Pool
• Select the pool that is most likely to increase coverage
• Pools are prioritized by a scoring function (e.g. with scores 6.0, 11.1, 2.3, 9.3, 4.6, the pool scored 11.1 is selected)
• Improves the average coverage
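The paper defines the score function precisely; as a purely hypothetical instantiation (an assumption, not the paper's definition), one could rate a pool by how many new branches its tests covered per generation step spent on it, trying untried pools first:

    // Hypothetical scoring, NOT the paper's actual definition: branches
    // newly covered per test generated from this pool; untried pools first.
    public class ScoreSketch {
        static double score(int newBranches, int testsGenerated) {
            return testsGenerated == 0
                ? Double.POSITIVE_INFINITY
                : (double) newBranches / testsGenerated;
        }

        public static void main(String[] args) {
            System.out.println(score(12, 200));  // 0.06
            System.out.println(score(0, 0));     // Infinity: untried pool
        }
    }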
2. Add Pool
• Add a new pool every 1 second
3. Delete Pool
• Delete pools with similar contents when the number of pools exceeds a threshold
• Similarity is judged by a uniqueness function (e.g. with uniqueness values 0.8, 0.4, 0.9, 0.3, 0.6, the pool scored 0.3 is deleted)
• Improves (decreases) the variance of coverage
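Again as a hypothetical instantiation only (the paper's actual uniqueness function may differ), one could define uniqueness as one minus the maximum Jaccard similarity between a pool's value set and any other pool's, so near-duplicate pools score close to 0 and are deleted first:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical uniqueness, NOT the paper's actual definition.
    public class UniquenessSketch {
        static double uniqueness(Set<String> mine, List<Set<String>> others) {
            double maxSim = 0.0;
            for (Set<String> other : others) {
                Set<String> inter = new HashSet<>(mine);
                inter.retainAll(other);
                Set<String> union = new HashSet<>(mine);
                union.addAll(other);
                double sim = union.isEmpty() ? 0.0
                           : (double) inter.size() / union.size();
                maxSim = Math.max(maxSim, sim);
            }
            return 1.0 - maxSim;  // 0 = duplicate of some pool, 1 = disjoint
        }

        public static void main(String[] args) {
            Set<String> a = Set.of("foo", "bar", "1");
            Set<String> b = Set.of("foo", "bar");
            System.out.println(uniqueness(a, List.of(b)));  // low: similar pools
        }
    }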
4. Global Reset
• Reset every pool and restart the JVM
• Remedies the effects of nondeterministic behavior and JVM instability
Results
3 result patterns, depending on SUT properties

    Pattern  Name                 LOC      Category
    (1)      Commons Collections  58,186   Collections
    (1)      Commons Lang         66,628   Core Utilities
    (1)      Guava                129,249  Core Utilities
    (1)      Commons Math         202,839  Math Libraries
    (2)      Commons Codec        13,948   Base64 Libraries
    (2)      Gson                 12,216   JSON Libraries
    (3)      H2 Database Engine   158,926  Embedded SQL Databases
    (3)      Jetty Server Core    32,316   Web Servers
Related Work
• Adaptive random testing [Ciupa.08]: a similar concept to our approach (avoid testing with similar values), but heavy computation cost due to calculating distances between all generated values [Arcuri.11]
• Combination with dynamic symbolic execution (DSE): use FDRT to create seed sequences for DSE [Bounimova.13, Zhang.14], or alternately execute FDRT and DSE [Garg.13]
Replacing FDRT with our approach could improve the effectiveness and efficiency of these techniques.