Can Data Transformation Help in the Detection of Fault-Prone Modules?
Y. Jiang, B. Cukic, T. Menzies
Lane Department of CSEE, West Virginia University
High Assurance Systems Lab
DEFECTS 2008
Background • Prediction of fault-prone modules is one of the most active research areas in empirical software engineering. – Also the one with a significant impact on the practice of verification and validation. • Recent results indicate that current methods have reached a "ceiling effect". – Differences between (most) classification algorithms are not statistically significant. – Different metrics suites do not seem to offer a significant advantage. – Feature selection indicates that a relatively small number of metrics performs as well as larger sets.
Motivation • Overcoming the "ceiling" requires experimentation with new approaches appropriate for our domain. – Recent history matters the most [Weyuker et al.] – Inclusion of the developers' social networks [Zimmermann et al.] – Incorporating expert opinions [Khoshgoftaar et al.] – Utilization of early life-cycle metrics [Jiang et al.] – Incorporating misclassification costs [Jiang et al.] – (your best ideas here) • Transformation of metrics data has been suggested as a possible avenue for improvement [Menzies, TSE'07]
Goal of study • Evaluate whether data transformation (preprocessing) helps improve the prediction of fault-prone software modules. • Four data transformation methods are used and their effects on prediction compared: a) The original data, no transformation (none) b) Ln transformation (log) c) Discretization using the Fayyad-Irani Minimum Description Length algorithm (nom) d) Discretization of log-transformed data (log&nom)
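As a concrete illustration of the four variants, here is a minimal preprocessing sketch in Python. The transform() helper is hypothetical, and KBinsDiscretizer is only an unsupervised stand-in: the study's Fayyad-Irani MDL discretization is a supervised method (available, for example, as WEKA's supervised Discretize filter).

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

def transform(X, method="none"):
    """Apply one of the four preprocessing variants to a non-negative
    metrics matrix X (rows = modules, columns = static code metrics).
    NOTE: 'nom' uses unsupervised quantile binning as a rough stand-in
    for the supervised Fayyad-Irani MDL discretization in the study."""
    if method == "none":                       # (a) original data
        return X
    if method == "log":                        # (b) ln transformation
        return np.log(X + 1e-6)                # small offset guards against ln(0)
    if method == "nom":                        # (c) discretization
        disc = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
        return disc.fit_transform(X)
    if method == "log&nom":                    # (d) discretization of log data
        return transform(transform(X, "log"), "nom")
    raise ValueError(f"unknown method: {method}")
```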
The Impact of Transformations
Experimental Setup • 9 data sets from the Metrics Data Program (MDP). • 4 transformation methods. • 9 classification algorithms for each transformation. • Ten-fold cross-validation (10x10 CV). • Evaluation technique: Area Under the ROC curve (AUC). • Total AUCs: 9 datasets x 4 transformations x 9 classifiers x 10 CV = 3240 models. • Boxplot diagrams depict the results of each fault prediction modeling technique. • Nonparametric statistical hypothesis tests assess the differences between the classifiers over multiple data sets.
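A rough sketch of this evaluation loop, assuming a hypothetical load_mdp_dataset() helper and the transform() sketch above; the classifier list is truncated, and the learners shown are only examples of those used in the study.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

classifiers = {
    "random_forest": RandomForestClassifier(random_state=0),
    "boosting": AdaBoostClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    # ... remaining learners from the study
}

results = {}  # (dataset, transformation, classifier) -> array of per-fold AUCs
for name in ["CM1", "KC1", "PC1"]:                 # subset of the 9 MDP data sets
    X, y = load_mdp_dataset(name)                  # hypothetical loader, not shown
    for method in ["none", "log", "nom", "log&nom"]:
        Xt = transform(X, method)                  # preprocessing sketch from earlier
        for clf_name, clf in classifiers.items():
            cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
            results[(name, method, clf_name)] = cross_val_score(
                clf, Xt, y, cv=cv, scoring="roc_auc")
```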
Metrics Data Program (MDP) data sets
10 different classifiers used
Statistical hypothesis tests • We use nonparametric procedures for the comparisons. – 95% confidence level used in all experiments. • Performance comparison across more than two experiments: – The Friedman test determines whether there are statistically significant differences in classification performance across ALL experiments. – If yes, the post-hoc Nemenyi test ranks the different classifiers. • For the comparison of two specific experiments, we use Wilcoxon's signed rank test.
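A sketch of the Friedman/Nemenyi procedure, assuming SciPy plus the third-party scikit-posthocs package and a placeholder AUC table (rows = data sets, columns = classifiers); the actual values would come from the cross-validation runs above.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Placeholder AUC table: 9 data sets (rows) x 9 classifiers (columns).
rng = np.random.default_rng(0)
auc_table = rng.uniform(0.6, 0.9, size=(9, 9))

# Friedman test: are there significant differences across ALL classifiers?
stat, p = friedmanchisquare(*auc_table.T)          # one sample per classifier
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")

if p < 0.05:
    # Post-hoc Nemenyi test: pairwise p-values used to rank the classifiers.
    pairwise_p = sp.posthoc_nemenyi_friedman(auc_table)
    print(pairwise_p)
```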
Classification results using the original data
Classification results using the log transformed data
Classification results using the discretized data
Classification results using the discretized log transformed data
Comparing results over different data domains • Random forest ranked as one of the best classifiers in the original and log transformed domains. • Boosting ranked as one of the best classifiers in the experiments with the discretized data. • The performance comparison reveals a statistically significant difference. – We compared random forest (none and log) vs. boosting (nom and log&nom) using the Wilcoxon signed rank test at the 95% confidence level. • Random forest in the original and log transformed domains beats boosting in the discretized domains.
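A minimal sketch of this pairwise comparison with SciPy's Wilcoxon signed rank test; the paired AUC vectors below are placeholders, one value per data set, standing in for the results of the experiments above.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder paired AUCs, one value per MDP data set (9 in the study).
rng = np.random.default_rng(1)
rf_auc = rng.uniform(0.70, 0.90, size=9)      # random forest on none / log data
boost_auc = rng.uniform(0.65, 0.85, size=9)   # boosting on nom / log&nom data

stat, p = wilcoxon(rf_auc, boost_auc)         # paired, nonparametric comparison
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.3f}")
if p < 0.05:
    winner = "random forest" if (rf_auc - boost_auc).mean() > 0 else "boosting"
    print(f"Difference significant at the 95% level; higher mean AUC: {winner}")
```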
Comparing the classifiers across the four transformation domains [Figure: per-classifier comparison; some classifiers perform better for none and log, some better for discretized data, and some perform the same across all four domains.]
Conclusions • Transformation did not improve overall classification performance, measured by AUC. • Random forest is reliably one of the best classification algorithms in the original and log domains. • Boosting offers the best models in the discretized data domains. • Naive Bayes improves greatly in the discretized domain. • Log transformation rarely affects the performance of software quality models.
Ensuing Research • Data transformation is unlikely to have an impact on breaking the "performance ceiling". • Heuristics are needed for the selection of the "most promising" classification algorithms. • So, how do we "break the ceiling"? – We may have run out of "low hanging research fruit". – Possible directions: • Fusion of measures from different development phases. • Human factors. • Correlating with operational profiles. • Business context. • ???