Automatic Identification of Bug-fix Commits: The Case of GitHub Projects
Yujuan Jiang, Rodrigo Morales, Bram Adams, Foutse Khomh

• Case study projects
• Approach
• Research questions
• Result (so far)

Case Study Projects
Keywords: GitHub, C language

Approach
• Data Collection
• Feature Extraction (Text & Source code)
• Model Training
• Evaluation

Approach: Data collection

Approach: Feature Extraction
• Textual Analysis: keywords
• Code Analysis

Approach: Feature Extraction
1) Textual Analysis: keywords + feature words
All words → Stem + remove stop words → Filter → feature words
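The textual analysis step could look roughly like the sketch below. It is only an illustration: the keyword list, the stop-word list, the suffix-stripping stemmer, and the length-based filter are all assumptions, since the slides do not specify any of them.

```python
import re

# Hypothetical lists; the slides do not enumerate keywords or stop words.
BUG_KEYWORDS = {"fix", "bug", "error", "crash"}
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "for", "is"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def message_features(message):
    """Return (keyword_hits, feature_words) for one commit message."""
    # All words: lowercase alphabetic tokens of the commit message.
    words = re.findall(r"[a-z]+", message.lower())
    # Stem + remove stop words.
    stems = [crude_stem(w) for w in words if w not in STOP_WORDS]
    # Filter: unspecified in the slides; as a placeholder, drop very short stems.
    stems = [s for s in stems if len(s) >= 3]
    keyword_hits = sorted(set(stems) & BUG_KEYWORDS)
    feature_words = sorted(set(stems) - BUG_KEYWORDS)
    return keyword_hits, feature_words
```

For example, `message_features("Fixed a crash in the parser")` yields the keyword hits `fix` and `crash` and the single feature word `parser`.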

Approach: Feature Extraction
2) Source Code Analysis: Patch Parser + re Script
Commits → Parser → Commit Profile → Features
Features: # of while loops, # of ifs, # of boolean ......
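A regular-expression script of the kind mentioned above might count constructs on the added lines of a patch. The sketch below is an assumption about how such a script could work, not the authors' parser; the function name and the two patterns are hypothetical, and only the two fully named features (while loops and ifs) are counted.

```python
import re

# Deliberately simple patterns: they will miscount code that appears
# inside comments or string literals.
PATTERNS = {
    "while_loops": re.compile(r"\bwhile\s*\("),
    "ifs": re.compile(r"\bif\s*\("),
}


def commit_profile(patch_text):
    """Count constructs on the added lines of a unified diff."""
    counts = {name: 0 for name in PATTERNS}
    for line in patch_text.splitlines():
        # Added lines start with '+'; lines starting with '+++ ' are file headers.
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        code = line[1:]
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(code))
    return counts
```

The same loop could be run over removed lines (prefix `-`) to build a second set of counts per commit.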

Approach: Feature Extraction

Approach: Model Training
• Black data: manually label 300 bug-fixing commits for each project
• Grey data: unlabelled
• Grey data → LPU → White data (bottom k)
• Black data + White data → SVM, Random Forest
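The idea of picking "white" data from the bottom k of the ranked unlabelled commits can be sketched as follows. This is a stand-in, not the LPU tool: here the grey commits are ranked by cosine similarity to the centroid of the black data, which is an assumption made only for illustration. All function names are hypothetical, and the final SVM or Random Forest training step is left out.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_white(black, grey, k):
    """Indices of the k grey items least similar to the black centroid."""
    c = centroid(black)
    ranked = sorted(range(len(grey)), key=lambda i: cosine(grey[i], c))
    return ranked[:k]


def training_set(black, grey, k):
    """Black data labelled 1 plus bottom-k grey data labelled 0."""
    white_idx = pick_white(black, grey, k)
    X = list(black) + [grey[i] for i in white_idx]
    y = [1] * len(black) + [0] * len(white_idx)
    return X, y
```

The resulting `X` and `y` would then be passed to whichever classifier is being compared. A larger k gives more negative examples but raises the risk that some of them are actually bug fixes.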

Approach: Evaluation

Research Questions
• Does our classifier work better than the baseline keyword-based approach?
• How does the parameter k impact the classifier?
• Which metrics play the most important roles in identifying bug-fixing commits?
• Is the hybrid approach (namely the combination of LPU and SVM) more effective than a single-classifier approach?
• Which combination of LPU options makes the classifier work best?
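For reference, a keyword-based baseline of the kind named in the first question can be as small as the sketch below. The keyword list is hypothetical; the slides do not say which keywords the baseline uses.

```python
import re

# Hypothetical keyword list, matched as word prefixes ("fix", "fixes", "bugfix").
BASELINE_KEYWORDS = ("fix", "bug", "defect", "patch")
_PATTERN = re.compile(r"\b(" + "|".join(BASELINE_KEYWORDS) + r")", re.IGNORECASE)


def keyword_baseline(message):
    """Flag a commit as a bug fix if its message contains a keyword."""
    return _PATTERN.search(message) is not None
```

Because the pattern requires a word boundary before the keyword, "Fixes null pointer crash" is flagged while "Add prefix option" is not.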

Result (so far): recall
• Libgit2: 76.95%
• openFrameworks: 96.67%
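Recall here is the share of true bug-fixing commits that the classifier finds, TP / (TP + FN). A minimal sketch of the calculation is shown below; the labels in the usage note are toy values, not data from the study.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN), with 1 meaning 'bug-fixing commit'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

With toy labels `y_true = [1, 1, 1, 1, 0, 0]` and `y_pred = [1, 1, 1, 0, 1, 0]`, three of the four true bug fixes are found, so the recall is 0.75.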

Result (so far): key features
[Figure: dot plot of per-feature values for Libgit2. Features are listed in the order X5, X6, X7, X22, X20, ..., X33, X41, X38; horizontal axis from 0.000 to 0.030.]

LPU SVM
[Figure: the same feature plot, labelled LPU and SVM; horizontal axis from 0.000 to 0.030.]
