Predicting the Number of Reported Bugs in a Software Repository
Hadi Jahanshahi, Mucahit Cevik, Ayşe Başar
Data Science Laboratory
33rd Canadian Conference on Artificial Intelligence, May 2020
Outline
• Introduction
• Contribution and Research Questions
• Methodology
• Results
• Conclusions
Reported bugs’ pattern
• Why is predicting the number of bugs reported to a system important?
• Bug prediction: binary classification
• Predicting the number of bugs: regression task
• Predicting the number of reported bugs: time series prediction
General Idea
• In this paper, we extract the number of bugs reported to the Mozilla bug repository during the last decade.
• The release times of Mozilla updates are used as an exogenous variable.
• We apply different time series prediction methods and investigate the performance of each model under different circumstances.
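One way to picture the exogenous variable is as a 0/1 release flag attached to each time step and fed to a model next to the lagged bug counts. The sketch below is only an illustration under stated assumptions: the weekly granularity, the binary encoding, the helper names (`weekly_release_flags`, `lagged_rows`), and all dates and counts are hypothetical, since the slides do not say how the release times are encoded.

```python
from datetime import date, timedelta


def weekly_release_flags(start, n_weeks, release_dates):
    """Return one 0/1 flag per week: 1 if any release date falls in that week."""
    flags = []
    for k in range(n_weeks):
        week_start = start + timedelta(weeks=k)
        week_end = week_start + timedelta(weeks=1)
        flags.append(int(any(week_start <= r < week_end for r in release_dates)))
    return flags


def lagged_rows(counts, flags, n_lags):
    """Pair each target count with its previous n_lags counts plus that step's release flag."""
    rows = []
    for t in range(n_lags, len(counts)):
        features = counts[t - n_lags:t] + [flags[t]]
        rows.append((features, counts[t]))
    return rows


# Hypothetical data, made up for illustration only.
start = date(2020, 1, 6)                 # first day of the first week
releases = [date(2020, 1, 21)]           # one made-up release date
counts = [120, 135, 128, 140]            # made-up weekly bug counts

flags = weekly_release_flags(start, len(counts), releases)
rows = lagged_rows(counts, flags, n_lags=2)
```

Each row in `rows` is a (features, target) pair that a regression model could be trained on; the release flag is simply one more feature column.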
Previous studies and our contributions [I]
• Previous studies use generic time series models without a rational baseline against which to compare them [1, 2, 3, 4].
• Another study [5] used time series analysis to determine the seasonality and trends of affective metrics for software development.
• That study considers the evolution of human aspects in software engineering, whereas ours focuses on the number of reported bugs as a metric that helps developers maintain software quality.
Previous studies and our contributions [II]
• Wang and Zhang [5] design Defect State Transition models and apply a Markovian method to predict the number of defects in each state in the future.
• Other studies consider software defect number prediction at the method level and file level [6, 7, 8].
Research Questions
RQ1: How accurately can the number of bugs in a project be predicted using time series analysis?
RQ2: How feasible is long-term bug number prediction?
Data preparation
• For time series prediction, we first check whether the given data is stationary. The p-value of the test is 0.012, which is below the conventional 0.05 threshold, so we treat the series as stationary and apply no supplementary preprocessing.
• We then inspect the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
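As a rough illustration of the ACF check, the sample autocorrelation can be computed directly from its definition. This is a minimal sketch, not the tooling used in the study: the function name `acf` and the bug counts are hypothetical, and the stationarity test itself is not reproduced here because the slides do not name it.

```python
from statistics import fmean


def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag (inclusive)."""
    n = len(series)
    mean = fmean(series)
    deviations = [x - mean for x in series]
    denom = sum(d * d for d in deviations)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    result = []
    for lag in range(max_lag + 1):
        # Sum of products of deviations that are `lag` steps apart.
        num = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
        result.append(num / denom)
    return result


# Hypothetical weekly bug counts, made up for illustration only.
weekly_bugs = [120, 135, 128, 140, 150, 138, 129, 142, 155, 149, 133, 137]
acf_values = acf(weekly_bugs, max_lag=4)
```

Lags whose autocorrelation stands out from the rest are the usual starting point for choosing how many past observations a model should look at.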
Methodology
• A rolling method is used to train the models on the time series dataset.
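A rolling scheme can be sketched as follows, assuming a fixed-size training window that slides forward one step at a time and a one-step-ahead forecast at each position. The window size, the use of mean absolute error, the function names, and the data are all assumptions for illustration; the slides do not specify these details.

```python
def rolling_one_step(series, window, forecaster):
    """Slide a fixed-size training window over `series`, forecasting one step ahead each time."""
    predictions, actuals = [], []
    for end in range(window, len(series)):
        train = series[end - window:end]      # the most recent `window` observations
        predictions.append(forecaster(train))
        actuals.append(series[end])           # the value the forecast is compared against
    return predictions, actuals


def naive(train):
    """Naive baseline: the next value equals the last observed value."""
    return train[-1]


def mean_absolute_error(predictions, actuals):
    """Average absolute difference between forecasts and observed values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical weekly bug counts, made up for illustration only.
weekly_bugs = [120, 135, 128, 140, 150, 138, 129, 142]
preds, actuals = rolling_one_step(weekly_bugs, window=4, forecaster=naive)
mae = mean_absolute_error(preds, actuals)
```

Any forecaster that maps a training window to a single number can be passed in place of `naive`, which makes it easy to compare models on identical windows.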
Data
• We have extracted the number of reported bugs from the Mozilla bug repository*.
* Mozilla Bug Tracking System. https://bugzilla.mozilla.org/
Forecasting Models (I)
• Naive Baseline: It assumes that the number of bugs at time t is equal to that at time t−1.
• EXP: Exponential smoothing considers two factors in its prediction: the forecast value at the previous timestamp and the actual value at that timestamp. The forecast is defined as a weighted combination of the two.
• WMA: Weighted Moving Average forecasts based on a weighted average of the previous steps.
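The three simple forecasters can be sketched in a few lines. This is an illustrative sketch under assumptions: the smoothing factor `alpha`, the WMA weights, the seeding of the first forecast with the first observation, and the function names are all hypothetical choices, not the parameters used in the study.

```python
def naive_forecast(history):
    """Naive baseline: the forecast for time t is the observation at time t-1."""
    return history[-1]


def exp_smoothing_forecast(history, alpha=0.5):
    """Simple exponential smoothing.

    Applies F_t = alpha * A_{t-1} + (1 - alpha) * F_{t-1} over the history,
    seeding the first forecast with the first observation (an assumption).
    """
    forecast = history[0]
    for actual in history:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast


def wma_forecast(history, weights):
    """Weighted moving average; weights[-1] applies to the most recent observation."""
    if len(weights) > len(history):
        raise ValueError("need at least as many observations as weights")
    recent = history[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


# Hypothetical bug counts, made up for illustration only.
history = [10, 20, 30]
naive_value = naive_forecast(history)
exp_value = exp_smoothing_forecast([10, 20], alpha=0.5)
wma_value = wma_forecast(history, weights=[1, 2, 3])
```

With `alpha` close to 1 the exponential smoother behaves almost like the naive baseline, which is one reason a naive baseline is a useful reference point for the other models.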
Forecasting Models (II)
• ARIMA: The general ARIMA(p, d, q) model combines an autoregressive part of order p, differencing of degree d, and a moving-average part of order q.
• RF: We applied the Random Forest (RF) regressor, a method that has not previously been used in this domain.
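For reference, a common textbook statement of the ARIMA model in lag-operator notation is given below. The symbols are assumptions that follow standard usage and are not necessarily the notation of the original slide: y_t is the number of reported bugs at time t, L is the lag operator (L y_t = y_{t−1}), φ_i and θ_j are the autoregressive and moving-average coefficients, c is a constant, and ε_t is white noise.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right) (1 - L)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right) \varepsilon_t
```

Here p is the autoregressive order, d is the number of differences, and q is the moving-average order; for a series that is already stationary, d = 0 and the model reduces to ARMA(p, q).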
Forecasting Models (III)
• LSTM: We use the LSTM cell architecture defined by [9].
• All models’ parameters are shown in Table 1.
Results (I)
Results (II)
Answer to the Research Questions
• RQ1: How accurately can the number of bugs in a project be predicted using time series analysis?
  • Surprisingly, the one-step prediction performance of the models is not significantly different. Furthermore, the naive baseline appears to be as good as the others, a finding that previous studies did not consider.
• RQ2: How feasible is long-term bug number prediction?
  • For the Mozilla project, LSTM shows a significant improvement over traditional time series models.
Conclusions
• What we expect from our time series analyses:
  • to forecast the number of future defects
  • to identify trends and abnormalities in the system
• Our observations:
  • The number of bugs reported to the system over time forms a stationary series.
  • Across eight different methods and five different performance metrics, Random Forest with exogenous variables outperforms the other methods.
  • Deep learning, specifically LSTM in our case, significantly enhances long-term prediction.
Main references (I)
[1] Kenmei, B., Antoniol, G., di Penta, M.: Trend analysis and issue prediction in large-scale open source systems. In: 2008 12th European Conference on Software Maintenance and Reengineering, pp. 73–82, April 2008
[2] Krishna, R., Agrawal, A., Rahman, A., Sobran, A., Menzies, T.: What is the connection between issues, bugs, and enhancements? Lessons learned from 800+ software projects. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2018, pp. 306–315. Association for Computing Machinery, New York (2018)
[3] Wu, W., Zhang, W., Yang, Y., Wang, Q.: Time series analysis for bug number prediction. In: The 2nd International Conference on Software Engineering and Data Mining, pp. 589–596, June 2010
Main references (II)
[4] Yazdi, H.S., Angelis, L., Kehrer, T., Kelter, U.: A framework for capturing, statistically modeling and analyzing the evolution of software models. J. Syst. Softw. 118, 176–207 (2016)
[4] Destefanis, G., Ortu, M., Counsell, S., Swift, S., Tonelli, R., Marchesi, M.: On the randomness and seasonality of affective metrics for software development. In: Proceedings of the Symposium on Applied Computing, SAC 2017, pp. 1266–1271. Association for Computing Machinery, New York (2017)
[5] Wang, J., Zhang, H.: Predicting defect numbers based on defect state transition models. In: Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 191–200, September 2012
Main references (III)
[6] Chen, X., Zhang, D., Zhao, Y., Cui, Z., Ni, C.: Software defect number prediction: unsupervised vs supervised methods. Inf. Softw. Technol. 106, 161–181 (2019)
[7] Gao, K., Khoshgoftaar, T.M.: A comprehensive empirical study of count models for software fault prediction. IEEE Trans. Reliab. 56(2), 223–236 (2007)
[8] Graves, T.L., Karr, A.F., Marron, J.S., Siy, H.: Predicting fault incidence using software change history. IEEE Trans. Softw. Eng. 26(7), 653–661 (2000)
[9] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735