
Predicting the Number of Reported Bugs in a Software Repository



  1. Predicting the Number of Reported Bugs in a Software Repository. Hadi Jahanshahi, Mucahit Cevik, Ayşe Başar. Data Science Laboratory. 33rd Canadian Conference on Artificial Intelligence, May 2020.

  2. Outline
     • Introduction
     • Contribution and Research Questions
     • Methodology
     • Results
     • Conclusion

  3. Reported bugs’ pattern
     • Why is predicting the number of bugs reported to a system important?
     • Bug prediction: binary classification
     • Predicting the number of bugs: regression task
     • Predicting the number of reported bugs: time series prediction

  4. General Idea
     • In this paper, the number of bugs reported to the Mozilla bug repository during the last decade is extracted.
     • The release times of Mozilla updates are used as an exogenous variable.
     • Different time series prediction methods are used to investigate the performance of each model under different circumstances.

  5. Previous studies and our contributions [I]
     • Previous studies use generic time series models without a rational baseline against which to compare them. [1, 2, 3, 4]
     • Another study, by Destefanis et al. [4], used time series analysis to determine the seasonality and trends of affective metrics for software development.
     • They consider the evolution of human aspects in software engineering, while our study focuses on the number of reported bugs in software as a metric that helps developers maintain software quality.

  6. Previous studies and our contributions [II]
     • Wang and Zhang [5] design Defect State Transition models and apply the Markovian method to predict the number of defects at each state in the future.
     • There are also studies that consider software defect number prediction at the method level and file level [6, 7, 8].

  7. Research Questions
     • RQ1: How accurately can the number of bugs in a project be predicted using time series analysis?
     • RQ2: How feasible is long-term bug number prediction?

  8. Data preparation
     • For time series prediction, we first check whether the given data is stationary. As the p-value of the stationarity test is 0.012, no supplementary preprocessing is needed.
     • We then check the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
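The ACF check mentioned on this slide can be illustrated with a minimal sketch. It computes the standard sample autocorrelation in plain Python; the function name `autocorrelation` and the weekly counts are hypothetical and invented for illustration, not taken from the Mozilla data or the authors' code.

```python
def autocorrelation(series, max_lag):
    """Return the sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        # Sum of products of deviations that are `lag` steps apart.
        covariance = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
        acf.append(covariance / variance)
    return acf


# Hypothetical weekly bug counts, invented for illustration only.
weekly_bugs = [120, 135, 128, 150, 142, 138, 160, 155, 149, 170, 165, 158]
acf_values = autocorrelation(weekly_bugs, max_lag=4)
```

Lag 0 is always 1; how quickly the later lags decay is one informal signal of how many past observations a model may need.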

  9. Methodology
     • A rolling method is used for training on the time series dataset.
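One way to read the rolling method is sketched below. The slide does not say whether the training window is fixed or expanding, so this sketch assumes a fixed-size window that slides forward one step at a time; `rolling_splits`, `rolling_forecast`, and the sample counts are hypothetical names and data.

```python
def rolling_splits(series, window):
    """Yield (train, target) pairs: `window` past points, then the next point."""
    for end in range(window, len(series)):
        yield series[end - window:end], series[end]


def rolling_forecast(series, window, model):
    """Apply `model` (history -> one-step forecast) to every rolling split."""
    predictions, actuals = [], []
    for train, target in rolling_splits(series, window):
        predictions.append(model(train))
        actuals.append(target)
    return predictions, actuals


def last_value(history):
    """Stand-in model for the example: repeat the most recent observation."""
    return history[-1]


# Hypothetical bug counts, invented for illustration only.
weekly_bugs = [120, 135, 128, 150, 142, 138, 160, 155]
preds, actuals = rolling_forecast(weekly_bugs, window=4, model=last_value)
```

Any function that maps a history to a single forecast can be passed as `model`, so the same loop can compare several methods on identical splits.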

  10. Data
     • We have extracted the number of reported bugs from the Mozilla bug repository*.
     * Mozilla Bug Tracking System. https://bugzilla.mozilla.org/

  11. Forecasting Models (I)
     • Naive Baseline: It assumes the number of bugs at time t is equal to that at time t−1.
     • EXP: Exponential smoothing combines two factors in its prediction: the forecast value at the previous timestamp and the actual value at that timestamp.
     • WMA: Weighted Moving Average forecasts based on a weighted average of the previous steps.
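The three simple forecasters described on this slide can be sketched as follows. The slide's own EXP equation is not preserved in this transcript, so the code assumes standard simple exponential smoothing, with a hypothetical smoothing factor `alpha` and the first observation as the initial forecast. The function names and the WMA weights are likewise illustrative assumptions.

```python
def naive_forecast(history):
    """Naive baseline: the next value equals the last observed value."""
    return history[-1]


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing (assumed form):
    new forecast = alpha * actual + (1 - alpha) * previous forecast."""
    forecast = history[0]  # assumption: start from the first observation
    for actual in history:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast


def weighted_moving_average_forecast(history, weights):
    """Weighted average of the last len(weights) values.
    weights[-1] applies to the most recent observation."""
    recent = history[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


# Hypothetical counts, invented for illustration only.
history = [10, 20, 30]
naive = naive_forecast(history)
smoothed = exponential_smoothing_forecast([10, 20], alpha=0.5)
wma = weighted_moving_average_forecast(history, weights=[1, 2, 3])
```

With `alpha` close to 1 the smoothed forecast approaches the naive baseline, which is one reason a naive baseline is a natural reference point for these models.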

  12. Forecasting Models (II)
     • ARIMA: The general ARIMA(p, d, q) model combines an autoregressive part of order p, differencing of order d, and a moving-average part of order q.
     • RF: We apply a Random Forest (RF) regressor, a method that has not previously been used in this domain.
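A regressor such as Random Forest needs the series recast as a supervised table. The sketch below shows one common way to do that, assuming lagged counts as features plus a release indicator as the exogenous variable. The slides do not describe the authors' actual feature construction, so `make_lag_features`, `n_lags`, and the sample data are hypothetical.

```python
def make_lag_features(counts, release_flags, n_lags):
    """Build rows of [count at t-n_lags, ..., count at t-1, release flag at t]
    with the count at t as the regression target."""
    rows, targets = [], []
    for t in range(n_lags, len(counts)):
        lags = list(counts[t - n_lags:t])
        rows.append(lags + [release_flags[t]])
        targets.append(counts[t])
    return rows, targets


# Hypothetical data, invented for illustration only:
# bug counts per period and a 0/1 flag marking periods with a release.
counts = [120, 135, 128, 150, 142, 138]
release = [0, 0, 1, 0, 0, 1]
X, y = make_lag_features(counts, release, n_lags=3)
```

The resulting `X` and `y` have the shape expected by typical regression libraries, for example scikit-learn's `RandomForestRegressor`, although the slides do not state which implementation was used.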

  13. Forecasting Models (III)
     • LSTM: We use the LSTM cell architecture defined by [9].
     • All models’ parameters are shown in Table 1.
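The cell equations from this slide are not preserved in the transcript. As a rough illustration only, the sketch below implements one step of the widely used LSTM variant with a forget gate, using scalar weights for readability; it may differ from the formulation the authors showed. The function `lstm_step` and the `params` layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step.
    `params` maps each gate name to a tuple (w_x, w_h, b)."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell update
    c = f * c_prev + i * g             # new cell state
    h = o * math.tanh(c)               # new hidden state
    return h, c


# Hypothetical all-zero parameters, chosen so the result is easy to check by hand.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

With all-zero parameters every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved; trained weights would make each gate depend on the input and the previous hidden state.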

  14. Results (I)

  15. Results (II)

  16. Answer to the Research Questions
     • RQ1: How accurately can the number of bugs in a project be predicted using time series analysis?
       • Surprisingly, the one-step prediction performance of the models does not differ significantly. Furthermore, the baseline seems as good as the others, a new finding that was not considered in previous studies.
     • RQ2: How feasible is long-term bug number prediction?
       • For the Mozilla project, LSTM shows a significant improvement compared to traditional time series models.

  17. Conclusions
     • What we expect to see from our time series analyses:
       • to forecast the number of future defects
       • to identify the trends and abnormalities in the system
     • Our observations:
       • The number of bugs introduced to the system is stationary.
       • Considering eight different methods with five different performance metrics, Random Forest with exogenous variables outperforms the other methods.
       • Deep learning, especially LSTM in our case, significantly enhances long-term prediction.
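The slides do not name the five performance metrics. As an illustration of how forecasts from different methods could be compared, the sketch below computes two common error measures, mean absolute error and root mean squared error; choosing these two is an assumption, and the numbers are invented.

```python
import math


def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def root_mean_squared_error(actuals, predictions):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actuals, predictions))
    return math.sqrt(squared / len(actuals))


# Hypothetical actual and predicted bug counts, invented for illustration only.
actuals = [142, 138, 160, 155]
predictions = [150, 142, 138, 160]
mae = mean_absolute_error(actuals, predictions)
rmse = root_mean_squared_error(actuals, predictions)
```

RMSE penalises large misses more heavily than MAE, so the two measures can rank the same set of models differently.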

  18. Main references (I)
     [1] Kenmei, B., Antoniol, G., di Penta, M.: Trend analysis and issue prediction in large-scale open source systems. In: 2008 12th European Conference on Software Maintenance and Reengineering, pp. 73–82, April 2008
     [2] Krishna, R., Agrawal, A., Rahman, A., Sobran, A., Menzies, T.: What is the connection between issues, bugs, and enhancements? Lessons learned from 800+ software projects. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2018, pp. 306–315. Association for Computing Machinery, New York (2018)
     [3] Wu, W., Zhang, W., Yang, Y., Wang, Q.: Time series analysis for bug number prediction. In: The 2nd International Conference on Software Engineering and Data Mining, pp. 589–596, June 2010

  19. Main references (II)
     [4] Yazdi, H.S., Angelis, L., Kehrer, T., Kelter, U.: A framework for capturing, statistically modeling and analyzing the evolution of software models. J. Syst. Softw. 118, 176–207 (2016)
     [4] Destefanis, G., Ortu, M., Counsell, S., Swift, S., Tonelli, R., Marchesi, M.: On the randomness and seasonality of affective metrics for software development. In: Proceedings of the Symposium on Applied Computing, SAC 2017, pp. 1266–1271. Association for Computing Machinery, New York (2017)
     [5] Wang, J., Zhang, H.: Predicting defect numbers based on defect state transition models. In: Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 191–200, September 2012

  20. Main references (III)
     [6] Chen, X., Zhang, D., Zhao, Y., Cui, Z., Ni, C.: Software defect number prediction: unsupervised vs supervised methods. Inf. Softw. Technol. 106, 161–181 (2019)
     [7] Gao, K., Khoshgoftaar, T.M.: A comprehensive empirical study of count models for software fault prediction. IEEE Trans. Reliab. 56(2), 223–236 (2007)
     [8] Graves, T.L., Karr, A.F., Marron, J.S., Siy, H.: Predicting fault incidence using software change history. IEEE Trans. Softw. Eng. 26(7), 653–661 (2000)
     [9] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
