
Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods



  1. Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods • Davy Landman • Alexander Serebrenik • Jurgen Vinju

  2. Metrics • Lines of Code (SLOC) • Cyclomatic Complexity (CC) • Popular in practice and research

  3. Metrics • Lines of Code (SLOC) = 7 (the comment line is not counted) • Cyclomatic Complexity (CC) = 2

         public double sqrt(int n) {
             // Newton-Raphson method
             double r = n / 2.0;
             while (Math.abs(r - (n / r)) > 0.00001) {
                 r = 0.5 * (r + (n / r));
             }
             return r;
         }
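The two counts on this slide can be reproduced with a rough sketch. It assumes a simplified rule set that the slides do not spell out: SLOC counts non-blank lines that are not pure `//` comments, and CC is 1 plus the number of `if`, `while`, `for`, `case`, `catch`, `&&`, `||` and `?` tokens. The class name `MethodMetrics` is hypothetical, and block comments and string literals are deliberately ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MethodMetrics {
    // Decision points recognised by this sketch (an assumed, simplified rule set).
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|while|for|case|catch)\\b|&&|\\|\\||\\?");

    /** Counts non-blank lines that are not pure line comments. */
    static int sloc(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("//")) {
                count++;
            }
        }
        return count;
    }

    /** CC = 1 + number of decision points; trailing // comments are stripped first. */
    static int cc(String source) {
        int count = 1;
        for (String line : source.split("\n")) {
            int comment = line.indexOf("//");
            String code = comment >= 0 ? line.substring(0, comment) : line;
            Matcher m = DECISION.matcher(code);
            while (m.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sqrt = String.join("\n",
            "public double sqrt(int n) {",
            "    // Newton-Raphson method",
            "    double r = n / 2.0;",
            "    while (Math.abs(r - (n / r)) > 0.00001) {",
            "        r = 0.5 * (r + (n / r));",
            "    }",
            "    return r;",
            "}");
        // prints "SLOC = 7, CC = 2"
        System.out.println("SLOC = " + sloc(sqrt) + ", CC = " + cc(sqrt));
    }
}
```

A text-based counter like this is only an approximation; a real measurement would work on a parsed syntax tree.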

  4. M. Shepperd. "A critique of cyclomatic complexity as a software metric." Software Engineering Journal 3.2 (1988)

  5. Citations • Total: 218 • Last 5 years: 90

  6. CC redundant? • Shepperd's critique was based on 8 papers (1979-1987) • 7 papers followed (1991-2013) • Fortran, PL/1, Pascal, COBOL, C, C++, and Java • SLOC & CC correlate linearly: R² = 0.65 - 0.95

  7. Our research • Identify differences in 15 papers • Get data • Reproduce!

  8. We do not conclude that CC is redundant with SLOC • Our result: R² = 0.43 • Differences from related work: • Aggregation • Power transform • Larger methods correlate even less • Differing variance

  9. Corpus • 13K Open Source Java Projects (14GB of Java) • 17M methods in 362M SLOC • [Figure: two histograms, Frequency against SLOC of a Method and Frequency against CC of a Method, on logarithmic axes] • E. Linstead, S. K. Bajracharya, T. C. Ngo, P. Rigor, C. V. Lopes, and P. Baldi, "Sourcerer: mining and searching internet-scale software repositories," Data Mining and Knowledge Discovery 18.2 (2009)

  10. First result • Correlation (R²): 0.43 • Lower than other papers: 0.65 - 0.95 • Why?

  11. Other explanations • Correlation (R²): 0.43 • Lower than other papers: 0.65 - 0.95

                            Yes   No
      Power transform         4   12
      File level (sum)        9    6

  12. Power transform • [Figure: two histograms of Frequency against SLOC of a Method, one on linear axes and one on logarithmic axes]

  13. Method level • R² = 0.43 • R² = 0.70
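As an illustration of why a power transform can change R², the sketch below computes the squared Pearson correlation (equal to R² for a simple linear fit) on raw values and again after taking log10 of both metrics. Treating the transform as a base-10 logarithm is an assumption based on the log10 panels later in the deck. The class name `LogLogCorrelation` and the sample numbers are hypothetical.

```java
class LogLogCorrelation {
    /** Squared Pearson correlation, which equals R^2 of a simple linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    /** Element-wise base-10 logarithm; inputs must be positive. */
    static double[] log10(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.log10(values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical (SLOC, CC) pairs, not taken from the corpus.
        double[] sloc = {3, 5, 9, 20, 77, 230};
        double[] cc = {1, 2, 2, 6, 9, 40};
        System.out.println("raw R^2     = " + rSquared(sloc, cc));
        System.out.println("log-log R^2 = " + rSquared(log10(sloc), log10(cc)));
    }
}
```

Both SLOC and CC are at least 1 for any method, so the logarithm is always defined here.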

  14. File level • Example: 1 file, 30 “small” methods. • File SLOC = 30 * avg(SLOC_m) = 30 * 2.5 • File CC = 30 * avg(CC_m) = 30 * 2 • Volume factor causes high correlation [1] • [1] K. El Emam, S. Benlarbi, N. Goel, S.N. Rai. "The confounding effect of class size on the validity of object-oriented metrics." IEEE Transactions on Software Engineering 27.7 (2001)
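The volume factor can be illustrated with a small simulation. This is a sketch on synthetic data, not the authors' analysis: every method gets an SLOC and a CC drawn independently of each other, so the method-level correlation is near zero by construction, and files differ only in how many methods they contain. The class name `AggregationEffect`, the value ranges and the seed are all assumptions.

```java
import java.util.Random;

class AggregationEffect {
    /** Squared Pearson correlation, which equals R^2 of a simple linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    /** Returns {method-level R^2, file-level R^2} for synthetic data. */
    static double[] simulate(long seed, int files) {
        Random rnd = new Random(seed);
        int[] methodsPerFile = new int[files];
        int total = 0;
        for (int f = 0; f < files; f++) {
            methodsPerFile[f] = 1 + rnd.nextInt(50); // 1..50 methods per file
            total += methodsPerFile[f];
        }
        double[] methodSloc = new double[total];
        double[] methodCc = new double[total];
        double[] fileSloc = new double[files];
        double[] fileCc = new double[files];
        int k = 0;
        for (int f = 0; f < files; f++) {
            for (int j = 0; j < methodsPerFile[f]; j++, k++) {
                methodSloc[k] = 1 + rnd.nextInt(10); // drawn independently of CC
                methodCc[k] = 1 + rnd.nextInt(5);
                fileSloc[f] += methodSloc[k];        // file metric = sum over methods
                fileCc[f] += methodCc[k];
            }
        }
        return new double[] {
            rSquared(methodSloc, methodCc),
            rSquared(fileSloc, fileCc)
        };
    }

    public static void main(String[] args) {
        double[] r = simulate(42L, 2000);
        System.out.println("method-level R^2 = " + r[0]);
        System.out.println("file-level R^2   = " + r[1]);
    }
}
```

In this setup the file-level sums correlate strongly simply because both grow with the number of methods in the file, which is the confounding effect the slide attributes to [1].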

  15. File level • R² = 0.87 • R² = 0.65 • Aggregation causing it?

  16. We do not conclude that CC is redundant with SLOC • Our result: R² = 0.43 • Differences from related work: • Aggregation • Power transform • Larger methods correlate even less • Differing variance

  17. [Figure: histogram of Frequency against SLOC of a Method on logarithmic axes, with the 50%, 25%, 10%, 1% and 0.1% tails marked] • Israel Herraiz and Ahmed E. Hassan, "Beyond lines of code: Do we need more complexity metrics?" Making Software: What Really Works, and Why We Believe It (2010)

  18. Statistics

      Tail    min. SLOC   # Methods   R²     R² “power”
      100%      1         17.8M       0.43   0.70
      50%       3          8.9M       0.45   0.62
      25%       9          4.5M       0.42   0.44
      10%      20          1.8M       0.38   0.27
      1%       77          179K       0.29   0.05
      0.1%    230           18K       0.21   0.00
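A tail row of this kind could be computed roughly as follows. The sketch assumes that an "x% tail" means the x% of methods with the highest SLOC, which matches the rising minimum SLOC and shrinking method counts in the table. The class name `TailAnalysis` and the tie handling (a plain sort, so ties at the cut-off are split arbitrarily) are assumptions.

```java
import java.util.Arrays;

class TailAnalysis {
    /** Indices of the largest `fraction` of methods by SLOC, largest first. */
    static Integer[] tailIndices(double[] sloc, double fraction) {
        Integer[] idx = new Integer[sloc.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(sloc[b], sloc[a]));
        int size = (int) Math.round(sloc.length * fraction);
        size = Math.max(1, Math.min(size, idx.length));
        return Arrays.copyOf(idx, size);
    }

    /** Smallest SLOC value that still falls inside the tail. */
    static double minSloc(double[] sloc, double fraction) {
        Integer[] tail = tailIndices(sloc, fraction);
        return sloc[tail[tail.length - 1]];
    }

    /** R^2 between SLOC and CC restricted to the tail (NaN if either has no variance). */
    static double rSquaredOfTail(double[] sloc, double[] cc, double fraction) {
        Integer[] tail = tailIndices(sloc, fraction);
        double[] x = new double[tail.length];
        double[] y = new double[tail.length];
        for (int i = 0; i < tail.length; i++) {
            x[i] = sloc[tail[i]];
            y[i] = cc[tail[i]];
        }
        return rSquared(x, y);
    }

    /** Squared Pearson correlation, which equals R^2 of a simple linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }
}
```

Calling `minSloc(sloc, 0.10)` and `rSquaredOfTail(sloc, cc, 0.10)` on per-method arrays would give the "min. SLOC" and R² cells for a 10% tail under these assumptions.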

  19. Large Methods

  20. We do not conclude that CC is redundant with SLOC • Our result: R² = 0.43 • Differences from related work: • Aggregation • Power transform • Larger methods correlate even less • Differing variance

  21. Variance • R² = 0.43 means 57% of the variance is not explained • Variance (residual) = actual CC - predicted CC
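The relationship between R² and the unexplained share can be sketched with an ordinary least-squares fit of CC on SLOC: the residual of each method is actual CC minus predicted CC, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The class name `ResidualAnalysis` and the sample values are hypothetical, and the choice of plain least squares is an assumption about the kind of linear model meant.

```java
class ResidualAnalysis {
    /** Least-squares fit y = a + b * x; returns {a, b}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    /** Residual per point: actual y minus the value predicted by the fit. */
    static double[] residuals(double[] x, double[] y) {
        double[] ab = fit(x, y);
        double[] res = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            res[i] = y[i] - (ab[0] + ab[1] * x[i]);
        }
        return res;
    }

    /** R^2 = 1 - SS_res / SS_tot. */
    static double rSquared(double[] x, double[] y) {
        double meanY = 0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= y.length;
        double ssRes = 0, ssTot = 0;
        double[] res = residuals(x, y);
        for (int i = 0; i < y.length; i++) {
            ssRes += res[i] * res[i];
            ssTot += (y[i] - meanY) * (y[i] - meanY);
        }
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        // Hypothetical (SLOC, CC) pairs, not taken from the corpus.
        double[] sloc = {1, 2, 3};
        double[] cc = {1, 3, 2};
        double r2 = rSquared(sloc, cc);
        // prints "R^2 = 0.25, unexplained = 0.75"
        System.out.println("R^2 = " + r2 + ", unexplained = " + (1.0 - r2));
    }
}
```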

  22. Method level

  23. [Figure panels: log10(Method level) • Method level]

  24. [Figure panels: log10(Method level) • Method level • File level]

  25. [Figure panels: log10(Method level) • log10(File level) • Method level • File level]

  26. Differing variance complicates interpretation of linear models

  27. We do not conclude that CC is redundant with SLOC • Our result: R² = 0.43 • Differences from related work: • Aggregation • Power transform • Larger methods correlate even less • Differing variance

  28. Summary • Method level • File level • Large methods • Differing variance • (data, scripts & preprint: http://is.gd/icsme_cc)
