
Do Code Clones Matter? Software Engineering Seminar, Spring Semester

Do Code Clones Matter? Software Engineering Seminar, Spring Semester 2010. Dan Tecu. Paper presented today: Do Code Clones Matter? Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. ICSE'09, May 16-24, 2009, Vancouver, Canada. 9th of March.


  1. Do Code Clones Matter? Software Engineering Seminar Spring Semester 2010 Dan Tecu

  2. Paper presented today: Do Code Clones Matter? Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. ICSE'09, May 16-24, 2009, Vancouver, Canada

  3. What is code cloning?
     - Code cloning is a primitive form of code reuse, usually by copying (copy-paste)
     - Code cloning may happen intentionally or unintentionally

  4. Clones and possible consequences
     - Clones introduce redundancy and might introduce dependencies
     - Inconsistent changes to cloned code might create faults and lead to incorrect behavior
     - Inconsistent bug fixing: a bug found in cloned code is fixed in only one clone

  5. Previous work
     - Most previous work agrees that cloning poses a problem for software maintenance [Lague et al., 1997]
     - However, little information is available on the impact of code cloning on software quality
     - Some researchers have even started to doubt that code cloning is harmful [Krinke, 2007]

  6. Research problem
     - The purpose of this paper is to shed some light on this field
     - It presents the results of a large case study undertaken to find out:
       1. whether clones are changed inconsistently
       2. whether the inconsistencies are introduced intentionally
       3. whether unintentional inconsistencies can represent faults

  7. Contribution
     - A novel suffix-tree-based algorithm for the detection of inconsistent clones
     - An empirical study showing whether cloned code is harmful or not

  8. Detection of inconsistent clones
     - Works at the token level
     - Identifier names are irrelevant (due to the normalizer)
     - Clone groups whose clones overlap with each other are filtered out
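
A minimal sketch of what token-level normalization could look like. The tokenizer regex, the keyword subset, and the placeholder names `ID` and `LIT` are assumptions made for illustration; the slides do not describe the actual normalizer used by the tool.

```python
import re

# Hypothetical, simplified tokenizer: identifiers, integer literals, or any
# single non-whitespace character. A real tool would use a language-specific scanner.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # illustrative subset


def normalize(source):
    """Tokenize `source` and replace identifiers and literals with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)        # keywords keep their meaning
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            tokens.append("ID")       # identifier names become irrelevant
        elif re.fullmatch(r"\d+", tok):
            tokens.append("LIT")      # literal values become irrelevant
        else:
            tokens.append(tok)        # operators and punctuation stay as-is
    return tokens
```

With this normalization, `total = total + price;` and `sum = sum + cost;` produce the same token sequence, so a copy with renamed variables still counts as a clone.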

  9. Detection in detail: suffix trees
     - A suffix tree for the word “wood”
     - A path from the root to a leaf describes (uniquely) a suffix; the tree describes all suffixes
     - The circles represent nodes, the rectangles represent leaves
     - Each edge to a node is labeled with a substring of the initial word
     - Each edge to a leaf is labeled with the empty string
     [Figure: suffix tree for “wood”; edge labels: wood, o, d, d, od]
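
To make the "every suffix is a path" idea concrete, here is a small sketch that builds an uncompressed suffix trie (one character per edge) rather than the compressed suffix tree drawn on the slide. The function names and the `$` end marker are assumptions for illustration, and the word is assumed not to contain `$`.

```python
def build_suffix_trie(word):
    """Insert every suffix of `word` into a nested-dict trie, one character per edge."""
    root = {}
    for start in range(len(word)):
        node = root
        for ch in word[start:]:
            node = node.setdefault(ch, {})
        node["$"] = start  # end marker: records where this suffix begins
    return root


def contains(trie, pattern):
    """A string occurs in the word iff it is a prefix of some suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def suffixes(trie, prefix=""):
    """Collect all suffixes by walking every root-to-marker path."""
    found = []
    for key, child in sorted(trie.items()):
        if key == "$":
            found.append(prefix)
        else:
            found.extend(suffixes(child, prefix + key))
    return found
```

For “wood” the trie holds the four suffixes “wood”, “ood”, “od” and “d”. The two suffixes starting with “o” share their first edge, which corresponds to the inner node on the slide.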

  10. Detection in detail: edit distance
     - The edit distance between two strings is the minimum number of operations required to change one string into the other
     - The allowed operations are:
       1. Deletion of a character
       2. Insertion of a character
       3. Changing a character into another
     - The edit distance between “wood” and “floor” is 3
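
The definition above can be sketched with the standard dynamic-programming formulation (Levenshtein distance). The function name is an assumption; the code works on any sequence, so it applies to token lists as well as strings.

```python
def edit_distance(a, b):
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))           # distance from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]                           # turning a[:i] into the empty sequence
        for j, item_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                        # delete item_a
                curr[j - 1] + 1,                    # insert item_b
                prev[j - 1] + (item_a != item_b),   # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]
```

For the slide's example, “wood” becomes “floor” by changing “w” to “f”, inserting “l”, and changing “d” to “r”, which gives a distance of 3.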

  11. Detection in detail: the algorithm
     - Input parameters: the sequence (composed of n tokens), the maximum edit distance and the minimal clone length
     - The suffix tree over the input sequence is constructed
     - For each suffix, an approximate search based on the edit distance is performed in the tree
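
The sketch below is a brute-force stand-in, not the suffix-tree algorithm from the paper: it takes the same three inputs but simply compares every pair of non-overlapping, fixed-length windows. It is only meant to show what the parameters control; the function names and the fixed window length are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (item_a != item_b)))
        prev = curr
    return prev[-1]


def find_clone_pairs(tokens, min_length, max_edits):
    """Return (i, j, distance) for non-overlapping windows within max_edits.

    Distance 0 means an exact clone; distance > 0 means an inconsistent one.
    Quadratic in the number of windows, so only usable on tiny inputs.
    """
    hits = []
    last_start = len(tokens) - min_length
    for i in range(last_start + 1):
        for j in range(i + min_length, last_start + 1):
            d = edit_distance(tokens[i:i + min_length], tokens[j:j + min_length])
            if d <= max_edits:
                hits.append((i, j, d))
    return hits
```

The suffix tree in the actual algorithm avoids this pairwise comparison by sharing common prefixes across all suffixes of the token sequence.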

  12. Performance of algorithm
     - Worst-case complexity is hard to analyze
     - Results on an Intel Core 2 Duo 2.4 GHz, 3.5 GB RAM, running Java in a single thread
     [Figure: algorithm performance results]

  13. Study description: study objects
     - Five projects, described below, were analyzed:

       System     Organization   Language   Age (years)   Size (kLOC)
       A          Munich Re      C#         6             317
       B          Munich Re      C#         4             454
       C          Munich Re      C#         2             495
       D          LV 1871        Cobol      17            197
       Sysiphus   TUM            Java       8             281

  14. Study description: research questions
     - RQ1: Are clones changed inconsistently?
     - RQ2: Are inconsistent clones created unintentionally?
     - RQ3: Can inconsistent clones be indicators for faults in real systems?

  15. Study measurements: parameter tuning
     - Minimal clone length: 10 statements (for Cobol: 20)
     - Maximum edit distance: 5 (for Cobol: 10)
     - Maximal inconsistency ratio (the ratio of edit distance to clone length): 0.2
     - Additional constraint: the first 2 statements of two clones need to be equal
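
A small sketch of how these four thresholds could be combined into one acceptance check. The class and function names are hypothetical, and using the shorter clone as the "clone length" is an assumption, since the slide does not say how length is measured when two clones differ in size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_length: int = 10     # statements
    max_edits: int = 5
    max_ratio: float = 0.2   # edit distance / clone length
    equal_prefix: int = 2    # leading statements that must match


DEFAULT = Thresholds()
COBOL = Thresholds(min_length=20, max_edits=10)


def accept(clone_a, clone_b, distance, t=DEFAULT):
    """Check a candidate pair of statement lists against the study's thresholds."""
    length = min(len(clone_a), len(clone_b))  # assumption: length of the shorter clone
    if length < t.min_length or distance > t.max_edits:
        return False
    if distance / length > t.max_ratio:
        return False
    return clone_a[:t.equal_prefix] == clone_b[:t.equal_prefix]
```

For example, two 10-statement clones with one differing statement have an inconsistency ratio of 0.1 and pass, while the same pair with three differences (ratio 0.3) is rejected.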

  16. Study measurements: absolute numbers

  17. Study measurements: relative numbers
     - RQ1 mean value: 0.52
     - RQ2 mean value: 0.28
     - RQ3 mean value: 0.15

  18. Fault density
     - To answer RQ3, we have to compare the fault density in inconsistencies against the average fault density
     - Fault density in inconsistencies was evaluated in faults/kLOC
     - However, the average fault density in the analyzed systems was not known
     - The typical range for fault density is 0.1 to 50 faults/kLOC [Endres and Rombach, 2003]

  19. RQ3 answered
     - Average fault density in inconsistencies: 48.1 faults/kLOC
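
As a sketch of the arithmetic behind this comparison: fault density is the number of faults divided by the code size in thousands of lines. The function names are hypothetical, and the fault and line counts in the usage note are invented for illustration; only the 0.1 to 50 range and the 48.1 figure come from the slides.

```python
TYPICAL_RANGE = (0.1, 50.0)  # faults/kLOC, as cited from Endres and Rombach (2003)


def fault_density(faults, loc):
    """Faults per thousand lines of code."""
    return faults / (loc / 1000.0)


def within_typical_range(density, bounds=TYPICAL_RANGE):
    """True if the density lies inside the cited typical range."""
    low, high = bounds
    return low <= density <= high
```

With made-up numbers, 12 faults in 4000 lines give `fault_density(12, 4000)` = 3.0 faults/kLOC. The reported 48.1 faults/kLOC still falls inside the typical range, but only just below its upper bound of 50.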

  20. Threats to validity
     - The development repositories of the systems were not analyzed (to trace the evolution of inconsistencies)
     - A comparison with the actual fault density would have been better
     - The analyzed projects were not sampled randomly
     - The majority of the systems are written in C#
     - Only 5 systems were analyzed

  21. Conclusion
     - The answer to RQ1 is positive: clones are changed inconsistently
     - The answer to RQ2 is positive: inconsistent clones are created unintentionally
     - The answer to RQ3 is also positive: the average fault density in inconsistent clones is very close to the upper bound of the reported typical fault density range
     - Inconsistent clones can be indicators for faults in real systems

  22. References
     - A. Endres and D. Rombach. A Handbook of Software and Systems Engineering. Pearson, 2003.
     - J. Krinke. A study of consistent and inconsistent changes to code clones. In Proc. WCRE '07. IEEE, 2007.
     - B. Lague, D. Proulx, J. Mayrand, E. M. Merlo and J. Hudepohl. Assessing the benefits of incorporating function clone detection in a development process. In Proc. ICSM '97. IEEE, 1997.

  23. Detection in detail: detect procedure

  24. Detection in detail: search procedure
