Do Code Clones Matter? Software Engineering Seminar Spring Semester 2010 Dan Tecu
Paper presented today: Do Code Clones Matter? Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. ICSE'09, May 16-24, 2009, Vancouver, Canada 09 th of March Do Code Clones Matter? 2
What is code cloning?
Code cloning is a primitive form of code reuse, usually by copying (copy-paste)
Code cloning may happen intentionally or unintentionally
Clones and possible consequences
Clones introduce redundancy and might introduce dependencies
Inconsistent changes to cloned code might create faults and lead to incorrect behavior
Inconsistent bug fixing: a bug found in cloned code is fixed in only one of the clones
Previous work
Most previous work agrees that cloning poses a problem for software maintenance [Lague et al., 1997]
However, there is little information available concerning the impact of code cloning on software quality
Some researchers have even started to doubt that code cloning is harmful [Krinke, 2007]
Research problem
The purpose of this paper is to shed some light on this field
It presents the results of a large case study that was undertaken to find out:
1. whether clones are changed inconsistently
2. whether the inconsistencies are introduced intentionally
3. whether the unintentional inconsistencies can represent faults
Contribution
A novel suffix-tree based algorithm for the detection of inconsistent clones
An empirical study showing whether cloned code is harmful or not
Detection of inconsistent clones
Works at the token level
Identifier names are irrelevant (due to the normalizer)
Clone groups whose clones overlap with each other are filtered out
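To illustrate why identifier names become irrelevant after normalization, here is a minimal sketch in Python. It is not the normalizer from the paper: the token pattern, the tiny keyword set, and the placeholder names `ID` and `NUM` are all assumptions made for this example, and replacing literals is an extra assumption beyond what the slide states.

```python
import re

# Hypothetical, tiny keyword subset; a real normalizer is language specific.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code: str) -> list[str]:
    """Split code into tokens and replace identifiers and numbers by placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)        # keywords keep their meaning
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")       # identifier names are dropped
        elif tok[0].isdigit():
            tokens.append("NUM")      # assumption: literals are normalized too
        else:
            tokens.append(tok)        # operators and punctuation stay as-is
    return tokens


print(normalize("int total = a + 1;"))
print(normalize("int sum = b + 2;"))
```

Both statements yield the same normalized token sequence, so a detector working on these tokens would treat them as clones even though the variable names differ.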
Detection in detail: suffix trees
A suffix tree for the word “wood”
[Figure: suffix tree for “wood”; edge labels: wood, o, d, d, od]
A path from root to a leaf describes (uniquely) a suffix; the tree describes all suffixes
The circles represent nodes, the rectangles represent leaves
Each edge to a node is labeled with a substring of the initial word
Each edge to a leaf is labeled with the empty string
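The idea that every root-to-leaf path spells one suffix can be sketched in Python. This example builds an uncompressed suffix trie (one character per edge), which is a simpler stand-in for the compressed suffix tree on the slide. The function names and the `$` end marker are assumptions of this sketch.

```python
def suffixes(word: str) -> list[str]:
    """All non-empty suffixes of the word, longest first."""
    return [word[i:] for i in range(len(word))]


def build_suffix_trie(word: str) -> dict:
    """Insert every suffix into a nested-dict trie, one character per edge."""
    root: dict = {}
    for suffix in suffixes(word):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
        node["$"] = {}  # hypothetical end marker: a suffix ends here
    return root


def contains(trie: dict, pattern: str) -> bool:
    """A pattern occurs in the word iff it is a prefix of some suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("wood")
print(suffixes("wood"))       # ['wood', 'ood', 'od', 'd']
print(contains(trie, "oo"))   # True
print(contains(trie, "ow"))   # False
```

The root has three children (w, o, d), matching the three branches leaving the root in the figure; the shared "o" branch is where the suffixes "ood" and "od" split.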
Detection in detail: edit distance
The edit distance between 2 strings is the minimum number of operations required to change one string into the other
The allowed operations are:
1. Deletion of a character
2. Insertion of a character
3. Changing a character into another
The edit distance between “wood” and “floor” is 3
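The definition above corresponds to the classic dynamic-programming computation of the Levenshtein distance. A minimal Python sketch (the function name `edit_distance` is chosen for this example) that reproduces the “wood” / “floor” value:

```python
def edit_distance(a, b) -> int:
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("wood", "floor"))  # 3
```

One way to reach 3: change "w" into "f", insert "l", change "d" into "r". The function accepts any sequences, so it works on lists of tokens as well as on strings.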
Detection in detail: the algorithm
Input parameters: the sequence (composed of n tokens), the maximum edit distance and the minimal clone length
The suffix tree over the input sequence is constructed
For each suffix, an approximate search based on the edit distance is performed in the tree
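The role of the three input parameters can be shown with a deliberately naive Python sketch. It is not the suffix-tree algorithm from the paper: it compares fixed-length, non-overlapping windows pairwise by brute force, which is far slower and only finds candidates of exactly the minimal length. The function names and the sample token sequence are assumptions of this example.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def find_inconsistent_windows(tokens, min_length: int, max_edits: int):
    """Brute-force stand-in: report (i, j, distance) for pairs of
    non-overlapping windows of min_length tokens that differ, but by
    at most max_edits edit operations."""
    hits = []
    last_start = len(tokens) - min_length
    for i in range(last_start + 1):
        left = tokens[i:i + min_length]
        # start j after the left window ends, so the two never overlap
        for j in range(i + min_length, last_start + 1):
            d = edit_distance(left, tokens[j:j + min_length])
            if 0 < d <= max_edits:
                hits.append((i, j, d))
    return hits


tokens = ["a", "b", "c", "d", "x", "a", "b", "e", "d", "y"]
print(find_inconsistent_windows(tokens, min_length=4, max_edits=1))  # [(0, 5, 1)]
```

The single hit pairs the window starting at position 0 (a b c d) with the one starting at position 5 (a b e d), which differ in one token. A suffix tree avoids this pairwise comparison by sharing common prefixes of all suffixes during the search.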
Performance of the algorithm
Worst case complexity is hard to analyze
Results on Intel Core 2 Duo 2.4 GHz, 3.5 GB RAM, running Java in a single thread are shown below:
Study description: study objects
The 5 projects described below were analyzed:

System    Organization  Language  Age (years)  Size (kLOC)
A         Munich Re     C#        6            317
B         Munich Re     C#        4            454
C         Munich Re     C#        2            495
D         LV 1871       Cobol     17           197
Sysiphus  TUM           Java      8            281
Study description: research questions
RQ1: Are clones changed inconsistently?
RQ2: Are inconsistent clones created unintentionally?
RQ3: Can inconsistent clones be indicators for faults in real systems?
Study measurements: parameter tuning
Minimal clone length: 10 statements (for Cobol: 20)
Maximum edit distance: 5 (for Cobol: 10)
Maximal inconsistency ratio (the ratio of edit distance to clone length): 0.2
Additional constraint: the first 2 statements of two clones need to be equal
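How these four settings interact can be sketched as a filter over a pair of clone candidates, each given as a list of statements. This is an illustration under assumptions, not the study's tooling: the function name `passes_filters` is hypothetical, and the sketch assumes that the minimal length applies to both clones and that the ratio is taken against the shorter clone, which the slide does not specify.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences of statements."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def passes_filters(clone_a, clone_b, min_length=10, max_edits=5,
                   max_ratio=0.2, equal_prefix=2) -> bool:
    """Apply the study's non-Cobol settings (defaults) to one clone pair."""
    shorter = min(len(clone_a), len(clone_b))  # assumption: ratio uses the shorter clone
    if shorter < min_length:
        return False
    if clone_a[:equal_prefix] != clone_b[:equal_prefix]:
        return False  # the first statements must be equal
    distance = edit_distance(clone_a, clone_b)
    return distance <= max_edits and distance / shorter <= max_ratio


original = [f"stmt{i}" for i in range(10)]
one_change = original[:5] + ["changed"] + original[6:]
three_changes = original[:4] + ["x", "y", "z"] + original[7:]

print(passes_filters(original, one_change))     # True  (ratio 0.1)
print(passes_filters(original, three_changes))  # False (ratio 0.3)
```

Note that for clones of the minimal length of 10 statements, the ratio of 0.2 is the binding limit: it allows at most 2 edits, so the maximum edit distance of 5 only comes into play for clones of 25 statements or more.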
Study measurements: absolute numbers
Study measurements: relative numbers
RQ1 mean value: 0.52
RQ2 mean value: 0.28
RQ3 mean value: 0.15
Fault density
To answer RQ3, we have to compare the fault density in inconsistencies against the average fault density
Fault density in inconsistencies was evaluated in faults/kLOC
However, the average fault density in the analyzed systems was not known
The typical range for fault density is 0.1 – 50 faults/kLOC [Endres and Rombach, 2003]
RQ3 answered
Average fault density in inconsistencies: 48.1 faults/kLOC
Threats to validity
The development repositories of the systems were not analyzed (to trace the evolution of inconsistencies)
A comparison with the actual fault density would have been better
The analyzed projects were not sampled randomly
The majority of the systems are written in C#
Only 5 systems were analyzed
Conclusion
The answer to RQ1 is positive: clones are changed inconsistently
The answer to RQ2 is positive: inconsistent clones are created unintentionally
The answer to RQ3 is also positive: the average fault density in inconsistent clones is very close to the upper bound of the typical fault density range reported in the literature
Inconsistent clones can be indicators for faults in real systems
References
A. Endres and D. Rombach. A Handbook of Software and Systems Engineering. Pearson, 2003.
J. Krinke. A study of consistent and inconsistent changes to code clones. In Proc. WCRE '07. IEEE, 2007.
B. Lague, D. Proulx, J. Mayrand, E. M. Merlo and J. Hudepohl. Assessing the benefits of incorporating function clone detection in a development process. In Proc. ICSM '97. IEEE, 1997.
Detection in detail: detect procedure
Detection in detail: search procedure