Automatic Inference of Structural Changes for Matching Across Program Versions. Miryung Kim, David Notkin, Dan Grossman. Computer Science & Engineering, University of Washington.


  1. Automatic Inference of Structural Changes for Matching Across Program Versions. Miryung Kim, David Notkin, Dan Grossman. Computer Science & Engineering, University of Washington. [Figure: method headers of two program versions, P and P’]

  2. Code Matching Problem. [Figure: method headers of versions P and P’. Bar.Bar(), Bar.mC(int), and Foo.mC() are listed twice; the other headers are Foo.mA(), Foo.mA(float), Foo.mB(), Foo.mB(float), Bar.mA(bool), Boo.mA(bool), Boo.mA(int), Boo.mB(bool), Boo.mB(int)]

  3. Our Approach: Matching with Change Rules. [Figure: the P and P’ method headers from slide 2, with change rules between the two versions]

  4. Our Approach: Matching with Change Rules. Example change rule: all methods in the Boo class take an int argument instead of bool. [Figure: the P and P’ method headers from slide 2, with change rules between the two versions]

  5. Motivations for Matching Code • A fundamental building block for mining software repositories • Also a basis for classic software evolution research and tools • Software version merging • Regression testing • Profile propagation

  6. Matching is Challenging. • Matching is hard due to code addition & deletion, copy & paste, refactorings, etc. • Delta between two versions can be very large. • For many uses, matching results must be concise and comprehensible.

  7. Outline • background • our rule-based matching approach • inference algorithm • evaluation • potential applications of change rules

  8. Matching Problem ≈ Change Identification Problem. The problem of identifying code matches ≈ the problem of identifying changes.

  9. Existing Approaches • diff, Syntactic Diff (CDiff), Semantic Diff, JDiff, origin analysis, refactoring reconstruction tools, etc. • These approaches individually compare code elements at particular granularities using similarity measures.

  10. Limitations of Existing Approaches P P’

  11. Limitations of Existing Approaches. [Figure: the P and P’ method headers from slide 2]

  14. Limitations of Existing Approaches. Cannot disambiguate among many potential matches. [Figure: the P and P’ method headers from slide 2]

  15. Limitations of Existing Approaches. Difficult to spot inconsistent and incomplete changes. [Figure: the P and P’ method headers from slide 2]

  16. Limitations of Existing Approaches P P’ Output is an unstructured, usually lengthy list of matches

  17. Limitations of Existing Approaches. Output is an unstructured, usually lengthy list of matches. [Figure: P and P’, annotated with two high-level changes] • move axis drawing classes from chart to chart.axis • add boolean input arg to all chart creation APIs

  19. Outline ✓ background • our rule-based matching approach • inference algorithm • evaluation • potential applications of change rules

  20. Our Rule-based Matching Approach • Our change rule can concisely describe a set of related refactorings and API changes at or above the method header level. • Our tool automatically infers a set of likely change rules between two versions of a program.

  21. Our Contribution 1: Comprehensibility. Represent a high-level change pattern using a change rule ➡ Easy to understand change intent. • "move axis drawing classes from chart to chart.axis" becomes: for all x in chart.*Axis*.*(*) packageReplace(x, chart, chart.axis) • "add boolean input arg to all chart creation APIs" becomes: for all x in Factory.create*Chart(*) argAppend(x, boolean)

  22. Our Contribution 2: Conciseness. Concisely represent large deltas using a small number of change rules. [Figure: P and P’ connected by change rules R1, R2, R3, R4, R5, R6]

  23. Our Contribution 3: High Recall. Find matches evidenced by a more general change pattern ➡ Improving recall. [Figure: the P and P’ method headers from slide 2; next to Boo.mA(bool), Bar.mA(bool) is marked X and Boo.mA(int) is marked O]

  24. Our Contribution 4: Explicit Exceptions. Our rule encodes exceptions explicitly ➡ Easy to notice inconsistent and incomplete changes. Example: for all x in Foo.m*() except {Foo.mC()} argAppend(x, float). [Figure: the P and P’ method headers from slide 2]

  25. Change Rule. General form: for all x:method in scope transformation(x). [Figure: P and P’]

  26. Scope • We use a regular expression to denote a set of methods • e.g. chart.Factory.create*Chart(*)
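A scope of this kind can be sketched as a wildcard pattern compiled to a regular expression. The snippet below is only an illustration: the helper names `scope_to_regex` and `methods_in_scope` and the sample method headers are hypothetical, and it assumes `*` stands for any run of characters, which the slide does not spell out.

```python
import re

def scope_to_regex(scope: str) -> re.Pattern:
    """Compile a scope such as 'chart.Factory.create*Chart(*)' into a regex.

    Assumption: '*' matches any run of characters; all other text is literal.
    """
    literal_parts = [re.escape(piece) for piece in scope.split("*")]
    return re.compile(".*".join(literal_parts))

def methods_in_scope(scope: str, methods: list[str]) -> list[str]:
    """Return the method headers that the scope selects, in input order."""
    pattern = scope_to_regex(scope)
    return [m for m in methods if pattern.fullmatch(m)]

# Hypothetical method headers, used only to exercise the pattern.
methods = [
    "chart.Factory.createBarChart()",
    "chart.Factory.createPieChart(int)",
    "chart.Factory.getDefaults()",
    "chart.axis.Axis.draw()",
]
selected = methods_in_scope("chart.Factory.create*Chart(*)", methods)
```

Here `selected` holds the two `create*Chart` headers and leaves out the other two.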

  27. Transformations At or Above the Level of Method Header • 9 types of transformations representing: • replace the name of package, class, and method • replace the return type • modify the input signature, etc.
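To make the idea of header-level transformations concrete, here is a small sketch that models a method header as a record and implements three transformations as pure functions. The `MethodHeader` type, the function names, and the `NumberAxis` example class are assumptions made for illustration; the slides name transformations such as packageReplace and argAppend but do not define their exact semantics.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MethodHeader:
    """Hypothetical model of a method header (not taken from the slides)."""
    package: str
    cls: str
    name: str
    args: tuple = ()
    return_type: str = "void"

    def __str__(self) -> str:
        return f"{self.package}.{self.cls}.{self.name}({', '.join(self.args)})"

def package_replace(m: MethodHeader, old: str, new: str) -> MethodHeader:
    """Move the method to package `new` if it currently lives in `old`."""
    return replace(m, package=new) if m.package == old else m

def arg_append(m: MethodHeader, arg_types: list) -> MethodHeader:
    """Append the given argument types to the input signature."""
    return replace(m, args=m.args + tuple(arg_types))

def return_replace(m: MethodHeader, old: str, new: str) -> MethodHeader:
    """Swap the return type if it currently equals `old`."""
    return replace(m, return_type=new) if m.return_type == old else m

# Hypothetical headers used to exercise the transformations.
axis_draw = MethodHeader("chart", "NumberAxis", "draw")
moved = package_replace(axis_draw, "chart", "chart.axis")
create = MethodHeader("chart", "Factory", "createBarChart")
extended = arg_append(create, ["boolean"])
```

Keeping each transformation a pure function from header to header makes it straightforward to compose several of them in one rule.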

  28. Change Rule with Exceptions. General form: for all x:method in (scope - exceptions) transformation(x). [Figure: P and P’]

  29. Example Change Rule. Chart creation APIs were changed to take an additional int parameter. P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart(), ... P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int), ...

  30. Example Change Rule. For all x in Factory.create*Chart(*) argAppend(x, [int]). P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart(), ... P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int), ...

  31. Example Change Rule. For all x in Factory.create*Chart(*) except {Factory.createPieChart()} argAppend(x, [int]). Result: 14 matches and 1 exception. P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart(), ... P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int), ...
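The way a rule yields both matches and exceptions can be sketched as follows. This is a simplified illustration, not the authors' tool: headers are plain strings, the scope is written directly as a regular expression, `arg_append` and `apply_rule` are hypothetical helpers, and only the four method headers visible on the slide are used (the elided ones are left out, so the counts differ from the 14 matches reported).

```python
import re

def arg_append(header: str, new_args: list[str]) -> str:
    """Append argument types to a header such as 'Factory.createChart()'."""
    name, _, rest = header.partition("(")
    old_args = [a.strip() for a in rest.rstrip(")").split(",") if a.strip()]
    return f"{name}({', '.join(old_args + new_args)})"

def apply_rule(scope_regex, transform, old_methods, new_methods):
    """Split the methods in scope into matches and exceptions.

    A method is a match if its transformed header exists in the new
    version; otherwise it is recorded as an exception to the rule.
    """
    matches, exceptions = [], []
    for m in old_methods:
        if not re.fullmatch(scope_regex, m):
            continue  # outside the rule's scope
        target = transform(m)
        if target in new_methods:
            matches.append((m, target))
        else:
            exceptions.append(m)
    return matches, exceptions

old = ["Factory.createChart()", "Factory.createBarChart()",
       "Factory.createPieChart()", "Factory.createLineChart()"]
new = {"Factory.createChart(int)", "Factory.createBarChart(int)",
       "Factory.createPieChart()", "Factory.createLineChart(int)"}

# Regex form of the scope Factory.create*Chart(*).
scope = r"Factory\.create.*Chart\(.*\)"
matches, exceptions = apply_rule(scope, lambda m: arg_append(m, ["int"]), old, new)
```

On these four headers the rule finds three matches and reports `Factory.createPieChart()` as the single exception.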

  32. Outline ✓ background ✓ our rule-based matching approach • inference algorithm • evaluation • potential applications of change rules

  33. Inference Algorithm Overview Input: two versions of a program Output: a set of likely change rules 1. Generate seed matches 2. Generate candidate rules by generalizing seed matches 3. Evaluate and select candidate rules (greedy algorithm)

  34. Step 1: Generate Seed Matches • Seed matches provide hints about likely changes. • We generate seeds based on textual similarity between two method headers. • Seed matches need not be all correct matches. Example: Foo.getBar(int) and Foo.getBar(bool), textual similarity: 0.75
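A minimal sketch of seed generation, assuming a generic string-similarity score: the slide does not say which textual similarity measure is used, so `difflib.SequenceMatcher` from the Python standard library stands in for it here, and its score will not necessarily equal the 0.75 shown. The function names and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Stand-in textual similarity in [0, 1]; not the measure from the talk."""
    return SequenceMatcher(None, a, b).ratio()

def seed_matches(old: list[str], new: list[str], threshold: float = 0.7):
    """Pair each removed header with its most similar added header.

    Headers present in both versions are treated as unchanged and skipped.
    Seeds are only hints, so a wrong pairing here is tolerated.
    """
    old_only = [m for m in old if m not in set(new)]
    new_only = [m for m in new if m not in set(old)]
    seeds = []
    for x in old_only:
        best = max(new_only, key=lambda y: similarity(x, y), default=None)
        if best is not None and similarity(x, best) >= threshold:
            seeds.append((x, best))
    return seeds

# Foo.size() is a hypothetical unchanged method added for contrast.
old = ["Foo.getBar(int)", "Foo.size()"]
new = ["Foo.getBar(bool)", "Foo.size()"]
seeds = seed_matches(old, new)
```

Lowering the threshold produces more seeds at the cost of more incorrect ones, which the later rule-selection step is meant to absorb.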

  35. Step 2: Generate Candidate Rules. Given a seed match [x, y]: • Compare x and y and reverse engineer a set of transformations, T. • Based on x, guess a set of scopes, S. • Generate candidate rules for each pair in S × PowerSet(T). Example, for the seed [Foo.getBar(int), Boo.getBar(bool)]: Transformations = { replaceArg(x, int, bool), replaceClass(x, Foo, Boo) } Scopes = { *.*(*), Foo.*(*), ..., *.get*(*), *.*Bar(*), ..., Foo.get*(int), ... } Candidate Rules = { for all x in *.*(*) replaceArg(x, int, bool), for all x in Foo.*(*) replaceClass(x, Foo, Boo), ..., for all x in *.*(*) replaceArg(x, int, bool) AND replaceClass(x, Foo, Boo) }
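The pairing of scopes with subsets of transformations can be sketched as below. This is a simplification under stated assumptions: `guess_scopes` only wildcards the whole class, method, or argument list (the slide also shows token-level scopes such as *.get*(*), which are not generated here), the empty subset of T is skipped because it would describe no change, and all function names are hypothetical.

```python
from itertools import chain, combinations, product

def nonempty_subsets(items):
    """All non-empty subsets of `items`, smallest first."""
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)
    )

def guess_scopes(header: str) -> list[str]:
    """Generalize 'Foo.getBar(int)' by wildcarding class, method, and args."""
    qualified, _, args = header.partition("(")
    cls, _, method = qualified.rpartition(".")
    args = args.rstrip(")")
    return [
        f"{c}.{m}({a})"
        for c, m, a in product([cls, "*"], [method, "*"], [args, "*"])
    ]

# Transformations reverse engineered from the seed on the slide,
# written as plain tuples for illustration.
T = [("replaceArg", "int", "bool"), ("replaceClass", "Foo", "Boo")]
S = guess_scopes("Foo.getBar(int)")

# One candidate rule per (scope, non-empty subset of T) pair.
candidates = list(product(S, nonempty_subsets(T)))
```

With 8 scopes and 3 non-empty subsets this yields 24 candidates, from the most specific scope `Foo.getBar(int)` up to `*.*(*)` combined with both transformations.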

  36. Step 3: Evaluate and Select Rules • Greedily select a small subset of candidate rules that explains a large number of matches. • In each iteration • evaluate all candidate rules • select the valid rule with the largest number of matches • exclude the matched methods from the set of remaining unmatched methods • Repeat until no rule can find any additional matches.
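The greedy loop described above resembles a greedy set cover and can be sketched in a few lines. The sketch assumes each candidate rule has already been evaluated into the set of old-version methods it matches and that invalid rules were filtered out beforehand; the rule labels R1 to R3, the `select_rules` name, and the `Bar.old()` method are hypothetical.

```python
def select_rules(candidates: dict[str, set[str]], unmatched: set[str]) -> list[str]:
    """Greedily pick rules until no rule explains any further method.

    `candidates` maps a rule label to the methods that rule matches.
    """
    remaining = set(unmatched)
    selected = []
    while remaining and candidates:
        # Re-evaluate every rule against the methods still unmatched;
        # on a tie, max() keeps the rule listed first.
        best = max(candidates, key=lambda r: len(candidates[r] & remaining))
        gained = candidates[best] & remaining
        if not gained:
            break  # no rule finds additional matches
        selected.append(best)
        remaining -= gained
    return selected

candidates = {
    "R1": {"Boo.mA(bool)", "Boo.mB(bool)"},  # e.g. Boo methods take int instead of bool
    "R2": {"Foo.mA()", "Foo.mB()"},          # e.g. Foo.m*() gain a float argument
    "R3": {"Foo.mA()"},                      # narrower rule subsumed by R2
}
unmatched = {"Boo.mA(bool)", "Boo.mB(bool)", "Foo.mA()", "Foo.mB()", "Bar.old()"}
selected = select_rules(candidates, unmatched)
```

R3 is never selected because R2 already explains its only match, and the loop stops while `Bar.old()` is still unmatched since no rule covers it.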

  37. Finding Exceptions. A rule is valid if # exceptions < ε × |scope|. Rule: For all x in Factory.create*Chart(*) argAppend(x, [int]). P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart() P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int)

  38. Finding Exceptions. A rule is valid if # exceptions < ε × |scope|. Rule: For all x in Factory.create*Chart(*) except {Factory.createPieChart()} argAppend(x, [int]). Result: 3 matches, 1 exception. P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart() P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int)
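The validity condition can be checked directly. In the sketch below, the function name `is_valid` and the two ε values are assumptions chosen for illustration (the slides do not state which ε is used); the counts are those of the example: a scope of 4 methods with 1 exception.

```python
def is_valid(num_exceptions: int, scope_size: int, epsilon: float) -> bool:
    """A rule is valid if # exceptions < epsilon * |scope|."""
    return num_exceptions < epsilon * scope_size

# Example from the slide: |scope| = 4, with 3 matches and 1 exception.
lenient = is_valid(num_exceptions=1, scope_size=4, epsilon=0.34)  # 1 < 1.36
strict = is_valid(num_exceptions=1, scope_size=4, epsilon=0.25)   # 1 < 1.0 fails
```

Because the inequality is strict, ε = 0.25 rejects this rule while a slightly larger ε accepts it, so ε controls how many exceptions a rule may carry relative to its scope.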
