Mining Version Histories to Guide Software Changes by T. Zimmermann, P. Weißgerber, S. Diehl, A. Zeller, in IEEE Transactions on Software Engineering, Vol. 31, No. 6, June 2005

The Idea
- Can we make similar suggestions for software changes?

Extending Eclipse Preferences
- Extend the Eclipse IDE with a new preference
- Preferences are stored in a field fKeys[]

Extending Eclipse Preferences
- What else do you need to change?
- Which of the 27,000 files?
- Which of the 20,000 classes?
- Which of the 200,000 methods?
- Program analysis
- fKeys[] and initDefaults() use the same variables
- Usage does not induce change
- Usage can be detected only within the source code
- Eclipse has 12,000 non-Java files

Learning from History
- Programmers who changed fKeys[] also changed …

From CVS to Transactions
- The CVS archive for Eclipse has more than 47,000 transactions

ROSE in a Nutshell

Changes -> Transactions -> Rules
- Entity: a triple (c, i, p), where c is the syntactic category, i the identifier, and p the parent entity
- Example: (method, initDefaults(), (class, Comp, (file, Comp.java, …)))
- Operations on entities: add_to, del_from, alter
- Transaction: the set of changes simultaneously submitted by a developer to a version archive
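
The entity triple and the notion of a transaction can be sketched in Python as follows. This is only an illustration, not ROSE's actual data model: the class names `Entity` and `Change`, the `path()` helper, the use of `None` as the root parent, and the placement of fKeys[] as a field of class Comp are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """An entity (c, i, p); names are illustrative, not ROSE's own."""
    category: str               # syntactic category c, e.g. "method"
    identifier: str             # identifier i, e.g. "initDefaults()"
    parent: Optional["Entity"]  # parent entity p (None at the root: an assumption)

    def path(self) -> str:
        # Render the chain of parents from the outermost entity inward.
        prefix = self.parent.path() + "/" if self.parent else ""
        return f"{prefix}{self.category}:{self.identifier}"


@dataclass(frozen=True)
class Change:
    operation: str  # one of "alter", "add_to", "del_from"
    entity: Entity


comp_file = Entity("file", "Comp.java", None)
comp_class = Entity("class", "Comp", comp_file)
init_defaults = Entity("method", "initDefaults()", comp_class)

# A transaction is modeled as the set of changes submitted together.
transaction = frozenset({
    Change("alter", init_defaults),
    Change("alter", Entity("field", "fKeys[]", comp_class)),  # hypothetical placement
})

print(init_defaults.path())
```

Frozen dataclasses are hashable, so changes can be stored in sets and used as items when counting co-occurrences.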

Getting Syntactic Entities

Light-Weight Analysis with ROSE

Light-Weight Analysis with ROSE
- ROSE analyzes C/C++, Java, Python, TeX, and Texinfo files
- We get modified methods, variables, and subsections

Changes -> Transactions -> Rules
- ROSE retrieves changes and transactions from CVS [Berliner’90]
- CVS provides only file versioning
- Per-file changes are grouped into transactions
- Files -> Transactions: sliding window approach [Fogel’02]
- Two subsequent changes by the same author, at most 200 seconds apart, belong to the same transaction
- Branches and merges in CVS
- ROSE ignores changes that affect more than 30 entities
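
A minimal sketch of the sliding-window grouping, assuming each per-file check-in is available as an (author, timestamp in seconds, path) record. The `Checkin` tuple and `group_transactions` function are hypothetical names; the sketch uses only the two criteria stated on the slide (same author, 200-second window) and does not handle branches, merges, or the 30-entity cutoff.

```python
from collections import namedtuple

# Hypothetical record for one per-file check-in; timestamp is in seconds.
Checkin = namedtuple("Checkin", "author timestamp path")


def group_transactions(checkins, window=200):
    """Group per-file check-ins into transactions with a sliding window.

    A check-in joins its author's most recent transaction if it follows
    the previous check-in of that transaction by at most `window` seconds;
    the window slides along with each new check-in.
    """
    transactions = []
    latest = {}  # author -> that author's most recent transaction
    for c in sorted(checkins, key=lambda c: c.timestamp):
        current = latest.get(c.author)
        if current and c.timestamp - current[-1].timestamp <= window:
            current.append(c)
        else:
            current = [c]
            latest[c.author] = current
            transactions.append(current)
    return transactions


log = [
    Checkin("alice", 0, "Comp.java"),
    Checkin("alice", 150, "plugin.properties"),
    Checkin("alice", 300, "Other.java"),
    Checkin("bob", 310, "README"),
    Checkin("alice", 600, "Late.java"),
]
for tx in group_transactions(log):
    print([c.path for c in tx])
```

Because the window slides, the check-in at 300 seconds still joins the first transaction even though it is more than 200 seconds after the first check-in; only the gap to the preceding check-in matters.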

Changes -> Transactions -> Rules
- Rules are mined from transactions
- Rules are mined with the Apriori algorithm [Agrawal’94]
- The generated rules have the form: antecedent(s) => consequent(s)
- The rules have a probabilistic interpretation
- Evidence: support count (number of transactions) and confidence (the strength of the correspondence)
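
To make support count and confidence concrete, here is a simplified sketch that mines only rules with a single antecedent and a single consequent by brute-force pair counting. It is not the Apriori algorithm named above, and the function name, thresholds, and toy transactions are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(transactions, min_support=1, min_confidence=0.5):
    """Mine rules a => c between single entities (brute force, not Apriori).

    Returns {(a, c): (support_count, confidence)} where
    support_count = number of transactions containing both a and c, and
    confidence    = support_count / number of transactions containing a.
    """
    single = Counter()  # entity -> number of transactions containing it
    pair = Counter()    # (a, c) -> number of transactions containing both
    for t in transactions:
        items = set(t)
        single.update(items)
        pair.update(permutations(sorted(items), 2))
    rules = {}
    for (a, c), count in pair.items():
        conf = count / single[a]
        if count >= min_support and conf >= min_confidence:
            rules[(a, c)] = (count, conf)
    return rules


# Toy history (hypothetical data, not taken from the Eclipse archive).
history = [
    {"fKeys[]", "initDefaults()"},
    {"fKeys[]", "initDefaults()", "plugin.properties"},
    {"fKeys[]"},
]
for rule, (support, conf) in sorted(mine_pair_rules(history, min_support=2).items()):
    print(rule, support, round(conf, 2))
```

In this toy history, fKeys[] => initDefaults() has support count 2 and confidence 2/3, while the reverse rule has the same support count but confidence 1, which shows that confidence is directional.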

Evolutionary Coupling

Evolutionary Coupling
- Support: How much evidence (= simultaneous changes)?
- Confidence: How relevant is the coupling for the participants?

Applying Rules
- The programmer performs a change (“a situation”)
- ROSE suggests further changes by applying matching rules
- A rule matches when its antecedent equals the situation
- The suggestion is the union of the consequents of all matching rules
- The number of rules depends on the required support count and confidence
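
Rule application as described here can be sketched as below. The rule representation (antecedent set, consequent set, confidence), the function name `suggest`, and the ranking of suggestions by confidence are assumptions of this sketch; the confidence values are made up for the example.

```python
def suggest(rules, situation):
    """Return the union of consequents of all rules matching the situation.

    `rules` is an iterable of (antecedent, consequent, confidence) triples.
    A rule matches when its antecedent equals the situation. Suggestions are
    ranked by the highest confidence of any rule producing them (assumption).
    """
    situation = frozenset(situation)
    best = {}  # suggested entity -> highest confidence seen
    for antecedent, consequent, confidence in rules:
        if frozenset(antecedent) == situation:
            for entity in consequent:
                if entity not in situation:
                    best[entity] = max(confidence, best.get(entity, 0.0))
    # Sort by descending confidence, then by name for a stable order.
    return sorted(best, key=lambda e: (-best[e], e))


# Hypothetical rules with invented confidence values.
example_rules = [
    ({"fKeys[]"}, {"initDefaults()"}, 1.0),
    ({"fKeys[]"}, {"plugin.properties"}, 0.8),
    ({"other()"}, {"unrelated()"}, 0.9),
]
print(suggest(example_rules, {"fKeys[]"}))
```

A situation with no matching rule yields an empty list, which corresponds to ROSE staying silent.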

Multi-Dimensional Rules
- When something new is added to the software, no history exists for it, so the change cannot be predicted from history alone
- E.g., the developer adds a constant “Foo” to Comp.java
- ROSE can still make suggestions by using the “operation” dimension

Examples of Rules
- GCC arrays that define the cost of different assembler operations for Intel CPUs
- The arrays have been altered 9 times; 9 out of 11 times, the change is triggered by a change in the type:

Examples of Rules
- Python and C files: detecting evolutionary couplings across different programming languages
- Detecting this coupling by program analysis would require cross-language analysis

Examples of Rules
- POSTGRES documentation

ROSE Server and Client
- The ROSE server determines coupling and rules
- The ROSE client guides the programmer along related changes

Evaluation
- How good are rules at predicting changes?
- Training period: ROSE infers rules from the past
- Evaluation period: ROSE applies the mined rules
- In the evaluation period, every transaction T is checked:
- Navigation: given one change in T, does ROSE point to further changes in T?
- Error prevention: given all but one change from T, does ROSE point to the missing change?
- Closure: given all changes of T, does ROSE stay silent?
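
The three checks can be read as three kinds of queries derived from each evaluation transaction T. The sketch below builds (query, expected answer) pairs under that reading; the function name and the pair representation are assumptions, not the evaluation harness used by the authors.

```python
def evaluation_queries(transaction):
    """Build (query, expected) pairs for one evaluation transaction T.

    navigation: query = one change from T,      expected = the rest of T
    prevention: query = all but one change,     expected = the missing change
    closure:    query = all changes of T,       expected = nothing (silence)
    """
    t = frozenset(transaction)
    navigation = [(frozenset({e}), t - {e}) for e in sorted(t)]
    prevention = [(t - {e}, frozenset({e})) for e in sorted(t)]
    closure = [(t, frozenset())]
    return navigation, prevention, closure


navigation, prevention, closure = evaluation_queries({"a()", "b()", "c()"})
print(len(navigation), len(prevention), len(closure))
```

A transaction with n changes yields n navigation queries, n prevention queries, and one closure query.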

Evaluating Additional Questions
- Granularity: files and functions
- Maintenance: no additions or deletions
- Multiple dimensions: what is the benefit of add_to and del_from?
- History: how much history? Usefulness over time? Quality of recommendations depending on the development cycle and releases?
- Recent changes: relevance of old changes

Projects Used for Evaluation

Precision vs. Recall
- Recall: How many relevant entities are returned?
- Precision: How many of the returned entities are relevant?
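
For a single query, the two measures can be computed from the set of returned entities and the set of relevant entities, as in this short sketch. Treating an empty returned set as precision 1 and an empty relevant set as recall 1 is a convention chosen here for illustration.

```python
def precision_recall(returned, relevant):
    """Precision and recall of one query result.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    Empty sets are scored as 1.0 (a convention assumed for this sketch).
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Four suggestions, two of them correct; three entities were actually changed.
p, r = precision_recall({"a()", "b()", "c()", "d()"}, {"a()", "b()", "e()"})
print(p, round(r, 2))
```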

Precision vs. Feedback / Support Count vs. Confidence

Results: Navigation, Prevention, Closure

Navigation, Prevention, Closure
- Navigation: The programmer has changed one single entity. Can ROSE suggest other entities that should be changed?
- Prevention: The programmer has changed all necessary entities but one. Does ROSE find the missing one?
- Closure: The programmer has made all necessary changes. How often does ROSE still suggest a missing change?

Results for Fine Granularity

Results: Navigation
- Given one initial item, ROSE makes predictions in 66 percent of all queries
- On average, the predictions contain 33 percent of all items changed
- For those queries for which ROSE makes recommendations, in 7 percent of the cases, a correct location is within ROSE’s topmost three suggestions

Results: Prevention and Closure
- In 3 percent of the queries where one item is missing, ROSE issues a correct warning
- A warning predicts 75 percent of the items that need to be changed
- ROSE’s warnings about missing items should be taken seriously: only 2 percent of all transactions cause a false alarm

Results for Coarse Granularity

Results for Maintenance
- ROSE shows the best predictive power for changes to existing entities

Threats to Validity
- Kinds of version histories and software projects
- 8 projects; 100,000 transactions
- Transactions do not record the order of changes (CVS limitation)
- Quality of transactions?
- User studies?

Summary
- For stable systems like GCC, ROSE gives precise suggestions (recommendations in 63% of transactions, precision of 30%; in 90% of all recommendations, the three topmost suggestions contain a correct entity)
- For rapidly changing systems like KOFFICE, the most useful suggestions are at the file level (predicting new functions is out of reach for any approach)
- The predictive power of ROSE is best during maintenance phases
- In about 2-7% of all erroneous transactions, ROSE correctly detects the missing change (only 2% of all transactions cause a false alarm)
- ROSE detects coupling between non-program entities (e.g., docs, manuals, mappings)

Future Work
- Taxonomies: identify patterns of changes
- Sequence rules: detect rules across multiple transactions
- Further data sources: log messages, bug databases
- Refactoring: ROSE does not recognize renamings of methods or files
- Program analysis: can improve the overall approach
- Rule presentation: visualization of rules can help

Downloads
- ROSE is publicly available as a plug-in for Eclipse
- For details and downloads visit http://www.st.cs.uni-sb.de/softevo
