
Tracking Structural Evolution using Origin Analysis
Michael Godfrey and Qiang Tu, University of Waterloo


  1. Tracking Structural Evolution using Origin Analysis

Michael Godfrey and Qiang Tu
Software Architecture Group (SWAG)
University of Waterloo
20 May, 2002, IWPSE-02

Overview
• Open questions in software evolution research
• Motivation
• “Origin analysis” and Beagle
• Efficiency considerations
• An example
• Open questions in origin analysis

Some open questions
• Philosophical:
  – Does software evolve in the same way as frogs and social structures?
    • The Nature of Economies, by Jane Jacobs
  – What are the recurring patterns and compelling metaphors of software evolution?
• Methodological:
  – How to measure size?
    • How to correlate size and quality?
  – How to measure change?
    • How to model architectural change?
  – What is the predictive power of such models?
    • Do the “other phenomena” dominate?

Some open questions
• Practical:
  – What information do developers need to know about how a software system has evolved?
  – What kinds of tools would be useful:
    • to the front-line developer?
    • to the manager?
  – How best to deal with:
    • Large data sets (large_system ∗ many_versions)
    • Visualization and navigation

Motivation
• Want to build tools to aid developers in understanding how software evolves.
  – Change can be mostly additive … or much more invasive
• Building an accurate model of how a system has evolved is hard in the presence of refactoring, redesign, structural and architectural change.
  – Usual assumption:
    • A change in name/location of a software entity means the old one died and a new one was born
  – … which means that “structural” discontinuities break old models of the system, and cause useful knowledge to be lost.

Motivation
• This also begs the question of software artifact ontology:
  – What are the software entities/artifacts of interest in evolutionary studies?
    • All CVSd things?
    • “Hard” machine processable things, like source code files?
    • User docs, requirements docs, …?
    • Atomic vs. composite things? (subsystems vs. files vs. classes vs. methods)
  – What does it mean for an artifact/entity to be a different version of an older artifact/entity?
    • Same name? file? location? CVS control?
    • “Because I say so”?

  2. “Origin analysis”

Suppose that:
  – f is the name of a software entity (e.g., function, type, global variable) of version V_new of a software system.
  – There is no entity of the same name/kind in the previous version V_old
We define origin analysis as the process of deciding:
  – if f was newly introduced in V_new, or
  – if it should be more accurately viewed as a changed/moved/renamed version of a differently named entity of V_old
[Figure: V_new drawn with entities z, f, x, y and V_old with entities z, g, x, y; “???” marks the unknown origin of f]

The Beagle tool [IWPC-02]
Design goals:
• Support browsing of evolutionary histories of software systems
• Visual navigation and querying
• Architectural-level modelling
• Compare system snapshots
• Support identification and detection of change patterns

The Beagle tool [IWPC-02]
At system check-in:
• Populate database with “facts” and metrics info from various tools.
• grok scripts “lift” facts to file/subsystem/architectural level.
At runtime:
• SWAGkit (PBS) engine for visualization/navigation.
• Java-based infrastructure using DB/2, VA-Java, IBM-Websphere.

Origin analysis: Two techniques
1. Entity analysis (i.e., metrics-based “Bertillonage”)
  – For each “added” entity f:
    • Calculate combined Euclidean distance from each “deleted” entity for five metrics [Kostas].
    • Select top k matches; compare entity names.
2. Relationship analysis (e.g., calls, is-called-by, refs)
  – For each “added” entity f:
    • Find R_f, set of all entities that call f that are present in both versions.
    • For each g ∈ R_f, calculate Q_g, set of all “deleted” entities that g calls in the old version.
    • Look at intersection of the Q_g sets; these are good candidates.

Efficiency considerations
• When comparing V_new to V_old, need to find the entities that seem to have been added and deleted.
  – These sets are fast to determine.
  – Most subsequent calculations involve only these small subsets of the entire entity space (plus the other entities they have “relationships” with).
• Computationally expensive approaches for clone detection (e.g., graph matching) were not considered.
  – Can’t pre-compute easily.
  – Precise matching not worth the effort, as it doesn’t seem to help much for this task.

Efficiency considerations
• Entity analysis:
  – Entity info is generated by fact extractor and metrics tool.
    • Info is generated only once per version, when system is checked into repository.
  – Performing entity analysis is a matter of a simple numerical calculation on a small set of “likely candidates”.
• Relationship analysis:
  – Relationship info (who-calls-whom, who-inherits-from-whom, etc.) is generated by fact extractor.
    • Info is generated only once per version, when system is checked into repository.
  – Computation and comparison of relational images is fairly fast.
    • Special-purpose tool (grok) and relatively small amount of data.
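The two techniques and the added/deleted sets described in the slides above can be sketched in a few lines of Python. This is an illustrative sketch only, not Beagle's implementation (which the slides say is built on grok, a database, and Java). The function names, the toy entities, and the five-number metric vectors are all hypothetical, the slides do not say which five metrics are used or how they are scaled, and the name-comparison step that follows the top-k selection is left to the reader.

```python
from math import dist  # Euclidean distance, Python 3.8+

def added_and_deleted(old_names, new_names):
    """Return (added, deleted): names found in only one of the two versions."""
    old_names, new_names = set(old_names), set(new_names)
    return new_names - old_names, old_names - new_names

def entity_analysis(f, new_metrics, old_metrics, deleted, k=3):
    """Top-k "deleted" entities closest to f by Euclidean distance over metrics."""
    # Assumption: raw metric values are compared without any scaling.
    by_distance = sorted(deleted, key=lambda d: (dist(new_metrics[f], old_metrics[d]), d))
    return by_distance[:k]

def relationship_analysis(f, new_calls, old_calls, old_names, deleted):
    """"Deleted" entities that every surviving caller of f called in V_old."""
    # R_f: entities that call f in V_new and also exist in V_old.
    r_f = {g for g, callees in new_calls.items() if f in callees and g in old_names}
    # Q_g: "deleted" entities that g called in V_old, one set per g in R_f.
    q_sets = [set(old_calls.get(g, ())) & deleted for g in r_f]
    return set.intersection(*q_sets) if q_sets else set()

# Hypothetical toy data: g was renamed to f, while h was genuinely removed.
old_metrics = {"z": (5, 1, 1, 0, 2), "g": (11, 3, 2, 4, 1),
               "h": (40, 9, 7, 1, 0), "x": (8, 2, 1, 1, 1), "y": (9, 2, 2, 1, 1)}
new_metrics = {"z": (5, 1, 1, 0, 2), "f": (12, 3, 2, 4, 1),
               "x": (8, 2, 1, 1, 1), "y": (9, 2, 2, 1, 1)}
old_calls = {"x": {"g", "h"}, "y": {"g"}, "z": {"x", "y"}}
new_calls = {"x": {"f"}, "y": {"f"}, "z": {"x", "y"}}

added, deleted = added_and_deleted(old_metrics, new_metrics)
for f in sorted(added):
    print(f, entity_analysis(f, new_metrics, old_metrics, deleted, k=1),
          sorted(relationship_analysis(f, new_calls, old_calls, set(old_metrics), deleted)))
# prints: f ['g'] ['g']
```

In this toy case both techniques point at g as the likely origin of f. As the slides stress, the result is a list of candidates for a human to judge, not a verdict.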

  3. Case study: gcc/g++/egcs

• Have extracted full info for 29 versions of gcc/g++/egcs
  – Want to examine major breaks in development to see how well origin analysis works.
• EGCS v1.0 was forked from the GCC v2.7.2.3 codebase
  – EGCS project goals:
    • C++ compiler more ANSI compliant,
    • new FORTRAN front-end,
    • new optimizations and code-generation algorithms, …
  – … and EGCS introduced a new directory structure and a new file naming scheme, in addition to all of the other redesign and restructuring.
  – Naïve analysis indicated “everything old is new again”

Case study: gcc/g++/egcs
• Example:
  – The EGCS 1.0 Parser subsystem contains 15 (non-trivial) implementation files, comprising 848 functions.
  – Using origin analysis and common sense, Qiang decided that about half of the “new” functions weren’t new.
  – That’s still a massive amount of change for a new release of a compiler!

File                   # Fcns   # New   # Old   % New
gcc/cp/errfn.c              9       9       0    100%
gcc/cp/pt.c                59      57       2     97%
gcc/except.c               55      52       3     95%
gcc/cp/decl2.c             57      50       7     88%
gcc/c-lang.c               16      14       2     88%
gcc/cp/method.c            30      26       4     87%
gcc/cp/except.c            25      20       5     80%
gcc/cp/decl.c             134      84      50     63%
gcc/cp/error.c             31      16      15     52%
gcc/cp/class.c             61      31      30     51%
gcc/cp/search.c            81      40      41     49%
gcc/c-decl.c               70      29      41     41%
gcc/fold-const.c           44      15      29     34%
gcc/objc/objc-act.c       167      17     150     10%
gcc/c-aux-info.c            9       0       9      0%
TOTAL                     848     460     388     54%

Origin analysis: Open issues
• Origin analysis is a semi-automatic technique; it requires human intervention to make intelligent decisions.
  – In general, there’s no ultimate arbiter of correctness/appropriateness.
  – Techniques are fast and approximate.
    • Bertillonage, not DNA comparison
• What are the most effective ways of performing entity and relationship analysis?
  – Which metrics? Which relationships? How best to combine them all?
  – Requires case studies, validation.
• What is the best way to consider composite software entities? (e.g., files, classes, subsystems)
  – Can evaluate as atoms, or
  – Can simply use hints from contained entities.
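The arithmetic behind the case-study table (# New + # Old = # Fcns per file, and % New = # New / # Fcns) can be checked with a short script. The numbers are taken from the slide. Pairing each file name with a row of numbers in order of appearance is an assumption, since the slide text is flattened, and the helper name `percent_new` is hypothetical.

```python
# (file, functions, judged new, judged old), read from the slide's table.
# Assumption: file names and number rows pair up in order of appearance.
rows = [
    ("gcc/cp/errfn.c", 9, 9, 0),
    ("gcc/cp/pt.c", 59, 57, 2),
    ("gcc/except.c", 55, 52, 3),
    ("gcc/cp/decl2.c", 57, 50, 7),
    ("gcc/c-lang.c", 16, 14, 2),
    ("gcc/cp/method.c", 30, 26, 4),
    ("gcc/cp/except.c", 25, 20, 5),
    ("gcc/cp/decl.c", 134, 84, 50),
    ("gcc/cp/error.c", 31, 16, 15),
    ("gcc/cp/class.c", 61, 31, 30),
    ("gcc/cp/search.c", 81, 40, 41),
    ("gcc/c-decl.c", 70, 29, 41),
    ("gcc/fold-const.c", 44, 15, 29),
    ("gcc/objc/objc-act.c", 167, 17, 150),
    ("gcc/c-aux-info.c", 9, 0, 9),
]

def percent_new(new, total):
    """Whole-number percentage of new functions, rounding halves up."""
    return (200 * new + total) // (2 * total)

total_fcns = sum(fcns for _, fcns, _, _ in rows)
total_new = sum(new for _, _, new, _ in rows)
total_old = sum(old for _, _, _, old in rows)

print(total_fcns, total_new, total_old, percent_new(total_new, total_fcns))
# prints: 848 460 388 54
```

The totals match the TOTAL row: of the 848 functions that a name-based comparison would report as new, 388 were judged to have an origin in the old version, which leaves 460 (54%) as new.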
