Tracking Structural Evolution using Origin Analysis
Michael Godfrey and Qiang Tu
Software Architecture Group (SWAG)
University of Waterloo
20 May, 2002 IWPSE-02

Overview
• Open questions in software evolution research
• Motivation
• “Origin analysis” and Beagle
• Efficiency considerations
• An example
• Open questions in origin analysis

Some open questions
• Philosophical:
  – Does software evolve in the same way as frogs and social structures?
    • The Nature of Economies, by Jane Jacobs
  – What are the recurring patterns and compelling metaphors of software evolution?
• Methodological:
  – How to measure size?
    • How to correlate size and quality?
  – How to measure change?
    • How to model architectural change?
  – What is the predictive power of such models?
    • Do the “other phenomena” dominate?

Some open questions
• Practical:
  – What information do developers need to know about how a software system has evolved?
  – What kinds of tools would be useful:
    • to the front-line developer?
    • to the manager?
  – How best to deal with:
    • Large data sets (large_system ∗ many_versions)
    • Visualization and navigation

Motivation
• Want to build tools to aid developers in understanding how software evolves.
  – Change can be mostly additive … or much more invasive
• Building an accurate model of how a system has evolved is hard in the presence of refactoring, redesign, structural and architectural change.
  – Usual assumption:
    • A change in name/location of a software entity means the old one died and a new one was born
  – … which means that “structural” discontinuities break old models of the system, and cause useful knowledge to be lost.

Motivation
• This also begs the question of software artifact ontology:
  – What are the software entities/artifacts of interest in evolutionary studies?
    • All CVSd things?
    • “Hard” machine processable things, like source code files?
    • User docs, requirements docs, …?
    • Atomic vs. composite things? (subsystems vs. files vs. classes vs. methods)
  – What does it mean for an artifact/entity to be a different version of an older artifact/entity?
    • Same name? file? location? CVS control?
    • “Because I say so”?
“Origin analysis”
Suppose that:
  – f is the name of a software entity (e.g., function, type, global variable) of version V_new of a software system.
  – There is no entity of the same name/kind in the previous version V_old.
We define origin analysis as the process of deciding:
  – if f was newly introduced in V_new, or
  – if it should be more accurately viewed as a changed/moved/renamed version of a differently named entity of V_old
[Figure: V_new containing entities z, f, x, y; V_old containing entities z, g, x, y; “???” marks the unknown origin of f.]

The Beagle tool [IWPC-02]
Design goals:
• Support browsing of evolutionary histories of software systems
• Visual navigation and querying
• Architectural-level modelling
• Compare system snapshots
• Support identification and detection of change patterns

The Beagle tool [IWPC-02]
At system check-in:
• Populate database with “facts” and metrics info from various tools.
• grok scripts “lift” facts to file/subsystem/architectural level.
At runtime:
• SWAGkit (PBS) engine for visualization/navigation.
• Java-based infrastructure using DB/2, VA-Java, IBM-Websphere.

Origin analysis: Two techniques
1. Entity analysis (i.e., metrics-based “Bertillonage”)
  – For each “added” entity f:
    • Calculate combined Euclidean distance from each “deleted” entity for five metrics [Kostas].
    • Select top k matches; compare entity names.
2. Relationship analysis (e.g., calls, is-called-by, refs)
  – For each “added” entity f:
    • Find R_f, set of all entities that call f that are present in both versions.
    • For each g ∈ R_f, calculate Q_g, set of all “deleted” entities that g calls in the old version.
    • Look at intersection of the Q_g sets; these are good candidates.

Efficiency considerations
• When comparing V_new to V_old, need to find the entities that seem to have been added and deleted.
  – These sets are fast to determine.
  – Most subsequent calculations involve only these small subsets of the entire entity space (plus the other entities they have “relationships” with).
• Computationally expensive approaches for clone detection (e.g., graph matching) were not considered.
  – Can’t pre-compute easily.
  – Precise matching not worth the effort, as it doesn’t seem to help much for this task.

Efficiency considerations
• Entity analysis:
  – Entity info is generated by fact extractor and metrics tool.
    • Info is generated only once per version, when system is checked into repository.
  – Performing entity analysis is a matter of a simple numerical calculation on a small set of “likely candidates”.
• Relationship analysis:
  – Relationship info (who-calls-whom, who-inherits-from-whom, etc.) is generated by fact extractor.
    • Info is generated only once per version, when system is checked into repository.
  – Computation and comparison of relational images is fairly fast.
    • Special-purpose tool (grok) and relatively small amount of data.
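
The added/deleted sets and the two techniques described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Beagle's implementation: the function names, the dictionary-based fact format, the toy entity names, and the five placeholder metric values are all assumptions made for the example, and the slides do not say whether metrics are normalized before the distance is taken (here they are not).

```python
import math


def added_and_deleted(old_entities, new_entities):
    """Return (added, deleted) entity-name sets between two versions."""
    old, new = set(old_entities), set(new_entities)
    return new - old, old - new


def entity_analysis(f, metrics_new, metrics_old, deleted, k=3):
    """Rank "deleted" entities by Euclidean distance between their
    metric vectors and the metric vector of the "added" entity f."""
    ranked = sorted(
        deleted,
        key=lambda g: (math.dist(metrics_new[f], metrics_old[g]), g),
    )
    return ranked[:k]  # top-k candidates; names are then compared by a human


def relationship_analysis(f, calls_new, calls_old, old_entities, deleted):
    """Intersect the sets Q_g over all callers g in R_f.

    calls_new / calls_old map each caller to the set of entities it calls.
    """
    # R_f: entities that call f in the new version and exist in both versions.
    r_f = {g for g, callees in calls_new.items()
           if f in callees and g in old_entities}
    if not r_f:
        return set()
    # Q_g: "deleted" entities that g called in the old version.
    q_sets = [calls_old.get(g, set()) & deleted for g in r_f]
    return set.intersection(*q_sets)


# Hypothetical toy facts for two versions (not taken from the case study).
old_entities = {"main", "parse_expr", "emit", "lookup_sym", "old_helper"}
new_entities = {"main", "parse_expr", "emit", "find_symbol"}

# Five placeholder metric values per entity; the actual metrics are not listed here.
metrics_old = {"lookup_sym": (40, 6, 3, 2, 1), "old_helper": (5, 1, 0, 0, 0)}
metrics_new = {"find_symbol": (42, 6, 3, 2, 1)}

calls_old = {
    "main": {"parse_expr", "emit", "old_helper"},
    "parse_expr": {"lookup_sym", "old_helper"},
    "emit": {"lookup_sym"},
}
calls_new = {
    "main": {"parse_expr", "emit"},
    "parse_expr": {"find_symbol"},
    "emit": {"find_symbol"},
}

added, deleted = added_and_deleted(old_entities, new_entities)
for f in sorted(added):
    by_metrics = entity_analysis(f, metrics_new, metrics_old, deleted, k=1)
    by_calls = relationship_analysis(f, calls_new, calls_old, old_entities, deleted)
    print(f, "metric candidates:", by_metrics, "call candidates:", sorted(by_calls))
```

Both functions only ever look at the small added and deleted sets plus their immediate callers, which mirrors the efficiency argument on the slides. The output is a list of candidates for a person to review, not a final decision.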
Case study: gcc/g++/egcs
• Have extracted full info for 29 versions of gcc/g++/egcs
  – Want to examine major breaks in development to see how well origin analysis works.
• EGCS v1.0 was forked from the GCC v2.7.2.3 codebase
  – EGCS project goals:
    • C++ compiler more ANSI compliant,
    • new FORTRAN front-end,
    • new optimizations and code-generation algorithms, …
  – … and EGCS introduced a new directory structure and a new file naming scheme, in addition to all of the other redesign and restructuring.
  – Naïve analysis indicated “everything old is new again”

Case study: gcc/g++/egcs
• Example:
  – The EGCS 1.0 Parser subsystem contains 15 (non-trivial) implementation files, comprising 848 functions.
  – Using origin analysis and common sense, Qiang decided that about half of the “new” functions weren’t new.
  – That’s still a massive amount of change for a new release of a compiler!

File                   # Fcns   # New   # Old   % New
gcc/cp/errfn.c              9       9       0    100%
gcc/cp/pt.c                59      57       2     97%
gcc/except.c               55      52       3     95%
gcc/cp/decl2.c             57      50       7     88%
gcc/c-lang.c               16      14       2     88%
gcc/cp/method.c            30      26       4     87%
gcc/cp/except.c            25      20       5     80%
gcc/cp/decl.c             134      84      50     63%
gcc/cp/error.c             31      16      15     52%
gcc/cp/class.c             61      31      30     51%
gcc/cp/search.c            81      40      41     49%
gcc/c-decl.c               70      29      41     41%
gcc/fold-const.c           44      15      29     34%
gcc/objc/objc-act.c       167      17     150     10%
gcc/c-aux-info.c            9       0       9      0%
TOTAL                     848     460     388     54%

Origin analysis: Open issues
• Origin analysis is a semi-automatic technique; it requires human intervention to make intelligent decisions.
  – In general, there’s no ultimate arbiter of correctness/appropriateness.
  – Techniques are fast and approximate.
    • Bertillonage, not DNA comparison
• What are the most effective ways of performing entity and relationship analysis?
  – Which metrics? Which relationships? How best to combine them all?
  – Requires case studies, validation.
• What is the best way to consider composite software entities? (e.g., files, classes, subsystems)
  – Can evaluate as atoms, or
  – Can simply use hints from contained entities.