Understanding Source Code Evolution Using Abstract Syntax Tree Matching

Motivation
• Underlying question: How does software change?
  • In: Two versions of a program
  • Out: Picture of changes
• Relevance
  • Software development
  • Software engineering

Objective and Approach
• Summarize C program changes
  • Functions (body AST, prototype)
  • Global variables (type and initializer)
  • Types
    • Structs/Unions (fields deleted / added / type changed)
    • Typedefs
    • Enums
• Our approach: AST matching
  • Accurate; handles renamings
  • Scales to real-world applications, e.g., Apache, Linux kernel, OpenSSH

Raw Output

  struct "net_device": 1 fields changed type: "accept_fastpath"
  struct "reiserfs_journal": 1 fields deleted: "j_dummy_inode"
  struct "reiserfs_journal": 1 fields added: "j_dirty_buffers"
  function "block_read_full_page": 1 arguments changed type: "get_block"
  function "ext2_readdir": 1 arguments changed type: "filldir___0"
  + function "inetdev_changename"
  + function "__ide_dma_good_drive"
  + function "ide_unplugged_outbsync"
  + function "inode_init_once"
  - function "target_cpus"
  - function "ide_dmafunc_verbose"
  + typedef "cisco_proto"
  - typedef "ide_ioctl_proc"
  + global var "idecd"

Linux 2.4.20 vs 2.4.21

The Renaming Problem
Same program, syntactic changes only

  Version 1                Version 2
  typedef int sz_t;        typedef int size_t;
  struct foo {             struct bar {
    int i;                   int i;
  };                       };
  int count;               int counter;
  void f(int a) {          void f(int b) {
    struct foo sf;           struct bar sb;
    sz_t c = 2;              size_t d = 2;
    sf.i = a + c;            sb.i = b + d;
    count++;                 counter++;
  }                        }

Abstract Syntax Tree Matching
• Compare ASTs for functions with same name

[Figure: pipeline diagram. Program Version 1 and Program Version 2 -> Parsing -> AST 1 and AST 2 -> AST Matching (Traversal, Map Generation, Renaming Detection, Change Detection) -> Changes & Statistics]

AST Traversal - Name Map Generation

Name map (Version 1 -> Version 2):
  f -> f
  a -> b
  sf -> sb
  c -> d
  count -> counter

Statements matched during the parallel traversal:
  c = 2            d = 2
  sf.i = a + c     sb.i = b + d
  count++          counter++

  Version 1                Version 2
  void f(int a) {          void f(int b) {
    struct foo sf;           struct bar sb;
    sz_t c = 2;              size_t d = 2;
    sf.i = a + c;            sb.i = b + d;
    count++;                 counter++;
  }                        }

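The name map above can be reproduced with a short sketch of a lockstep traversal. This is an illustration in Python, not the tool's code: the tuple-based AST encoding, the node kinds, and the function name `build_name_map` are all assumptions made for the example, as is the choice to stop descending as soon as the two trees differ in shape.

```python
# Hypothetical AST encoding: (kind, child, ...); identifiers are ("id", name).
def build_name_map(node1, node2, name_map=None):
    """Walk two ASTs in lockstep and record identifier correspondences."""
    if name_map is None:
        name_map = {}
    kind1, *rest1 = node1
    kind2, *rest2 = node2
    # Assumption of this sketch: give up on a subtree when the shapes diverge.
    if kind1 != kind2 or len(rest1) != len(rest2):
        return name_map
    if kind1 == "id":
        # Keep the first correspondence seen for each Version 1 name.
        name_map.setdefault(rest1[0], rest2[0])
        return name_map
    for child1, child2 in zip(rest1, rest2):
        # Non-tuple children (constants, field names) carry no identifiers.
        if isinstance(child1, tuple) and isinstance(child2, tuple):
            build_name_map(child1, child2, name_map)
    return name_map


def make_f(param, struct_var, local, glob):
    """Build the AST of function f from the slide for one version."""
    return ("func", ("id", "f"), ("params", ("id", param)), ("body",
        ("decl", ("id", struct_var)),
        ("decl", ("id", local), ("const", 2)),
        ("assign", ("field", ("id", struct_var), "i"),
                   ("add", ("id", param), ("id", local))),
        ("postinc", ("id", glob))))


V1 = make_f("a", "sf", "c", "count")
V2 = make_f("b", "sb", "d", "counter")
NAME_MAP = build_name_map(V1, V2)
print(NAME_MAP)
```

The same traversal could collect a type map by pairing the declared types of matched declarations instead of their names.
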
AST Traversal - Type Map Generation

Type map (Version 1 -> Version 2):
  int -> int
  struct foo -> struct bar
  sz_t -> size_t

Declarations matched during the parallel traversal:
  f                  f
  a : int            b : int
  sf : struct foo    sb : struct bar
  c : sz_t           d : size_t

  Version 1                Version 2
  void f(int a) {          void f(int b) {
    struct foo sf;           struct bar sb;
    sz_t c = 2;              size_t d = 2;
    sf.i = a + c;            sb.i = b + d;
    count++;                 counter++;
  }                        }

Abstract Syntax Tree Matching
• Name/Type Maps -> Name/Type Bijections
• Traverse the ASTs in parallel, computing changes

[Figure: pipeline diagram. Program Version 1 and Program Version 2 -> Parsing -> AST 1 and AST 2 -> AST Matching (Traversal, Map Generation, Renaming Detection, Change Detection) -> Changes & Statistics]

A renamed to B iff
• A -> B in the map
• A deleted
• B added

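The three-part renaming rule can be written down directly. The following is a minimal Python sketch under stated assumptions: the function name `detect_renamings` and the representation of each version as a plain set of names are hypothetical, and "deleted" and "added" are read as "absent from Version 2" and "absent from Version 1" respectively.

```python
def detect_renamings(name_map, names_v1, names_v2):
    """Keep only the map entries A -> B that satisfy the renaming rule."""
    renamed = {}
    for old, new in name_map.items():      # A -> B is in the map
        deleted = old not in names_v2      # A no longer exists in Version 2
        added = new not in names_v1        # B did not exist in Version 1
        if deleted and added:
            renamed[old] = new
    return renamed


name_map = {"f": "f", "a": "b", "sf": "sb", "c": "d", "count": "counter"}
names_v1 = {"f", "a", "sf", "c", "count"}
names_v2 = {"f", "b", "sb", "d", "counter"}
RENAMED = detect_renamings(name_map, names_v1, names_v2)
print(RENAMED)
```

The entry `f -> f` is filtered out because `f` still exists in Version 2, so it does not count as deleted.
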
AST Traversal - Change Detection

Reported changes for struct foo:
  field i changed type: int -> long long
  field e added

Fields compared (Version 1 vs Version 2), with sz_t -> size_t in the type map:
  i : int          i : long long
  f : sz_t         f : size_t
                   e : double

  Version 1                Version 2
  typedef int sz_t;        typedef int size_t;
  struct foo {             struct foo {
    int i;                   long long i;
    sz_t f;                  size_t f;
  };                         double e;
                           };

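The struct comparison on this slide can be sketched as follows. This is an illustrative Python fragment, not the tool's implementation: the name `diff_struct_fields`, the dictionary representation of a struct, and the message strings (modeled on the slide's wording) are assumptions. The point it shows is that a field whose type was only renamed, here `sz_t -> size_t`, is not reported as a change.

```python
def diff_struct_fields(fields1, fields2, type_map):
    """Compare two versions of a struct.

    fields1, fields2: dict mapping field name -> type name.
    type_map: Version 1 type name -> Version 2 type name (renamed types).
    """
    changes = []
    for name, type1 in fields1.items():
        if name not in fields2:
            changes.append(f"field {name} deleted")
            continue
        type2 = fields2[name]
        # Translate the old type through the map before comparing.
        if type_map.get(type1, type1) != type2:
            changes.append(f"field {name} changed type: {type1} -> {type2}")
    for name in fields2:
        if name not in fields1:
            changes.append(f"field {name} added")
    return changes


foo_v1 = {"i": "int", "f": "sz_t"}
foo_v2 = {"i": "long long", "f": "size_t", "e": "double"}
CHANGES = diff_struct_fields(foo_v1, foo_v2, {"sz_t": "size_t"})
for line in CHANGES:
    print(line)
```
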
Implementation
• Parsing via CIL toolkit
  • Merges whole program into single, preprocessed file
• Fast
  • Scales linearly, 400,000 LOC in 1 minute
• Generates different output formats
  • Raw differences, summaries, density trees

Summary Statistics

------- Functions -------
Version1 : 7697
Version2 : 7881
added : 232
deleted : 48
locals/formals changed name : 3
arguments type changes : 19
return types changes : 15

------- Structs/Unions -------
Version1 : 1214
Version2 : 1233
added : 17
deleted : 1
field type changes : 15
field count changes : 19

------- Typedefs -------
Version1 : 487
Version2 : 469
added : 13
deleted : 31
base type changes : 2

------- Global Variables -------
Version1 : 8027
Version2 : 8074
added : 43
deleted : 16
var type changes : 11
var exp changes : 20
var val changes : 51

------- Enums -------
Version1 : 33
Version2 : 31
deleted : 2
item count changes : 1

Linux 2.4.20 vs 2.4.21

Density Trees

/ : 111
  include/ : 101
    linux/ : 96
      fs.h : 4
      ide.h : 80
      reiserfs_fs_sb.h : 1
      reiserfs_fs_i.h : 2
      sched.h : 1
      wireless.h : 1
      hdreg.h : 7
    net/ : 2
      tcp.h : 1
      sock.h : 1
    asm-i386/ : 3
      io_apic.h : 3
  drivers/ : 9
    char/ : 1
      agp/ : 1
        agp.h : 1
    ide/ : 8
      ide-pci.c : 8
  net/ : 1
    ipv4/ : 1
      ip_fragment.c : 1

Linux 2.4.20 vs 2.4.21, struct/union field additions

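A density tree of this kind can be derived by summing per-file change counts up the directory hierarchy. The Python sketch below is only an illustration of that aggregation; the function name `density_tree` and the flat path-to-count input format are assumptions, and the per-file counts are the ones shown on the slide.

```python
from collections import defaultdict


def density_tree(file_counts):
    """Sum per-file change counts into every enclosing directory."""
    totals = defaultdict(int)
    for path, count in file_counts.items():
        parts = path.split("/")
        totals["/"] += count                      # root of the tree
        for depth in range(1, len(parts)):        # each ancestor directory
            totals["/".join(parts[:depth]) + "/"] += count
        totals[path] += count                     # the file itself
    return dict(totals)


# Struct/union field additions per file, Linux 2.4.20 vs 2.4.21 (from the slide).
FILE_COUNTS = {
    "include/linux/fs.h": 4,
    "include/linux/ide.h": 80,
    "include/linux/reiserfs_fs_sb.h": 1,
    "include/linux/reiserfs_fs_i.h": 2,
    "include/linux/sched.h": 1,
    "include/linux/wireless.h": 1,
    "include/linux/hdreg.h": 7,
    "include/net/tcp.h": 1,
    "include/net/sock.h": 1,
    "include/asm-i386/io_apic.h": 3,
    "drivers/char/agp/agp.h": 1,
    "drivers/ide/ide-pci.c": 8,
    "net/ipv4/ip_fragment.c": 1,
}

TREE = density_tree(FILE_COUNTS)
for key in ("/", "include/", "include/linux/", "drivers/", "net/"):
    print(key, ":", TREE[key])
```

Keying directories by their full path keeps the top-level `net/` separate from `include/net/`.
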
Case Studies: OpenSSH, Vsftpd, Apache
Functions & global variables: how often added and deleted?
• OpenSSH changes most frequently
• Deletions infrequent, relative to additions

Case Studies: OpenSSH, Vsftpd, Apache
How often do function bodies and prototypes change?
• Function bodies do change a lot
• Function prototypes do not change much

Related Approaches
• Standard diff
  • Low-level
  • Verbose: Linux 2.4.20 -> 2.4.21 patch: 21 MB
• Release notes
  • High level
  • Possibly incomplete

Summary
• Approach for reporting changes to C programs
  • AST matching
  • Variety of changes at several levels of detail
  • Accurate
  • Scalable
• Soon to be available at http://www.cs.umd.edu/~neamtiu/evolution