Applying Social Network Analysis to the Information in CVS Repositories
Luis López-Fernández, Gregorio Robles, Jesús M. González Barahona
GSyC, Universidad Rey Juan Carlos, Madrid, Spain
{llopez,grex,jgb}@gsyc.escet.urjc.es
MSR 2004 (Edinburgh, UK), 25th May 2004
Background
  There is a lot of (too much?) information about libre software projects out there
  We're starting to streamline the extraction of raw data (e.g., from CVS repositories)
  We have to apply data mining and data interpretation techniques to get meaningful information
  Let's explore approaches which were productive in other fields
© 2004 Jesús M. González Barahona. Applying Social Network Analysis to the Information in CVS Repositories
Main aims of the study
  To advance the understanding of the social structure of libre software projects
  To characterize projects according to this structure
  To relate the evolution of a project to the evolution of its social structure
  To explore self-organization in the social structure of libre software projects
Methodology
  Download the CVS history information from the repository of a libre software project
  Extract the information about who committed what
  Build the committer and module networks from it
  Analyze the resulting networks using social network analysis
  Draw some conclusions
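The extraction step can be sketched as follows. This is only an illustration, not the tooling used in the study: it assumes the history is available as text in roughly the shape printed by `cvs log` (a "Working file:" line per file, then one "date: ...; author: ...;" line per revision), and it takes the top-level directory of each file as its module. The function name `parse_commits` is hypothetical.

```python
import re

# Assumed shape of a revision line in `cvs log` output:
#   date: 2004/02/01 10:00:00;  author: jgb;  state: Exp;  lines: +2 -1
AUTHOR_RE = re.compile(r"^date: [^;]+;\s+author: ([^;]+);")
FILE_PREFIX = "Working file: "


def parse_commits(log_text):
    """Return a list of (committer, module) pairs, one per file revision."""
    commits = []
    module = None
    for line in log_text.splitlines():
        if line.startswith(FILE_PREFIX):
            path = line[len(FILE_PREFIX):].strip()
            # Assumption: the module is the top-level directory of the path.
            # A file at the repository root becomes its own "module" here.
            module = path.split("/", 1)[0]
            continue
        match = AUTHOR_RE.match(line)
        if match and module is not None:
            commits.append((match.group(1).strip(), module))
    return commits
```

A log message that happens to start with "date:" would be miscounted by this sketch, so a real extractor would need to track the revision separators as well.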
The committer network
  One side of the affiliation network
  Each vertex: a committer (usually a developer)
  Edge: when both committers have contributed to at least one common module
  Weight of edges: commits by both committers to all common modules
The module network
  Other side of the same affiliation network
  Each vertex: a module (usually a top-level directory)
  Edge: when both modules have at least one common committer
  Weight of edges: commits by common committers to both modules
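Both networks are projections of the same committer/module affiliation data, so one routine can build either side. The sketch below assumes the input is a list of (committer, module) pairs, one per commit; the names `project` and `build_networks` are illustrative, and the edge weight follows the definition on the slides (commits by both endpoints to everything they share).

```python
from collections import defaultdict
from itertools import combinations


def project(counts):
    """counts maps vertex -> {affiliation -> commits}.

    Returns {(a, b): weight} for every pair of vertices that share at
    least one affiliation; the weight sums the commits of both vertices
    to all shared affiliations.
    """
    edges = {}
    for a, b in combinations(sorted(counts), 2):
        common = counts[a].keys() & counts[b].keys()
        if common:
            edges[(a, b)] = sum(counts[a][x] + counts[b][x] for x in common)
    return edges


def build_networks(commits):
    """Return (committer_edges, module_edges) from (committer, module) pairs."""
    by_committer = defaultdict(lambda: defaultdict(int))
    by_module = defaultdict(lambda: defaultdict(int))
    for committer, module in commits:
        by_committer[committer][module] += 1
        by_module[module][committer] += 1
    return project(by_committer), project(by_module)
```

For example, if alice commits twice to core, bob once to core and once to docs, and carol once to docs, the committer network has edges alice-bob (weight 3) and bob-carol (weight 2), and the module network has a single edge core-docs (weight 2).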
Both are a complex mesh
  [Figure: module network for the Apache project, ca. February 2004]
But they can be characterized
  Degree (number of connections per vertex)
  Weighted degree (in our case, by commits)
  Distance centrality (proximity to the rest of the network)
  Betweenness centrality (shortest paths traversing a vertex)
  Clustering coefficient (connectivity to the neighborhood)
  Weighted clustering coefficient (in our case, by commits)
  Community analysis (Girvan-Newman algorithm)
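Three of these measures are simple enough to sketch directly. The code below assumes an undirected network given as {(a, b): weight}, with weights in commits, and uses the standard unweighted definition of the local clustering coefficient (fraction of pairs of neighbours that are themselves connected). The function names are illustrative; the centrality measures, the weighted clustering coefficient and Girvan-Newman are not covered here.

```python
from itertools import combinations


def adjacency(edges):
    """Turn {(a, b): weight} into a symmetric map vertex -> {neighbour: weight}."""
    adj = {}
    for (a, b), weight in edges.items():
        adj.setdefault(a, {})[b] = weight
        adj.setdefault(b, {})[a] = weight
    return adj


def degree(adj, v):
    """Number of connections of vertex v."""
    return len(adj[v])


def weighted_degree(adj, v):
    """Sum of the weights (commits) on the edges of vertex v."""
    return sum(adj[v].values())


def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbours of v that are connected to each other."""
    neighbours = list(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention chosen here for vertices with fewer than 2 neighbours
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

A vertex whose neighbours are all connected to each other gets a coefficient of 1.0, which is the situation near the right edge of the clustering coefficient plots.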
Apache: connection degree (committers network)
  [Figure: degree plot (x-axis: Degree); Apache, circa February 2004]
Apache and GNOME clustering coefficient (modules network)
  [Figure: two panels (x-axis: cc, clustering coefficient); Apache (left), GNOME (right), circa February 2004]
Apache, GNOME, KDE weighted clustering coefficient (modules network)
  [Figure: three panels (x-axis: weighted clustering coefficient); Apache (left), GNOME (center), KDE (right), circa February 2004]
Apache connection degree (modules network)
  [Figure: four panels (x-axis: Degree); 2001 (top left) to 2004 (bottom right)]
Apache modules community analysis (1999.01)
  [Figure]
Apache modules community analysis (2000.01)
  [Figure]
Apache modules community analysis (2000.09)
  [Figure]
Apache modules community analysis (2002.01)
  [Figure]
Apache modules community analysis (2004.02)
  [Figure]
Conclusions
  Methodology for studying the structure of libre software projects
  Captures relationships both among modules and among committers
  First step towards community analysis
  Access to traditional social network analysis tools
  Further work: characterization of projects