Conceptual Framework for Agent-Based Modeling and Simulation: The Computer Experiment
Yongqin Gao (CSE Department, University of Notre Dame), Vincent Freeh (CS Department, NCSU), Greg Madey (CSE Department, University of Notre Dame)
NAACSOS Conference, Pittsburgh, PA, June 25, 2003
Supported in part by the National Science Foundation - Digital Society & Technology Program
The Computer Experiment
Agent-Based Simulation as a Component of the Scientific Method
[Figure: cycle linking three stages]
• Modeling (Hypothesis): Social Network Model of F/OSS
• Agent-Based Simulation (Experiment): Grow Artificial SourceForge
• Observation: Analysis of SourceForge Data
Outline
• Investigation: Free/Open Source Software (F/OSS)
• Conceptual framework(s)
• Model description
• ER model
• BA model
• BA model with constant fitness
• BA model with dynamic fitness
• Summary
Open Source Software (OSS)
[Logos: GNU, Linux, Savannah]
• Free …
– to view source
– to modify
– to share
– of cost
• Examples
– Apache
– Perl
– GNU
– Linux
– Sendmail
– Python
– KDE
– GNOME
– Mozilla
– Thousands more
Free Open Source Software (F/OSS)
• Development
– Mostly volunteer
– Global teams
– Virtual teams
– Self-organized - often peer-based meritocracy
– Self-managed - but often a “charismatic” leader
– Often large numbers of developers, testers, support help, end user participation
– Rapid, frequent releases
– Mostly unpaid
Typical Charismatic Leaders?
• Larry Wall: Perl
• Linus Torvalds: Linux
• Richard Stallman: GNU Manifesto
• Eric Raymond: Cathedral and Bazaar
F/OSS: Significance
• Contradicts traditional wisdom:
– Software engineering
– Coordination, large numbers
– Motivation of developers
– Quality
– Security
– Business strategy
• Almost everything is done electronically and available in digital form
• Opportunity for Social Science Research -- large amounts of online data available
• Research issues:
– Understanding motives
– Understanding processes
– Intellectual property
– Digital divide
– Self-organization
– Government policy
– Impact on innovation
– Ethics
– Economic models
– Cultural issues
– International factors
SourceForge
• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 58,685 Projects
• 80,000 Developers
• 590,00 Registered Users
Savannah
• Uses SourceForge Software
• Free Software Foundation
• 1,508 Projects
• 15,265 Registered Users
F/OSS: Importance
• Major component of e-Technology infrastructure, with a major presence in
– e-Commerce
– e-Science
– e-Government
– e-Learning
• Apache has over 65% market share of Internet Web servers
• Linux on over 7 million computers
• Most Internet e-mail runs on Sendmail
• Tens of thousands of quality products
• Part of product offerings of companies like IBM, Apple
– Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
• Corporate employees participating on OSS projects
Free/Open Source Software
• Seems to challenge traditional economic assumptions
• Model for software engineering
• New business strategies
– Cooperation with competitors
– Beyond trade associations, shared industry research, and standards processes: shared product development!
• Virtual, self-organizing and self-managing teams
• Social issues, e.g., digital divide, international participation
• Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property
Research Model
[Figure: research model diagram]
• Conceptual Explanatory Model of OSS: Understanding the Social and Task Dynamics that Predict Developer Behaviors
• Agent-Based Modeling and Simulation
• Combined Data Mining
• Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment
• Connections between components are labeled: Parameter Values, Structural Features, Cross Validation
Monthly Data Collection
• Web crawler (scripts)
– Python
– Perl
– AWK
– Sed
• Monthly
• Since Jan 2001
• ProjectID
• DeveloperID
• Almost 2 million records
• Relational database
Sample records:
PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
F/OSS Developers - Social Network Component
Developers are nodes / Projects are links
• 24 Developers
• 5 Projects
• 2 Linchpin Developers
• 1 Cluster
[Figure: network component with developer nodes (labeled dev[n]) grouped around Project 6882, Project 7028, Project 7597, Project 9859 and Project 15850]
Models of the F/OSS Social Network (Alternative Hypotheses)
• General model features
– Agents are nodes on a graph (developers or projects)
– Behaviors: Create, join, abandon and idle
– Edges are relationships (joint project participation)
– Growth of network: random or types of preferential attachment, formation of clusters
– Fitness
– Network attributes: diameter, average degree, power law, clustering coefficient
• Four specific models
– ER (random graph)
– BA (scale free)
– BA ( + constant fitness)
– BA ( + dynamic fitness)
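The difference between random growth (ER) and preferential attachment (BA) can be illustrated with a small sketch. This is the textbook form of both generators, not the simulation used in the talk: the function names and the parameters `n`, `p` and `m` are assumptions chosen for illustration, and the agent behaviors (create, join, abandon, idle) are not modeled. The two fitness variants are left out because their definitions are not given on this slide.

```python
import random
from collections import Counter


def er_graph(n, p, rng):
    """Erdos-Renyi G(n, p): every possible edge is present with probability p."""
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.add((i, j))
    return edges


def ba_graph(n, m, rng):
    """Barabasi-Albert growth with preferential attachment.

    Starts from a complete graph on m + 1 nodes. Each new node then links to
    m distinct existing nodes, chosen with probability proportional to degree.
    """
    edges = set()
    pool = []  # each node appears once per incident edge, so sampling is degree-weighted
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            edges.add((i, j))
            pool.extend([i, j])

    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for old in chosen:
            edges.add((old, new))
            pool.extend([old, new])
    return edges


def degrees(n, edges):
    """Return the degree of each node 0..n-1."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return [deg[i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    n = 2000
    ba = degrees(n, ba_graph(n, 2, rng))
    # Choose p so that the ER graph has roughly the same average degree.
    er = degrees(n, er_graph(n, sum(ba) / (n * (n - 1)), rng))
    print("max degree, ER:", max(er), " BA:", max(ba))
```

With matched average degree, the BA graph typically grows a few very high-degree hubs while ER degrees stay close to the mean, which is the qualitative contrast the four models are meant to test against the SourceForge data.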
ER model - degree distribution
• The degree distribution of the ER model is binomial, whereas the empirical data follow a power law
• R² = 0.9712 for developer network
• R² = 0.9815 for project network
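One common way to check a degree distribution for power-law behavior is to fit a straight line to it on log-log axes and report R². The sketch below shows that calculation under stated assumptions: the slide does not say how its R² values were obtained, so this is not necessarily the authors' procedure, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_histogram(degree_list):
    """Map each positive degree k to the fraction of nodes having that degree."""
    counts = Counter(d for d in degree_list if d > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def loglog_fit(hist):
    """Least-squares line through (log10 k, log10 P(k)).

    Returns (slope, intercept, r_squared). For a power law P(k) ~ k**(-gamma)
    the slope estimates -gamma and r_squared is close to 1.
    """
    xs = [math.log10(k) for k in hist]
    ys = [math.log10(p) for p in hist.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need at least two distinct degrees and frequencies")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Synthetic histogram that follows an exact power law with exponent 2.
    exact = {k: k ** -2.0 for k in (1, 2, 4, 8, 16)}
    slope, intercept, r2 = loglog_fit(exact)
    print(f"slope={slope:.3f} R^2={r2:.4f}")
```

A binomial degree distribution, as produced by the ER model, curves sharply on log-log axes, so a straight-line fit to it would be expected to give a noticeably lower R² than a fit to power-law data.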