NETWORK ANALYSIS: PEOPLE AND OPEN SOURCE COMMUNITIES Dawn M. Foster, PhD Student, University of Greenwich, London, UK @geekygirldawn dawn@dawnfoster.com fastwonderblog.com
WHOAMI • Geek, traveler, reader • 20-year tech career. Past 15 years doing community & open source (Intel, Jive, Puppet Labs, etc.) • PhD student at University of Greenwich researching the Linux kernel Photos by Josh Bancroft, Don Park
WHAT IS NETWORK ANALYSIS? Studies relationships between units and looks for patterns and structure in those relationships Image from ANAMIA Project
AGENDA AND INFO • Gathering your data • Data manipulation for network analysis • Visualization • What else can you do? Image from a Northern Mariana Islands Network Scripts, Data, and More: github.com/geekygirldawn/linuxcon_2015
I 💗 METRICS GRIMOIRE MailingListStats aka MLStats CVSAnalY - repos Bicho - bugs More Photo by Bitergia http://metricsgrimoire.github.io/
MLSTATS
a) Install mlstats
   $ python setup.py install
b) Create database
   mysql> create database mlstats;
c) Import data by running mlstats
   $ mlstats --db-user=USERNAME --db-password=PASS http://URLOFYOURLIST
MLSTATS: EXTRACT DATA
   -- people sending emails
   SELECT mp.email_address AS sender,
     -- subquery: who they replied to
     (SELECT mp2.email_address
      FROM messages m2, messages_people mp2
      WHERE m2.is_response_of=m.is_response_of
        AND mp2.message_id=m2.is_response_of
      limit 1) AS receiver
   FROM messages_people mp, messages m
   -- limit time for manageable data
   WHERE YEAR(m.first_date)=2015
     AND MONTH(m.first_date)=1
     AND mp.message_id=m.message_id;
Network Analysis Output for R / Visone:
   sender@example.com in_reply_to@example.com
   sender1@example.com in_reply_to1@example.com
   sender2@example.com in_reply_to2@example.com
   ...
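The sender/receiver pairs above are already an edge list. A minimal sketch of the "data manipulation" step, assuming one whitespace-separated pair per line as in the sample output: it collapses repeated pairs into weighted edges. The function name and the comma-separated output are illustrative assumptions and are not taken from linuxcon.py.

```python
from collections import Counter


def build_weighted_edges(lines):
    """Count how often each sender replied to each receiver.

    Assumes each line holds "sender receiver" separated by whitespace.
    Lines without exactly two fields (e.g. no receiver) are skipped.
    """
    edges = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        sender, receiver = fields
        # Lowercase so the same address in different case is one node.
        edges[(sender.lower(), receiver.lower())] += 1
    return edges


sample = [
    "sender@example.com in_reply_to@example.com",
    "Sender@example.com in_reply_to@example.com",
    "sender1@example.com in_reply_to1@example.com",
    "sender2@example.com",  # a message that is not a reply
]

edges = build_weighted_edges(sample)
for (sender, receiver), weight in sorted(edges.items()):
    print(f"{sender},{receiver},{weight}")
```

The weight column is optional; tools that only need the raw pairs can ignore it.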
EXTRACT DATA: SCRIPTS Reformat / clean up data Reproducible Reduce human error linuxcon.py script Image from Mark Grealish github.com/geekygirldawn/linuxcon_2015
R / VISONE Convert data for better use with network analysis Visualize data using RStudio and Visone
Image from WebOps.com
GOURCE Visualize data using Gource
GOURCE CUSTOM FORMAT
Pipe-separated file:
   timestamp - A unix timestamp of when the update occurred.
   username - The name of the user who made the update.
   type - Update type: (A)dded, (M)odified or (D)eleted.
   file - Path of the file.
   color - Color for the file in hex (FFFFFF) format (optional).
Examples:
   1275543595|andrew|A|src/main.cpp
   1275543700|bob|M|src/main.cpp
https://github.com/acaudwell/Gource/wiki/Custom-Log-Format
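To make the field layout concrete, here is a small sketch that parses and validates one line of this format. It is a hypothetical helper, not part of Gource, and it assumes that neither usernames nor paths contain a `|` character.

```python
import string


def parse_gource_line(line):
    """Split one custom-log line into a dict, checking each field."""
    fields = line.rstrip("\n").split("|")
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}")
    timestamp, username, update_type, path = fields[:4]
    color = fields[4] if len(fields) == 5 else None
    if update_type not in ("A", "M", "D"):
        raise ValueError(f"unknown update type: {update_type!r}")
    if color is not None:
        # The optional color is six hex digits, e.g. FFFFFF.
        if len(color) != 6 or not set(color) <= set(string.hexdigits):
            raise ValueError(f"bad color: {color!r}")
    return {
        "timestamp": int(timestamp),  # raises ValueError if not a number
        "username": username,
        "type": update_type,
        "file": path,
        "color": color,
    }


for example in ("1275543595|andrew|A|src/main.cpp",
                "1275543700|bob|M|src/main.cpp"):
    print(parse_gource_line(example))
```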
EXAMPLE:
a) Extract data using mlstats / database queries
b) Generate Gource custom format (pipe-separated file)
   unixtime|user-email_sender|A|new
   unixtime|user-email_sender|M|user-in_response_to
OR) Run linuxcon.py from my linuxcon_2015 repo (a & b)
c) Run Gource
   $ gource -i 10 --max-user-speed 100 -a 1 --highlight-users gource_output.log
github.com/geekygirldawn/linuxcon_2015
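Step b) can be sketched in a few lines of Python. This is one reading of the two templates above, not the actual linuxcon.py logic: a message with no reply target becomes an `A` entry with the placeholder file `new`, and a reply becomes an `M` entry whose "file" is the person replied to. The input tuples and the function name are assumptions, and entries are sorted by timestamp on the assumption that Gource expects its log in chronological order.

```python
def to_gource_log(messages):
    """Turn (unixtime, sender, receiver) tuples into custom-log lines.

    receiver is None for a message that is not a reply.
    """
    lines = []
    for unixtime, sender, receiver in sorted(messages, key=lambda m: m[0]):
        if receiver is None:
            lines.append(f"{unixtime}|{sender}|A|new")
        else:
            lines.append(f"{unixtime}|{sender}|M|{receiver}")
    return lines


# Hypothetical sample data, deliberately out of order.
messages = [
    (1420074000, "sender1@example.com", "sender@example.com"),
    (1420070400, "sender@example.com", None),
]

log_lines = to_gource_log(messages)
print("\n".join(log_lines))
```

Redirecting the printed output to `gource_output.log` would give a file in the shape that step c) reads.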
OTHER OPTIONS Bug data Wikis Other stuff Photo by Bitergia https://github.com/acaudwell/Gource/wiki/Custom-Log-Format
Image from WebOps.com
WHAT ELSE? So many visualization tools Python network packages Network analysis is more than just pretty pictures!
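As one illustration of analysis beyond visualization, the sketch below counts, for each person, how many distinct people they replied to (out-degree) and how many distinct people replied to them (in-degree). It uses only the Python standard library rather than any particular network package, and the addresses are made-up sample data.

```python
from collections import defaultdict


def degree_counts(edges):
    """Return (out_degree, in_degree) dicts counting distinct neighbours."""
    replied_to = defaultdict(set)  # sender -> people they replied to
    replied_by = defaultdict(set)  # receiver -> people who replied to them
    for sender, receiver in edges:
        replied_to[sender].add(receiver)
        replied_by[receiver].add(sender)
    out_degree = {person: len(n) for person, n in replied_to.items()}
    in_degree = {person: len(n) for person, n in replied_by.items()}
    return out_degree, in_degree


reply_edges = [
    ("a@example.com", "b@example.com"),
    ("c@example.com", "b@example.com"),
    ("b@example.com", "a@example.com"),
    ("a@example.com", "b@example.com"),  # repeated reply, counted once
]

out_degree, in_degree = degree_counts(reply_edges)
most_replied_to = max(in_degree, key=in_degree.get)
print(most_replied_to, in_degree[most_replied_to])
```

A high in-degree suggests someone whose messages draw responses from many different people, which is a starting point for questions a picture alone does not answer.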
THANK YOU Dawn Foster University of Greenwich Centre for Business Network Analysis www2.gre.ac.uk/about/faculty/business/research/centres/cbna/home @geekygirldawn, dawn@dawnfoster.com fastwonderblog.com