Who is going to Mentor Newcomers in Open Source Projects? Gerardo Canfora, Massimiliano Di Penta, Rocco Oliveto, Sebastiano Panichella
Context and Motivations • Software Development; How? • Training via Mentoring; Case Study • Explorative analysis • Recommendation system evaluation
Training Project Newcomers: with good training, a newcomer can immediately start to work actively
Previous Work... Newcomers with low sociability: better training from senior developers (Zhou and Mockus)
Previous Work... Newcomer and mentor: mentoring of project newcomers is highly desirable (Dagenais et al.)
Characteristics of a good Mentor
Sources of Information: SVN, Git, CVS
Mentoring in Small/Large Projects • Small projects: finding mentors is a trivial problem • Large projects: finding mentors is not a trivial problem
YODA (Young and newcOmer Developer Assistant): an Approach for Mentor Identification in Open Source Projects
YODA: two phases • What factors can be used to identify mentors? • Data sources: SVN, Git, CVS
RQ1: Identify past mentors. What factors can be used to identify mentors?
How does Arnetminer work? It ranks pairs of researchers according to four factors: f1: they published many papers together; f2: the advisor published more than the student; f3: the advisor is older than the student; f4: the student published her first paper(s) with the advisor
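To make the analogy concrete, the four factors can be evaluated for a single advisor-student pair from publication records. This is a simplified illustration, not Arnetminer's actual ranking model: the `Paper` record, the toy data, the `first_k` parameter, and the use of "published first" as a proxy for "older" are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    year: int
    authors: frozenset  # names of all authors of the paper


def pair_factors(papers, advisor, student, first_k=1):
    """Evaluate the four factors for one (advisor, student) pair.

    Simplified illustration, not Arnetminer's actual model. Assumption:
    "older" is approximated by who published first.
    """
    adv = [p for p in papers if advisor in p.authors]
    stu = sorted((p for p in papers if student in p.authors), key=lambda p: p.year)
    if not adv or not stu:
        return {"f1": 0, "f2": False, "f3": False, "f4": False}
    return {
        # f1: number of papers published together
        "f1": sum(1 for p in adv if student in p.authors),
        # f2: the advisor published more than the student
        "f2": len(adv) > len(stu),
        # f3: the advisor is "older" (hypothetical proxy: earlier first paper)
        "f3": min(p.year for p in adv) < stu[0].year,
        # f4: the student's first paper(s) are co-authored with the advisor
        "f4": all(advisor in p.authors for p in stu[:first_k]),
    }


# Toy data (invented for illustration)
papers = [
    Paper(2001, frozenset({"Ann"})),
    Paper(2003, frozenset({"Ann"})),
    Paper(2005, frozenset({"Ann", "Bob"})),
    Paper(2006, frozenset({"Ann", "Bob"})),
    Paper(2008, frozenset({"Bob"})),
]
print(pair_factors(papers, "Ann", "Bob"))  # → {'f1': 2, 'f2': True, 'f3': True, 'f4': True}
```

Swapping the roles (`pair_factors(papers, "Bob", "Ann")`) makes f2, f3, and f4 false, which is what lets such factors tell the advisor apart from the student in a pair.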
Heuristics to identify Mentors • F1: exchanged emails, starting from when Alice (the newcomer) joins the project
Heuristics to identify Mentors • F2: overall amount of emails
Heuristics to identify Mentors • F3: project age
Heuristics to identify Mentors • F4: newcomer early emails (the first emails sent by Alice in the project)
Heuristics to identify Mentors • F5: commits, relative to when Alice joins the project
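The five heuristics can be sketched as features of a (newcomer, candidate mentor) pair computed from a mailing-list log and a commit log. The slides only name the factors, so every definition below is a hypothetical operationalization: the `Email`/`Commit` records, the `recipient` field (the author of the message being replied to), the "joins with the first email" rule, the `first_k` cutoff, and counting F5 commits up to the joining day are assumptions, not YODA's exact definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    recipient: str  # hypothetical: author of the message being replied to
    day: int        # days since the start of the observed period


@dataclass(frozen=True)
class Commit:
    author: str
    day: int


def yoda_factors(emails, commits, newcomer, candidate, first_k=3):
    """Return (F1, ..., F5) for a (newcomer, candidate mentor) pair.

    Hypothetical operationalization of the five heuristics; the exact
    definitions used by YODA may differ.
    """
    own = sorted((e for e in emails if e.sender == newcomer), key=lambda e: e.day)
    if not own:
        raise ValueError(f"{newcomer} never posted to the mailing list")
    join = own[0].day  # assumption: the newcomer joins with their first email
    pair = {newcomer, candidate}
    cand_days = [e.day for e in emails if e.sender == candidate]
    # F1: emails exchanged between the two, from the newcomer's joining onward
    f1 = sum(1 for e in emails if {e.sender, e.recipient} == pair and e.day >= join)
    # F2: overall amount of emails sent by the candidate
    f2 = len(cand_days)
    # F3: candidate's age in the project (days) when the newcomer joins
    f3 = max(0, join - min(cand_days)) if cand_days else 0
    # F4: how many of the newcomer's first k emails are addressed to the candidate
    f4 = sum(1 for e in own[:first_k] if e.recipient == candidate)
    # F5: candidate's commits up to the newcomer's joining day
    f5 = sum(1 for c in commits if c.author == candidate and c.day <= join)
    return (f1, f2, f3, f4, f5)


# Toy history (invented for illustration)
emails = [
    Email("bob", "carol", 0), Email("carol", "bob", 2), Email("bob", "dave", 5),
    Email("alice", "bob", 10), Email("bob", "alice", 11), Email("alice", "bob", 12),
    Email("alice", "dave", 13), Email("dave", "alice", 14),
]
commits = [Commit("bob", 1), Commit("bob", 8), Commit("dave", 9), Commit("bob", 20)]
print(yoda_factors(emails, commits, "alice", "bob"))   # → (3, 3, 10, 2, 2)
print(yoda_factors(emails, commits, "alice", "dave"))  # → (2, 1, 0, 1, 1)
```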
Aggregating the factors: what factors can be used to identify mentors? $\mathit{score} = \sum_{i=1}^{5} w_i f_i$
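The weighted sum $\sum_{i=1}^{5} w_i f_i$ can be sketched as a ranking function over candidate couples. The weight vectors below are the configurations that appear in the RQ1 results slides (f1 alone, 0.5*f1 + 0.25*f2 + 0.25*f3, 0.5*f1 + 0.25*f2 + 0.25*f4); the min-max normalization of each factor and the toy factor values are assumptions added so that counts and day spans share one scale.

```python
# Weight vectors (w1..w5) for configurations shown in the RQ1 results slides
CONFIGS = {
    "f1": (1.0, 0.0, 0.0, 0.0, 0.0),
    "0.5*f1 + 0.25*f2 + 0.25*f3": (0.5, 0.25, 0.25, 0.0, 0.0),
    "0.5*f1 + 0.25*f2 + 0.25*f4": (0.5, 0.25, 0.0, 0.25, 0.0),
}


def normalize(rows):
    """Min-max normalize each factor column to [0, 1] (assumed preprocessing)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def rank_pairs(pairs, weights):
    """pairs maps (newcomer, candidate) -> (f1, ..., f5).

    Returns [(pair, score)] sorted by descending score = sum_i w_i * f_i.
    """
    keys = list(pairs)
    rows = normalize([pairs[k] for k in keys])
    scored = [(k, sum(w * f for w, f in zip(weights, row))) for k, row in zip(keys, rows)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(k, round(s, 3)) for k, s in scored]


# Raw factor values for three candidate couples (invented for illustration)
pairs = {
    ("alice", "bob"): (3, 3, 10, 2, 2),
    ("alice", "dave"): (2, 1, 0, 1, 1),
    ("erin", "bob"): (1, 3, 30, 0, 4),
}
for pair, score in rank_pairs(pairs, CONFIGS["0.5*f1 + 0.25*f2 + 0.25*f3"]):
    print(pair, score)
```

With these toy values the ranking is alice/bob (0.833), erin/bob (0.5), alice/dave (0.25). Changing only the weight vector is enough to compare configurations on the same data.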
Recommend Mentors: at time t, identify a mentor with adequate skills
Recommend Mentors: inspired by the work on bug triaging by J. Anvik et al. (2011); at time t, candidate mentors are matched to the newcomer using Dice similarity
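The Dice coefficient between two term sets is $2|A \cap B| / (|A| + |B|)$. The sketch below uses it to rank candidate mentors at time t. What exactly YODA compares is not stated on the slide, so the profiles are an assumption: each candidate is represented by the words of the emails they sent before t, the newcomer by the words of one message, and `terms` is a deliberately naive tokenizer.

```python
import re


def terms(text):
    """Lower-cased set of word tokens (naive tokenizer, an assumption)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def recommend(newcomer_text, history, t, top_n=1):
    """Rank candidate mentors by Dice similarity to the newcomer's text.

    history: list of (author, day, text). Only emails sent before time t
    contribute to a candidate's term profile (hypothetical profile model).
    """
    profiles = {}
    for author, day, text in history:
        if day < t:
            profiles.setdefault(author, set()).update(terms(text))
    query = terms(newcomer_text)
    ranked = sorted(profiles, key=lambda a: dice(query, profiles[a]), reverse=True)
    return ranked[:top_n]


# Toy mailing-list history (invented for illustration)
history = [
    ("bob", 1, "patch for the build system and autoconf"),
    ("carol", 2, "documentation typo in the tutorial"),
    ("bob", 3, "build fails on freebsd"),
    ("dave", 9, "build system rewrite"),
]
print(recommend("my build fails with autoconf", history, t=5))  # → ['bob']
```

Restricting profiles to emails before t mirrors the slide's timeline: with `t=5`, dave's later email is ignored; with `t=10`, he becomes the second candidate.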
Empirical Study • Goal: analyze data from mailing lists and versioning systems • Purpose: investigate which factors can be used to identify mentors • Quality focus: recommending mentors in software projects • Context: mailing lists and versioning systems of five software projects: Apache, FreeBSD, PostgreSQL, Python, and Samba
Context: training and test sets for evaluating YODA.
                          Apache           FreeBSD          PostgreSQL       Python           Samba
Training period           08/2001-03/2002  11/1998-02/2000  10/1998-05/2001  05/2000-05/2001  04/1998-09/2000
Test period               04/2002-12/2008  03/2000-10/2008  06/2001-03/2008  06/2001-12/2008  10/2000-12/2008
# Mentors (training)      19               65               10               28               17
# Newcomers (training)    13               33               8                32               33
# Newcomers (test)        13               33               7                31               33
Research Questions
RQ1: How can we identify mentors from the past history of a software project? Newcomer-mentor couples are ranked by the score $\sum_{i=1}^{5} w_i f_i$ (e.g., 2.5, 1.5, 1.5, ...), and the ranked couples are then manually validated.
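The RQ1 results plot precision against the number of newcomer-mentor pairs considered. Assuming precision means the fraction of the top-k ranked couples confirmed by manual validation, such a curve could be computed as sketched below; `precision_curve`, the ranking, and the validation outcome are invented for illustration.

```python
def precision_curve(ranked_pairs, validated, cutoffs):
    """Precision of the top-k ranked couples for each cutoff k.

    ranked_pairs: couples sorted by descending score.
    validated: set of couples confirmed by manual validation.
    """
    curve = {}
    for k in cutoffs:
        top = ranked_pairs[:k]
        hits = sum(1 for pair in top if pair in validated)
        curve[k] = hits / len(top) if top else 0.0
    return curve


# Invented ranking and validation outcome, for illustration only
ranked = [("alice", "bob"), ("erin", "bob"), ("alice", "dave"), ("frank", "carol")]
validated = {("alice", "bob"), ("alice", "dave")}
for k, p in precision_curve(ranked, validated, [1, 2, 3, 4]).items():
    print(f"top-{k}: {p:.0%}")
```

Computing one curve per weight configuration on the same validated set gives a comparison of the kind shown in the RQ1 plots.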
RQ1: How can we identify mentors from the past history of a software project? Possible configurations: f1; f1 + f2 + f3; f1 + f2 + f4; f5 (baseline). [Figure: precision (0-100%) vs. number of newcomer-mentor pairs (18-24), one curve per configuration]
RQ1: How can we identify mentors from the past history of a software project? [Figure: precision vs. number of newcomer-mentor pairs for PostgreSQL and Apache; configurations: f1, f1 + f2 + f3, f1 + f2 + f4, f5 (baseline)]
RQ1: How can we identify mentors from the past history of a software project? [Figure: precision vs. number of newcomer-mentor pairs for Python, FreeBSD, and Samba] Useful factors for mentor identification: f1; 0.5*f1 + 0.25*f2 + 0.25*f3; 0.5*f1 + 0.25*f2 + 0.25*f4
RQ2: To what extent would it be possible to recommend mentors to newcomers joining a software project? YODA makes it possible to recommend mentors.
Why not just use top committers? Not all committers are good mentors.
Surveying Project Developers. Questions asked: - Done/received mentoring? - Perceived importance of mentoring - What makes a good mentor?
Sent to 114 subjects… Samba 37 • FreeBSD 37 • PostgreSQL 15 • Python 23 • Apache 23
Obtained answers… Samba, FreeBSD, PostgreSQL, Python (-), Apache
Done/received mentoring? • Had a mentor? Yes 58%, No 42% ("Yes, I received mentoring. My mentor was…") • Did mentoring? Yes 92%, No 8% ("Yes, I did mentoring…")
Perceived importance of mentoring (effect of mentor / effect on newcomer): Useless at all 0% / 0% • Not important 0% / 0% • Neutral 11% / 45% • Important 56% / 36% • Very important 33% / 18%. "It is very important that a mentor shares knowledge with a mentee…"
More recommend