FourD: Do Developers Discuss Design? Revisited

Abbas Shakiba, Robert Green, Robert Dyer. Bowling Green State University. Supported in part by the US National Science Foundation under CCF-15-18776 and CNS-15-12947.


  1. FourD: “Do Developers Discuss Design?” Revisited
     Abbas Shakiba, Robert Green, Robert Dyer
     Bowling Green State University
     Supported in part by the US National Science Foundation under CCF-15-18776 and CNS-15-12947

  2. Do developers discuss design decisions?
     • Are design decisions only happening before implementation?
     • Do design discussions/decisions show in the commit logs?

  3. Prior work
     • Brunet, João, et al. “Do developers discuss design?” 11th Working Conference on Mining Software Repositories, 2014
     • Selected set of 5 projects for analysis
     • Analyzed:
       • commit logs
       • bug reports
       • discussions

  4. Our Study
     • Data from 2 software repositories
       • GitHub, SourceForge
     • For each, 5 randomly selected projects
     • Focus on commit logs
       • 200 randomly selected non-empty commits per project
       • 2 x 200 x 5 = 2,000 commits total
     • Train ML classifiers to identify commits discussing design

  5. Tools Used
     • Boa Language and Infrastructure
       • A language for analyzing ultra-large-scale software repositories
     • Weka
       • Data mining tool written in Java
     • Ruby on Rails
       • A web application framework written in Ruby

  6. Approach
     Getting Data (Boa) → Manual Classification (survey) → Pre-Processing (Weka) → Build Models (Weka) → Test Models (Weka) → Analyze Results

  7. Approach (Cont'd): Getting Data (Boa)
     • Boa queries
       • Randomly pick 5 projects (not shown)
       • Randomly pick 200 commits (shown)

         # keep the 200 log messages with the largest random weights, per project
         COMMITS: output top(200)[string] of string weight float;

         ids := {"6176545", "6150849", "209281", "13151128", "1019785"};

         # true when the log message is blank or a known placeholder
         isempty := function(s: string) : bool {
             s2 := trim(s);
             if (match(`^\s*$`, s2))
                 return true;
             if (match(`^no message$`, lowercase(s2)))
                 return true;
             if (match(`^\*\*\* empty log message \*\*\*$`, lowercase(s2)))
                 return true;
             return false;
         };

         # visit only the 5 selected projects
         exists (i: int; input.id == ids[i])
             visit(input, visitor {
                 before rev: Revision ->
                     if (!isempty(rev.log))
                         COMMITS[input.id] << rev.log weight rand();
             });
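
The Boa query above keeps the 200 log messages with the largest random weights, which amounts to drawing a uniform random sample of the non-empty logs. The following is a minimal local sketch of that idea in Python, not the Boa infrastructure itself; the names `is_empty` and `sample_logs` are hypothetical, and the input is assumed to be an in-memory list of log strings.

```python
import heapq
import random
import re

# Patterns mirroring the three "empty log" checks in the Boa query.
EMPTY_PATTERNS = [
    re.compile(r"^\s*$"),
    re.compile(r"^no message$"),
    re.compile(r"^\*\*\* empty log message \*\*\*$"),
]


def is_empty(log):
    """Return True when a commit log is blank or a known placeholder."""
    s = log.strip().lower()
    return any(p.match(s) for p in EMPTY_PATTERNS)


def sample_logs(logs, k=200, seed=None):
    """Keep the k non-empty logs with the largest random weights."""
    rng = random.Random(seed)
    weighted = ((rng.random(), log) for log in logs if not is_empty(log))
    return [log for _, log in heapq.nlargest(k, weighted)]
```

When a project has fewer than `k` non-empty logs, the sketch simply returns all of them.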

  8. Approach (Cont'd): Manual Classification (survey)
     • Survey website for crowdsourcing
     • Each log shown to 2-3 users
     • Required 2 YES or 2 NO
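
The agreement rule on this slide (a label is accepted once two raters give the same answer) can be sketched as follows. This is an illustration only, not the survey site's code; the function name `aggregate` and the vote encoding as "yes"/"no" strings are assumptions.

```python
from collections import Counter


def aggregate(votes):
    """Return 'yes' or 'no' once two raters agree, else None.

    None means the log still needs another rater, for example
    after one YES and one NO.
    """
    counts = Counter(v.lower() for v in votes)
    for label in ("yes", "no"):
        if counts[label] >= 2:
            return label
    return None
```

With at most three votes per log, the two labels can never both reach two votes, so the order of the check does not matter.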

  9. Approach (Cont'd): Pre-Processing (Weka)
     • Convert data to ARFF format
     • e.g., data1: “swapping the position of the input function <</>>”
       Classified: no
         class: No
         attributes: Swapping 1, the 2, position 1, of 1, input 1, function 1, <</>> 1
     • e.g., data2: “reorganized a package structure to better reflect a layered approach”
       Classified: yes
         class: Yes
         attributes: reorganized 1, package 1, structure 1, to 1, better 1, reflect 1, a 2, layered 1, approach 1
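
To make the conversion concrete, here is a minimal sketch that turns labeled log messages into a dense ARFF document with one numeric word-count attribute per token. It is not the authors' converter: the function names, the relation name `commits`, and the class attribute name `design_class` are hypothetical, and tokenization is a plain whitespace split.

```python
from collections import Counter


def quote(name):
    """Single-quote an ARFF attribute name, escaping backslashes and quotes."""
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_arff(samples, relation="commits"):
    """Build ARFF text from (log_message, label) pairs; label is 'yes' or 'no'."""
    # One bag of words (token -> count) per log message.
    bags = [Counter(text.lower().split()) for text, _ in samples]
    vocab = sorted(set().union(*bags))
    lines = ["@relation " + relation, ""]
    lines += ["@attribute " + quote(tok) + " numeric" for tok in vocab]
    lines += ["@attribute 'design_class' {yes,no}", "", "@data"]
    for bag, (_, label) in zip(bags, samples):
        # Tokens absent from a message get a count of 0.
        row = [str(bag[tok]) for tok in vocab]
        lines.append(",".join(row + [label]))
    return "\n".join(lines)
```

A dense layout is used here for readability; for large vocabularies ARFF's sparse data format would be the more economical choice.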

  10. Approach (Cont'd): Pre-Processing (Weka)
     • Convert data to ARFF format
     • Tokenization
       • Remove tokens without letters
     • Stemming
     • Remove stop words
       • a, an, the, to, etc.
     • Eliminate prefix and suffix
       • -ing, -ed, -ly, etc.

     Before processing:
       class: No
       attributes: Swapping 1, the 2, position 1, of 1, input 1, function 1, <</>> 1
       class: Yes
       attributes: reorganized 1, package 1, structure 1, to 1, better 1, reflect 1, a 2, layered 1, approach 1

  12. Approach (Cont'd): Pre-Processing (Weka)
     • Convert data to ARFF format
     • Tokenization
       • Remove tokens without letters
     • Stemming
     • Remove stop words
       • a, an, the, to, etc.
     • Eliminate prefix and suffix
       • -ing, -ed, -ly, etc.

     After processing:
       class: No
       attributes: Swap 1, pos 1, input 1, function 1
       class: Yes
       attributes: organ 1, pack 1, struc 1, better 1, flect 1, layer 1, approach 1
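
The pre-processing steps listed on these slides can be sketched in a few lines. This is a deliberately crude stand-in, not the Weka filters used in the study: the stop-word set (including "of") and the suffix list are assumptions based on the "etc." examples, no prefixes are stripped, and the naive suffix rule does not reproduce the exact stems shown on the slide (it yields "swapp" rather than "Swap").

```python
import re

# Assumed stop words: the slide names "a, an, the, to, etc."; "of" is a guess.
STOP_WORDS = {"a", "an", "the", "to", "of"}
# Suffixes named on the slide.
SUFFIXES = ("ing", "ed", "ly")


def strip_suffix(token):
    """Drop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(log):
    """Tokenize, drop letter-free tokens and stop words, then strip suffixes."""
    tokens = log.lower().split()
    tokens = [t for t in tokens if re.search(r"[a-z]", t)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [strip_suffix(t) for t in tokens]
```

For the first example message, `preprocess("swapping the position of the input function <</>>")` gives `['swapp', 'position', 'input', 'function']`; a real stemmer would normalize these tokens further.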

  13. Approach (Cont'd): Build Models (Weka)
     • Machine Learning Algorithms in Weka
       • Decision Tree
       • Random Forest
       • Naïve Bayes
       • Multinomial Bayes
       • Support Vector Machines
       • K-Nearest Neighbor
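
To give a feel for how one of these model families works on word counts, here is a small multinomial Naive Bayes classifier with add-one smoothing written in plain Python. It is an illustrative sketch only: the study used Weka's implementations, and the class name, method names, and toy training data below are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for illustration only.
docs = [["reorganiz", "package", "structure"], ["layer", "structure", "approach"],
        ["fix", "typo"], ["fix", "null", "check"]]
labels = ["yes", "yes", "no", "no"]
model = MultinomialNB().fit(docs, labels)
```

On this toy data, `model.predict(["package", "layer"])` returns `"yes"` and `model.predict(["fix"])` returns `"no"`.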

  14. Test Models (Weka): Difficulties
     • Different Data Distributions
     [Chart: counts of Class: No and Class: Yes for Dataset 1 and Dataset 2; vertical axis 0 to 100]

  15. Test Models (Weka): Difficulties (Cont'd)
     • Confusion Matrix
       • Add weight to cells
     • Statistical measurements
       • F-Measure
       • G-Mean

     Confusion Matrix   Predicted Yes         Predicted No
     Actual Yes         True Positive (TP)    False Negative (FN)
     Actual No          False Positive (FP)   True Negative (TN)

     $\mathrm{Accuracy}_{No} = \frac{TN}{TN + FP}$
     $\mathrm{Accuracy}_{Yes} = \frac{TP}{TP + FN}$
     $\mathrm{Precision} = \frac{TP}{TP + FP}$
     $\mathrm{Recall} = \frac{TP}{TP + FN}$
     $F_{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
     $G_{mean} = \sqrt{\mathrm{Accuracy}_{Yes} \times \mathrm{Accuracy}_{No}}$
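
The measures on this slide follow directly from the four confusion-matrix cells. The sketch below computes them and shows why plain accuracy can mislead when the Yes class is rare; the function names and the example counts are made up for illustration and are not results from the study.

```python
import math


def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(tp, fn, fp, tn):
    """Compute per-class accuracy, precision, recall, F-measure, and G-mean."""
    acc_yes = ratio(tp, tp + fn)  # identical to recall on the Yes class
    acc_no = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    recall = acc_yes
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "accuracy_yes": acc_yes,
        "accuracy_no": acc_no,
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(acc_yes * acc_no),
    }


# Hypothetical imbalanced outcome: 20 Yes commits and 180 No commits.
m = metrics(tp=5, fn=15, fp=5, tn=175)
```

Here overall accuracy is 0.90, yet the F-measure is about 0.33 and the G-mean about 0.49, because only 5 of the 20 Yes commits were found.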

  16. Analyze Results: All Results
     [Figure: all results]

  17. Analyze Results: Interesting Results
     [Figure: interesting results]

  18. Analyze Results: F-measure and G-mean
     [Figure: F-measure and G-mean; panels: GitHub, SourceForge]

  19. Future Work
     • Move analysis completely into Boa
       • Pre-processing tasks
       • Machine learning models
     • Do developers discuss other topics?
       • testing
       • debugging
       • etc.

  20. To summarize…
     • Boa query: top(200) randomly weighted non-empty commit logs for each of the 5 project ids
     • Pre-processed attributes: class No (Swap, pos, input, function); class Yes (organ, pack, struc, better, flect, layer, approach)
     • Confusion matrix: Actual Yes (TP, FN); Actual No (FP, TN)
     • http://boa.cs.iastate.edu/
