The Many Faces of Software Analytics

  1. The Many Faces of Software Analytics. David Lo, School of Information Systems, Singapore Management University (davidlo@smu.edu.sg). Talk at the University of Luxembourg, Dec 2014

  2. A Brief Self-Introduction (map figure: 6,496 miles or 10,454 km)

  3. A Brief Self-Introduction (image from Wikipedia)

  4. A Brief Self-Introduction

  5. Singapore Management University
      - Third university in Singapore
      - Number of students: 7000+ (UG), 1000+ (PG)
      - Schools: Information Systems, Economics, Law, Business, Accountancy, Social Science

  6. School of Information Systems
      - Undergraduates: 1000+
      - Master students: 100+
      - Doctoral students: 50+

  7. Our Research Group @ SMU

  8. Our Research Group @ SMU  9 PhD Students  1 Visiting Professor  1 Research Engineer (Jan 2015) 8

  9. Software Analytics: "Data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services" (Zhang and Xie, 2012)

  10. Software Analytics: Definition
      - Analysis of a large amount of software data stored in various repositories in order to:
        - understand the software development process
        - help improve software maintenance
        - help improve software reliability
        - and more

  11. Software Analytics (figure of software data sources: mailing lists, Bugzilla, code, SVN, development networks, execution and network traces)

  12. Research Directions: Software Analytics
      - Analytics for Coding & Collaboration
      - Analytics for Testing & Debugging
      - Analytics for Requirement & Design Validation

  13. Our Past and Current Work: Analytics for Coding & Collaboration

  14. Intelligent Multimodal Code Search

  15. Intelligent Multimodal Code Search (figure): the user sends a query (e.g., structured query, free text, code example…) to a code search engine, which searches a code base (e.g., version control system, collaboration sites…) and returns relevant code (e.g., code fragment, method, class, projects, …)

  16. Intelligent Multimodal Code Search (figure): three kinds of queries feed the code search engine
      - Free text, e.g., "How do I load properties from an XML file?"
      - Dependence Query Language, e.g., Nodes: func A, func B, var C, var D; Relations: C dataDepends A, D dataDepends B, D isFieldOf C; Targets: D
      - Code examples

  17. Structured Code Search (ASE10): A developer can write a query that describes the dependence relationships in a bug pattern or in a code pattern that needs refactoring. Using our dependence-based code search engine, he or she can then find X1, X2, and X3, which are instances of that code pattern in the code. (Figure: bug report, query, dependence-based code search engine, code, and the found instances X1, X2, X3.)

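  18. Workflow of Our Approach (figure)
      - Query side: query → query splitting → query graph construction and splitting → query graphs
      - Code side: code → SDG (system dependence graph) → SDG indexing → indexed graph
      - Graph query processing runs the query graphs against the indexed graph; post-filtering and merging then produce the results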

  19. Dependence Query Language (DQL)  Allows developers to describe a target  Involving several code elements  Including the dependencies between the elements  Composed of 4 parts  Query identifier declarations [D]  Code element (node) constraints [N]  Relation constraints [R]  Desired target identifiers [T] 19

  20. Dependence Query Language (DQL)  Node Description [N] : Code element constraints  contains < Text> , inFile < FileName> , inFunction < FnName> , controlType < for/while/switch/if> , etc.  Relation Description [R] : Relationship constraints  A (transitively) controls B, A calls B, A is data dependent on B  A is one step ( directly ) < depend-operation> on B  A textual contains B, etc. 20

  21. Query Splitting  Split a query with disjunctions of conditions  Result: Multiple queries with only conjunctions function A, variable B; A contains "abc"; A dataDepends B; want A control-point A, variable B; A contains "abc"; A dataDepends B; want A function/control-point A, variable B; A contains "abc" or contains "de"; function A, variable B; A contains A dataDepends B; wantA "de"; A dataDepends B; want A control-point A, variable B; A contains "de"; A dataDepends B; want A 21

  22. Query Graph Construction  Query Declarations  Each identifier becomes a node in the query graph  Relation Descriptions  Each dependence relation becomes an edge in the query graph A:declaration B: actual-out C: expression 22

  23. Query Graph Splitting  Divide the query graph to two sub-graphs  Each only capture control OR data dependences A:declaration D: Control B: actual-out point A:declaration C: expression B: actual-out D: Control C: expression point B: actual-out C: expression 23

  24. Graph Indexing and Query  Purpose:  Locate all instances of a given graph pattern in a large graph (Cheng et al., ICDE08) Graph A1 A2 A3 Query Three results found: B2 - triangle A B1 C1 - square C3 C2 - star B D1 E1 F E2 F2 E3 F1 (b) (a) 24

  25. Result Filtering & Merging  Result Filtering  Textual conditions (e.g., textual contains)  Other relation descriptions  Result Merging  Split 1: Disjunctions  Split 2: Data vs. Control Dependences  Need to union the sub-results 25

  26. Evaluation  Two open source projects  expat, gpsbabel Project name Description Version Size (LOC) expat 2002-05-17 13 XML handling library 2002-05-22 13 gpsbabel GPS toolkit 2004-10-27 50 2005-03-21 54  Four software maintenance tasks  From pairs of snapshots from version histories  Developer change = Gold standard 26

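  27. Overall Results: Accuracy

        | Task | # Targets | Text Search FP | Text Search FN | Code Clone Detection FP | Code Clone Detection FN | Our Approach FP | Our Approach FN |
        |------|-----------|----------------|----------------|-------------------------|-------------------------|-----------------|-----------------|
        | 1    | 2         | 526            | 0              | 0                       | 2                       | 36              | 0               |
        | 2    | 8 (186)   | 829 (651)      | 0              | 0                       | 8                       | 200 (22)        | 0               |
        | 3    | 37        | 297            | 0              | 23                      | 3                       | 25              | 2               |
        | 4    | 19        | 86             | 0              | 9                       | 2                       | 3               | 0               |

      - For task 2, the numbers in brackets are adjusted to count correct locations that the developers had not yet modified.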
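The table reports false positives and false negatives rather than precision and recall, but the two are related by simple arithmetic. This sketch assumes that every target not counted as a false negative was found, so TP = targets - FN. That reading of the table is an assumption, and the derived values are not figures reported on the slide.

```python
from fractions import Fraction


def precision_recall(targets: int, fp: int, fn: int) -> tuple[Fraction, Fraction]:
    """Precision and recall from counts of targets, false positives, and false negatives.

    Assumes every target that is not a false negative was reported (TP = targets - FN).
    """
    tp = targets - fn
    precision = Fraction(tp, tp + fp) if tp + fp else Fraction(0)
    recall = Fraction(tp, targets) if targets else Fraction(0)
    return precision, recall


# "Our approach" columns of the table (task 2 uses the adjusted numbers in brackets).
rows = {1: (2, 36, 0), 2: (186, 22, 0), 3: (37, 25, 2), 4: (19, 3, 0)}
for task, (targets, fp, fn) in rows.items():
    p, r = precision_recall(targets, fp, fn)
    print(f"task {task}: precision={float(p):.3f} recall={float(r):.3f}")
```

For example, task 3 gives precision 35/60 (about 0.583) and recall 35/37 (about 0.946) under this assumption.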

  28. Free Text Code Search (FSE12)
      - Find an optimum connected graph that meets the user's needs
      - Greedy subgraph search algorithm with shortest-path indexing
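The greedy idea can be sketched on a small graph whose nodes are API elements. This sketch starts from a node that matches one phrase and repeatedly attaches the nearest node matching each remaining phrase via a shortest path. It makes several simplifying assumptions. Shortest paths are computed on the fly with breadth-first search instead of from a precomputed index. The graph is undirected and unweighted. Each phrase has already been mapped to candidate nodes. The function names and the example API graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, sources, targets):
    """Multi-source BFS: shortest path from any source node to any target node."""
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


def greedy_connect(graph, candidates):
    """Grow a connected node set that covers one candidate node per query phrase.

    candidates: one list of matching nodes per phrase (assumed non-empty).
    Starts from the first candidate of the first phrase, then repeatedly joins
    the nearest candidate of each remaining phrase via a shortest path.
    """
    chosen = {candidates[0][0]}
    for group in candidates[1:]:
        targets = set(group)
        if chosen & targets:
            continue  # phrase already covered by the current subgraph
        path = shortest_path(graph, list(chosen), targets)
        if path is None:
            return None  # the phrases cannot be connected
        chosen.update(path)
    return chosen


# Hypothetical undirected API graph and phrase-to-node matches.
api_graph = {
    "File": ["FileInputStream"],
    "FileInputStream": ["File", "Properties.load"],
    "Properties.load": ["FileInputStream", "Properties"],
    "Properties": ["Properties.load", "Properties.getProperty"],
    "Properties.getProperty": ["Properties"],
}
subgraph = greedy_connect(api_graph, [["File"], ["Properties.load"], ["Properties.getProperty"]])
```

Being greedy, this does not guarantee the smallest connecting subgraph. The result depends on the starting candidate and on the order of the phrases.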

  29. Example Based Code Search (ASEJ15)
      - Input: code examples such as if (b > 1) { b = ext() + foo(); c = ext(); } and if (c > 3) { c = getStr(); }
      - Recover dependency information: lightweight type inference, extension to compilable code, PDG generation
      - Mine common subgraphs: closed subgraph mining over the PDGs
      - Generate textual query: query generation engine
      - Results:

        |           | Our approach | Manual |
        |-----------|--------------|--------|
        | Precision | 0.684        | 0.584  |
        | Recall    | 0.721        | 0.767  |
        | F1        | 0.702        | 0.664  |

  30. Coding & Collaboration
      - Structured Code Search (ASE10)
      - Free Text Code Search (FSE12)
      - Active Code Search (ASE14)
      - Example Based Code Search (ASEJ15)
      - Multi-Criteria Project Search (ICECCS13)
      - Structured + Topic Model (WCRE10)
      - Similar Project Search (ICSM12)

  31. Coding & Collaboration
      - Recommending Answer Posts (ASE11)
      - Recommending Related Libraries (WCRE13)
      - Recommending API Methods Given Feature Requests (ASE13)

  32. Coding & Collaboration
      - Observatory of Tweets and Trends (ASE11)
      - Automated Content Categorization (ICPC14)
      - Recommending Tags to Contents (MSR13, ICSME14)
      - Developer Recommendation (WCRE11)
      - Project Success Estimation (CSMR13)
      - Recommending Relevant Microblogs (ICSM12)
      - Identification of Best Answerers (QMC13)

  33. Coding & Collaboration
      - Topics: Software Diffusion, Collaboration Patterns, Coding Practice, New Media Usages
      - Publications: APSEC12, PLOS13, WCRE10, MUD14, COMPSAC13, CSMR13, CSMR13, SAC13, MSR12

  34. Our Past and Current Work: Analytics for Testing & Debugging

  35. Bug Finding and Fixing Are Hard!
      - Software bugs cost the US economy 59.5 billion dollars annually
        - As stated by the US National Institute of Standards and Technology in 2002 (Tassey, 2002)
      - Software debugging is an expensive and time-consuming task in software projects
        - Testing and debugging activities account for 30-90% of the labor expended on a project (Beizer, 1990)

  36. Bug Finding Techniques (figure): a buggy program → analyze program → list of possible buggy program elements

  37. Bug Finding Techniques (figure: bug report, failure, bug finder, anomaly)

  38. Spectrum-Based Fault Localization
      - Example program, divided into blocks 1-5; block 5 contains the bug:
            double a, x;                              /* block 1 */
            double ap, del, sum;
            int n;
            double temp;
            if (x <= 0.0)
                { return 0.0; }                       /* block 2 */
            del = sum = 1.0 / (ap = a);               /* block 3 */
            for (n = 1; n <= ITMAX; ++n) {
                sum += del *= x / ++ap;               /* block 4 */
                if (Abs(del) < Abs(sum) * EPS) {
                    /* BUGS: supposed to be: */       /* block 5 */
                    /* temp = sum * exp(-x + a*log(x) - Lgamma(a)) */
                    temp = sum * exp(x + a*log(x) - Lgamma(a));
                    return temp;
                }
            }
      - (Figure: a matrix of these blocks against test cases T1, T2, T3, T4, …; the coverage entries form the program spectra, and the last row gives each test case's execution status, F for failed or P for passed)

  39. Measuring Suspiciousness (figure: program elements mapped to suspiciousness scores), e.g., spectrum-based fault localization (Abreu et al., TAICPART-MUTATION’07; Lucia et al., ICSM’10)
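Spectrum-based suspiciousness can be computed directly from a coverage matrix like the one on slide 38. The slide's actual coverage entries did not survive extraction, so the spectra in this sketch are hypothetical. It counts, for each element, how many failing (ef) and passing (ep) tests execute it. It then applies two widely used formulas: Ochiai, ef / sqrt(total_failed * (ef + ep)), which is studied by Abreu et al., and Tarantula. The function names are assumptions, not the authors' tooling.

```python
import math


def spectrum_counts(coverage, failed):
    """Count, per program element, the failing (ef) and passing (ep) tests that execute it.

    coverage: dict test id -> set of element ids the test executes
    failed: set of failing test ids
    """
    counts = {}
    for test, elements in coverage.items():
        for e in elements:
            ef, ep = counts.get(e, (0, 0))
            counts[e] = (ef + 1, ep) if test in failed else (ef, ep + 1)
    return counts


def ochiai(ef, ep, total_failed):
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def tarantula(ef, ep, total_failed, total_passed):
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def rank(coverage, failed, formula="ochiai"):
    """Rank executed elements by suspiciousness, most suspicious first."""
    total_failed = len(failed)
    total_passed = len(coverage) - total_failed
    scores = {}
    for e, (ef, ep) in spectrum_counts(coverage, failed).items():
        if formula == "ochiai":
            scores[e] = ochiai(ef, ep, total_failed)
        else:
            scores[e] = tarantula(ef, ep, total_failed, total_passed)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical spectra for blocks 1-5: only T1 fails, and only T1 reaches block 5.
coverage = {"T1": {1, 3, 4, 5}, "T2": {1, 2}, "T3": {1, 3, 4}, "T4": {1, 3, 4}}
print(rank(coverage, failed={"T1"}))
```

Elements that no test executes receive no score here. A developer would inspect the ranked list from the top, so the buggy block 5 comes first in this toy case.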

  40. Motivation: There is no single fault localization technique that is the best in all cases (Lucia et al., JSEP, 2014). Can we combine different techniques?

  41. Fusion Localizer (ASE14)

  42. Step 2. Technique Selection
      - Input: a set of fault localization techniques
      - Choose the techniques to be fused: (A) overlap-based selection, (B) bias-based selection
      - Output: the selected fault localization techniques
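The slides stop at the selection step. To illustrate what combining fault localization techniques can look like, here is a minimal sketch that uses min-max normalization followed by CombSUM. CombSUM is a common data-fusion scheme that sums each element's normalized scores across techniques. It is not necessarily the configuration used in the ASE14 paper, and the function names and example scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one technique's suspiciousness scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {e: 0.0 for e in scores}  # a constant score carries no ranking information
    return {e: (s - lo) / (hi - lo) for e, s in scores.items()}


def comb_sum(per_technique):
    """CombSUM fusion: sum each element's normalized scores across techniques.

    An element missing from a technique's output contributes 0 for that technique.
    Returns elements ranked by fused score, most suspicious first.
    """
    fused = {}
    for scores in per_technique:
        for e, s in min_max_normalize(scores).items():
            fused[e] = fused.get(e, 0.0) + s
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical raw scores from two selected techniques over blocks 1, 2, 3, and 5.
technique_x = {1: 0.5, 2: 0.0, 3: 0.58, 5: 1.0}
technique_y = {1: 0.2, 2: 0.9, 3: 0.4, 5: 0.8}
print(comb_sum([technique_x, technique_y]))
```

Normalizing first matters because different techniques produce scores on different scales. Without it, a technique with a wide score range would dominate the sum.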
