
Analyzing and Supporting Adaptations of Online Code Examples

  1. Analyzing and Supporting Adaptations of Online Code Examples. Tianyi Zhang¹*, Di Yang²*, Crista Lopes², Miryung Kim¹ (¹University of California, Los Angeles; ²University of California, Irvine). * Both the first author and the second author contributed significantly. Dataset and Tool: https://github.com/tianyi-zhang/ExampleStack-ICSE-Artifact

  2. Modern Programming Workflow: Interpret Problem → Search → Browse & Assess → Modify Online Code. Example query: “how to connect to MySQL in Java?” (Brandt et al. Two studies of opportunistic programming: interleaving web foraging, learning, and writing code. 2009.)

  6. Modern Programming Workflow, with existing tool support: Prompter [Ponzanelli et al., 2014]; Test cases [CodeGenie]; Strathcona [Holmes and Murphy, 2005]; AnswerBot [Xu et al., 2017]; I/O types [ParseWeb, Hunter]; Deprecation Watcher [Zhou et al., 2017]; Sourcerer [Bajracharya et al., 2006]; I/O examples [Stolee et al., 2014, FlashFill]; ExampleCheck [Zhang et al., 2018]; Exemplar [McMillan et al., 2012]; Examplore [Glassman et al., 2018]; Multimodal [Reiss, 2009]; FaCoy [Kim et al., 2018]; …

  7. What we have known so far… • Online code reuse behavior: developers copy and paste online code with adaptations [Wu et al., 2018] and seldom attribute its sources [Baltes and Diehl, 2018] • Code adaptation & integration support: renaming variables and porting relevant program statements [SnipMatch, Jigsaw]

  8. What we don’t know yet… • RQ1. What kinds of adaptations do developers make in practice? • RQ2. Are these adaptations made repeatedly? • RQ3. How can we provide effective tool support?

  9. Outline: 1. A Comprehensive Dataset; 2. Qualitative Analysis; 3. Quantitative Analysis; 4. Tool Design & User Study

  11. Identify Reused Stack Overflow Examples • Challenge: the lack of attribution [Baltes and Diehl, 2018] • Variation Dataset: 312K Stack Overflow Java code snippets (>= 3 LOC) and 50K Java GitHub repos (>= 5 stars) → Clone Detection → Timestamp Analysis → 14,124 potentially reused SO examples • Adaptation Dataset: Scan SO Links in the GitHub repos → Clone Detection → Timestamp Analysis → 629 explicitly attributed SO examples • Clone detection: Sajnani et al. SourcererCC: Scaling Code Clone Detection to Big Code. 2016.

  12. Identify Reused Stack Overflow Examples (continued) • Variation Set: an over-approximation (potentially reused examples, which may include clones that were not actually copied) • Adaptation Set: an under-approximation (only examples whose reuse is explicitly attributed)
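The clone-detection and timestamp steps of this pipeline can be illustrated with a minimal sketch. It is a simplified version of the token-bag overlap criterion described for SourcererCC (clones if the shared-token count reaches ceil(theta × the larger bag size)), omitting SourcererCC's filtering and indexing heuristics; it is not the study's pipeline. The token lists are hand-written rather than produced by a lexer, `theta` is a free parameter, and `plausiblyReused` is a hypothetical helper that assumes the timestamp analysis keeps a GitHub clone only if it first appeared after the SO post was created.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CloneFilterSketch {

    // Build a bag (multiset) of tokens: token -> number of occurrences.
    static Map<String, Integer> bag(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Overlap of two bags: for each shared token, add the smaller of its two counts.
    static int overlap(Map<String, Integer> a, Map<String, Integer> b) {
        int shared = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                shared += Math.min(e.getValue(), other);
            }
        }
        return shared;
    }

    // Overlap-similarity criterion: clones if overlap >= ceil(theta * max(|A|, |B|)).
    static boolean isClone(List<String> a, List<String> b, double theta) {
        int required = (int) Math.ceil(theta * Math.max(a.size(), b.size()));
        return overlap(bag(a), bag(b)) >= required;
    }

    // Hypothetical timestamp filter (assumption): a GitHub clone can only have been
    // reused from a Stack Overflow post if it first appeared after the post was created.
    static boolean plausiblyReused(Instant soPostCreated, Instant gitHubFirstSeen) {
        return gitHubFirstSeen.isAfter(soPostCreated);
    }

    public static void main(String[] args) {
        List<String> so = Arrays.asList("int", "x", "=", "a", "+", "b", ";");
        List<String> gh = Arrays.asList("int", "y", "=", "a", "+", "b", ";");
        boolean clone = isClone(so, gh, 0.8);
        boolean reused = plausiblyReused(Instant.parse("2014-03-01T00:00:00Z"),
                                         Instant.parse("2016-07-15T00:00:00Z"));
        System.out.println("clone=" + clone + ", plausibly reused=" + reused);
    }
}
```

With `theta = 0.8`, the two 7-token snippets share 6 tokens, which meets the required ceil(5.6) = 6, so they count as clones; with `theta = 0.9` the requirement rises to 7 and they do not.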

  14. Qualitative Analysis • Randomly sample 200 clone pairs from each dataset • Manually inspect their differences using GumTree [Falleri et al., 2014] • Label program changes with short descriptions and group similar ones

  15. 24 Frequent Adaptation Types in 6 Categories: Code Hardening; Resolve Compilation Error; Exception Handling; Logic Customization; Refactoring; Miscellaneous

  16. 24 Frequent Adaptation Types in 6 Categories, e.g., Exception Handling: • Insert/delete a try-catch block • Insert/delete a thrown exception in a method header • Update an exception type • Change statements in a catch/finally block (Zhang et al. Analyzing and Supporting Adaptation of Online Code Examples. 2019.)
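To make the exception-handling types concrete, here is a hypothetical before/after pair, not taken from the study's data: an example-style method that declares `throws IOException`, and an adapted version that deletes the thrown exception from the method header and inserts a try-catch block instead. The method names and the choice of returning `Optional.empty()` on failure are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

public class ExceptionAdaptationSketch {

    // Example-style snippet: the checked exception is declared in the method header,
    // so every caller must deal with it.
    static String readFileOriginal(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Adapted snippet: "delete a thrown exception in a method header" plus
    // "insert a try-catch block"; a failed read now yields an empty Optional.
    static Optional<String> readFileAdapted(String path) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            return Optional.of(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> result = readFileAdapted("no-such-file-for-this-sketch.txt");
        System.out.println("present=" + result.isPresent());
    }
}
```

Whether handling the exception locally is the right adaptation depends on the caller; the pair only shows what the two labeled change types look like in code.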

  18. Automated Rule-based Classification • Codify each adaptation type as a logic rule, e.g., $\mathit{Insert}(t_1, t_2, i) \land \mathit{NodeType}(t_1, \mathtt{TryStatement}) \Rightarrow \mathtt{Insert\_Try\_Catch\_Block}$ • 98% precision and 96% recall on a separate set of 100 clone pairs
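A rule such as Insert(t1, t2, i) ∧ NodeType(t1, TryStatement) ⇒ Insert_Try_Catch_Block can be read as a predicate over AST edit actions. The sketch below is a hypothetical encoding under simplifying assumptions: `EditAction` is a hand-rolled stand-in for a tree-diff action (not GumTree's API), node types are plain strings, and only `Insert_Try_Catch_Block` comes from the slide; `Delete_Try_Catch_Block` and `Insert_If_Check` are illustrative labels.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class AdaptationRuleSketch {

    enum Kind { INSERT, DELETE, UPDATE, MOVE }

    // Simplified AST edit action mirroring Insert(t1, t2, i):
    // node t1 of type nodeType is placed under parent t2 at child index i.
    static final class EditAction {
        final Kind kind;
        final String nodeType;   // AST node type of t1
        final String parentType; // AST node type of t2
        final int index;         // child position i

        EditAction(Kind kind, String nodeType, String parentType, int index) {
            this.kind = kind;
            this.nodeType = nodeType;
            this.parentType = parentType;
            this.index = index;
        }
    }

    // A rule pairs an adaptation label with the condition an edit action must satisfy.
    static final class Rule {
        final String label;
        final Predicate<EditAction> condition;

        Rule(String label, Predicate<EditAction> condition) {
            this.label = label;
            this.condition = condition;
        }
    }

    // Insert_Try_Catch_Block follows the slide's rule; the other two are hypothetical.
    static final List<Rule> RULES = Arrays.asList(
        new Rule("Insert_Try_Catch_Block",
                 a -> a.kind == Kind.INSERT && a.nodeType.equals("TryStatement")),
        new Rule("Delete_Try_Catch_Block",
                 a -> a.kind == Kind.DELETE && a.nodeType.equals("TryStatement")),
        new Rule("Insert_If_Check",
                 a -> a.kind == Kind.INSERT && a.nodeType.equals("IfStatement")));

    // Label a clone pair with every adaptation type whose rule fires on some edit action.
    static Set<String> classify(List<EditAction> actions) {
        Set<String> labels = new LinkedHashSet<>();
        for (EditAction a : actions) {
            for (Rule r : RULES) {
                if (r.condition.test(a)) {
                    labels.add(r.label);
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<EditAction> diff = Arrays.asList(
            new EditAction(Kind.INSERT, "TryStatement", "Block", 0),
            new EditAction(Kind.UPDATE, "SimpleName", "VariableDeclarationFragment", 0));
        System.out.println(classify(diff));
    }
}
```

Keeping rules as data (label plus predicate) makes it straightforward to add one rule per adaptation type without touching the classification loop.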

  19. Distribution of Common Adaptation Types

  20. Finding 1. Variation patterns resemble adaptation patterns

  21. Finding 2. Different GitHub clones of the same example share common adaptation types. • Example: one Stack Overflow example with four GitHub counterparts: A (add an if check, renaming), B (add an if check, change a method call), C (add an if check, renaming), D (change a method call, renaming) • Chart (at least one common adaptation type vs. different adaptation types): Adaptation Dataset 126 vs. 54 (70%); Variation Dataset 6,548 vs. 2,314 (74%)
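The percentages on this slide follow directly from the bar counts, and "sharing a common adaptation type" is a set-intersection test. The sketch below reproduces both using only numbers and labels from the slide; the unit being counted is not recoverable from the slide, so the code covers the arithmetic only, and the helper names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CommonAdaptationSketch {

    // Two GitHub clones share an adaptation type if their label sets intersect.
    static boolean sharesType(Set<String> a, Set<String> b) {
        for (String t : a) {
            if (b.contains(t)) {
                return true;
            }
        }
        return false;
    }

    // Percentage of cases that share at least one adaptation type.
    static double percentShared(int shared, int different) {
        return 100.0 * shared / (shared + different);
    }

    static Set<String> types(String... labels) {
        return new HashSet<>(Arrays.asList(labels));
    }

    public static void main(String[] args) {
        // The four GitHub counterparts from the slide's example.
        Map<String, Set<String>> clones = new LinkedHashMap<>();
        clones.put("A", types("add an if check", "renaming"));
        clones.put("B", types("add an if check", "change a method call"));
        clones.put("C", types("add an if check", "renaming"));
        clones.put("D", types("change a method call", "renaming"));

        // Check every unordered pair of counterparts.
        for (String x : clones.keySet()) {
            for (String y : clones.keySet()) {
                if (x.compareTo(y) < 0) {
                    System.out.println(x + "-" + y + " share a type: "
                        + sharesType(clones.get(x), clones.get(y)));
                }
            }
        }
        // Counts read off the slide's bar chart.
        System.out.printf("Adaptation Dataset: %.0f%%%n", percentShared(126, 54));
        System.out.printf("Variation Dataset: %.0f%%%n", percentShared(6548, 2314));
    }
}
```

Every pair among A, B, C, and D shares at least one type, and the two percentages print as 70% (126 / 180) and 74% (6,548 / 8,862 ≈ 73.9%), matching the slide.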

  22. Implications and Hypothesis Development • Implications: variations in similar code resemble real adaptations made by developers; different GitHub developers make similar adaptations independently • Hypothesis: displaying variations in similar GitHub code can inspire more careful reasoning when adapting code

  24. “How to calculate the distance between two coordinates?”
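This question corresponds to Task I of the user study. As a point of reference only, here is a minimal Java sketch of the kind of snippet such a question typically receives: the haversine great-circle formula with an assumed mean Earth radius of 6371 km. It is not the actual Stack Overflow answer used in the study, and the latitude range check is a hypothetical instance of the "add an if check" adaptation type.

```java
public class DistanceSketch {

    // Mean Earth radius in kilometres (a common approximation; an assumption here).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance via the haversine formula; inputs are decimal degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        // Hypothetical "add an if check" adaptation: reject out-of-range latitudes.
        if (Math.abs(lat1) > 90 || Math.abs(lat2) > 90) {
            throw new IllegalArgumentException("latitude must be within [-90, 90]");
        }
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator.
        System.out.printf("%.3f km%n", distanceKm(0, 0, 0, 1));
    }
}
```

One degree along the equator comes out at roughly 111.19 km with this radius; a different radius constant, or a formula that accounts for elevation, would give slightly different results.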

  27. Within-Subjects User Study • Sixteen students from UCLA Computer Science • Two code reuse tasks • Control: view a code example and search online • Experiment: view similar code in GitHub using ExampleStack

      | Task | Description | LOC | # GitHub Clones |
      |---|---|---|---|
      | Task I | Compute the distance between two coordinates on earth | 12 | 2 |
      | Task II | Get the relative path of a given file and a root folder | 74 | 2 |
      | Task III | Encode an array of bytes to a hexadecimal string | 12 | 17 |
      | Task IV | Add animation to an Android view | 29 | 4 |

  28. Finding 1. Viewing variations in similar GitHub code inspires new adaptations that would otherwise be overlooked. (Chart panels: Without ExampleStack / With ExampleStack.)

  30. Finding 2. Seeing similar code is more useful than overwhelming. • P5: “It highlights the best practices followed by the community and prioritizes the changes that I should make first” • P6: “Super nice, it seems like the fast path to reach consensus on a particular operation” • P9: “[It is] reassuring to know that the same code is used in production systems and to know the common pitfalls” • P14: “I would have completely forgotten about the null check without seeing it in a couple of [GitHub] examples”

  31. Contributions: 1. Make available a large-scale dataset of reused code between SO and GitHub. 2. Rigorously codify common adaptation patterns and create a taxonomy. 3. Quantify the frequencies of common adaptations. 4. Build a prototype and conduct a user study. Dataset and Tool: https://github.com/tianyi-zhang/ExampleStack-ICSE-Artifact
