How Developers Read and Comprehend Stack Overflow Questions for Tag Prediction
Senior Capstone Project
By: Ali Morris
Objectives
● Determine what developers focus on when reading Stack Overflow questions to assign tags, using eye-tracking
● Determine valuable areas of interest (AOIs) for tag assignment, especially keywords

Research Questions
RQ1. Which sections of postings are most valuable when assigning tags (code, title, etc.)?
RQ2. How do non-novice developers compare against novice developers in tag assignment accuracy, reading patterns, and areas of interest?
RQ3. How can this information be used to enhance existing automatic tag-generation techniques?
Stack Overflow
● The largest online community for programmers to learn and share their knowledge
  ○ 2 million questions, 19 million answers, and 47 million comments
  ○ Available to download as a data dump of roughly 70 GB
● Forum format where developers can post questions and others can respond
● Organization of the site depends on a classification scheme driven by its tagging system

Why is auto-tagging important?
● Users may not know how to correctly categorize their questions
● Stack Overflow depends on tags for organization and usefulness
● Current automatic tag-generation accuracy: 68.47% [1]
Related Work
● Studies to auto-generate tags for Stack Overflow without eye-tracking
● Current approaches are all similar:
  ○ Data mining & machine learning algorithms [1]-[3]
    ■ Extract important features by tokenizing many postings
    ■ Train algorithms on existing data to predict tags for new postings
● These concepts can be reused in future work
● Eye tracking, used as implicit feedback, can improve tag accuracy
Eye-Tracking
● Gaze data holds information about visual attention
  ○ Thought processes, strategies, user technique
● A new field: eye-tracking to study how developers work
● Huge amount of data per session:
  ○ Running at 60 Hz → 60 samples per second
  ○ Different types of gaze data holding different information
Eye-Tracking
● Types of gaze data & analysis:
  ○ Fixation: focus point where the eyes remain stationary for some time
  ○ Duration: total fixation time for an area
  ○ Saccade: quick eye movement between fixations
  ○ Scanpath: an interconnected saccade-fixation-saccade sequence
  ○ Area of Interest (AOI): a specific region of the screen over which quantitative eye-movement measures (fixation counts and durations) are calculated
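As a concrete illustration of how raw 60 Hz gaze samples become fixations, the sketch below implements a standard dispersion-threshold (I-DT) detector. The thresholds, the `(x, y)` sample format, and the function name are illustrative assumptions, not the exact Tobii Studio algorithm.

```python
# Sketch: dispersion-threshold (I-DT) fixation detection on 60 Hz gaze
# samples. Thresholds and the sample format are illustrative assumptions.

def detect_fixations(samples, max_dispersion=35.0, min_duration_ms=100.0, hz=60):
    """samples: list of (x, y) screen coordinates, one per 1/hz seconds.
    Returns (start_ms, duration_ms, center_x, center_y) per fixation."""
    ms_per_sample = 1000.0 / hz
    min_len = int(min_duration_ms / ms_per_sample)

    def dispersion(window):
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i = [], 0
    while i + min_len <= len(samples):
        j = i + min_len
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((i * ms_per_sample, (j - i) * ms_per_sample, cx, cy))
            i = j  # a saccade separates this fixation from the next
        else:
            i += 1
    return fixations
```

Saccades then fall out for free as the gaps between consecutive detected fixations, and a scanpath is the ordered list of fixation centers.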
Experiment Design
● Conducted in an eye-tracking lab using Tobii Studio
● 7 participants
  ○ CS, CIS, & EE majors attending Youngstown State University
  ○ Coding experience in C/C++ ranging from less than a year up to 5 years
  ○ Each briefed on the study, with pre- and post-surveys
● Participants presented with 9 tasks from 3 different categories
  ○ Sourced directly from Stack Overflow
  ○ Questions relevant to C/C++
  ○ Categories increased in complexity and were curated based on defined criteria
● Participants assigned up to 5 tags from a Suggested Tags list
  ○ 10 possible tags: 5 relevant, 5 distractors
  ○ Participants could suggest tags not in the list if necessary
Task Categories
Simple: content commonly taught in CS1
● Simple data types
● Operators
● Control structures
● Basic properties of the C/C++ language

Average: knowledge beyond the CS1 level that comes from development experience
● Specific details of data structures
● Involved application of aspects from the simple level

Complex: applications of more difficult/compound topics
● Algorithm design
● Complicated memory management techniques
● Obscure/intense properties of the C++ language
Figure 1. Sample Task Representation
Analysis
● AOI groups assigned to each task:
  ○ Title
  ○ Description
  ○ Code
  ○ Relevant Tags
  ○ Distractor Tags
  ○ Keywords
Figure 2. AOI Representation
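Mechanically, assigning fixations to these AOI groups is rectangle hit-testing. The sketch below shows the idea; the AOI rectangles here are made-up placeholders (in the study they would come from Tobii Studio's AOI editor), and the function names are illustrative.

```python
# Sketch: map each fixation to an AOI group by rectangle hit-testing.
# The rectangle coordinates below are placeholder assumptions.

AOIS = {  # name -> (left, top, width, height) in screen pixels
    "Title":           (50,  40, 900,  60),
    "Description":     (50, 120, 900, 200),
    "Code":            (50, 340, 900, 250),
    "Relevant Tags":   (50, 620, 440,  60),
    "Distractor Tags": (510, 620, 440, 60),
}

def aoi_for(x, y, aois=AOIS):
    """Return the name of the AOI containing point (x, y), or None."""
    for name, (left, top, w, h) in aois.items():
        if left <= x < left + w and top <= y < top + h:
            return name
    return None

def duration_per_aoi(fixations, aois=AOIS):
    """fixations: (duration_ms, x, y) tuples -> total duration per AOI."""
    totals = {name: 0.0 for name in aois}
    for duration_ms, x, y in fixations:
        name = aoi_for(x, y, aois)
        if name is not None:
            totals[name] += duration_ms
    return totals
```

Summing per-AOI durations and counts this way is what produces the fixation-duration and fixation-count comparisons in the following slides.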
Analysis: Tag Accuracy
● Average accuracy: 90.57%
● Average tags per task: 3
● Feedback on overall confidence levels generally reflected accuracy
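The slides do not spell out the accuracy formula, so the sketch below shows one plausible per-task score: the fraction of a participant's assigned tags that belong to the task's relevant-tag set, averaged over all recordings. Treat it as an assumption, not the study's exact metric.

```python
# Sketch: one plausible tag-accuracy score (an assumption; the slides
# do not state the exact formula used for the 90.57% figure).

def tag_accuracy(assigned, relevant):
    """Fraction of assigned tags that are in the relevant set."""
    assigned, relevant = set(assigned), set(relevant)
    if not assigned:
        return 0.0
    return len(assigned & relevant) / len(assigned)

def average_accuracy(tasks):
    """tasks: list of (assigned, relevant) pairs across all recordings."""
    scores = [tag_accuracy(a, r) for a, r in tasks]
    return sum(scores) / len(scores)
```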
Analysis: Tag Accuracy
● Tag accuracy decreases with difficulty
Analysis: Overall Fixation Duration
● Averages of all recordings
● Relevant and Distractor Tags approximately equal
● Most focus time on Description & Code
● Least focus time on Title
Analysis: Overall Fixation Duration over Categories
● Noticeable duration trends on Code & Title fixations
Analysis: Overall Fixation Count
● Averages of all recordings
● Approximately consistent with the fixation durations
Analysis: Overall Fixation Count over Categories
● The same trends appear in the changes on Code and Title
Analysis: Accuracy, Non-Novice vs. Novice
● Non-novice participants performed slightly better
● Where novices excelled:
  ○ Average-level tasks
  ○ Also assigned only 1-2 tags in this category
● Average tag assignment:
  ○ Non-novice: 3-4 tags
  ○ Novice: 2 tags
● Non-novice participants were generally more confident in their tag assignments
Analysis: Fixation Duration, Non-Novice vs. Novice
Non-novice duration ratios:
● Code: 32%
● Title & Description: 37%
Novice duration ratios:
● Code: 22%
● Title & Description: 46%
Analysis: Fixation Count, Non-Novice vs. Novice
Non-novice count ratios:
● Code: 32%
● Title & Description: 43%
Novice count ratios:
● Code: 24%
● Title & Description: 50%
Average durations (both groups):
● Code: 13 s
● Title & Description: 27 s
Analysis: Keywords
● Time to first fixation
  ○ Tags not evaluated before the posting
● Notice: on average, a quick fixation on keywords...
Analysis: Keywords
● Readers often go back to a keyword after the first fixation
● On average, 26% of fixations fall on keywords, which cover only a small portion of the screen
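The two keyword measures above (time to first fixation, and how often readers revisit the keyword AOI) can be computed from the ordered fixation stream. The sketch below is a minimal version; the input format and function name are illustrative assumptions.

```python
# Sketch: time to first fixation and visit count for a keyword AOI.
# A "visit" is one contiguous run of fixations inside the AOI, so
# revisits show up as a visit count greater than 1.

def keyword_stats(fixations):
    """fixations: time-ordered list of (start_ms, hit) pairs, where
    hit is True when the fixation lands inside the keyword AOI.
    Returns (time_to_first_fixation_ms or None, visit_count)."""
    first, visits, inside = None, 0, False
    for start_ms, hit in fixations:
        if hit and first is None:
            first = start_ms
        if hit and not inside:
            visits += 1  # entering the AOI starts a new visit
        inside = hit
    return first, visits
```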
Fixation Count vs. Duration vs. Visits [4]
Conclusions
● Fixation count & duration often correlate
● Approximately equal time spent evaluating Relevant and Distractor tags
● With an increase in difficulty:
  ○ Increase of fixations on Code
  ○ Decrease of fixations on Title (especially true for non-novice programmers)
● Non-novice programmers: perform better, assign more tags, focus more on code than novices do, and rely on it more as questions become more difficult
● Novice programmers: less accurate tag assignment, assign fewer tags, focus mostly on the description & title
● From visual and statistical analysis: developers tend to evaluate postings first and tags after (a sequential pattern)
  ○ Learning styles & reading patterns can affect the outcome [5]
● Developers quickly focus on keywords & revisit them frequently throughout evaluation
Future Work
Continuation of this project:
● Machine learning algorithms (informed by eye gaze) to predict tags:
  ○ Linear Support Vector Machines (SVM), Naive Bayes, Random Forest
● Keyword identification: identify keywords in text automatically
  ○ Recognize code as relevant keywords
  ○ Will differ across programming languages
● Consider existing models for tag generation compounded with eye-tracking
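To make the future-work direction concrete, the sketch below wires up one of the named algorithm families (a linear SVM) as a multi-label tag predictor over TF-IDF features, in the spirit of [1]-[3]. The four toy posts and tags are placeholders, not the Stack Overflow dump, and the eye-gaze weighting discussed above is not yet included.

```python
# Sketch: TF-IDF + Linear SVM multi-label tag prediction, using toy
# placeholder data (not the Stack Overflow data dump).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

posts = [
    "How do I free a malloc'd array in C?",
    "Segfault when dereferencing a null pointer in C++",
    "How to overload operator+ for my C++ class?",
    "Reading a file line by line with fgets in C",
]
tags = [["c", "memory"], ["c++", "pointers"],
        ["c++", "operators"], ["c", "file-io"]]

# Encode the variable-length tag lists as a binary indicator matrix,
# then train one binary SVM per tag (one-vs-rest).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)
vec = TfidfVectorizer()
X = vec.fit_transform(posts)

clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, y)

pred = clf.predict(vec.transform(["malloc and free usage in C"]))
print(mlb.inverse_transform(pred))
```

Eye-gaze data could then enter as implicit feedback, for example by up-weighting TF-IDF terms that fall inside heavily fixated keyword AOIs.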
References
[1] A. K. Saha, R. K. Saha, and K. A. Schneider, "A discriminative model approach for suggesting tags automatically for Stack Overflow questions," in Proceedings of the 10th Working Conference on Mining Software Repositories, 2013.
[2] C. Stanley and M. D. Byrne, "Predicting tags for StackOverflow posts," in Proceedings of ICCM, 2013.
[3] S. Schuster, W. Zhu, and Y. Cheng, "Predicting Tags for Stack Overflow Questions," 2013.
[4] Tobii AB, "Tobii Studio User's Manual," Version 3.4.5, 2016.
[5] A. Goswami, G. Walia, M. McCourt, and G. Padmanabhan, "Using Eye Tracking to Investigate Reading Patterns and Learning Styles of Software Requirement Inspectors to Enhance Inspection Team Outcome," in Proceedings of ESEM, 2016.