How Developers Read and Comprehend Stack Overflow Questions for Tag Prediction
Senior Capstone Project
By: Ali Morris
Objectives
● Determine what developers focus on when reading Stack Overflow questions to assign tags, using eye-tracking
● Determine valuable areas of interest (AOIs) for tag assignment, especially keywords

Research Questions
RQ1. Which sections of postings are most valuable when assigning tags (code, title, etc.)?
RQ2. How do non-novice developers compare against novice developers in tag assignment accuracy, reading patterns, and areas of interest?
RQ3. How can this information be used to enhance existing automatic tag-generation techniques?
Stack Overflow
● The largest online community for programmers to learn and share their knowledge
  ○ 2 million questions, 19 million answers, and 47 million comments
  ○ Available to download as a data dump of roughly 70 GB
● Forum format where developers can post questions and others can respond
● Organization of the site depends on a classification scheme driven by its tagging system

Why is auto-tagging important?
● Users may not know how to correctly categorize their questions
● Stack Overflow depends on tags for organization and usefulness
● Current automatic tag-generation accuracy: 68.47% [1]
Related Work
● Studies to auto-generate tags for Stack Overflow without eye-tracking
● Current approaches are all similar:
  ○ Data mining & machine learning algorithms [1]-[3]
    ■ Extract important features by tokenizing many postings
    ■ Train algorithms on existing data to predict tags for new postings
● These concepts can be reused in future work
● Eye tracking, used as implicit feedback, can improve tag accuracy
Eye-Tracking
● Gaze data holds information about visual attention
  ○ Thought processes, strategies, user technique
● A new field: eye-tracking to study how developers work
● Huge amount of data per session:
  ○ Running at 60 Hz → 60 samples per second
  ○ Different types of gaze data holding different information
Eye-Tracking
● Types of gaze data & analysis:
  ○ Fixation: focus point where the eyes remain stationary for some time
  ○ Duration: total fixation time for an area
  ○ Saccade: quick eye movement between fixations
  ○ Scanpath: an interconnected saccade-fixation-saccade sequence
  ○ Area of Interest (AOI): a specific region of the screen over which quantitative eye-movement measures (fixation counts and durations) are calculated
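As a concrete illustration of how raw 60 Hz gaze samples become fixations, the sketch below implements a standard dispersion-threshold (I-DT) detector. The thresholds, the `(x, y)` sample format, and the function name are illustrative assumptions, not the exact Tobii Studio algorithm.

```python
# Sketch: dispersion-threshold (I-DT) fixation detection on 60 Hz gaze
# samples. Thresholds and the sample format are illustrative assumptions.

def detect_fixations(samples, max_dispersion=35.0, min_duration_ms=100.0, hz=60):
    """samples: list of (x, y) screen coordinates, one per 1/hz seconds.
    Returns (start_ms, duration_ms, center_x, center_y) per fixation."""
    ms_per_sample = 1000.0 / hz
    min_len = int(min_duration_ms / ms_per_sample)

    def dispersion(window):
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i = [], 0
    while i + min_len <= len(samples):
        j = i + min_len
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((i * ms_per_sample, (j - i) * ms_per_sample, cx, cy))
            i = j  # a saccade separates this fixation from the next
        else:
            i += 1
    return fixations
```

Saccades then fall out for free as the gaps between consecutive detected fixations, and a scanpath is the ordered list of fixation centers.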
Experiment Design
● Conducted in an eye-tracking lab using Tobii Studio
● 7 participants
  ○ CS, CIS, & EE majors attending Youngstown State University
  ○ Coding experience in C/C++ ranging from less than a year up to 5 years
  ○ Each briefed on the study, with pre- and post-surveys
● Participants presented with 9 tasks from 3 different categories
  ○ Sourced directly from Stack Overflow
  ○ Questions relevant to C/C++
  ○ Categories increased in complexity and were curated based on defined criteria
● Participants assigned up to 5 tags from a Suggested Tags list
  ○ 10 possible tags: 5 relevant, 5 distractors
  ○ Participants could suggest tags not in the list if necessary
Task Categories
Simple: content commonly taught in CS1
● Simple data types
● Operators
● Control structures
● Basic properties of the C/C++ language

Average: knowledge beyond the CS1 level that comes from development experience
● Specific details of data structures
● Involved application of aspects from the simple level

Complex: applications of more difficult/compound topics
● Algorithm design
● Complicated memory management techniques
● Obscure/intense properties of the C++ language
Figure 1. Sample Task Representation
Analysis
● AOI groups assigned to each task:
  ○ Title
  ○ Description
  ○ Code
  ○ Relevant Tags
  ○ Distractor Tags
  ○ Keywords
Figure 2. AOI Representation
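Mechanically, assigning fixations to these AOI groups is rectangle hit-testing. The sketch below shows the idea; the AOI rectangles here are made-up placeholders (in the study they would come from Tobii Studio's AOI editor), and the function names are illustrative.

```python
# Sketch: map each fixation to an AOI group by rectangle hit-testing.
# The rectangle coordinates below are placeholder assumptions.

AOIS = {  # name -> (left, top, width, height) in screen pixels
    "Title":           (50,  40, 900,  60),
    "Description":     (50, 120, 900, 200),
    "Code":            (50, 340, 900, 250),
    "Relevant Tags":   (50, 620, 440,  60),
    "Distractor Tags": (510, 620, 440, 60),
}

def aoi_for(x, y, aois=AOIS):
    """Return the name of the AOI containing point (x, y), or None."""
    for name, (left, top, w, h) in aois.items():
        if left <= x < left + w and top <= y < top + h:
            return name
    return None

def duration_per_aoi(fixations, aois=AOIS):
    """fixations: (duration_ms, x, y) tuples -> total duration per AOI."""
    totals = {name: 0.0 for name in aois}
    for duration_ms, x, y in fixations:
        name = aoi_for(x, y, aois)
        if name is not None:
            totals[name] += duration_ms
    return totals
```

Summing per-AOI durations and counts this way is what produces the fixation-duration and fixation-count comparisons in the following slides.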
Analysis: Tag Accuracy
● Average accuracy: 90.57%
● Average tags per task: 3
● Feedback on overall confidence levels generally reflected accuracy
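The slides do not spell out the accuracy formula, so the sketch below shows one plausible per-task score: the fraction of a participant's assigned tags that belong to the task's relevant-tag set, averaged over all recordings. Treat it as an assumption, not the study's exact metric.

```python
# Sketch: one plausible tag-accuracy score (an assumption; the slides
# do not state the exact formula used for the 90.57% figure).

def tag_accuracy(assigned, relevant):
    """Fraction of assigned tags that are in the relevant set."""
    assigned, relevant = set(assigned), set(relevant)
    if not assigned:
        return 0.0
    return len(assigned & relevant) / len(assigned)

def average_accuracy(tasks):
    """tasks: list of (assigned, relevant) pairs across all recordings."""
    scores = [tag_accuracy(a, r) for a, r in tasks]
    return sum(scores) / len(scores)
```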
Analysis: Tag Accuracy
● Tag accuracy decreases with difficulty
Analysis: Overall Fixation Duration
● Averages of all recordings
● Relevant and Distractor Tags approximately equal
● Most focus time on Description & Code
● Least focus time on Title
Analysis: Overall Fixation Duration over Categories
● Noticeable duration trends on Code & Title fixations
Analysis: Overall Fixation Count
● Averages of all recordings
● Approximately consistent with the fixation durations
Analysis: Overall Fixation Count over Categories
● The same trends appear in the changes on Code and Title
Analysis: Accuracy, Non-Novice vs. Novice
● Non-novice participants performed slightly better
● Where novices excelled:
  ○ Average-level tasks
  ○ Also assigned only 1-2 tags in this category
● Average tag assignment:
  ○ Non-novice: 3-4 tags
  ○ Novice: 2 tags
● Non-novice participants were generally more confident in their tag assignments
Analysis: Fixation Duration, Non-Novice vs. Novice
Non-novice duration ratios:
● Code: 32%
● Title & Description: 37%
Novice duration ratios:
● Code: 22%
● Title & Description: 46%
Analysis: Fixation Count, Non-Novice vs. Novice
Non-novice count ratios:
● Code: 32%
● Title & Description: 43%
Novice count ratios:
● Code: 24%
● Title & Description: 50%
Average durations (both groups):
● Code: 13 s
● Title & Description: 27 s
Analysis: Keywords
● Time to first fixation
  ○ Tags not evaluated before the posting
● Notice: on average, a quick fixation on keywords...
Analysis: Keywords
● Readers often go back to a keyword after the first fixation
● On average, 26% of fixations fall on keywords, which cover only a small portion of the screen
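The two keyword measures above (time to first fixation, and how often readers revisit the keyword AOI) can be computed from the ordered fixation stream. The sketch below is a minimal version; the input format and function name are illustrative assumptions.

```python
# Sketch: time to first fixation and visit count for a keyword AOI.
# A "visit" is one contiguous run of fixations inside the AOI, so
# revisits show up as a visit count greater than 1.

def keyword_stats(fixations):
    """fixations: time-ordered list of (start_ms, hit) pairs, where
    hit is True when the fixation lands inside the keyword AOI.
    Returns (time_to_first_fixation_ms or None, visit_count)."""
    first, visits, inside = None, 0, False
    for start_ms, hit in fixations:
        if hit and first is None:
            first = start_ms
        if hit and not inside:
            visits += 1  # entering the AOI starts a new visit
        inside = hit
    return first, visits
```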
Fixation Count vs. Duration vs. Visits [4]
Conclusions
● Fixation count & duration often correlate
● Approximately equal time spent evaluating Relevant and Distractor tags
● With an increase in difficulty:
  ○ Increase of fixations on Code
  ○ Decrease of fixations on Title (especially true for non-novice programmers)
● Non-novice programmers: perform better, assign more tags, focus more on code than novices do, and rely on it more as questions become more difficult
● Novice programmers: less accurate tag assignment, assign fewer tags, focus mostly on the description & title
● From visual and statistical analysis: developers tend to evaluate postings first and tags after (a sequential pattern)
  ○ Learning styles & reading patterns can affect the outcome [5]
● Developers quickly focus on keywords & revisit them frequently throughout evaluation
Future Work
Continuation of this project:
● Machine learning algorithms (informed by eye gaze) to predict tags:
  ○ Linear Support Vector Machines (SVM), Naive Bayes, Random Forest
● Keyword identification: identify keywords in text automatically
  ○ Recognize code as relevant keywords
  ○ Will differ across programming languages
● Consider existing models for tag generation compounded with eye-tracking
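To make the future-work direction concrete, the sketch below wires up one of the named algorithm families (a linear SVM) as a multi-label tag predictor over TF-IDF features, in the spirit of [1]-[3]. The four toy posts and tags are placeholders, not the Stack Overflow dump, and the eye-gaze weighting discussed above is not yet included.

```python
# Sketch: TF-IDF + Linear SVM multi-label tag prediction, using toy
# placeholder data (not the Stack Overflow data dump).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

posts = [
    "How do I free a malloc'd array in C?",
    "Segfault when dereferencing a null pointer in C++",
    "How to overload operator+ for my C++ class?",
    "Reading a file line by line with fgets in C",
]
tags = [["c", "memory"], ["c++", "pointers"],
        ["c++", "operators"], ["c", "file-io"]]

# Encode the variable-length tag lists as a binary indicator matrix,
# then train one binary SVM per tag (one-vs-rest).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)
vec = TfidfVectorizer()
X = vec.fit_transform(posts)

clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, y)

pred = clf.predict(vec.transform(["malloc and free usage in C"]))
print(mlb.inverse_transform(pred))
```

Eye-gaze data could then enter as implicit feedback, for example by up-weighting TF-IDF terms that fall inside heavily fixated keyword AOIs.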
References
[1] A. K. Saha, R. K. Saha, and K. A. Schneider, "A discriminative model approach for suggesting tags automatically for Stack Overflow questions," in Proceedings of the 10th Working Conference on Mining Software Repositories, 2013.
[2] C. Stanley and M. D. Byrne, "Predicting tags for StackOverflow posts," in Proceedings of ICCM, 2013.
[3] S. Schuster, W. Zhu, and Y. Cheng, "Predicting Tags for Stack Overflow Questions," 2013.
[4] Tobii AB, "Tobii Studio User's Manual," Version 3.4.5, 2016.
[5] A. Goswami, G. Walia, M. McCourt, and G. Padmanabhan, "Using Eye Tracking to Investigate Reading Patterns and Learning Styles of Software Requirement Inspectors to Enhance Inspection Team Outcome," in Proceedings of ESEM, 2016.