An empirical study on negative answers at Stack Overflow
Ethan Wang & Sherry Zhu

OUTLINE
▪ Motivation
▪ Research Questions
▪ Methodology
▪ Experiment Result
▪ Threat to Validity
▪ Future Work
▪ Conclusion

MOTIVATION
▪ Over 11 million users on Stack Overflow
▪ Many people experience Stack Overflow as a hostile place, especially newer coders, women, and people of color.

DEFINE RESEARCH QUESTIONS
▪ What is the distribution of positive / neutral / negative replies?
▪ What reasons lead a respondent to give a negative answer, and how are the negative answers distributed across those reasons?
METHODOLOGY
▪ Stack Exchange Data Explorer (from 2019-01-01 to 2019-10-31)
▪ Posts table schema
▪ Random sampling using NewID()

▪ Senti4SD requirements:
  1. Input contains only normal text.
  2. All text from a single answer should be on one line.
▪ Text sanitization algorithm:
  ▪ Replace all new lines and extra spaces.
  ▪ Remove all characters not in the ASCII visible range.
  ▪ Parse the HTML in the text and remove all HTML tags.
  ▪ Remove all code blocks and links while parsing the HTML tags.
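The sanitization steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation. The names `SKIPPED_TAGS`, `_AnswerTextExtractor`, and `sanitize_answer` are hypothetical. Two details are assumptions: "code blocks and links" is read as dropping the full content of `<pre>`, `<code>`, and `<a>` elements (including link text), and the space character is kept as a word separator.

```python
import re
from html.parser import HTMLParser

# Assumption: everything inside these elements is discarded, including link text.
SKIPPED_TAGS = {"pre", "code", "a"}


class _AnswerTextExtractor(HTMLParser):
    """Collects text outside skipped elements and drops all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside <pre>, <code>, or <a>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join with spaces so adjacent paragraphs do not run together.
        return " ".join(self._chunks)


def sanitize_answer(html_body: str) -> str:
    """Turn one answer's HTML body into a single line of plain ASCII text."""
    parser = _AnswerTextExtractor()
    parser.feed(html_body)
    parser.close()
    text = parser.text()
    # Drop everything outside printable ASCII (space through tilde);
    # this also turns new lines and tabs into spaces.
    text = re.sub(r"[^\x20-\x7E]", " ", text)
    # Collapse runs of whitespace so the answer fits on one line.
    return re.sub(r"\s+", " ", text).strip()
```

Calling `sanitize_answer` on each answer body and writing one result per line would produce input in the single-line form that the Senti4SD requirements describe. An unclosed skipped tag in malformed HTML would suppress the rest of the answer, so real data may need a more forgiving parser.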
▪ Divide data into smaller segments (5000 each)
▪ Filter all negative answers and randomly sample 200 records
▪ Use online card sort tool called "UsabiliTEST"
  1. Preparation
  2. Execution
  3. Interpretation
▪ Five themes:
  ▪ Neutral
  ▪ Vague
  ▪ Undetermined
  ▪ Cold
  ▪ Irreproducible

▪ Regular Expression
  o Extract common patterns for each group
▪ K-means Clustering & Support Vector Machine
  1. Data Cleaning (lowercase, removed punctuation)
  2. Stop Words Removing (the, he, she…)
  3. Text Vectorization (TF-IDF)
▪ SVM:
  o Model_1 classifies 'neutral' and 'negative'
  o Model_2 classifies 'negative' into multiple groups

EXPERIMENT RESULT
▪ K-means
  • Silhouette Coefficient = 0.002808
▪ Regex
  • Precision = 76.86%
  • Recall = 45.49%
▪ SVM
  • Precision = 86.89%
  • Recall = 48.38%
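The three preprocessing steps (data cleaning, stop-word removal, TF-IDF vectorization) and the precision / recall figures can be illustrated with a small, dependency-free Python sketch. It is not the authors' pipeline. The stop-word list is a hypothetical stand-in for the one the slides abbreviate as "the, he, she…". The TF-IDF weighting is the plain variant (term frequency times log(N / document frequency)); common libraries use smoothed variants that give different numbers. The function names `tokenize`, `tfidf`, and `precision_recall` are illustrative.

```python
import math
import string
from collections import Counter

# Hypothetical, tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "he", "she", "a", "an", "is", "it", "to", "of", "and"}

# Maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Steps 1 and 2: lowercase, strip punctuation, drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def tfidf(documents):
    """Step 3: one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def precision_recall(y_true, y_pred, positive="negative"):
    """Precision and recall for one label, here the 'negative' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

The vectors from `tfidf` are the kind of input a K-means or SVM implementation would consume. `precision_recall` shows how figures such as those reported for Regex and SVM are defined: precision is the share of answers flagged negative that really are negative, and recall is the share of truly negative answers that were flagged.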
[Figure: panels labeled Regex and SVM]

THREAT TO VALIDITY
▪ Inherent subjectivity of the manual process
▪ Accuracy of the Senti4SD tool

FUTURE WORK
▪ Stack Overflow feedback
▪ Interviews and surveys
▪ Positive comments
CONCLUSION
1. Hostility accounts for only a tiny portion of overall replies.
2. The environment on Stack Overflow is satisfactory in general.