Do Developers Feel Emotion? An Exploratory Analysis of Emotions.
Motivation
- Feelings and emotions dictate to a large
extent our actions and decisions.
- Developers’ potential and productivity can be fully unlocked if people feel safe and happy.
- It is important to support managers and project
leaders in detecting emotions
Final Goal
- Building a tool for automatic emotion detection.
- A first step:
- Can emotions actually be detected from issue reports?
- If so, can humans actually agree on the identified emotions?
Our approach
- A significant sample of developers’ comments from Apache projects was analyzed based on Parrott’s emotional framework.
- Can human raters, without any training, agree on
the presence of emotions in issue reports?
- Does training improve the agreement of human raters?
- Does context improve the agreement of human raters?
Related Work
Ahmed Hassan et al. tried to answer these questions:
- What is the personality type of OSS
developers?
- Do the language and attitude of a developer change as he or she moves from being a current developer to a departing one?
Related Work
- Guzman et al. proposed an approach to improve emotional awareness in software development teams by means of quantitative emotion summaries.
- Their approach automatically extracts and summarizes emotions
expressed in collaboration artifacts by combining probabilistic topic modelling with lexical sentiment analysis techniques.
Emotion Mining
- Emotion mining tries to identify the
presence of emotions like joy or fear
- Sentiment analysis evaluates a given
emotion as being positive or negative
Emotion Mining in Software Engineering
- Applied to text artifacts, it can provide hints on the factors responsible for joy and satisfaction, or fear and anger, among developers.
- It provides a different perspective to
interpret productivity and job satisfaction.
Parrott’s Framework
Issue Tracking System
- A repository used by software companies to organize software maintenance and evolution.
- Team members submit and discuss issues including bugs and feature requests, ask for advice or share opinions.
- It might reveal how committers feel towards a
bug, feature, project or even their colleagues.
- Each issue is characterized by several attributes, such as priority, status and type (improvement, perfective maintenance, new feature, corrective maintenance, adaptive maintenance).
Experimental Setup
- Goal: Understand the kinds of emotions
found in issue reports
- Four authors rated issue reports from open source systems
- Analyzing the identified emotions and the raters’ agreement
Dataset
- Issue repository of the Apache Software Foundation
- Hosts 117 open source projects, ranging from large and long-lived to small, giving representative data
Dataset
- Issue reports from 19 October 2000 to July 2013
- Developers’ comments + issue report
attributes
- No distinction between bugs, new features,
and enhancements
- Granularity: issue comment level
- Enough issue comments sampled to obtain a 95% confidence level.
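A minimal sketch of how a sample size for a 95% confidence level could be computed. This is not necessarily the calculation the authors used: the 5% margin of error, the worst-case proportion p = 0.5, and the population of 10,000 comments are all assumptions made for illustration; the slides only state the confidence level.

```python
import math


def sample_size(z=1.96, p=0.5, margin=0.05, population=None):
    """Sample size needed to estimate a proportion.

    z          -- z-score of the confidence level (1.96 for 95%)
    p          -- expected proportion (0.5 is the most conservative choice)
    margin     -- accepted margin of error (assumed, not given in the slides)
    population -- total number of items, if known (hypothetical value below)
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    if population is not None:
        # Finite population correction: fewer samples are needed
        # when the population itself is small.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size())                   # unbounded population
    print(sample_size(population=10_000))  # hypothetical population size
```

Under these assumptions the result is a few hundred comments, regardless of how large the issue repository is.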
Emotion Mining
- Each rater identified emotions associated
to each comment according to Parrott’s six emotions: love, joy, surprise, anger, sadness, fear
- Personal rating
- Based on a common understanding of Parrott’s framework
- No ground truth: agreement is considered correct; agreement is determined by majority vote
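A minimal sketch of how a majority vote over raters could be computed for one comment. The function name, the data layout (one set of emotions per rater) and the strict-majority threshold are assumptions for illustration; the slides do not say how ties were handled.

```python
from collections import Counter

# Parrott's six primary emotions, as listed in the slides.
EMOTIONS = ("love", "joy", "surprise", "anger", "sadness", "fear")


def majority_vote(ratings):
    """Return the emotions marked by a strict majority of raters.

    ratings -- list with one set of emotion names per rater,
               all referring to the same comment.
    """
    votes = Counter(emotion for rating in ratings for emotion in rating)
    needed = len(ratings) // 2 + 1  # strict majority (assumed tie rule)
    return {emotion for emotion in EMOTIONS if votes[emotion] >= needed}


if __name__ == "__main__":
    # Four hypothetical raters judging one comment.
    ratings = [{"fear"}, {"fear", "anger"}, {"fear"}, set()]
    print(majority_vote(ratings))
```

With two raters, this rule keeps an emotion only when both raters marked it.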
Examples
- I'm not so convinced that moving all the static
methods out is useful (Fear).
- How is a bunch of static methods on a utility class
easier than a bunch of static methods within the HtmlCalendarRenderer better? (Anger)
- The risk of introducing new bugs for no great
benefit (Fear).
- Previously almost all these helper methods were private; this patch makes them all public [...] (Neutral)
Measuring Agreement
- Degree of inter-rater agreement
- Cohen’s κ for two raters
- Fleiss’ κ for more than two raters
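A minimal sketch of both agreement measures, written from their standard definitions rather than taken from the authors’ scripts. It assumes each emotion is rated as a binary label (1 = present, 0 = absent). The `interpret` helper uses the Landis and Koch scale; the slides use terms like “slight”, “fair” and “moderate” but do not name the scale, so that mapping is an assumption.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_item) / n_items
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0  # only one category was ever used
    return (p_bar - p_exp) / (1 - p_exp)


def interpret(kappa):
    """Verbal label on the Landis and Koch scale (assumed scale)."""
    if kappa < 0:
        return "poor"
    for bound, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= bound:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical ratings of "love" by two raters on ten comments.
    rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    rater_b = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
    k = cohens_kappa(rater_a, rater_b)
    print(round(k, 3), interpret(k))
```

Because most comments contain no emotion, raw percentage agreement is inflated by shared “absent” ratings; κ corrects for that chance agreement, which is why it can be low even when raters agree on most comments.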
Question 1
- Can human raters, without any training,
agree on the presence of emotions in issue reports?
- Motivation: Emotion mining from software development artifacts is not trivial: the artifacts consist of unstructured data, are relatively short, and are written informally.
Question 1: Approach
- 400 issue report comments were each arbitrarily assigned to two of the raters.
- Each author selected the emotions that
were present in the comment
- Once all comments had been annotated, the four files were collected and analyzed using Cohen’s κ.
Question 1: Result
- In 41% of the comments, the raters agreed on all 6 emotions, whereas 85% of comments do not contain any emotion
- Only for Love did the raters achieve more than slight agreement (a moderate value).
- 6.5% agreed on the presence of a particular emotion (Love), 96.75+5 on the absence (Surprise).
Result
- While some emotions obtain higher agreement than others, only one emotion obtained moderate agreement, and raters agree the most on the absence of an emotion.
Question 2
- Does training improve the agreement of human raters on the presence of emotions in issue reports?
- Motivation: Without thorough training,
raters achieve only a slight agreement. This leads to the current question.
Question 2: Approach
- Each rater compiled a list of generic expressions he or she felt insecure about
- A general example and an emotion were added
- 144 expressions were obtained
- Meeting for discussion
- Replication and refinement study
performed
Question 2: Replication and Refinement Study
- Replicated our study of RQ1 on a second
sample.
- The refinement study revisited the 235 comments of RQ1 with at least one emotion disagreement; all four authors decided on the occurrence of each emotion.
- Why was the refinement done?
Question 2: Results
- In 65% of the comments, the raters agreed on all 6 emotions
- Four out of six emotions (Joy, Anger, Sadness and Fear) improve from slight to fair agreement.
- 4.17% agreed on the presence of an
emotion, Love
- 72.76% obtained agreement by at least 3 raters.
Result
- Training improves the overall agreement on
emotions, as well as for most of the individual emotions. Love, joy and sadness are the most common emotions.
Question 3
- Does context improve the agreement of human raters on the presence of emotions in issue reports?
- Motivation: previous experiments can be
compared to eavesdropping on a group, and catching just one phrase.
- Due to the technical and unstructured nature of software development artifacts, the impact of context might be different than in literary English.
Question 3: Example
- Sentence: “yeah right”
- “moving to java 8 we solve all problems”
- “breaking backward compatibility is
risky”
Question 3: Approach
- Experiment with two steps:
- Replication of study RQ2: 384
comments, two raters
- Same analysis, with the context of those comments
Question 3: Results
- Adding context reduces rater agreement
for love
- More raters change their mind for
comments with context
- Context seems to make raters doubt their rating, introducing more disagreement.
Discussion
- A. Impact of Context:
- At first, our findings seem counter-intuitive.
- Using a simple yes/no decision as the rating is too large a simplification. Instead, use multiple rating levels.
- B. Do Emotions Really Matter for Issue
Reports:
- Our findings suggest there is a link between emotions and software development. Reports with the “love” emotion tend to have fewer comments and a shorter fixing time.
Threats to Validity
- Internal validity: We rely on the presence of a causal relationship between a developer’s emotions and what he or she writes in issue report comments.
- Construct validity: Ambiguity of messages and subjectivity of emotions. To reduce this threat:
- Parrott’s framework was adopted
- The framework was explained and clarified
- Each comment was analyzed by at least two authors
Threats to Validity
- External validity: Replication of this work on other open source systems and on commercial projects is needed to confirm our findings.
- Reliability validity: No ground truth exists against which to compare our findings. Overall, different groups of raters will obtain the same results as well.
Conclusion
- Software development, as a collaborative activity of developers, is influenced by human emotions.
- Issue reports do express emotions towards design
choices, maintenance activity or colleagues.
- Love, joy and sadness are easier to agree on.
- Emotion mining can improve through training
- Some challenges like the impact of context need
to be studied more, on more data sources and systems.