

  1. 1 SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES Marcelo Dias and Karin Becker Instituto de Informática – UFRGS – Porto Alegre - Brazil marcelo.dias@inf.ufrgs.br and karin.becker@inf.ufrgs.br

  2. Introduction 2  Opinion Analysis  Detect sentiment polarity (negative or positive)  towards a target (often mentioned in the text)  Stance Detection  Detect a stance (against or favor)  towards a given target (main target vs. indirect targets)  A favor stance can be expressed through positive or negative sentiments (and vice-versa)

  3. Introduction 3  Related Work  Structured text or discussion threads (congress votes, online debates, ...)  wider textual context to interpret content  [Thomas et al. 2006] [Anand et al. 2011] [Somasundaran and Wiebe 2009]  Tweets: short texts and poorly written content  rely more on inferences from static/dynamic properties of the platform  [Rajadesingan and Liu 2014]  Less focus on properties extracted from textual content only  Most works adopt supervised methods  Often address a binary problem (Favor/Against)

  4. Goal 4  Stance detection based only on the textual content of tweets  Rule-based, semi-supervised method  3-class problem (Favor, Against and None)  Improvements on our earlier work  Third place in SemEval 2016 Task 6-B (unsupervised, Trump target)  Evaluate generality using several distinct domains  SemEval 2016 Task 6-A targets (supervised)

  5. Process Overview 6

  6. Process Overview 7

  7. Process Overview: Automatic Labeling 8

  8. Key and Target N-grams 9  Key n-grams: terms/phrases that denote a stance  Target n-grams: identify a target directly or indirectly related to the main target  combined with polarity to denote a stance  May be Favor or Against

  Example (main target: Hillary Clinton):

  N-GRAMS | FAVOR                        | AGAINST
  KEY     | ReadyForHillary, Hillary2016 | StopHillary, MakeAmericaGreatAgain
  TARGET  | Hillary, Democrats           | Trump, Republicans
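The combination of a target n-gram with tweet polarity can be sketched as follows. This is a toy illustration, not the authors' implementation: the n-gram sets mirror the Hillary Clinton example above, and the decision logic is a simplification of the full rule set shown on the next slides.

```python
# Toy illustration: a target n-gram plus tweet polarity yields a stance label.
# The sets below are illustrative; the function name is hypothetical.
FAVOR_TARGETS = {"hillary", "democrats"}      # targets aligned with the main target
AGAINST_TARGETS = {"trump", "republicans"}    # targets opposed to the main target

def stance_from_target(tweet, polarity):
    """polarity: 'positive' or 'negative', e.g. from an off-the-shelf API."""
    tokens = {t.lower().strip("#@,.!?") for t in tweet.split()}
    if tokens & FAVOR_TARGETS:
        # Positive sentiment toward an aligned target => Favor; negative => Against.
        return "FAVOR" if polarity == "positive" else "AGAINST"
    if tokens & AGAINST_TARGETS:
        # The mapping flips for targets opposed to the main target.
        return "AGAINST" if polarity == "positive" else "FAVOR"
    return "NONE"
```

This captures the point made in the Introduction: a favor stance toward the main target may be expressed as a negative sentiment toward an opposing target.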

  9. Key and Target N-grams Identification 10

  10. Key and Target N-grams Identification 11  Input: domain corpus  Current selection  N-gram frequency ranking  Manual selection of the top frequent n-grams  Output: selected Key and Target n-grams  Currently evaluating automatic n-gram selection methods
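The frequency-ranking step above can be sketched as follows. The tokenization (lowercased whitespace split) and the n-gram sizes are assumptions, since the talk does not specify them.

```python
from collections import Counter

def ngram_frequency_ranking(corpus, n_values=(1, 2, 3)):
    """Rank the n-grams of a domain corpus by frequency.

    Tokenization is naive and illustrative only; the candidates returned
    are then inspected manually to select the Key and Target n-grams.
    """
    counts = Counter()
    for tweet in corpus:
        tokens = tweet.lower().split()
        for n in n_values:
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common()
```

The manual selection then operates on the head of this ranking, which is where the domain knowledge mentioned later (Strengths and Weaknesses) comes in.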

  11. Process Overview: Automatic Labeling 12

  12. Rules x Stance 13  FEATURES  Presence of at least one Favor/Against Key n-gram  Presence of at least one Favor/Against Target n-gram  Presence of at least one hashtag  Tweet polarity


  17. Automatic Labeling 18  Input: selected n-grams and a dataset  Tweet pre-processing  feature extraction  tweet polarity detection (combination of off-the-shelf APIs)  Rules application  Output: filtered labeled tweets and discarded tweets
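The labeling loop can be sketched as follows. Only the Key-n-gram rules are shown, the n-gram sets are illustrative rather than the authors' selection, and the Target-n-gram/polarity rules are omitted for brevity.

```python
# Minimal sketch of the automatic-labeling step (Key-n-gram rules only).
FAVOR_KEYS = {"#readyforhillary", "#hillary2016"}
AGAINST_KEYS = {"#stophillary", "#makeamericagreatagain"}

def apply_rules(tweet):
    """Return a stance label, or None when no rule fires unambiguously."""
    tokens = set(tweet.lower().split())
    if tokens & FAVOR_KEYS and not tokens & AGAINST_KEYS:
        return "FAVOR"
    if tokens & AGAINST_KEYS and not tokens & FAVOR_KEYS:
        return "AGAINST"
    return None

def automatic_labeling(tweets):
    """Split a dataset into rule-labeled tweets and discarded tweets."""
    labeled, discarded = [], []
    for tweet in tweets:
        stance = apply_rules(tweet)
        if stance is None:
            discarded.append(tweet)   # no rule fired: tweet is filtered out
        else:
            labeled.append((tweet, stance))
    return labeled, discarded
```

The discard policy shown (drop any tweet where no rule fires, rather than labeling it None) is an assumption made for this sketch.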

  18. Predictive Model Generation 20
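The diagram on this slide is not preserved in the transcript. The step it covers, training a supervised model on the automatically labeled tweets, could look like the sketch below; the choice of TF-IDF features and logistic regression is an assumption for illustration, not the authors' configuration.

```python
# Sketch: fit a classifier on (tweet, stance) pairs produced by the
# automatic-labeling step. Learner and features are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_stance_model(labeled_tweets):
    """labeled_tweets: list of (tweet_text, stance) from automatic labeling."""
    texts, stances = zip(*labeled_tweets)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
        LogisticRegression(),
    )
    model.fit(texts, stances)
    return model
```

The resulting model can then predict Favor/Against/None for unseen tweets, including those discarded by the rules.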

  19. Method Overview: Stance Detection 22

  20. Experiments 24  Goal:  Generality of the method for stance detection  6 datasets on various domains  Rules coverage  Rules precision  Stance prediction

  21. Datasets: SemEval 2016 – Task 6 25  Stance: Against, Favor or None  Subtask A – Supervised  5 targets with 2 datasets each (training and test)  Atheism, Climate Change is a Real Concern, Feminism, Hillary Clinton and Legalization of Abortion  Subtask B – Semi-supervised/Unsupervised  1 target with 2 datasets (domain and test)  Donald Trump  Source: http://www.saifmohammad.com/WebPages/StanceDataset.htm

  22. Rules Coverage 26  Average corpus coverage: 75%  In general, Rules 2, 3, 4 and 7 were representative  13% to 17%  Rules 5 and 6 are representative only for Atheism  Rule 1 is representative only for Feminism

  23. Rules Precision 27  [Bar chart: precision of Rules 1 to 7]

  24. Automatic Labeling x Predictive Model 28  [Bar chart: precision weighted average of Automatic Labeling vs. Predictive Model for Abortion, Atheism, Climate, Feminism, Hillary and Trump]

  25. Results x Baseline 29  [Bar chart: our result vs. the SemEval winner per target]  Except for Trump, all the baselines were developed using a supervised method

  26. Strengths and Weaknesses 30  Strengths  Simplicity of the method  May be applied to different domains/targets  Simplifies the manual corpus annotation effort  restricted to n-grams  Weaknesses  Dependent on the appropriate selection of n-grams  Requires domain knowledge  Some rules do not perform well  Performance depends on the prevalence of the class

  27. Future Work 31  Automatic identification of Key and Target n-grams  Revised set of rules  Improved identification of the neutral stance  Improvement of supervised-learning predictive models  Predictive model features  Automatic extraction of training instances from authority Twitter profiles  Classification algorithms or committees
