Modeling and Representing Negation in Data-driven Machine Learning-based Sentiment Analysis

Robert Remus
rremus@informatik.uni-leipzig.de
Natural Language Processing Group
Department of Computer Science
University of Leipzig, Germany

ESSEM-2013 — December 3rd, 2013
Negation Modeling — Introduction I

- In sentiment analysis (SA), negation plays a special role [Wiegand et al., 2010]:
  (1) They are [comfortable to wear]+.
  (2) They are [not [comfortable to wear]+]−.
Negation Modeling — Introduction II

- Negations ...
  - are expressed via negation words/signals, e.g.
    - "don't x"
    - "no findings of x"
    - "rules out x"
    - ...
  - ... and via morphology, e.g.
    - "un-x"
    - "x-free"
    - "x-less"
    - ...
  - have a negation scope, i.e. the words that are negated, e.g.
    (1) They are [not comfortable to wear].
Negation Modeling — Introduction III

- In compositional semantic approaches to SA, negations are usually captured via some ad hoc rule(s), e.g.
  - "Polarity(not [arg1]) = ¬Polarity(arg1)" [Choi & Cardie, 2008]
- But what about
  (1) The stand doesn't work.
  (2) The stand doesn't work well. ?
- How to model and represent negation in a data-driven machine learning-based approach to SA
  - ... based solely on word n-grams and
  - ... without lexical resources, such as SentiWordNet [Esuli & Sebastiani, 2006]?
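To see why such sign-flipping rules struggle with examples (1) and (2), here is a minimal sketch (my illustration, not from the talk) assuming a toy polarity lexicon and a trivial whole-sentence scope. The rule flips everything after the negation signal, so (2) comes out as more negative than (1), even though it is arguably the milder complaint:

```python
# Hypothetical sign-flipping rule with a toy lexicon (both are assumptions).
LEXICON = {"work": +1, "well": +1}  # assumed prior polarities

def rule_based_polarity(tokens):
    polarity = sum(LEXICON.get(t, 0) for t in tokens)
    if any(t in ("not", "don't", "doesn't") for t in tokens):
        polarity = -polarity  # Polarity(not [arg1]) = -Polarity(arg1)
    return polarity

print(rule_based_polarity("the stand doesn't work".split()))       # -1
print(rule_based_polarity("the stand doesn't work well".split()))  # -2:
# "well" gets flipped too, so the milder complaint scores as more negative
```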
Negation Modeling — Implicitly

- Implicit negation modeling via higher order word n-grams:
  - bigrams ("*n't return")
  - trigrams ("lack of padding")
  - tetragrams ("denied sending wrong size")
  - ...
- So we don't need to incorporate extra knowledge of negation into our model; that's convenient!
- But what about long negation scopes (length ≥ 4) as in
  (1) The leather straps have never worn out or broken. ?
- Long negation scopes are the rule, not the exception! (> 70%)
- Word n-grams (n < 5) don't capture such long negation scopes, as the sketch below shows
- Learning models using word n-grams (n ≥ 3) usually suffers from sparsity: such n-grams have almost no occurrences in the training data
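A minimal sketch of the length problem: in example (1), no n-gram with n < 5 can span from the negation signal "never" to the distant scope word "broken".

```python
# Word n-grams with n < 5 never cover both "never" and "broken"
# in example (1): the two words are 4 positions apart.
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the leather straps have never worn out or broken".split()
for n in (2, 3, 4):
    spanning = [g for g in ngrams(tokens, n) if "never" in g and "broken" in g]
    print(n, spanning)  # [] for n = 2, 3, 4; only a 5-gram covers both words
```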
Negation Modeling — Explicitly I

- Let's incorporate some knowledge of negation into our model and model negation explicitly!
- Vital: negation scope detection (NSD)
  (1) They don't [stand up to laundering very well], in that they shrink up quite a bit.
- NSD e.g. via
  - NegEx [1] — regular expression-based, the "baseline"
  - LingScope [2] — CRF-based, the "state-of-the-art"

[1] http://code.google.com/p/negex/
[2] http://sourceforge.net/projects/lingscope/
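For illustration, a heavily simplified, hypothetical NegEx-style detector (this is not the actual NegEx implementation; trigger and terminator lists are assumptions): everything after a negation trigger up to the next scope-terminating token is treated as the negation scope.

```python
import re

# Hypothetical, regex-based scope detector in the spirit of NegEx:
# scope = tokens between a negation trigger and a terminator.
TRIGGERS = r"\b(not|never|no|without|don't|doesn't)\b"
TERMINATORS = r"\b(but|however|except)\b|[,.;]"

def detect_scope(sentence):
    m = re.search(TRIGGERS, sentence, flags=re.IGNORECASE)
    if not m:
        return None
    rest = sentence[m.end():]
    t = re.search(TERMINATORS, rest, flags=re.IGNORECASE)
    return rest[:t.start()].strip() if t else rest.strip()

print(detect_scope("They don't stand up to laundering very well, "
                   "in that they shrink up quite a bit."))
# -> "stand up to laundering very well"
```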
Negation Modeling — Explicitly II

- Once negation scopes are detected, negated and non-negated word n-grams need to be explicitly represented in feature space:
  - W = {w_i}, i = 1, ..., d: the word n-grams
  - X = {0, 1}^d: feature space of size d, where for x^j ∈ X
    - x^j_k = 1 denotes the presence of w_k
    - x^j_k = 0 denotes the absence of w_k
- For each feature x^j_k: an additional feature x̆^j_k
  - x̆^j_k = 1 encodes that w_k appears negated
  - x̆^j_k = 0 encodes that w_k appears non-negated
- Result: augmented feature space X̆ = {0, 1}^(2d)
- In X̆ we are now able to represent whether a word n-gram w
  - is present ([1, 0]),
  - is absent ([0, 0]),
  - is present and negated ([0, 1]), or
  - is present both negated and non-negated ([1, 1]).
Negation Modeling — Explicitly III

- Example: explicit negation modeling for word unigrams in
  (1) They don't [stand up to laundering very well], in that they shrink up quite a bit.
- Naïve tokenization that splits at white spaces
- Punctuation characters are ignored
- Vocabulary W_uni = {"bit", "don't", "down", "laundering", "quite", "shrink", "stand", "up", "very", "well"}
  (note: "down" does not occur in (1); it illustrates the absent case [0, 0])

Scheme | bit   | don't | down  | laundering | quite | shrink | stand | up    | very  | well
w/     | [1,0] | [1,0] | [0,0] | [0,1]      | [1,0] | [1,0]  | [0,1] | [1,1] | [0,1] | [0,1]
w/o    | 1     | 1     | 0     | 1          | 1     | 1      | 1     | 1     | 1     | 1

Table: Stylized feature vectors of example (1), with (w/) and without (w/o) explicit negation modeling.
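The following minimal sketch reproduces the table's "w/" row. The per-token scope flags are set by hand here; in practice they would come from an NSD component such as NegEx or LingScope:

```python
# Augmented representation X̆ = {0,1}^(2d): each vocabulary entry w_k
# gets a pair [non-negated occurrence, negated occurrence].
vocab = ["bit", "don't", "down", "laundering", "quite",
         "shrink", "stand", "up", "very", "well"]

# Tokens of example (1) after naive whitespace tokenization, paired with
# an "inside negation scope" flag (hand-set for this illustration).
tokens = [("they", False), ("don't", False), ("stand", True), ("up", True),
          ("to", True), ("laundering", True), ("very", True), ("well", True),
          ("in", False), ("that", False), ("they", False), ("shrink", False),
          ("up", False), ("quite", False), ("a", False), ("bit", False)]

features = {w: [0, 0] for w in vocab}
for word, in_scope in tokens:
    if word in features:
        features[word][1 if in_scope else 0] = 1

for w in vocab:
    print(w, features[w])
# e.g. "up" -> [1, 1] (occurs both negated and non-negated),
#      "down" -> [0, 0] (absent), "well" -> [0, 1] (only negated)
```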
Negation Modeling — Evaluation I

- 3 SA subtasks:
  1. In-domain document-level polarity classification on
     - 10 domains from [Blitzer et al., 2007]'s Multi-Domain Sentiment Dataset v2.0
  2. Cross-domain document-level polarity classification on
     - 90 source domain–target domain pairs from the same data set
  3. Sentence-level polarity classification on
     - [Pang & Lee, 2005]'s sentence polarity dataset v1.0
Negation Modeling — Evaluation II

- Standard setup:
  - SVMs, linear kernel, fixed C = 2.0
  - Implicit negation modeling/features: word {uni,bi,tri}-grams
  - Explicit negation modeling:
    - word {uni,bi,tri}-grams
    - NSD: NegEx & LingScope
  - Evaluation measure: accuracy averaged over 10-fold cross validations
  - For cross-domain experiments: 3 domain adaptation methods
- = lots & lots & lots of combinations ... [3]

[3] Summarized evaluation results can be found in the paper corresponding to this talk. Additionally, full evaluation results are available at http://asv.informatik.uni-leipzig.de/staff/Robert_Remus
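A minimal sketch of this setup using scikit-learn (my choice of library; the talk does not name one). The feature matrix here is random placeholder data standing in for the augmented 2d-dimensional vectors:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 40)).astype(float)  # placeholder for X̆
y = rng.integers(0, 2, size=200)                      # placeholder polarity labels

clf = LinearSVC(C=2.0)  # linear-kernel SVM with fixed C = 2.0
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```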
Negation Modeling — Results "in a nutshell"

- Explicitly modeling negation always yields statistically significantly better results than modeling it only implicitly
- Explicitly modeling negation not only for word unigrams, but also for higher order word n-grams is beneficial
- Discriminative data-driven word n-gram models + explicit negation modeling are competitive: they outperform several state-of-the-art models
- LingScope performs better than NegEx
Negation Modeling — Future Work

- Given appropriate scope detection methods, our approach is easily extensible to model
  - other valence shifters [Polanyi & Zaenen, 2006], e.g. intensifiers like "very" or "many"
  - hedges [Lakoff, 1973], e.g. "may" or "might"
- Accounting for negation scopes within the scope of other negations:
  (1) I [don't care that they are [not really leather]].
Thanks! Any questions or suggestions?
Appendix — Literature I

Blitzer, J., Dredze, M., & Pereira, F. C. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 440–447).

Choi, Y. & Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the 13th Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 793–801).
Appendix — Literature II

Esuli, A. & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) (pp. 417–422).

Lakoff, G. (1973). Hedges: A study in meaning criteria and the logic of fuzzy concepts. Journal of Philosophical Logic, 2, 458–508.

Pang, B. & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL) (pp. 115–124).
Appendix — Literature III

Polanyi, L. & Zaenen, A. (2006). Contextual valence shifters. In J. G. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing Attitude and Affect in Text: Theory and Applications, Volume 20 of The Information Retrieval Series (pp. 1–9). Dordrecht: Springer.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the 2010 Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP) (pp. 60–68).