Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
Lucie Flekova (Ubiquitous Knowledge Processing Lab, UKP, TU Darmstadt), Daniel Preotiuc-Pietro (University of Pennsylvania), Eugen Ruppert (LangTech, TU Darmstadt)
Motivating example: "lazy guy" vs. "lazy sunday"
Word polarity lexicons
§ SemEval 2014, 2015: the vast majority of systems are still based on sentiment lexica plus supervised classification
§ Cold: cold beer (+) or cold food (-)
§ Dark: dark chocolate (+) or dark soul (-)
§ Limited: limited edition (+) or limited intellect (-)
§ Wisdom: wisdom tooth (-) or wisdom source (+)
§ Sincere: sincere condolences (-) or sincere love (+)
§ These are lexicon ambiguities at the contextual level
§ Word sense disambiguation does not help here
Assessing lexicon suitability for a new platform
How do you quantify whether the lexicon you use does more harm than good on your data, and how should you adapt it?
[Pipeline figure] Ingredients: unigram polarity lexicon, silver standard corpus, background in-domain corpus. Steps: create bigram thesaurus -> add bigrams to unigram lexicon -> remove too ambiguous words -> evaluate performance and quality.
Ingredient 1: Unigram polarity lexicon
§ We demonstrate our approach on two polarity lexicons consisting of single words:
§ the lexicon of Hu and Liu (Hu and Liu, 2004)
§ the MPQA lexicon (Wilson et al., 2005)
Ingredient 2: Silver standard sentiment corpus
§ 1.6 million tweets from the Sentiment140 data set (Go et al., 2009)
§ collected by searching for positive and negative emoticons
Ingredient 3: Twitter corpus (unlabeled data)
§ Twitter corpus of 1% of all English tweets from the year 2013 = 460 million tweets
[Pipeline figure, repeated] Current step: create bigram thesaurus.
Creating Twitter Bigram Thesaurus
§ We use not plain PMI but its adaptation, Lexicographer's Mutual Information (LMI)
§ Distributional sentiment: bigram LMI is computed separately over the corpus of positive and the corpus of negative tweets from Sentiment140 (Go et al., 2009; 1.6m tweets)
§ For comparability of LMI_pos and LMI_neg, bigrams are weighted by their relative frequency in the POS and NEG data
Creating Twitter Bigram Thesaurus
§ Distributional thesaurus: computed on 80 million English tweets, based on left and right neighbor bigrams
§ Distributional sentiment silver: LMI computed separately on positive and negative tweets from Sentiment140 (Go et al., 2009; 1.6m tweets)
§ The limited size of the silver standard data means the scores are not the most reliable -> we further boost LMI by incorporating scores from the background corpus (LMI_glob):
LMI_pos_glob(word, context) = LMI_pos(word, context) x LMI_glob(word, context)
LMI_neg_glob(word, context) = LMI_neg(word, context) x LMI_glob(word, context)
§ This emphasizes frequent and informative bigrams, even when their score in one polarity data set is low
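The slides stop at the formulas; as a rough illustration, here is a minimal Python sketch of an LMI computation over a tweet corpus. The whitespace tokenization and the corpus variables (positive_tweets, negative_tweets, background_tweets) are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def lmi_scores(tweets):
    """LMI(w1, w2) = f(w1, w2) * log2(N * f(w1, w2) / (f(w1) * f(w2))):
    PMI weighted by the joint frequency, so frequent, informative
    bigrams are emphasized over rare ones."""
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()  # naive whitespace tokenization (assumption)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    return {
        (w1, w2): f * math.log2(n * f / (unigrams[w1] * unigrams[w2]))
        for (w1, w2), f in bigrams.items()
    }

# One thesaurus per corpus, as described above (variable names are placeholders):
# lmi_pos  = lmi_scores(positive_tweets)    # Sentiment140 positive half
# lmi_neg  = lmi_scores(negative_tweets)    # Sentiment140 negative half
# lmi_glob = lmi_scores(background_tweets)  # large background Twitter corpus
```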
Creating Twitter Bigram Thesaurus
§ Global LMI semantic orientation = LMI_pos_glob - LMI_neg_glob
§ e.g. dark_past = -128.14, dark_chocolate = +1558.96, ...
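Given those thesauri, the orientation score is a small function. A sketch under the same assumptions; the relative-frequency weighting for comparability, mentioned two slides back, is omitted for brevity:

```python
def semantic_orientation(bigram, lmi_pos, lmi_neg, lmi_glob):
    """so(bigram) = LMI_pos_glob - LMI_neg_glob
                  = LMI_glob * (LMI_pos - LMI_neg)."""
    glob = lmi_glob.get(bigram, 0.0)
    return glob * (lmi_pos.get(bigram, 0.0) - lmi_neg.get(bigram, 0.0))

# e.g. semantic_orientation(('dark', 'chocolate'), lmi_pos, lmi_neg, lmi_glob)
# should come out positive, and ('dark', 'past') negative.
```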
[Pipeline figure, repeated] Current step: add bigrams to unigram lexicon.
Twitter Bigram Thesaurus: invert polar bigrams
DARK: dark_past = -128.14, dark_chocolate = +1558.96, ...

Negative word to positive bigram:
§ Hu&Liu: why limit, sneak peek, mission impossible, lazy sunday, desperate housewives, cold beer, guilty pleasure, belated birthday
§ MPQA: vice versa, stress reliever, calmed down, deep breath, long awaited, cloud computing, dark haired, bloody mary

Positive word to negative bigram:
§ Hu&Liu: good luck, wisdom tooth, oh well, gotta work, hot outside, feels better, super tired, enough money
§ MPQA: super duper, happy camper, just puked, heart breaker, gold digger, light bulbs, sincere condolences, frank iero

https://www.ukp.tu-darmstadt.de/data/sentiment-analysis/inverted-polarity-bigrams/
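One way to mine such inversions, as a hedged sketch: flag bigrams whose orientation sign contradicts the lexicon polarity of one of their words. Here the lexicon is assumed to map words to +1/-1; the authors' exact selection criteria and thresholds are not given on the slides.

```python
def inverted_bigrams(orientation, lexicon):
    """Bigrams whose distributional orientation contradicts the lexicon
    polarity of one of their words, e.g. 'lazy' (-1) in 'lazy sunday' (+)."""
    flipped = {}
    for (w1, w2), score in orientation.items():
        if any(lexicon.get(w, 0) * score < 0 for w in (w1, w2)):  # signs disagree
            flipped[(w1, w2)] = score
    return flipped
```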
Twitter Bigram Thesaurus: observations
Polarity shifting occurs in a broad range of situations, e.g.:
§ polar word as an intensity expression: super tired
§ polar word in names: desperate housewives, frank iero
§ multiword expressions, idioms and collocations: cloud computing, sincere condolences, light bulbs
§ polar nominal context: cold beer/person, dark chocolate/thoughts, stress reliever/management, guilty pleasure/feeling
[Pipeline figure, repeated] Current step: remove too ambiguous words.
Finding the most ambiguous unigrams
§ Some words occur in many contexts with both original and switched polarity; they are harmful on either polarity side and are better removed
§ Word ambiguity = (#positive contexts - #negative contexts) / #contexts
§ Scores near zero mark the most ambiguous words, i.e. those used about equally often in positive and negative contexts

Hu&Liu: hot .022, support .022, important -.023, super -.043, crazy -.045, right -.065, proper -.093, worked -.111, top .113, enough -.114
MPQA: just -.002, less .009, sound -.011, real .027, little .032, help -.037, back -.046, mean .090, down -.216, too -.239
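The ambiguity formula translates directly into code. This sketch assumes per-word context counts from the positive and negative halves of the silver standard; the pruning threshold is a made-up placeholder, since the slides do not state the cutoff used.

```python
def ambiguity(word, pos_contexts, neg_contexts):
    """(#positive contexts - #negative contexts) / #contexts;
    values near zero mean the word swings both ways."""
    p, n = pos_contexts.get(word, 0), neg_contexts.get(word, 0)
    return (p - n) / (p + n) if p + n else 0.0

def prune(lexicon, pos_contexts, neg_contexts, threshold=0.1):
    # threshold=0.1 is a hypothetical value, not taken from the slides
    return {w: pol for w, pol in lexicon.items()
            if abs(ambiguity(w, pos_contexts, neg_contexts)) >= threshold}
```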
[Pipeline figure, repeated] Current step: evaluate performance and quality.
Test corpus
§ Facebook posts rated for affect by two psychology experts on a scale of 1-9 (1 = strongly negative, 9 = strongly positive sentiment)
§ normal distribution of ratings
§ inter-annotator agreement: weighted Cohen's κ = 0.61 on the exact score
§ Neutral posts removed for our task; posts containing no lexicon word removed (20%) => left with:
§ 1,601 posts for MPQA
§ 1,526 posts for Hu & Liu
Sentiment polarity prediction results

Features                   Acc. HL   Acc. MPQA
Baseline:
  Unigrams                 .7070     .6608
Add bigrams to unigram lexicon:
  Uni+bigrams              .7215     .6633
  Uni+bigramsPos           .7123     .6621
  Uni+bigramsNeg           .7163     .6621
Remove too ambiguous words:
  Pruned                   .7228     .6627
  Pruned+bigrams           .7333     .6646
  Pruned+bigramsPos        .7150     .6633
  Pruned+bigramsNeg        .7287     .6640
All in-domain bigrams      .6907     .7008
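To make the feature rows concrete, here is a hedged sketch of one plausible scoring scheme in which matched bigram orientations override the unigram polarities of the tokens they cover; the actual feature set behind the accuracies above is not detailed on the slides.

```python
def polarity_score(post, unigram_lex, bigram_orientation):
    """Sum polarities over a post; matched bigrams take precedence
    over the unigram polarities of the tokens they cover."""
    tokens = post.lower().split()
    covered, score = set(), 0.0
    for i, bg in enumerate(zip(tokens, tokens[1:])):
        if bg in bigram_orientation:
            score += 1.0 if bigram_orientation[bg] > 0 else -1.0
            covered.update((i, i + 1))  # tokens handled by the bigram
    score += sum(unigram_lex.get(t, 0) for i, t in enumerate(tokens)
                 if i not in covered)
    return score

# polarity_score("had a lazy sunday", {'lazy': -1},
#                {('lazy', 'sunday'): 1558.96})  -> 1.0, not -1.0
```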