Text Mining Paper Presentation: Determining the Sentiment of Opinions
Soo-Min Kim, Eduard Hovy
Presenters: Karthik Chinnathambi (kc4bf), Prashant Bhanu Gorthi (pg3bh), Sofia Francis Xavier (sf4uh)
Background
An opinion is modeled as the quadruple [Topic, Holder, Claim, Sentiment]:
● Topic: the theme the text is about
● Holder: the person or organization expressing the claim
● Claim: the statement made about the topic
● Sentiment: positive, negative, or neutral
Problem Addressed
Given a Topic and a set of texts about the topic, find the Sentiments expressed about the Topic in each text, and identify the people who hold each sentiment.
Algorithm
Given a topic and a set of texts, the system operates in four steps:
● Select sentences containing both the topic phrase and holder candidates
● Delimit the region of opinion around each holder
● Calculate the polarity of each sentiment-bearing word in the region individually, using the word sentiment classifier
● Combine the polarities to produce the holder’s sentiment for the whole sentence
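The four steps above can be sketched in a few lines of Python. Everything here is a toy illustration: the seed lexicon, the holder list, and the example sentence are invented, and the region and combination rules are simplified (a product-of-signs rule, similar in spirit to the paper's sentence-level Model 0).

```python
# Toy sketch of the four-step pipeline; lexicon and holders are invented.
SEED_POLARITY = {"great": +1, "boost": +1, "hurt": -1, "unfair": -1}
HOLDERS = {"Clinton", "Perot"}

def holder_sentiment(sentence, topic):
    words = sentence.strip(".").split()
    # Step 1: require both the topic word and a holder candidate.
    holders = [w for w in words if w in HOLDERS]
    if topic not in words or not holders:
        return None
    # Step 2: delimit the opinion region (here: from the topic to sentence end).
    region = words[words.index(topic):]
    # Step 3: look up the polarity of each sentiment-bearing word in the region.
    polarities = [SEED_POLARITY[w] for w in region if w in SEED_POLARITY]
    # Step 4: combine the polarities (sign of their product).
    if not polarities:
        return holders[0], "neutral"
    product = 1
    for p in polarities:
        product *= p
    return holders[0], ("positive" if product > 0 else "negative")

print(holder_sentiment("Perot says NAFTA will hurt workers.", "NAFTA"))
# → ('Perot', 'negative')
```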
Architecture
Sentiment Classifiers
● Word Sentiment Classifier
● Sentence Sentiment Classifier
Construction of the sentiment seed list
● Sentiment-bearing word classes: adjectives, verbs, and nouns
● Seed lists: randomly selected verbs (23 positive, 21 negative) and adjectives (15 positive, 19 negative); nouns were added later
● For each seed word, extract its expansions from WordNet and add them back into the appropriate seed list
● This yields 5880 positive adjectives, 6233 negative adjectives, 2840 positive verbs, and 3239 negative verbs
Challenge: some words are both positive and negative!
Resolving sentiment-ambiguous words
Given a new word, use WordNet to obtain its synonym set:
argmax_c P(c | w) ≅ argmax_c P(c | syn_1, syn_2, ..., syn_n)
● c is a sentiment category (positive or negative)
● w is the unseen word
● syn_1, ..., syn_n are the WordNet synonyms of w
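The argmax above can be implemented as a naive Bayes decision over the unseen word's synonyms. The sketch below uses invented seed-list counts and an invented synonym set in place of real WordNet output, with add-one smoothing (the smoothing choice is an assumption, not from the paper):

```python
import math

# Invented seed-list statistics: occurrences of each word in each class.
COUNTS = {
    "positive": {"good": 40, "sound": 10, "solid": 8},
    "negative": {"bad": 35, "unsound": 12, "weak": 9},
}
PRIOR = {"positive": 0.5, "negative": 0.5}

def classify(synonyms):
    """argmax_c P(c) * prod_k P(syn_k | c), with add-one smoothing."""
    vocab = {w for cls in COUNTS.values() for w in cls}
    best, best_score = None, -math.inf
    for c, counts in COUNTS.items():
        total = sum(counts.values())
        score = math.log(PRIOR[c])
        for syn in synonyms:
            score += math.log((counts.get(syn, 0) + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# A hypothetical unseen word whose WordNet synonyms are "sound" and "solid":
print(classify(["sound", "solid"]))  # → positive
```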
Word Sentiment Classifier
Model 1:
● f_k: kth feature of sentiment class c that is also a member of the synonym set of w
● count(f_k, synset(w)): total number of occurrences of f_k in the synonym set of w
Model 2:
● P(w|c): probability of word w given a sentiment class c
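The equations on this slide were figures and did not survive extraction. A plausible reconstruction from the variable definitions above, following the naive Bayes form of the previous slide (treat this as a reading aid, not a verbatim copy of the paper's equations):

```latex
% Model 1: features weighted by their frequency in the synonym set of w
\hat{c} = \arg\max_{c} \; P(c)\prod_{k} P(f_k \mid c)^{\,\mathrm{count}(f_k,\,\mathrm{synset}(w))}

% Model 2: synonyms used directly as independent evidence
\hat{c} = \arg\max_{c} \; P(c)\prod_{k} P(\mathrm{syn}_k \mid c),
\qquad P(w \mid c) = \frac{\mathrm{count}(w, c)}{\sum_{w'} \mathrm{count}(w', c)}
```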
Sample Output of the Word Sentiment Classifier
Sentence Sentiment Classifier
● Holder Identification
● Sentiment Region
● Sentence Sentiment Classification Models
Holder Identification
● BBN’s named entity tagger IdentiFinder is used to identify potential holders of an opinion
● Only PERSON and ORGANIZATION entities are considered as possible opinion holders
● For sentences with more than one holder candidate, choose the one closest to the topic
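The "closest holder" rule reduces to a minimum over token distances. In this sketch the entity positions are hand-written stand-ins for IdentiFinder output, and the sentence is invented:

```python
# Pick the holder candidate closest (in token distance) to the topic mention.
def closest_holder(tokens, topic_index, holder_indices):
    return min(holder_indices, key=lambda i: abs(i - topic_index))

tokens = "Clinton says Perot thinks NAFTA is unfair".split()
topic_index = tokens.index("NAFTA")   # position 4
holder_indices = [0, 2]               # "Clinton", "Perot"
print(tokens[closest_holder(tokens, topic_index, holder_indices)])  # → Perot
```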
Sentiment Region
Sentence Sentiment Classification Models
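The model definitions on this slide were figures and did not survive extraction. The sketch below is a hedged reconstruction of the three combination schemes reported in the paper (a sign product, a harmonic mean, and a geometric mean of word polarity strengths within the sentiment region); the function names and the example scores are invented:

```python
import math

def model0(scores):
    """Sentiment = sign of the product of the word polarities."""
    product = 1
    for s in scores:
        product *= 1 if s > 0 else -1
    return product

def model1(scores):
    """Harmonic mean of the absolute strengths, signed by model0."""
    n = len(scores)
    return model0(scores) * n / sum(1 / abs(s) for s in scores)

def model2(scores):
    """Geometric mean of the absolute strengths, signed by model0."""
    logs = sum(math.log(abs(s)) for s in scores)
    return model0(scores) * math.exp(logs / len(scores))

scores = [0.8, -0.5, 0.9]   # invented word polarity strengths
print(model0(scores))        # → -1
```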
Experiments
Two sets of experiments to examine the performance of:
● Different word-level classifier models
● Different sentence-level classifier models
The classification task is defined as labeling each word / sentence as:
● Positive
● Negative
● Neutral or N/A
Experiments: Word Classification
Training data
● Basic English word list for the TOEFL test
● Intersected with a list of 19748 adjectives and 8011 verbs
Methodology
● Randomly select 462 adjectives and 502 verbs
● 3 humans (in pairs) classify the list of randomly selected words
  ○ Serves as the baseline for evaluating the models proposed in the paper
● Test word-level classification using 2 models:
  ○ A model that randomly assigns a sentiment category to each word (averaged over 10 iterations)
  ○ Model 1 from slide 9: a statistical model that accounts for both the polarity and the strength of the sentiment
Experiments: Word Classification
Testing the models
● Model trained with the initial seed list of 23 positive and 21 negative verbs, 15 positive and 19 negative adjectives
● Tested the effect of increasing the seed list to 251 verbs and 231 adjectives
Evaluation
● Agreement measure
  ○ Strict agreement: annotators agree over all 3 categories
  ○ Lenient agreement: merge positive and neutral into one category, differentiating only words with negative sentiment
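The two agreement measures can be made concrete with a small example; the label sequences below are invented for illustration:

```python
# Toy illustration of strict vs. lenient agreement between two label sets.
human   = ["positive", "neutral", "negative", "positive", "negative"]
machine = ["neutral",  "neutral", "negative", "positive", "positive"]

def strict_agreement(a, b):
    """Fraction of items where both labels match over all 3 categories."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def lenient_agreement(a, b):
    """Merge positive and neutral into one non-negative category first."""
    merge = lambda l: "negative" if l == "negative" else "non-negative"
    return sum(merge(x) == merge(y) for x, y in zip(a, b)) / len(a)

print(strict_agreement(human, machine))   # → 0.6
print(lenient_agreement(human, machine))  # → 0.8
```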
Experiments: Word Classification
Results
● Model 1 achieved lower agreement than the human annotators, but performed better than the random baseline
● The algorithm classified 93.07% of verbs and 83.27% of adjectives as carrying either positive or negative sentiment
● Increasing the seed list improved the agreement between human and machine classification
Experiments: Sentence Classification
Training data
● 100 sentences selected from the DUC 2001 corpus
● Topics include “illegal alien”, “term limits”, “gun control”, and “NAFTA”
● Two humans annotated the sentences with overall sentiment
Testing and evaluation
● Experimented with combinations of:
  ○ 3 models for sentence classification
  ○ 4 different window definitions
  ○ 4 variations of word-level classifiers
● Tested the models with both manually annotated and automatically identified sentiment holders
● Evaluation metric: classification accuracy
Experiments: Sentence Classification
Observations
● Correctness is defined as matching both the holder and the sentiment
● Best model performance:
  ○ 81% accuracy with manually annotated holders
  ○ 67% accuracy with automatic holder identification
Experiment Results
Best performance achieved using:
● Model 0 (sentence level), which considers only sentiment polarity
● Manually annotated topic and holder
● Window Variation 4 for the sentiment region (words from the topic/holder to the end of the sentence)
Effect of sentiment categories:
● The presence of negative words matters more than sentiment strength
● Neutral sentiment words were classified as non-opinion-bearing words in most cases
Experiment Results
Problems with the methodology
● Some words carry both strong positive and strong negative sentiment, creating ambiguity
  ○ The unigram model is not sufficient to resolve it
  ○ E.g., ‘Term limits really hit at democracy,’ says Prof. Fenno
● A holder in the sentence can express multiple opinions
● The models cannot infer sentiment from facts
  ○ E.g., She thinks term limits will give women more opportunities in politics
● Detecting the holder of the sentiment is challenging when multiple holders appear in the sentence
Conclusion
Future work identified by the authors:
● Sentences with weakly opinion-bearing words
● Sentences with multiple opinions about a topic
● Improved sentence parsers to reliably detect the holder for a sentiment region
● Exploring other learning techniques such as SVMs and decision lists
Q & A?
Thank you!