An Information Gain-Driven Feature Study for Aspect-Based Sentiment Analysis
Kim Schouten, Flavius Frasincar, and Rommert Dekker
Erasmus University Rotterdam, the Netherlands
Many opinions…
• Nowadays the Web is filled with opinion and sentiment
• People freely share their thoughts on basically everything
• Useful, but with a lot of noise
• Automatic methods are needed to sift through this much data
• Our scope is consumer reviews
Sentiment Analysis
• Sentiment analysis: extracting sentiment from text
• Sentiment can be defined as a polarity (positive/negative)
• Or as something more complex (a numeric scale or a set of emotions)
• Useful for consumers to know what other people think
• Useful for producers to gauge public opinion w.r.t. their product
Aspect-Based Sentiment Analysis
• Sentiment analysis has a scope, for instance a document
• More interesting, however, is the aspect level
• An aspect is a characteristic or feature of the product or service being reviewed
• This can range from general things like the price and size of a product to very specific aspects like the wine selection of a restaurant or the battery life of a laptop
Data snippet
Currently…
• Mostly supervised machine learning algorithms
• Focus on performance
• Feature overload
• But which features are actually useful?
Setup
• NLP pipeline to extract linguistic features
• Compute Information Gain (IG) for each feature
• Order features by descending IG
• Run a linear SVM to classify the sentiment of each aspect
• Incrementally add features from the ordered list and record performance (see the sketch after this list)
• All of this with ten-fold cross-validation:
  • 7 folds for training the SVM
  • 2 folds for determining parameters (aspect context and the SVM C parameter)
  • 1 fold for testing
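As an illustration of this setup, the sketch below ranks binary features by Information Gain and evaluates a linear SVM on growing feature subsets using scikit-learn. The names (X, y, steps) and the plain 10-fold evaluation are assumptions made for the example, not the authors' code; the slides describe a 7/2/1 train/validation/test split within each fold and tuning of the C parameter on the validation folds.

```python
# Minimal sketch: IG-based feature ranking + incremental SVM evaluation (illustrative only).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def incremental_ig_evaluation(X, y, steps):
    """X: binary feature matrix (aspects x features), y: sentiment labels,
    steps: list of feature-set sizes to try, e.g. [10, 100, 1000]."""
    # For discrete features, mutual information equals Information Gain
    ig = mutual_info_classif(X, y, discrete_features=True)
    order = np.argsort(ig)[::-1]          # feature indices by descending IG
    results = []
    for k in steps:
        selected = order[:k]
        clf = LinearSVC(C=1.0)             # C is tuned on validation folds in the paper
        acc = cross_val_score(clf, X[:, selected], y, cv=10).mean()
        results.append((k, acc))
    return results
```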
NLP Pipeline
• Spelling correction (JLanguageTool)
• Sentence splitting, tokenization, part-of-speech tagging, lemmatization, syntactic analysis (Stanford CoreNLP)
• Word sense disambiguation (Lesk implementation)
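For concreteness, a rough analogue of the token-level steps (sentence splitting, tokenization, POS tagging, lemmatization, dependency parsing) using the stanza library. This is only an illustration, not the toolchain from the slides (JLanguageTool, Stanford CoreNLP, and a Lesk implementation), and it omits the spelling correction and word sense disambiguation steps.

```python
# Illustrative stand-in for the CoreNLP-based steps.
# Run stanza.download('en') once beforehand to fetch the English models.
import stanza

nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')
doc = nlp("The wine selection is great, but the battery life could be better.")

for sentence in doc.sentences:
    for word in sentence.words:
        # word.head is 1-indexed; 0 means the word is the sentence root
        head = sentence.words[word.head - 1].text if word.head > 0 else 'ROOT'
        print(word.text, word.lemma, word.upos, word.deprel, '->', head)
```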
Information Gain
• Each binary feature splits the data in two
• How much easier is it to choose the correct class given this split?
Information Gain
• Compute the entropy, or impurity, of the data
• Information Gain is then the decrease in entropy after the split (standard definitions below)
homes.cs.washington.edu/~shapiro/EE596/notes/InfoGain.pdf
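For reference, the standard textbook definitions behind these two bullets (not taken verbatim from the slides): the entropy of the sentiment-label distribution over a set of aspects S, and the gain from splitting S on a binary feature f.

```latex
H(S) = -\sum_{c \in \{pos,\, neu,\, neg\}} p_c \log_2 p_c,
\qquad
IG(S, f) = H(S) - \sum_{v \in \{0, 1\}} \frac{|S_v|}{|S|}\, H(S_v)
```

where S_v is the subset of aspects for which feature f takes value v, and p_c is the fraction of aspects in S with sentiment class c.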
Features
• Word-based features
  • Lemma
  • Negation present
• Synset-based features
  • Synset: “ok#JJ#1”
  • Related synsets: “Similar To big#JJ#1”
• Grammar-based features
  • Lemma-grammar: “keep -nsubj- we”
  • POS-grammar: “VB -nsubj- PRP”
  • Synset-grammar: “ok#JJ#1 -cop- be#VB#1”
  • Polarity-grammar: “neutral -nsubj- neutral”
• Aspect feature
  • Category (of the aspect): “FOOD#QUALITY”
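As an illustration of how such binary features could be fed to the SVM, the sketch below encodes one aspect's features as string keys in a dict and vectorizes them with scikit-learn's DictVectorizer. The key names mirror the examples on the slide; the exact encoding used in the paper may differ.

```python
# Illustrative binary feature encoding, one dict per aspect (not the authors' code).
from sklearn.feature_extraction import DictVectorizer

aspect_features = [
    {
        "lemma=keep": 1, "negation": 1,                 # word-based
        "synset=ok#JJ#1": 1,                            # synset-based
        "related=Similar To big#JJ#1": 1,
        "lemma-grammar=keep-nsubj-we": 1,               # grammar-based
        "pos-grammar=VB-nsubj-PRP": 1,
        "synset-grammar=ok#JJ#1-cop-be#VB#1": 1,
        "polarity-grammar=neutral-nsubj-neutral": 1,
        "category=FOOD#QUALITY": 1,                     # aspect feature
    },
    # ... one dict per aspect in the dataset
]
X = DictVectorizer().fit_transform(aspect_features)    # sparse binary matrix
```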
Data
• Sentiment:
  • Positive: 1652 aspects (66.1%)
  • Neutral: 98 aspects (3.9%)
  • Negative: 749 aspects (30.0%)
  • Total: 2499 aspects (100%)
• Type:
  • Explicit: 1879 aspects (75.2%)
  • Implicit: 620 aspects (24.8%)
  • Total: 2499 aspects (100%)
Results – features ordered by descending IG
Results – average IG per feature type
Results – sentiment classification
Overfitting with low IG scores
Results – average IG
Results – proportion of feature type
Results – top 3 features per type
Conclusions
• Using Information Gain to select features:
  • We can use just 1% of the features at only a 2.9% penalty in accuracy
  • With 1% of the features, the training time of the SVM is reduced by 80%
• Relatively unknown features such as related-synsets and polarity-grammar turned out to be effective for sentiment classification
• In future work we hope to:
  • Compare the grammar-based features with traditional n-grams
  • Include more features, e.g., multiple sentiment lexicons
  • Investigate feature interaction
  • Incorporate a smarter aspect context instead of the simple word window