Exploiting New Sentiment-Based Meta-level Features for Effective Sentiment Analysis

Sérgio Canuto
Federal University of Minas Gerais, Computer Science Department, Belo Horizonte, MG, Brazil
sergiodaniel@dcc.ufmg.br

Marcos André Gonçalves
Federal University of Minas Gerais, Computer Science Department, Belo Horizonte, MG, Brazil
mgoncalv@dcc.ufmg.br

Fabrício Benevenuto
Federal University of Minas Gerais, Computer Science Department, Belo Horizonte, MG, Brazil
fabricio@dcc.ufmg.br

ABSTRACT

In this paper we address the problem of automatically learning to classify the sentiment of short messages/reviews by exploiting information derived from meta-level features, i.e., features derived primarily from the original bag-of-words representation. We propose new meta-level features especially designed for the sentiment analysis of short messages, such as: (i) information derived from the sentiment distribution among the k nearest neighbors of a given short test document x, (ii) the distribution of distances of x to its neighbors and (iii) the document polarity of these neighbors given by unsupervised lexical-based methods. Our approach is also capable of exploiting information from the neighborhood of document x regarding (highly noisy) data obtained from 1.6 million Twitter messages with emoticons. The set of proposed features is capable of transforming the original feature space into a new one, potentially smaller and more informed. Experiments performed with a substantial number of datasets (nineteen) demonstrate that the effectiveness of the proposed sentiment-based meta-level features is not only superior to the traditional bag-of-words representation (by up to 16%) but is also superior in most cases to state-of-the-art meta-level features previously proposed in the literature for text classification tasks that do not take into account some idiosyncrasies of sentiment analysis. Our proposal is also largely superior to the best lexicon-based methods as well as to supervised combinations of them. In fact, the proposed approach is the only one to produce the best results in all tested datasets in all scenarios.

CCS Concepts

• Information systems → Document representation;
• Computing methodologies → Machine learning approaches;

Keywords

meta features, sentiment analysis

1. INTRODUCTION

The popularity of online forums, reviews and social networks has led numerous people to share their opinions on a wide range of subjects, including products, events, news and even daily experiences. Dealing with this massive amount of data, generated every day on online platforms, can bring a number of new opportunities to businesses and markets. In particular, the sentiment analysis of such unstructured data can reveal how people feel about a particular product or service.

In this work, we focus on a supervised learning paradigm to deal with sentiment classification of short messages/(micro-)reviews, since it is one of the most effective and adaptable approaches for this task [18]. Given a set of training messages classified into one or more predefined sentiments/polarities, the task is to automatically learn how to classify new (unclassified) messages, using a combination of features of these messages that associate them with predefined sentiments or polarities. In particular, we focus on the supervised (binary) task of discriminating between positive and negative polarities of the messages. The reasons for this are threefold: (i) in several domains (e.g., reviews and micro-reviews), the basic motivation for people to write such messages is to provide positive or negative feedback on products, experiences and services that can be helpful to others; (ii) even in other domains in which "neutral" opinions can occur more frequently, many applications are interested in knowing only the most "polarized" opinions about certain topics (e.g., politicians, events, etc.); and finally (iii), even if identifying neutral positions is important, some works (e.g., [4, 25, 33]) have advocated doing this in a prior step (aka subjectivity extraction) before determining the polarity of the message, which is our focus here.

A recent trend that has emerged in supervised approaches for text classification, and that works at the data engineering level instead of at the algorithmic level, is the introduction of meta-level features¹ that can replace or work in conjunction with the original set of (bag-of-words-based) features [7, 6, 21, 22, 27]. Such meta-level features, which are usually manually designed and extracted from other features, cap-

¹ In this paper, we will use the terms "meta-level features" and "meta-features" interchangeably.

© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
WSDM'16, February 22–25, 2016, San Francisco, CA, USA.
© 2016 ACM. ISBN 978-1-4503-3716-8/16/02 ...$15.00.
DOI: http://dx.doi.org/10.1145/2835776.2835821
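The three neighborhood-based feature groups named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not define the features precisely, so the cosine similarity over raw term counts, the toy lexicon, the way the groups are concatenated, and all function names (`knn_meta_features`, `lexicon_polarity`, and so on) are assumptions made only for illustration.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a message to term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def lexicon_polarity(text, lexicon):
    """Unsupervised polarity: sum of per-word scores from a (toy) lexicon."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())

def knn_meta_features(x, train, k, lexicon):
    """Build a meta-feature vector for message x from its k nearest neighbors.

    train is a non-empty list of (text, label) pairs, label in {"pos", "neg"}.
    """
    if not train:
        raise ValueError("train must contain at least one labeled message")
    x_vec = bag_of_words(x)
    scored = [(cosine_similarity(x_vec, bag_of_words(t)), t, y) for t, y in train]
    neighbors = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    n = len(neighbors)
    # (i) sentiment distribution among the neighbors
    pos_fraction = sum(1 for _, _, y in neighbors if y == "pos") / n
    # (ii) distances from x to each neighbor (1 - cosine similarity)
    distances = [1.0 - sim for sim, _, _ in neighbors]
    # (iii) mean lexicon-based polarity of the neighbors
    mean_polarity = sum(lexicon_polarity(t, lexicon) for _, t, _ in neighbors) / n
    return [pos_fraction, 1.0 - pos_fraction] + distances + [mean_polarity]

# Hypothetical toy data, for illustration only.
TOY_TRAIN = [
    ("great phone love it", "pos"),
    ("love this great camera", "pos"),
    ("terrible battery hate it", "neg"),
    ("awful screen", "neg"),
]
TOY_LEXICON = {"great": 1, "love": 1, "terrible": -1, "hate": -1, "awful": -1}

features = knn_meta_features("love this great phone", TOY_TRAIN, 2, TOY_LEXICON)
```

With k neighbors, the resulting vector has k + 3 entries regardless of vocabulary size. That fits the abstract's remark that the new feature space is potentially smaller than the original bag-of-words space. In a supervised setup, such vectors would then be fed to a classifier in place of, or alongside, the term counts.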