A Unified Graph Model for Sentence-based Opinion Retrieval

Binyang Li, Lanjun Zhou, Shi Feng, Kam-Fai Wong
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
{byli, ljzhou, sfeng, kfwong}@se.cuhk.edu.hk

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1367–1375, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics

Abstract

There is growing research interest in opinion retrieval, as on-line users' opinions are becoming more and more popular in business, social networks, etc. Practically speaking, the goal of opinion retrieval is to retrieve documents that entail opinions or comments relevant to a target subject specified by the user's query. A fundamental challenge in opinion retrieval is information representation. Existing research focuses on document-based approaches, in which documents are represented by bag-of-words. However, due to the loss of contextual information, this representation fails to capture the associative information between an opinion and its corresponding target. It cannot distinguish different degrees of a sentiment word when it is associated with different targets. This in turn seriously affects opinion retrieval performance. In this paper, we propose a sentence-based approach based on a new information representation, namely the topic-sentiment word pair, to capture intra-sentence contextual information between an opinion and its target. Additionally, we consider inter-sentence information to capture the relationships among the opinions on the same topic. Finally, the two types of information are combined in a unified graph-based model, which can effectively rank the documents. Compared with existing approaches, experimental results on the COAE08 dataset showed that our graph-based model achieved significant improvement.

1 Introduction

In recent years, there has been growing interest in sharing personal opinions on the Web, such as product reviews, economic analysis, political polls, etc. These opinions can not only help independent users make decisions, but also provide valuable feedback (Pang et al., 2008). Opinion-oriented research, including sentiment classification, opinion extraction, opinion question answering, and opinion summarization, is receiving growing attention (Wilson, et al., 2005; Liu et al., 2005; Oard et al., 2006). However, most existing works concentrate on analyzing opinions expressed in the documents, and none on how to represent the information needs required to retrieve opinionated documents. In this paper, we focus on opinion retrieval, whose goal is to find a set of documents containing not only the query keyword(s) but also the relevant opinions. This requirement raises the challenge of how to represent information needs for effective opinion retrieval.

To solve the above problem, previous work adopts a 2-stage approach. In the first stage, relevant documents are determined and ranked by a score, i.e. the tf-idf value. In the second stage, an opinion score is generated for each relevant document (Macdonald and Ounis, 2007; Oard et al., 2006). The opinion score can be acquired either by machine learning-based sentiment classifiers, such as SVM (Zhang and Yu, 2007), or by a sentiment lexicon with weighted scores from training documents (Amati et al., 2007; Hannah et al., 2007; Na et al., 2009). Finally, an overall score combining the two is computed by a score function, e.g. linear combination, to re-rank the retrieved documents.

Retrieval in the 2-stage approach is document-based, and each document is represented by bag-of-words. This representation, however, can only ensure that there is at least one opinion in each relevant document; it cannot determine the relevance pairing of an individual opinion to its target. In general, by simply representing a document as bag-of-words, contextual information, i.e. the corresponding target of an opinion, is neglected. This may result in a mismatch between an opinion and a target and in turn affects opinion retrieval performance. By the same token, the effect on documents consisting of multiple topics, which are common in blogs and on-line reviews, is also significant. In this setting, even if a document is regarded as opinionated, there is no guarantee that all opinions in the document are indeed relevant to the target concerned. Therefore, we argue that the existing information representation, i.e. bag-of-words, cannot satisfy the information needs for opinion retrieval.

In this paper, we propose to handle opinion retrieval at the granularity of the sentence. It is observed that a complete opinion is always expressed in one sentence, and the relevant target of the opinion is mostly the one found in it. Therefore, it is crucial to maintain the associative information between an opinion and its target within a sentence. We define the notion of a topic-sentiment word pair, which is composed of a topic term (i.e. the target) and a sentiment word (i.e. the opinion) of a sentence. Word pairs can maintain intra-sentence contextual information to express the potential relevant opinions. In addition, inter-sentence contextual information is also captured by word pairs to represent the relationship among opinions on the same topic. In practice, the inter-sentence information reflects the degree of a word pair. Finally, we combine both intra-sentence and inter-sentence contextual information to construct a unified undirected graph to achieve effective opinion retrieval.

The rest of the paper is organized as follows. In Section 2, we describe the motivation of our approach. Section 3 presents a novel unified graph-based model for opinion retrieval. We evaluate our model and present the results in Section 4. We review related work on opinion retrieval in Section 5. Finally, in Section 6, the paper is concluded and future work is suggested.

2 Motivation

In this section, we start by briefly describing the objective of opinion retrieval. We then illustrate the limitations of current opinion retrieval approaches and analyze the motivation of our method.

2.1 Formal Description of Problem

Opinion retrieval was first presented in the TREC 2006 Blog track, and the objective is to retrieve documents that express an opinion about a given target. The opinion target can be a "traditional" named entity (e.g. a name of a person, location, or organization), a concept (e.g. a type of technology), or an event (e.g. a presidential election). The topic of the document is not required to be the same as the target, but an opinion about the target has to be presented in the document or in one of the comments to the document (Macdonald and Ounis, 2006). Therefore, in this paper we regard the information need for opinion retrieval as the relevant opinion.

2.2 Motivation of Our Approach

In traditional information retrieval (IR), the bag-of-words representation is the most common way to express information needs. However, in opinion retrieval, the information need targets the relevant opinion, and this renders the bag-of-words representation ineffective.

Consider the example in Figure 1. There are three sentences A, B, and C in a document $d_i$. Suppose we are given an opinion-oriented query Q related to 'Avatar'. According to the conventional 2-stage opinion retrieval approach, $d_i$ is represented by a bag-of-words. Among the words, there is a topic term Avatar ($t_1$) occurring twice, i.e. Avatar in A and Avatar in C, and two sentiment words comfortable ($o_1$) and favorite ($o_2$) (refer to Figure 2 (a)). In order to rank this document, an overall score of the document $d_i$ is computed by a simple combination of the relevant score ($\text{Score}_{rel}$) and the opinion score ($\text{Score}_{op}$), e.g. an equal weighted linear combination, as follows.

$\text{Score}_{doc} = \text{Score}_{rel} + \text{Score}_{op}$

For simplicity, we let $\text{Score}_{rel} = tf \times idf$, and let $\text{Score}_{op}$ be computed by a lexicon-based method: $\text{Score}_{op} = \text{weight}_{comfortable} + \text{weight}_{favorite}$.

A. 阿凡达明日将在中国上映。
   Tomorrow, Avatar will be shown in China.
B. 我预订到了 IMAX 影院中最舒服的位子。
   I've reserved a comfortable seat in IMAX.
C. 阿凡达是我最喜欢的一部 3D 电影。
   Avatar is my favorite 3D movie.

Figure 1: A retrieved document $d_i$ on the target 'Avatar'.

Although the bag-of-words representation achieves good performance in retrieving relevant documents, our study shows that it cannot satisfy the information needs for retrieval of the relevant opinion. It suffers from the following limitations:

(1) It cannot maintain contextual information; thus, the fact that an opinion may not be related to the target of the retrieved document is neglected. In this example, only the opinion favorite ($o_2$) on Avatar in C is the relevant opinion. But due to loss of contextual information between the opinion and its corresponding target, Avatar in A and com-
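To make the Figure 1 example concrete, the following is a minimal sketch rather than the authors' implementation. It scores the document with the equal-weighted 2-stage combination described in Section 2.2 and then lists the sentence-level topic-sentiment word pairs. The lexicon weights (1.0 each), the `ASSUMED_IDF` value, the tokenizer, and the function names are all hypothetical choices made for illustration; the paper does not specify them in this section.

```python
import re

# English glosses of the three sentences in Figure 1.
SENTENCES = {
    "A": "Tomorrow, Avatar will be shown in China.",
    "B": "I've reserved a comfortable seat in IMAX.",
    "C": "Avatar is my favorite 3D movie.",
}

TOPIC_TERMS = {"avatar"}  # t1
# Hypothetical lexicon weights for o1 and o2 (not taken from the paper).
SENTIMENT_LEXICON = {"comfortable": 1.0, "favorite": 1.0}
ASSUMED_IDF = 1.0  # placeholder idf value, assumed for illustration


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words_score(sentences):
    """2-stage score over the whole document: Score_rel + Score_op."""
    tokens = [tok for text in sentences.values() for tok in tokenize(text)]
    tf = sum(tok in TOPIC_TERMS for tok in tokens)
    score_rel = tf * ASSUMED_IDF
    score_op = sum(SENTIMENT_LEXICON.get(tok, 0.0) for tok in tokens)
    return score_rel, score_op, score_rel + score_op


def word_pairs(sentences):
    """Pair each topic term with the sentiment words in the same sentence."""
    pairs = []
    for label, text in sentences.items():
        tokens = tokenize(text)
        topics = [tok for tok in tokens if tok in TOPIC_TERMS]
        opinions = [tok for tok in tokens if tok in SENTIMENT_LEXICON]
        pairs.extend((label, t, o) for t in topics for o in opinions)
    return pairs


if __name__ == "__main__":
    print(bag_of_words_score(SENTENCES))
    print(word_pairs(SENTENCES))
```

Under these assumptions, the bag-of-words score counts both comfortable and favorite toward the opinion score, even though comfortable in sentence B does not describe Avatar. The sentence-level pairing keeps only the pair formed by Avatar and favorite in sentence C, which is the relevant opinion identified in the text.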