Review Topic Discovery with Phrases using the Pólya Urn Model
Geli Fei, Zhiyuan Chen, Bing Liu, University of Illinois at Chicago
Presenter: Alan Akbik, IBM Research Almaden / Berlin Institute of Technology
Product Aspects
Large collection of product reviews
◦ Example domain: Smartphones
Task: Discover aspects that are being discussed in the reviews
◦ Battery - battery life, AAA batteries
  "The battery life of this smartphone is great." "It uses AAA batteries."
◦ Screen - screen size, touch screen
◦ Camera - resolution, image quality
Topic Models
Widely used in review topic / aspect discovery
Most models regard each topic as a distribution over individual terms (unigrams)
Terms in each document are assigned to topics
◦ Documents are assigned to topics via their terms
The generation of topics is mostly governed by "higher-order co-occurrence" (Heinrich 2009)
◦ i.e., how often words co-occur in different contexts
Topic Models
Major issue: individual words may not convey the same information as natural phrases
◦ e.g. "battery life" vs. "life"
This leads to three problems:
◦ Interpretability - topics are hard for users to interpret unless they are domain experts
◦ Ambiguity - hard to directly make use of the topical words
◦ False evidence - causes extra or wrong co-occurrences in topic generation, leading to poorer topics
Possible Solutions (1)
Treat each whole phrase as one term
"The battery life of this smartphone is great"
<the> <battery_life> <of> <this> <smartphone> <is> <great>
Problems:
◦ Many phrases are very rare
◦ Important words are removed: "battery life" may not end up in the same topic as "battery", because we no longer observe their co-occurrence
Possible Solutions (2)
Keep individual words and add extra terms for phrases
"The battery life of this smartphone is great"
<the> <battery> <life> <battery_life> <of> <this> <smartphone> <is> <great>
Problems:
◦ False evidence still exists
◦ Many phrases are rare: "battery life" is much less frequent than "life", so it is unlikely to be ranked at the top of a topic
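To make the two preprocessing choices concrete, here is a minimal sketch of both tokenization strategies, assuming the phrase list has already been extracted; all function names are illustrative only.

```python
def tokenize_phrase_as_term(sentence, phrases):
    """Solution (1): replace each detected phrase with a single term."""
    tokens = sentence.lower().split()
    out, i = [], 0
    while i < len(tokens):
        # greedily match the longest known phrase starting at position i
        match = None
        for phrase in phrases:
            words = phrase.split()
            if tokens[i:i + len(words)] == words:
                if match is None or len(words) > len(match):
                    match = words
        if match:
            out.append("_".join(match))
            i += len(match)
        else:
            out.append(tokens[i])
            i += 1
    return out

def tokenize_word_plus_phrase(sentence, phrases):
    """Solution (2): keep the component words and add each phrase as an extra term."""
    tokens = sentence.lower().split()
    extra = ["_".join(p.split()) for p in phrases if p in sentence.lower()]
    return tokens + extra

phrases = ["battery life"]
print(tokenize_phrase_as_term("The battery life of this smartphone is great", phrases))
print(tokenize_word_plus_phrase("The battery life of this smartphone is great", phrases))
```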
Challenge
How can we retain the connections between phrases and words while removing wrong co-occurrences?
Related Work
Using n-grams in topic modeling (Mukherjee and Liu 2013; Mukherjee et al. 2013)
Identifying key phrases in a post-processing step based on the discovered topical unigrams (Blei and Lafferty 2009; Liu et al. 2010; Zhao et al. 2011)
Directly modeling word order in the topic model (Wallach 2006; Wang et al. 2007)
◦ This breaks the "bag-of-words" assumption
◦ Although the "bag-of-words" assumption does not always hold, it offers a great computational advantage
◦ Our method still follows the "bag-of-words" assumption
Gibbs Sampling for LDA
One of the most commonly used inference techniques for topic models
◦ Considers each term in the documents in turn
◦ Samples a topic for the current term, conditioned on the topic assignments of all other terms
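A minimal sketch of one sweep of collapsed Gibbs sampling for LDA, for reference. The count arrays `n_dk` (document-topic), `n_kw` (topic-term), `n_k` (topic totals), the assignment structure `z`, and the symmetric priors `alpha`/`beta` are illustrative names, not the authors' code.

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, V, rng):
    """One pass over all terms: re-sample each term's topic assignment."""
    K = n_k.shape[0]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # remove the current term's assignment from the counts
            n_dk[d, k_old] -= 1; n_kw[k_old, w] -= 1; n_k[k_old] -= 1
            # conditional distribution over topics given all other assignments
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            # record the sampled topic and restore the counts
            z[d][i] = k_new
            n_dk[d, k_new] += 1; n_kw[k_new, w] += 1; n_k[k_new] += 1
```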
Simple Pólya Urn Model (SPU)
Designed in the context of colored balls and urns
In the context of topic models:
◦ A ball of a certain color: a term
◦ The urn: contains a mixture of balls of various colors (terms)
The topic-word (topic-term) distribution is reflected by the proportion of balls of a certain color in the urn
Simple Pólya Urn Model (SPU)
◦ Left: initial state
◦ Middle: draw a ball of a certain color
◦ Right: put two balls of the same color back
Self-reinforcing property known as "the rich get richer"
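A toy simulation of the simple Pólya urn, purely for illustration: each draw returns the ball plus one extra ball of the same color, so colors that are drawn early become more and more likely ("the rich get richer").

```python
import random

def simple_polya_urn(initial_colors, draws, seed=0):
    rng = random.Random(seed)
    urn = list(initial_colors)
    for _ in range(draws):
        ball = rng.choice(urn)   # draw a ball of a certain color
        urn.append(ball)         # put it back plus another ball of the same color
    return urn

urn = simple_polya_urn(["red", "blue", "green"], draws=100)
print({color: urn.count(color) for color in set(urn)})
```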
Generalized Pólya Urn Model (GPU)
GPU vs. SPU: besides putting back two balls of the drawn color, a certain number of balls of some other colors are also put into the urn
◦ We call this the promotion of those colored balls
Using this idea in the sampling process:
◦ SPU: seeing "staff" under a topic only increases the chance of seeing it again under the same topic
◦ GPU: it also increases the chance of seeing "hotel staff" under that topic
Generalized Pólya Urn Model (GPU)
In our application:
◦ We use each whole phrase as a single term to remove wrong co-occurrences
◦ We use the GPU model to regain the connection between phrases and words
Two directions of promotion (see the sketch below):
◦ Word to phrase: when a topic is assigned to an individual word, phrases containing that word are promoted
◦ Phrase to word: when a topic is assigned to a phrase, each of its component words is promoted
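A hedged sketch of how GPU promotion can be folded into the Gibbs sampler's count updates: when term `w` is assigned topic `k`, every related term `w_rel` also receives `promotion[w][w_rel]` pseudo-counts under `k`. The `promotion` table layout and the function name are assumptions for illustration; the matching decrement before re-sampling is omitted here.

```python
def increment(n_kw, n_k, k, w, promotion, amount=1.0):
    """Add `amount` counts of term w under topic k, plus promoted pseudo-counts."""
    n_kw[k, w] += amount
    n_k[k] += amount
    for w_rel, weight in promotion.get(w, {}).items():
        n_kw[k, w_rel] += amount * weight   # promote related phrases / component words
        n_k[k] += amount * weight
```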
Datasets and Preprocessing
Datasets:
◦ 30 categories of electronics reviews from Amazon (1,000 reviews in each category)
◦ Hotel reviews from TripAdvisor (101,234 reviews)
◦ Restaurant reviews from Yelp (25,459 reviews)
Preprocessing:
◦ Review sentences are treated as documents, since standard topic models cannot discover product aspects well when applied directly to whole reviews (Titov and McDonald, 2008)
◦ Rule-based method for noun phrase detection, chosen for efficiency (see the sketch below)
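A minimal sketch of a rule-based noun phrase detector using a POS-pattern chunker: sequences of optional adjectives followed by nouns are kept as phrases. The grammar is a generic pattern, not necessarily the authors' exact rules, and it assumes NLTK with its tokenizer and POS tagger data installed.

```python
import nltk

# chunk optional adjectives followed by one or more nouns, e.g. "battery life"
grammar = nltk.RegexpParser("NP: {<JJ>*<NN.*>+}")

def noun_phrases(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = grammar.parse(tagged)
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees()
            if subtree.label() == "NP" and len(subtree.leaves()) > 1]

print(noun_phrases("The battery life of this smartphone is great"))
# -> ['battery life']  (only multi-word chunks are kept as phrases)
```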
Experiments
Four sets of experiments on 32 domains
◦ Baseline #1, LDA(w): without considering phrases
◦ Baseline #2, LDA(p): considers phrases, uses each whole phrase as a term
◦ Baseline #3, LDA(w_p): considers phrases, keeps individual component words, and adds phrases as extra terms
◦ LDA(p_GPU): our proposed method
Parameter Setting
We use the same set of parameters for all experiments
◦ Dirichlet priors are set as in (Griffiths and Steyvers, 2004): document-topic prior β = 50/L, where L is the number of topics, and topic-term prior γ = 0.1
◦ Number of topics L = 15
◦ Posterior inference was drawn after 2,000 Gibbs sampling iterations with 400 iterations of burn-in
Parameters for GPU Model
Not all words in a phrase are equally important
◦ e.g. "staff" is more important than "hotel" in "hotel staff"
Determining head nouns
◦ Following (Wang et al., 2007), we take the last word of a noun phrase as its head noun
GPU promotion (see the sketch below)
◦ Word to phrase: promote a phrase by virtual count when a topic is assigned to its head noun
◦ Phrase to word: promote the head noun by 0.5 * virtual count and all other words by 0.25 * virtual count when a topic is assigned to a phrase
◦ virtual count = 0.1, set empirically based on how much to promote phrases
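A sketch of how the two promotion directions above could be encoded as a lookup table, using the last word of each phrase as its head noun. The virtual count and the 0.5/0.25 split follow the slide; the dictionary layout itself is an assumption.

```python
def build_promotion(phrases, virtual_count=0.1):
    promotion = {}
    for phrase in phrases:                      # e.g. "hotel_staff"
        words = phrase.split("_")
        head = words[-1]                        # head noun = last word
        # word -> phrase: assigning a topic to the head noun promotes the phrase
        promotion.setdefault(head, {})[phrase] = virtual_count
        # phrase -> word: assigning a topic to the phrase promotes its words
        promotion.setdefault(phrase, {})[head] = 0.5 * virtual_count
        for w in words[:-1]:
            promotion[phrase][w] = 0.25 * virtual_count
    return promotion

print(build_promotion(["hotel_staff", "battery_life"]))
```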
Statistical Evaluation
Two commonly used evaluation statistics:
◦ Perplexity: measures the likelihood of unseen documents
◦ KL-divergence: measures the distinctiveness of topics
◦ Neither of them correlates well with human judgments
We use topic coherence (Mimno et al. 2011), sketched below
◦ It measures the degree of co-occurrence of topical words under a topic
◦ It has been shown to correlate quite well with human judgment
◦ It produces a negative value; the higher, the better
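A sketch of the topic coherence of Mimno et al. (2011): for the top-N terms of a topic, sum log((co-document frequency + 1) / document frequency) over all term pairs. The `doc_freq` and `co_doc_freq` inputs are assumed to be precomputed from the corpus, with co-document frequencies keyed by term pairs.

```python
import math

def topic_coherence(top_terms, doc_freq, co_doc_freq):
    """UMass-style coherence; negative, and higher (closer to zero) is better."""
    score = 0.0
    for m in range(1, len(top_terms)):
        for l in range(m):
            v_m, v_l = top_terms[m], top_terms[l]
            score += math.log((co_doc_freq.get((v_m, v_l), 0) + 1)
                              / doc_freq[v_l])
    return score
```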
Statistical Evaluation Topic Coherence using top 15 topical terms
Statistical Evaluation Topic Coherence using top 30 topical terms
Human Evaluation
Done by two annotators in two sequential stages
◦ Topic labeling (Kappa score: 0.838)
◦ Topical term labeling, evaluated by computing precision@n (Kappa score: 0.846)
◦ We compute the average p@15 and p@30 for each model on each domain (see the sketch below)
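A minimal sketch of precision@n over the annotated topical terms: the fraction of the top-n terms that annotators judged correct for the topic. The 0/1 label encoding is an assumption.

```python
def precision_at_n(labels, n):
    """labels: 1 if the term was judged correct for the topic, 0 otherwise."""
    return sum(labels[:n]) / n

# e.g. 12 of the top 15 terms judged correct
print(precision_at_n([1] * 12 + [0] * 3, 15))   # -> 0.8
```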
Human Evaluation
Human evaluation on five domains
◦ Hotel, Restaurant, Watch, Tablet, MP3Player
Example Topics
Example topics by LDA(w) and LDA(p_GPU)
Future Work
◦ Design a topic quality metric for topics with phrases
◦ Systematically set the amount of promotion based on the designed metric
Thank You!