1. Review Topic Discovery with Phrases using the Pólya Urn Model
Geli Fei, Zhiyuan Chen, Bing Liu
University of Illinois at Chicago
Presenter: Alan Akbik
IBM Research Almaden / Berlin Institute of Technology

2. Product Aspects
• Large collection of product reviews
  ◦ Example domain: smartphones
• Task: discover the aspects that are discussed in the reviews
  ◦ Battery: battery life, AAA batteries
    - “The battery life of this smartphone is great.”
    - “It uses AAA batteries.”
  ◦ Screen: screen size, touch screen
  ◦ Camera: resolution, image quality

3. Topic Models
• Widely used in review topic/aspect discovery
• Most models regard each topic as a distribution over individual terms (unigrams)
• Terms in each document are assigned to topics
  ◦ Documents are assigned to topics via their terms
• The generation of topics is mostly governed by “higher-order co-occurrence” (Heinrich 2009)
  ◦ i.e., how often words co-occur in different contexts

4. Topic Models
• Major issue: individual words may not convey the same information as natural phrases
  ◦ e.g., “battery life” vs. “life”
• This leads to three problems:
  ◦ Interpretability: topics are hard for users to interpret unless they are domain experts
  ◦ Ambiguity: the topical words are hard to use directly
  ◦ False evidence: extra or wrong co-occurrences in topic generation lead to poorer topics

5. Possible Solutions (1)
• Treat each whole phrase as one term
  ◦ “The battery life of this smartphone is great” is tokenized as <the> <battery_life> <of> <this> <smartphone> <is> <great>
• Problems:
  ◦ Many phrases are very rare
  ◦ Important words are removed
    - “battery life” may not end up in the same topic as “battery”, because their co-occurrence is no longer observed
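
The sketch below illustrates the first option. It rewrites a tokenized sentence so that each known phrase becomes one joined term. The phrase set and the name `merge_phrases` are hypothetical. The deck detects phrases with a rule-based noun phrase detector, which is not reproduced here, and longest-match-first merging is an assumption.

```python
def merge_phrases(tokens, phrases):
    """Replace every known multi-word phrase with one joined term (longest match first)."""
    max_len = max(len(p) for p in phrases)
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if len(candidate) == n and candidate in phrases:
                out.append("_".join(candidate))  # the phrase becomes a single term
                i += n
                break
        else:  # no known phrase starts at position i
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical phrase list; real phrases would come from a noun phrase detector.
phrases = {("battery", "life")}
tokens = merge_phrases("the battery life of this smartphone is great".split(), phrases)
print(tokens)
# ['the', 'battery_life', 'of', 'this', 'smartphone', 'is', 'great']
```

Note that "battery" and "life" no longer appear on their own, so nothing links "battery_life" to other mentions of "battery".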

6. Possible Solutions (2)
• Keep the individual words and add extra terms for phrases
  ◦ “The battery life of this smartphone is great” is tokenized as <the> <battery> <life> <battery_life> <of> <this> <smartphone> <is> <great>
• Problems:
  ◦ False evidence still exists
  ◦ Many phrases are rare
    - “battery life” is much less frequent than “life”, so it is unlikely to be ranked at the top of a topic
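
The second option can be sketched the same way, under the same assumptions: a hypothetical phrase set and longest-match-first matching. Every component word is kept, and the joined phrase term is appended right after its words.

```python
def add_phrase_terms(tokens, phrases):
    """Keep every word and append one extra joined term after each known phrase."""
    max_len = max(len(p) for p in phrases)
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if len(candidate) == n and candidate in phrases:
                out.extend(candidate)            # component words are kept
                out.append("_".join(candidate))  # plus the phrase as an extra term
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


phrases = {("battery", "life")}  # hypothetical phrase list
tokens = add_phrase_terms("the battery life of this smartphone is great".split(), phrases)
print(tokens)
# ['the', 'battery', 'life', 'battery_life', 'of', 'this', 'smartphone', 'is', 'great']
```

In the output, “life” still co-occurs with every other word in the sentence. That is the false evidence this slide refers to.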

7. Challenge
How can we retain the connections between phrases and words while removing wrong co-occurrences?

8. Related Work
• Using n-grams in topic modeling (Mukherjee and Liu 2013; Mukherjee et al. 2013)
• Identifying key phrases in a post-processing step based on the discovered topical unigrams (Blei and Lafferty 2009; Liu et al. 2010; Zhao et al. 2011)
• Directly modeling word order in topic models (Wallach 2006; Wang et al. 2007)
  ◦ Breaks the “bag-of-words” assumption
  ◦ Although the “bag-of-words” assumption does not always hold, it offers a great computational advantage
  ◦ Our method still follows the “bag-of-words” assumption

9. Gibbs Sampling for LDA
• One of the most commonly used inference techniques for topic models
• Considers each term in the documents in turn
• Samples a topic for the current term, conditioned on the topic assignments of all other terms
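
For reference, here is a minimal collapsed Gibbs sampler for plain LDA, following the standard formulation of Griffiths and Steyvers (2004). Function and argument names are illustrative, documents are assumed to be lists of integer term ids, and nothing here is taken from the authors' code. Each step does three things:
• removes the current term from the count tables
• samples a topic with probability proportional to (n_dk + β)(n_kw + γ) / (n_k + Vγ), where β is the document-topic prior and γ the topic-term prior
• adds the term back under the sampled topic

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, doc_topic_prior, topic_term_prior,
              iterations, seed=0):
    """Collapsed Gibbs sampling for LDA (minimal, unoptimized sketch).

    docs: list of documents, each a list of integer term ids in range(vocab_size).
    Returns the topic assignment of every term plus the count tables.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # terms in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times term w is assigned to topic k
    n_k = [0] * K                       # terms assigned to topic k overall

    # Start from random topic assignments.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current term's assignment from the counts ...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... sample a topic conditioned on all other assignments ...
                weights = [
                    (n_dk[d][t] + doc_topic_prior)
                    * (n_kw[t][w] + topic_term_prior)
                    / (n_k[t] + V * topic_term_prior)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # ... and add the new assignment back.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw, n_k


docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
z, n_dk, n_kw, n_k = gibbs_lda(docs, num_topics=2, vocab_size=4,
                               doc_topic_prior=0.5, topic_term_prior=0.1,
                               iterations=50)
```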

10. Simple Pólya Urn Model (SPU)
• Designed in the context of colored balls and urns
• In the context of topic models:
  ◦ A ball of a certain color: a term
  ◦ The urn: contains a mixture of balls of various colors (terms)
• The topic-word (topic-term) distribution is reflected by the proportion of balls of each color in the urn

11. Simple Pólya Urn Model (SPU)
• Left: initial state
• Middle: draw a ball of a certain color
• Right: put two balls of that color back
• Self-reinforcing property known as “the rich get richer”
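
The urn dynamics on this slide can be simulated in a few lines. This is an illustrative sketch; `simple_polya_urn` and the two-color starting urn are hypothetical.

```python
import random
from collections import Counter


def simple_polya_urn(initial, draws, seed=0):
    """Simulate a simple Pólya urn.

    initial: mapping color -> number of balls.
    Each draw picks a ball with probability proportional to its color's count,
    then puts it back together with one more ball of the same color.
    """
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(draws):
        colors = list(urn)
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        urn[color] += 1  # two balls go back for the one drawn: net gain of one
    return urn


urn = simple_polya_urn({"red": 1, "blue": 1}, draws=100)
print(urn)
```

The extra ball amplifies early draws, so the final color proportions differ noticeably from one seed to another. This is the rich-get-richer effect.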

12. Generalized Pólya Urn Model (GPU)
• GPU vs. SPU: besides the two balls of the drawn color being put back, a certain number of balls of some other colors are also put into the urn
• We call this the promotion of these colored balls
• Applying the idea in the sampling process:
  ◦ SPU: seeing “staff” under a topic only increases the chance of seeing it again under the same topic
  ◦ GPU: it also increases the chance of seeing “hotel staff” under that topic
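
A generalized urn differs from the simple one only in the extra balls added after each draw. In this sketch the names and numbers are hypothetical. `promotion` maps a drawn color to the other colors it promotes and the number of extra balls each receives. Fractional amounts are allowed.

```python
import random
from collections import defaultdict


def generalized_polya_urn(initial, promotion, draws, seed=0):
    """Simulate a generalized Pólya urn.

    initial: mapping color -> number of balls (may be fractional).
    promotion: mapping color -> {other color: extra balls added when color is drawn}.
    """
    rng = random.Random(seed)
    urn = defaultdict(float, initial)
    for _ in range(draws):
        colors = list(urn)
        color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
        urn[color] += 1  # as in the SPU: the drawn ball returns with one copy
        for other, extra in promotion.get(color, {}).items():
            urn[other] += extra  # promotion of related colors
    return dict(urn)


# Hypothetical numbers: every draw of "staff" adds half a "hotel staff" ball.
urn = generalized_polya_urn({"staff": 1.0}, {"staff": {"hotel staff": 0.5}}, draws=1)
print(urn)  # {'staff': 2.0, 'hotel staff': 0.5}
```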

13. Generalized Pólya Urn Model (GPU)
• In our application:
  ◦ We use each whole phrase as a term to remove wrong co-occurrences
  ◦ And we use the GPU to regain the connection between phrases and words
• Two directions of promotion:
  ◦ Word to phrase: when a topic is assigned to an individual word, the phrases containing that word are promoted
  ◦ Phrase to word: when a topic is assigned to a phrase, each of its component words is promoted
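
In a Gibbs sampler, a topic's urn corresponds to its row of topic-term counts. Promotion can therefore be implemented as follows:
• whenever a term is assigned to a topic, add extra (virtual) counts for the related terms
• when the assignment is removed, subtract the same amounts

The sketch below assumes this implementation and uses one hypothetical promotion amount for both directions. `build_promotion` and `add_assignment` are illustrative names, not the authors' code.

```python
from collections import defaultdict


def build_promotion(phrases, amount):
    """Two-direction promotion relation between words and phrase terms.

    phrases: iterable of phrases, each a tuple of component words.
    amount: one hypothetical promotion value used for both directions.
    """
    promo = defaultdict(dict)
    for words in phrases:
        phrase = "_".join(words)
        for w in words:
            promo[w][phrase] = amount  # word to phrase
            promo[phrase][w] = amount  # phrase to word
    return promo


def add_assignment(topic, term, n_kw, n_k, promo, sign=1.0):
    """Update topic-term counts when `term` is assigned to (sign=1) or
    removed from (sign=-1) `topic`, including the GPU promotion."""
    n_kw[topic][term] += sign
    n_k[topic] += sign
    for other, amount in promo.get(term, {}).items():
        n_kw[topic][other] += sign * amount
        n_k[topic] += sign * amount


promo = build_promotion([("battery", "life")], amount=0.1)
n_kw = defaultdict(lambda: defaultdict(float))
n_k = defaultdict(float)
add_assignment(0, "battery", n_kw, n_k, promo)       # word to phrase
add_assignment(0, "battery_life", n_kw, n_k, promo)  # phrase to word
```

Because removal subtracts exactly what assignment added, the counts stay consistent across sampling sweeps.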

14. Datasets and Preprocessing
• Datasets:
  ◦ 30 categories of electronics reviews from Amazon (1,000 reviews per category)
  ◦ Hotel reviews from TripAdvisor (101,234 reviews)
  ◦ Restaurant reviews from Yelp (25,459 reviews)
• Preprocessing:
  ◦ Review sentences are used as documents
    - Standard topic models cannot discover product aspects well when applied directly to whole reviews (Titov and McDonald, 2008)
  ◦ Rule-based noun phrase detection
    - A rule-based method is used for efficiency

15. Experiments
• Four sets of experiments on 32 domains:
  ◦ Baseline #1, LDA(w): does not consider phrases
  ◦ Baseline #2, LDA(p): considers phrases; uses each whole phrase as a term
  ◦ Baseline #3, LDA(w_p): considers phrases; keeps the individual component words and adds phrases as extra terms
  ◦ LDA(p_GPU): our proposed method

16. Parameter Setting
• The same set of parameters is used for all experiments
  ◦ Dirichlet priors are set as in (Griffiths and Steyvers, 2004)
    - Document-topic prior β = 50/L, where L is the number of topics
    - Topic-term prior γ = 0.1
  ◦ Number of topics L = 15
  ◦ Posterior inference is drawn after 2,000 Gibbs sampling iterations, with 400 burn-in iterations
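
The settings on this slide can be collected into a small configuration object, with β derived from L rather than stored separately. The class and field names are hypothetical; the values are the ones listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LDAConfig:
    num_topics: int = 15            # L
    topic_term_prior: float = 0.1   # γ
    iterations: int = 2000          # Gibbs sampling iterations
    burn_in: int = 400              # burn-in iterations

    @property
    def doc_topic_prior(self) -> float:
        # β = 50 / L, following Griffiths and Steyvers (2004)
        return 50 / self.num_topics


cfg = LDAConfig()
print(cfg.doc_topic_prior)  # about 3.33 for L = 15
```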

17. Parameters for the GPU Model
• Not all words in a phrase are equally important
  ◦ e.g., “staff” is more important than “hotel” in “hotel staff”
• Determine head nouns
  ◦ Following (Wang et al., 2007), we assume that the last word of a noun phrase is its head noun
• GPU promotion
  ◦ Word to phrase: when a topic is assigned to a phrase's head noun, the phrase is promoted by virtualcount
  ◦ Phrase to word: when a topic is assigned to a phrase, its head noun is promoted by 0.5 * virtualcount and every other word by 0.25 * virtualcount
  ◦ virtualcount = 0.1, set empirically based on how much phrases should be promoted
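
The sketch below shows how these promotion amounts could be tabulated. It assumes two conventions that are hypothetical: phrases arrive as tuples of words, and phrase terms are the words joined by "_". The head-noun rule and the 0.1, 0.5 and 0.25 factors come from this slide.

```python
from collections import defaultdict

VIRTUAL_COUNT = 0.1  # value reported on this slide


def head_noun(words):
    # Following Wang et al. (2007): the last word of a noun phrase is the head noun.
    return words[-1]


def build_promotion(phrases, virtual_count=VIRTUAL_COUNT):
    """Promotion amounts between words and phrase terms (phrase term = words joined by "_")."""
    promo = defaultdict(dict)
    for words in phrases:
        phrase = "_".join(words)
        head = head_noun(words)
        # Word to phrase: only the head noun promotes the phrase, by virtualcount.
        promo[head][phrase] = virtual_count
        # Phrase to word: 0.5 * virtualcount to the head, 0.25 * virtualcount to the others.
        for w in words:
            promo[phrase][w] = (0.5 if w == head else 0.25) * virtual_count
    return promo


promo = build_promotion([("hotel", "staff")])
print(dict(promo))
```

Here "staff" promotes "hotel_staff", but "hotel" promotes nothing, since it is not the head noun.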

18. Statistical Evaluation
• Two commonly used evaluation statistics:
  ◦ Perplexity: measures the likelihood of unseen documents
  ◦ KL-divergence: measures the distinctiveness of topics
  ◦ Neither correlates well with human judgments
• We use topic coherence (Mimno et al. 2011)
  ◦ It measures the degree of co-occurrence of a topic's top words
  ◦ Has been shown to correlate well with human judgments
  ◦ Typically yields a negative value; the higher, the better
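
The coherence score of Mimno et al. (2011) is a sum over every pair of top terms. Each pair contributes the log of (co-document frequency + 1) divided by the document frequency of the higher-ranked term. Below is a direct, unoptimized sketch. The function name and toy documents are hypothetical, and each top term is assumed to occur in at least one document, since otherwise the denominator is zero.

```python
import math


def topic_coherence(top_terms, documents):
    """Topic coherence of Mimno et al. (2011).

    top_terms: the topic's top M terms, most probable first.
    documents: iterable of documents, each an iterable of terms.
    Computes the sum over m = 2..M, l = 1..m-1 of log((D(v_m, v_l) + 1) / D(v_l)),
    where D(v) is the number of documents containing v and D(v, v') the number
    containing both.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*terms):
        return sum(1 for s in doc_sets if all(t in s for t in terms))

    score = 0.0
    for m in range(1, len(top_terms)):
        for prev in top_terms[:m]:  # every higher-ranked term v_l
            co = doc_freq(top_terms[m], prev)
            score += math.log((co + 1) / doc_freq(prev))
    return score


docs = [["battery", "life"], ["battery", "charger"], ["screen", "size"]]
score = topic_coherence(["battery", "life", "screen"], docs)
print(score)  # log(0.5), about -0.693
```

A pair contributes a positive value only when every document containing v_l also contains v_m. In practice that is rare, which is why the total is usually negative.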

19. Statistical Evaluation
Topic coherence using the top 15 topical terms

20. Statistical Evaluation
Topic coherence using the top 30 topical terms

21. Human Evaluation
• Done by two annotators in two sequential stages:
  ◦ Topic labeling (Kappa score: 0.838)
  ◦ Topical term labeling, used to compute precision@n (Kappa score: 0.846)
  ◦ We compute the average p@15 and p@30 for each model on each domain
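
The two quantities on this slide can be computed as follows. The sketch makes two assumptions. First, "Kappa score" is taken to mean Cohen's kappa between the two annotators; the deck does not name the variant. Second, each topical term is assumed to get a binary correct/incorrect judgment. Function names and toy labels are hypothetical.

```python
def precision_at_n(judgments, n):
    """precision@n: fraction of a topic's top-n terms judged correct.

    judgments: booleans for the topic's ranked terms, best first (at least n of them).
    """
    return sum(judgments[:n]) / n


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (observed - expected) / (1 - expected)


p4 = precision_at_n([True, True, False, True, False], 4)   # 0.75
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])           # 0.5
```

Averaging `precision_at_n` over a model's topics gives per-domain p@15 and p@30 values.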

22. Human Evaluation
• Human evaluation on five domains
  ◦ Hotel, Restaurant, Watch, Tablet, MP3Player

23. Example Topics
• Example topics by LDA(w) and LDA(p_GPU)

24. Future Work
• Design a topic quality metric for topics with phrases
• Systematically set the amount of promotion based on that metric

