Discovering Coherent Topics Using General Knowledge
Zhiyuan (Brett) Chen, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh
http://www.cs.uic.edu/~zchen/
Topic Model [diagram: documents (Document 1 … Document M) linked to topics (Topic 1 … Topic T) by the topic model]
Coherent Topics. Example "Price" topic: cheap, expensive, cost, money, pricey, dollar.
Coherent Topics (contrast). Coherent topic: price, cheap, expensive, cost, money, pricey, dollar. Incoherent topic: price, family, cheap, expensive, politics, cost, size.
Issues of Unsupervised Topic Models: many topics are not coherent, and objective functions do not correlate well with human judgments (Chang et al., 2009).
Remedy: Knowledge-based Topic Models
Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009) with must-links (e.g., {picture, photo}) and cannot-links (e.g., {picture, price}).
Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009); seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012).
Knowledge Assumptions: knowledge is correct for a domain; knowledge is domain dependent.
Existing Model Flow [diagram, built up over several slides]
Our Proposed Model Flow [diagram]: the model takes General Knowledge as an additional input.
General Knowledge: domain independent; may be wrong for a particular domain.
Lexical Semantic Relations: synonyms {expensive, pricey} and antonyms {expensive, cheap} from WordNet; adjective-attribute {expensive, price} (Fei et al., 2012).
LR-Sets (Lexical Relation Sets). Example: {expensive, pricey, cheap, price}. Words in the same LR-set should appear in the same topic.
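For concreteness, here is a minimal sketch (not the authors' code) of how LR-sets could be assembled from WordNet synonym and antonym relations via NLTK; the adjective-attribute relations (Fei et al., 2012) and the paper's exact set-construction rules are not included, so treat the merging strategy as an assumption.

```python
# Sketch only: build candidate LR-sets from WordNet synonyms and antonyms.
# Requires the WordNet corpus (nltk.download('wordnet')). Adjective-attribute
# relations (Fei et al., 2012) are omitted in this sketch.
from nltk.corpus import wordnet as wn

def lr_sets(word):
    """Return candidate LR-sets for `word`: one set per WordNet sense,
    containing the sense's synonyms and their antonyms."""
    sets = []
    for synset in wn.synsets(word):
        members = {word.lower()}
        for lemma in synset.lemmas():
            members.add(lemma.name().lower())
            for ant in lemma.antonyms():
                members.add(ant.name().lower())
        if len(members) > 1:  # keep only sets that add information
            sets.append(members)
    return sets

# Example: LR-sets for "expensive" should include synonym/antonym sets
# such as {'expensive', 'pricey', ...} and {'expensive', 'cheap', ...}.
print(lr_sets("expensive"))
```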
Issues of LR-Sets: (1) no correct LR-set for a word; (2) partially wrong knowledge.
Issue 1: no correct LR-set for a word. Example: for "card", the available sets {card, menu} and {card, bill} can both be wrong in a given domain.
Issue 2: partially wrong knowledge. Example: {picture, pic, flick}, where part of the set may not fit a given domain.
Addressing Issues: (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Relaxing Wrong LR-sets: {card, menu} and {card, bill} are relaxed to {card}.
Estimate Knowledge: {picture, image} vs. {picture, painting}.
Word Distributions from LDA (word : probability): picture 0.20, image 0.15, photo 0.12, quality 0.10, resolution 0.05, …, painting 0.0002.
Estimate Word Correlation: use the LDA word distribution above to compare {picture, image} against {picture, painting}.
Word Correlation Matrix C: from the same distribution, the correlation of {picture, image} is 0.15 / 0.20 and that of {picture, painting} is 0.0002 / 0.20.
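Below is a minimal sketch of the ratio shown on the slide, assuming the correlation of a word pair within a topic is the ratio of the two words' topic probabilities; the exact normalization used in the paper may differ.

```python
# Sketch only: word-pair correlation as a probability ratio, per the slide.
# The paper's exact definition of the correlation matrix C may differ.
topic_word_prob = {
    "picture": 0.20, "image": 0.15, "photo": 0.12,
    "quality": 0.10, "resolution": 0.05, "painting": 0.0002,
}

def correlation(w1, w2, probs):
    """Correlation of w2 with w1 under one topic: P(w2) / P(w1)."""
    return probs.get(w2, 0.0) / probs[w1]

print(correlation("picture", "image", topic_word_prob))     # 0.75
print(correlation("picture", "painting", topic_word_prob))  # 0.001
```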
Quality of an LR-set s towards a word w, denoted Q(s, w).
Relaxing Wrong LR-sets: if Q(s1, "card") < ε for {card, menu} and Q(s2, "card") < ε for {card, bill}, both sets are relaxed to {card} for this word.
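A minimal sketch of this relaxation step follows, assuming (hypothetically) that Q(s, w) is the mean correlation between w and the other members of s; the paper's actual quality measure may be defined differently.

```python
# Sketch only: relax LR-sets whose quality towards a word falls below a
# threshold epsilon (the slide's criterion Q(s, w) < epsilon). The quality
# function used here (mean pairwise correlation) is an assumption, not
# necessarily the paper's definition.

def relax_lr_sets(lr_sets, word, corr, epsilon=0.1):
    """Keep LR-sets containing `word` with Q(s, word) >= epsilon; if none
    survive, fall back to the singleton {word}."""
    def quality(s):
        others = [v for v in s if v != word]
        return sum(corr(word, v) for v in others) / len(others)

    kept = [s for s in lr_sets if word in s and quality(s) >= epsilon]
    return kept if kept else [{word}]

# Toy correlations for the 'card' example on the slide (made-up numbers):
toy_corr = {("card", "menu"): 0.01, ("card", "bill"): 0.02}
corr = lambda a, b: toy_corr.get((a, b), 0.0)

print(relax_lr_sets([{"card", "menu"}, {"card", "bill"}], "card", corr))
# -> [{'card'}]  (both sets fall below epsilon, so only {card} remains)
```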
Addressing Issues (continued): (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Simple Pólya Urn Model (SPU) [animation]: a drawn ball is returned to the urn together with another ball of the same color. The richer get richer!
Interpreting LDA Under SPU [animation]: when "picture" is assigned to Topic 0, another "picture" ball is added to Topic 0, so only that word's own count grows.
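As an illustration (not the authors' code), here is a minimal sketch of the SPU-style count update inside collapsed Gibbs sampling for LDA: assigning a word to a topic increments only that word's own count.

```python
# Sketch only: SPU-style update in collapsed Gibbs sampling for LDA.
# Assigning word w to topic z adds one more "ball" of w to topic z.
from collections import defaultdict

topic_word_count = defaultdict(lambda: defaultdict(float))

def spu_update(z, w, count=topic_word_count):
    count[z][w] += 1.0  # only w itself gets richer in topic z

spu_update(0, "picture")
spu_update(0, "picture")
print(dict(topic_word_count[0]))  # {'picture': 2.0}
```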
Generalized Pólya Urn Model (GPU) [animation]: drawing a ball of one color can also add balls of other, related colors.
Applying GPU [animation]: when "picture" is assigned to Topic 0, related words ("image", "painting") are also added to Topic 0, weighted by the word correlation matrix.
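A minimal sketch of this GPU-style update follows, assuming promotion weights are taken directly from the word correlation matrix C of the earlier slide; the paper's exact promotion scheme is not shown here, so this is only illustrative.

```python
# Sketch only: GPU-style update. Assigning word w to topic z also promotes
# correlated words, weighted by a correlation matrix C (the exact weights
# are an assumption; the paper may scale or threshold them differently).
from collections import defaultdict

C = {  # C[w][v]: how much v is promoted when w is drawn (toy values)
    "picture": {"image": 0.75, "painting": 0.001},
}

topic_word_count = defaultdict(lambda: defaultdict(float))

def gpu_update(z, w, count=topic_word_count, corr=C):
    count[z][w] += 1.0                 # the drawn word itself
    for v, weight in corr.get(w, {}).items():
        count[z][v] += weight          # correlated words get a share

gpu_update(0, "picture")
print(dict(topic_word_count[0]))
# {'picture': 1.0, 'image': 0.75, 'painting': 0.001}
```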
Addressing Issues (recap): (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Evaluation
Evaluation: four domains; KL-divergence evaluation; topic coherence; human evaluation.
Model Comparison: LDA (Blei et al., 2003); LDA-GPU (Mimno et al., 2011); DF-LDA (Andrzejewski et al., 2009); MDK-LDA (Chen et al., 2013); GK-LDA (this work).
KL-Divergence
Topic Coherence (#T = 15)
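The slide shows only a results chart; for reference, the topic coherence measure commonly used in this line of work is that of Mimno et al. (2011), sketched below under the assumption that it is the metric reported here.

```python
# Sketch only: topic coherence in the style of Mimno et al. (2011),
# assumed to be the metric behind the slide's chart.
# coherence(t) = sum_{m=2..M} sum_{l=1..m-1} log((D(v_m, v_l) + 1) / D(v_l))
# where D(v) is the document frequency of word v and D(v, v') the
# co-document frequency, over the top-M words v_1..v_M of topic t.
import math

def topic_coherence(top_words, doc_freq, co_doc_freq):
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            v_m, v_l = top_words[m], top_words[l]
            score += math.log((co_doc_freq.get((v_m, v_l), 0) + 1)
                              / doc_freq[v_l])
    return score
```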
Human Evaluation
Example Topics [table of example topics with their top words]
Conclusions: discovering coherent topics using general knowledge. Issues addressed: (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Datasets: http://www.cs.uic.edu/~zchen/