Exploiting Domain Knowledge in Aspect Extraction
Meichun Hsu, Zhiyuan (Brett) Chen, Malu Castellanos, Arjun Mukherjee, Riddhiman Ghosh, Bing Liu
Aspect Extraction: extracting aspect terms
Aspect Terms: "This camera takes beautiful pictures but its price is higher than $200." (aspect terms: pictures, price)
Aspect Extraction: (1) extracting aspect terms; (2) clustering the terms into categories
Clustering: Aspect 1 = {picture, photo, image}; Aspect 2 = {price, cost, money}
Existing Work
For extracting only: word frequency + syntactic dependency (e.g., Hu and Liu, 2004); supervised sequence labeling/classification (e.g., Liu, Hu and Cheng, 2005)
For clustering only: grouping aspect terms (e.g., Zhai et al., 2010)
For both extracting and clustering: topic models (e.g., Mukherjee and Liu, 2012; Kim et al., 2013; Lazaridou et al., 2013; Lin and He, 2009; Lu and Zhai, 2008; Moghaddam and Ester, 2011; Sauper et al., 2011; Titov and McDonald, 2008)
Issues of Unsupervised Topic Models: many of the discovered aspects/topics are not meaningful, and the models' objective functions do not correlate well with human judgments (Chang et al., 2009).
Remedy: Knowledge-based Topic Models
Knowledge-based Topic Models: DF-LDA and seeded models; open issues: multiple senses, adverse effect of knowledge, utilizing “cannot” knowledge
Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009) takes must-links, e.g., (picture, photo), whose words should share a topic, and cannot-links, e.g., (picture, price), whose words should not.
Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009); seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012)
Knowledge-based Topic Models: multiple senses. The word “light” has two senses, giving the must-links {light, bright} and {light, heavy}; enforcing transitivity merges them into {light, bright, heavy}, mixing the two senses.
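To make the transitivity problem concrete, here is a small Python sketch that groups words connected by must-link pairs with union-find, which is what taking the transitive closure amounts to. The function `merge_must_links` is hypothetical and only illustrates transitive grouping; it is not DF-LDA's Dirichlet-forest construction.

```python
def merge_must_links(pairs):
    """Group words linked by must-link pairs, taking the transitive closure
    with a union-find structure (illustrative only)."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    return list(groups.values())


# Two must-links that share "light" but express different senses.
must_links = [("light", "bright"), ("light", "heavy")]
print(merge_must_links(must_links))  # a single group holding light, bright, heavy
```

Because both pairs share “light”, the closure puts “bright” and “heavy” in one group even though they belong to different senses.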
Knowledge-based Topic Models: adverse effect of knowledge (figure: the knowledge {price, cost} shown against example topic word lists containing price, cost, cheap, pricy and color)
Knowledge-based Topic Models: utilizing “cannot” knowledge (figure: with the cannot-set {amazon, price}, words such as amazon, review, shipping and order are separated from price, expensive, money and cheap)
Addressing Issues: multiple senses; adverse effect of knowledge; utilizing “cannot” knowledge
Addressing Issues: multiple senses (adding variable s); adverse effect (GPU model); utilizing “cannot” knowledge (E-GPU model)
M-Set and C-Set: a must-set such as {price, cost, money} holds words that should share a topic; a cannot-set such as {price, color, size} holds words that should not. Transitivity is not enforced.
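One way to hold M-sets and C-sets without enforcing transitivity is to keep each set exactly as given and index which sets a word belongs to; two words are then linked only if they co-occur in some set. The extra sets beyond the slide's examples and the helper names `index_sets` and `share_set` are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative knowledge (hypothetical lists). Overlapping must-sets are kept
# as given and never merged, so transitivity is not enforced.
must_sets = [{"price", "cost", "money"}, {"light", "bright"}, {"light", "heavy"}]
cannot_sets = [{"price", "color", "size"}]


def index_sets(sets):
    """Map each word to the indices of the sets that contain it."""
    index = defaultdict(list)
    for i, s in enumerate(sets):
        for word in s:
            index[word].append(i)
    return index


must_index = index_sets(must_sets)
cannot_index = index_sets(cannot_sets)


def share_set(a, b, index):
    """True if words a and b appear together in at least one set."""
    return bool(set(index.get(a, [])) & set(index.get(b, [])))


print(share_set("light", "bright", must_index))   # True
print(share_set("bright", "heavy", must_index))   # False: no transitivity
print(share_set("price", "color", cannot_index))  # True
```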
Addressing First Issue: multiple senses, handled by adding variable s
MDK-LDA (Chen et al., IJCAI 2013): the senses of “light” are kept in separate sets, S1: {light, heavy, weight} and S2: {light, bright, luminance}.
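A toy generative sketch may help show what the added variable s does: a topic first picks an s-set, and the s-set then emits a word, so the two senses of “light” live in different s-sets instead of being merged. The three-level structure follows our reading of MDK-LDA; all distributions are made-up toy values, and keying word distributions by (topic, s-set) is an assumption.

```python
import random

# Hypothetical s-sets from the slide: each groups words that share one sense,
# so "light" appears in two s-sets.
s_sets = {
    "S1": ["light", "heavy", "weight"],
    "S2": ["light", "bright", "luminance"],
}


def draw(dist, rng):
    """Sample a key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate_token(theta_d, psi, eta, rng):
    """One token in an MDK-LDA-style process: topic z, then s-set s, then word w.

    theta_d: {topic: prob}, psi: {topic: {s_set: prob}},
    eta: {(topic, s_set): {word: prob}} (all toy values here).
    """
    z = draw(theta_d, rng)
    s = draw(psi[z], rng)
    w = draw(eta[(z, s)], rng)
    return z, s, w


theta_d = {0: 0.5, 1: 0.5}
psi = {0: {"S1": 0.9, "S2": 0.1}, 1: {"S1": 0.1, "S2": 0.9}}
eta = {(z, s): {w: 1 / len(words) for w in words}
       for z in psi for s, words in s_sets.items()}

rng = random.Random(42)
for _ in range(3):
    print(generate_token(theta_d, psi, eta, rng))
```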
Addressing Second Issue: adverse effect of knowledge, handled by the GPU model
Simple Pólya Urn Model (SPU): a ball is drawn from the urn and returned together with a new ball of the same color, so the rich get richer.
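The SPU step can be simulated in a few lines; this sketch assumes balls are drawn with probability proportional to their color's count, and `spu_draw` is a hypothetical helper name.

```python
import random
from collections import Counter


def spu_draw(urn, rng):
    """Simple Pólya urn step: draw a ball with probability proportional to the
    counts, then put it back together with one new ball of the same color."""
    colors = list(urn)
    color = rng.choices(colors, weights=[urn[c] for c in colors])[0]
    urn[color] += 1  # the drawn ball is returned, plus one extra copy
    return color


rng = random.Random(0)
urn = Counter({"red": 1, "blue": 1})
for _ in range(1000):
    spu_draw(urn, rng)
print(urn)  # early leads tend to persist: the rich get richer
```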
Interpreting LDA Under SPU: each topic is an urn of word balls; when “price” is drawn from Topic 0, the ball goes back with another “price”, so Topic 0 now holds two.
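Under this reading, the urn balls are the topic-word counts of collapsed Gibbs sampling. The sketch below uses the standard LDA conditional p(z = t | rest) ∝ (n_dt + α)(n_tw + β) / (n_t + Vβ) with made-up counts and hyperparameters, to show that adding one more “price” ball to Topic 0 raises the chance that the next “price” token is assigned there.

```python
def topic_conditional(doc_topic, topic_word, topic_total, word,
                      alpha, beta, vocab_size):
    """Collapsed Gibbs conditional p(z = t | rest) for one token of `word` in
    standard LDA. Counts act as the balls in each topic's urn."""
    weights = [
        (doc_topic[t] + alpha)
        * (topic_word[t].get(word, 0) + beta)
        / (topic_total[t] + vocab_size * beta)
        for t in range(len(doc_topic))
    ]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy counts for two topics (hypothetical numbers).
doc_topic = [2, 2]
topic_word = [{"price": 3}, {"price": 1}]
topic_total = [10, 10]
before = topic_conditional(doc_topic, topic_word, topic_total, "price", 0.1, 0.01, 100)

# Assigning one "price" token to Topic 0 adds one more "price" ball to that urn.
doc_topic[0] += 1
topic_word[0]["price"] += 1
topic_total[0] += 1
after = topic_conditional(doc_topic, topic_word, topic_total, "price", 0.1, 0.01, 100)
print(before[0], after[0])  # Topic 0 becomes more likely for "price"
```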
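Generalized Pólya Urn Model (GPU): a drawn ball is returned together with a new ball of the same color and a certain number of balls of similar colors.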
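A GPU step differs from the simple urn only in what goes back: a promotion table says how many balls of each color to add for the drawn color. The colors, the 0.5 promotion weight and the helper `gpu_draw` below are hypothetical.

```python
import random
from collections import defaultdict


def gpu_draw(urn, promotion, rng):
    """Generalized Pólya urn step: draw a ball in proportion to the counts,
    return it, and add promotion[drawn][c] balls of each color c. Colors
    without a promotion row behave like the simple urn (one extra copy)."""
    colors = [c for c in urn if urn[c] > 0]
    drawn = rng.choices(colors, weights=[urn[c] for c in colors])[0]
    for color, amount in promotion.get(drawn, {drawn: 1.0}).items():
        urn[color] += amount
    return drawn


# Hypothetical promotion weights: red and pink count as "similar" colors.
promotion = {
    "red": {"red": 1.0, "pink": 0.5},
    "pink": {"pink": 1.0, "red": 0.5},
}
urn = defaultdict(float, {"red": 1.0, "pink": 1.0, "blue": 1.0})
rng = random.Random(7)
for _ in range(5):
    print(gpu_draw(urn, promotion, rng), dict(urn))
```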
Applying GPU: when “price” is drawn from Topic 0, it goes back with another “price” plus “money” and “cost”, the words sharing its must-set.
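Applied to topic modeling, the promotion table can be built from must-sets so that assigning “price” to Topic 0 also adds fractional counts of “money” and “cost”. This sketch assumes a single promotion amount `mu` (0.3 here, an illustrative value not taken from the slides); subtracting the same amounts when a token is unassigned is likewise an assumption about how the update sits inside a Gibbs sampler. `build_promotion` and `update_counts` are hypothetical names.

```python
from collections import defaultdict


def build_promotion(must_sets, mu):
    """GPU promotion weights from must-sets: a word adds 1 ball of itself and
    `mu` balls of every word sharing a must-set with it."""
    promo = defaultdict(dict)
    for s in must_sets:
        for w in s:
            promo[w][w] = 1.0
            for v in s - {w}:
                promo[w][v] = max(promo[w].get(v, 0.0), mu)
    return promo


def update_counts(topic_word, topic, word, promo, sign=1):
    """Add (sign=1) or remove (sign=-1) one token of `word` in `topic`,
    promoting its must-set partners; unlisted words change by 1 only."""
    for v, amount in promo.get(word, {word: 1.0}).items():
        topic_word[topic][v] += sign * amount


topic_word = defaultdict(lambda: defaultdict(float))
promo = build_promotion([{"price", "cost", "money"}], mu=0.3)  # mu is illustrative
update_counts(topic_word, 0, "price", promo)
print(dict(topic_word[0]))  # price 1.0, cost 0.3, money 0.3 (order may vary)
```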
Addressing Third Issue: utilizing “cannot” knowledge, handled by the E-GPU model
Our Proposed E-GPU Model (urns: Topic 0, Topic 1, Topic 2)
E-GPU Model: “price” is drawn from Topic 0 and, as under GPU, goes back with another “price” plus “money” and “cost”.
E-GPU Model: cannot-set {price, color}; “color” counts in Topics 0, 1, 2 are 8, 1, 1.
E-GPU Model: cannot-set {price, amazon}; “amazon” counts in Topics 0, 1, 2 are 0, 10, 0, and a new Topic 3 is created.
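The slides suggest that E-GPU reacts to cannot-sets by moving balls of a conflicting word (e.g., “color” when “price” dominates Topic 0) to other urns, and by creating a new urn (Topic 3) when no existing one is free of conflict. The sketch below encodes one such reading; the transfer amount, the rule for picking the target urn, and all names (`egpu_transfer`, `cannot_partners`, `make_counts`) are assumptions for illustration, not the paper's exact sampler.

```python
from collections import defaultdict


def cannot_partners(word, cannot_sets):
    """Words sharing at least one cannot-set with `word`."""
    partners = set()
    for s in cannot_sets:
        if word in s:
            partners |= s - {word}
    return partners


def egpu_transfer(topic_word, topic, word, cannot_sets, amount=1.0):
    """Simplified E-GPU step (a sketch, not the paper's exact rule): after
    `word` is assigned to `topic`, move up to `amount` balls of each
    cannot-set partner out of that urn into another urn holding none of the
    partner's cannot-words; if no such urn exists, open a new topic urn.
    Returns (partner, from_topic, to_topic, moved) tuples."""
    moves = []
    for p in sorted(cannot_partners(word, cannot_sets)):
        moved = min(amount, topic_word[topic].get(p, 0.0))
        if moved <= 0:
            continue
        conflicts = cannot_partners(p, cannot_sets)
        candidates = [t for t in list(topic_word) if t != topic and
                      all(topic_word[t].get(c, 0.0) <= 0 for c in conflicts)]
        if candidates:
            # Assumption: prefer the urn where the partner is already common.
            target = max(candidates, key=lambda t: topic_word[t].get(p, 0.0))
        else:
            target = max(topic_word) + 1  # open a new topic (urn)
        topic_word[topic][p] -= moved
        topic_word[target][p] += moved
        moves.append((p, topic, target, moved))
    return moves


def make_counts(rows):
    """Build {topic: {word: count}} with float defaults from plain dicts."""
    counts = defaultdict(lambda: defaultdict(float))
    for t, row in rows.items():
        counts[t].update(row)
    return counts


# Toy counts echoing the slides: "price" dominates Topic 0, "color" is 8/1/1.
tw = make_counts({0: {"price": 5, "color": 8}, 1: {"color": 1}, 2: {"color": 1}})
print(egpu_transfer(tw, 0, "price", [{"price", "color"}], amount=8))

# "price" appears in every urn, so "amazon" has nowhere to go but a new Topic 3.
tw2 = make_counts({0: {"price": 4}, 1: {"price": 2, "amazon": 10}, 2: {"price": 3}})
print(egpu_transfer(tw2, 1, "price", [{"price", "amazon"}], amount=10))
```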
Evaluation
Evaluation: four domains; knowledge; objective evaluation; human evaluation
Model Comparison: baselines LDA (Blei et al., 2003), LDA-GPU (Mimno et al., 2011) and DF-LDA (Andrzejewski et al., 2009) with its variants DF-M and DF-MC; the proposed MC-LDA with its variant M-LDA
Objective Evaluation: Topic Coherence
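Topic coherence is assumed here to be the metric of Mimno et al. (2011), which scores a topic's top M words v_1, …, v_M by the sum over m = 2..M and l = 1..m-1 of log((D(v_m, v_l) + 1) / D(v_l)), where D(v) counts documents containing v and D(v, v') counts documents containing both. A direct sketch on a made-up toy corpus:

```python
import math


def topic_coherence(top_words, documents):
    """Topic coherence in the style of Mimno et al. (2011) for a ranked list
    of top words. Assumes every top word occurs in at least one document,
    otherwise D(v_l) would be zero. Higher scores mean more coherent."""
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1) / doc_freq(top_words[l]))
    return score


docs = [["price", "cost"], ["price", "money"], ["price", "cost", "money"], ["screen"]]
print(topic_coherence(["price", "cost", "money"], docs))
print(topic_coherence(["price", "screen"], docs))
```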
Human Evaluation: Precision@5 and Precision@10
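Precision@n can be computed as the fraction of a topic's top n words that judges marked correct. The sketch assumes one boolean judgment per ranked word; the example judgments are made up.

```python
def precision_at_n(judgments, n):
    """Fraction of a topic's top-n words judged correct.
    `judgments` is a list of booleans in the topic's word ranking order."""
    if n <= 0 or n > len(judgments):
        raise ValueError("n must be between 1 and len(judgments)")
    return sum(judgments[:n]) / n


# Hypothetical judgments for one topic's top 10 words.
judgments = [True, True, False, True, True, False, True, False, False, True]
print(precision_at_n(judgments, 5), precision_at_n(judgments, 10))
```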
Example Aspects
Conclusions: discover meaningful aspects using knowledge; multiple senses (adding variable s); adverse effect (GPU model); utilizing “cannot” knowledge (E-GPU model)
Datasets: http://www.cs.uic.edu/~zchen/