Discovering Coherent Topics Using General Knowledge
Zhiyuan (Brett) Chen, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh
http://www.cs.uic.edu/~zchen/
Topic Model [diagram: documents (Document 1 … Document M) linked to topics (Topic 1 … Topic T) by the topic model]
Coherent Topics. Example "Price" topic: cheap, expensive, cost, money, pricey, dollar.
Coherent Topics (contrast). Coherent topic: price, cheap, expensive, cost, money, pricey, dollar. Incoherent topic: price, family, cheap, expensive, politics, cost, size.
Issues of Unsupervised Topic Models: many topics are not coherent, and objective functions do not correlate well with human judgments (Chang et al., 2009).
Remedy: Knowledge-based Topic Models
Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009) with must-links (e.g., {picture, photo}) and cannot-links (e.g., {picture, price}).
Knowledge-based Topic Models: DF-LDA (Andrzejewski et al., 2009); seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012).
Knowledge Assumptions: knowledge is correct for a domain; knowledge is domain dependent.
Existing Model Flow [diagram, built up over several slides]
Our Proposed Model Flow [diagram]: the model takes General Knowledge as an additional input.
General Knowledge: domain independent; may be wrong for a particular domain.
Lexical Semantic Relations: synonyms {expensive, pricey} and antonyms {expensive, cheap} from WordNet; adjective-attribute {expensive, price} (Fei et al., 2012).
LR-Sets (Lexical Relation Sets). Example: {expensive, pricey, cheap, price}. Words in the same LR-set should appear in the same topic.
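For concreteness, here is a minimal sketch (not the authors' code) of how LR-sets could be assembled from WordNet synonym and antonym relations via NLTK; the adjective-attribute relations (Fei et al., 2012) and the paper's exact set-construction rules are not included, so treat the merging strategy as an assumption.

```python
# Sketch only: build candidate LR-sets from WordNet synonyms and antonyms.
# Requires the WordNet corpus (nltk.download('wordnet')). Adjective-attribute
# relations (Fei et al., 2012) are omitted in this sketch.
from nltk.corpus import wordnet as wn

def lr_sets(word):
    """Return candidate LR-sets for `word`: one set per WordNet sense,
    containing the sense's synonyms and their antonyms."""
    sets = []
    for synset in wn.synsets(word):
        members = {word.lower()}
        for lemma in synset.lemmas():
            members.add(lemma.name().lower())
            for ant in lemma.antonyms():
                members.add(ant.name().lower())
        if len(members) > 1:  # keep only sets that add information
            sets.append(members)
    return sets

# Example: LR-sets for "expensive" should include synonym/antonym sets
# such as {'expensive', 'pricey', ...} and {'expensive', 'cheap', ...}.
print(lr_sets("expensive"))
```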
Issues of LR-Sets: (1) no correct LR-set for a word; (2) partially wrong knowledge.
Issue 1: no correct LR-set for a word. Example: for "card", the available sets {card, menu} and {card, bill} can both be wrong in a given domain.
Issue 2: partially wrong knowledge. Example: {picture, pic, flick}, where part of the set may not fit a given domain.
Addressing Issues: (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Relaxing Wrong LR-sets: {card, menu} and {card, bill} are relaxed to {card}.
Estimate Knowledge: {picture, image} vs. {picture, painting}.
Word Distributions from LDA (word : probability): picture 0.20, image 0.15, photo 0.12, quality 0.10, resolution 0.05, …, painting 0.0002.
Estimate Word Correlation: use the LDA word distribution above to compare {picture, image} against {picture, painting}.
Word Correlation Matrix C: from the same distribution, the correlation of {picture, image} is 0.15 / 0.20 and that of {picture, painting} is 0.0002 / 0.20.
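Below is a minimal sketch of the ratio shown on the slide, assuming the correlation of a word pair within a topic is the ratio of the two words' topic probabilities; the exact normalization used in the paper may differ.

```python
# Sketch only: word-pair correlation as a probability ratio, per the slide.
# The paper's exact definition of the correlation matrix C may differ.
topic_word_prob = {
    "picture": 0.20, "image": 0.15, "photo": 0.12,
    "quality": 0.10, "resolution": 0.05, "painting": 0.0002,
}

def correlation(w1, w2, probs):
    """Correlation of w2 with w1 under one topic: P(w2) / P(w1)."""
    return probs.get(w2, 0.0) / probs[w1]

print(correlation("picture", "image", topic_word_prob))     # 0.75
print(correlation("picture", "painting", topic_word_prob))  # 0.001
```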
Quality of an LR-set s towards a word w, denoted Q(s, w).
Relaxing Wrong LR-sets: if Q(s1, "card") < ε for {card, menu} and Q(s2, "card") < ε for {card, bill}, both sets are relaxed to {card} for this word.
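A minimal sketch of this relaxation step follows, assuming (hypothetically) that Q(s, w) is the mean correlation between w and the other members of s; the paper's actual quality measure may be defined differently.

```python
# Sketch only: relax LR-sets whose quality towards a word falls below a
# threshold epsilon (the slide's criterion Q(s, w) < epsilon). The quality
# function used here (mean pairwise correlation) is an assumption, not
# necessarily the paper's definition.

def relax_lr_sets(lr_sets, word, corr, epsilon=0.1):
    """Keep LR-sets containing `word` with Q(s, word) >= epsilon; if none
    survive, fall back to the singleton {word}."""
    def quality(s):
        others = [v for v in s if v != word]
        return sum(corr(word, v) for v in others) / len(others)

    kept = [s for s in lr_sets if word in s and quality(s) >= epsilon]
    return kept if kept else [{word}]

# Toy correlations for the 'card' example on the slide (made-up numbers):
toy_corr = {("card", "menu"): 0.01, ("card", "bill"): 0.02}
corr = lambda a, b: toy_corr.get((a, b), 0.0)

print(relax_lr_sets([{"card", "menu"}, {"card", "bill"}], "card", corr))
# -> [{'card'}]  (both sets fall below epsilon, so only {card} remains)
```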
Addressing Issues (continued): (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Simple Pólya Urn Model (SPU) [animation]: a drawn ball is returned to the urn together with another ball of the same color. The richer get richer!
Interpreting LDA Under SPU [animation]: when "picture" is assigned to Topic 0, another "picture" ball is added to Topic 0, so only that word's own count grows.
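As an illustration (not the authors' code), here is a minimal sketch of the SPU-style count update inside collapsed Gibbs sampling for LDA: assigning a word to a topic increments only that word's own count.

```python
# Sketch only: SPU-style update in collapsed Gibbs sampling for LDA.
# Assigning word w to topic z adds one more "ball" of w to topic z.
from collections import defaultdict

topic_word_count = defaultdict(lambda: defaultdict(float))

def spu_update(z, w, count=topic_word_count):
    count[z][w] += 1.0  # only w itself gets richer in topic z

spu_update(0, "picture")
spu_update(0, "picture")
print(dict(topic_word_count[0]))  # {'picture': 2.0}
```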
Generalized Pólya Urn Model (GPU) [animation]: drawing a ball of one color can also add balls of other, related colors.
Applying GPU [animation]: when "picture" is assigned to Topic 0, related words ("image", "painting") are also added to Topic 0, weighted by the word correlation matrix.
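A minimal sketch of this GPU-style update follows, assuming promotion weights are taken directly from the word correlation matrix C of the earlier slide; the paper's exact promotion scheme is not shown here, so this is only illustrative.

```python
# Sketch only: GPU-style update. Assigning word w to topic z also promotes
# correlated words, weighted by a correlation matrix C (the exact weights
# are an assumption; the paper may scale or threshold them differently).
from collections import defaultdict

C = {  # C[w][v]: how much v is promoted when w is drawn (toy values)
    "picture": {"image": 0.75, "painting": 0.001},
}

topic_word_count = defaultdict(lambda: defaultdict(float))

def gpu_update(z, w, count=topic_word_count, corr=C):
    count[z][w] += 1.0                 # the drawn word itself
    for v, weight in corr.get(w, {}).items():
        count[z][v] += weight          # correlated words get a share

gpu_update(0, "picture")
print(dict(topic_word_count[0]))
# {'picture': 1.0, 'image': 0.75, 'painting': 0.001}
```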
Addressing Issues (recap): (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Evaluation
Evaluation: four domains; KL-divergence evaluation; topic coherence; human evaluation.
Model Comparison: LDA (Blei et al., 2003); LDA-GPU (Mimno et al., 2011); DF-LDA (Andrzejewski et al., 2009); MDK-LDA (Chen et al., 2013); GK-LDA (this work).
KL-Divergence
Topic Coherence (#T = 15)
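The slide shows only a results chart; for reference, the topic coherence measure commonly used in this line of work is that of Mimno et al. (2011), sketched below under the assumption that it is the metric reported here.

```python
# Sketch only: topic coherence in the style of Mimno et al. (2011),
# assumed to be the metric behind the slide's chart.
# coherence(t) = sum_{m=2..M} sum_{l=1..m-1} log((D(v_m, v_l) + 1) / D(v_l))
# where D(v) is the document frequency of word v and D(v, v') the
# co-document frequency, over the top-M words v_1..v_M of topic t.
import math

def topic_coherence(top_words, doc_freq, co_doc_freq):
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            v_m, v_l = top_words[m], top_words[l]
            score += math.log((co_doc_freq.get((v_m, v_l), 0) + 1)
                              / doc_freq[v_l])
    return score
```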
Human Evaluation
Example Topics [table of example topics with their top words]
Conclusions: discovering coherent topics using general knowledge. Issues addressed: (1) no correct LR-set for a word → relaxing wrong LR-sets; (2) partially wrong knowledge → word correlation + GPU.
Datasets: http://www.cs.uic.edu/~zchen/