Representing Documents via Latent Keyphrase Inference


  1. Representing Documents via Latent Keyphrase Inference (April 15th, 2016)

  2. Document Representation in Vector Space: critical for document retrieval and categorization

  3. Traditional Methods
     - Bag-of-Words or Phrases
     - Cons: sparse on short texts

  4. Topic models [LDA]
     - Each topic is a distribution over words; each document is a mixture of corpus-wide topics
     - Cons: difficult for humans to infer topic semantics

  5. Concept-based models [ESA]
     - Every Wikipedia article represents a concept, e.g. concept Panthera: cat [0.92], leopard [0.84], roar [0.77]
     - Article words are associated with the concept (TF-IDF weights), which helps infer concepts from a document
     - Cons: low coverage of concepts in human-curated knowledge bases
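
A minimal sketch of the ESA scoring idea above, assuming a toy inverted index (the weights mirror the slide's Panthera example; the index and function name are illustrative):

```python
from collections import Counter

# Hypothetical inverted index: word -> {concept: TF-IDF weight}.
# In real ESA these weights come from the text of Wikipedia articles.
word_to_concepts = {
    "cat":     {"Panthera": 0.92},
    "leopard": {"Panthera": 0.84, "Leopard": 0.95},
    "roar":    {"Panthera": 0.77},
}

def esa_vector(doc_tokens):
    """Score each concept by summing the TF-IDF weights of the document's words."""
    scores = Counter()
    for token in doc_tokens:
        for concept, weight in word_to_concepts.get(token.lower(), {}).items():
            scores[concept] += weight
    return dict(scores)

print(esa_vector(["leopard", "roar"]))  # {'Panthera': ~1.61, 'Leopard': 0.95}
```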

  6. Word/Document embedding models [word2vec, paragraph2vec]
     - Cons: difficult to explain what each dimension means

  7. Document Representation Using Keyphrases
     - From the corpus, mine domain keyphrases <K1, K2, ..., KM> and use them as the entries of the vector
     - Identify document keyphrases (a subset of the domain keyphrases) by evaluating the relatedness between the document and each domain keyphrase
     - Unsupervised model (see the sketch below)
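
To make the target representation concrete, a minimal sketch (the keyphrases and scores here are illustrative, not from the paper):

```python
# Illustrative domain keyphrases mined from a computer-science corpus.
domain_keyphrases = ["machine learning", "deep learning",
                     "support vector machine", "data mining"]

def doc_vector(doc_keyphrase_scores):
    """Map {document keyphrase: relatedness score} onto the fixed
    domain-keyphrase vector space; absent keyphrases get 0."""
    return [doc_keyphrase_scores.get(k, 0.0) for k in domain_keyphrases]

print(doc_vector({"machine learning": 0.9, "support vector machine": 0.7}))
# [0.9, 0.0, 0.7, 0.0]
```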

  8. Challenges
     - Where to get domain keyphrases for a given corpus? Mining Quality Phrases from Massive Text Corpora [SIGMOD15]
     - How to identify document keyphrases?
       - Mentions can be latent (short text)
       - Relatedness scores are needed

  9. How to Identify Document Keyphrases?
     - Powered by Bayesian inference on "domain keyphrase silhouettes"
     - Domain keyphrase silhouette: a topic centered on a domain keyphrase
       - "Reverse" topic models
       - Learned from the corpus

  10. Framework for Latent Keyphrase Inference (LAKI)

  12. Domain Keyphrase Silhouette
     - Learning a hierarchical Bayesian network (DAG) over binary variables
     - Task 1, model learning: learning link weights
     - Task 2, structure learning: learning the network structure

  13. Task 1: Model Learning Given the Structure
     - Use Z to represent both K (domain keyphrases) and T (content units)
     - Noisy-OR conditional distributions (toy example on the slide; a minimal sketch follows):
       - A parent node activates its children more easily when the link weight is larger
       - A child node is influenced by all of its parents
       - Noise/prior: aggregated over all the other links connected to a node
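
A minimal sketch of a noisy-OR conditional probability, using one common parameterization in which a leak term plays the role of the slide's noise/prior (the paper's exact parameterization may differ):

```python
def noisy_or(parent_states, weights, leak=0.01):
    """P(child = 1 | parents) under a noisy-OR CPD.

    parent_states: list of 0/1 activations of the parent nodes.
    weights: per-link probabilities that an active parent alone
             turns the child on (larger weight -> easier activation).
    leak: prior probability that the child fires with no active parent.
    """
    p_all_fail = 1.0 - leak
    for z, w in zip(parent_states, weights):
        if z:                       # only active parents can fire their links
            p_all_fail *= 1.0 - w
    return 1.0 - p_all_fail

# Toy example: two parents, only the first one active.
print(noisy_or([1, 0], [0.8, 0.5]))   # 1 - (1-0.01)*(1-0.8) = 0.802
```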

  14. Maximum Likelihood Estimation
     - Training data: documents, with document keyphrases partially observed and content units fully observed
     - Expectation step: for each document, collect sufficient statistics
       - Link-firing probability (parent and child both activated)
       - Node-activation probability
     - Maximization step: update the link weights (see the sketch below)
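
A minimal sketch of the M-step under this scheme, assuming the E-step has already aggregated expected link-firing and parent-activation counts over the corpus (the names and the ratio update are an illustration of the standard noisy-OR EM update, not code from the paper):

```python
def m_step(expected_link_firings, expected_parent_activations):
    """Re-estimate each link weight as
    (expected times the link fired) / (expected times its parent was active)."""
    return {
        link: expected_link_firings[link] /
              max(expected_parent_activations[link], 1e-12)
        for link in expected_link_firings
    }

# Toy statistics produced by the E-step:
print(m_step({("panthera", "roar"): 3.2},
             {("panthera", "roar"): 4.0}))   # {('panthera', 'roar'): 0.8}
```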

  15. Task 2: Structure Learning
     - Domain keyphrases are connected to content units, which helps infer document keyphrases from content units
     - Domain keyphrases are interconnected, which helps infer document keyphrases from other keyphrases

  16. A Heuristic Approach
     - Data-driven DAG, similar to an ontology
     - Heuristic: two nodes are connected only if they are
       - closely related (word2vec similarity), and
       - co-occur frequently
     - Links always point to the less frequent node
     - Works well in practice (see the sketch below)
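
A minimal sketch of this heuristic, with illustrative thresholds (sim_threshold and min_cooccur are assumptions, not values from the paper):

```python
import numpy as np

def build_edges(nodes, vectors, cooccur, freq,
                sim_threshold=0.6, min_cooccur=5):
    """Connect two nodes when their word2vec vectors are close AND they
    co-occur often enough; direct each link toward the less frequent node,
    which keeps the graph acyclic when frequencies are distinct."""
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            sim = np.dot(vectors[a], vectors[b]) / (
                np.linalg.norm(vectors[a]) * np.linalg.norm(vectors[b]))
            count = cooccur.get((a, b), cooccur.get((b, a), 0))
            if sim >= sim_threshold and count >= min_cooccur:
                parent, child = (a, b) if freq[a] >= freq[b] else (b, a)
                edges.append((parent, child))
    return edges
```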

  18. Inference
     - Exact inference is slow: computing posterior probabilities in noisy-OR networks is NP-hard
     - Use approximate inference instead:
       - Prune irrelevant nodes with an efficient scoring function
       - Gibbs sampling (a toy sampler is sketched below)
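
A toy Gibbs sampler for a two-layer version of the network (latent keyphrases over observed content units), skipping the pruning step; the flat prior and all names are illustrative assumptions:

```python
import random

def gibbs_keyphrases(obs, parents_of, weight, prior=0.05, leak=0.01, iters=200):
    """Estimate P(keyphrase active | observed content units) by Gibbs
    sampling in a two-layer noisy-OR network.
    obs:        {content unit: 0/1 observation}
    parents_of: {content unit: [keyphrase, ...]}
    weight:     {(keyphrase, content unit): link weight}"""
    keyphrases = sorted({k for ps in parents_of.values() for k in ps})
    z = {k: 0 for k in keyphrases}          # current activation states
    hits = {k: 0 for k in keyphrases}

    def p_unit_on(unit):
        # Noisy-OR: the unit stays off only if the leak and every
        # active parent's link all fail.
        fail = 1.0 - leak
        for k in parents_of[unit]:
            if z[k]:
                fail *= 1.0 - weight[(k, unit)]
        return 1.0 - fail

    for _ in range(iters):
        for k in keyphrases:
            score = {}
            for val in (0, 1):              # unnormalized P(z_k = val | rest)
                z[k] = val
                p = prior if val else 1.0 - prior
                for unit, x in obs.items():
                    if k in parents_of[unit]:
                        q = p_unit_on(unit)
                        p *= q if x else 1.0 - q
                score[val] = p
            z[k] = 1 if random.random() * (score[0] + score[1]) < score[1] else 0
            hits[k] += z[k]
    return {k: hits[k] / iters for k in keyphrases}
```

In the full model the keyphrase layer is itself hierarchical, so a keyphrase's conditional would also involve its keyphrase parents and children, not just the content units.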

  19. Experiments
     - Two text-related tasks to evaluate document representation quality: phrase relatedness and document classification
     - Two datasets

  20. Methods
     - ESA (Explicit Semantic Analysis)
     - KBLink: uses the link structure in Wikipedia
     - BoW (bag-of-words)
     - ESA-C: extends ESA by replacing Wikipedia with the domain corpus
     - LSA (Latent Semantic Analysis)
     - LDA (Latent Dirichlet Allocation)
     - Word2Vec: a neural network computing word embeddings
     - EKM: uses explicit keyphrase detection

  21. Results [figures: phrase relatedness correlation and document classification]

  22. Case Study

  24. Time Complexity [figures: running time (ms) vs. #samples, #quality phrases after pruning, and #words, on the Academia and Yelp datasets]

  25. Breakdown of Processing Time

  26. Conclusion
     - We have introduced a novel document representation method using latent keyphrases
       - Each dimension is explainable
       - Works for short text
       - Works for closed-domain text
     - We have developed an efficient inference method for real-time keyphrase identification
     - Future work
       - Better structure-learning approaches
       - Combination with knowledge bases
       - Inference methods other than Gibbs sampling
     - Code available at http://jialu.info
