Neural Models for Documents with Metadata
Dallas Card, Chenhao Tan, Noah A. Smith
July 18, 2018
Outline

Main points of this talk:
1. Introducing Scholar¹: a neural model for documents with metadata
   - Background (LDA, SAGE, SLDA, etc.)
   - Model and related work
   - Experiments and results
2. Power of neural variational inference for interactive modeling

¹ Sparse Contextual Hidden and Observed Language Autoencoder
Latent Dirichlet Allocation

Blei, Ng, and Jordan. Latent Dirichlet Allocation. JMLR. 2003.
David Blei. Probabilistic topic models. Comm. ACM. 2012.
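For reference, the standard LDA generative story (a textbook formulation, included here for completeness; it is not restated on the original slide):

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta) && \text{word distribution for topic } k \\
\theta_i &\sim \mathrm{Dirichlet}(\alpha) && \text{topic proportions for document } i \\
z_{ij} &\sim \mathrm{Categorical}(\theta_i) && \text{topic assignment for token } j \\
w_{ij} &\sim \mathrm{Categorical}(\beta_{z_{ij}}) && \text{observed word}
\end{align*}
```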
Types of metadata

- Date or time
- Author(s)
- Rating
- Sentiment
- Ideology
- etc.
Variations and extensions

- Author topic model (Rosen-Zvi et al., 2004)
- Supervised LDA (SLDA; McAuliffe and Blei, 2008)
- Dirichlet multinomial regression (Mimno and McCallum, 2008)
- Sparse additive generative models (SAGE; Eisenstein et al., 2011)
- Structural topic model (Roberts et al., 2014)
- ...
Desired features of model

- Fast, scalable inference.
- Easy modification by end-users.
- Incorporation of metadata:
  - Covariates: features which influence the text (as in SAGE).
  - Labels: features to be predicted along with the text (as in SLDA).
- Possibility of sparse topics.
- Incorporation of additional prior knowledge.

→ Use a variational autoencoder (VAE) style of inference (Kingma and Welling, 2014)
Desired outcome

- Coherent groupings of words (something like topics), with offsets for observed metadata
- Encoder to map from documents to latent representations
- Classifier to predict labels from the latent representation
Model

- Encoder network: q(r_i | words, c_i, y_i) = f_e(words, c_i, y_i), producing μ_q and σ_q
- Sample ε^(s) ~ N(0, I) and reparameterize: r_i = μ_q + σ_q ⊙ ε^(s)
- θ_i = softmax(r_i)
- Generator network: p(words | θ_i) = f_g(θ_i); labels y_i are predicted from the latent representation
- ELBO = E_q[log p(words | r_i)] − D_KL[q(r_i | words) ‖ p(r_i)]
- Monte Carlo approximation: ELBO ≈ (1/S) Σ_{s=1}^{S} log p(words | r_i^(s)) − D_KL[q(r_i | words) ‖ p(r_i)]

(Srivastava and Sutton, 2017; Miao et al., 2016)
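A minimal PyTorch sketch of this encoder/generator pair with a single-sample (S = 1) Monte Carlo estimate of the ELBO. All names and sizes (VOCAB, K, DocVAE) are illustrative, covariates and labels are omitted for brevity, and the prior is simplified to a standard Gaussian N(0, I); this is a sketch under those assumptions, not the released Scholar code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, K = 5000, 20  # vocabulary size and number of topics (assumed values)

class DocVAE(nn.Module):
    def __init__(self, vocab_size=VOCAB, n_topics=K, hidden=300):
        super().__init__()
        # encoder f_e: bag-of-words -> (mu, log sigma) of q(r_i | words)
        self.enc = nn.Linear(vocab_size, hidden)
        self.mu = nn.Linear(hidden, n_topics)
        self.logsigma = nn.Linear(hidden, n_topics)
        # generator f_g: theta_i -> distribution over words
        self.dec = nn.Linear(n_topics, vocab_size)

    def forward(self, x):  # x: (batch, vocab) bag-of-words counts
        h = F.softplus(self.enc(x))
        mu, logsigma = self.mu(h), self.logsigma(h)
        eps = torch.randn_like(mu)             # eps ~ N(0, I)
        r = mu + torch.exp(logsigma) * eps     # reparameterization trick
        theta = F.softmax(r, dim=-1)           # theta_i = softmax(r_i)
        log_p = F.log_softmax(self.dec(theta), dim=-1)
        # single-sample estimate of E_q[log p(words | r_i)]
        rec = (x * log_p).sum(-1)
        # closed-form KL between q = N(mu, sigma^2) and the N(0, I) prior
        kl = 0.5 * (mu.pow(2) + torch.exp(2 * logsigma)
                    - 2 * logsigma - 1).sum(-1)
        return (rec - kl).mean()               # ELBO, averaged over the batch
```

Training would maximize the returned ELBO (i.e., minimize its negation) with a stochastic optimizer over mini-batches.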
Scholar

Generator network:
  p(word | θ_i, c_i) = softmax(d + θ_i^T B^(topic) + c_i^T B^(cov))
  Optionally include interactions between topics and covariates
  p(y_i | θ_i, c_i) = f_y(θ_i, c_i)

Encoder:
  μ_i = f_μ(words, c_i, y_i)
  log σ_i = f_σ(words, c_i, y_i)
  Optional incorporation of word vectors to embed the input
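A hedged sketch of the generator above, following the slide's notation (d, B^(topic), B^(cov), f_y). The shapes, initialization, and wiring are my reconstruction for illustration, not the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScholarGenerator(nn.Module):
    def __init__(self, vocab_size, n_topics, n_covariates, n_labels):
        super().__init__()
        self.d = nn.Parameter(torch.zeros(vocab_size))           # background term
        self.B_topic = nn.Parameter(0.01 * torch.randn(n_topics, vocab_size))
        self.B_cov = nn.Parameter(0.01 * torch.randn(n_covariates, vocab_size))
        self.f_y = nn.Linear(n_topics + n_covariates, n_labels)  # label predictor

    def forward(self, theta, c):  # theta: (batch, K), c: (batch, n_covariates)
        # p(word | theta_i, c_i) = softmax(d + theta_i^T B_topic + c_i^T B_cov)
        logits = self.d + theta @ self.B_topic + c @ self.B_cov
        p_words = F.softmax(logits, dim=-1)
        # p(y_i | theta_i, c_i) = f_y(theta_i, c_i)
        p_y = F.softmax(self.f_y(torch.cat([theta, c], dim=-1)), dim=-1)
        return p_words, p_y
```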
Optimization

- Stochastic optimization using mini-batches of documents
- Tricks from Srivastava and Sutton, 2017:
  - Adam optimizer with a high learning rate to bypass mode collapse
  - Batch-norm layers to avoid divergence
  - Annealing away from the batch-norm output to keep results interpretable
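One way to realize the annealing trick is to blend the batch-normalized decoder logits with the raw logits, shifting weight from the former to the latter over training. The schedule, vocabulary size, and function names below are illustrative assumptions, not the exact recipe from the paper:

```python
import torch
import torch.nn as nn

# Batch norm over the vocabulary dimension of the decoder logits
# (vocabulary size of 5000 is an assumed placeholder).
bn = nn.BatchNorm1d(5000, affine=False)

def decode_logits(raw_logits: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
    """Anneal away from the batch-norm output as training progresses."""
    t = min(step / total_steps, 1.0)               # weight ramps from 0 to 1
    return (1.0 - t) * bn(raw_logits) + t * raw_logits

# Paired with Adam at a relatively high learning rate (value illustrative):
# optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
```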
Output of Scholar

- B^(topic), B^(cov): coherent groupings of positive and negative deviations from the background (∼ topics)
- f_μ, f_σ: encoder network mapping from words to topics: θ̂_i = softmax(f_e(words, c_i, y_i, ε))
- f_y: classifier mapping from θ̂_i to labels: ŷ_i = f_y(θ̂_i, c_i)
Evaluation

1. Performance as a topic model, without metadata (perplexity, coherence)
2. Performance as a classifier, compared to SLDA
3. Exploratory data analysis
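Topic coherence in this setting is commonly measured with normalized PMI (NPMI) over each topic's top n words; a standard formulation (the exact variant used in these experiments may differ) is:

```latex
\mathrm{NPMI}(k) = \sum_{i < j \le n} \frac{\log \frac{P(w_i, w_j)}{P(w_i)\,P(w_j)}}{-\log P(w_i, w_j)}
```

where the probabilities are estimated from word co-occurrence counts over documents.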
Quantitative results: basic model

[Figure: bar charts of perplexity, coherence, and sparsity on the IMDB dataset (Maas, 2011), comparing LDA, SAGE, NVDM, Scholar, Scholar +wv, and Scholar +sparsity]
Classification results

[Figure: classification accuracy on the IMDB dataset (Maas, 2011), comparing LR, SLDA, Scholar (labels), and Scholar (covariates)]
Exploratory Data Analysis

- Data: Media Frames Corpus (Card et al., 2015)
- A collection of thousands of news articles annotated in terms of tone and framing
- Relevant metadata: year of publication, newspaper, etc.
Tone as a label

[Figure: immigration-related topics ordered by p(pro-immigration | topic) from 0 to 1, showing top words per topic, e.g. "english language city spanish community", "boat desert died men miles coast haitian", "visas visa applications students citizenship", "asylum judge appeals deportation court", "labor jobs workers percent study wages", "bush border president bill republicans state gov", "benefits arizona law bill bills", "arrested charged charges agents operation"]