A New Method of Moments for Latent Variable Models

Matteo Ruffini, Marta Casanellas, Ricard Gavaldà
Universitat Politècnica de Catalunya, Barcelona, Spain
Methods of Moments in Statistics and Machine Learning

The method of moments was introduced by Pearson in the 1890s. It estimates the parameters of a model by solving equations that relate the moments of the data to the model parameters:

$X \sim p_\theta \;\Rightarrow\; E[f(X)] = g(\theta)$

In the last decade it has been used in machine learning to obtain PAC-learning algorithms for topic models, hidden Markov models, mixtures of Gaussians, etc.

This Paper
- Introduces improved methods of moments for topic models.
- Experimentally validates their performance against traditional learning methods (e.g. Gibbs sampling).
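As a toy illustration of the recipe $E[f(X)] = g(\theta)$ (our own example, not one from the talk): for an exponential distribution with rate $\lambda$ the first moment is $E[X] = 1/\lambda$, so matching it against the empirical mean gives $\hat\lambda = 1/\bar{X}$.

```python
import numpy as np

# Method-of-moments estimate for an Exponential(lam) sample:
# E[X] = 1/lam, so matching the first empirical moment gives lam_hat = 1/mean(X).
rng = np.random.default_rng(0)
lam = 2.0
x = rng.exponential(scale=1.0 / lam, size=10_000)
lam_hat = 1.0 / x.mean()
print(lam_hat)   # close to 2.0
```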
Agenda

1. Topic Models and Method of Moments.
2. Our Method.
3. Experiments.
The Single Topic Model

A generative process for texts:
- We have k latent topics.
- Each text deals with a single topic i, drawn with probability $\omega_i$: $P[\text{Topic} = i] = \omega_i$.
- Given the latent topic, all the words of the text are sampled from a discrete distribution with parameter $\mu_i \in \mathbb{R}^d$: $P[\text{Sample word } j \mid \text{Topic} = i] = (\mu_i)_j$.

Parameters:
- The topics $M = [\mu_1, \dots, \mu_k] \in \mathbb{R}^{d \times k}$.
- The weights $\omega = (\omega_1, \dots, \omega_k) \in \mathbb{R}^k$.

Notation:
- $d$: vocabulary size.
- $x_j$: one-hot encoding of the $j$-th word of a document.
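To make the generative process concrete, here is a minimal NumPy sketch of it (our own illustration, not code from the paper); the sizes of the vocabulary, corpus, and documents are arbitrary placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n_docs, doc_len = 50, 3, 1000, 100    # vocabulary size, topics, corpus size, words per doc
M = rng.dirichlet(np.ones(d), size=k).T     # M[:, i] = mu_i, word distribution of topic i
omega = rng.dirichlet(np.ones(k))           # omega_i = P[Topic = i]

def sample_document():
    topic = rng.choice(k, p=omega)                      # one single topic per document
    words = rng.choice(d, size=doc_len, p=M[:, topic])  # every word drawn from mu_topic
    return topic, words

corpus = [sample_document() for _ in range(n_docs)]
```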
Latent Dirichlet Allocation

A generative process for texts:
- We have k latent topics.
- A text deals with a mixture of topics, sampled from a Dirichlet distribution.
- First, sample the topic proportions of the text: $h \sim \text{Dirichlet}(\omega)$.
- Then, sample the latent topic of each word: $P[\text{Topic} = i] = (h)_i$.
- Last, sample the word according to its topic: $P[\text{Sample word } j \mid \text{Topic} = i] = (\mu_i)_j$.

Parameters:
- The topics $M = [\mu_1, \dots, \mu_k] \in \mathbb{R}^{d \times k}$.
- The weights $\omega = (\omega_1, \dots, \omega_k) \in \mathbb{R}^k$.
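The corresponding sketch for LDA (again our own illustration with placeholder sizes; here $\omega$ plays the role of the Dirichlet concentration parameter):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, doc_len = 50, 3, 100
M = rng.dirichlet(np.ones(d), size=k).T    # M[:, i] = mu_i, word distribution of topic i
omega = np.full(k, 0.5)                    # Dirichlet parameter of the topic proportions

def sample_document():
    h = rng.dirichlet(omega)                         # topic proportions of this document
    topics = rng.choice(k, size=doc_len, p=h)        # one latent topic per word
    words = np.array([rng.choice(d, p=M[:, t]) for t in topics])
    return words
```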
Learning a Topic Model

From an i.i.d. sample $X = \{x^{(1)}, \dots, x^{(n)}\}$, with $x^{(i)} = \{x^{(i)}_1, x^{(i)}_2, x^{(i)}_3, \dots\}$, we want to recover the parameters of the model:
- Single Topic Model: $(\mu_1, \dots, \mu_k, \omega)$
- Latent Dirichlet Allocation: $(\mu_1, \dots, \mu_k, \omega)$

Likelihood-based methods (EM, sampling, variational methods) are either very slow or come with poor guarantees.
Spectral Method of Moments [Anandkumar et al. (2014)]

Applicable to any model admitting a parametrization in terms of centers and weights:
$M = [\mu_1, \dots, \mu_k] \in \mathbb{R}^{d \times k}, \quad \omega = (\omega_1, \dots, \omega_k) \in \mathbb{R}^k$

1. Find (model-dependent) estimators of the moments $\hat{M}_1(X), \hat{M}_2(X), \hat{M}_3(X)$ with
   $E[\hat{M}_1] = M_1 = \sum_{i=1}^k \omega_i \mu_i \in \mathbb{R}^d$
   $E[\hat{M}_2] = M_2 = \sum_{i=1}^k \omega_i \, \mu_i \otimes \mu_i \in \mathbb{R}^{d \times d}$
   $E[\hat{M}_3] = M_3 = \sum_{i=1}^k \omega_i \, \mu_i \otimes \mu_i \otimes \mu_i \in \mathbb{R}^{d \times d \times d}$

2. Retrieve an estimate of the model parameters $(\hat{\mu}_1, \dots, \hat{\mu}_k, \hat{\omega})$ with tensor decomposition:
   $\hat{M}_1 \approx \sum_{i=1}^k \hat{\omega}_i \hat{\mu}_i, \quad \hat{M}_2 \approx \sum_{i=1}^k \hat{\omega}_i \, \hat{\mu}_i \otimes \hat{\mu}_i, \quad \hat{M}_3 \approx \sum_{i=1}^k \hat{\omega}_i \, \hat{\mu}_i \otimes \hat{\mu}_i \otimes \hat{\mu}_i$
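To see what the two steps amount to computationally, here is a small NumPy sketch (our own illustration, not the algorithm of the paper nor the robust tensor power method of Anandkumar et al.): it builds the exact population moments $M_1, M_2, M_3$ for known centers and weights, then recovers $(\hat{\mu}_i, \hat{\omega})$ with a simple simultaneous-diagonalization variant of the decomposition step, an eigendecomposition of a random contraction of $M_3$ against the pseudo-inverse of $M_2$. All variable names are ours; in practice the exact moments would be replaced by the model-specific empirical estimators $\hat{M}_1, \hat{M}_2, \hat{M}_3$.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 20, 3                               # vocabulary size, number of topics
M = rng.dirichlet(np.ones(d), size=k).T    # topics as columns: d x k
omega = rng.dirichlet(np.ones(k))          # topic weights

# Exact (population) moments, as on the slide:
#   M1 = sum_i omega_i mu_i,  M2 = sum_i omega_i mu_i (x) mu_i,  M3 likewise.
M1 = M @ omega
M2 = (M * omega) @ M.T
M3 = np.einsum('i,ai,bi,ci->abc', omega, M, M, M)

# Contract M3 with a random probe vector eta:
#   M3_eta = sum_i omega_i (mu_i . eta) mu_i mu_i^T
eta = rng.normal(size=d)
M3_eta = np.einsum('abc,c->ab', M3, eta)

# Eigendecomposition of M3_eta pinv(M2): its k nonzero eigenvalues are mu_i . eta
# and the corresponding eigenvectors are proportional to the mu_i.
vals, vecs = np.linalg.eig(M3_eta @ np.linalg.pinv(M2))
top = np.argsort(-np.abs(vals))[:k]
M_hat = np.real(vecs[:, top])
M_hat /= M_hat.sum(axis=0)                 # rescale columns to probability vectors

# Recover the weights from M1 = M_hat @ omega_hat (least squares).
omega_hat, *_ = np.linalg.lstsq(M_hat, M1, rcond=None)

# Columns come back in arbitrary order; match each estimate to its closest true topic.
perm = [int(np.argmin(np.linalg.norm(M - M_hat[:, [j]], axis=0))) for j in range(k)]
print("max topic error :", np.max(np.abs(M_hat - M[:, perm])))
print("max weight error:", np.max(np.abs(omega_hat - omega[perm])))
```

With exact moments the recovery is essentially exact; with empirical estimators the error scales with the moment estimation error, which is where the PAC-style guarantees of these methods come from.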
Pros and Cons

Pros:
- Fast: linear in the sample size.
- Reduces the model-learning task to a tensor decomposition problem.
- Comes with PAC-style guarantees.
- It is the ideal setting for topic models.