A New Method of Moments for Latent Variable Models

Matteo Ruffini, Marta Casanellas, Ricard Gavaldà
Universitat Politècnica de Catalunya, Barcelona, Spain
Methods of Moments in Statistics and Machine Learning

The method of moments was introduced by Pearson in the 1890s. It estimates the parameters of a model by solving equations that relate the moments of the data to the model parameters:

$X \sim p_\theta \;\Rightarrow\; E[f(X)] = g(\theta)$

In the last decade it has been used in machine learning to obtain PAC-learning algorithms for topic models, hidden Markov models, mixtures of Gaussians, etc.

This Paper
- Introduces improved methods of moments for topic models.
- Experimentally validates their performance against traditional learning methods (e.g. Gibbs sampling).
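As a toy illustration of the recipe $E[f(X)] = g(\theta)$ (our own example, not one from the talk): for an exponential distribution with rate $\lambda$ the first moment is $E[X] = 1/\lambda$, so matching it against the empirical mean gives $\hat\lambda = 1/\bar{X}$.

```python
import numpy as np

# Method-of-moments estimate for an Exponential(lam) sample:
# E[X] = 1/lam, so matching the first empirical moment gives lam_hat = 1/mean(X).
rng = np.random.default_rng(0)
lam = 2.0
x = rng.exponential(scale=1.0 / lam, size=10_000)
lam_hat = 1.0 / x.mean()
print(lam_hat)   # close to 2.0
```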
Agenda

1. Topic Models and Method of Moments.
2. Our Method.
3. Experiments.
The Single Topic Model

A generative process for texts:
- We have k latent topics.
- Each text deals with a single topic i, drawn with probability $\omega_i$: $P[\text{Topic} = i] = \omega_i$.
- Given the latent topic, all the words of the text are sampled from a discrete distribution with parameter $\mu_i \in \mathbb{R}^d$: $P[\text{Sample word } j \mid \text{Topic} = i] = (\mu_i)_j$.

Parameters:
- The topics $M = [\mu_1, \dots, \mu_k] \in \mathbb{R}^{d \times k}$.
- The weights $\omega = (\omega_1, \dots, \omega_k) \in \mathbb{R}^k$.

Notation:
- $d$: vocabulary size.
- $x_j$: one-hot encoding of the $j$-th word of a document.
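To make the generative process concrete, here is a minimal NumPy sketch of it (our own illustration, not code from the paper); the sizes of the vocabulary, corpus, and documents are arbitrary placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n_docs, doc_len = 50, 3, 1000, 100    # vocabulary size, topics, corpus size, words per doc
M = rng.dirichlet(np.ones(d), size=k).T     # M[:, i] = mu_i, word distribution of topic i
omega = rng.dirichlet(np.ones(k))           # omega_i = P[Topic = i]

def sample_document():
    topic = rng.choice(k, p=omega)                      # one single topic per document
    words = rng.choice(d, size=doc_len, p=M[:, topic])  # every word drawn from mu_topic
    return topic, words

corpus = [sample_document() for _ in range(n_docs)]
```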
Latent Dirichlet Allocation

A generative process for texts:
- We have k latent topics.
- A text deals with a mixture of topics, sampled from a Dirichlet distribution.
- First, sample the topic proportions of the text: $h \sim \text{Dirichlet}(\omega)$.
- Then, sample the latent topic of each word: $P[\text{Topic} = i] = (h)_i$.
- Last, sample the word according to its topic: $P[\text{Sample word } j \mid \text{Topic} = i] = (\mu_i)_j$.

Parameters:
- The topics $M = [\mu_1, \dots, \mu_k] \in \mathbb{R}^{d \times k}$.
- The weights $\omega = (\omega_1, \dots, \omega_k) \in \mathbb{R}^k$.
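The corresponding sketch for LDA (again our own illustration with placeholder sizes; here $\omega$ plays the role of the Dirichlet concentration parameter):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, doc_len = 50, 3, 100
M = rng.dirichlet(np.ones(d), size=k).T    # M[:, i] = mu_i, word distribution of topic i
omega = np.full(k, 0.5)                    # Dirichlet parameter of the topic proportions

def sample_document():
    h = rng.dirichlet(omega)                         # topic proportions of this document
    topics = rng.choice(k, size=doc_len, p=h)        # one latent topic per word
    words = np.array([rng.choice(d, p=M[:, t]) for t in topics])
    return words
```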
Learning a Topic Model

From an i.i.d. sample $X = \{x^{(1)}, \dots, x^{(n)}\}$, with $x^{(i)} = \{x^{(i)}_1, x^{(i)}_2, x^{(i)}_3, \dots\}$, we want to recover the parameters of the model:
- Single Topic Model: $(\mu_1, \dots, \mu_k, \omega)$
- Latent Dirichlet Allocation: $(\mu_1, \dots, \mu_k, \omega)$

Likelihood-based methods (EM, sampling, variational methods) are either very slow or come with poor guarantees.
Spectral Method of Moments [Anandkumar et al. (2014)]

Applicable to any model admitting a parametrization in terms of centers and weights:
$M = [\mu_1, \dots, \mu_k] \in \mathbb{R}^{d \times k}, \quad \omega = (\omega_1, \dots, \omega_k) \in \mathbb{R}^k$

1. Find (model-dependent) estimators of the moments $\hat{M}_1(X), \hat{M}_2(X), \hat{M}_3(X)$ with
   $E[\hat{M}_1] = M_1 = \sum_{i=1}^k \omega_i \mu_i \in \mathbb{R}^d$
   $E[\hat{M}_2] = M_2 = \sum_{i=1}^k \omega_i \, \mu_i \otimes \mu_i \in \mathbb{R}^{d \times d}$
   $E[\hat{M}_3] = M_3 = \sum_{i=1}^k \omega_i \, \mu_i \otimes \mu_i \otimes \mu_i \in \mathbb{R}^{d \times d \times d}$

2. Retrieve an estimate of the model parameters $(\hat{\mu}_1, \dots, \hat{\mu}_k, \hat{\omega})$ with tensor decomposition:
   $\hat{M}_1 \approx \sum_{i=1}^k \hat{\omega}_i \hat{\mu}_i, \quad \hat{M}_2 \approx \sum_{i=1}^k \hat{\omega}_i \, \hat{\mu}_i \otimes \hat{\mu}_i, \quad \hat{M}_3 \approx \sum_{i=1}^k \hat{\omega}_i \, \hat{\mu}_i \otimes \hat{\mu}_i \otimes \hat{\mu}_i$
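To see what the two steps amount to computationally, here is a small NumPy sketch (our own illustration, not the algorithm of the paper nor the robust tensor power method of Anandkumar et al.): it builds the exact population moments $M_1, M_2, M_3$ for known centers and weights, then recovers $(\hat{\mu}_i, \hat{\omega})$ with a simple simultaneous-diagonalization variant of the decomposition step, an eigendecomposition of a random contraction of $M_3$ against the pseudo-inverse of $M_2$. All variable names are ours; in practice the exact moments would be replaced by the model-specific empirical estimators $\hat{M}_1, \hat{M}_2, \hat{M}_3$.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 20, 3                               # vocabulary size, number of topics
M = rng.dirichlet(np.ones(d), size=k).T    # topics as columns: d x k
omega = rng.dirichlet(np.ones(k))          # topic weights

# Exact (population) moments, as on the slide:
#   M1 = sum_i omega_i mu_i,  M2 = sum_i omega_i mu_i (x) mu_i,  M3 likewise.
M1 = M @ omega
M2 = (M * omega) @ M.T
M3 = np.einsum('i,ai,bi,ci->abc', omega, M, M, M)

# Contract M3 with a random probe vector eta:
#   M3_eta = sum_i omega_i (mu_i . eta) mu_i mu_i^T
eta = rng.normal(size=d)
M3_eta = np.einsum('abc,c->ab', M3, eta)

# Eigendecomposition of M3_eta pinv(M2): its k nonzero eigenvalues are mu_i . eta
# and the corresponding eigenvectors are proportional to the mu_i.
vals, vecs = np.linalg.eig(M3_eta @ np.linalg.pinv(M2))
top = np.argsort(-np.abs(vals))[:k]
M_hat = np.real(vecs[:, top])
M_hat /= M_hat.sum(axis=0)                 # rescale columns to probability vectors

# Recover the weights from M1 = M_hat @ omega_hat (least squares).
omega_hat, *_ = np.linalg.lstsq(M_hat, M1, rcond=None)

# Columns come back in arbitrary order; match each estimate to its closest true topic.
perm = [int(np.argmin(np.linalg.norm(M - M_hat[:, [j]], axis=0))) for j in range(k)]
print("max topic error :", np.max(np.abs(M_hat - M[:, perm])))
print("max weight error:", np.max(np.abs(omega_hat - omega[perm])))
```

With exact moments the recovery is essentially exact; with empirical estimators the error scales with the moment estimation error, which is where the PAC-style guarantees of these methods come from.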
Pros and Cons

Pros:
- Fast: linear in the sample size.
- Reduces the model-learning task to a tensor decomposition problem.
- Comes with PAC-style guarantees.
- It is the ideal setting for topic models.