Models w/ Latent Random Variables (Chunting Zhou, Junxian He)

  1. CS11-747 Neural Networks for NLP
     Models w/ Latent Random Variables
     Chunting Zhou, Junxian He
     Site: https://phontron.com/class/nn4nlp2019/
     Slides from Graham Neubig

  2. Discriminative vs. Generative Models
     • Discriminative model: calculate the probability of output given input, P(Y|X)
     • Generative model: calculate the probability of a variable, P(X), or of multiple variables, P(X,Y)
     • Which of the following models are discriminative vs. generative?
       • Standard BiLSTM POS tagger
       • Globally normalized CRF POS tagger
       • Language model
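As a rough illustration of the two definitions, the sketch below uses a small made-up joint table P(X, Y) over words and POS tags (the words, tags, probabilities, and function names are hypothetical). A generative model that provides P(X, Y) can recover both P(X) and P(Y|X), whereas a discriminative model estimates only P(Y|X) directly.

```python
# Hypothetical joint distribution P(X, Y) over words X and POS tags Y.
# The numbers are made up purely for illustration.
joint = {
    ("run", "VERB"): 0.3,
    ("run", "NOUN"): 0.1,
    ("dog", "NOUN"): 0.5,
    ("dog", "VERB"): 0.1,
}


def marginal_x(joint, x):
    """P(X=x) = sum_y P(X=x, Y=y): obtainable from a generative model."""
    return sum(p for (word, _tag), p in joint.items() if word == x)


def conditional_y_given_x(joint, y, x):
    """P(Y=y | X=x) = P(x, y) / P(x): what a discriminative model estimates directly."""
    return joint[(x, y)] / marginal_x(joint, x)


print(marginal_x(joint, "run"))                     # P(X=run)
print(conditional_y_given_x(joint, "VERB", "run"))  # P(Y=VERB | X=run)
```

Summing the joint over Y to get P(X) is the same kind of marginalization that latent variable models perform over their unobserved variables.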

  3. Types of Variables
     • Observed vs. Latent:
       • Observed: something that we can see from our data, e.g. X or Y
       • Latent: a variable that we assume exists, but whose value we aren’t given
     • Deterministic vs. Random:
       • Deterministic: variables that are calculated directly according to some deterministic function
       • Random (stochastic): variables that obey a probability distribution, and may take any of several (or infinitely many) values

  4. Quiz: What Types of Variables?
     • In an attentional sequence-to-sequence model trained using MLE/teacher forcing, are the following variables observed or latent? Deterministic or random?
       • The input word ids f
       • The encoder hidden states h
       • The attention values a
       • The output word ids e

  5. Goal of Latent Random Variable Modeling
     • Specify structural relationships in the context of unknown variables, to learn interpretable structure
     • Inject inductive bias / prior knowledge

  6. What is a Latent Random Variable Model?
     • Older latent variable models
       • Topic models (unsupervised)

  7. What is a Latent Random Variable Model?
     • Older latent variable models
       • Topic models (unsupervised)
       • Hidden Markov Model (unsupervised tagger)

  8. What is a Latent Random Variable Model?
     • Older latent variable models
       • Topic models
       • Hidden Markov Model (unsupervised tagger)
       • Some tree-structured models (unsupervised parsing)
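In the HMM, the tags are the latent random variables and the words are observed, so the likelihood of a sentence sums over every possible tag sequence. A minimal sketch, assuming a hypothetical two-tag HMM with made-up probabilities, computes this marginal with the forward algorithm and checks it against brute-force enumeration of all tag sequences.

```python
import itertools

# Hypothetical 2-tag HMM; all probabilities are made up for illustration.
STATES = ["N", "V"]
init = {"N": 0.6, "V": 0.4}                              # P(z_1)
trans = {"N": {"N": 0.3, "V": 0.7},                      # P(z_t | z_{t-1})
         "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.5, "run": 0.1, "fast": 0.4},     # P(x_t | z_t)
        "V": {"dogs": 0.1, "run": 0.6, "fast": 0.3}}


def forward_likelihood(words):
    """P(x_1..x_T), summing out the latent tags in O(T * K^2) time."""
    alpha = {s: init[s] * emit[s][words[0]] for s in STATES}
    for w in words[1:]:
        alpha = {
            s: emit[s][w] * sum(alpha[r] * trans[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def brute_force_likelihood(words):
    """Same marginal, by explicitly enumerating every latent tag sequence."""
    total = 0.0
    for tags in itertools.product(STATES, repeat=len(words)):
        p = init[tags[0]] * emit[tags[0]][words[0]]
        for t in range(1, len(words)):
            p *= trans[tags[t - 1]][tags[t]] * emit[tags[t]][words[t]]
        total += p
    return total


sentence = ["dogs", "run", "fast"]
print(forward_likelihood(sentence), brute_force_likelihood(sentence))
```

Enumeration grows as K^T, which is why dynamic programming (or, for continuous latents, approximation) is needed to marginalize latent variables.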

  9. Why Latent Random Variables?
     • Specify structure, but interpretable structure is often discrete
     • There is always a tradeoff between interpretability and flexibility

  10. What is a Latent Random Variable Model?
      • Deep latent variable models
        • Variational Autoencoders (VAEs)
        • Generative Adversarial Networks (GANs)
        • Flow-based generative models

  11. Variational Auto-encoders (Kingma and Welling 2014)

  12. A Latent Variable Model
      • We observe an output x (assume a continuous vector for now)
      • We have a latent variable z generated from a Gaussian
      • We have a function f, parameterized by Θ, that maps from z to x, where this function is usually a neural net
      $z \sim \mathcal{N}(0, I), \quad x = f(z; \Theta)$
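A minimal sketch of this generative process in plain Python, assuming a hypothetical one-hidden-layer tanh MLP as f with randomly initialized (untrained) parameters Θ. The dimensions and the names `theta`, `f`, and `sample_x` are illustrative only, not taken from the slides.

```python
import math
import random

# Hypothetical sizes; Θ is randomly initialized here, whereas in practice it is learned.
LATENT_DIM, HIDDEN_DIM, OUTPUT_DIM = 2, 8, 3
rng = random.Random(0)


def init_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


theta = {
    "W1": init_matrix(HIDDEN_DIM, LATENT_DIM), "b1": [0.0] * HIDDEN_DIM,
    "W2": init_matrix(OUTPUT_DIM, HIDDEN_DIM), "b2": [0.0] * OUTPUT_DIM,
}


def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def f(z, theta):
    """x = f(z; Θ): a deterministic one-hidden-layer MLP with tanh activation."""
    h = [math.tanh(a) for a in affine(theta["W1"], theta["b1"], z)]
    return affine(theta["W2"], theta["b2"], h)


def sample_x(theta):
    """Draw z ~ N(0, I), then map it deterministically to x."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    return z, f(z, theta)


z, x = sample_x(theta)
print(len(z), len(x))  # 2 3
```

All of the randomness in x comes from z; given z, f is deterministic, which matches the split between random and deterministic variables above.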

  13. An Example (Goersch 2016)
      [Figure: latent z mapped through f to x]


  15. What is Our Loss Function?
      • We would like to maximize the corpus log likelihood
        $\log P(X) = \sum_{x \in X} \log P(x; \theta)$
      • For a single example, the marginal likelihood is
        $P(x; \theta) = \int P(x | z; \theta)\, P(z)\, dz$
      • We can approximate this by sampling z's and then averaging
        $S(x) := \{z' ; z' \sim P(z)\}, \quad P(x; \theta) \approx \frac{1}{|S(x)|} \sum_{z \in S(x)} P(x | z; \theta)$
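The sampling approximation can be checked on a toy model where the integral has a closed form. This sketch assumes a hypothetical 1-D model z ~ N(0, 1), x | z ~ N(w z, noise_var), so that x ~ N(0, w² + noise_var) exactly; the Monte Carlo estimate divides the sum over sampled z by the number of samples so that it is an average.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mc_marginal_likelihood(x, w, noise_var, num_samples, rng):
    """Estimate P(x; θ) = ∫ P(x|z; θ) P(z) dz by averaging P(x|z; θ) over z ~ P(z)."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]  # S(x): z' ~ N(0, 1)
    return sum(normal_pdf(x, w * z, noise_var) for z in samples) / num_samples


def exact_marginal_likelihood(x, w, noise_var):
    """Closed form for this linear-Gaussian toy model: x ~ N(0, w^2 + noise_var)."""
    return normal_pdf(x, 0.0, w ** 2 + noise_var)


rng = random.Random(0)
x, w, noise_var = 1.5, 2.0, 0.5
print(exact_marginal_likelihood(x, w, noise_var))
print(mc_marginal_likelihood(x, w, noise_var, 100_000, rng))
```

With a neural f and a high-dimensional z, most samples from the prior explain x poorly, so this naive estimator needs very many samples; that inefficiency motivates the variational approach that follows.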

  16. Variational Inference
      $\log P(x;\theta) \ge \mathbb{E}_{q(z|x)}[\log P(x|z;\theta)] - \mathrm{KL}(q(z|x) \,\|\, P(z)) = \text{ELBO}$
      • The inequality holds for any $q(z|x)$, but the lower bound is tight only if $q(z|x) = P(z|x;\theta)$
      • $P(z|x;\theta)$ is intractable
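To see the bound and its tightness numerically, this sketch uses a hypothetical 1-D linear-Gaussian model (z ~ N(0, 1), x | z ~ N(w z, noise_var)) and a Gaussian q(z|x) = N(q_mean, q_var), assuming the standard ELBO form E_q[log P(x|z)] − KL(q(z|x) ‖ P(z)). In this conjugate toy case the true posterior is Gaussian and computable, so we can check that the ELBO equals log P(x) when q is the posterior and is lower otherwise; in real deep models P(z|x) is intractable, which is why q is learned instead.

```python
import math


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def elbo(x, w, noise_var, q_mean, q_var):
    """ELBO = E_q[log P(x|z)] - KL(q(z|x) || P(z)) for the toy model
    z ~ N(0, 1), x | z ~ N(w z, noise_var), with q(z|x) = N(q_mean, q_var)."""
    # Closed-form expectation of the Gaussian log-likelihood under q.
    expected_loglik = (-0.5 * math.log(2 * math.pi * noise_var)
                       - ((x - w * q_mean) ** 2 + w ** 2 * q_var) / (2 * noise_var))
    # KL between N(q_mean, q_var) and the standard normal prior N(0, 1).
    kl = 0.5 * (q_var + q_mean ** 2 - 1.0 - math.log(q_var))
    return expected_loglik - kl


def exact_posterior(x, w, noise_var):
    """P(z|x) is Gaussian here because the model is linear-Gaussian (conjugate)."""
    precision = 1.0 + w ** 2 / noise_var
    return (w * x / noise_var) / precision, 1.0 / precision


x, w, noise_var = 1.5, 2.0, 0.5
log_px = log_normal_pdf(x, 0.0, w ** 2 + noise_var)  # exact log P(x)
post_mean, post_var = exact_posterior(x, w, noise_var)

print(log_px)
print(elbo(x, w, noise_var, post_mean, post_var))  # q = posterior: ≈ log P(x)
print(elbo(x, w, noise_var, 0.0, 1.0))             # q = prior: strictly lower
```

The gap between the last two printed values and log P(x) is exactly KL(q(z|x) ‖ P(z|x)), so maximizing the ELBO over q pushes q toward the posterior.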

  17. Practice
      • Prove the inequality log P(x; θ) ≥ ELBO
      • Hint: use Jensen’s inequality
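One way to carry out the exercise, sketched here assuming q(z|x) > 0 wherever P(x|z; θ)P(z) > 0: rewrite the marginal likelihood as an expectation under q and apply Jensen’s inequality to the concave log. The last line also shows why the bound is tight exactly when q matches the posterior.

```latex
$$
\begin{aligned}
\log P(x;\theta)
  &= \log \int q(z \mid x)\,\frac{P(x \mid z;\theta)\,P(z)}{q(z \mid x)}\,dz
   = \log \mathbb{E}_{q(z \mid x)}\!\left[\frac{P(x \mid z;\theta)\,P(z)}{q(z \mid x)}\right] \\
  &\ge \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{P(x \mid z;\theta)\,P(z)}{q(z \mid x)}\right]
   \qquad \text{(Jensen: } \log \text{ is concave)} \\
  &= \mathbb{E}_{q(z \mid x)}\big[\log P(x \mid z;\theta)\big] - \mathrm{KL}\big(q(z \mid x)\,\|\,P(z)\big)
   = \text{ELBO}, \\[4pt]
\log P(x;\theta) - \text{ELBO}
  &= \mathrm{KL}\big(q(z \mid x)\,\|\,P(z \mid x;\theta)\big) \;\ge\; 0 .
\end{aligned}
$$
```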
