
Maximum Reconstruction Estimation for Generative Latent-Variable Models



  1. Maximum Reconstruction Estimation for Generative Latent-Variable Models. Yong Cheng, joint work with Yang Liu and Wei Xu.

  2. Problem: Generative latent-variable models are important for natural language processing because they provide compact representations of data. Maximum likelihood estimation suffers from a significant problem: it may guide the model to focus on explaining irrelevant but common correlations in the data.

  3. Maximum Reconstruction Estimation: Circumvent irrelevant but common correlations by maximizing the probability of reconstructing the observed data.

  4. Maximum Reconstruction Estimation, advantages: direct learning of model parameters; tractable inference.

  5. Maximum Likelihood Estimation: a generative latent-variable model; maximum likelihood estimation (MLE); inference.
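The formulas on this slide are not present in the transcript. As a reminder of the textbook definitions (the symbols are assumptions chosen for illustration: x is the observed data, z the latent structure, θ the model parameters, and s indexes training examples), the three items can be written as:

```latex
% Generative latent-variable model: marginalize over the latent structure z
P(\mathbf{x}; \theta) = \sum_{\mathbf{z}} P(\mathbf{x}, \mathbf{z}; \theta)

% Maximum likelihood estimation over S training examples
\hat{\theta}_{\mathrm{MLE}} = \operatorname*{argmax}_{\theta} \prod_{s=1}^{S} P(\mathbf{x}^{(s)}; \theta)

% Inference: most probable latent structure under the learned parameters
\hat{\mathbf{z}} = \operatorname*{argmax}_{\mathbf{z}} P(\mathbf{x}, \mathbf{z}; \hat{\theta}_{\mathrm{MLE}})
```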

  6. Maximum Reconstruction Estimation: objective.

  7. Maximum Reconstruction Estimation: objective; prediction.
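The objective itself is missing from the transcript. One generic way to formalize "maximizing the probability of reconstructing observed data" is an encode-then-decode form, sketched below. This is an assumption for illustration, not a formula recovered from the slides: x is the observed data, z the latent structure, θ the parameters, and the exact factorization the authors use for each model may differ.

```latex
% Assumed generic form: reconstruct x from itself through the latent structure z
P(\mathbf{x} \mid \mathbf{x}; \theta) = \sum_{\mathbf{z}} P(\mathbf{z} \mid \mathbf{x}; \theta) \, P(\mathbf{x} \mid \mathbf{z}; \theta)

% Maximum reconstruction estimation over S training examples
\hat{\theta}_{\mathrm{MRE}} = \operatorname*{argmax}_{\theta} \prod_{s=1}^{S} P(\mathbf{x}^{(s)} \mid \mathbf{x}^{(s)}; \theta)
```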

  8. Maximum Reconstruction Estimation, applied to two classical generative latent-variable models: hidden Markov models for unsupervised POS induction, and IBM translation models for unsupervised word alignment.

  9. Hidden Markov Models for Unsupervised POS Induction: Given an observed English sentence, the task is to induce the latent sequence of part-of-speech tags.

  10. Hidden Markov Models for Unsupervised POS Induction: Given an observed English sentence, the task is to induce the latent sequence of part-of-speech tags.

  11. Hidden Markov Models for Unsupervised POS Induction: Maximum Likelihood Estimation (MLE).

  12. Hidden Markov Models for Unsupervised POS Induction: Maximum Likelihood Estimation (MLE).

  13. Hidden Markov Models for Unsupervised POS Induction: Maximum Likelihood Estimation (MLE).

  14. Hidden Markov Models for Unsupervised POS Induction: Maximum Likelihood Estimation (MLE).

  15. Hidden Markov Models for Unsupervised POS Induction: Maximum Likelihood Estimation (MLE); Maximum Reconstruction Estimation (MRE).

  16. Hidden Markov Models for Unsupervised POS Induction: Maximum Likelihood Estimation (MLE); Maximum Reconstruction Estimation (MRE).
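As a rough illustration of the quantity that MLE maximizes for an HMM, the sketch below computes the marginal probability of a word sequence by summing over all tag sequences with the standard forward algorithm. It is not taken from the slides: the function names and the toy parameters (`INIT`, `TRANS`, `EMIT`, two tags and three word types) are hypothetical, and it assumes a first-order HMM with a non-empty observation sequence.

```python
from itertools import product


def forward_likelihood(obs, init, trans, emit):
    """Return P(x) = sum over z of P(x, z) for a discrete HMM (forward algorithm)."""
    n_states = len(init)
    # alpha[k] = P(x_1..x_t, z_t = k), initialized at t = 1
    alpha = [init[k] * emit[k][obs[0]] for k in range(n_states)]
    for x in obs[1:]:
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n_states)) * emit[k][x]
            for k in range(n_states)
        ]
    return sum(alpha)


def brute_force_likelihood(obs, init, trans, emit):
    """Same quantity by enumerating every tag sequence (exponential; for checking only)."""
    n_states = len(init)
    total = 0.0
    for z in product(range(n_states), repeat=len(obs)):
        p = init[z[0]] * emit[z[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[z[t - 1]][z[t]] * emit[z[t]][obs[t]]
        total += p
    return total


# Hypothetical toy parameters: 2 tags, 3 word types (each row sums to 1).
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
```

Calling `forward_likelihood([0, 1, 2], INIT, TRANS, EMIT)` gives the same value as the brute-force enumeration, but in time linear in the sentence length rather than exponential.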

  17. Experiments: comparison with MLE.

  18. Experiments: comparison with MLE.

  19. Experiments: comparison with CRF autoencoder.

  20. Experiments: example emission probabilities for the POS “VBD” (verb, past tense).

  21. IBM Translation Models for Unsupervised Word Alignment.

  22. IBM Translation Models for Unsupervised Word Alignment: Maximum Likelihood Estimation (MLE); Maximum Reconstruction Estimation (MRE).

  23. IBM Translation Models for Unsupervised Word Alignment: Maximum Likelihood Estimation (MLE); Maximum Reconstruction Estimation (MRE).
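To make the word-alignment setting concrete, here is a minimal sketch of the likelihood that MLE maximizes, using IBM Model 1 as an assumed example (the slides say only "IBM translation models"). The translation table `T`, the function names, and the German-English toy pair are hypothetical; the formula is the standard Model 1 one, with a NULL source word at position 0.

```python
def model1_likelihood(src, tgt, t, epsilon=1.0):
    """P(tgt | src) under IBM Model 1: epsilon / (l+1)^m * prod_j sum_i t(f_j | e_i)."""
    src = ["NULL"] + src  # position 0 is the NULL word
    prob = epsilon / (len(src) ** len(tgt))
    for f in tgt:
        # Marginalize over every source position the target word could align to
        prob *= sum(t.get((f, e), 0.0) for e in src)
    return prob


def model1_viterbi_alignment(src, tgt, t):
    """Most probable source position for each target word (0 means NULL)."""
    src = ["NULL"] + src
    return [
        max(range(len(src)), key=lambda i: t.get((f, src[i]), 0.0))
        for f in tgt
    ]


# Hypothetical translation table t(f | e), keyed as (target word, source word).
T = {
    ("das", "the"): 0.8, ("haus", "the"): 0.2,
    ("das", "house"): 0.1, ("haus", "house"): 0.9,
    ("das", "NULL"): 0.5, ("haus", "NULL"): 0.5,
}
```

With this table, `model1_viterbi_alignment(["the", "house"], ["das", "haus"], T)` aligns "das" to position 1 ("the") and "haus" to position 2 ("house"). Because Model 1 alignments are independent per target word, the argmax can be taken word by word.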

  24. IBM Translation Models for Unsupervised Word Alignment: comparison with MLE.

  25. Conclusion: We have presented maximum reconstruction estimation for training generative latent-variable models such as hidden Markov models and IBM translation models. In the future, we plan to apply our approach to more generative latent-variable models, such as probabilistic context-free grammars, and to explore new training algorithms that minimize reconstruction errors.

  26. Thank you!
