
Algorithms for NLP CS 11711, Fall 2019 Lecture 7: HMMs, POS tagging



  1. Algorithms for NLP CS 11711, Fall 2019 Lecture 7: HMMs, POS tagging ▪ Yulia Tsvetkov

  2. Readings for today’s lecture ▪ J&M SLP3: https://web.stanford.edu/~jurafsky/slp3/8.pdf ▪ Collins (2011): http://www.cs.columbia.edu/~mcollins/hmms-spring2013.pdf

  3. Levels of linguistic knowledge ▪ Slide credit: Noah Smith

  4. Sequence Labeling ▪ Map a sequence of words to a sequence of labels ▪ Part-of-speech tagging (Church, 1988; Brants, 2000) ▪ Named entity recognition (Bikel et al., 1999) ▪ Text chunking and shallow parsing (Ramshaw and Marcus, 1995) ▪ Word alignment of parallel text (Vogel et al., 1996) ▪ Compression (Conroy and O’Leary, 2001) ▪ Acoustic models, discourse segmentation, etc.

  5. Sequence labeling as classification 5

  6. Generative sequence labeling: Hidden Markov Models

  7. Markov Chain: weather ▪ The future is independent of the past given the present

  8. Markov Chain

  9. Markov Chain: words ▪ The future is independent of the past given the present
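The Markov assumption means the probability of a whole state sequence factors into a start probability times one transition probability per step. Below is a minimal sketch of that factorization for a two-state weather chain. The state names and every probability in it are hypothetical, chosen only for illustration.

```python
# Hypothetical two-state weather chain; all probabilities are made up.
START = {"HOT": 0.5, "COLD": 0.5}  # pi: initial state distribution
TRANS = {  # a(prev, next): each row sums to 1
    "HOT": {"HOT": 0.6, "COLD": 0.4},
    "COLD": {"HOT": 0.3, "COLD": 0.7},
}


def sequence_probability(states):
    """P(q1..qn) = pi(q1) * prod_i a(q_{i-1}, q_i).

    Each factor conditions only on the immediately preceding state,
    which is exactly the Markov assumption on the slide.
    """
    prob = START[states[0]]
    for prev, curr in zip(states, states[1:]):
        prob *= TRANS[prev][curr]
    return prob


print(sequence_probability(["HOT", "HOT", "COLD"]))  # 0.5 * 0.6 * 0.4
```

The same function applies to the "words" chain: swap the weather states for vocabulary items and the transition table becomes a bigram language model.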

  10. Hidden Markov Models ▪ In the real world, many events are not observable ▪ Speech recognition: we observe acoustic features but not the phones ▪ POS tagging: we observe words but not the POS tags ▪ [Figure: hidden states q1, q2, ..., qn, each emitting an observation o1, o2, ..., on]

  11. HMM From J&M

  12. HMM example From J&M

  13. Generative vs. Discriminative models ▪ Generative models specify a joint distribution over the labels and the data; with this, you could generate new data ▪ Discriminative models specify the conditional distribution of the label y given the data x; these models focus on how to discriminate between the classes From Bamman
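Because an HMM is generative, it can produce new (word, tag) data: draw a tag from the transition distribution, then draw a word from that tag's emission distribution, and repeat. The sketch below does this for a tiny toy tagger. The tag set, vocabulary, and all probabilities are hypothetical, made up for illustration, and the seed-based `sample` helper is likewise a hypothetical name.

```python
import random

# Toy POS-tagging HMM; every number here is hypothetical.
START = {"DT": 1.0}  # pi: initial tag probabilities (missing keys mean 0)
TRANS = {  # A: P(next tag | previous tag)
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {  # B: P(word | tag)
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def sample(n, seed=0):
    """Generate n (word, tag) pairs from the joint distribution P(words, tags)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    pairs = []
    dist = START
    for _ in range(n):
        # Draw the hidden tag, then the observed word it emits.
        tag = rng.choices(list(dist), weights=list(dist.values()))[0]
        word = rng.choices(list(EMIT[tag]), weights=list(EMIT[tag].values()))[0]
        pairs.append((word, tag))
        dist = TRANS[tag]  # the next tag depends only on this one
    return pairs


print(sample(6, seed=1))
```

A discriminative tagger that only models P(tags | words) has no distribution over words to sample from, which is the practical difference the slide points to.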

  14. Types of HMMs ▪ [Figure: example HMM topologies] ▪ ...and many more From J&M

  15. HMM in Language Technologies ▪ Part-of-speech tagging (Church, 1988; Brants, 2000) ▪ Named entity recognition (Bikel et al., 1999) and other information extraction tasks ▪ Text chunking and shallow parsing (Ramshaw and Marcus, 1995) ▪ Word alignment of parallel text (Vogel et al., 1996) ▪ Acoustic models in speech recognition (emissions are continuous) ▪ Discourse segmentation (labeling parts of a document)

  16. HMM Parameters From J&M
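An HMM tagger is fully specified by three parameter sets: the initial distribution pi, the transition matrix A, and the emission matrix B. Together they define the joint probability of a word sequence and a tag sequence, P(w, t) = pi(t1) B(t1, w1) times the product over i > 1 of A(t(i-1), t(i)) B(t(i), w(i)). A minimal sketch follows, assuming a hypothetical four-tag toy model whose numbers are invented for illustration.

```python
# Toy POS-tagging HMM; all probabilities are hypothetical.
START = {"DT": 1.0}  # pi (missing keys mean probability 0)
TRANS = {  # A[prev][next]
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {  # B[tag][word]
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def joint_probability(words, tags):
    """P(words, tags) under the HMM: start, then transition * emission per step."""
    prob = START.get(tags[0], 0.0) * EMIT[tags[0]].get(words[0], 0.0)
    for i in range(1, len(words)):
        prob *= TRANS[tags[i - 1]].get(tags[i], 0.0) * EMIT[tags[i]].get(words[i], 0.0)
    return prob


words = "the old dog the".split()
print(joint_probability(words, ["DT", "NN", "VB", "DT"]))  # "old" as noun, "dog" as verb
print(joint_probability(words, ["DT", "JJ", "NN", "DT"]))  # "old" as adjective
```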

  17. HMMs: Questions From J&M

  18. HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch From J&M

  19. HMM tagging as decoding

  20. HMM tagging as decoding

  21. HMM tagging as decoding

  22. HMM tagging as decoding

  23. HMM tagging as decoding

  24. HMM tagging as decoding

  25. HMM tagging as decoding ▪ How many possible choices?
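With a tag set T and a sentence of n words, there are |T|^n candidate tag sequences, so exhaustive decoding grows exponentially with sentence length. The sketch below scores every sequence for a four-word sentence under a hypothetical toy HMM (made-up numbers) to make the count concrete; it is only viable for toy inputs.

```python
from itertools import product

# Toy POS-tagging HMM; all probabilities are hypothetical.
TAGS = ["DT", "JJ", "NN", "VB"]
START = {"DT": 1.0}
TRANS = {
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def joint_probability(words, tags):
    prob = START.get(tags[0], 0.0) * EMIT[tags[0]].get(words[0], 0.0)
    for i in range(1, len(words)):
        prob *= TRANS[tags[i - 1]].get(tags[i], 0.0) * EMIT[tags[i]].get(words[i], 0.0)
    return prob


def brute_force_decode(words):
    """Enumerate all |T|^n tag sequences and return (count, best sequence)."""
    candidates = list(product(TAGS, repeat=len(words)))
    best = max(candidates, key=lambda tags: joint_probability(words, tags))
    return len(candidates), list(best)


print(brute_force_decode("the old dog the".split()))
```

Four tags and four words already give 4^4 = 256 sequences; a 45-tag set (as in the Penn Treebank) on a 20-word sentence would give 45^20.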

  26. Part of speech tagging example Slide credit: Noah Smith

  27. Part of speech tagging example Greedy decoding? Slide credit: Noah Smith

  28. Part of speech tagging example Greedy decoding? Consider: “the old dog the footsteps of the young” Slide credit: Noah Smith
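Greedy decoding commits to the locally best tag at each position, scoring only A(prev, tag) * B(tag, word), and never revisits a choice. The sketch below, using a hypothetical toy HMM with invented numbers, shows how that goes wrong on the prefix "the old dog the" of the slide's garden-path sentence: greedy reads "old" as an adjective and "dog" as a noun.

```python
# Toy POS-tagging HMM; all probabilities are hypothetical.
TAGS = ["DT", "JJ", "NN", "VB"]
START = {"DT": 1.0}
TRANS = {
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def greedy_decode(words):
    """Left to right, pick argmax_t A(prev, t) * B(t, word); no lookahead, no backtracking."""
    tags = []
    for i, word in enumerate(words):
        dist = START if i == 0 else TRANS[tags[-1]]
        tags.append(max(TAGS, key=lambda t: dist.get(t, 0.0) * EMIT[t].get(word, 0.0)))
    return tags


print(greedy_decode("the old dog the".split()))
```

With these toy numbers the greedy output DT JJ NN DT has joint probability 0.0036, while DT NN VB DT ("old" as noun, "dog" as verb) scores 0.0144, four times higher. The early choice of JJ for "old" looked best locally but made the following "the" unlikely.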

  29. The Viterbi Algorithm

  30. The Viterbi Algorithm

  31. The Viterbi Algorithm

  32. The Viterbi Algorithm

  33. The Viterbi Algorithm ▪ Complexity?
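Viterbi replaces the exponential search with dynamic programming: v[i][t] stores the probability of the best tag sequence for the first i+1 words that ends in tag t, and a backpointer records which previous tag achieved it. Each of the n positions considers |T| current tags times |T| previous tags, so the running time is O(n·|T|^2), with O(n·|T|) space for the backpointers. A minimal sketch follows, assuming the same kind of hypothetical toy HMM (invented numbers) and plain probabilities; practical implementations usually sum log probabilities instead to avoid underflow.

```python
# Toy POS-tagging HMM; all probabilities are hypothetical.
TAGS = ["DT", "JJ", "NN", "VB"]
START = {"DT": 1.0}
TRANS = {
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def viterbi(words):
    """Return (best tag sequence, its joint probability) in O(n * |T|^2) time."""
    # v[i][t]: probability of the best path over words[0..i] ending in tag t
    v = [{t: START.get(t, 0.0) * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = [{}]  # back[i][t]: best previous tag for tag t at position i
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: v[i - 1][p] * TRANS[p].get(t, 0.0))
            v[i][t] = v[i - 1][prev] * TRANS[prev].get(t, 0.0) * EMIT[t].get(words[i], 0.0)
            back[i][t] = prev
    # Pick the best final tag, then follow backpointers to the start.
    last = max(TAGS, key=lambda t: v[-1][t])
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return tags[::-1], v[-1][last]


print(viterbi("the old dog the".split()))
```

On this toy input Viterbi returns the noun/verb reading DT NN VB DT that greedy decoding misses, because it keeps the best path into every tag at every position instead of committing early.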

  34. Beam search
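Beam search sits between greedy decoding and Viterbi: at each position it keeps only the k highest-scoring partial tag sequences. With k = 1 it reduces to greedy decoding; larger k recovers more of the search space at O(n·k·|T|) cost but gives no optimality guarantee. A minimal sketch under a hypothetical toy HMM (invented numbers); the `beam_width` parameter name is illustrative.

```python
# Toy POS-tagging HMM; all probabilities are hypothetical.
TAGS = ["DT", "JJ", "NN", "VB"]
START = {"DT": 1.0}
TRANS = {
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def beam_search(words, beam_width):
    """Keep the beam_width best partial tag sequences at each position."""
    beam = [(1.0, [])]  # (score so far, tags so far)
    for i, word in enumerate(words):
        candidates = []
        for score, tags in beam:
            dist = START if i == 0 else TRANS[tags[-1]]
            for t in TAGS:
                new_score = score * dist.get(t, 0.0) * EMIT[t].get(word, 0.0)
                if new_score > 0:  # drop impossible hypotheses
                    candidates.append((new_score, tags + [t]))
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beam[0][1] if beam else None


words = "the old dog the".split()
print(beam_search(words, 1))  # behaves like greedy decoding
print(beam_search(words, 2))
```

On this toy sentence a beam of 1 reproduces the greedy adjective/noun reading, while a beam of 2 keeps the DT NN prefix alive long enough to find DT NN VB DT.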

  35. Viterbi ▪ n-best decoding ▪ Relationship to sequence alignment

  36. HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch From J&M

  37. The Forward Algorithm ▪ Sum instead of max
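The forward algorithm has the same trellis as Viterbi but replaces the max over previous tags with a sum, so alpha[t] accumulates the probability of all tag paths ending in t. Summing the final column gives the likelihood P(words) in O(n·|T|^2) time, with no backpointers needed. A minimal sketch, assuming a hypothetical toy HMM with invented numbers.

```python
# Toy POS-tagging HMM; all probabilities are hypothetical.
TAGS = ["DT", "JJ", "NN", "VB"]
START = {"DT": 1.0}
TRANS = {
    "DT": {"JJ": 0.4, "NN": 0.6},
    "JJ": {"JJ": 0.1, "NN": 0.9},
    "NN": {"NN": 0.35, "VB": 0.6, "DT": 0.05},
    "VB": {"DT": 0.8, "NN": 0.2},
}
EMIT = {
    "DT": {"the": 1.0},
    "JJ": {"old": 0.5, "young": 0.5},
    "NN": {"old": 0.1, "dog": 0.4, "footsteps": 0.3, "young": 0.2},
    "VB": {"dog": 0.5, "walks": 0.5},
}


def forward(words):
    """P(words) = sum over all tag sequences of P(words, tags)."""
    # alpha[t]: total probability of all paths over words so far ending in tag t
    alpha = {t: START.get(t, 0.0) * EMIT[t].get(words[0], 0.0) for t in TAGS}
    for word in words[1:]:
        alpha = {
            # Sum (not max) over every previous tag p.
            t: sum(alpha[p] * TRANS[p].get(t, 0.0) for p in TAGS) * EMIT[t].get(word, 0.0)
            for t in TAGS
        }
    return sum(alpha.values())


print(forward("the old dog the".split()))
```

For the toy sentence only three tag sequences have nonzero probability (0.0036, 0.00042, and 0.0144), and the forward result equals their sum, 0.01842.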

  38. Parts of Speech

  39. The closed classes

  40. More Fine-Grained Classes

  41. More Fine-Grained Classes

  42. The Penn Treebank Part-of-Speech Tagset

  43. The Universal POS tagset https://universaldependencies.org

  44. POS tagging

  45. POS tagging ▪ Goal: resolve POS ambiguities

  46. POS tagging

  47. Most Frequent Class Baseline ▪ Trained on the WSJ training corpus and tested on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.

  48. Most Frequent Class Baseline ▪ Trained on the WSJ training corpus and tested on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%. ▪ 97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
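The most-frequent-class baseline assigns each word the tag it occurred with most often in training. The sketch below implements it on a tiny hypothetical corpus (not the WSJ data; the sentences and the resulting accuracy are illustrative only). How to tag unseen words is an assumption here: they fall back to the corpus-wide most frequent tag.

```python
from collections import Counter, defaultdict

# Tiny hypothetical tagged corpus; NOT the WSJ data from the slide.
TRAIN = [
    [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("old", "NN"), ("dog", "VB"), ("the", "DT"), ("young", "NN")],
    [("an", "DT"), ("old", "JJ"), ("man", "NN")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VB"), ("home", "NN")],
]
TEST = [[("the", "DT"), ("old", "NN"), ("dog", "VB"), ("the", "DT"), ("cat", "NN")]]


def train_baseline(corpus):
    """Map each word to its most frequent training tag."""
    word_tags = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in corpus:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            tag_counts[tag] += 1
    # Assumption: unseen words get the overall most frequent tag.
    fallback = tag_counts.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    return lexicon, fallback


def accuracy(lexicon, fallback, corpus):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pairs = [(w, t) for sentence in corpus for w, t in sentence]
    correct = sum(lexicon.get(w, fallback) == t for w, t in pairs)
    return correct / len(pairs)


lexicon, fallback = train_baseline(TRAIN)
print(accuracy(lexicon, fallback, TEST))
```

On the garden-path test sentence the baseline tags "old" as JJ and "dog" as NN, their most frequent tags, and gets both wrong: these are exactly the context-dependent ambiguities that HMM decoding is meant to resolve.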
