Parts of Speech

  2. Parts of Speech

  3. More Fine-Grained Classes

  4. More Fine-Grained Classes ▪ Example: Actually, I ran home extremely quickly yesterday

  5. The closed classes

  6. Example of POS tagging

  7. The Penn Treebank Part-of-Speech Tagset

  8. The Universal POS tagset

  9. POS tagging goal: resolve POS ambiguities

  10. POS tagging

  11. Most Frequent Class Baseline ▪ Training on the WSJ training corpus and testing on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.

  12. Most Frequent Class Baseline ▪ Training on the WSJ training corpus and testing on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%. ▪ 97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
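
The most-frequent-tag baseline can be sketched in a few lines. This is a minimal illustration, not the setup behind the 92.34% figure: the toy sentences are invented, and falling back to the overall most frequent tag for unseen words is an assumption made here.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Return (word -> its most frequent tag, overall most frequent tag)."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    best_tag = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]
    return best_tag, default_tag

def tag_baseline(words, best_tag, default_tag):
    """Tag each word with its most frequent training tag (assumed fallback for unseen words)."""
    return [best_tag.get(w, default_tag) for w in words]

# Toy data, invented for illustration only (not the WSJ corpus)
train = [
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
best_tag, default_tag = train_most_frequent_tag(train)
predicted = tag_baseline(["to", "back", "the", "door"], best_tag, default_tag)
print(predicted)  # ['TO', 'NN', 'DT', 'NN']
```

Note that "back" is always tagged NN here, even after "to", which is exactly the kind of ambiguity the baseline cannot resolve.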

  13. Why POS tagging? ▪ Text-to-speech: record, lead, protest ▪ Lemmatization: saw/V → see, saw/N → saw ▪ Preprocessing for harder disambiguation problems: syntactic parsing, semantic parsing

  14. Generative sequence labeling: Hidden Markov Models

  15. Hidden Markov Models ▪ In the real world many events are not observable ▪ Speech recognition: we observe acoustic features but not the phones ▪ POS tagging: we observe words but not the POS tags ▪ [Figure: hidden states q_1, q_2, ..., q_n above observations o_1, o_2, ..., o_n]

  16. HMM (from J&M)

  17. HMM example (from J&M)

  18. HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch (from J&M)

  19. HMM tagging as decoding

  20. HMM tagging as decoding ▪ How many possible choices?
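
A short worked answer to the question on this slide, as a sketch. The symbols N (tagset size) and n (sentence length) are chosen here for illustration and are not taken from the slide.

```latex
\begin{align*}
\hat{t}_{1:n} &= \operatorname*{argmax}_{t_{1:n}} P(t_{1:n} \mid w_{1:n}) \\
\text{number of candidate tag sequences} &= N^{n} \\
N = 45,\ n = 10:\quad 45^{10} &\approx 3.4 \times 10^{16}
\end{align*}
```

Enumerating every sequence is therefore impractical even for short sentences. The Viterbi algorithm for a bigram HMM needs on the order of N^2 n steps instead.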

  21. Part-of-speech tagging example (slide credit: Noah Smith)

  22. The Viterbi Algorithm

  23. The Viterbi Algorithm

  24. The Viterbi Algorithm

  25. The Viterbi Algorithm
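
Since the Viterbi slides were figures, here is a minimal sketch of the algorithm for a bigram HMM, working in log space. The toy probabilities and the three-word sentence are invented for illustration, and the dictionary-based parameter layout is just one possible choice.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (best tag sequence, its log-probability) under a bigram HMM."""
    def lg(p):
        # log(0) is treated as -inf so impossible paths never win the max
        return math.log(p) if p > 0 else float("-inf")

    # v[t][tag]: best log-probability of any tag path ending in `tag` at position t
    v = [{tag: lg(start_p.get(tag, 0.0)) + lg(emit_p[tag].get(words[0], 0.0))
          for tag in tags}]
    back = [{}]  # back[t][tag]: best previous tag for `tag` at position t
    for t in range(1, len(words)):
        v.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: v[t - 1][p] + lg(trans_p[p].get(tag, 0.0)))
            v[t][tag] = (v[t - 1][prev] + lg(trans_p[prev].get(tag, 0.0))
                         + lg(emit_p[tag].get(words[t], 0.0)))
            back[t][tag] = prev
    # Follow backpointers from the best final tag
    last = max(tags, key=lambda tag: v[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

# Toy parameters, invented for illustration only
tags = ["PRP", "MD", "VB", "NN"]
start_p = {"PRP": 0.8, "NN": 0.2}
trans_p = {"PRP": {"MD": 0.6, "VB": 0.4}, "MD": {"VB": 0.9, "NN": 0.1},
           "VB": {"NN": 0.7, "VB": 0.3}, "NN": {"NN": 0.5, "VB": 0.5}}
emit_p = {"PRP": {"they": 1.0}, "MD": {"can": 1.0},
          "VB": {"can": 0.3, "fish": 0.7}, "NN": {"can": 0.4, "fish": 0.6}}

best_path, best_logp = viterbi(["they", "can", "fish"], tags, start_p, trans_p, emit_p)
print(best_path)  # ['PRP', 'MD', 'VB']
```

The best path has probability 0.8 × 1.0 × 0.6 × 1.0 × 0.9 × 0.7 = 0.3024 under these toy numbers.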

  26. Beam search
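
Beam search trades exactness for speed by keeping only the best few partial tag sequences at each position. The sketch below assumes a hypothetical `log_score(prev_tag, tag, word)` function supplied by the caller; the toy probabilities and the `"<s>"` start symbol are invented for illustration.

```python
import math

def beam_search(words, tags, log_score, beam_width=2):
    """Approximate decoding: keep only the `beam_width` best partial tag sequences."""
    beam = [([], 0.0)]  # each entry: (tag sequence so far, cumulative log score)
    for word in words:
        candidates = []
        for seq, score in beam:
            prev = seq[-1] if seq else "<s>"
            for tag in tags:
                candidates.append((seq + [tag], score + log_score(prev, tag, word)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]  # prune everything outside the beam
    return beam[0]

# Toy bigram-HMM scores, invented for illustration only
trans = {("<s>", "PRP"): 0.8, ("<s>", "NN"): 0.2, ("PRP", "MD"): 0.6,
         ("PRP", "VB"): 0.4, ("MD", "VB"): 0.9, ("MD", "NN"): 0.1,
         ("VB", "NN"): 0.7, ("VB", "VB"): 0.3}
emit = {("PRP", "they"): 1.0, ("MD", "can"): 1.0, ("VB", "can"): 0.3,
        ("VB", "fish"): 0.7, ("NN", "can"): 0.4, ("NN", "fish"): 0.6}

def log_score(prev, tag, word):
    """Hypothetical local score: log of transition times emission probability."""
    p = trans.get((prev, tag), 0.0) * emit.get((tag, word), 0.0)
    return math.log(p) if p > 0 else float("-inf")

best_seq, best_score = beam_search(["they", "can", "fish"],
                                   ["PRP", "MD", "VB", "NN"], log_score)
print(best_seq)  # ['PRP', 'MD', 'VB']
```

With `beam_width=1` this reduces to greedy left-to-right decoding. Unlike Viterbi, a narrow beam is not guaranteed to find the highest-scoring sequence.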

  27. HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch (from J&M)

  28. The Forward Algorithm ▪ sum instead of max
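
"Sum instead of max" can be made concrete with a small sketch: the forward algorithm uses the same chart as Viterbi, but each cell sums over previous tags, giving the total probability of the word sequence. The toy parameters are invented for illustration, and no explicit end-of-sentence transition is modeled.

```python
def forward(words, tags, start_p, trans_p, emit_p):
    """Total probability of the word sequence, summed over all tag paths."""
    # alpha[tag]: probability of the words so far with the current word tagged `tag`
    alpha = {tag: start_p.get(tag, 0.0) * emit_p[tag].get(words[0], 0.0)
             for tag in tags}
    for word in words[1:]:
        alpha = {
            tag: sum(alpha[prev] * trans_p[prev].get(tag, 0.0) for prev in tags)
                 * emit_p[tag].get(word, 0.0)
            for tag in tags
        }
    return sum(alpha.values())

# Toy parameters, invented for illustration only
tags = ["PRP", "MD", "VB", "NN"]
start_p = {"PRP": 0.8, "NN": 0.2}
trans_p = {"PRP": {"MD": 0.6, "VB": 0.4}, "MD": {"VB": 0.9, "NN": 0.1},
           "VB": {"NN": 0.7, "VB": 0.3}, "NN": {"NN": 0.5, "VB": 0.5}}
emit_p = {"PRP": {"they": 1.0}, "MD": {"can": 1.0},
          "VB": {"can": 0.3, "fish": 0.7}, "NN": {"can": 0.4, "fish": 0.6}}

total = forward(["they", "can", "fish"], tags, start_p, trans_p, emit_p)
print(round(total, 5))  # 0.39168
```

Real implementations usually work in log space or rescale the chart to avoid underflow on long sentences.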

  29. Viterbi ▪ n-best decoding ▪ relationship to sequence alignment

  30. Extending the HMM Algorithm to Trigrams
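
As a sketch of what the trigram extension changes, the tag transition conditions on the two previous tags instead of one. Because trigram counts are sparse, the estimate is commonly interpolated with lower-order estimates; the weights λ are assumed to be tuned on held-out data.

```latex
\begin{align*}
\text{bigram:}\quad P(t_{1:n}) &\approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \\
\text{trigram:}\quad P(t_{1:n}) &\approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \\
\text{interpolated:}\quad P(t_i \mid t_{i-1}, t_{i-2}) &=
  \lambda_3 \hat{P}(t_i \mid t_{i-1}, t_{i-2})
  + \lambda_2 \hat{P}(t_i \mid t_{i-1})
  + \lambda_1 \hat{P}(t_i),
  \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
\end{align*}
```

In Viterbi decoding, each chart cell then ranges over pairs of tags, so the cost grows from the order of N^2 n to the order of N^3 n for N tags and n words.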

  31. Unknown Words ▪ Word shape ▪ lower case → x ▪ upper case → X ▪ numbers → d ▪ punctuation is retained ▪ I.M.F → X.X.X ▪ DC10-30 → XXdd-dd ▪ Short word shape: consecutive character types are collapsed ▪ DC10-30 → Xd-d ▪ Prefixes & suffixes ▪ -s, -ed, -ing
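
The word-shape mappings above can be sketched as follows. Punctuation is kept unchanged here, which is an assumption chosen to match the I.M.F and DC10-30 examples on the slide.

```python
import re

def word_shape(word):
    """Map lower case to x, upper case to X, digits to d; keep other characters."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # punctuation is kept as-is (assumption)
    return "".join(shape)

def short_word_shape(word):
    """Collapse runs of the same shape character into a single character."""
    return re.sub(r"(.)\1+", r"\1", word_shape(word))

print(word_shape("I.M.F"))          # X.X.X
print(word_shape("DC10-30"))        # XXdd-dd
print(short_word_shape("DC10-30"))  # Xd-d
```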

  32. Brants (2000) ▪ a trigram HMM ▪ handling unknown words ▪ 96.7% tagging accuracy on the Penn Treebank

  33. Generative vs. Discriminative models ▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data. ▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes. (from Bamman)

  34. Maximum Entropy Markov Models (MEMM) ▪ HMM ▪ MEMM
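
The equations on this slide did not survive extraction. As a sketch, the standard textbook contrast between the two models (with W the word sequence and T the tag sequence, and the usual Markov and output-independence assumptions for the HMM) is:

```latex
\begin{align*}
\text{HMM:}\quad \hat{T} &= \operatorname*{argmax}_{T} P(T \mid W)
  = \operatorname*{argmax}_{T} P(W \mid T)\, P(T)
  = \operatorname*{argmax}_{T} \prod_{i} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \\
\text{MEMM:}\quad \hat{T} &= \operatorname*{argmax}_{T} P(T \mid W)
  = \operatorname*{argmax}_{T} \prod_{i} P(t_i \mid w_i, t_{i-1})
\end{align*}
```

The HMM applies Bayes' rule and models how tags generate words. The MEMM models the posterior of each tag directly, which lets it condition on arbitrary features of the input.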

  35. Features in a MEMM

  36. Features in a MEMM ▪ well-dressed
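
For a word like well-dressed, typical MEMM features include its prefixes, suffixes, and simple orthographic indicators. The following is a minimal sketch; the feature names and the maximum affix length of 4 are assumptions made for illustration.

```python
def affix_features(word, max_len=4):
    """Binary features for prefixes and suffixes up to `max_len`, plus a hyphen flag."""
    feats = {}
    for k in range(1, min(max_len, len(word)) + 1):
        feats[f"prefix={word[:k]}"] = 1
        feats[f"suffix={word[-k:]}"] = 1
    feats["has-hyphen"] = int("-" in word)
    feats["has-upper"] = int(any(ch.isupper() for ch in word))
    feats["has-digit"] = int(any(ch.isdigit() for ch in word))
    return feats

feats = affix_features("well-dressed")
print(sorted(name for name in feats if name.startswith("suffix=")))
# ['suffix=d', 'suffix=ed', 'suffix=sed', 'suffix=ssed']
```

In a full tagger these would be combined with features of neighboring words and of previously assigned tags.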

  37. Decoding and Training MEMMs

  38. Decoding MEMMs ▪ Greedy approach: does not use evidence from future decisions

  39. Decoding MEMMs ▪ Viterbi ▪ the chart is filled with a different recurrence for the HMM and for the MEMM
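
The chart-filling formulas on this slide were lost in extraction. As a sketch, the standard textbook recurrences for states s_1, ..., s_N and observations o_1, ..., o_T are:

```latex
\begin{align*}
\text{HMM:}\quad v_t(j) &= \max_{i=1}^{N} v_{t-1}(i)\, P(s_j \mid s_i)\, P(o_t \mid s_j) \\
\text{MEMM:}\quad v_t(j) &= \max_{i=1}^{N} v_{t-1}(i)\, P(s_j \mid s_i, o_t)
\end{align*}
```

for 1 ≤ j ≤ N and 1 < t ≤ T. The only change is that the MEMM replaces the product of transition and emission probabilities with a single conditional probability of the state given the previous state and the current observation.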

  40. Bidirectionality ▪ Label bias or observation bias problem ▪ will/NN to/TO fight/VB ▪ Linear-chain CRF (Lafferty et al. 2001) ▪ A bidirectional version of the MEMM (Toutanova et al. 2003) ▪ bi-LSTM

  41. Neural sequence tagger ▪ Lample et al. 2016 ▪ Neural Architectures for NER

  42. Multilingual POS tagging ▪ In morphologically-rich languages like Czech, Hungarian, Turkish ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus ▪ ⇒ many UNKs ▪ more information is coded in morphology

  43. Multilingual POS tagging ▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly with it ▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding ▪ Universal POS tagset accounts for cross-linguistic differences

  44. Named Entity Recognition

  45. Named Entity tags

  46. Ambiguity in NER

  47. NER as Sequence Labeling ▪ IOB tagging scheme
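
To illustrate the IOB scheme, the sketch below converts entity spans into one tag per token: B- marks the first token of an entity, I- marks tokens inside it, and O marks tokens outside any entity. The span format (start, exclusive end, label) and the example sentence are assumptions made for illustration.

```python
def spans_to_iob(tokens, spans):
    """Convert (start, end_exclusive, label) entity spans into per-token IOB tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"       # remaining tokens of the entity
    return tags

tokens = ["American", "Airlines", "flew", "to", "Chicago"]
spans = [(0, 2, "ORG"), (4, 5, "LOC")]
iob_tags = spans_to_iob(tokens, spans)
print(list(zip(tokens, iob_tags)))
# [('American', 'B-ORG'), ('Airlines', 'I-ORG'), ('flew', 'O'), ('to', 'O'), ('Chicago', 'B-LOC')]
```

Once entities are encoded this way, NER becomes a per-token labeling problem that any of the sequence models above can be applied to.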

  48. A feature-based algorithm for NER

  49. A feature-based algorithm for NER ▪ gazetteers ▪ a list of place names providing millions of entries for locations with detailed geographical and political information ▪ binary indicator features
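
A gazetteer lookup turns into a binary indicator feature per token. This is a minimal sketch with a tiny invented place list; it matches single tokens only, whereas a real gazetteer has far more entries and would also need multi-token matching.

```python
def gazetteer_features(tokens, gazetteer):
    """One binary indicator per token: is the token listed in the gazetteer?"""
    known = {name.lower() for name in gazetteer}
    return [{"in_gazetteer": int(tok.lower() in known)} for tok in tokens]

# Tiny stand-in list, invented for illustration only
places = ["Chicago", "Paris", "Boulder"]
gaz_feats = gazetteer_features(["She", "moved", "to", "Chicago"], places)
print([f["in_gazetteer"] for f in gaz_feats])  # [0, 0, 0, 1]
```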

  50. Evaluation of NER ▪ F-score ▪ segmentation is a confound ▪ e.g., a system that labels American/B-ORG but misses Airlines makes 2 errors: a false positive for O and a false negative for I-ORG
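
One way to see the segmentation issue is to score at the entity level, where a predicted entity counts as correct only if both its boundaries and its label match the gold entity. The sketch below is a simplified scorer, not a reference implementation; treating a stray I- tag as the start of a new entity is an assumption.

```python
def iob_to_spans(tags):
    """Extract a set of (start, end_exclusive, label) entities from IOB tags."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final open span
        continues = tag.startswith("I-") and start is not None and tag[2:] == label
        if not continues:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag != "O":
                # B-X opens a span; a stray I-X is treated as opening one too (assumption)
                start, label = i, tag[2:]
    return spans

def entity_prf(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 using exact span matching."""
    gold, pred = iob_to_spans(gold_tags), iob_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]   # American Airlines ... Chicago
pred = ["B-ORG", "O", "O", "O", "B-LOC"]       # only "American" tagged as ORG
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

The truncated ORG entity is penalized twice: the predicted span "American" is a false positive and the gold span "American Airlines" is a false negative.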

  51. HMMs in Automatic Speech Recognition ▪ “speech lab” → ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb

  52. HMMs in Automatic Speech Recognition ▪ [Figure: words w_1, w_2 (language model) over sound types s_1, ..., s_7, which are linked by the acoustic model to acoustic observations a_1, ..., a_7]

