Parts of Speech
More Fine-Grained Classes
More Fine-Grained Classes ▪ Actually, I ran home extremely quickly yesterday
The closed classes
Example of POS tagging
The Penn Treebank Part-of-Speech Tagset
The Universal POS tagset https://universaldependencies.org
POS tagging goal: resolve POS ambiguities
POS tagging
Most Frequent Class Baseline ▪ Trained on the WSJ training corpus and tested on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34% ▪ 97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
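The most-frequent-tag baseline can be sketched in a few lines. This is a minimal illustration, not the setup behind the 92.34% figure: the function names `train_most_frequent_tag` and `tag_baseline` are hypothetical, the training sentences are made up, and unseen words simply fall back to the overall most frequent tag.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # assumed fallback for unseen words
    return lexicon, default_tag

def tag_baseline(words, lexicon, default_tag):
    """Tag every word independently, ignoring context."""
    return [lexicon.get(word, default_tag) for word in words]

# Toy training data (made up for illustration, not taken from the WSJ corpus).
train = [
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
lexicon, default_tag = train_most_frequent_tag(train)
baseline_tags = tag_baseline(["back", "the", "bill", "now"], lexicon, default_tag)
print(baseline_tags)
```

Because each word is tagged in isolation, "back" always receives NN here, even in "to back the bill"; resolving that kind of ambiguity is what the sequence models are for.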
Why POS tagging? ▪ Text-to-speech: record, lead, protest ▪ Lemmatization: saw/V → see, saw/N → saw ▪ Preprocessing for harder disambiguation problems: syntactic parsing, semantic parsing
Generative sequence labeling: Hidden Markov Models
Hidden Markov Models ▪ In the real world many events are not observable ▪ Speech recognition: we observe acoustic features but not the phones ▪ POS tagging: we observe words but not the POS tags ▪ Diagram: hidden states q_1, q_2, ..., q_n emit observations o_1, o_2, ..., o_n
HMM From J&M
HMM example From J&M
HMMs: Algorithms ▪ Forward ▪ Viterbi ▪ Forward–Backward; Baum–Welch ▪ From J&M
HMM tagging as decoding
HMM tagging as decoding How many possible choices?
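The decoding objective and the size of the search space can be written out as follows. This is the standard bigram HMM formulation; the symbols $N$ (number of tags) and $n$ (sentence length) are introduced here for illustration, and the numeric example assumes a 45-tag tagset.

```latex
\hat{t}_{1:n}
  = \arg\max_{t_{1:n}} P(t_{1:n} \mid w_{1:n})
  = \arg\max_{t_{1:n}} P(w_{1:n} \mid t_{1:n})\, P(t_{1:n})
  \approx \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% Each of the n words can take any of the N tags:
\#\{\text{candidate tag sequences}\} = N^{n},
\qquad \text{e.g. } 45^{10} \approx 3.4 \times 10^{16}
```

Enumerating every sequence is therefore infeasible, which motivates dynamic programming.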
Part of speech tagging example Slide credit: Noah Smith
The Viterbi Algorithm
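A minimal sketch of Viterbi decoding for a bigram HMM follows. The two-state weather model and all of its probabilities are illustrative assumptions, and the function name `viterbi` and its dictionary-based parameters are choices made for this example; for tagging, the states would be POS tags and the observations words.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its joint probability."""
    # chart[t][s]: probability of the best path ending in state s at time t
    chart = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        chart.append({})
        backptr.append({})
        for s in states:
            # best previous state for arriving in s (max, not sum)
            prev = max(states, key=lambda p: chart[t - 1][p] * trans_p[p][s])
            chart[t][s] = chart[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            backptr[t][s] = prev
    # follow back-pointers from the best final state
    last = max(states, key=lambda s: chart[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, chart[-1][last]

# Illustrative toy parameters (assumed, not estimated from any corpus).
states = ["HOT", "COLD"]
start_p = {"HOT": 0.8, "COLD": 0.2}
trans_p = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit_p = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

best_path, best_prob = viterbi([3, 1, 3], states, start_p, trans_p, emit_p)
print(best_path, best_prob)
```

The chart has one cell per (position, state) pair, so the cost is proportional to n·N² rather than Nⁿ. A practical tagger would work with log probabilities to avoid underflow on long sentences.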
Beam search
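Beam search keeps only the highest-scoring partial hypotheses at each position instead of filling the whole chart. The sketch below is one simple way to do this for a bigram HMM; the function name `beam_search`, the `beam_width` parameter, and the toy weather model are assumptions made for illustration.

```python
def beam_search(obs, states, start_p, trans_p, emit_p, beam_width=2):
    """Approximate decoding: keep the top `beam_width` partial paths per step."""
    beam = [([s], start_p[s] * emit_p[s][obs[0]]) for s in states]
    beam = sorted(beam, key=lambda hyp: hyp[1], reverse=True)[:beam_width]
    for o in obs[1:]:
        # extend every surviving hypothesis by every possible next state
        candidates = [
            (path + [s], score * trans_p[path[-1]][s] * emit_p[s][o])
            for path, score in beam
            for s in states
        ]
        # prune back down to the beam width
        beam = sorted(candidates, key=lambda hyp: hyp[1], reverse=True)[:beam_width]
    return beam[0]

# Illustrative toy parameters (assumed, not estimated from any corpus).
states = ["HOT", "COLD"]
start_p = {"HOT": 0.8, "COLD": 0.2}
trans_p = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit_p = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

beam_path, beam_prob = beam_search([3, 1, 3], states, start_p, trans_p, emit_p)
greedy_path, greedy_prob = beam_search([3, 1, 3], states, start_p, trans_p, emit_p, beam_width=1)
print(beam_path, beam_prob)
```

With `beam_width=1` this reduces to greedy decoding. Unlike Viterbi, beam search is not guaranteed to find the best sequence, since a hypothesis pruned early cannot be recovered.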
The Forward Algorithm ▪ same chart as Viterbi, with sum instead of max
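To make the "sum instead of max" point concrete, here is a minimal sketch of the forward algorithm, which computes the total probability of an observation sequence over all hidden state paths. The function name `forward` and the toy weather model are illustrative assumptions.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs), summed over every possible hidden state sequence."""
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())

# Illustrative toy parameters (assumed, not estimated from any corpus).
states = ["HOT", "COLD"]
start_p = {"HOT": 0.8, "COLD": 0.2}
trans_p = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit_p = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

likelihood = forward([3, 1, 3], states, start_p, trans_p, emit_p)
print(likelihood)
```

The only structural difference from Viterbi is the `sum` over previous states where Viterbi takes a `max`, and no back-pointers are needed because no single path is recovered.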
Viterbi ▪ n-best decoding ▪ relationship to sequence alignment
Extending the HMM Algorithm to Trigrams
Unknown Words ▪ Word shape ▪ lower case → x ▪ upper case → X ▪ numbers → d ▪ punctuation → . ▪ I.M.F → X.X.X ▪ DC10-30 → XXdd-dd ▪ Short word shape: runs of the same character type are collapsed ▪ DC10-30 → Xd-d ▪ Prefixes & suffixes ▪ -s, -ed, -ing
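The word-shape mappings above can be sketched as follows. The function names `word_shape` and `short_word_shape` are hypothetical, and the sketch assumes that punctuation characters are kept as they are, which is what the DC10-30 → XXdd-dd example suggests.

```python
import re

def word_shape(word):
    """Map lower case to x, upper case to X, digits to d; keep other characters."""
    shape = []
    for ch in word:
        if ch.islower():
            shape.append("x")
        elif ch.isupper():
            shape.append("X")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # assumption: punctuation is left unchanged
    return "".join(shape)

def short_word_shape(word):
    """Collapse each run of identical shape characters to a single character."""
    return re.sub(r"(.)\1+", r"\1", word_shape(word))

print(word_shape("I.M.F"), word_shape("DC10-30"), short_word_shape("DC10-30"))
```

Shapes let a tagger generalize to words it has never seen: an unseen token with shape `Xxxxx` behaves like other capitalized words, whatever its spelling.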
Brants (2000) ▪ a trigram HMM ▪ handling unknown words ▪ 96.7% on the Penn Treebank
Generative vs. Discriminative models ▪ Generative models specify a joint distribution over the labels and the data. With this you could generate new data ▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes From Bamman
Maximum Entropy Markov Models (MEMM) ▪ HMM ▪ MEMM
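For reference, the contrast between the two models can be written out as below. This is the standard textbook formulation with a bigram tag history; the notation $T = t_1 \dots t_n$ and $W = w_1 \dots w_n$ is chosen here for illustration.

```latex
% HMM: generative, applies Bayes' rule and models words given tags
\hat{T} = \arg\max_{T} P(W \mid T)\, P(T)
        = \arg\max_{T} \prod_{i} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

% MEMM: discriminative, models each tag directly given the word and previous tag
\hat{T} = \arg\max_{T} P(T \mid W)
        = \arg\max_{T} \prod_{i} P(t_i \mid w_i, t_{i-1})
```

Because the MEMM conditions on the observation, each local distribution can use arbitrary features of the input rather than only the word identity.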
Features in a MEMM
Features in a MEMM ▪ well-dressed
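As an illustration of the kind of features a MEMM can use for a word such as well-dressed, here is a small sketch of an affix and orthography feature extractor. The function name `unknown_word_features`, the feature-name strings, and the maximum affix length of 4 are assumptions made for this example.

```python
def unknown_word_features(word, max_affix=4):
    """Binary indicator features built from a word's affixes and spelling."""
    feats = {}
    for k in range(1, min(max_affix, len(word)) + 1):
        feats[f"prefix={word[:k]}"] = 1
        feats[f"suffix={word[-k:]}"] = 1
    feats["has-hyphen"] = int("-" in word)
    feats["has-digit"] = int(any(ch.isdigit() for ch in word))
    feats["has-upper"] = int(any(ch.isupper() for ch in word))
    return feats

features = unknown_word_features("well-dressed")
print(sorted(name for name, value in features.items() if value))
```

In a full tagger these would be combined with features of the neighboring words and the previous tags before being passed to the classifier.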
Decoding and Training MEMMs
Decoding MEMMs ▪ greedy approach: doesn’t use evidence from future decisions
Decoding MEMMs Viterbi ▪ filling the chart with ▪ HMM ▪ MEMM
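The two chart-filling recurrences can be written as follows. This is the standard formulation, with $v_t(j)$ denoting the score of the best path ending in state $s_j$ at time $t$, $o_t$ the observation at time $t$, and $N$ the number of states; the notation is an assumption, since the original equations did not survive.

```latex
% HMM: transition probability times emission probability
v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\, P(s_j \mid s_i)\, P(o_t \mid s_j)

% MEMM: a single conditional probability of the state given previous state and observation
v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\, P(s_j \mid s_i, o_t)
```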
Bidirectionality ▪ Label bias or observation bias problem ▪ will/NN to/TO fight/VB ▪ Linear-chain CRF (Lafferty et al. 2001) ▪ A bidirectional version of the MEMM (Toutanova et al. 2003) ▪ bi-LSTM
Neural sequence tagger ▪ Lample et al. 2016 ▪ Neural Architectures for NER
Multilingual POS tagging ▪ In morphologically-rich languages like Czech, Hungarian, Turkish ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus ▪ ⇒ many UNKs ▪ more information is coded in morphology
Multilingual POS tagging ▪ In non-word-space languages like Chinese word segmentation is either applied before tagging or done jointly ▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding ▪ Universal POS tagset accounts for cross-linguistic differences
Named Entity Recognition
Named Entity tags
Ambiguity in NER
NER as Sequence Labeling ▪ IOB tagging scheme
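The IOB scheme turns span annotations into one tag per token: B- marks the first token of an entity, I- a token inside one, and O a token outside any entity. A minimal sketch of the conversion follows; the function name `spans_to_iob` and the `(start, end, label)` span format with an exclusive end index are assumptions for illustration.

```python
def spans_to_iob(tokens, spans):
    """Convert (start, end, label) entity spans (end exclusive) to IOB tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

tokens = ["American", "Airlines", "flies", "to", "Chicago"]
spans = [(0, 2, "ORG"), (4, 5, "LOC")]
iob_tags = spans_to_iob(tokens, spans)
print(list(zip(tokens, iob_tags)))
```

Once entities are encoded this way, any of the sequence labelers above can be trained on NER data without modification.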
A feature-based algorithm for NER
A feature-based algorithm for NER ▪ gazetteers ▪ a list of place names providing millions of entries for locations with detailed geographical and political information ▪ binary indicator features
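Evaluation of NER ▪ F-score, computed over entities rather than tokens ▪ segmentation is a confound ▪ e.g., a system that labels American but not American Airlines as an organization (American/B-ORG Airlines/O) ▪ 2 errors: a false positive for O and a false negative for I-ORG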
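Entity-level scoring can be sketched as below: entities are extracted from the gold and predicted IOB sequences and compared as exact spans. The function names `iob_to_spans` and `entity_f1` are hypothetical, and treating a stray I- tag as the start of a new entity is an assumption of this sketch.

```python
def iob_to_spans(tags):
    """Extract a set of (start, end, label) spans (end exclusive) from IOB tags."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            start, label = None, None
            if tag != "O":  # B- tag, or stray I- tag treated as a start (assumption)
                start, label = i, tag[2:]
    return spans

def entity_f1(gold_tags, pred_tags):
    """Precision, recall and F1 over exactly matching entity spans."""
    gold, pred = iob_to_spans(gold_tags), iob_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]
pred = ["B-ORG", "O", "O", "O", "B-LOC"]
scores = entity_f1(gold, pred)
print(scores)
```

In this example the truncated span for American counts against precision and the missed span for American Airlines counts against recall, so a single boundary mistake is penalized twice.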
HMMs in Automatic Speech Recognition “speech lab” ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb
HMMs in Automatic Speech Recognition ▪ Words: w_1 w_2 (language model) ▪ Sound types: s_1 ... s_7 ▪ Acoustic observations: a_1 ... a_7 (acoustic model)