
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments (PowerPoint PPT presentation)



  1. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. Presented by: Pragati Shah, Sally Gao, Kennan Grant

  2. Overview 1. Introduction 2. Problem 3. Methodology 4. Results 5. Extensions

  3. 1. Introduction Primary goals and results
     Goals:
     ○ Enable richer text analysis of Twitter and other social media platforms
     ○ Provide a case study on how to rapidly engineer a core NLP system for new datasets
     Results:
     ○ ~90% accuracy on the test corpus
     ○ An openly accessible annotated corpus and trained POS tagger

  4. 2. Problem: Why do we need a Twitter POS tagger?
     Twitter has 328 million monthly active users and is a fruitful source of user-generated content. However, POS tagging for Twitter is challenging:
     1. Conversational tone
     2. Unconventional orthography
     3. Character limit (280 characters, previously 140)

  5. 3. Methodology Summary
     1,827 manually tagged tweets.
     1. Define Tagging Scheme: develop the tag set and manually annotate the corpus.
     2. Create Features: create additional features to incorporate into the model.
     3. Build Tagger: Conditional Random Field (CRF).
     4. Evaluate: cross-validate and compare tagging accuracy against the Stanford tagger.

  6. 3. Methodology Tagset Development
     Aim: Develop an intuitive tagset to maximize tagging consistency.
     Steps:
     1. Design a coarse tagset: {standard tags} + {Twitter-specific tags}.
     2. Tokenize with a Twitter tokenizer, and tag with the Stanford POS tagger.
     3. Correct the automatic predictions of Step 2 with manual annotation.
     4. Revise the tokenization and tagging guidelines.
     5. Correct the annotations from Step 3.
     6. Calculate annotator agreement.
     7. Make a final sweep to correct errors.

  7. 3. Methodology Tagset Development
     Cohen’s Kappa (κ)
     ◎ Measures inter-rater reliability, i.e. the agreement between two raters who each classify N items into C mutually exclusive categories
     ◎ In the paper, κ = 0.914
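
     For context, the standard definition of Cohen's kappa (general background, not taken from the slide) compares the observed agreement p_o with the agreement p_e expected by chance from the annotators' marginal label distributions:

         \kappa = \frac{p_o - p_e}{1 - p_e}

     A value of 0.914 therefore indicates agreement far above what chance alone would produce.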

  8. 3. Methodology Tagging Scheme
     Final tagging scheme: 25 tags.
     ◎ Standard POS tags (nouns, pronouns, verbs, adjectives, etc.)
     ◎ Combined POS tags: {nominal, proper noun} × {verbal, possessive}
     ◎ Twitter/online-specific tags: hashtags (#), at-mentions (@), URLs & email addresses, emoticons, and discourse markers
     ◎ Miscellaneous category tag (G): multiword abbreviations, partial words, artifacts of tokenization errors, miscellaneous symbols, possessive endings

  9. 3. Methodology Tagging Scheme
     Tag  Description                      Example
     S    Nominal + possessive             someone’s
     ^    Proper noun                      usa
     M    Proper noun + verbal             Mark’ll
     !    Interjection                     lol, haha, yea
     #    Hashtag*                         #acl
     @    At-mention                       @BarackObama
     E    Emoticon                         :-)
     G    Other (abbreviations, foreign words, possessive endings, symbols, garbage)
          Examples: ily [I love you], ♫, -->
     *35% of hashtags were tagged with something other than #

  10. 3. Methodology Conditional Random Field
     ◎ Discriminative undirected probabilistic graphical model
       ○ Models global dependencies
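
     As background, the standard linear-chain CRF formulation (not spelled out on the slide) models the conditional probability of a tag sequence y given a token sequence x as

         p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right)

     where the f_k are local feature functions over the current and previous tags plus the whole observation sequence, the \lambda_k are learned weights, and Z(x) normalizes over all possible tag sequences. This is what makes the arbitrary local features on the next slides possible.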

  11. 3. Methodology Feature Engineering
     CRF enables the incorporation of arbitrary local features. Base features:
     ◎ A feature for each word type
     ◎ Features to check whether the word contains digits or hyphens
     ◎ Suffix features
     ◎ Features looking at capitalization patterns
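
     A minimal Python sketch of what such token-level base features might look like; the feature names and exact choices here are illustrative assumptions, not the authors' code.

         def base_features(token):
             """Illustrative base features for one token: word type, digit/hyphen
             checks, suffixes, and capitalization patterns (a sketch only)."""
             feats = {
                 "word": token.lower(),                           # word-type feature
                 "has_digit": any(ch.isdigit() for ch in token),  # contains a digit?
                 "has_hyphen": "-" in token,                      # contains a hyphen?
                 "init_cap": token[:1].isupper(),                 # capitalization pattern
                 "all_caps": token.isalpha() and token.isupper(),
             }
             for n in (1, 2, 3):                                  # suffix features
                 feats["suffix_%d" % n] = token[-n:]
             return feats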

  12. 3. Methodology Feature Engineering
     ◎ TwOrth: Twitter orthography
       ○ Regex-style rules to detect @-mentions, hashtags, and URLs
     ◎ Names: frequently capitalized tokens
       ○ Twitter users are inconsistent in their use of capitalization
       ○ Feature based on each token’s likelihood of being capitalized
     ◎ TagDict: traditional tag dictionary
       ○ Features for POS tags from a traditional tag dictionary (PTB)
     ◎ DistSim: distributional similarity
       ○ Representation of term similarity via distributional features
       ○ Used 1.9 million tokens from 134,000 unlabeled tweets for the 10,000 most common terms
     ◎ Metaph: phonetic normalization
       ○ Used the Metaphone algorithm (1999) to create a coarse phonetic normalization, e.g. “lmao,” “lmaoo,” and “lmaooo” all map to LM
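
     As a rough illustration of the TwOrth-style rules, a few simplified Python regexes; the patterns are assumptions for this sketch, not the paper's exact rules.

         import re

         # Simplified Twitter-orthography detectors; illustrative only.
         MENTION = re.compile(r"^@\w+$")
         HASHTAG = re.compile(r"^#\w+$")
         URL = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)

         def tworth_features(token):
             """Binary orthography features for a single token."""
             return {
                 "is_mention": bool(MENTION.match(token)),
                 "is_hashtag": bool(HASHTAG.match(token)),
                 "is_url": bool(URL.match(token)),
             }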

  13. 3. Methodology Evaluation
     Training set: 1,000 tweets (14,542 tokens); development set: 327 tweets (4,770 tokens); test set: 500 tweets (7,124 tokens)
     ◎ Trained the Stanford tagger on the labeled data
     ◎ Tuned the Gaussian prior on the development data
     ◎ In addition to the tagger with the full feature set, performed feature ablation experiments (removing one feature category at a time)
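
     A sketch of how the feature-ablation comparison could be scripted, assuming a hypothetical train_and_eval(train_data, test_data, feature_groups) helper that trains the tagger and returns accuracy; none of these names come from the paper.

         FEATURE_GROUPS = ["TwOrth", "Names", "TagDict", "DistSim", "Metaph"]

         def run_ablation(train_data, test_data, train_and_eval):
             """Train with the full feature set, then drop one feature group at a time."""
             results = {"full": train_and_eval(train_data, test_data, FEATURE_GROUPS)}
             for group in FEATURE_GROUPS:
                 kept = [g for g in FEATURE_GROUPS if g != group]
                 results["without " + group] = train_and_eval(train_data, test_data, kept)
             return results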

  14. 4. Results Tagging Accuracy
     [Accuracy table on the slide: CRF tagger with the full feature set, feature ablation experiments, and the Stanford tagger baseline]
     Relative error reduction of 25% compared to the Stanford tagger

  15. 4. Results Challenges
     ◎ Despite the NAMES feature, the system struggles to identify proper nouns with non-standard capitalization
     ◎ The recall of proper nouns is only 71%
     ◎ The system also struggles with the miscellaneous category, G: accuracy of 26%

  16. 5. Extensions and Uses
     ◎ Cited by 739 according to Google Scholar
     ◎ Owoputi et al. (2013): developed improved annotation guidelines
       ○ Improved the annotations in the Gimpel et al. corpus
       ○ Improved Twitter tagging from 90% to 93% accuracy (state-of-the-art results) using large-scale unsupervised word clustering and new lexical features
     ◎ Mohammad et al. (2013): used the Gimpel et al. POS tagger to build a state-of-the-art Twitter sentiment classifier
     ◎ Lamb et al. (2013): used the Gimpel et al. POS tagger to track the spread of flu infections on Twitter

  17. Thanks! Any questions?
