Part-of-Speech Tagging for Historical English

  1. Part-of-Speech Tagging for Historical English. Yi Yang and Jacob Eisenstein, Georgia Tech

  2. ‣ Digital humanities research ‣ How does the portrayal of men and women differ in Shakespeare’s plays? ‣ What are the language-use patterns in North American slave narratives? [Muralidharan and Hearst, 2011 & 2012]

  3. ‣ Digital humanities research ‣ How does the portrayal of men and women differ in Shakespeare’s plays? ‣ What are the language-use patterns in North American slave narratives? ‣ NLP can help! [Muralidharan and Hearst, 2011 & 2012]

  4. ‣ Digital humanities research ‣ How does the portrayal of men and women differ in Shakespeare’s plays? ‣ What are the language-use patterns in North American slave narratives? ‣ NLP can help! ‣ Only if NLP works for historical texts … [Muralidharan and Hearst, 2011 & 2012]

  5. Early Modern English Hee said nobody had said anything agt mee . [Henry Oxinden, 1660]

  6. Early Modern English Hee said nobody had said anything agt mee . (Modern forms: Hee → He, agt → against, mee → me) ‣ Spelling variation [Henry Oxinden, 1660]

  7. Stanford POS Tagger Stanford: Hee said nobody had said anything agt mee . ‣ Spelling variation

  8. Stanford POS Tagger Gold: X X X Stanford: Hee said nobody had said anything agt mee . ‣ Spelling variation

  9. Transfer Loss for POS Tagging [Bar chart; y-axis: error rate] Modern English: 3.0 [Rayson et al., 2007]

  10. Transfer Loss for POS Tagging [Bar chart; y-axis: error rate] Early Modern English: 18.0; Modern English: 3.0 [Rayson et al., 2007]

  11. Approaches ‣ Spelling normalization ‣ Map from historical spellings to contemporary forms. [Rayson et al. (2007); Scheible et al. (2011); Bollmann (2011)]

  12. Approaches ‣ Spelling normalization ‣ Map from historical spellings to contemporary forms. [Rayson et al. (2007); Scheible et al. (2011); Bollmann (2011)] ‣ Domain adaptation (this work) ‣ Build robust NLP systems with representation learning. [Yang & Eisenstein (2014); Yang & Eisenstein (2015)]

  13. Spelling Normalization Original: Hee said nobody had said anything agt mee . Normalized: Hee said nobody had said anything aged me . [VARD; Baron and Rayson, 2008]

  14. Spelling Normalization Original: Hee said nobody had said anything agt mee . Normalized: Hee said nobody had said anything aged me . ‣ Correct normalization (mee → me) [VARD; Baron and Rayson, 2008]

  15. Spelling Normalization Original: Hee said nobody had said anything agt mee . Normalized: Hee said nobody had said anything aged me . ‣ Correct normalization (mee → me) ‣ Incorrect normalization (agt → aged; intended: against) [VARD; Baron and Rayson, 2008]

  16. Spelling Normalization Original: Hee said nobody had said anything agt mee . Normalized: Hee said nobody had said anything aged me . ‣ Correct normalization (mee → me) ‣ Incorrect normalization (agt → aged; intended: against) ‣ False negative (Hee left unchanged; intended: He) [VARD; Baron and Rayson, 2008]
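
The three outcomes on this slide can be reproduced with a toy lookup-table normalizer. This is only a sketch and not how VARD works internally: the table `VARIANTS` and the function `normalize` are hypothetical names, and the `agt` entry is deliberately wrong so that the output matches the slide's normalized sentence.

```python
# Toy lookup-table normalizer (hypothetical; not the VARD implementation).
VARIANTS = {
    "mee": "me",    # correct normalization
    "agt": "aged",  # incorrect normalization: the intended word is "against"
    # "hee" has no entry, so it passes through unchanged (false negative)
}

def normalize(tokens):
    """Replace each token with its table entry if one exists; keep it otherwise."""
    return [VARIANTS.get(tok.lower(), tok) for tok in tokens]

original = "Hee said nobody had said anything agt mee .".split()
normalized = normalize(original)
print(" ".join(normalized))  # Hee said nobody had said anything aged me .
```

A tagger run on this output still sees the unnormalized "Hee" and the wrong word "aged", which is why normalization errors can carry over into tagging errors.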

  17. Spelling Normalization Gold: Stanford: Normalized: Hee said nobody had said anything aged me . X X X [VARD; Baron and Rayson, 2008]

  18. Spelling Normalization Gold: X X Stanford: Normalized: Hee said nobody had said anything aged me . X X X [VARD; Baron and Rayson, 2008]

  19. Representation Learning Hee said nobody had said anything agt mee .

  22. Representation Learning Hee said nobody had said anything agt mee . ‣ OOV word: Hee; context: said, was, came, told, … ‣ IV words: He, I, We, …; context: said, was, came, told, …
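
The intuition on this slide, that an out-of-vocabulary spelling such as "Hee" appears in the same contexts as in-vocabulary words, can be sketched by comparing next-word counts. The tiny corpus below is invented for illustration and is not data from the talk; `next_word_counts` and `cosine` are hypothetical helper names.

```python
from collections import Counter
import math

# Invented toy sentences; not data from the talk.
corpus = [
    "Hee said nothing", "Hee was there", "Hee came home", "Hee told mee",
    "He said nothing", "He was there", "He came home", "He told me",
    "nobody had gone", "anything had value",
]

def next_word_counts(word, sentences):
    """Count the words that immediately follow `word`."""
    counts = Counter()
    for sent in sentences:
        toks = sent.split()
        for left, right in zip(toks, toks[1:]):
            if left == word:
                counts[right] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

hee = next_word_counts("Hee", corpus)
print(cosine(hee, next_word_counts("He", corpus)))      # same contexts
print(cosine(hee, next_word_counts("nobody", corpus)))  # no shared contexts
```

In this toy setting "Hee" and "He" have identical context vectors, so a representation built from context can place them together even though the tagger never saw "Hee" in training.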

  23. Model

  24. Feature Embeddings Hee said nobody had said anything agt mee . [FEMA; Yang and Eisenstein, 2015]

  26. Feature Embeddings Hee said nobody had said anything agt mee . Features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; … [FEMA; Yang and Eisenstein, 2015]
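
The four feature templates named on this slide can be extracted with a few lines of code. This is a minimal sketch: the function name `extract_features` is hypothetical, only the templates visible on the slide are covered (the slide elides the rest), and lowercasing is inferred from the slide showing `hee` for the token "Hee".

```python
def extract_features(tokens, i):
    """Return the slide's feature templates for the token at position i."""
    word = tokens[i].lower()  # the slide shows "hee" for "Hee"
    feats = {
        "CurrWord": word,
        "Prefix1": word[:1],   # first character
        "Suffix1": word[-1:],  # last character
    }
    if i + 1 < len(tokens):    # NextWord is undefined for the final token
        feats["NextWord"] = tokens[i + 1].lower()
    return feats

tokens = "Hee said nobody had said anything agt mee .".split()
features = extract_features(tokens, 0)
print(features)
```

For the first token this yields CurrWord = hee, NextWord = said, Prefix1 = h and Suffix1 = e, matching the slide.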

  30. Feature Embeddings $p(f_t \mid f_2) \propto \exp(u_2^\top v_t)$ Features, with input embeddings $u$ and output embeddings $v$: 1. CurrWord = hee ($v_1$); 2. NextWord = said ($u_2$); 3. Prefix1 = h ($v_3$); 4. Suffix1 = e ($v_4$); … [FEMA; Yang and Eisenstein, 2015]

  31. Feature Embeddings $p(f_t \mid f_2) \propto \exp(u_2^\top v_t)$ $\ell = \sum_{t \neq 2}^{T} \log p(f_t \mid f_2)$ Features, with input embeddings $u$ and output embeddings $v$: 1. CurrWord = hee ($v_1$); 2. NextWord = said ($u_2$); 3. Prefix1 = h ($v_3$); 4. Suffix1 = e ($v_4$); … [FEMA; Yang and Eisenstein, 2015]
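
The objective on this slide, the sum over templates t ≠ 2 of log p(f_t | f_2) with p(f_t | f_2) ∝ exp(u_2ᵀ v_t), can be evaluated on toy numbers as follows. This is a sketch under stated assumptions rather than the FEMA implementation: the slide gives only a proportionality, so normalizing with a softmax over a small candidate set per template is an assumption, and every vector and candidate value below is made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax_at(scores, idx):
    """Log of the softmax probability at position idx (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[idx] - log_z

# Input embedding u_2 of the conditioning feature NextWord = said (made up).
u2 = [0.5, -0.2]

# Output embeddings v for candidate values of the other templates (made up).
V = {
    "CurrWord": {"hee": [1.0, 0.0], "he": [0.9, 0.1], "the": [-1.0, 0.5]},
    "Prefix1": {"h": [0.3, 0.3], "t": [-0.4, 0.2]},
    "Suffix1": {"e": [0.2, -0.1], "d": [0.0, 0.4]},
}
observed = {"CurrWord": "hee", "Prefix1": "h", "Suffix1": "e"}

def log_likelihood(u, V, observed):
    """Sum over templates t != 2 of log p(f_t | f_2)."""
    total = 0.0
    for template, value in observed.items():
        candidates = list(V[template])
        scores = [dot(u, V[template][c]) for c in candidates]
        total += log_softmax_at(scores, candidates.index(value))
    return total

ll = log_likelihood(u2, V, observed)
print(ll)
```

Training would adjust the embeddings to increase this quantity over a corpus; with realistic vocabulary sizes the full softmax is expensive, so an approximation would normally replace it.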

  32. Word Embeddings ‣ Word embeddings (words: 1. hee; 2. said; 3. nobody; 4. had; …) ‣ Feature embeddings (features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; …) [word2vec; Mikolov et al., 2013]

  33. Word Embeddings ‣ Word embeddings (words: 1. hee; 2. said; 3. nobody; 4. had; …): generic representations ‣ Feature embeddings (features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; …) [word2vec; Mikolov et al., 2013]

  34. Word Embeddings ‣ Word embeddings (words: 1. hee; 2. said; 3. nobody; 4. had; …): generic representations ‣ Feature embeddings (features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; …): task-specific representations [word2vec; Mikolov et al., 2013]

  35. Word Embeddings ‣ Word embeddings (words: 1. hee; 2. said; 3. nobody; 4. had; …): generic representations; word co-occurrences ‣ Feature embeddings (features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; …): task-specific representations [word2vec; Mikolov et al., 2013]

  36. Word Embeddings ‣ Word embeddings (words: 1. hee; 2. said; 3. nobody; 4. had; …): generic representations; word co-occurrences ‣ Feature embeddings (features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; …): task-specific representations; feature co-occurrences [word2vec; Mikolov et al., 2013]

  37. Learning from Multiple Domains ‣ Previous work on unsupervised domain adaptation involves two domains. [FEMA; Yang and Eisenstein, 2015]

  38. Learning from Multiple Domains ‣ Previous work on unsupervised domain adaptation involves two domains. ‣ Unsupervised multi-domain adaptation [FEMA; Yang and Eisenstein, 2015]

  40. Multiple Feature Embeddings Hee said nobody had said anything agt mee . [FEMA; Yang and Eisenstein, 2015]

  41. Multiple Feature Embeddings Domain attributes: genre, epoch Hee said nobody had said anything agt mee . [FEMA; Yang and Eisenstein, 2015]

  42. Multiple Feature Embeddings Domain attributes: genre = letters, epoch = 1600+ Hee said nobody had said anything agt mee . [FEMA; Yang and Eisenstein, 2015]

  43. Multiple Feature Embeddings Domain attributes: genre = letters, epoch = 1600+ Hee said nobody had said anything agt mee . Features: 1. CurrWord = hee; 2. NextWord = said; 3. Prefix1 = h; 4. Suffix1 = e; … [FEMA; Yang and Eisenstein, 2015]

  44. Multiple Feature Embeddings Domain attributes: genre = letters, epoch = 1600+ Hee said nobody had said anything agt mee . Features: 1. CurrWord = hee; 2. NextWord = said, with embedding = (shared) + (letters) + (1600+); 3. Prefix1 = h; 4. Suffix1 = e; … [FEMA; Yang and Eisenstein, 2015]

  46. Multiple Feature Embeddings Hee said nobody had said anything agt mee . Features: 1. CurrWord = hee; 2. NextWord = said, with embedding = (shared) + (letters) + (1600+); 3. Prefix1 = h; 4. Suffix1 = e; … [FEMA; Yang and Eisenstein, 2015]

  47. Multiple Feature Embeddings $u_2 = h_2^{(\text{shared})} + h_2^{(\text{letters})} + h_2^{(\text{1600+})}$ Hee said nobody had said anything agt mee . Features: 1. CurrWord = hee; 2. NextWord = said, with embedding = (shared) + (letters) + (1600+); 3. Prefix1 = h; 4. Suffix1 = e; … [FEMA; Yang and Eisenstein, 2015]

  48. Multiple Feature Embeddings $p(f_t \mid f_2) \propto \exp(u_2^\top v_t)$ $u_2 = h_2^{(\text{shared})} + h_2^{(\text{letters})} + h_2^{(\text{1600+})}$ Hee said nobody had said anything agt mee . Features: 1. CurrWord = hee; 2. NextWord = said, with embedding = (shared) + (letters) + (1600+); 3. Prefix1 = h; 4. Suffix1 = e; … [FEMA; Yang and Eisenstein, 2015]
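
The composition on this slide, where the input embedding of a feature is the sum of a shared vector and one vector per domain attribute, can be sketched as below. The function `compose`, the dictionary layout, and all numbers are hypothetical; falling back to a zero vector for an attribute that has no entry for the feature is also an assumption, not something the slides state.

```python
def compose(h, feature, attributes):
    """u = h^(shared) + sum of h^(attribute) over the token's domain attributes."""
    u = list(h["shared"][feature])
    for attr in attributes:
        # Assumption: an attribute with no vector for this feature adds zero.
        vec = h[attr].get(feature, [0.0] * len(u))
        u = [a + b for a, b in zip(u, vec)]
    return u

# Made-up 2-dimensional vectors for the feature NextWord = said.
h = {
    "shared": {"NextWord=said": [0.5, -0.25]},
    "letters": {"NextWord=said": [0.25, 0.0]},
    "1600+": {"NextWord=said": [-0.125, 0.5]},
}

u2 = compose(h, "NextWord=said", ["letters", "1600+"])
print(u2)  # [0.625, 0.25]
```

The shared vector lets statistics transfer across all domains, while the attribute-specific vectors absorb what is particular to a genre or an epoch.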

  49. Experiments
