
Leveraging distributed representations and lexico-syntactic fixedness



  1. Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb–noun combinations
     Milton King and Paul Cook
     University of New Brunswick, Fredericton, Canada

  2. Multiword Expressions
     • Expressions of multiple words that can exhibit an idiomatic meaning
       – Ivory tower
       – Hit up
       – Take a walk
     • Verb–noun combinations
       – See stars
       – Kick the bucket

  3. Idiomatic vs. Literal
     • Pull plug
       – (I) They pulled the plug on the Department of Health funding
       – (L) Unfortunately someone pulled the sink plug
     • See stars
       – (I) It caught him on the head and he went down seeing little sparkling stars
       – (L) It’s still dark enough to see the brightest stars

  4. Idiom Token Classification
     • Determine whether an MWE instance is idiomatic
       – They pulled the plug on the project [Idiomatic/Literal]
     • Applications
       – Machine translation
         • Kick the bucket [mourir "to die" / frapper avec le pied "to strike with the foot"]
       – Sentence completion
         • Keegan is ready to pull the plug on [a deal / the TV]

  5. Overview of Approach
     • Supervised approach
     • VNC (verb–noun combination) token instances are represented using an embedding model
     • Embedding models
       – Skip-thoughts
       – Word2vec
       – Siamese CBOW
     • SVM classifier
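A minimal sketch of this pipeline for the word2vec-averaging variant only: each token instance is represented by the mean of the vectors of the words in its sentence (we assume the whole sentence is averaged), and a linear SVM separates idiomatic (+1) from literal (-1) instances. The 2-dimensional vectors in `TOY_VECTORS` are hypothetical stand-ins for a pretrained word2vec model, and the SVM is trained here with Pegasos-style subgradient descent in plain Python, not with whatever SVM package the authors used; `lam` and `epochs` are assumed values.

```python
import random

# Hypothetical 2-d vectors standing in for a pretrained word2vec model.
TOY_VECTORS = {
    "pulled": [1.0, 1.0],
    "plug": [1.0, 1.0],
    "funding": [3.0, 0.0],
    "project": [3.0, 0.5],
    "sink": [0.0, 3.0],
    "bath": [0.5, 3.0],
}


def average_embedding(tokens, vectors, dim):
    """Mean of the vectors of in-vocabulary tokens (zero vector if none are known)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels are +1 (idiomatic) or -1 (literal). A constant 1.0 feature is
    appended as a bias term; for simplicity it is regularized like the rest.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear score, with the same appended bias feature."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    sentences = [
        ["pulled", "plug", "funding"],
        ["pulled", "plug", "project"],
        ["pulled", "plug", "sink"],
        ["pulled", "plug", "bath"],
    ]
    labels = [1, 1, -1, -1]
    xs = [average_embedding(s, TOY_VECTORS, 2) for s in sentences]
    w = train_linear_svm(xs, labels)
    print(predict(w, average_embedding(["plug", "sink"], TOY_VECTORS, 2)))
```

Averaging discards word order, which keeps the representation cheap and fixed-size regardless of sentence length; the other embedding models on the slide (Skip-thoughts, Siamese CBOW) would replace `average_embedding` while the classifier stays the same.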

  6. Lexico-Syntactic Fixedness
     • The idiomatic meaning of an expression is typically restricted to a small number of lexico-syntactic patterns
     • See star (Idiomatic)
       – Active voice, no determiner, plural noun
         • See stars
     • See star (Literal)
       – Active voice, determiner, singular noun
         • See a star
       – Passive voice, plural noun
         • Stars were seen
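To make the three "see star" patterns concrete, the sketch below encodes a token's pattern as (voice, determiner, noun number). The `Pattern` class and `pattern_of` helper are hypothetical: they assume an upstream parser has already identified the voice, the noun's determiner, and the noun's lemma, and they use a reduced determiner inventory rather than the full pattern set of Fazly et al. (2009). Detecting plural by comparing surface form with lemma is also a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pattern:
    """Simplified lexico-syntactic pattern of a verb-noun token (assumed inventory)."""
    voice: str        # "active" or "passive"
    determiner: str   # "none" when the noun has no determiner
    number: str       # "sg" or "pl"


def pattern_of(passive: bool, determiner: Optional[str], noun: str, noun_lemma: str) -> Pattern:
    """Build a Pattern from pre-parsed pieces (hypothetical helper)."""
    return Pattern(
        voice="passive" if passive else "active",
        determiner=determiner.lower() if determiner else "none",
        # Simplification: plural whenever the surface form differs from the lemma.
        number="sg" if noun.lower() == noun_lemma.lower() else "pl",
    )


# The three examples from the slide.
idiomatic = pattern_of(False, None, "stars", "star")  # "see stars"
literal_1 = pattern_of(False, "a", "star", "star")    # "see a star"
literal_2 = pattern_of(True, None, "stars", "star")   # "stars were seen"
print(idiomatic, literal_1, literal_2)
```

Making `Pattern` frozen makes it hashable, so patterns can serve directly as dictionary keys when counting how often a verb–noun pair occurs in each pattern.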

  7. Patterns (Afsaneh Fazly et al., 2009)

  8. Canonical Form
     • Lexico-syntactic patterns in which idiomatic usages tend to occur (Afsaneh Fazly et al., 2009)

  9. Integrating Canonical Forms
     • Unsupervised method used in Fazly et al. to identify canonical forms
     • One-dimensional binary vector representing whether the expression is in a canonical form
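The slide says Fazly et al. identify canonical forms without supervision. One simplified, frequency-based reading, sketched here under stated assumptions, treats a pattern as canonical for a verb–noun pair when the z-score of its frequency across that pair's patterns exceeds a threshold. The threshold of 1.0, the use of the population standard deviation, and the counts are assumptions or hypothetical. The second function appends the one-dimensional binary feature to a token's embedding; concatenating it this way is likewise our assumption about how the feature is combined.

```python
from statistics import mean, pstdev


def canonical_forms(pattern_counts, threshold=1.0):
    """Patterns whose frequency z-score (for one verb-noun pair) exceeds threshold.

    pattern_counts maps a pattern to how often the pair occurs in it.
    Using the population standard deviation and threshold=1.0 are assumptions.
    """
    freqs = list(pattern_counts.values())
    mu, sigma = mean(freqs), pstdev(freqs)
    if sigma == 0:  # all patterns equally frequent: none stands out
        return set()
    return {p for p, f in pattern_counts.items() if (f - mu) / sigma > threshold}


def with_canonical_feature(embedding, pattern, canon):
    """Append a one-dimensional binary feature: 1.0 if in a canonical form, else 0.0."""
    return list(embedding) + [1.0 if pattern in canon else 0.0]


# Hypothetical corpus counts for "see star"; patterns are (voice, determiner, number).
counts = {
    ("active", "none", "pl"): 120,
    ("active", "a", "sg"): 15,
    ("active", "the", "sg"): 10,
    ("active", "the", "pl"): 10,
    ("passive", "none", "pl"): 5,
}
canon = canonical_forms(counts)
print(canon)
print(with_canonical_feature([0.5, -0.2], ("active", "none", "pl"), canon))
```

With these made-up counts only the active, determiner-less, plural pattern ("see stars") has a z-score above 1, so tokens in that pattern get 1.0 and all others get 0.0.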

  10. VNC-Tokens Dataset (Cook et al., 2008)
                     Dev   Test
      MWEs            14     14
      Training
        Idiomatic    270    298
        Literal      179    172
      Testing
        Idiomatic     92     90
        Literal       53     53
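As a reference point for the accuracy figures, the counts above imply a majority-class baseline: always predict the class that is more frequent in training (idiomatic, for both Dev and Test). The numbers below are arithmetic on the slide's counts only, not results reported in the talk; `SPLITS` and `majority_baseline` are illustrative names.

```python
# Counts copied from the VNC-Tokens slide (Cook et al., 2008).
SPLITS = {
    "dev": {"training": {"idiomatic": 270, "literal": 179},
            "testing": {"idiomatic": 92, "literal": 53}},
    "test": {"training": {"idiomatic": 298, "literal": 172},
             "testing": {"idiomatic": 90, "literal": 53}},
}


def majority_baseline(split):
    """Testing accuracy of always predicting the most frequent training class."""
    training, testing = SPLITS[split]["training"], SPLITS[split]["testing"]
    majority = max(training, key=training.get)
    return testing[majority] / sum(testing.values())


for name in SPLITS:
    print(name, round(majority_baseline(name), 3))
# dev 0.634   (92 / 145)
# test 0.629  (90 / 143)
```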

  11. Accuracy

  12. Results per class

  13. Conclusion
     • Averaging word2vec embeddings outperforms all other models used
     • The canonical form feature improves results
     • Future work
       – Unseen MWEs
       – Other embedding models

  14. Thank you
     This work was financially supported by NSERC, NBIF, and the University of New Brunswick.

  15. Results per class
