MORSE: Semantic-ally Drive-n MORpheme SEgment-er

Tarek Sakakini, Suma Bhat, Pramod Viswanath. MORSE: Semantic-ally Drive-n MORpheme SEgment-er. Samuel MORSE minimized the number of on-off clicks for non-verbal communication. This MORSE minimizes the vocabulary size for Natural Language Processing systems.


  1. MORSE: Semantic-ally Drive-n MORpheme SEgment-er. Tarek Sakakini, Suma Bhat, Pramod Viswanath. Samuel MORSE minimized the number of on-off clicks for non-verbal communication. This MORSE minimizes the vocabulary size for Natural Language Processing systems.

  2. Section 1: Morpheme Segmentation

  3. Morpheme Segmentation. Example word: Hopefully

  4. Not a trivial task. Example words: Player s, Playing, Beijing, Butterflies. Affixes shown: +ing, +er, +s

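  5. Applications: Machine Translation. Without segmentation, the model holds Quickly → Rapidement and Sad → Triste, and the test word Sadly has no translation (???). With segmentation, the model holds Quick → Rapide, ly → ment, and Sad → Triste, and the test word Sadly translates to Tristement.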

  6. Applications: Information Retrieval. Example document: "Here at Toyota World, we have the cheap + est car + s in town. We are proudly called the first and last stop. …"

  7. Section 2: Previous Work

  8. Letter Successor Variety (Harris, 1970). Example word, shown letter by letter: Helplessly
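
The letter successor variety idea can be illustrated with a minimal sketch: for each prefix of a word, count how many distinct letters follow that prefix anywhere in the vocabulary, and treat a peak in that count as a hint of a morpheme boundary. The toy vocabulary and the function name `successor_variety` are assumptions made for illustration; they do not come from the slides.

```python
def successor_variety(word, vocabulary):
    """For each proper prefix of `word`, count the distinct letters
    that follow that prefix in `vocabulary`."""
    counts = []
    for i in range(1, len(word)):
        prefix = word[:i]
        # Letters at position i in every vocabulary word that extends the prefix.
        successors = {w[i] for w in vocabulary if len(w) > i and w.startswith(prefix)}
        counts.append((prefix, len(successors)))
    return counts


# Hypothetical toy vocabulary, chosen only to make the peaks visible.
vocab = [
    "help", "helps", "helped", "helping", "helper",
    "helpless", "helplessly", "helplessness", "held", "hello",
]

for prefix, variety in successor_variety("helplessly", vocab):
    print(f"{prefix:<10} {variety}")
```

On this toy vocabulary the count peaks after "help" and rises again after "helpless", which matches the intuitive boundaries help + less + ly. Real vocabularies are noisier, so a threshold or local-maximum rule would be needed in practice.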

  9. Morfessor (Creutz and Lagus, 2005). Help: 2387, Helping: 1586, Helper: 498, Helps: 2437. Jump: 1847, Jumping: 1664, Jumper: 1290, Jumps: 2987

  10. Downsides. Examples: Freshman; Butterfl + ies versus Butterfly + ies

  11. Locally Semantic: cosine similarity within a word pair, e.g. (car, caring) versus (car, cars) (Schone and Jurafsky, 2000; Narasimhan et al., 2015; Luo et al., 2017)
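
As a rough illustration of the local semantic check, the sketch below computes cosine similarity between word vectors. The three-dimensional vectors are made-up values chosen so that "car" and "cars" point in nearly the same direction while "car" and "caring" do not; real embeddings would come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, for illustration only.
embeddings = {
    "car": [0.9, 0.1, 0.0],
    "cars": [0.8, 0.2, 0.1],
    "caring": [0.0, 0.3, 0.9],
}

related = cosine_similarity(embeddings["car"], embeddings["cars"])
unrelated = cosine_similarity(embeddings["car"], embeddings["caring"])
print(f"car/cars:   {related:.3f}")
print(f"car/caring: {unrelated:.3f}")
```

A segmenter that relies on this signal would accept car + s and reject car + ing, even though both splits look equally plausible orthographically.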

  12. Distinguishing criteria. Word pairs shown: (car, cars), (fine, fines), (player, players), (wheel, wheels), (runner, runners), (car, cars), (hand, hands), (goal, goals), (laptop, laptops), (play, plays), (lab, labs)

  13. Section 3: MORSE. Morphology Learning: Unsupervised; Input: Word Embeddings. Segmentation: Optimization Problem; 4 hyperparameters: Small tuning dataset

  14. Step 1: Learning Morphology

  15. Collecting candidate morphological rules (Soricut and Och, 2015). Vocabulary: jump, play, buy, jumping, playing, buying, jumper, player, buyer, …, and, stand. Candidate pairs: (and, stand), (jump, jumping), (play, playing), (buy, buying). Rules and their pairs: (suf, ∅, ing): (jump, jumping), (play, playing), (buy, buying). (suf, ∅, er): (jump, jumper), (play, player), (buy, buyer). (pre, ∅, st): (and, stand), (one, stone), (ore, store)
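
The rule-collection step can be sketched as follows. This is a simplified illustration that only handles pure affix additions (the ∅ case, written here as an empty string) and caps the affix length at a hypothetical `max_affix_len`; it is not the procedure of Soricut and Och (2015), which also covers substitutions.

```python
from collections import defaultdict


def collect_candidate_rules(vocabulary, max_affix_len=3):
    """Group (base, derived) word pairs under rules of the form
    (type, "", affix), where type is "suf" or "pre" and "" stands for ∅."""
    vocab = set(vocabulary)
    rules = defaultdict(list)
    for word in sorted(vocab):
        for k in range(1, max_affix_len + 1):
            if len(word) <= k:
                continue
            # Suffix addition: word = base + affix
            base, affix = word[:-k], word[-k:]
            if base in vocab:
                rules[("suf", "", affix)].append((base, word))
            # Prefix addition: word = affix + base
            affix, base = word[:k], word[k:]
            if base in vocab:
                rules[("pre", "", affix)].append((base, word))
    return dict(rules)


vocab = [
    "jump", "play", "buy", "jumping", "playing", "buying",
    "jumper", "player", "buyer", "and", "stand", "one", "stone", "ore", "store",
]
rules = collect_candidate_rules(vocab)
for rule, pairs in rules.items():
    print(rule, pairs)
```

Note that the purely orthographic pass happily produces ("pre", "", "st") from pairs such as (and, stand); filtering out rules like this is what the later signals are for.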

  16. Signals: Orthographic and Semantic (Word Embeddings). Example pairs: (quick, quickly), (beautiful, beautifully), (confident, confidently), (wrong, wrongly)

  17. What makes a good rule? Signal 1: Orthography. Rule = (suf, ∅, ly), Size = 8723: (quick, quickly), (beautiful, beautifully), (confident, confidently), …, (wrong, wrongly). Rule = (pre, ∅, st), Size = 16: (ore, store), …, (amp, stamp)

  18. What makes a good rule? Signal 2: Semantics. Embedding plot of the pairs (quick, quickly), (beautiful, beautifully), (wrong, wrongly), (confident, confidently) alongside (amp, stamp), (one, stone), (and, stand), (ore, store)
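
One way to picture the semantic signal is to check whether the embedding offsets (derived minus base) of a rule's pairs point in a consistent direction. The sketch below scores a rule by the mean pairwise cosine similarity of those offsets. Both the scoring function `difference_consistency` and the two-dimensional vectors are assumptions for illustration; the slides do not specify how MORSE computes this score.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def difference_consistency(pairs, embeddings):
    """Mean pairwise cosine similarity between the offset vectors
    (derived - base) of a rule's word pairs. Needs at least two pairs."""
    diffs = [
        [d - b for b, d in zip(embeddings[base], embeddings[derived])]
        for base, derived in pairs
    ]
    sims = [cosine(d1, d2) for d1, d2 in combinations(diffs, 2)]
    return sum(sims) / len(sims)


# Hypothetical 2-d embeddings, for illustration only.
embeddings = {
    "quick": [1.0, 0.0], "quickly": [1.0, 1.0],
    "wrong": [0.0, 1.0], "wrongly": [0.0, 2.0],
    "confident": [2.0, 1.0], "confidently": [2.0, 2.0],
    "and": [1.0, 0.0], "stand": [0.0, 1.0],
    "one": [0.0, 1.0], "stone": [1.0, 0.0],
    "ore": [1.0, 1.0], "store": [2.0, 1.0],
}

ly_pairs = [("quick", "quickly"), ("wrong", "wrongly"), ("confident", "confidently")]
st_pairs = [("and", "stand"), ("one", "stone"), ("ore", "store")]

ly_score = difference_consistency(ly_pairs, embeddings)
st_score = difference_consistency(st_pairs, embeddings)
print(f"(suf, ∅, ly): {ly_score:.3f}")
print(f"(pre, ∅, st): {st_score:.3f}")
```

In this toy setup the "ly" offsets are parallel and score close to 1, while the "st" offsets point in unrelated directions and score below 0, mirroring the contrast the slide draws between the two rules.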

  19. What makes a good member of a rule? Scope: Vocabulary-Wide. Embedding plot of (quick, quickly), (confident, confidently), (wrong, wrongly), (beautiful, beautifully), and (on, only)

  20. What makes a good member of a rule? Scope: Local. Embedding plot of (confident, confidently) and (on, only)

  21. Step 2: Segmenting

  22. Linear Optimization Problem. Word: uncaring. Candidate pairs: (ring, uncaring), (caring, uncaring), (uncare, uncaring). Parameters shown: t1, t2, t3, t4

  23. Result so far: un + caring. Iterate on caring. Candidate pairs: (car, caring), (care, caring), (carol, caring). Parameters shown: t1, t2, t3, t4

  24. Result so far: un + care + ing. Iterate on care. Candidate pairs: (car, care), (ca, care), (re, care). Parameters shown: t1, t2, t3, t4
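
The iteration over uncaring, caring, and care can be sketched as a loop that repeatedly picks the best candidate base for the current word, peels off the affix, and continues on the base until no candidate is good enough. This sketch swaps the slides' linear optimization over t1 to t4 for a plain lookup of made-up scores with a single hypothetical `threshold`; the function `segment` and the score table are illustrative assumptions, not MORSE's actual procedure.

```python
def segment(word, candidate_scores, threshold=0.5):
    """Greedy iterative segmentation: at each step take the highest-scoring
    candidate base of the current word, strip the affix, and repeat."""
    prefixes, suffixes = [], []
    current = word
    while True:
        candidates = candidate_scores.get(current, {})
        if not candidates:
            break
        base, score = max(candidates.items(), key=lambda kv: kv[1])
        # Stop when the best candidate is too weak or would not shorten the word.
        if score < threshold or len(base) >= len(current):
            break
        if current.endswith(base):
            # Prefix rule: the base is the tail of the word (un + caring).
            prefixes.append(current[: len(current) - len(base)])
        else:
            # Suffix rule: the affix is what follows the shared prefix,
            # so (care, caring) yields "ing" and the canonical base "care".
            shared = 0
            while shared < len(base) and current[shared] == base[shared]:
                shared += 1
            suffixes.insert(0, current[shared:])
        current = base
    return prefixes + [current] + suffixes


# Hypothetical scores for the candidate pairs on the slides.
candidate_scores = {
    "uncaring": {"ring": 0.05, "caring": 0.90, "uncare": 0.20},
    "caring": {"car": 0.10, "care": 0.85, "carol": 0.02},
    "care": {"car": 0.10, "ca": 0.03, "re": 0.04},
}

print(segment("uncaring", candidate_scores))
```

Stopping when every candidate falls below the threshold is what keeps a word such as care from being split further into car + e.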

  25. Section 4: Experiments

  26. Experimental Setup: Training, Languages, Gold Datasets (Morpho Challenge). Gold examples: jumping → jump ing; playing → play ing; jumps → jump s; calls → call s; rooms → room s

  27. Experiments: bar chart comparing Morfessor and MORSE on English, Turkish, and Finnish. Bar values as extracted, not matched to individual bars: 70.32, 64.35, 38.07, 34.06, 31.01, 14.98

  28. Morpho Challenge downsides. Non-compositional: Business. Trivial instances: Turning-point, Player's. Human error: Turning

  29. Experiments. New Dataset: SD17 ◉ 2000 words ◉ Compositional ◉ 91% inter-annotator agreement ◉ In canonical (butterfly + ies) and non-canonical (butterfl + ies) versions

  30. Results on SD17: bar chart of F-score for Morfessor, MORSE (tuned on MC), and MORSE (tuned on SD17). Bar values as extracted, not matched to individual bars: 83.96, 81.01, 57.31

  31. Against state-of-the-art: bar chart of F-scores for MORSE, MorphoChain, Morfessor S + W, and Morfessor S + W + L. Bar values as extracted, not matched to individual bars: 83.96, 79.9, 67.4, 67.14

  32. Negative Dataset ◉ 100 words like: honeymoon, passport, outdoors ◉ Checks for robustness. Bar chart of #Segmentations for Morfessor and MORSE; bar values as extracted, not matched to individual bars: 43, 7

  33. Looking forward ◉ Robustness to highly agglutinative languages ◉ Extending to other languages (non-concatenative; example shown: kataba)

  34. Looking forward ◉ Morphological mappings across languages (English and French). Rules shown: (suf, ∅, ly), (suf, ∅, ment), (suf, ∅, s), (suf, ∅, s), (suf, ∅, es)

  35. Links https://morse.mybluemix.net https://github.com/yoonlee95/morse_segmentation

  36. Thank you Questions?

  37. Effect of Hyperparameters (plots of Precision and Recall)

  38. Prerequisite: Morpho-syntactic regularities in word vectors. Valid rule with an invalid instance: (suf, ∅, ing), with pairs (play, playing), (jump, jumping), (scream, screaming) and the invalid instance (s, sing). Invalid rule: (pre, ∅, s), with pairs (mile, smile), (tore, store), (cream, scream), (lay, slay)

  39. Demo 4 morse.mybluemix.net
