A Distributional and Orthographic Aggregation Model for English Derivational Morphology
Daniel Deutsch,* John Hewitt,* and Dan Roth (*equal contribution)


  1. A Distributional and Orthographic Aggregation Model for English Derivational Morphology Daniel Deutsch,* John Hewitt,* and Dan Roth *equal contribution

  2. Co-Authors: John Hewitt (Co-First Author), Dan Roth (Advisor)

  3. Derivational Morphology A transformation maps a root word to a derived word: employ → employer, employ → employment, intense → intensely, intense → intensity

  6. Motivation • Machine translation • Text simplification • Language generation

  7. Challenges • Suffix ambiguity • Orthographic irregularity

  8. Suffix Ambiguity “I have an observament!” • Result: ground → grounding (*groundation, *groundment, *groundal) • Nominal: valid → validity (*validness)

  11. Orthographic Irregularity • Result: speak → speech, but creak → creaking (*creech) • Result: erupt → eruption, but bankrupt → bankruptcy (*bankruption)

  19. Model Overview Input: wise + Adverb • Orthographic model: addresses suffix ambiguity • Distributional model: addresses orthographic irregularity • Aggregation of the two models produces the output: wisely

  25. Orthographic Model • Seq2Seq baseline • Dictionary-constrained decoding • Reranking with frequency information

  26. Seq2Seq Baseline Character-level input: # c o m p o s e # Result • Output (partial, as shown on the slide): c o m p o s i
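The character-level input shown on the Seq2Seq slides can be sketched as follows. This is a minimal illustration, assuming the root is wrapped in `#` boundary symbols and the transformation tag is appended as one extra token, as the slide layout suggests. `format_input` is a hypothetical helper name, not something from the talk.

```python
def format_input(root: str, transformation: str) -> list[str]:
    """Turn a root word and a transformation tag into a token sequence."""
    # One token per character, wrapped in '#' boundary markers,
    # followed by the transformation tag as a single token (assumed layout).
    return ["#", *root, "#", transformation]


print(" ".join(format_input("compose", "Result")))
```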

  31. Dictionary-Constrained Decoding (Suffix Ambiguity) • Seq2Seq models generate many unattested words, but they are reasonable guesses • Intuition: constrain the model to only generate known words • Example (Result): ground → grounding; unattested candidates: *groundation, *groundment, *groundal

  33. Dictionary-Constrained Decoding [Figure: character trie over the symbols a and b, with # as the boundary marker and nodes a, b, aa, ab, ba, bb, aba, abb, baa, bab; in the final build only the paths a, ab, aba, b, ba remain] Search over trie induced from dictionary
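The idea of searching over a trie induced from a dictionary can be sketched as below. This is only an illustration under stated assumptions: the decoder is replaced by a toy scoring function (`toy_decoder_score`, hypothetical) that prefers the unattested form "groundation", the dictionary is a two-word toy list, and beam search is one possible search strategy rather than a statement of what the authors implemented.

```python
def build_trie(words):
    """Build a nested-dict character trie; '#' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word + "#":
            node = node.setdefault(ch, {})
    return root


def constrained_beam_search(score_next, trie, beam_size=2, max_len=20):
    """Beam search that only follows edges present in the trie.

    score_next(prefix, ch) stands in for the Seq2Seq decoder: it returns the
    log-probability of emitting character ch after prefix.
    """
    beam = [(0.0, "", trie)]  # (log score, prefix, current trie node)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix, node in beam:
            for ch, child in node.items():  # only dictionary-licensed characters
                new_score = score + score_next(prefix, ch)
                if ch == "#":
                    finished.append((new_score, prefix))
                else:
                    candidates.append((new_score, prefix + ch, child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return sorted(finished, key=lambda item: item[0], reverse=True)


def toy_decoder_score(prefix, ch, preferred="groundation#"):
    """Toy decoder (assumption): likes the characters of 'groundation'."""
    pos = len(prefix)
    matches = pos < len(preferred) and preferred[pos] == ch
    return -0.1 if matches else -2.0


dictionary = ["grounding", "groundwork"]  # toy dictionary
results = constrained_beam_search(toy_decoder_score, build_trie(dictionary))
print(results[0][1])
```

Because the search can only walk trie edges, the unattested "groundation" is never produced even though the toy decoder prefers it; the best attested word is returned instead.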

  43. Reranking with Frequency Information Input: refute + Result

      | Model Output | Model Score | Log Corpus Freq |
      |---|---|---|
      | refution | -1.1 | 5.0 |
      | refutation | -1.2 | 14.3 |
      | refut | -4.8 | 7.4 |
      | refuty | -5.6 | 0.1 |
      | refutat | -8.7 | 8.6 |

      After reranking:

      | Reranker Output | Reranker Score |
      |---|---|
      | refutation | 0.5 |
      | refution | -0.9 |
      | refut | -0.9 |
      | refuty | -0.9 |
      | refutat | -0.9 |
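To make the reranking step concrete, here is a minimal sketch that combines the model score with the log corpus frequency from the slide. The linear combination and the weight `freq_weight=0.1` are assumptions chosen for illustration; the slides do not state the reranker's features or parameters, and the scores computed here are not the reranker scores shown in the talk.

```python
# (model output, model score, log corpus frequency), values taken from the slide
candidates = [
    ("refution", -1.1, 5.0),
    ("refutation", -1.2, 14.3),
    ("refut", -4.8, 7.4),
    ("refuty", -5.6, 0.1),
    ("refutat", -8.7, 8.6),
]


def rerank(candidates, freq_weight=0.1):
    """Sort candidates by model score plus a weighted log-frequency bonus."""
    scored = [
        (word, score + freq_weight * log_freq)  # hypothetical linear reranker
        for word, score, log_freq in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for word, score in rerank(candidates):
    print(f"{word}\t{score:.2f}")
```

With this toy weight the frequent, attested form "refutation" overtakes "refution", which matches the ordering of the reranker output on the slide.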

  50. Distributional Model (Orthographic Irregularity) • Orthographic information can be unreliable • Semantic transformation remains the same • Example (Result): speak → speech, but creak → creaking (*creech)

  51. Distributional Model Intuition • Learn a non-linear function per transformation • Independent of orthography
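A per-transformation non-linear function over word embeddings might look like the sketch below. Everything concrete here is an assumption for illustration: the two-dimensional toy embeddings, the hand-picked (not learned) weights, the single tanh hidden layer, and the nearest-neighbour lookup by cosine similarity used to turn the predicted vector back into a word.

```python
import math


def apply_transformation(vec, W1, b1, W2, b2):
    """One hidden tanh layer followed by a linear output layer."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_word(vec, embeddings):
    """Return the vocabulary word whose embedding is closest to vec."""
    return max(embeddings, key=lambda word: cosine(vec, embeddings[word]))


# Toy embeddings and toy parameters for the Result transformation (hypothetical).
embeddings = {"speech": [1.0, 1.0], "speaker": [1.0, -1.0], "creaking": [-1.0, 1.0]}
speak = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
result_params = (identity, [0.0, 0.0], identity, [0.0, 1.0])

predicted = apply_transformation(speak, *result_params)
print(nearest_word(predicted, embeddings))
```

The function never looks at the spelling of "speak", which is the sense in which such a model is independent of orthography.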

  58. Aggregation Model • Orthographic: approvation (-0.2) • Distributional (non-linear function): approval (-0.1) • Output: approval

  59. Aggregation Model

      | Ortho Output | Ortho Score | Distributional Output | Distributional Score | Aggregation Selection |
      |---|---|---|---|---|
      | approvation | -0.9 | approval | -0.6 | approval |
      | bankruption | -0.3 | bankruptcy | -0.8 | bankruption |
      | expertly | -0.5 | expertly | -1.1 | expertly |
      | stroller | -0.8 | strolls | -0.9 | stroller |
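The selection step can be illustrated with a simple rule that keeps whichever candidate has the higher score. This max rule is an assumption standing in for the actual aggregation model, whose details are not given on the slides; it happens to agree with the approvation/approval example shown there.

```python
def aggregate(ortho, dist):
    """Pick the higher-scoring (word, score) candidate; ties go to ortho."""
    return ortho if ortho[1] >= dist[1] else dist


# Example values from the slide: orthographic vs. distributional candidate.
selected = aggregate(("approvation", -0.2), ("approval", -0.1))
print(selected[0])
```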

  62. Experiments

  63. Dataset (Cotterell et al. 2017)

      | Transformation | Count | Example |
      |---|---|---|
      | Adverb | 1715 | wise → wisely |
      | Result | 1251 | simulate → simulation, recite → recital, overstate → overstatement |
      | Agent | 801 | yodel → yodeler, survive → survivor |
      | Nominal | 354 | intense → intensity, effective → effectiveness, pessimistic → pessimism |

  64. Experiment Details • 30 random restarts • Token information: Google Books Ngrams (360k unigram types, token counts aggregated) • Google News pre-trained word embeddings • Evaluation: full-token match accuracy
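Full-token match accuracy counts a prediction as correct only when the whole derived word matches the gold form exactly. A minimal sketch, with a hypothetical function name and a three-item toy example built from words that appear in the slides:

```python
def full_token_accuracy(predictions, gold):
    """Fraction of predictions that match the gold token exactly."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy data: 'bankruption' gets no partial credit for being close.
predictions = ["wisely", "speech", "bankruption"]
gold = ["wisely", "speech", "bankruptcy"]
print(f"{full_token_accuracy(predictions, gold):.3f}")
```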

  65. Results Legend: Seq2Seq, Distributional, Aggregation, Dictionary-Constrained Decoding, Frequency-Based Reranking

  66. Results [Figure: bar chart of token accuracy (y-axis from 40 to 85) for Dist, Seq, Aggr, Seq+Freq, Aggr+Freq, and Cotterell et al. 2017, with unconstrained and constrained decoding] • Significant improvement when combining Dist and Seq • Frequency statistics are a valuable signal • Combined model still outperforms separate models • 22% and 37% relative error reductions over Seq
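For readers unfamiliar with the term, a relative error reduction compares error rates rather than accuracies. The sketch below shows the arithmetic; the accuracy values in the example are hypothetical and are not read off the chart.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative error reduction given two accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Hypothetical numbers: going from 60% to 70% accuracy cuts the error
# from 40 to 30 points, a 25% relative reduction.
print(relative_error_reduction(60.0, 70.0))
```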

  74. Results by Transformation [Figure: bar chart of token accuracy (y-axis from 0 to 100) for the Nominal, Result, Agent, and Adverb transformations, comparing Baseline, Cotterell et al. 2017, Aggr, and Aggr+Freq+Dict]
