A Distributional and Orthographic Aggregation Model for English Derivational Morphology
Daniel Deutsch,* John Hewitt,* and Dan Roth (*equal contribution)

Co-Authors
• John Hewitt: Co-First Author
• Dan Roth: Advisor
Derivational Morphology
• root word + transformation → derived word
• employ → employer, employment
• intense → intensely, intensity
Motivation
• Machine translation
• Text simplification
• Language generation

Challenges
• Suffix ambiguity
• Orthographic irregularity
Suffix Ambiguity
“I have an observament!”
• ground + Result → grounding (not *groundation, *groundment, *groundal)
• valid + Nominal → validity (not *validness)
Orthographic Irregularity
• speak + Result → speech
• creak + Result → creaking (not *creech)
• erupt + Result → eruption
• bankrupt + Result → bankruptcy (not *bankruption)
Model Overview
• Input: wise + Adverb; output: wisely
• Orthographic model: targets suffix ambiguity
• Distributional model: targets orthographic irregularity
• Aggregation: combines the two models to produce the output
Orthographic Model
• Seq2Seq baseline
• Dictionary-constrained decoding
• Reranking with frequency information
Seq2Seq Baseline
• Input (characters plus transformation): # c o m p o s e # Result
• Output (characters, partial as shown on the slide): c o m p o s i
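The input format on this slide can be sketched as follows. This is only an illustration: it assumes the root word is split into characters, wrapped in # boundary symbols, and followed by the transformation as a single token. The function name `encode_input` is hypothetical and does not come from the slides.

```python
def encode_input(root, transformation, boundary="#"):
    """Build the input token sequence: # c1 ... cn # Transformation."""
    # One token per character, wrapped in boundary symbols, then the
    # transformation tag as one extra token (assumed layout).
    return [boundary] + list(root) + [boundary, transformation]


tokens = encode_input("compose", "Result")
print(" ".join(tokens))  # prints: # c o m p o s e # Result
```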
Dictionary-Constrained Decoding
• Seq2Seq models generate many unattested words, but these are reasonable guesses
• Suffix ambiguity example: ground + Result → grounding, alongside *groundation, *groundment, *groundal
• Intuition: constrain the model to generate only known words
• Search over the trie induced from the dictionary
[Figure: character trie over all prefixes (a, b, aa, ab, ba, bb, aba, abb, baa, bab, …) with # boundary symbols, pruned to the paths found in the dictionary (a, ab, aba, b, ba)]
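A search over a dictionary trie could look roughly like the sketch below. It is not the authors' implementation. It assumes a beam search in which only characters that continue a dictionary word are expanded, and it uses # as an end-of-word marker. The names `build_trie`, `constrained_beam_search`, and `toy_scorer` are hypothetical, and the scorer is a fixed stand-in for a real decoder's next-character distribution. The dictionary {aba, ba} is read off the pruned trie on the slide.

```python
import math

END = "#"  # end-of-word marker (assumption, mirroring the # on the slides)


def build_trie(words):
    """Nested-dict trie; each word is stored with a trailing END marker."""
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})
    return root


def constrained_beam_search(trie, score_next, beam_size=3, max_len=30):
    """Beam search that only follows edges present in the trie.

    score_next(prefix) returns {char: log_prob}; it stands in for a decoder.
    Returns (score, word) pairs for completed words, best first.
    """
    beams = [(0.0, "", trie)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix, node in beams:
            log_probs = score_next(prefix)
            for ch, child in node.items():  # dictionary-licensed characters only
                new_score = score + log_probs.get(ch, float("-inf"))
                if ch == END:
                    finished.append((new_score, prefix))
                else:
                    candidates.append((new_score, prefix + ch, child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(reverse=True)
    return finished


def toy_scorer(prefix):
    # Fixed distribution regardless of prefix (illustration only).
    return {"a": math.log(0.5), "b": math.log(0.4), END: math.log(0.1)}


results = constrained_beam_search(build_trie(["aba", "ba"]), toy_scorer)
best_word = results[0][1]
```

With this toy scorer an unconstrained decoder would keep emitting "a". The constrained search can only return "aba" or "ba", and it ranks "ba" first because that path has the higher total log probability.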
Reranking with Frequency Information
• Input: refute + Result

Model Output   Model Score   Log Corpus Freq   Reranker Output   Reranker Score
refution       -1.1          5.0               refutation        0.5
refutation     -1.2          14.3              refution          -0.9
refut          -4.8          7.4               refut             -0.9
refuty         -5.6          0.1               refuty            -0.9
refutat        -8.7          8.6               refutat           -0.9
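One way to see how frequency can reorder the candidates is the sketch below. The slides do not give the reranker's form, so it assumes a simple linear combination of model score and log corpus frequency. The function `rerank` and the weight 0.1 are hypothetical. The resulting scores are not the reranker scores on the slide; only the candidate order is comparable.

```python
def rerank(candidates, log_freq, weight=0.1):
    """Rescore (word, model_score) pairs as model_score + weight * log_freq."""
    rescored = [
        (word, score + weight * log_freq.get(word, 0.0))
        for word, score in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Model scores and log corpus frequencies as listed on the slide.
model_output = [
    ("refution", -1.1),
    ("refutation", -1.2),
    ("refut", -4.8),
    ("refuty", -5.6),
    ("refutat", -8.7),
]
log_corpus_freq = {
    "refution": 5.0,
    "refutation": 14.3,
    "refut": 7.4,
    "refuty": 0.1,
    "refutat": 8.6,
}

reranked = rerank(model_output, log_corpus_freq)
reranked_words = [word for word, _ in reranked]
```

Under these assumptions "refutation" moves ahead of "refution", and the other candidates keep their relative order.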
Distributional Model
• Orthographic information can be unreliable
• The semantic transformation remains the same
• Orthographic irregularity example: speak + Result → speech, but creak + Result → creaking (not *creech)
Distributional Model Intuition
• Learn a non-linear function per transformation
• Independent of orthography
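The idea of a per-transformation non-linear function can be sketched as follows. This is a toy illustration, not the authors' model. It assumes a one-hidden-layer network applied to the root word's embedding, followed by a cosine nearest-neighbor lookup to pick the derived word. The 2-dimensional embeddings and the hand-set parameters are made up; in practice the parameters would be learned and the embeddings pre-trained. The names `apply_transformation`, `nearest_word`, and `result_params` are hypothetical.

```python
import math


def apply_transformation(x, W1, b1, W2, b2):
    """One-hidden-layer network: W2 * tanh(W1 * x + b1) + b2."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(vector, embeddings, exclude=()):
    """Vocabulary word whose embedding is most cosine-similar to vector."""
    candidates = [w for w in embeddings if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, embeddings[w]))


# Toy embeddings, invented for this example.
embeddings = {
    "speak": [1.0, 0.2],
    "speech": [0.8, 1.0],
    "speaker": [1.0, -0.5],
    "creak": [-1.0, 0.3],
}

# Hand-set parameters standing in for a learned "Result" function.
result_params = {
    "W1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 1.0]],
    "b2": [0.0, 0.8],
}

predicted_vector = apply_transformation(embeddings["speak"], **result_params)
predicted_word = nearest_word(predicted_vector, embeddings, exclude={"speak"})
```

Because the prediction is made in embedding space, the spelling of the root never enters the computation. That is what allows an irregular form such as "speech" to be retrieved.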
Aggregation Model
• Orthographic model: approvation (-0.2)
• Distributional model (non-linear function): approval (-0.1)
• Aggregated output: approval
Aggregation Model

Ortho         Score   Distributional   Score   Aggregation Selection
approvation   -0.9    approval         -0.6    approval
bankruption   -0.3    bankruptcy       -0.8    bankruption
expertly      -0.5    expertly         -1.1    expertly
stroller      -0.8    strolls          -0.9    stroller
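A very simple selection rule is sketched below. The slides do not state how the aggregation model chooses between the two candidates, so this assumes it keeps whichever candidate has the higher score; the actual model may be learned rather than a plain comparison. The function name `aggregate` is hypothetical, and the example values are the approvation/approval scores from the slides.

```python
def aggregate(ortho, dist):
    """Pick between two (word, score) candidates by higher score.

    Ties go to the orthographic candidate (arbitrary choice for this sketch).
    """
    return dist if dist[1] > ortho[1] else ortho


selection = aggregate(("approvation", -0.2), ("approval", -0.1))
selected_word = selection[0]
```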
Experiments

Dataset (Cotterell et al. 2017)

Transformation   Count   Examples
Adverb           1715    wise → wisely
Result           1251    simulate → simulation; recite → recital; overstate → overstatement
Agent            801     yodel → yodeler; survive → survivor
Nominal          354     intense → intensity; effective → effectiveness; pessimistic → pessimism

Experiment Details
• 30 random restarts
• Token information: Google Book NGrams
  – 360k unigram types
  – Token counts aggregated
• Google News pre-trained word embeddings
• Evaluation: full-token match accuracy
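Full-token match accuracy can be sketched as below. It assumes that a prediction counts as correct only if it equals the gold derived word exactly, with no partial credit. The function name and the example predictions are hypothetical.

```python
def full_token_accuracy(predictions, gold):
    """Fraction of predictions that match the gold word exactly."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up predictions: the second differs from the gold form, so it counts as wrong.
accuracy = full_token_accuracy(
    ["wisely", "bankruption", "speech"],
    ["wisely", "bankruptcy", "speech"],
)
```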
Results Legend
• Seq2Seq
• Distributional
• Aggregation
• Dictionary-Constrained Decoding
• Frequency-Based Reranking
Results
[Figure: token accuracy (axis from 40 to 85) for Dist, Seq, Aggr, Seq+Freq, and Aggr+Freq, grouped into Unconstrained and Constrained, with Cotterell et al. 2017 for comparison]
• Significant improvement when combining Dist and Seq
• Frequency statistics are a valuable signal
• Combined model still outperforms separate models
• 22% and 37% relative error reductions over Seq
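For reference, a relative error reduction is usually computed from accuracies as shown below. The accuracy values in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the numbers behind the chart.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Relative error reduction, with accuracies given in percent."""
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Hypothetical example: accuracy rising from 50% to 61% cuts the error
# from 50 to 39 points, a reduction of 11 / 50.
reduction = relative_error_reduction(50.0, 61.0)
```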
Results by Transformation
[Figure: token accuracy (axis from 0 to 100) for each transformation (Nominal, Result, Agent, Adverb), comparing Baseline, Cotterell et al. 2017, Aggr, and Aggr+Freq+Dict]