An Unsupervised Method for Uncovering Morphological Chains
Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
CSAIL, Massachusetts Institute of Technology
Morphological Chains

Chains model the formation of words:
  paint → painting → paintings

A richer representation than traditional ones:
• Segmentation
• Paradigms
Our Approach

Core idea: an unsupervised discriminative model over pairs of words in the chain (paint → painting).

• Orthographic features — Morfessor (Goldwater and Johnson, 2004; Creutz and Lagus, 2007); Poon et al., 2009; Dreyer and Eisner, 2009; Sirts and Goldwater, 2013
• Semantic features — Schone and Jurafsky, 2000; Baroni et al., 2002
• Handles transformations (plan → planning)
Textual Cues

Orthographic: patterns in the characters forming words.
  paint / paints / painted      pain / pains / pained
  but misleading for: pain / paint, ran / rant

Semantic: meaning embedded as vectors.
  A      B        cos(A, B)
  paint  paints   0.68
  paint  painted  0.60
  pain   pains    0.60
  pain   paint    0.11
  ran    rant     0.09
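A minimal sketch of the semantic cue, assuming a `vectors` dict mapping words to numpy arrays (e.g. word2vec embeddings trained on Wikipedia); `semantic_cue` is an illustrative name, not from the paper:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_cue(word, parent, vectors):
    """High similarity (paint, paints) supports a morphological relation;
    low similarity (ran, rant) argues against one."""
    if word in vectors and parent in vectors:
        return cosine(vectors[word], vectors[parent])
    return 0.0  # no semantic evidence for out-of-vocabulary words
```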
Task Setup

Training: an unannotated word list with frequencies.
  a        395134
  ability  17793
  able     56802
  about    524355

Word vector learning: a large text corpus (Wikipedia).
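A minimal parsing sketch for the training input, assuming the two-column "word count" format shown above (`load_wordlist` is an illustrative name):

```python
def load_wordlist(path):
    """Read the unannotated training list: one '<word> <count>' pair per line."""
    freqs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.split()
            freqs[word] = int(count)
    return freqs
```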
Multiple chains are possible for a word:
  nation → national → international → internationally
  nation → national → nationally → internationally

Different chains can share word pairs:
  nation → national → international → internationally
  nation → national → nationalize
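To make the chain structure concrete, here is a sketch of how a chain is read off once a model can predict each word's best parent; `best_parent` is a hypothetical stand-in for the model's argmax decision, returning None for a Stop:

```python
def chain(word, best_parent):
    """Follow predicted parents down to a base word, e.g.
    internationally -> international -> national -> nation."""
    links = [word]
    parent = best_parent(word)
    while parent is not None:        # None encodes the Stop decision
        links.append(parent)
        parent = best_parent(parent)
    return list(reversed(links))     # base word first
```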
Independence Assumption

Treat word–parent pairs separately.
  Word (w):      national
  Parent (p):    nation
  Type (t):      Suffix
  Candidate (z): the pair (p, t)
For a word w and candidate z = (p, t):

  P(w, z) ∝ exp(θ · φ(w, z))

Types (t): Prefix, Suffix, Transformation, Stop.
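For prediction, only the conditional over a word's candidate set is needed, and the global normalizer cancels. A sketch, assuming `phi` returns a sparse feature dict and `theta` maps feature names to weights (both illustrative interfaces):

```python
import math

def candidate_probs(word, candidates, phi, theta):
    """P(z | w) proportional to exp(theta . phi(w, z)),
    normalized over this word's candidate set."""
    scores = [math.exp(sum(theta.get(name, 0.0) * value
                           for name, value in phi(word, z).items()))
              for z in candidates]
    total = sum(scores)
    return [s / total for s in scores]
```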
Transformations

• Templates for handling changes in the stem during the addition of affixes.
• Repetition template: PQ → PQQR (one template per character Q in the alphabet).
  Ex. plan → planning (P = pla, Q = n, R = ing)
• One feature template per transformation.
Transformation types

3 different transformations:
• Repetition (plan → planning)
• Deletion (decide → deciding)
• Modification (carry → carried)

Trade-off between types of transformation and computational tractability:
• These three do well for a range of languages and remain computationally tractable: at most O(|Σ|²) templates for alphabet Σ.
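A hedged sketch of how the three transformations can be undone when proposing candidate parents for a suffixed word; the function name, split bounds, and exact candidate set are illustrative, not the paper's code:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def candidate_parents(word):
    """Propose (parent, type) candidates by undoing each transformation:
    planning -> plan (repetition), deciding -> decide (deletion),
    carried -> carry (modification)."""
    parents = []
    for i in range(2, len(word)):                    # split into stem + suffix
        stem = word[:i]
        parents.append((stem, "suffix"))             # plain split: paint + ing
        if len(stem) > 1 and stem[-1] == stem[-2]:
            parents.append((stem[:-1], "repeat"))    # plann -> plan
        parents.append((stem + "e", "delete"))       # decid -> decide
        for c in ALPHABET:                           # one per letter: the O(|alphabet|) factor
            if c != stem[-1]:
                parents.append((stem[:-1] + c, "modify"))  # carri -> carry
    return parents
```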
Features φ(w, z)

Orthographic:
• Affixes: indicator feature for top affixes
• Affix correlation: pairs of affixes sharing a set of stems, e.g. (inter-, re-), (under-, over-)
• Word frequency of the parent
• Transformation types with character bigrams

Semantic:
• Cosine similarity between the word vectors of word and parent

[Figure: cosine similarities of candidate words with "player"]
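A sketch of the feature map, reusing `semantic_cue` from the earlier sketch; the feature names and the plain-suffix affix extraction are illustrative simplifications (affix-correlation features omitted for brevity):

```python
import math

def phi(word, candidate, top_affixes, freqs, vectors):
    """Sparse feature dict phi(w, z) for a word and candidate (parent, type)."""
    parent, kind = candidate
    feats = {"type=" + kind: 1.0}                  # incl. transformation types
    affix = word[len(parent):]                     # plain suffix split assumed
    if affix in top_affixes:
        feats["affix=" + affix] = 1.0              # indicator for top affixes
    feats["log_parent_freq"] = math.log(freqs.get(parent, 1))
    feats["cos_sim"] = semantic_cue(word, parent, vectors)
    return feats
```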
Learning

• Objective (likelihood of the observed word list):

  ∏_w P(w) = ∏_w Σ_z P(w, z) = ∏_w Σ_z exp(θ · φ(w, z)) / Z,
  where Z = Σ_{w' ∈ Σ*, z'} exp(θ · φ(w', z'))

• Optimize the likelihood with LBFGS-B (with regularization).
• Not tractable as-is: the normalization constant Z requires summing over all possible strings in the alphabet.
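A sketch of how a differentiable objective plugs into LBFGS-B via SciPy; `neg_ll_and_grad` is a hypothetical callable returning (loss, gradient) for the tractable contrastive objective on the next slide, with an L2 penalty standing in for regularization:

```python
import numpy as np
from scipy.optimize import minimize

def fit(neg_ll_and_grad, dim, l2=1.0):
    """Minimize loss(theta) + l2 * ||theta||^2 with LBFGS-B."""
    def objective(theta):
        loss, grad = neg_ll_and_grad(theta)
        return loss + l2 * np.dot(theta, theta), grad + 2.0 * l2 * theta
    result = minimize(objective, np.zeros(dim), jac=True, method="L-BFGS-B")
    return result.x
```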
Contrastive Estimation

• Instead, we use contrastive estimation (Smith and Eisner, 2005).
• For each word, construct a neighborhood of invalid words to take probability mass from.
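One concrete neighborhood choice in the spirit of Smith and Eisner (2005): all single transpositions of adjacent characters, which yield mostly invalid strings near each observed word (a sketch; the paper's exact neighborhood may differ):

```python
def neighborhood(word):
    """Invalid near-miss strings: every single adjacent transposition.
    'paint' -> 'apint', 'piant', 'panit', 'paitn'."""
    neighbors = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:          # skip no-ops from doubled letters
            neighbors.add(swapped)
    return neighbors
```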