  1. An Unsupervised Method for Uncovering Morphological Chains
     Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
     CSAIL, Massachusetts Institute of Technology

  2. Morphological Chains
     Chains to model the formation of words: paint → painting → paintings.
     A richer representation than traditional outputs such as segmentations and paradigms.

  3. Our Approach
     Core idea: an unsupervised discriminative model over pairs of words in the chain (paint → painting).
     • Orthographic features, as in prior work: Goldwater and Johnson, 2004; Creutz and Lagus, 2007 (Morfessor); Poon et al., 2009; Dreyer and Eisner, 2009; Sirts and Goldwater, 2013
     • Semantic features: Schone and Jurafsky, 2000; Baroni et al., 2002
     • Handles transformations (plan → planning)

  4. Textual Cues
     Orthographic: patterns in the characters forming words.
       paint    pain
       paints   pains
       painted  pained
     Orthography alone can mislead: pain → paint and ran → rant look like valid derivations.
     Semantic: meaning embedded as vectors.
       A      B        cos(A, B)
       paint  paints   0.68
       paint  painted  0.60
       pain   pains    0.60
       pain   paint    0.11
       ran    rant     0.09
     Low cosine similarity rules out spurious pairs such as pain → paint and ran → rant.
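The semantic cue can be made concrete. Below is a minimal sketch, assuming word vectors are already available as a dict of numpy arrays; the toy 3-d vectors and the names `vectors` and `cosine` are illustrative, not from the paper:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d vectors; real vectors would be trained on a large corpus
# (the paper learns embeddings from Wikipedia text).
vectors = {
    "paint":  np.array([0.9, 0.1, 0.2]),
    "paints": np.array([0.8, 0.2, 0.3]),
    "pain":   np.array([0.1, 0.9, 0.1]),
}

# High similarity supports paint -> paints as a morphological pair;
# low similarity argues against pain -> paint.
print(cosine(vectors["paint"], vectors["paints"]))
print(cosine(vectors["pain"], vectors["paint"]))
```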

  5. Task Setup
     Training: an unannotated word list with frequencies, e.g.
       a        395134
       ability   17793
       able      56802
       about    524355
     Word vector learning: a large text corpus (Wikipedia).

  6. Multiple chains are possible for a word:
       nation → national → international → internationally
       nation → national → nationally → internationally
     Different chains can share word pairs:
       nation → national → international → internationally
       nation → national → nationalize
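Once a model can propose each word's best parent (or STOP), a chain is read off by walking parent links down to the base form. A minimal sketch; `best_parent` is a hypothetical stand-in for the learned predictor:

```python
def best_parent(word):
    """Stand-in for the learned model: return the predicted parent
    of `word`, or None if the word is a base form (STOP)."""
    toy = {"internationally": "international",
           "international": "national",
           "national": "nation"}
    return toy.get(word)

def chain(word):
    """Follow predicted parent links from `word` down to its base."""
    links = [word]
    while (parent := best_parent(links[-1])) is not None:
        links.append(parent)
    return list(reversed(links))  # base form first

print(" -> ".join(chain("internationally")))
# nation -> national -> international -> internationally
```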

  7. Independence Assumption
     Treat word-parent pairs separately.
       Word (w):   national
       Parent (p): nation
       Type (t):   suffix
     Together, the parent and type form the candidate (z).
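Under this assumption, analysis reduces to scoring candidate (parent, type) pairs for each word. A sketch of candidate generation by splitting off suffixes and prefixes; the function name and the `min_stem` cutoff are illustrative, and the paper's actual candidate set is constructed with additional restrictions:

```python
def candidates(word, min_stem=3):
    """Enumerate candidate (parent, type) pairs for `word`."""
    cands = [(word, "stop")]  # the word may itself be a base form
    for i in range(min_stem, len(word)):
        cands.append((word[:i], "suffix"))           # strip word[i:] as a suffix
        cands.append((word[len(word) - i:], "prefix"))  # strip a leading prefix
    return cands

print(candidates("national"))
# includes ('nation', 'suffix'), with 'al' stripped, among others
```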

  8. The model is log-linear over a word and its candidate:
       P(w, z) ∝ e^{θ·φ(w,z)}
     Types: prefix, suffix, transformation, stop.
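Given features and weights, the per-word conditional P(z | w) is a softmax over that word's candidates. A minimal sketch with sparse dict features; the names and the toy feature function are illustrative:

```python
import math

def score(theta, phi):
    """theta . phi(w, z) for sparse dict features."""
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())

def p_z_given_w(theta, word, cands, features):
    """Softmax over a word's candidates: P(z | w) ∝ exp(theta . phi(w, z))."""
    scores = [score(theta, features(word, z)) for z in cands]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage with a single indicator feature on the candidate type:
theta = {"type=suffix": 1.0, "type=stop": 0.2}
features = lambda w, z: {f"type={z[1]}": 1.0}
print(p_z_given_w(theta, "national",
                  [("nation", "suffix"), ("national", "stop")], features))
```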

  9. Transformations
     • Templates for handling changes in the stem during the addition of affixes.
     • Repetition template: PQ → PQQR (for each letter Q in the alphabet). Example: plan → planning, with P = pla, Q = n, R = ing.
     • A feature template for each transformation.

  10. Transformation types
      Three transformations:
      • Repetition (plan → planning)
      • Deletion (decide → deciding)
      • Modification (carry → carried)
      There is a trade-off between the variety of transformation types and computational tractability.
      • These three do well for a range of languages and remain tractable: at most O(|Σ|²) for alphabet Σ.
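A sketch of how the three templates might propose parents by undoing a stem change. The exact template definitions here (e.g. trying only a dropped 'e' for deletion) are simplifying assumptions, and spurious candidates would still be scored by the model rather than accepted outright:

```python
def transform_parents(word, min_stem=3):
    """Propose (parent, type) candidates that undo a stem change made
    when the suffix word[i:] was attached. Purely illustrative."""
    out = []
    for i in range(min_stem, len(word) - 1):
        stem = word[:i]
        # Repetition: plan + ning -> planning; undo the doubled letter.
        if stem[-1] == stem[-2]:
            out.append((stem[:-1], "repeat"))
        # Deletion: decide + ing -> deciding; restore the dropped final
        # letter (only 'e' is tried here; in general any letter could be).
        out.append((stem + "e", "delete"))
        # Modification: carry + ed -> carried; undo the y -> i change.
        if stem.endswith("i"):
            out.append((stem[:-1] + "y", "modify"))
    return out

print(("plan", "repeat") in transform_parents("planning"))    # True
print(("decide", "delete") in transform_parents("deciding"))  # True
print(("carry", "modify") in transform_parents("carried"))    # True
```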

  11. Features φ(w, z)
      Orthographic:
      • Affixes: indicator features for top affixes
      • Affix correlation: pairs of affixes sharing a set of stems, e.g. (inter-, re-), (under-, over-)
      • Word frequency of the parent
      • Transformation types with character bigrams
      Semantic:
      • Cosine similarity between the word vectors of the word and its parent
      [Figure: cosine similarity with 'player']
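A sketch of φ(w, z) as a sparse feature dict combining the cues above. `TOP_AFFIXES` and the argument values are illustrative placeholders, and the affix-correlation and transformation-bigram features are omitted for brevity:

```python
TOP_AFFIXES = {"s", "ed", "ing", "al", "ly"}  # illustrative; mined from the word list

def phi(word, parent, affix, type_, parent_log_freq, cos_sim):
    """Sparse features for one (word, candidate) pair."""
    f = {}
    # Orthographic: indicator for frequent affixes.
    if affix in TOP_AFFIXES:
        f[f"affix={type_}:{affix}"] = 1.0
    # Orthographic: (log) word frequency of the proposed parent.
    f["parent_log_freq"] = parent_log_freq
    # Semantic: cosine similarity between word and parent vectors.
    f["cosine(word,parent)"] = cos_sim
    return f

print(phi("national", "nation", "al", "suffix",
          parent_log_freq=10.2, cos_sim=0.55))
```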

  12. Learning
      • Objective: the likelihood of the observed word list,
          ∏_w P(w) = ∏_w ∑_z P(w, z) = ∏_w ∑_z e^{θ·φ(w,z)} / ∑_{w′∈Σ*, z′} e^{θ·φ(w′,z′)}
      • Optimize the likelihood using convex optimization: LBFGS-B (with regularization).
      • Not tractable as written: computing the normalization constant Z requires summing over all possible strings in the alphabet.

  13. Contrastive Estimation
      • Instead, we use contrastive estimation (Smith and Eisner, 2005):
      • build a neighborhood of invalid words for each word, and take probability mass from that neighborhood instead of from all of Σ*.
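A sketch of the contrastive-estimation objective: each word's mass is normalized against its neighborhood rather than all of Σ*. The adjacent-character transposition neighborhood is one common choice from Smith and Eisner (2005); whether it matches this paper's exact neighborhood is an assumption here:

```python
import math

def score(theta, phi):
    """theta . phi for sparse dict features (as in the earlier sketch)."""
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def neighborhood(word):
    """The word plus invalid strings obtained by transposing
    adjacent characters (one standard CE neighborhood)."""
    near = {word}
    for i in range(len(word) - 1):
        near.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return near

def ce_log_likelihood(theta, words, cands, features):
    """Sum over words of log P(word | neighborhood(word)), with the
    candidate z marginalized out: the CE surrogate objective."""
    def log_mass(w):  # log sum_z exp(theta . phi(w, z))
        return logsumexp([score(theta, features(w, z)) for z in cands(w)])
    return sum(log_mass(w) - logsumexp([log_mass(n) for n in neighborhood(w)])
               for w in words)
```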
