COMPOSITIONAL MORPHOLOGY FOR WORD REPRESENTATIONS AND LANGUAGE MODELLING
Jan Botha, Phil Blunsom
ICML 2014, Beijing
MOTIVATING EXAMPLE

What we see:
The king finally abdicated after years of unkingly conduct.
Wait, what? Unkingly?
unkingly /ʌnˈkɪŋli/: a word you have probably never seen, but still understand
⇒ compositional morphology in action

What our models see (mostly):
10 2 95 529 11 88 21 50 74 239
MOTIVATING EXAMPLE 2
Other languages display still more variation.

Czech (conjugation):
čistit (to clean)
čistím
čistíš
čistí
čistíme
čistíte
čistil
čištěn
čisti
čistěte
čistěme

Turkish (productive derivation):
Avrupa (Europe)
Avrupalı (of Europe)
Avrupalılaş (become of Europe)
Avrupalılaştır (to Europeanise)
Avrupalılaştırama (be unable to Europeanise)
Avrupalılaştıramadık (we were unable to Europeanise)
...

⇒ we should model morphemes!
REPRESENTING WORDS
◮ Discrete set? {a, aardvark, ..., account, accounted, accounting, ...}
◮ Vector space?
[Figure: the words a, aardvark, account and accounted plotted as points in a two-dimensional space with axes x1 and x2]
EXTRACT FROM COLLOBERT & WESTON EMBEDDINGS
[Figure: extract from the Collobert & Weston embeddings]
MORPHEME VECTORS
Existing word vectors already capture some morphology:
◮ $\overrightarrow{\text{banks}} - \overrightarrow{\text{bank}} \approx \overrightarrow{\text{kings}} - \overrightarrow{\text{king}} \approx \overrightarrow{\text{queens}} - \overrightarrow{\text{queen}}$ (Mikolov et al. 2013)

Logical extension:
◮ $\overrightarrow{\text{kings}} \approx \overrightarrow{\text{king}} + \overrightarrow{\text{-s}}$
◮ $\overrightarrow{\text{unkingly}} \approx \overrightarrow{\text{un-}} + \overrightarrow{\text{king}} + \overrightarrow{\text{-ly}}$

HOW TO ...
◮ obtain morpheme vectors
◮ compose morpheme vectors
◮ do it all within a language model usable in an MT decoder
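The offset regularity can be tested with cosine similarity between difference vectors. A minimal sketch with invented 3-dimensional toy vectors, not learned embeddings; `offset`, `cosine` and every number below are hypothetical. The toy plurals are built by adding one shared suffix vector to each singular, which is the additive intuition behind the logical extension, so the two offsets come out nearly parallel.

```python
import math

def offset(a, b):
    """Difference vector a - b."""
    return [x - y for x, y in zip(a, b)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Toy vectors, invented for illustration: each plural is its singular plus a
# shared (hypothetical) "-s" vector, with a small perturbation added to "kings".
s_suffix = [0.5, -0.2, 0.1]
bank = [0.9, 0.1, -0.3]
king = [-0.4, 0.8, 0.2]
banks = [b + s for b, s in zip(bank, s_suffix)]
kings = [k + s + e for k, s, e in zip(king, s_suffix, [0.01, 0.0, 0.0])]

sim = cosine(offset(banks, bank), offset(kings, king))
print(f"cos(banks - bank, kings - king) = {sim:.4f}")
```

With real embeddings the same check would be run on learned vectors, where the offsets are only approximately parallel.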
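MORPHOLOGICAL COMPOSITION AS ADDITION
Literally, word = sum of its parts?
Problems:
◮ bag of morphemes: $\overrightarrow{\text{hang}} + \overrightarrow{\text{over}} \neq \overrightarrow{\text{over}} + \overrightarrow{\text{hang}}$ should hold (hangover vs. overhang), yet vector addition makes the two sums equal
◮ non-compositionality: $\overrightarrow{\text{greenhouse}} \neq \overrightarrow{\text{green}} + \overrightarrow{\text{house}}$

PRAGMATIC SOLUTION
Include word identity as a component too:
$\overrightarrow{\text{greenhouse}} \equiv \overrightarrow{\text{greenhouse}}_{\text{id}} + \overrightarrow{\text{green}}_{\text{stem}} + \overrightarrow{\text{house}}_{\text{stem}}$
$\overrightarrow{\text{unkingly}} \equiv \overrightarrow{\text{unkingly}}_{\text{id}} + \overrightarrow{\text{un}}_{\text{pre}} + \overrightarrow{\text{king}}_{\text{stem}} + \overrightarrow{\text{ly}}_{\text{suf}}$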
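The pragmatic solution can be sketched in a few lines: a lookup table holds one vector per role-tagged factor (word identity, stem, prefix, suffix), and a word's vector is the sum of its factors' vectors. Everything here is illustrative: the toy dimensionality, the random initialisation, the `_id`/`_stem`/`_pre`/`_suf` key naming and the segmentations themselves are assumptions, and learning the vectors is not shown.

```python
import random

DIM = 4  # toy dimensionality, an illustrative assumption

def make_table(keys, dim=DIM, seed=0):
    """Hypothetical lookup table: one random vector per factor."""
    rng = random.Random(seed)
    return {k: [rng.gauss(0.0, 0.1) for _ in range(dim)] for k in keys}

def compose(factors, table, dim=DIM):
    """Additive composition: the word vector is the sum of its factor vectors."""
    vec = [0.0] * dim
    for f in factors:
        for i, x in enumerate(table[f]):
            vec[i] += x
    return vec

# Role-tagged factors: word identity (_id), stems, prefixes, suffixes.
# The segmentations are assumed to be given.
segmentations = {
    "greenhouse": ["greenhouse_id", "green_stem", "house_stem"],
    "unkingly": ["unkingly_id", "un_pre", "king_stem", "ly_suf"],
    "hangover": ["hangover_id", "hang_stem", "over_stem"],
    "overhang": ["overhang_id", "over_stem", "hang_stem"],
}
table = make_table(sorted({f for fs in segmentations.values() for f in fs}))
vectors = {w: compose(fs, table) for w, fs in segmentations.items()}

# The same words composed from morphemes only, without the identity factor.
morph_only = {w: compose([f for f in fs if not f.endswith("_id")], table)
              for w, fs in segmentations.items()}
```

Dropping the identity factors reproduces the bag-of-morphemes problem (hangover and overhang collapse to the same vector); keeping them separates the two words while they still share `hang_stem` and `over_stem`.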
SIMPLEST VECTOR-BASED PROBABILISTIC LM
LBL (log-bilinear model) (Mnih & Hinton, 2007; Mnih & Teh, 2012)
“colorless green ideas sleep furiously .”
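The LBL predicts a vector for the next word from its context and scores every vocabulary word against that prediction. A minimal forward-pass sketch, assuming the usual formulation: predicted vector $\hat{q} = \sum_j C_j q_{w_j}$ over the context words, score $\hat{q}^\top r_w + b_w$, softmax over the vocabulary. It keeps separate context vectors `Q` and target vectors `R` (sharing one table is also possible); all sizes and the random parameters are toy assumptions, and training is omitted.

```python
import math
import random

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lbl_distribution(context, Q, R, C, b):
    """Next-word distribution of a log-bilinear LM.

    context: ids of the n-1 previous words, oldest first
    Q: context-word vectors; R: target-word vectors; b: per-word biases
    C: one d x d combination matrix per context position
    """
    d = len(R[0])
    pred = [0.0] * d
    for C_j, w_j in zip(C, context):  # predicted vector: sum_j C_j q_{w_j}
        pred = [p + x for p, x in zip(pred, matvec(C_j, Q[w_j]))]
    # Score every vocabulary word: pred . r_w + b_w
    scores = [sum(p * r for p, r in zip(pred, R[w])) + b[w] for w in range(len(R))]
    top = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)  # normaliser touches every word: O(vocabulary size)
    return [e / z for e in exps]

rng = random.Random(1)
vocab = ["colorless", "green", "ideas", "sleep", "furiously", "."]
V, d, n = len(vocab), 3, 4  # toy sizes; n = 4 for a 4-gram model

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

Q, R = rand_matrix(V, d), rand_matrix(V, d)
C = [rand_matrix(d, d) for _ in range(n - 1)]
b = [0.0] * V

context = [vocab.index(w) for w in ["colorless", "green", "ideas"]]
probs = lbl_distribution(context, Q, R, C, b)
print(vocab[max(range(V), key=probs.__getitem__)])
```

With untrained random parameters the printed word is arbitrary; the point is the shape of the computation, in particular the sum over the full vocabulary in the normaliser.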
ADD MORPHEME VECTORS INSIDE LM
LBL++
“colorless green ideas sleep furiously .”
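In LBL++ the word vectors inside the LM become additive compositions of morpheme vectors. A minimal sketch, assuming the composed vectors are used on both the context side and the prediction side (an assumption of this sketch), with hypothetical segmentations and toy random parameters; only the forward computation is shown.

```python
import math
import random

rng = random.Random(0)
D = 3  # toy dimensionality (assumption)

def rand_vec():
    return [rng.gauss(0.0, 0.5) for _ in range(D)]

# Hypothetical role-tagged segmentations: word identity first, then morphemes.
SEG = {
    "king": ["king_id", "king_stem"],
    "kings": ["kings_id", "king_stem", "s_suf"],
    "queen": ["queen_id", "queen_stem"],
    "queens": ["queens_id", "queen_stem", "s_suf"],
    "unkingly": ["unkingly_id", "un_pre", "king_stem", "ly_suf"],
}
VOCAB = list(SEG)
FACTORS = {f: rand_vec() for f in sorted({f for fs in SEG.values() for f in fs})}

def word_vec(w):
    """Composed representation: elementwise sum of the word's factor vectors."""
    v = [0.0] * D
    for f in SEG[w]:
        v = [a + x for a, x in zip(v, FACTORS[f])]
    return v

def next_word_probs(context, C, bias):
    """LBL-style softmax in which every word vector is composed from factors."""
    pred = [0.0] * D
    for C_j, w in zip(C, context):  # sum_j C_j q_w, with q_w composed
        q = word_vec(w)
        pred = [p + sum(c * x for c, x in zip(row, q)) for p, row in zip(pred, C_j)]
    scores = [sum(p * r for p, r in zip(pred, word_vec(w))) + bias[w] for w in VOCAB]
    top = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {w: e / z for w, e in zip(VOCAB, exps)}

C = [[rand_vec() for _ in range(D)] for _ in range(3)]  # 3 context positions (4-gram)
bias = {w: 0.0 for w in VOCAB}
probs = next_word_probs(["king", "queen", "kings"], C, bias)
```

Because king, kings and unkingly all contain `king_stem`, any change to that one factor vector moves all three words at once, which is how rare words can borrow statistical strength from frequent relatives.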
COMPUTATIONAL EFFICIENCY
Problem: each probability query requires normalisation over the vocabulary.
◮ $O(\text{vocab size})$ per query
◮ rich morphology ⇒ large vocabulary

SOLUTION: DECOMPOSE MODEL USING WORD CLASSES
$P(\text{word} \mid \text{history}) = P(\text{class}(\text{word}) \mid \text{history}) \times P(\text{word} \mid \text{class}(\text{word}), \text{history})$
◮ use unsupervised Brown clustering
◮ each LM query becomes $2 \times O(\sqrt{\text{vocab size}})$ (with roughly $\sqrt{\text{vocab size}}$ classes of roughly $\sqrt{\text{vocab size}}$ words each) ⇒ fast enough for MT decoding
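A minimal sketch of the class decomposition, assuming the per-class and per-word scores have already been computed from the history (in the LM they would come from the predicted vector). The three hand-picked classes are hypothetical stand-ins for Brown clusters, and `class_factored_prob` is an illustrative name, not an API.

```python
import math

def softmax(scores):
    top = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def class_factored_prob(word, class_of, members, class_scores, word_scores):
    """P(word | h) = P(class(word) | h) * P(word | class(word), h).

    class_of: word -> class id; members: class id -> words in that class.
    class_scores / word_scores: model scores already conditioned on history h.
    Only len(class_scores) + len(members[c]) scores are normalised.
    """
    c = class_of[word]
    p_class = softmax(class_scores)[c]
    in_class = members[c]
    p_word = softmax([word_scores[w] for w in in_class])[in_class.index(word)]
    return p_class * p_word

# Toy setup: 9 words in 3 hand-picked classes (stand-ins for Brown clusters).
members = {
    0: ["king", "queen", "prince"],
    1: ["kings", "queens", "princes"],
    2: ["rules", "ruled", "abdicated"],
}
class_of = {w: c for c, ws in members.items() for w in ws}
class_scores = [0.2, -0.1, 1.3]
word_scores = {w: 0.1 * i for i, w in enumerate(sorted(class_of))}

total = sum(class_factored_prob(w, class_of, members, class_scores, word_scores)
            for w in class_of)
print(round(total, 6))  # 1.0
```

Summing the factored probabilities over the whole vocabulary gives 1, so the decomposition is still a proper distribution. The cost argument is plain arithmetic: with a hypothetical vocabulary of 10^6 words split into 10^3 classes of 10^3 words each, a query normalises 10^3 + 10^3 = 2,000 scores instead of 10^6.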
EVALUATION OVERVIEW
Setup
◮ 4-gram models
◮ Czech, English, French, German, Spanish, Russian
◮ train on 20–50m tokens
◮ large vocabularies (exclude 5% of singletons)

Three evaluation contexts:
◮ Perplexity on test data
◮ Word similarity rating
◮ Machine translation
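Perplexity is the exponentiated average negative log-probability of the test tokens, and in a 4-gram model each token is conditioned on the three preceding ones. A minimal sketch; padding the sentence start with a hypothetical `<s>` symbol is an assumption of this sketch, and the function names are illustrative.

```python
import math

def fourgram_events(tokens, bos="<s>"):
    """(history, word) pairs for a 4-gram model: each word given the 3 previous tokens."""
    padded = [bos] * 3 + tokens
    return [(tuple(padded[i:i + 3]), padded[i + 3]) for i in range(len(tokens))]

def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to the test tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

events = fourgram_events("colorless green ideas sleep furiously .".split())
print(events[0])   # (('<s>', '<s>', '<s>'), 'colorless')
print(events[-1])  # (('ideas', 'sleep', 'furiously'), '.')
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0
```

A model that assigns every test token probability 1/k has perplexity k, which is why perplexity reads as an effective branching factor.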
PERPLEXITY IMPROVEMENTS BY LANGUAGE: CLBL → CLBL++
[Figure: bar chart of % perplexity improvement (y-axis 0–6%) for CS, DE, EN, ES, FR, RU; bar labels (perplexity, CLBL → CLBL++): 683 → 643, 422 → 404, 313 → 300, 281 → 273, 207 → 203, 232 → 227]
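The percentages on this slide follow from the bar labels by simple arithmetic: relative improvement = (before − after) / before × 100. A short sketch; the helper name is illustrative, and the pairs are listed in the order the labels appear on the slide, not matched to languages.

```python
def relative_reduction(before, after):
    """Percentage drop in perplexity from `before` to `after`."""
    return 100.0 * (before - after) / before

# Bar labels from this slide (CLBL -> CLBL++), in the order they appear.
pairs = [(683, 643), (422, 404), (313, 300), (281, 273), (207, 203), (232, 227)]
for before, after in pairs:
    print(f"{before} -> {after}: {relative_reduction(before, after):.1f}%")
# 683 -> 643: 5.9%
# 422 -> 404: 4.3%
# 313 -> 300: 4.2%
# 281 -> 273: 2.8%
# 207 -> 203: 1.9%
# 232 -> 227: 2.2%
```

All six values fall between roughly 2% and 6%, consistent with the chart's 0–6% axis.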
PERPLEXITY IMPROVEMENTS ON GERMAN: CLBL → CLBL++ (BREAKDOWN BY TOKEN FREQUENCY)
[Figure: % perplexity improvement (y-axis 0–20%) per bin of test-token frequency; bins 0, <10^1, <10^2, <10^3, <10^4, <10^5, <10^6, <10^7]
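A breakdown like the German one can be computed by grouping test tokens by their training-set frequency and comparing the two models' per-bin perplexities. A minimal sketch, assuming bin 0 holds tokens unseen in training and bin k holds tokens whose training count is below 10^k (one reading of the bin labels); the function names and the toy counts and probabilities in the usage are hypothetical.

```python
import math
from collections import defaultdict

def freq_bin(count):
    """0 for unseen tokens, else the smallest k with count < 10**k (assumed binning)."""
    if count == 0:
        return 0
    k = 1
    while count >= 10 ** k:
        k += 1
    return k

def per_bin_reduction(train_counts, tokens, probs_base, probs_new):
    """% perplexity reduction of the new model over the base model, per frequency bin."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # [sum log p_base, sum log p_new, n]
    for tok, pb, pn in zip(tokens, probs_base, probs_new):
        a = acc[freq_bin(train_counts.get(tok, 0))]
        a[0] += math.log(pb)
        a[1] += math.log(pn)
        a[2] += 1
    out = {}
    for b, (lb, ln, n) in sorted(acc.items()):
        ppl_base, ppl_new = math.exp(-lb / n), math.exp(-ln / n)
        out[b] = 100.0 * (ppl_base - ppl_new) / ppl_base
    return out

# Toy usage: only the unseen word "unkingly" gets a better probability.
result = per_bin_reduction({"the": 5000, "king": 50},
                           ["the", "king", "unkingly"],
                           [0.1, 0.01, 0.001],
                           [0.1, 0.01, 0.002])
print(result)
```

In this toy case the whole gain lands in bin 0, mirroring the intuition that composed morpheme vectors help most for rare and unseen words.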