Learning bilingual word embeddings with (almost) no bilingual data
Mikel Artetxe, Gorka Labaka, Eneko Agirre
IXA NLP group, University of the Basque Country (UPV/EHU)

Who cares? Word embeddings are useful!


  1. Experiments
     • Dataset by Dinu et al. (2015) extended to German and Finnish
       ⇒ Monolingual embeddings (CBOW + negative sampling)
       ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals
       ⇒ Test dictionary: 1,500 word pairs

     Word translation induction, by seed dictionary (* = best result in each column):

                             English-Italian              English-German               English-Finnish
                             5,000    25       num.       5,000    25       num.       5,000    25       num.
     Mikolov et al. (2013a)  34.93%   0.00%    0.00%      35.00%   0.00%    0.07%      25.91%   0.00%    0.00%
     Xing et al. (2015)      36.87%   0.00%    0.13%      41.27%   0.07%    0.53%      28.23%   0.07%    0.56%
     Zhang et al. (2016)     36.73%   0.07%    0.27%      40.80%   0.13%    0.87%      28.16%   0.14%    0.42%
     Artetxe et al. (2016)   39.27%   0.07%    0.40%      41.87%*  0.13%    0.73%      30.62%*  0.21%    0.77%
     Our method              39.67%*  37.27%*  39.40%*    40.87%   39.60%*  40.27%*    28.72%   28.16%*  26.47%*
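
As a rough illustration of how accuracies like those in the word translation induction table could be computed, the sketch below maps each source word with a matrix W, retrieves the nearest target word by dot product, and compares it with a test dictionary. The retrieval rule, the function name `translation_accuracy`, and the toy data are assumptions made for illustration; the slides do not spell out the evaluation procedure.

```python
import numpy as np

def translation_accuracy(X, Z, W, test_pairs):
    """Fraction of (source index, target index) test pairs for which the
    nearest target word of the mapped source vector is the gold translation.
    Assumes the rows of X and Z are length-normalized embeddings."""
    correct = 0
    for src, trg in test_pairs:
        scores = (X[src] @ W) @ Z.T          # dot product with every target word
        if int(np.argmax(scores)) == trg:
            correct += 1
    return correct / len(test_pairs)

# Toy data (hypothetical): the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
Z = X @ Q
pairs = [(i, i) for i in range(50)]

acc_true = translation_accuracy(X, Z, Q, pairs)              # mapping that matches the rotation
acc_identity = translation_accuracy(X, Z, np.eye(8), pairs)  # unrelated mapping, for contrast
```

In this noise-free toy setting the correct rotation retrieves every gold translation; real embeddings are only approximately related by a linear map, which is why the reported accuracies are far from 100%.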

  12. Experiments
      • Dataset by Dinu et al. (2015) extended to German and Finnish
        ⇒ Monolingual embeddings (CBOW + negative sampling)
        ⇒ Seed dictionary: 5,000 word pairs / 25 word pairs / numerals

      Crosslingual word similarity (* = best result in each column):

                                        EN-IT    EN-DE
                              Bi. data  WS       RG       WS
      Luong et al. (2015)     Europarl  33.1%    33.5%    35.6%
      Mikolov et al. (2013a)  5k dict   62.7%    64.3%    52.8%
      Xing et al. (2015)      5k dict   61.4%    70.0%    59.5%
      Zhang et al. (2016)     5k dict   61.6%    70.4%    59.6%
      Artetxe et al. (2016)   5k dict   61.7%    71.6%    59.7%
      Our method              5k dict   62.4%    74.2%    61.6%*
                              25 dict   62.6%    74.9%*   61.2%
                              num.      62.8%*   73.9%    60.4%

  22. Why does it work?
      [Diagram] Monolingual embeddings feed a repeated Dictionary → Mapping → Dictionary loop:
        • Dictionary (small, no error) → Mapping → Dictionary (large, errors)
        • → Mapping (better?) → Dictionary (worse?)
        • → Mapping (even better?) → Dictionary (even worse?)
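
The loop suggested by the diagram (a dictionary yields a mapping, the mapping yields a new dictionary, and so on) can be sketched as follows. This is a minimal sketch under assumptions not stated on this slide: the mapping is an orthogonal matrix fitted with an SVD (orthogonal Procrustes), the new dictionary pairs each source word with its nearest target word by dot product, and the loop stops when the dictionary no longer changes. The function names and toy data are hypothetical.

```python
import numpy as np

def learn_mapping(X, Z, dictionary):
    """Orthogonal W maximizing the sum of (X[i] W) . Z[j] over dictionary
    pairs (i, j); solved in closed form with an SVD."""
    src = [i for i, _ in dictionary]
    trg = [j for _, j in dictionary]
    U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
    return U @ Vt

def induce_dictionary(X, Z, W):
    """Pair every source word with its nearest target word by dot product."""
    nearest = np.argmax((X @ W) @ Z.T, axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest)]

def self_learning(X, Z, seed_dictionary, max_iters=10):
    """Alternate mapping and dictionary induction, starting from the seed."""
    dictionary = list(seed_dictionary)
    W = learn_mapping(X, Z, dictionary)
    for _ in range(max_iters):
        new_dictionary = induce_dictionary(X, Z, W)
        if new_dictionary == dictionary:   # converged: dictionary is stable
            break
        dictionary = new_dictionary
        W = learn_mapping(X, Z, dictionary)
    return W, dictionary

# Toy data (hypothetical): 40 words, target space is a rotation of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Z = X @ Q
seed = [(i, i) for i in range(10)]          # small seed: 10 of 40 word pairs
W, dictionary = self_learning(X, Z, seed)
```

In this noise-free toy case the small seed already recovers the rotation, so the induced dictionary covers all 40 words without errors. With real embeddings the induced dictionary contains errors, which is exactly the "better? / worse?" question the slide raises.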

  34. Why does it work?
      XW, Z
      Implicit objective:
      \[ W^* = \arg\max_W \sum_i \max_j \, (X_{i*} W) \cdot Z_{j*} \quad \text{s.t.} \quad W W^T = W^T W = I \]
      Independent from seed dictionary!
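
To make the implicit objective concrete, the sketch below evaluates it for a given mapping: for each source word, take the best dot product between its mapped vector and any target vector, then sum over source words. It reads the slide's symbols as X (source embeddings, one row per word), Z (target embeddings), and W (an orthogonal mapping); the function names and the toy data are assumptions for illustration.

```python
import numpy as np

def implicit_objective(X, Z, W):
    """Sum over source words i of max over target words j of (X[i] W) . Z[j]."""
    sims = (X @ W) @ Z.T                 # all source-target dot products
    return float(sims.max(axis=1).sum())

def is_orthogonal(W, tol=1e-8):
    """Check the constraint W W^T = W^T W = I."""
    I = np.eye(W.shape[0])
    return bool(np.allclose(W @ W.T, I, atol=tol) and np.allclose(W.T @ W, I, atol=tol))

# Toy data (hypothetical): unit-length embeddings, target space is a rotation of the source.
rng = np.random.default_rng(1)
n, d = 30, 6
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Z = X @ Q
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # some other orthogonal mapping

obj_true = implicit_objective(X, Z, Q)    # every word finds a perfect match
obj_other = implicit_objective(X, Z, R)   # cannot exceed obj_true for unit vectors
```

Note that no dictionary appears anywhere in `implicit_objective`: the score depends only on the two embedding sets and the mapping, which matches the slide's remark that the objective is independent from the seed dictionary.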
