
Learning about Language with Normalizing Flows, Graham Neubig



  1. Learning about Language with Normalizing Flows. Graham Neubig, Language Technologies Institute, Carnegie Mellon University. Chunting Zhou, Junxian He, Xuezhe Ma, Di Wang, Daniel Spokoyny, Xian Li, Taylor Berg-Kirkpatrick, Eduard Hovy

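  2. Learning about Language? • Syntactic structure: "The cat sat on a green wall", with parts of speech DT NN VBD IN DT JJ NN and a dependency tree [figure] • Cross-lingual correspondences [Figure: alignment between the English words (a, cat, green, on, sat, the, wall) and the Japanese words (の, は, 上, 壁, 猫, 緑, 座った)]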

  3. Supervised Approaches [Diagram labels: X, Y]

  4. Supervised Approaches [Diagram labels: Supervised Learning, X, Y, θ]


  6. Unsupervised Approaches • Learning language models P(X) • Learning continuous features from language models (e.g. word2vec, skip-thought, BERT) • But how do we turn this into interpretable structure? • How do we do it while taking advantage of continuous features?

  7. Latent Variable Approaches [Diagram labels: Unsupervised, X, Y (unknown, marked "?"), θ]


  9. Density Matching for Bilingual Word Embedding. Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig (NAACL 2019)

  10. Bilingual Word Embedding • Map word embeddings from different languages into a single vector space - Cross-lingual transfer - Cross-lingual NLP tasks [Figure: English words (pear, professor, apple, school, piazza, canine, dog, cat, planet, earth, basketball) and Japanese words (犬, 猫, バスケット, 地球, 星, ピアッツ, 学校, 教授, 梨, りんご) in a shared space]

  11. Previous Work on Unsupervised BWE • Unsupervised methods minimize some form of distance between the distributions of discrete vector sets • No direct probabilistic interpretation; not a "typical" unsupervised generative model

  12. Density Matching for Bilingual Word Embedding (DeMa-BWE) • The mapping function is learned with a normalizing flow [Figure: a mapping function between the Japanese space and the English space, with the words dog, canine, bird, cat]

  13. Normalizing Flows • $Z = f_\theta(X)$ and $X = f_\theta^{-1}(Z)$, with $X \sim P(X)$ and $Z \sim \mathcal{N}(0, I)$ • Change of variable formula: $p_\theta(x) = p_Z(f_\theta(x)) \left| \det\left( \frac{\partial f_\theta(x)}{\partial x} \right) \right|$ • Intuitively, the determinant term prevents a degenerate mapping of everything to the zero vector • Normalizing flow: a series of such invertible transformations $f$
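The change of variable formula on this slide can be checked numerically on the simplest possible flow. The sketch below is an illustration only, not the model from the talk: it assumes a single affine map $f_\theta(x) = Ax + b$ (the names `A`, `b`, and all function names are hypothetical) and compares the flow density with the closed-form Gaussian that such a map implies.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log density of N(0, I) evaluated at each row of z."""
    d = z.shape[-1]
    return -0.5 * (d * np.log(2.0 * np.pi) + np.sum(z ** 2, axis=-1))

def affine_flow_logpdf(x, A, b):
    """log p_theta(x) for the assumed flow z = f_theta(x) = A x + b."""
    z = x @ A.T + b
    # The Jacobian of an affine map is A everywhere; slogdet returns (sign, log|det A|).
    _, log_abs_det = np.linalg.slogdet(A)
    return standard_normal_logpdf(z) + log_abs_det

def gaussian_logpdf(x, mean, cov):
    """Closed-form log density of N(mean, cov), used only as a cross-check."""
    d = x.shape[-1]
    diff = x - mean
    _, log_det_cov = np.linalg.slogdet(cov)
    maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=-1)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det_cov + maha)

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.0, 1.5]])   # invertible, so the flow is valid
b = np.array([0.3, -1.0])
x = rng.normal(size=(5, 2))

flow_lp = affine_flow_logpdf(x, A, b)

# If z ~ N(0, I) and x = A^{-1}(z - b), then x ~ N(-A^{-1} b, A^{-1} A^{-T}).
A_inv = np.linalg.inv(A)
closed_lp = gaussian_logpdf(x, -A_inv @ b, A_inv @ A_inv.T)
print(np.allclose(flow_lp, closed_lp))
```

Shrinking `A` toward the zero matrix pushes every `z` toward the mode of the Gaussian, but `log|det A|` then tends to minus infinity, which is the penalty on degenerate mappings that the slide describes.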

  14. DeMa-BWE: Preliminaries • Notation: $x \in \mathbb{R}^d$, $y \in \mathbb{R}^d$ denote vectors in the source and target embedding spaces • $x_i$, $y_j$ denote actual words in the source and target vocabularies • $f_{xy}$, $f_{yx}$ denote the source-to-target and target-to-source mapping functions [Figure: a mapping function between the Japanese space and the English space, with the words dog, canine, bird, cat]

  15. Prior Distribution • Assumption on the monolingual word embedding space: Gaussian mixture model [Figure labels: 犬 (dog), 猫 (cat), 鳥 (bird)]

  16. DeMa-BWE: Density Matching • Sample a continuous vector from the GMM: $x_i \sim \pi(x_i)$, $x \sim \tilde{p}(x \mid x_i)$ • Apply the mapping function to obtain the transformed vector in the target space: $f_{xy}(\cdot) = W_{xy}\,\cdot$ • Compute the density of $x$ in the mapped target space • Objective: [equation missing from transcript] [Figure labels: 犬, 猫, 鳥; dog, canine, cat, bird]
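The three steps on this slide (sample from the source GMM, map with $W_{xy}$, score under the target density) can be sketched as follows. Everything concrete here is an assumption made for illustration: isotropic components with a shared `sigma`, uniform mixture weights, toy 2-D "embeddings", and the function names. The slide's objective is not recoverable from the transcript, so the sketch only compares the average log density of two candidate matrices and does not reproduce the paper's training objective.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def gmm_logpdf(points, centers, weights, sigma):
    """Log density of a GMM with one component N(center, sigma^2 I) per word."""
    d = points.shape[-1]
    sq_dist = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    log_comp = -0.5 * sq_dist / sigma**2 - 0.5 * d * np.log(2.0 * np.pi * sigma**2)
    return logsumexp(np.log(weights)[None, :] + log_comp, axis=1)

def sample_from_gmm(rng, centers, weights, sigma, n):
    """Step 1: pick a word x_i ~ pi, then draw x ~ N(x_i, sigma^2 I)."""
    idx = rng.choice(len(centers), size=n, p=weights)
    return centers[idx] + sigma * rng.normal(size=(n, centers.shape[1]))

def mapped_log_density(x, W_xy, tgt_centers, tgt_weights, sigma):
    """Steps 2 and 3: map y = W_xy x, then apply the change of variable formula."""
    y = x @ W_xy.T
    _, log_abs_det = np.linalg.slogdet(W_xy)
    return gmm_logpdf(y, tgt_centers, tgt_weights, sigma) + log_abs_det

# Toy setup: the target space is the source space rotated by 90 degrees.
src_centers = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 1.0]])
R = np.array([[0.0, -1.0], [1.0, 0.0]])
tgt_centers = src_centers @ R.T
weights = np.full(3, 1.0 / 3.0)
sigma = 0.1

rng = np.random.default_rng(0)
x = sample_from_gmm(rng, src_centers, weights, sigma, n=200)

ll_true = mapped_log_density(x, R, tgt_centers, weights, sigma)
ll_identity = mapped_log_density(x, np.eye(2), tgt_centers, weights, sigma)
print(ll_true.mean() > ll_identity.mean())
```

In this toy case the rotation `R` scores far higher than the identity because it sends each sampled vector next to a target component, which is the signal a density-matching objective would exploit when fitting `W_xy`.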

  17. Method Details • Weak Orthogonality Constraint: try to make sure that the transformation is close to orthogonal • Weak Supervision with Identical Strings: take advantage of the fact that identical strings are usually the same word in both languages • Alignment Selection Methods: use cross-domain similarity local scaling (CSLS)
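For the alignment selection step, CSLS as commonly defined in the MUSE work scores a pair as $\mathrm{CSLS}(x, y) = 2\cos(x, y) - r_T(x) - r_S(y)$, where $r_T(x)$ is the mean cosine similarity of $x$ to its $k$ nearest target neighbours and $r_S(y)$ is the same quantity for $y$ over source vectors. The following is a minimal sketch of that formula; the function name, the parameter `k`, its default of 10, and the toy data are assumptions for illustration, not code from the talk.

```python
import numpy as np

def csls_scores(mapped_src, tgt, k=10):
    """Return the (n_src, n_tgt) matrix of CSLS scores.

    mapped_src: source embeddings already mapped into the target space.
    tgt:        target embeddings.
    """
    src_n = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = src_n @ tgt_n.T

    k_t = min(k, cos.shape[1])
    k_s = min(k, cos.shape[0])
    # r_T(x): mean cosine of each source vector to its k nearest target vectors.
    r_t = np.mean(np.sort(cos, axis=1)[:, -k_t:], axis=1)
    # r_S(y): mean cosine of each target vector to its k nearest source vectors.
    r_s = np.mean(np.sort(cos, axis=0)[-k_s:, :], axis=0)
    return 2.0 * cos - r_t[:, None] - r_s[None, :]

# Toy check: the target vocabulary is a shuffled copy of the mapped source vectors.
rng = np.random.default_rng(0)
mapped_src = rng.normal(size=(6, 5))
perm = np.array([3, 0, 5, 1, 4, 2])
tgt = mapped_src[perm]                    # tgt[j] is the translation of source perm[j]

scores = csls_scores(mapped_src, tgt, k=2)
predicted = np.argmax(scores, axis=1)     # best target index for each source word
print(np.array_equal(predicted, np.argsort(perm)))
```

Subtracting the two neighbourhood terms lowers the score of "hub" vectors that are close to many words, which plain nearest-neighbour retrieval by cosine similarity tends to over-select.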

  18. Experiments • Dataset and Tasks • Bilingual Lexicon Induction Task: MUSE dataset (Conneau et al., 2017) • Cross-lingual Word Similarity Task: SemEval 2017 • Languages • Baseline languages: en - es, de, fr, ru, zh, ja • Morphologically rich languages: en - et, fi, el, hu, pl, tr

  19. Main Results on BLI (close languages) [Bar chart: Procrustes (R), MUSE (U+R), SL-unsup-ID, and DeMa-BWE on en-de, de-en, en-es, es-en; vertical axis from 70 to 85]

  20. Main Results on BLI (distant languages) [Bar chart: Procrustes (R), MUSE (U+R), and DeMa-BWE on en-et, et-en, en-el, el-en, en-ja, ja-en; vertical axis from 0 to 65]

  21. Unsupervised Learning of Syntactic Structure with Invertible Neural Projections. Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick (EMNLP 2018)
