Learning about Language with Normalizing Flows
Graham Neubig, Language Technologies Institute, Carnegie Mellon University
Chunting Zhou, Junxian He, Xuezhe Ma, Di Wang, Daniel Spokoyny, Xian Li, Taylor Berg-Kirkpatrick, Eduard Hovy
Learning about Language? • Syntactic structure The cat sat on a green wall Parts-of-speech: DT NN VBD IN DT JJ NN Dependency: • Cross-lingual correspondences a cat green on sat the wall の は 上 壁 猫 緑 座った 2
Supervised Approaches X Y 3
Supervised Approaches Supervised Learning X θ X Y Y 4
Unsupervised Approaches X • Learning language models P(X) • Learning continuous features from language models (e.g. word2vec, skipthought, BERT) • But how do we turn this into interpretable structure? • How do we do it while taking advantage of continuous features? 6
Latent Variable Approaches ? Unsupervised Y θ X Y ? ? ? X 7
Density Matching for Bilingual Word Embedding Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig (NAACL 2019) 9
Bilingual Word Embedding • Map word embeddings from different languages into a single vector space - Cross-lingual transfer - Cross-lingual NLP tasks [Figure: Japanese words (犬, 猫, バスケット, 地球, 星, ピアッツ, 学校, 教授, 梨, りんご) and English words (pear, professor, apple, school, piazza, canine, dog, cat, planet, earth, basketball) plotted in one shared vector space] 10
Previous Work on Unsupervised BWE • Unsupervised methods minimize some form of distance between the distributions of discrete vector sets • No direct probabilistic interpretation, not a "typical" unsupervised generative model
Density Mapping for Bilingual Word Embedding (DeMa-BWE) [Figure: Japanese space mapped to English space by a mapping function; example words dog, canine, bird, cat] • Mapping function is learned with normalizing flow 12
Normalizing Flows • $X = f_\theta^{-1}(Z)$, $Z = f_\theta(X)$, with $X \sim P(X)$ and $Z \sim \mathcal{N}(0, I)$ • Change of variable formula: $p_\theta(x) = p_Z(f_\theta(x)) \left| \det\left( \frac{\partial f_\theta(x)}{\partial x} \right) \right|$ • Intuitively, prevents degenerative mapping of everything to zero vector • Normalizing Flow: A series of such invertible transformations $f$ 13
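The change of variable formula can be checked numerically on a very small flow. The sketch below is an illustration only, not the model from the talk: it assumes an elementwise affine map $z_i = a_i x_i + b_i$ (the names `scale` and `shift` are hypothetical), whose Jacobian is diagonal, so $\log|\det|$ is just the sum of $\log|a_i|$.

```python
import math

def affine_flow(x, scale, shift):
    """Forward map z = f(x): elementwise z_i = scale_i * x_i + shift_i."""
    return [a * xi + b for xi, a, b in zip(x, scale, shift)]

def affine_flow_inverse(z, scale, shift):
    """Inverse map x = f^{-1}(z); requires every scale_i to be non-zero."""
    return [(zi - b) / a for zi, a, b in zip(z, scale, shift)]

def log_det_jacobian(scale):
    """The Jacobian of the elementwise map is diagonal: log|det| = sum_i log|scale_i|."""
    return sum(math.log(abs(a)) for a in scale)

def standard_normal_logpdf(z):
    """log N(z; 0, I) for a vector z."""
    d = len(z)
    return -0.5 * (d * math.log(2 * math.pi) + sum(zi * zi for zi in z))

def flow_logpdf(x, scale, shift):
    """log p(x) = log p_Z(f(x)) + log|det(df/dx)|  (change of variable formula)."""
    z = affine_flow(x, scale, shift)
    return standard_normal_logpdf(z) + log_det_jacobian(scale)
```

Without the determinant term, shrinking `scale` toward zero would push every `z` to the mode of the Gaussian and raise the likelihood for free; the `log_det_jacobian` term penalizes exactly that collapse.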
DeMa-BWE: Preliminaries [Figure: Japanese space mapped to English space by a mapping function; example words dog, canine, bird, cat] Notations: • $x \in \mathbb{R}^d$, $y \in \mathbb{R}^d$: denote vectors in the src and tgt embedding space • $x_i$, $y_j$: denote an actual word in src and tgt vocabularies • $f_{xy}$, $f_{yx}$: denote src->tgt and tgt->src mapping functions 14
Prior Distribution • Assumption on the monolingual word embedding space: Gaussian mixture model [Figure: mixture components centered on the words 犬, 猫, 鳥] 15
DeMa-BWE: Density Matching • Sampling a continuous vector from the GMM: $x_i \sim \pi(x_i)$, $x \sim \tilde{p}(x \mid x_i)$ • Apply the mapping function to obtain the transformed vector in the target space: $f_{xy}(\cdot) = W_{xy}\,\cdot$ • Computing the density of x in the mapped target space • Objective: [Figure: source words 犬, 猫, 鳥 mapped by $f_{xy}$ onto dog, canine, cat, bird] 16
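The three steps above can be sketched in a few lines. The objective itself did not survive on the slide, so the sketch assumes one plausible form: the average log-density of mapped source samples under the target GMM, with the $\log|\det W_{xy}|$ term from the change of variable formula. All names, the shared isotropic variance `var`, and the 2-D restriction (so the determinant can be written by hand) are assumptions made for illustration.

```python
import math
import random

def matvec(W, x):
    """Apply the linear mapping f_xy(x) = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def det2(W):
    """Determinant of a 2x2 matrix (the sketch is restricted to d = 2)."""
    return W[0][0] * W[1][1] - W[0][1] * W[1][0]

def log_gaussian(x, mean, var):
    """log N(x; mean, var * I) with a shared isotropic variance."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)

def gmm_logpdf(y, means, weights, var):
    """Log-density of a GMM with one component per word embedding (log-sum-exp)."""
    terms = [math.log(w) + log_gaussian(y, m, var) for m, w in zip(means, weights)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def sample_from_gmm(means, weights, var, rng):
    """Pick a word x_i ~ pi, then a continuous vector x ~ N(x_i, var * I)."""
    i = rng.choices(range(len(means)), weights=weights, k=1)[0]
    return [rng.gauss(m, math.sqrt(var)) for m in means[i]]

def mapped_logdensity(x, W_xy, tgt_means, tgt_weights, var):
    """log p(x) = log p_tgt(W x) + log|det W|  (change of variable)."""
    y = matvec(W_xy, x)
    return gmm_logpdf(y, tgt_means, tgt_weights, var) + math.log(abs(det2(W_xy)))

def density_matching_objective(W_xy, src_means, src_weights,
                               tgt_means, tgt_weights, var, n_samples, seed=0):
    """Monte Carlo estimate of the assumed objective (to be maximized over W_xy)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_from_gmm(src_means, src_weights, var, rng)
        total += mapped_logdensity(x, W_xy, tgt_means, tgt_weights, var)
    return total / n_samples
```

In a toy setting where the target means are a rotation of the source means, the rotation matrix scores higher under this estimate than the identity does, which is the behavior a density-matching objective should show.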
Method Details • Weak Orthogonality Constraint: Try to make sure that the transformation is close to orthogonal • Weak Supervision w/ Identical Strings: Take advantage of the fact that identical strings are usually the same word in both languages • Alignment Selection Methods: Use cross-domain similarity local scaling (CSLS) 17
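As an illustration of the alignment selection step, here is a minimal sketch of CSLS scoring as defined by Conneau et al. (2017): $\mathrm{CSLS}(x, y) = 2\cos(x, y) - r_T(x) - r_S(y)$, where $r_T(x)$ is the mean cosine similarity between $x$ and its $k$ nearest target neighbours, and $r_S(y)$ is the same quantity for $y$ among the mapped source vectors. Function names and the default `k` are hypothetical, and the sketch does not show how DeMa-BWE combines these scores with training.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_topk_similarity(vec, others, k):
    """Mean cosine similarity between vec and its k most similar vectors in others."""
    sims = sorted((cosine(vec, o) for o in others), reverse=True)[:k]
    return sum(sims) / len(sims)

def csls(mapped_src, tgt, k=2):
    """Matrix of CSLS scores: 2*cos(x, y) - r_T(x) - r_S(y)."""
    r_src = [mean_topk_similarity(x, tgt, k) for x in mapped_src]
    r_tgt = [mean_topk_similarity(y, mapped_src, k) for y in tgt]
    return [[2 * cosine(x, y) - r_src[i] - r_tgt[j]
             for j, y in enumerate(tgt)]
            for i, x in enumerate(mapped_src)]

def csls_alignments(mapped_src, tgt, k=2):
    """For each mapped source vector, the index of the best-scoring target vector."""
    scores = csls(mapped_src, tgt, k)
    return [max(range(len(tgt)), key=lambda j: row[j]) for row in scores]
```

Subtracting the two neighbourhood terms lowers the score of "hub" vectors that are close to many others, which is why CSLS tends to give cleaner alignments than plain nearest-neighbour cosine retrieval.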
Experiments • Dataset and Tasks • Bilingual Lexicon Induction Task: MUSE dataset (Conneau et al., 2017) • Cross-lingual Word Similarity Task: SemEval 2017 • Languages • Baseline languages: en - es, de, fr, ru, zh, ja • Morphologically rich languages: en - et, fi, el, hu, pl, tr 18
Main Results on BLI (close languages) [Bar chart: Procrustes (R), MUSE (U+R), SL-unsup-ID, and DeMa-BWE on en-de, de-en, en-es, es-en; vertical axis from 70 to 85] 19
Main Results on BLI (distant languages) [Bar chart: Procrustes (R), MUSE (U+R), and DeMa-BWE on en-et, et-en, en-el, el-en, en-ja, ja-en; vertical axis from 0 to 65] 20
Unsupervised Learning of Syntactic Structure w/ Invertible Neural Projections Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick (EMNLP 2018) 21