

  1. On the Limitations of Unsupervised Bilingual Dictionary Induction. Anders Søgaard, Sebastian Ruder, Ivan Vulić

  2. Background: Unsupervised MT
 ‣ Recently: Unsupervised neural machine translation (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)
 ‣ Key component: Initialization via unsupervised cross-lingual alignment of word embedding spaces

  3. Background: Cross-lingual word embeddings
 ‣ Cross-lingual word embeddings enable cross-lingual transfer
 ‣ Most common approach: Project one word embedding space into another by learning a transformation matrix W between n source embeddings x_i and their translations y_i, minimizing ∑_{i=1}^{n} ‖W x_i − y_i‖² (Mikolov et al., 2013)
 ‣ More recently: Use an adversarial setup to learn an unsupervised mapping
 ‣ Assumption: Word embedding spaces are approximately isomorphic, i.e. they have the same number of vertices, connected the same way.
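The supervised projection objective above can be sketched in plain NumPy. The toy random vectors below stand in for real embeddings (an assumption of this sketch; real setups use e.g. fastText vectors). The sketch shows both the least-squares solution of the Mikolov et al. (2013) objective and the common orthogonality-constrained variant solved in closed form via SVD (orthogonal Procrustes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for n source embeddings x_i and their translations y_i
# (rows of X and Y).
n, d = 1000, 50
X = rng.standard_normal((n, d))
true_W = np.linalg.qr(rng.standard_normal((d, d)))[0]  # hidden rotation
Y = X @ true_W.T + 0.01 * rng.standard_normal((n, d))  # translations + noise

# Least squares for min_W sum_i ||W x_i - y_i||^2 (Mikolov et al., 2013):
# solve X W^T ~= Y for W^T.
W_lsq = np.linalg.lstsq(X, Y, rcond=None)[0].T

# Orthogonal Procrustes variant: constrain W to be orthogonal; the
# optimum is U V^T where U S V^T is the SVD of Y^T X.
U, _, Vt = np.linalg.svd(Y.T @ X)
W_orth = U @ Vt

# Relative residual of the learned mapping (small on this toy data).
print(np.linalg.norm(X @ W_lsq.T - Y) / np.linalg.norm(Y))
```

The orthogonal variant preserves dot products and vector norms, which later work found to improve translation retrieval over the unconstrained least-squares mapping.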

  4. How similar are embeddings across languages?
 ‣ Nearest neighbour (NN) graphs of the top 10 most frequent words in English and German are not isomorphic.
 ‣ NN graphs of the top 10 most frequent English words and their translations into German [figure: English vs. German NN graphs]
 ‣ Not isomorphic
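The NN-graph construction behind these comparisons can be sketched as follows. The random vectors are hypothetical stand-ins for the embeddings of the 10 most frequent words in each language; comparing sorted degree sequences is only a quick necessary condition for isomorphism (graphs with different degree sequences are certainly not isomorphic, while equal sequences prove nothing):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the top-10 word embeddings of two languages.
emb_l1 = rng.standard_normal((10, 50))
emb_l2 = rng.standard_normal((10, 50))

def nn_graph(emb):
    """Undirected nearest-neighbour graph: connect each word to its
    cosine-nearest other word; returns the adjacency matrix."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)      # exclude self-similarity
    nn = sims.argmax(axis=1)
    A = np.zeros((len(emb), len(emb)), dtype=int)
    for i, j in enumerate(nn):
        A[i, j] = A[j, i] = 1
    return A

A1, A2 = nn_graph(emb_l1), nn_graph(emb_l2)

# Differing sorted degree sequences certify non-isomorphism.
deg1, deg2 = sorted(A1.sum(axis=1)), sorted(A2.sum(axis=1))
print(deg1, deg2)
```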

  5. How similar are embeddings across languages?
 ‣ NN graphs of the top 10 most frequent English nouns and their translations [figure: English vs. German NN graphs]
 ‣ Not isomorphic
 Takeaway: word embeddings are not approximately isomorphic across languages.

  6. How do we quantify similarity?
 ‣ Need a metric that measures how similar two NN graphs G1 and G2 of different languages are
 ‣ Propose eigenvector similarity
 ‣ A1, A2: adjacency matrices of G1, G2
 ‣ D1, D2: degree matrices of G1, G2
 ‣ L1 = D1 − A1, L2 = D2 − A2: Laplacians of G1, G2
 ‣ λ1, λ2: eigenvalues (spectra) of L1, L2
 ‣ Metric: Δ = ∑_{i=1}^{k} (λ_{1i} − λ_{2i})², where k is the smallest j such that (∑_{i=1}^{j} λ_{li}) / (∑_{i=1}^{n} λ_{li}) > 0.9, taken as the minimum over the two graphs (l = 1, 2)
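The metric above can be sketched directly from the definitions on this slide: build the Laplacians, take their eigenvalue spectra, pick k so that the top-k eigenvalues carry more than 90% of the spectral mass in either graph, and sum the squared differences. The two tiny demo graphs (a path and a star) are illustrative choices, not from the talk:

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A from an adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def eigenvector_similarity(A1, A2, mass=0.9):
    """Delta: sum of squared differences of the top-k Laplacian
    eigenvalues, with k the smallest j whose top-j eigenvalues hold
    more than `mass` of the spectral mass, minimised over both graphs."""
    l1 = np.sort(np.linalg.eigvalsh(laplacian(A1)))[::-1]  # descending
    l2 = np.sort(np.linalg.eigvalsh(laplacian(A2)))[::-1]

    def k_of(lam):
        cum = np.cumsum(lam) / lam.sum()
        return int(np.searchsorted(cum, mass)) + 1

    k = min(k_of(l1), k_of(l2))
    return float(np.sum((l1[:k] - l2[:k]) ** 2))

# Tiny demo: a 4-node path graph vs. a 4-node star graph.
path = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]])
star = np.array([[0,1,1,1],[1,0,0,0],[1,0,0,0],[1,0,0,0]])
print(eigenvector_similarity(path, path))  # identical graphs
print(eigenvector_similarity(path, star))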

  7. How do we quantify similarity?
 ‣ Metric: Δ = ∑_{i=1}^{k} (λ_{1i} − λ_{2i})², where k is the smallest j such that (∑_{i=1}^{j} λ_{li}) / (∑_{i=1}^{n} λ_{li}) > 0.9, taken as the minimum over the two graphs (l = 1, 2)
 ‣ Quantifies how close two NN graphs are to being isospectral, i.e. having the same spectrum (the same set of eigenvalues).
 ‣ Isomorphic → isospectral, but isospectral ↛ isomorphic
 ‣ Δ : (G1, G2) → [0, ∞)
 ‣ Δ = 0: G1, G2 are isospectral (very similar)
 ‣ Δ → ∞: G1, G2 become less similar
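The direction "isomorphic → isospectral" can be checked concretely: relabelling a graph's vertices (conjugating the adjacency matrix by a permutation matrix) is a similarity transform of the Laplacian, so the spectrum, and hence Δ, is unchanged. A self-contained sketch on a random graph (the graph itself is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random undirected graph on 8 nodes.
n = 8
A = np.triu((rng.random((n, n)) < 0.4).astype(int), 1)
A = A + A.T

# An isomorphic copy: relabel the vertices with a random permutation P.
perm = rng.permutation(n)
P = np.eye(n, dtype=int)[perm]
A_iso = P @ A @ P.T

def lap_spectrum(adj):
    """Sorted eigenvalues of the graph Laplacian L = D - A."""
    L = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(L))

# Isomorphic graphs are isospectral, so Delta between them is 0.
print(np.allclose(lap_spectrum(A), lap_spectrum(A_iso)))  # → True
```

The converse fails: there exist non-isomorphic graphs with identical Laplacian spectra, which is why Δ = 0 indicates only "very similar", not identical structure.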

  8. Unsupervised cross-lingual learning assumptions
 ‣ Besides isomorphism, several other implicit assumptions
 ‣ These may or may not scale to low-resource languages

            Conneau et al. (2018)                    This work
 Languages  Dependent-marking, fusional and isolating  Agglutinative, many cases
 Corpora    Comparable (Wikipedia)                     Different domains
