On the Limitations of Unsupervised Bilingual Dictionary Induction
Anders Søgaard, Sebastian Ruder, Ivan Vulić
Background: Unsupervised MT
‣ Recently: Unsupervised neural machine translation (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)
‣ Key component: Initialization via unsupervised cross-lingual alignment of word embedding spaces
Background: Cross-lingual word embeddings
‣ Cross-lingual word embeddings enable cross-lingual transfer
‣ Most common approach: Project one word embedding space into another by learning a transformation matrix W between source embeddings x_i and their translations y_i:
  min_W ∑_{i=1}^{n} ‖Wx_i − y_i‖² (Mikolov et al., 2013)
‣ More recently: Use an adversarial setup to learn an unsupervised mapping
‣ Assumption: Word embedding spaces are approximately isomorphic, i.e. they have the same number of vertices, connected the same way.
How similar are embeddings across languages?
‣ Nearest neighbour (NN) graphs of the top 10 most frequent words in English and German are not isomorphic.
‣ NN graphs of the top 10 most frequent English words and their translations into German [figure: English vs. German NN graphs]
‣ Not isomorphic
How similar are embeddings across languages?
‣ NN graphs of the top 10 most frequent English nouns and their translations [figure: English vs. German NN graphs]
‣ Not isomorphic
Word embeddings are not approximately isomorphic across languages.
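The NN graphs above can be built directly from the embedding matrices: connect each word to its nearest neighbour under cosine similarity and symmetrise. A minimal sketch with hypothetical random vectors standing in for the real top-10 frequency lists:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "embeddings" of the 10 most frequent words in two languages
# (hypothetical random vectors in place of real trained embeddings).
E1 = rng.standard_normal((10, 300))
E2 = rng.standard_normal((10, 300))

def nn_graph(E):
    """Adjacency matrix of the undirected 1-NN graph under cosine similarity."""
    X = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)        # a word is not its own neighbour
    nn = sim.argmax(axis=1)               # nearest neighbour of each word
    A = np.zeros((len(E), len(E)))
    A[np.arange(len(E)), nn] = 1
    return np.maximum(A, A.T)             # symmetrise to an undirected graph

A1, A2 = nn_graph(E1), nn_graph(E2)
# Differing sorted degree sequences already rule out isomorphism;
# matching ones are necessary but not sufficient.
print(sorted(A1.sum(axis=1)), sorted(A2.sum(axis=1)))
```

The next slides introduce a graded spectral measure over exactly such adjacency matrices, since a binary isomorphic/not-isomorphic verdict is too coarse.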
How do we quantify similarity?
‣ Need a metric to measure how similar two NN graphs G₁ and G₂ of different languages are
‣ Propose eigenvector similarity
‣ A₁, A₂: adjacency matrices of G₁, G₂
‣ D₁, D₂: degree matrices of G₁, G₂
‣ L₁ = D₁ − A₁, L₂ = D₂ − A₂: Laplacians of G₁, G₂
‣ λ₁, λ₂: eigenvalues (spectra) of L₁, L₂
‣ Metric: Δ = ∑_{i=1}^{k} (λ_{1i} − λ_{2i})², where for each graph j, k_j is the smallest k such that ∑_{i=1}^{k} λ_{ji} / ∑_{i=1}^{n} λ_{ji} > 0.9 (the largest eigenvalues accounting for >90% of the spectrum), and k = min(k₁, k₂)
How do we quantify similarity?
‣ Quantifies how close two NN graphs are to being isospectral, i.e. having the same spectrum (the same multiset of eigenvalues).
‣ Isomorphic → isospectral, but isospectral ↛ isomorphic
‣ Δ : G₁, G₂ → [0, ∞)
‣ Δ = 0: G₁, G₂ are isospectral (very similar)
‣ Δ → ∞: G₁, G₂ become less similar
‣ Metric: Δ = ∑_{i=1}^{k} (λ_{1i} − λ_{2i})², where for each graph j, k_j is the smallest k such that ∑_{i=1}^{k} λ_{ji} / ∑_{i=1}^{n} λ_{ji} > 0.9 (the largest eigenvalues accounting for >90% of the spectrum), and k = min(k₁, k₂)
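The metric above translates almost directly into code. A minimal sketch following the definitions on these slides (Laplacian from the adjacency matrix, truncation k from the >90% spectral-mass rule); the two toy adjacency matrices in the usage example are hypothetical:

```python
import numpy as np

def eigenvector_similarity(A1, A2):
    """Delta between two NN graphs given symmetric 0/1 adjacency matrices."""
    spectra = []
    for A in (A1, A2):
        L = np.diag(A.sum(axis=1)) - A               # Laplacian L = D - A
        lam = np.sort(np.linalg.eigvalsh(L))[::-1]   # eigenvalues, descending
        spectra.append(lam)
    # For each graph j: smallest k_j whose top eigenvalues cover >90% of the
    # total spectral mass; truncate both spectra at k = min(k_1, k_2).
    ks = []
    for lam in spectra:
        cum = np.cumsum(lam) / lam.sum()
        ks.append(int(np.searchsorted(cum, 0.9, side="right")) + 1)
    k = min(ks)
    lam1, lam2 = spectra
    return float(np.sum((lam1[:k] - lam2[:k]) ** 2))

# Identical graphs are isospectral, so Delta = 0 (toy 3-node path graph).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
print(eigenvector_similarity(A, A))  # → 0.0
```

A path graph versus a triangle, say, yields Δ > 0, and larger values as the spectra diverge further, matching the interpretation on this slide.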
Unsupervised cross-lingual learning assumptions
‣ Besides isomorphism, several other implicit assumptions
‣ May or may not scale to low-resource languages

            Conneau et al. (2018)                      This work
Languages   Dependent-marking, fusional and isolating  Agglutinative, many cases
Corpora     Comparable (Wikipedia)                     Different domains