Cross-linguality and machine translation without bilingual data

State-of-the-art in supervised mappings
Two sequences of (optional) linear transformations:
S0 (opt.) Pre-processing: length normalization, mean centering
S1 (opt.) Whitening: turn covariance matrices into the identity matrix
S2 Orthogonal mapping: map into a shared space (Procrustes)
S3 (opt.) Re-weighting: weight each component according to its cross-correlation
S4 (opt.) De-whitening: restore the original variance in every direction
S5 (opt.) Dimensionality reduction: keep the first n components only
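The steps above can be sketched in NumPy as a chain of matrices applied to each language. This is a minimal illustration under our own assumptions, not the authors' released code: `multi_step_map` and its flags are hypothetical, whitening uses the inverse square root of each side's dictionary covariance, and re-weighting multiplies the chosen side by the singular values themselves (the exponent used in the published method may differ).

```python
import numpy as np


def sym_power(c, p):
    """Raise a symmetric positive-definite matrix to the power p (via eigh)."""
    vals, vecs = np.linalg.eigh(c)
    vals = np.clip(vals, 1e-12, None)  # guard against tiny negative eigenvalues
    return (vecs * vals**p) @ vecs.T


def preprocess(e):
    """S0: length normalization, then mean centering."""
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    return e - e.mean(axis=0)


def multi_step_map(xs, zs, src_idx, trg_idx, whiten=True, reweight="trg",
                   dewhiten=True, n_components=None):
    """Map source (xs) and target (zs) embeddings into a shared space.

    src_idx[k], trg_idx[k] is the k-th seed-dictionary pair (hypothetical API).
    """
    xs, zs = preprocess(xs), preprocess(zs)
    x, z = xs[src_idx], zs[trg_idx]
    d = xs.shape[1]
    wx, wz = np.eye(d), np.eye(d)
    if whiten:  # S1: dictionary covariances become the identity
        wx = sym_power(x.T @ x, -0.5)
        wz = sym_power(z.T @ z, -0.5)
    # S2: orthogonal mapping (Procrustes) from the SVD of the cross-covariance
    u, s, vt = np.linalg.svd((x @ wx).T @ (z @ wz))
    v = vt.T
    wx, wz = wx @ u, wz @ v
    # S3: scale each component (column) by its singular value on one side
    if reweight == "src":
        wx = wx * s
    elif reweight == "trg":
        wz = wz * s
    # S4: de-whitening restores each language's own variance
    if whiten and dewhiten:
        wx = wx @ (u.T @ sym_power(x.T @ x, 0.5) @ u)
        wz = wz @ (v.T @ sym_power(z.T @ z, 0.5) @ v)
    # S5: keep only the first n components
    if n_components is not None:
        wx, wz = wx[:, :n_components], wz[:, :n_components]
    return xs @ wx, zs @ wz
```

Each step is a single matrix multiplication, so turning a step off simply means leaving it out of the chain; if the two spaces are exactly isometric, both sides end up identical after mapping.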

State-of-the-art in supervised mappings: steps used by each method
Columns: S0 (l) length normalization, S0 (m) mean centering, S1, S2, S3, S4 (src), S4 (trg), S5 (marks listed in column order, empty cells omitted)
OLS
  Mikolov et al. (2013): x x src trg trg
  Shigeto et al. (2015): x x trg src src
CCA
  Faruqui and Dyer (2014): x x x x x
Orth.
  Xing et al. (2015): x x
  Artetxe et al. (2016): x x x
  Zhang et al. (2016): x
  Smith et al. (2017): x x x
Our method (AAAI18): x x x x trg src trg x

Evaluating via bilingual dictionary induction
Dataset by Dinu et al. (2015) extended to German, Finnish, Spanish
⇒ Monolingual embeddings (CBOW + negative sampling)
⇒ Seed dictionary: 5,000 word pairs
⇒ Test dictionary: 1,500 pairs (nearest neighbor, P@1)

Method                     EN-IT    EN-DE    EN-FI    EN-ES
Mikolov et al. (2013)      34.93†   35.00†   25.91†   27.73†
Faruqui and Dyer (2014)    38.40*   37.13*   27.60*   26.80*
Shigeto et al. (2015)      41.53†   43.07†   31.04†   33.73†
Dinu et al. (2015)         37.7     38.93*   29.14*   30.40*
Lazaridou et al. (2015)    40.2     -        -        -
Xing et al. (2015)         36.87†   41.27†   28.23†   31.20†
Artetxe et al. (2016)      39.27    41.87*   30.62*   31.40*
Zhang et al. (2016)        36.73†   40.80†   28.16†   31.07†
Smith et al. (2017)        43.1     43.33†   29.42†   35.13†
Our method (AAAI18)        45.27    44.13    32.94    36.60
† our publicly available reimplementation
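P@1 counts a test word as correct when its nearest target neighbor (by cosine similarity) is one of its gold translations. A minimal NumPy sketch, assuming the source embeddings have already been mapped into the target space and the test dictionary is given as sets of target indices; `precision_at_1` is a hypothetical helper, not part of any released toolkit.

```python
import numpy as np


def precision_at_1(mapped_src, trg, test_pairs):
    """Nearest-neighbor P@1 for bilingual dictionary induction.

    test_pairs maps a source word index to the set of acceptable target indices.
    """
    # Cosine similarity: length-normalize both sides, then take dot products.
    s = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    t = trg / np.linalg.norm(trg, axis=1, keepdims=True)
    queries = sorted(test_pairs)
    nearest = np.argmax(s[queries] @ t.T, axis=1)  # best target per query word
    hits = sum(int(n) in test_pairs[q] for q, n in zip(queries, nearest))
    return hits / len(queries)
```

Each query maps to a set because a test dictionary can list several valid translations for the same source word.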

Why does it work?
Languages are (to a large extent) isometric in word embedding space (!)
[Figure: mapping W]
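Isometry is what makes an orthogonal mapping sufficient: an orthogonal W preserves every dot product, and orthogonal Procrustes (the SVD of XᵀZ) recovers W from a handful of aligned pairs. A toy NumPy check under the idealized assumption that the target space is exactly Z = XW; real embedding spaces are only approximately isometric, and the data here is random.

```python
import numpy as np


def procrustes(x, z):
    """Orthogonal W minimizing ||xW - z||_F, via the SVD of x^T z."""
    u, _, vt = np.linalg.svd(x.T @ z)
    return u @ vt


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))                      # toy "source" embeddings
w_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal map
z = x @ w_true                                     # idealized isometric "target"

# An orthogonal map preserves all dot products (hence distances and angles).
print(np.allclose(x @ x.T, z @ z.T))  # True
# Procrustes recovers the map from just 20 aligned pairs.
print(np.allclose(procrustes(x[:20], z[:20]), w_true))  # True
```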

Outline
• Bilingual embedding mappings
  • Introduction to vector space models (embeddings)
  • Bilingual embedding mappings (AAAI18)
  • Reduced supervision
    • Self-learning, semi-supervised (ACL17)
    • Self-learning, fully unsupervised (ACL18)
  • Conclusions
• Unsupervised neural machine translation
  • Introduction to NMT
  • From bilingual embeddings to uNMT (ICLR18)
  • Unsupervised statistical MT (EMNLP18)
  • Conclusions

Reducing supervision
Bilingual signal for training
Previous work:
- parallel corpora
- comparable corpora
- (big) dictionaries
Our work:
- 25-word dictionary
- numerals (1, 2, 3…)
- nothing
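The numerals setting needs no bilingual resource beyond shared digits. A minimal sketch, assuming numerals are written as the same digit strings in both vocabularies; `numeral_seed` is a hypothetical helper, not part of any released code.

```python
def numeral_seed(src_vocab, trg_vocab):
    """Seed dictionary of numerals spelled identically in both vocabularies.

    Returns (source index, target index) pairs.
    """
    trg_index = {word: i for i, word in enumerate(trg_vocab)}
    return [(i, trg_index[word])
            for i, word in enumerate(src_vocab)
            if word.isdigit() and word in trg_index]


pairs = numeral_seed(["the", "1", "2015", "dog", "7"],
                     ["il", "7", "cane", "1", "2015"])
print(pairs)  # [(1, 3), (2, 4), (4, 1)]
```

Note that `str.isdigit` also accepts some non-ASCII digit characters; a stricter filter may be preferable for real vocabularies.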

Self-learning
Monolingual embeddings + Dictionary → Mapping → Dictionary (better!) → Mapping → Dictionary (even better!) → Mapping → Dictionary (even better!) → …
Proposed self-learning method: starting from the monolingual embeddings, alternate Dictionary → Mapping → Dictionary in a loop.
Too good to be true?
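The loop above can be sketched in a few lines of NumPy. This is a simplified illustration, not the published ACL17 implementation: for brevity the mapping step is plain orthogonal Procrustes and the dictionary step is plain nearest-neighbor retrieval over the whole source vocabulary; `self_learning` is a hypothetical name.

```python
import numpy as np


def procrustes(x, z):
    """Orthogonal W minimizing ||xW - z||_F (SVD of x^T z)."""
    u, _, vt = np.linalg.svd(x.T @ z)
    return u @ vt


def self_learning(xs, zs, seed_pairs, max_iters=10):
    """Alternate mapping and dictionary induction until the dictionary is stable.

    xs, zs: length-normalized source/target embeddings (one row per word).
    seed_pairs: list of (source index, target index) pairs; max_iters >= 1.
    Returns the map W and the induced target index for every source word.
    """
    src_idx = np.array([i for i, _ in seed_pairs])
    trg_idx = np.array([j for _, j in seed_pairs])
    dictionary = None
    for _ in range(max_iters):
        # Mapping step: learn W from the current dictionary.
        w = procrustes(xs[src_idx], zs[trg_idx])
        # Dictionary step: nearest target neighbor for every source word.
        new_dictionary = np.argmax((xs @ w) @ zs.T, axis=1)
        if dictionary is not None and np.array_equal(new_dictionary, dictionary):
            break  # converged: the dictionary no longer changes
        dictionary = new_dictionary
        src_idx, trg_idx = np.arange(len(xs)), dictionary
    return w, dictionary
```

On perfectly isometric toy data with a shuffled target vocabulary, a seed of 15 pairs is enough for this loop to recover the full dictionary.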

Semi-supervised experiments (ACL17)
• Given monolingual embeddings plus a seed bilingual dictionary (train dictionary):
  • 25 word pairs
  • Pairs of numerals
• Induce a bilingual dictionary for the full vocabulary using self-learning
• Evaluation
  • Compare translations to an existing bilingual dictionary (test dictionary)
  • Accuracy

Semi-supervised experiments (ACL17)
[Figure: English-Italian word translation induction]

Why does it work?
Implicit objective: independent from the seed dictionary!
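As we understand the ACL17 analysis (an assumption here, not a quotation of the slide), the implicit objective is W* = argmax_W Σ_i max_j (X_i W) · Z_j subject to W being orthogonal: the total similarity between each mapped source word and its nearest target word. The seed dictionary never appears in it; it only chooses where the search starts. A sketch of that quantity, with `implicit_objective` as a hypothetical name:

```python
import numpy as np


def implicit_objective(xs, zs, w):
    """Sum over source words of the similarity to their nearest target word.

    xs, zs: embedding matrices (one row per word); w: candidate orthogonal map.
    No seed dictionary appears anywhere in this quantity.
    """
    return float(np.max((xs @ w) @ zs.T, axis=1).sum())
```

If the mapping step is orthogonal Procrustes on the current dictionary D and the dictionary step is nearest-neighbor retrieval for every source word, each step can only increase Σ_i (X_i W) · Z_D(i), so from the first induced dictionary onward this value does not decrease across iterations.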

Cross-linguality and machine translation without bilingual data
Eneko Agirre (@eagirre)
Joint work with: Mikel Artetxe, Gorka Labaka
IXA NLP group, University
