What Kind of Language Is Hard to Language-Model?
ACL 2019
Sabrina J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner
Johns Hopkins University // City University of New York Graduate Center // Google
sjmielke@jhu.edu · Twitter: @sjmielke – paper and thread pinned!
Questions and answers

0. Do current language models do equally well on all languages? No.
1. Which one do they struggle more with: German or English? German.
2. What about non-Indo-European languages, say Chinese? It depends.
3. What makes a language harder to model? Actually, rather technical factors.
4. Is Translationese easier? It’s different, but not actually easier!
Outline

• “Difficulty”
• Models and languages
• What correlates with difficulty?
• And... is Translationese really easier?
How to measure “difficulty”?

Language models measure surprisal / information content (NLL; −log p(·)):

  en  p(·) = 0.03    ⇒   5 bits  I love Florence!
  de  p(·) = 0.008   ⇒   7 bits  Ich grüße meine Oma und die Familie daheim. (“I greet my grandma and the family back home.”)
  nl  p(·) = 0.0004  ⇒  11 bits  Alle mensen worden vrij en gelijk in waardigheid en rechten geboren. (“All human beings are born free and equal in dignity and rights.”)

Issue 1: Different topics / styles / content – these sentences say different things, so their surprisals aren’t comparable.
Solution: train and test on translations!

  en  p(·) = 0.013  ⇒  6.5 bits  Resumption of the session.
  de  p(·) = 0.011  ⇒  6.3 bits  Wiederaufnahme der Sitzung.
  nl  p(·) = 0.012  ⇒  6.4 bits  Hervatting van de sessie.

  • Europarl: 21 languages share ~40M chars
  • Bibles: 62 languages (13 language families) share ~4M chars – picking this subset takes a big ILP (solved with Gurobi), which is really fun

Issue 2: Comparing scores
Solution: use the total bits of an open-vocabulary model. Why?
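To make the numbers above concrete, here is a minimal sketch of the surprisal computation: the NLL of a sentence in bits is just −log₂ of the probability the model assigns to it. The probabilities are the (rounded) ones from the first table above.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal / information content in bits: -log2 p(sentence)."""
    return -math.log2(p)

# Rounded sentence probabilities from the first table above.
for lang, p in [("en", 0.03), ("de", 0.008), ("nl", 0.0004)]:
    print(f"{lang}: {surprisal_bits(p):4.1f} bits")
# en:  5.1 bits / de:  7.0 bits / nl: 11.3 bits -- matching the ~5 / 7 / 11 above
```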
How to compare your language models across languages

1. We need to be open-vocabulary – no UNKs.
   Every UNK is “cheating” – morphologically rich languages have more UNKs, unfairly advantaging them.
2. We can’t normalize per word or even per character in each language individually.
   Example: if puč (Czech) and Putsch (German) are equally likely, they should be equally “difficult” – even though they differ in character count.

⇒ just use the overall bits (i.e., surprisal / NLL) of an aligned sentence

[ note: the total is easily obtained from BPC or perplexity by multiplying by the total number of chars / words ]
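As a sanity check on that note, here is a minimal sketch of recovering total bits from the two common reporting conventions; the corpus sizes and scores below are invented purely for illustration.

```python
import math

def total_bits_from_bpc(bpc: float, n_chars: int) -> float:
    # bits-per-character * number of characters = total bits
    return bpc * n_chars

def total_bits_from_perplexity(ppl: float, n_words: int) -> float:
    # per-word perplexity is 2**(bits per word), so invert the exponent
    return n_words * math.log2(ppl)

print(total_bits_from_bpc(1.2, 1_000_000))        # 1.2M bits
print(total_bits_from_perplexity(60.0, 180_000))  # ~1.06M bits
```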
How to aggregate multiple intents’ surprisals into “difficulties”?

For fully parallel corpora: an aligned multi-text, e.g. (each row is one intent; cells truncated on the slide)

  1  en: Resumption of the session ...      de: Wiederaufnahme der ...     bg: Възобновяване на се- ...
  2  en: The peace that ...                 de: Der gestern verein- ...    bg: Мирът, който беше ...
  3  en: Although we were not al- ...       de: Obwohl wir nicht ...       bg: Макар че не бяхме ...
  4  en: Now we can finally ...             de: Jetzt ist die Zeit ...     bg: Накрая всички можем ...

[ Image CC-BY Mike Grauer Jr / flickr ]
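One naive way to turn such a multi-text into per-language difficulties – not necessarily the aggregation the talk ultimately settles on, and with invented numbers – is to average each language’s total bits over the aligned intents:

```python
from collections import defaultdict

# surprisals[(intent, lang)] = total bits the model spent on that aligned
# sentence; intents are the shared rows of the multi-text. Numbers invented.
surprisals = {
    (1, "en"): 120.4, (1, "de"): 131.9, (1, "bg"): 127.2,
    (2, "en"):  95.1, (2, "de"): 104.7, (2, "bg"):  99.0,
}

per_lang = defaultdict(list)
for (intent, lang), bits in surprisals.items():
    per_lang[lang].append(bits)

# Naive difficulty: mean total bits per aligned intent.
difficulty = {lang: sum(v) / len(v) for lang, v in per_lang.items()}
print(difficulty)  # {'en': 107.75, 'de': 118.3, 'bg': 113.1}
```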