



What Kind of Language Is Hard to Language-Model?
ACL 2019
Sabrina J. Mielke and Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner
Johns Hopkins University // City University of New York Graduate Center // Google
sjmielke@jhu.edu | Twitter: @sjmielke – paper and thread pinned!

Questions and answers

0. Do current language models do equally well on all languages? No.
1. Which one do they struggle more with: German or English? German.
2. What about non-Indo-European languages, say Chinese? It depends.
3. What makes a language harder to model? Actually, rather technical factors.
4. Is Translationese easier? It's different, but not actually easier!

Outline

- "Difficulty"
- Models and languages
- What correlates with difficulty?
- And... is Translationese really easier?

How to measure "difficulty"?

Language models measure surprisal / information content (NLL; −log p(·)):

  en  p = 0.03    ⇒  5 bits   "I love Florence!"
  de  p = 0.008   ⇒  7 bits   "Ich grüße meine Oma und die Familie daheim."
  nl  p = 0.0004  ⇒ 11 bits   "Alle mensen worden vrij en gelijk in waardigheid en rechten geboren."

Issue 1: Different topics / styles / content.
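The bit counts above follow directly from the definition: surprisal is the negative base-2 logarithm of the probability. A minimal sanity check in Python:

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal / information content in bits: NLL = -log2(p)."""
    return -math.log2(p)

print(surprisal_bits(0.03))    # ~5.1 bits  (the English example)
print(surprisal_bits(0.008))   # ~7.0 bits  (the German example)
print(surprisal_bits(0.0004))  # ~11.3 bits (the Dutch example)
```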

Solution: train and test on translations!

  en  p = 0.013  ⇒ 6.3 bits  "Resumption of the session."
  de  p = 0.011  ⇒ 6.5 bits  "Wiederaufnahme der Sitzung."
  nl  p = 0.012  ⇒ 6.4 bits  "Hervatting van de sessie."

  Europarl: 21 languages share ~40M characters.
  Bibles: 62 languages (from 13 language families) share ~4M characters; finding this shared verse set takes a big ILP, solved with Gurobi, which is really fun (a sketch follows below).

Issue 2: Comparing scores.
Solution: use the total bits of an open-vocabulary model. Why?
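The slide only name-drops the ILP; the paper's exact formulation isn't reproduced here. As a hypothetical illustration of the kind of selection problem involved (keep verses so that every kept verse exists in every kept language, maximizing shared text), here is a toy sketch using the PuLP modeling library, which can also call Gurobi as a backend. All data and variable names are made up:

```python
# Hypothetical sketch of a shared-verse-selection ILP (NOT the paper's
# exact formulation): pick Bible verses that maximize shared characters
# while being available in every kept language.
import pulp

# Toy availability data: 1 if that language's Bible contains that verse.
languages = ["en", "de", "nl"]
verses = ["GEN-1-1", "GEN-1-2", "JOH-3-16"]
available = {
    ("en", "GEN-1-1"): 1, ("en", "GEN-1-2"): 1, ("en", "JOH-3-16"): 1,
    ("de", "GEN-1-1"): 1, ("de", "GEN-1-2"): 0, ("de", "JOH-3-16"): 1,
    ("nl", "GEN-1-1"): 1, ("nl", "GEN-1-2"): 1, ("nl", "JOH-3-16"): 1,
}
chars = {"GEN-1-1": 60, "GEN-1-2": 70, "JOH-3-16": 90}  # verse lengths

prob = pulp.LpProblem("shared_verse_selection", pulp.LpMaximize)
keep_lang = pulp.LpVariable.dicts("lang", languages, cat="Binary")
keep_verse = pulp.LpVariable.dicts("verse", verses, cat="Binary")

# Objective: maximize the total characters of the kept (shared) verses.
prob += pulp.lpSum(chars[v] * keep_verse[v] for v in verses)

# A kept verse must exist in every kept language:
# keep_verse[v] + keep_lang[l] <= 1 + available[l, v].
for l in languages:
    for v in verses:
        prob += keep_verse[v] + keep_lang[l] <= 1 + available[(l, v)]

# Require a minimum number of kept languages (here: all three).
prob += pulp.lpSum(keep_lang[l] for l in languages) >= 3

prob.solve()
print([v for v in verses if keep_verse[v].value() == 1])  # shared verses
```

With this toy data, GEN-1-2 is dropped because the German Bible lacks it; the other two verses form the shared set.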

How to compare your language models across languages

1. We need to be open-vocabulary: no UNKs. Every UNK is "cheating", and morphologically rich languages have more UNKs, unfairly advantaging them.

2. We can't normalize per word or even per character in each language individually. Example: if "puč" (Czech) and "Putsch" (German) are equally likely, they should be equally "difficult", even though they differ in word and character counts.

⇒ Just use the overall bits (i.e., surprisal / NLL) of an aligned sentence.

[Note: the total is easily obtained from BPC or perplexity by multiplying by the total number of characters / words.]
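To make the bracketed note concrete, here is a minimal sketch of recovering total bits from commonly reported metrics, assuming base-2 bits-per-character (BPC) and word-level perplexity:

```python
import math

def total_bits_from_bpc(bpc: float, n_chars: int) -> float:
    """Total surprisal in bits, given bits-per-character and character count."""
    return bpc * n_chars

def total_bits_from_perplexity(ppl: float, n_words: int) -> float:
    """Total surprisal in bits, given word-level perplexity and word count.
    Perplexity is 2^(bits per word), so bits per word = log2(ppl)."""
    return math.log2(ppl) * n_words

print(total_bits_from_bpc(1.2, 100))         # 120.0 bits for 100 characters
print(total_bits_from_perplexity(64.0, 10))  # 60.0 bits for 10 words
```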

How to aggregate multiple intents' surprisals into "difficulties"? For fully parallel corpora...

[Figure: an aligned multi-text, four parallel sentences shown side by side in three columns (en, de, bg), e.g. "Resumption of the session" / "Wiederaufnahme der ..." / "Възобновяване на ...". Image CC-BY Mike Grauer Jr / flickr]
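The transcript cuts off before the aggregation method itself, so the following is only a hypothetical sketch, not the paper's estimator: assume each surprisal y[i, j] factors into an intent size n[i] times a language difficulty d[j], and fit that factorization by alternating least squares on toy data:

```python
import numpy as np

# Hypothetical sketch (not necessarily the paper's method): model the
# surprisal of intent i in language j as y[i, j] ~ n[i] * d[j], where
# n[i] is the intent's information content and d[j] the language's
# difficulty, and fit by alternating least squares on a parallel corpus.
rng = np.random.default_rng(0)
y = rng.uniform(50, 200, size=(4, 3))  # toy surprisals: 4 intents x 3 languages

n = y.mean(axis=1)           # initialize intent sizes
d = np.ones(y.shape[1])      # initialize language difficulties
for _ in range(100):
    # closed-form least-squares updates for a rank-1 factorization
    d = (y * n[:, None]).sum(axis=0) / (n ** 2).sum()
    n = (y * d[None, :]).sum(axis=1) / (d ** 2).sum()
d /= d.mean()                # fix the scale: average difficulty = 1
print(dict(zip(["en", "de", "bg"], d.round(3))))
```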
