Multilingual Europe As A Challenge for Language Technology. Ryan McDonald. Google NLU team, Google Linguistics team, Universal Dependencies Group
Europe & the Internet: ~600M internet users (~80% internet penetration). Language of top 10M pages (W3Techs 2015): 54%, 32%, 14% other. META-Forum 2016
Google: Mobile vs. Desktop
Search & Language Technology: Ten links
Search & Language Technology
Mobile & Language Technology: Speech recognition; Natural Language Understanding; Generation & text-to-speech
Mobile & Language Technology: Q&A; Predictive info; Get weather; Order a pizza
Mobile & Language Technology: Mobile is the future; Language technologies are key to the mobile experience; Users demand native-language support; Europe: a large market with dozens of languages
Language Technology Productionization: NLU system; English version; Internationalize
Baked-in multilingualism in NLU: NLU system; Multilingual systems
End-to-end NN. Pros: Simple & flexible; Multilingual by nature?; High accuracy**. Cons: Hard to interpret; Need a lot of data**
Similarity / Paraphrase / NLI (Parikh et al. 2016): Units/words? Word order? Structural bias? Multilingual??
Intermediate Representations: Useful (discrete) abstractions for i18n NLU? NLU; Abstraction layer; Ο πληθυσμός του Καναδά είναι 35,000,000 (The population of Canada is 35M)
Morphosyntax: NLP / Analysis
Ο Ιωάννης είδε τους γονείς του, όταν πήγε στην Αθήνα. (John saw family his when went to Athens)
Ήταν ευτυχής να τον δουν. (were happy to him see)
πήγε: number: singular, person: third, tense: past
Ήταν: number: ?, person: third, tense: past
δουν: number: plural, person: third, tense: subjunctive/future
Missing pronoun realization: 1. Syntax to identify relevant verbs 2. Morphology to piece together gender/number
NLU focused awards, May 2016
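The second step on this slide, using morphology to piece together number, can be pictured as a small unification over feature bundles. The sketch below is illustrative only: the feature values are copied from the slide, `unify` is a hypothetical helper, and it assumes the syntactic step has already decided which verbs may share a subject.

```python
from typing import Dict, Optional

# None marks an underspecified value (shown as "?" on the slide).
Features = Dict[str, Optional[str]]

def unify(a: Features, b: Features, keys=("number", "person")) -> Optional[Features]:
    """Merge two feature bundles on `keys`; return None if they clash."""
    merged: Features = {}
    for key in keys:
        left, right = a.get(key), b.get(key)
        if left is not None and right is not None and left != right:
            return None  # incompatible values: the verbs cannot share a subject
        merged[key] = left if left is not None else right
    return merged

# Feature values as given on the slide.
itan = {"number": None, "person": "third", "tense": "past"}                    # Ήταν
doun = {"number": "plural", "person": "third", "tense": "subjunctive/future"}  # δουν
pige = {"number": "singular", "person": "third", "tense": "past"}              # πήγε

print(unify(itan, doun))  # {'number': 'plural', 'person': 'third'}
print(unify(pige, doun))  # None
```

Under these assumptions the underspecified number of Ήταν is filled in from δουν, while πήγε and δουν are kept apart because singular and plural clash.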
Morphosyntax: Generation / TTS
population(Καναδάς, 35.000.000) + Ο πληθυσμός <masc-gen-sing-prep> <masc-gen-sing-Entity> <sing-cop> <pop>
= Ο πληθυσμός του Καναδά είναι 35.000.000
nom-fem-plur = Ο πληθυσμός του Καναδά είναι τριάντα πέντε εκατομμύρια
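The template on this slide can be read as a sequence of slots, each filled by looking up a surface form under the required morphological features. The following is a minimal sketch under that reading. The toy lexicons and the function `realize_population` are hypothetical and cover only the slide's example.

```python
# Hypothetical toy lexicons keyed by (gender, case, number); illustration only.
PREP_SLOT = {("Masc", "Gen", "Sing"): "του"}            # the slide's <masc-gen-sing-prep>
ENTITIES = {("Καναδάς", "Masc", "Gen", "Sing"): "Καναδά"}  # <masc-gen-sing-Entity>
COPULA = {"Sing": "είναι"}                               # <sing-cop>

def realize_population(entity: str, population: str) -> str:
    """Fill 'Ο πληθυσμός <prep> <Entity> <cop> <pop>' for a masculine singular entity."""
    gender, case, number = "Masc", "Gen", "Sing"
    return " ".join([
        "Ο πληθυσμός",
        PREP_SLOT[(gender, case, number)],
        ENTITIES[(entity, gender, case, number)],
        COPULA[number],
        population,
    ])

print(realize_population("Καναδάς", "35.000.000"))
```

For TTS, the `population` argument would instead carry the spelled-out number, which needs its own agreement features, as the last line of the slide indicates.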
Universal Dependencies (UD) (Nivre et al. 2016)
✤ Content-head reigns supreme for dependencies
✤ UPOS + Morphology + lemma surface analysis
~26 European languages covered in v1.3
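UD treebanks are distributed in the CoNLL-U format, where each token is one line of ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). As a rough illustration of the "UPOS + Morphology + lemma" layer, the sketch below parses one such line. The example annotation is hand-written for this note, not taken from a released treebank, and `Token` and `parse_conllu_token` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Token:
    id: str
    form: str
    lemma: str
    upos: str
    feats: Dict[str, str]
    head: str
    deprel: str

def parse_conllu_token(line: str) -> Token:
    """Parse a single CoNLL-U token line into the fields used here."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(cols)}")
    # FEATS is '_' when empty, otherwise 'Name=Value' pairs joined by '|'.
    feats = {} if cols[5] == "_" else dict(p.split("=", 1) for p in cols[5].split("|"))
    return Token(cols[0], cols[1], cols[2], cols[3], feats, cols[6], cols[7])

# Hand-written example line (an assumption, not treebank data).
line = "4\tΚαναδά\tΚαναδάς\tPROPN\t_\tCase=Gen|Gender=Masc|Number=Sing\t2\tnmod\t_\t_"
tok = parse_conllu_token(line)
print(tok.lemma, tok.upos, tok.feats)
```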
Analysis: SyntaxNet (Andor et al. 2016); UD; Chen & Manning 14; Weiss et al. 15
Morphosyntactic Analysis @ Google
Generation: Entity Lexicons
Inflectional table:
Καναδάς: Gender=Masc, Case=Nom, Number=Sing
Καναδά: Gender=Masc, Case=Acc, Number=Sing
Καναδά: Gender=Masc, Case=Gen, Number=Sing
Edit-distance, Unsup-morpher, etc.
Ο Καναδάς έχει πολλά δέντρα. Gender=Masc Case=Nom Number=Sing
Ο πληθυσμός του Καναδά είναι 35M. Gender=Masc Case=Gen Number=Sing
Είδα τον Καναδά με το τρένο. Gender=Masc Case=Acc Number=Sing
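One plausible reading of this slide is that tagged mentions in running text are attached to a known entity by string similarity and then collected into an inflectional table. The sketch below follows that reading. How the talk's system actually uses edit distance is not specified, and the candidate entity list, `nearest_lemma`, and `build_inflection_table` are all hypothetical.

```python
from collections import defaultdict

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def nearest_lemma(form: str, lemmas: list) -> str:
    """Attach an observed form to the closest known entity name."""
    return min(lemmas, key=lambda lemma: edit_distance(form, lemma))

def build_inflection_table(mentions, lemmas):
    """Collect (form, features) mentions into lemma -> features -> form."""
    table = defaultdict(dict)
    for form, feats in mentions:
        table[nearest_lemma(form, lemmas)][feats] = form
    return dict(table)

# Mentions and features as tagged on the slide; the lemma list is an assumption.
mentions = [
    ("Καναδάς", ("Masc", "Nom", "Sing")),
    ("Καναδά", ("Masc", "Gen", "Sing")),
    ("Καναδά", ("Masc", "Acc", "Sing")),
]
table = build_inflection_table(mentions, ["Καναδάς", "Ελλάδα"])
print(table["Καναδάς"][("Masc", "Gen", "Sing")])
```

A generator can then ask the table for the form of an entity under the features a template slot requires.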
Summary: + NLU = Multilingualism in NLU from the ground up. End-to-end, Morphosyntax, more?
Thanks