
The MultiJEDI ERC Project: Multilingual Joint Word Sense Disambiguation

The MultiJEDI ERC Project: Multilingual Joint Word Sense Disambiguation. Roberto Navigli, http://lcl.uniroma1.it. 5 July 2016, META-FORUM 2016.


  1. The MultiJEDI ERC Project: Multilingual Joint Word Sense Disambiguation Roberto Navigli http://lcl.uniroma1.it 5 July 2016 – META-FORUM 2016

  2. Andrea Moro, Alessandro Raganato, Claudio Delli Bovi, Daniele Vannella, Tiziano Flati, Francesco Cecconi, Simone Ponzetto, Taher Pilehvar, José Camacho Collados, Ignacio Iacobacci, Federico Scozzafava. The MultiJEDI ERC Project 11.07.16 Roberto Navigli

  3. Multilingual Web Access – WWW 2015 11.07.16 3 Roberto Navigli

  4. "You may say I'm a dreamer, but I am not the only one. I hope someday you'll join us. And the world will be as one!" - John Lennon. Recent achievements in multilingual NLP

  5. A 5-year ERC Starting Grant (2011-2016) on Multilingual Word Sense Disambiguation http://multijedi.org

  6. INTEGRATING KNOWLEDGE [Navigli & Ponzetto, ACL 2010; Pilehvar & Navigli, ACL 2014]

  7. The resource diaspora

  8. Multilingual Joint Word Sense Disambiguation (MultiJEDI) Key Objective 1: create knowledge for all languages. Resources shown: MultiWordNet, BalkaNet, WOLF, MCR, GermaNet, WordNet

  9. Merging entries from different resources into BabelNet • We collect lexicalizations, definitions, translations, images, etc. from each of the merged resources. Figure label: WordNet

  10. What is BabelNet? • A merger of resources of different kinds. (META Prize 2015: BabelNet)

  11. What is BabelNet? • A merger of resources of different kinds: – WordNet: the most popular computational lexicon of English – Open Multilingual WordNet: a collection of open wordnets – WoNeF: a French WordNet – Wikipedia: the largest collaborative encyclopedia – Wikidata: the largest collaborative knowledge base – Wiktionary: the largest collaborative dictionary – OmegaWiki: a medium-size collaborative multilingual dictionary – GeoNames: a worldwide geographical database – Microsoft Terminology: a computer science thesaurus – High-quality automatic sense-based translations

  12. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages

  13. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages

  14. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages

  15. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! – 6M concepts and 7.7M named entities – 119M word senses – 378M semantic relations (27 relations per concept on avg.) – 11M images associated with concepts – 41M textual definitions – 2M concepts with domains associated
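
The coverage figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted above; the variable names are illustrative, and the word-senses-per-entry ratio is derived here rather than stated on the slide.

```python
# Rounded figures quoted on the slide.
concepts = 6_000_000
named_entities = 7_700_000
entries_rounded = 14_000_000      # "14 million entries"
word_senses = 119_000_000
semantic_relations = 378_000_000

# Concepts plus named entities give the entry count before rounding.
entries_exact = concepts + named_entities

# The slide's "27 relations per concept on avg." matches relations / entries.
relations_per_entry = semantic_relations / entries_rounded

# Derived here (not on the slide): average number of word senses per entry.
senses_per_entry = word_senses / entries_rounded

print(entries_exact, relations_per_entry, senses_per_entry)
```

The sum of concepts and named entities is 13.7M, which the slide rounds to 14 million, and dividing 378M relations by 14M entries reproduces the quoted average of 27.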

  16. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected

  17. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected • "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets

  18. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected • "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets • Full-fledged taxonomy: is-a relations are available for both concepts and named entities (Wikipedia Bitaxonomy) – Ferrari Testarossa is-a sports car – BabelNet is-a semantic network & encyclopedic dictionary

  19. Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected • "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets • Full-fledged taxonomy: is-a relations are available for both concepts and named entities (Wikipedia Bitaxonomy) • Easy access: Java and HTTP RESTful APIs; SPARQL endpoint (2 billion triples); downloadable indices for research purposes
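
As a rough illustration of the HTTP RESTful access mentioned on this slide, the sketch below only builds a query URL and sends no request. The base URL, API version, endpoint name (`getSynsetIds`) and parameter names (`lemma`, `searchLang`, `key`) are assumptions based on the public BabelNet HTTP API documentation; they are not taken from the slides and may differ between API versions, so check the current documentation before use. The helper name `build_synset_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: base URL and version are not from the slides and may have changed.
DEFAULT_BASE_URL = "https://babelnet.io/v9"


def build_synset_query(lemma, lang, key, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a URL asking for the synsets of a lemma.

    Parameter names are assumed from the public API docs, not from the talk.
    """
    params = {"lemma": lemma, "searchLang": lang, "key": key}
    return f"{base_url}/getSynsetIds?{urlencode(params)}"


# "YOUR_KEY" is a placeholder for a personal API key.
url = build_synset_query("mouse", "EN", "YOUR_KEY")
print(url)
```

Keeping URL construction separate from the network call makes the request easy to inspect and to adapt if the endpoint or parameter names turn out to differ.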

  20. The core of the Linguistic Linked Open Data cloud!

  21. What can we do with BabelNet? • Search and translate:

  22. What can we do with BabelNet?

  23. What can we do with BabelNet? • Explore the network:

  24. WordNet-Wikipedia mapping accuracy • Quality lower bound of the mapping: 87% – On the 6000 lowest-confidence mappings – Note: this concerns only 50k synsets in the intersection

  25. Creating Datasets with BabelNet. Key fact: all in one! • Annotating with BabelNet implies annotating with WordNet, Wikipedia, OmegaWiki, Open Multilingual WordNet, Wikidata and Wiktionary

  26. ADDRESSING AMBIGUITY [Moro, Raganato & Navigli, TACL 2014]

  27. Motivation (1): hungry computers • EN - The mouse ate the cheese

  28. Motivation (1): hungry computers • EN - The mouse ate the cheese • FR - La souris a mangé le fromage.

  29. Motivation (1): hungry computers • EN - The mouse ate the cheese • FR - La souris a mangé le fromage. • IT - Il mouse ha mangiato il formaggio

  30. Multilingual Joint Word Sense Disambiguation (MultiJEDI) Key Objective 2: use all languages to disambiguate one
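
One way to picture "use all languages to disambiguate one" is to intersect the senses that aligned words share across languages, using the mouse example from the preceding slides. This is a toy sketch and not the project's actual algorithm: the sense inventory below is hand-written for illustration (plain labels, not BabelNet identifiers), and the function name `joint_senses` is hypothetical.

```python
# Toy sense inventory, hand-written for illustration only.
# English "mouse" and French "souris" are both ambiguous; Italian uses
# "topo" for the animal and the loanword "mouse" for the device.
SENSES = {
    ("EN", "mouse"): {"rodent", "pointing device"},
    ("FR", "souris"): {"rodent", "pointing device"},
    ("IT", "topo"): {"rodent"},
    ("IT", "mouse"): {"pointing device"},
}


def joint_senses(aligned_words):
    """Return the senses shared by all aligned (language, word) pairs."""
    sense_sets = [SENSES[word] for word in aligned_words]
    return set.intersection(*sense_sets)


# English and French alone leave the ambiguity untouched.
print(joint_senses([("EN", "mouse"), ("FR", "souris")]))

# Adding the Italian word for the animal narrows the choice to one sense.
print(joint_senses([("EN", "mouse"), ("FR", "souris"), ("IT", "topo")]))
```

The Italian sentence on the slide uses "mouse", which under this toy inventory would select the device sense instead, which is the kind of translation error the "hungry computers" example pokes fun at.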

  31. So what? • The first (and only) system that performs Word Sense Disambiguation (common nouns, verbs, adjectives) and Entity Linking together • In arbitrary languages (270+ languages) • In multiple languages at once

  32. Step 4: Select the most reliable meanings. Example: "Thomas and Mario are strikers playing in Munich". Candidate meanings shown: Seth Thomas, Thomas Müller, Thomas (novel); Mario (Character), Mario (Album), Mario Gómez; striker (Sport), Striker (Video Game), Striker (Movie); Munich (City), FC Bayern Munich, Munich (Song)
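
The selection step can be sketched as a greedy pruning of a candidate graph: repeatedly take the most ambiguous mention and drop its least-connected candidate until one meaning per mention remains. This is a simplified illustration in the spirit of a densest-subgraph heuristic, not the system's actual implementation. The candidate names come from the slide, but the edges between them are invented for this example, and `degree` and `select_meanings` are hypothetical helper names.

```python
from itertools import chain

# Candidate meanings per mention, as listed on the slide.
candidates = {
    "Thomas": ["Seth Thomas", "Thomas Müller", "Thomas (novel)"],
    "Mario": ["Mario (Character)", "Mario (Album)", "Mario Gómez"],
    "strikers": ["striker (Sport)", "Striker (Video Game)", "Striker (Movie)"],
    "Munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}

# Assumption: toy semantic links, invented for illustration only.
edges = {
    frozenset(pair)
    for pair in [
        ("Thomas Müller", "FC Bayern Munich"),
        ("Mario Gómez", "FC Bayern Munich"),
        ("Thomas Müller", "striker (Sport)"),
        ("Mario Gómez", "striker (Sport)"),
        ("Thomas Müller", "Mario Gómez"),
        ("FC Bayern Munich", "Munich (City)"),
        ("Mario (Character)", "Striker (Video Game)"),
    ]
}


def degree(node, alive):
    """Count edges from node to other candidates that are still alive."""
    return sum(1 for edge in edges if node in edge and edge <= alive)


def select_meanings(candidates):
    """Greedily prune candidates until each mention keeps a single meaning."""
    remaining = {mention: list(c) for mention, c in candidates.items()}
    while any(len(c) > 1 for c in remaining.values()):
        alive = set(chain.from_iterable(remaining.values()))
        # Most ambiguous mention first (ties: first in insertion order).
        mention = max(remaining, key=lambda m: len(remaining[m]))
        # Drop its weakest candidate (ties: first in list order).
        weakest = min(remaining[mention], key=lambda n: degree(n, alive))
        remaining[mention].remove(weakest)
    return {mention: c[0] for mention, c in remaining.items()}


print(select_meanings(candidates))
```

With these toy edges the football-related reading survives for every mention, because those candidates support each other while the alternatives are isolated or only weakly linked.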

  33. Experimental Results: Fine-grained (Multilingual) Disambiguation. Datasets: Senseval-3, SemEval-2007 task 17, SemEval-2013 task 12

  34. Experimental Results: KORE50, AIDA-CoNLL • Two gold-standard Entity Linking datasets:

  35. Babelfy "understands" 'the mouse ate the cheese'!

  36. WSD and Entity Linking together win!

  37. The Crazy Polyglot!

  38. Live demo (2) – Crazy polyglot! EN: In today's knowledge and information society FR: le paysage lexicographique est plus hétérogène que jamais. IT: Possono le risorse stand-alone competere ES: con múltiples funciones, portale lexicográficas multilingüe y servicios web, ZH: Web 服务,定制的喜好和个人用户的个人资料?

  39. BabelNet 3.6 is now a knowledge base! • Semantic relations from Wikidata + Infoboxes (superset of DBpedia) + relations extracted with Open Information Extraction techniques

  40. SENSE AND CONCEPT REPRESENTATIONS [Iacobacci et al., ACL 2015; Camacho-Collados et al., NAACL+ACL 2015]

  41. Latent representation of word senses: SensEmbed. Iacobacci, Pilehvar, Navigli (ACL 2015)
