The MultiJEDI ERC Project: Multilingual Joint Word Sense Disambiguation Roberto Navigli http://lcl.uniroma1.it 5 July 2016 – META-FORUM 2016
Andrea Moro, Alessandro Raganato, Claudio Delli Bovi, Daniele Vannella, Tiziano Flati, Francesco Cecconi, Simone Ponzetto, Taher Pilehvar, Ignacio Iacobacci, José Camacho Collados, Federico Scozzafava
The MultiJEDI ERC Project, 11.07.16, Roberto Navigli
You may say I'm a dreamer, but I am not the only one. I hope someday you'll join us. And the world will be as one! - John Lennon
A 5-year ERC Starting Grant (2011-2016) on Multilingual Word Sense Disambiguation http://multijedi.org
INTEGRATING KNOWLEDGE [Navigli & Ponzetto, ACL 2010; Pilehvar & Navigli, ACL 2014]
The resource diaspora
Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Key Objective 1: create knowledge for all languages
Resources shown: MultiWordNet, BalkaNet, WOLF, MCR, GermaNet, WordNet
Merging entries from different resources into BabelNet
• We collect lexicalizations, definitions, translations, images, etc. from each of the merged resources
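The collection step described above can be pictured with a minimal sketch. It assumes the hard part, deciding that entries from different resources denote the same concept, has already been done. The `MergedEntry` class, the record layout and the sample records are hypothetical illustrations, not the BabelNet data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MergedEntry:
    """Toy merged entry (hypothetical; not the BabelNet data model)."""
    # language code -> set of lexicalizations in that language
    lexicalizations: dict = field(default_factory=lambda: defaultdict(set))
    definitions: list = field(default_factory=list)
    sources: set = field(default_factory=set)


def merge_entries(records):
    """Pool the information of records already mapped to one concept."""
    merged = MergedEntry()
    for rec in records:
        merged.sources.add(rec["source"])
        for lang, words in rec.get("lexicalizations", {}).items():
            merged.lexicalizations[lang].update(words)
        for definition in rec.get("definitions", []):
            # Keep each definition once, in first-seen order.
            if definition not in merged.definitions:
                merged.definitions.append(definition)
    return merged


# Hypothetical sample records for a single concept.
records = [
    {"source": "WordNet",
     "lexicalizations": {"EN": ["mouse"]},
     "definitions": ["a small rodent"]},
    {"source": "Wikipedia",
     "lexicalizations": {"EN": ["mouse"], "FR": ["souris"]},
     "definitions": []},
]
entry = merge_entries(records)
```

After the merge, `entry` holds the English and French lexicalizations side by side and remembers which resources contributed.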
What is BabelNet?
• A merger of resources of different kinds:
– WordNet: the most popular computational lexicon of English
– Open Multilingual WordNet: a collection of open wordnets
– WoNeF: a French WordNet
– Wikipedia: the largest collaborative encyclopedia
– Wikidata: the largest collaborative knowledge base
– Wiktionary: the largest collaborative dictionary
– OmegaWiki: a medium-size collaborative multilingual dictionary
– GeoNames: a worldwide geographical database
– Microsoft Terminology: a computer science thesaurus
– High-quality automatic sense-based translations
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of languages
• Coverage: 271 languages and 14 million entries!
– 6M concepts and 7.7M named entities
– 119M word senses
– 378M semantic relations (27 relations per concept on avg.)
– 11M images associated with concepts
– 41M textual definitions
– 2M concepts with associated domains
• Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected
• "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets
• Full-fledged taxonomy: is-a relations are available for both concepts and named entities (Wikipedia Bitaxonomy)
– Ferrari Testarossa is-a sports car
– BabelNet is-a semantic network & encyclopedic dictionary
• Easy access: Java and HTTP RESTful APIs; SPARQL endpoint (2 billion triples); downloadable indices for research purposes
The core of the Linguistic Linked Open Data cloud!
What can we do with BabelNet?
• Search and translate
• Explore the network
WordNet-Wikipedia mapping accuracy
• Quality lower bound of the mapping: 87%
– Measured on the 6000 lowest-confidence mappings
– Note: this concerns only the 50k synsets in the intersection
Creating Datasets with BabelNet
Key fact: all in one!
• Annotating with BabelNet implies annotating with WordNet, Wikipedia, OmegaWiki, Open Multilingual WordNet, Wikidata and Wiktionary
ADDRESSING AMBIGUITY [Moro, Raganato & Navigli, TACL 2014]
Motivation (1): hungry computers
• EN - The mouse ate the cheese
• FR - La souris a mangé le fromage.
• IT - Il mouse ha mangiato il formaggio
Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Key Objective 2: use all languages to disambiguate one
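One way to picture this objective is a toy sketch in which a word's candidate senses are intersected with those of its translations in other languages. This is a deliberate simplification and not the method developed in the project. The sense labels, the tiny sense inventory and the Italian word "topo" (which does not appear on the slides) are assumptions made for illustration, and the translations are assumed to be given.

```python
def disambiguate_with_translations(candidates_by_word):
    """Intersect the candidate senses of translation-equivalent words.

    candidates_by_word maps (language, word) to a set of sense labels.
    If the languages share no sense, fall back to the union so that the
    caller still receives every candidate.
    """
    sense_sets = [set(senses) for senses in candidates_by_word.values()]
    if not sense_sets:
        return set()
    shared = set.intersection(*sense_sets)
    return shared or set.union(*sense_sets)


# Hypothetical sense inventory: "mouse" and "souris" are ambiguous between
# the animal and the device, while Italian "topo" only denotes the animal.
candidates = {
    ("EN", "mouse"): {"rodent", "pointing_device"},
    ("FR", "souris"): {"rodent", "pointing_device"},
    ("IT", "topo"): {"rodent"},
}
shared_senses = disambiguate_with_translations(candidates)
```

In this toy setting the unambiguous Italian word resolves the English and French ones: only the animal sense is common to all three.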
So what?
• The first (and only) system that performs Word Sense Disambiguation (common nouns, verbs, adjectives) and Entity Linking together
• In arbitrary languages (270+ languages)
• In multiple languages at once
Step 4: Select the most reliable meanings
“Thomas and Mario are strikers playing in Munich”
• Thomas: Seth Thomas, Thomas Müller, Thomas (novel)
• Mario: Mario (Character), Mario (Album), Mario Gómez
• strikers: striker (Sport), Striker (Video Game), Striker (Movie)
• Munich: Munich (City), FC Bayern Munich, Munich (Song)
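The selection step can be sketched as a greedy pruning of a candidate graph: while a mention still has several candidate meanings, drop the candidate with the fewest links to the surviving candidates of other mentions. This is a simplified illustration and not necessarily the procedure of the cited TACL 2014 paper. The function name and the links between meanings are hypothetical. The sketch assumes each meaning belongs to exactly one mention and each mention has at least one candidate.

```python
from collections import defaultdict


def select_meanings(candidates, links):
    """Greedily keep one meaning per mention (toy coherence heuristic)."""
    owner = {m: mention for mention, ms in candidates.items() for m in ms}
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    alive = {mention: set(ms) for mention, ms in candidates.items()}

    def degree(meaning):
        # Count links to surviving candidates of *other* mentions only.
        return sum(1 for n in neighbours[meaning]
                   if owner[n] != owner[meaning] and n in alive[owner[n]])

    while True:
        removable = [m for ms in alive.values() if len(ms) > 1 for m in ms]
        if not removable:
            break
        # Ties are broken alphabetically to keep the result deterministic.
        weakest = min(removable, key=lambda m: (degree(m), m))
        alive[owner[weakest]].remove(weakest)
    return {mention: next(iter(ms)) for mention, ms in alive.items()}


candidates = {
    "Thomas": ["Seth Thomas", "Thomas Müller", "Thomas (novel)"],
    "Mario": ["Mario (Character)", "Mario (Album)", "Mario Gómez"],
    "strikers": ["striker (Sport)", "Striker (Video Game)", "Striker (Movie)"],
    "Munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}
# Hypothetical semantic links, invented for this example.
links = [
    ("Thomas Müller", "FC Bayern Munich"),
    ("Thomas Müller", "striker (Sport)"),
    ("Thomas Müller", "Mario Gómez"),
    ("Mario Gómez", "FC Bayern Munich"),
    ("Mario Gómez", "striker (Sport)"),
    ("FC Bayern Munich", "striker (Sport)"),
    ("Mario (Character)", "Striker (Video Game)"),
    ("Thomas Müller", "Munich (City)"),
]
selected = select_meanings(candidates, links)
```

With these invented links the football-related meanings reinforce one another, so the sketch keeps Thomas Müller, Mario Gómez, striker (Sport) and FC Bayern Munich.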
Experimental Results: Fine-grained (Multilingual) Disambiguation
• Datasets: Senseval-3, SemEval-2007 task 17, SemEval-2013 task 12
Experimental Results: KORE50, AIDA-CoNLL
• Two gold-standard Entity Linking datasets
Babelfy "understands" 'the mouse ate the cheese'!
WSD and Entity Linking together win!
The Crazy Polyglot!
Live demo (2) – Crazy polyglot!
• EN - In today's knowledge and information society
• FR - le paysage lexicographique est plus hétérogène que jamais.
• IT - Possono le risorse stand-alone competere
• ES - con múltiples funciones, portale lexicográficas multilingüe y servicios web,
• ZH - Web服务，定制的喜好和个人用户的个人资料？
BabelNet 3.6 is now a knowledge base!
• Semantic relations from Wikidata + Infoboxes (superset of DBpedia) + relations extracted with Open Information Extraction techniques
SENSE AND CONCEPT REPRESENTATIONS [Iacobacci et al., ACL 2015; Camacho-Collados et al., NAACL+ACL 2015]
Latent representation of word senses: SensEmbed Iacobacci, Pilehvar, Navigli (ACL 2015)