The Multilingual Language Library @ LREC 2012 Let's build it together!

The Multilingual Language Library @ LREC 2012
Let’s build it together!
Nicoletta Calzolari
with Riccardo Del Gratta, Francesca Frontini, Francesco Rubino, Irene Russo
Istituto di Linguistica Computazionale - CNR - Pisa
glottolo@ilc.cnr.it
N. Calzolari, W3C Workshop, Luxembourg, March 2012

The trend
Make a better use of the …
In Europe we are building the META-SHARE platform, to share LRs and tools
It is a big step ... BUT
We need a real paradigm shift, towards Collaborative Resources
  LR building as a collaborative “common shared task”
  New methodology of work
  Interoperability acquires even more value

Context & Vision
The context
  NLP is data intensive
  Every paper in our conferences speaks about “data”
  Annotation is at the core of training, acquiring, testing, ...
  But our efforts are still very scattered, with not enough possibility of exploitation
Vision
  A Multilingual “Language Library”
  As a Large International Initiative
  (parallel?) texts for languages
  With possible types of processing, annotation layers, ...
  Similar to more mature sciences, e.g. physics, or the Genome project, … with thousands of people working together on the same big experiment

A Language Library
Rationale
  Accumulation of massive amounts of multi-dimensional data is the key to foster advancement in our knowledge about language & its mechanisms
Strategy
  Create an infrastructure for a …
  Where we all …
  Encourage …
  As a Collaborative Resource: in the sharing paradigm
The major challenges:
  At the organisational/design level?
  At the community involvement level?

The first step: a new feature @ LREC
We: An LREC Repository
  Hosting a number of (comparable/parallel) resources
  In as many languages as possible
  On all modalities (speech, text, images, etc.)
  Also as a contribution to META-SHARE
Authors: are invited to process data
  In the language(s) they can process
  In one or more of the possible dimensions they can address (e.g. POS-tag the data, extract/annotate named entities, annotate temporal information, disambiguate word senses, transcribe audio, translate, etc.)
  Upload the processed data back in the LREC Repository
  Can also contribute with own raw or processed data, sending to languagelibrary@lrec-conf.org

Flow [diagram]

Some data: Languages
We offer data in 64 languages
Processed files
  179  English
  111  Spanish
   80  Catalan
   64  Russian
   54  Arabic
   54  Burmese
   40  Japanese
   27  Burmese, English
   22  Bulgarian
   22  Serbian
   21  German
   20  Dutch
    7  Uyghur
    3  English, Italian, …

Some data: Annotation type
   61  Temporal Expressions (for English, German, Dutch)
   48  Named Entities
   41  POS Tagging
   38  Segmentation
   20  Lexical substitution
   13  Lemmatization
   10  Normalization of named entities
   10  Semantic Classes
    9  Alignment
    2  Sound to Text Alignment
    1  Events
    1  Semantic Relations
    1  Semantic Roles
    1  Treebanks

Some data: Tools used
  187  FreeLing
   61  HeidelTime
   28  Athena
   22  Unitex corpus processing tool
   21  BulTreeBank Bulgarian Language Pipeline
   21  Sense Substituter based on Resource described in Submission
   20  Illinois Named Entity Tagger
   18  Buckwalter, Aragen
    7  ULex mobile online corpus enrichment tool for language documentation and local language speech technology
    4  GRAMPAL tagger
    3  Sentence alignment (Hunalign)
    2  The Sketch Engine
  312  [no tool declared]

Some data: Standards
   80  GrAF format
   69  Timex3
   21  Weblicht
    7  CoNLL 2009
    3  XCES
    5  Hybrid LMF with ULex-XML extension
    1  IPA character set in UTF-8 encoding
  431  [no standard declared]
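
The counts on the Standards slide above lend themselves to a quick tally. The following is a minimal Python sketch, not part of the original presentation: the counts are copied from the slide, while the dictionary, the function name `undeclared_share` and its parameter names are illustrative assumptions. It shows one way to compute what share of the listed entries carried no declared standard.

```python
# Counts copied from the "Some data: Standards" slide.
# The dictionary layout and all names below are illustrative assumptions.
standards = {
    "GrAF format": 80,
    "Timex3": 69,
    "Weblicht": 21,
    "CoNLL 2009": 7,
    "XCES": 3,
    "Hybrid LMF with ULex-XML extension": 5,
    "IPA character set in UTF-8 encoding": 1,
    "[no standard declared]": 431,
}


def undeclared_share(counts, undeclared_key="[no standard declared]"):
    """Return the fraction of entries filed under the undeclared key."""
    total = sum(counts.values())
    return counts[undeclared_key] / total


total_entries = sum(standards.values())
share = undeclared_share(standards)
print(f"{total_entries} entries, {share:.1%} without a declared standard")
```

The same function could be applied to the tool counts by passing `undeclared_key="[no tool declared]"`, assuming those counts are entered in the same dictionary form.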

Availability
The processed data will be made available to all the LREC participants before the conference, to be compared and analysed
Processed data will be visible through META-SHARE as a special META-SHARE LREC repository
This first experiment on annotation/transcription/extraction/… over the same data and on a large number of processing dimensions
  May set the ground for a large Language Library
  Where everyone can deposit/create processed data of any sort – all our “knowledge” about language

Collaborative & Interoperability
Collaborative
  Means a change of mentality: going beyond “my approach”
  To some “compromise” that allows us to go for big amounts, building on each other
… AND ...
Interoperability issues
  Could be a framework for experimenting with interoperability
  Also multilingually
Please contribute here: http://languagelibrary.eu/
