META-NET: Towards a Strategic Research Agenda for Multilingual Europe
Georg Rehm, DFKI, Germany (georg.rehm@dfki.de)
Multilingual Web Workshop – Limerick, Ireland, September 21, 2011
Co-funded by the 7th Framework Programme and the ICT Policy Support Programme of the European Commission through the contracts T4ME, CESAR, METANET4U, META-NORD (grant agreements no. 249119, 271022, 270893, 270899).

Outline
- Introduction
- The META-NET Language White Paper Series
- Towards a Strategic Research Agenda for Multilingual Europe
http://www.meta-net.eu

Multilingual Europe
- Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage.
- Research has made considerable progress in recent years.
- But: the pace of progress is not fast enough to meet the challenge within the next 10-20 years.
- All stakeholders (researchers, LT user and provider industries, language communities, funding programmes, policy makers) should team up for a major dedicated push.

Objectives
META-NET is a network of excellence dedicated to fostering the technological foundations of the European multilingual information society.

Four Funded Projects
- Initial project: T4ME (FP7; 13 partners, 10 countries; €7 million)
- Three new support consortia (ICT-PSP) started in February 2011.
- All EU member states and several non-member states covered.
- META-NET in September 2011: 47 members from 31 countries.
http://www.meta-net.eu/members

META
- META-NET is a network of excellence.
- META is an open and growing strategic technology alliance: Multilingual Europe Technology Alliance.
  - Almost 300 members, including W3C, Google, Microsoft, GALA, research centres, LT companies, etc.
  - META includes multiple stakeholders to prepare the ground for a large-scale concerted effort.
  - Main goal: to support the Strategic Research Agenda.
  - Join us! http://www.meta-net.eu/join

META-VISION
The META-NET Language White Paper Series

The Language White Papers
- LT support varies greatly from language to language.
- Aim: to inform readers about the current status and availability of language resources (LRs) and language technologies (LTs).
- Survey of the state of ca. 30 languages in the digital society.
- Target audience: politicians, journalists, decision makers, the public at large.
- Key messages: societal and technological problems, challenges, economic opportunities.

Structure of the White Papers
- Executive Summary
- Part 1: Introduction – A Risk for Our Languages and a Challenge for LT
- Part 2: Language in the European Information Society
- Part 3: LT Support for Language
- Part 4: About META-NET
- References

30 Languages Covered so far
Basque, Bulgarian*, Catalan, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Croatian
* = Official EU language

Assessing LT Support
- Experts provided estimations, condensed several times, aggregated in a table assessing core technology areas and resources.
- Individual tables with tools and resources provide data for each language (existing tools, gaps etc.).
- Results for each application area and resource type were derived from two features (quality, coverage), resulting in a big table:

Columns (30 languages, in this order): Basque, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Spanish, Swedish

Language Technology (Tools, Technologies, Applications)
Tokenization, Morphology (tokenization, POS tagging, morphological analysis/generation): 5 5 5 5 0 5 3,1 4,1 5 4 4 4,1 5 4 4,1 4,1 4,1 3,1 4,1 3 3,1 4,1 5 4,1 5 5 3,1 4,1 5 4,1
Parsing (shallow or deep syntactic analysis): 4 4 3 2 5 3,1 2,1 4,1 3,1 3,1 4 4,1 3 2,1 4 4 2 3,1 2,1 1,1 0 3,1 4 3,1 4 3,2 0 3,1 4 4,1
Sentence Semantics (WSD, argument structure, semantic roles): 3,1 2,1 2 1,2 3,1 1,1 2,1 3,1 2 2 1,1 2,1 1,1 2 1,2 1,1 0 4 0 1,1 0 3,1 1,3 3,1 4 0 0 2,2 2,1 2
Text Semantics (coreference resolution, context, pragmatics, inference): 1 2 1,1 0 3 1 2 1,1 2 1 2,1 2,1 2,1 2 0,2 0 0 3 0 1 0 3 1,2 1,2 4,1 0 0 0 2 2,1
Advanced Discourse Processing (text structure, coherence, rhetorical structure/RST, argumentative zoning, argumentation, ...): 1 0 2 0 3 1 0 2 0 0 2 0 2,1 1 0 0 0 2 0 1 0 3 1 2 3,1 0 0 0 1 1
Information Retrieval (text indexing, multimedia IR, crosslingual IR): 4 2 1,2 2,3 0 3 3 4,1 3 3 4,1 2 3 3,1 1,1 0 3,1 4,1 0 1,2 0 4 2 0 5 3 2,1 0 2 3,1
Information Extraction (named entity recognition, event/relation extraction, opinion/sentiment recognition, text ...): 3 3 1,1 3,1 4,1 3 2,1 3,1 2 2 3,1 1,2 3 3 6 1 0 4,1 3 3 0 4 2 3,1 4,1 2 1 2,1 1,1 4
Language Generation (sentence generation, report generation, text generation): 0 2 1,2 0,4 4 0 2,1 2 0 2,2 2 0 2 1,1 0 0 3 0 1,2 0 0 3,1 1 0 0 0 0 0 2 2,1
Summarization, Question Answering, advanced Information Access Technologies: 2 2 0 0,1 3 2,1 2,1 2 2 2 3 1,1 2 1,1 0 0 0 3 0 0,1 0 3,1 2 2,2 4,1 0,1 1 1,1 2,1 1
Machine Translation: 3,1 2 3,1 1,2 0 1,2 2,2 2,1 2,1 3 3,1 4,1 2,1 1 5 2 2,1 3,1 4 3 2,1 2,2 3 2,1 3,1 0,1 2 3,1 4,1 2,2
Speech Recognition: 1 3 3 3 2,1 1,2 3,1 4 4 3 4 5 4 3,1 2,2 1,1 3,1 4,1 0 1,1 1 1,1 3,1 2,2 2,1 1 2 2,1 3,1 3,1
Speech Synthesis: 2,4 3 4 3,1 4 2,1 4 4,1 4 4 4 5 4,1 4,1 4 2,1 3,1 4 3,1 3 4 2,1 5,1 4 2 4 3 3,2 4 3
Dialogue Management (dialogue capabilities and user modelling): 0 0 2,2 1 3,1 1 2,1 3,1 3 1,1 3 1 3,1 1,2 0 0 0 3 0 0 0 1,1 1 3 0 0 0 2,1 2 3

Language Resources (Resources, Data, Knowledge Bases)
Reference Corpora: 2,3 4,1 3,1 3,1 5 3,1 2,2 4,1 4 3,1 3,1 5 3,1 3 6 3,1 3,2 3 4,1 4 3 3 4 4,1 1,1 2,2 4,1 4,1 3,1 3,1
Syntax-Corpora (treebanks, dependency banks): 2,2 2,1 3 3,1 3,3 1,3 2,2 4,2 2,1 3,2 3 2 3 3,1 5,1 2,2 1,2 3 1 1 0 3,1 4 4 4,1 0 2 3,2 2 3
Semantics-Corpora: 1 4,1 1 0 3,1 1,2 1,2 3 2 0 1,1 1 1,1 2,1 1,5 0 0 4 1 0 0 2,1 2,2 3,1 2,1 0 0 1,4 2 1
Discourse-Corpora: 0 2 2 0 2,1 1,3 0 3 2,1 2,1 2 0 2 0 0 0 0 2,2 0 0 0 1,1 1,1 2 2,1 0 1,1 0 3 1
Parallel Corpora, Translation Memories: 0 2,2 2,1 3 3,1 2,1 2,1 4 2,1 3 3,1 5 2 2 6 1,1 3,2 3,1 3,1 3,1 2,1 4,1 4 2,1 4,1 2,1 2 2,2 3,1 3,2
Speech-Corpora (raw speech data, labelled/annotated speech data, speech dialogue data): 2,2 2,1 3,1 3 2,2 1,2 4,1 5,1 3,1 2,1 3,1 4,1 2,1 2,1 2,2 2 2,2 2,1 1 2 2,1 3,2 3 4 2,2 4 2 3,1 2,1 3
Multimedia and multimodal data: 5 1 2 3,1 2,2 1,2 1,3 1,1 1 2,1 1,2 2,2 1,2 2,1 1 1 1,1 3,1 0 1 0 4,1 1 0 0 1,1 2,1 0 2 1
Language Models: 2 2 2,1 0 4 3 2,1 5 3 2 3 4,1 3 2,1 3,1 3 0 0 3,1 3,1 3 1 1 0 4 2,1 1,2 2,2 2 4
Lexicons, Terminologies: 5,1 3,1 3,1 3,1 3,1 4 3,1 4,1 5 4 3,1 4,1 3,1 3 6 3 4 4,1 5 3,1 2,1 5 4 4,1 4,1 4 3,1 2,2 3 4,1
Grammars: 3,1 3 2 0 2,1 1,3 2,1 3 4 4 3 2 3 1 5,1 3 3 3 3,1 0 0 3,2 4 2,3 2,1 0,1 2,1 2,1 3 3
Thesauri, WordNets: 4 4,1 2,2 3,1 3,1 3 2,1 4,1 3,1 3,1 1,1 4 2,1 1,1 3,3 3 3,1 3,1 2,1 1 0 0 4 2,2 4 2,1 1,1 3 3 4,1
Ontological Resources for World Knowledge (e.g. upper models, Linked Data): 2 3 2,1 0 2,1 1,1 0 4 0 2,1 1,1 1 2,1 2 1 0 0 3,1 1 1,1 0 0 2,2 2 2 0,1 0 0 2 1

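Each row of the table above carries one cell per language, in the language order of the header. As a minimal sketch of how such a row could be read back per language, the Python snippet below pairs the header with the Machine Translation row. The helper name `row_by_language` is hypothetical and not part of any META-NET tooling, and cells are kept as plain strings because the slide does not state how a value such as "3,1" is to be interpreted.

```python
from typing import Dict, List

# Language order copied from the header of the table (30 languages).
LANGUAGES: List[str] = [
    "Basque", "Bulgarian", "Catalan", "Croatian", "Czech", "Danish",
    "Dutch", "English", "Estonian", "Finnish", "French", "Galician",
    "German", "Greek", "Hungarian", "Icelandic", "Irish", "Italian",
    "Latvian", "Lithuanian", "Maltese", "Norwegian", "Polish",
    "Portuguese", "Romanian", "Serbian", "Slovak", "Slovene",
    "Spanish", "Swedish",
]

# The "Machine Translation" row copied from the table.
# Cells stay strings: the meaning of "3,1" is not defined on the slide.
MACHINE_TRANSLATION = (
    "3,1 2 3,1 1,2 0 1,2 2,2 2,1 2,1 3 3,1 4,1 2,1 1 5 2 2,1 3,1 4 3 "
    "2,1 2,2 3 2,1 3,1 0,1 2 3,1 4,1 2,2"
)


def row_by_language(row: str, languages: List[str] = LANGUAGES) -> Dict[str, str]:
    """Hypothetical helper: map each language to its cell in one table row."""
    cells = row.split()
    if len(cells) != len(languages):
        raise ValueError(
            f"expected {len(languages)} cells, found {len(cells)}"
        )
    return dict(zip(languages, cells))


mt = row_by_language(MACHINE_TRANSLATION)
print(mt["English"], mt["Hungarian"], mt["Swedish"])
```

The length check is the useful part: a row with a missing or extra cell raises an error instead of silently shifting every later language by one column.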
Preliminary Results
- For journalists and politicians the big table is useless.
- Solution is a cluster-based approach: (a) Speech; (b) Machine Translation; (c) Text Analysis; (d) Resources.
- Cluster 1: excellent LT support
  - Technologies are in widespread use, showing human-quality performance.
- Cluster 2: good support
  - Technologies exist; reasonable quality and performance.
- Cluster 3: medium support
  - Research prototypes; quality and performance vary.
- Cluster 4: low to almost no support
  - Drawing board or rudimentary prototypes; very limited quality and performance.

Cluster: Speech
Cluster 1: excellent support | Cluster 2: good support | Cluster 3: medium support | Cluster 4: low/no support
English, French, Basque, Bulgarian Icelandic, Irish German, Spanish, Catalan, Croatian Latvian, Italian, Dutch, Estonian, Galician Lithuanian, Czech, Danish, Greek, Hungarian Maltese, Portuguese, Polish, Serbian Norwegian Finnish Slovene, Swedish Romanian, Slovak