COST PARSEME Working Groups Management Events IC1207 COST Action PARSEME PARSing and Multi-Word Expressions Towards linguistic precision and computational efficiency in natural language processing Agata Savary Linguistics Circle, University of Malta, 11 June 2013 1/18
COST
- inter-governmental framework (founded in 1971),
- coordination of nationally funded European research,
- funded by FP7 via ESF (European Science Foundation).
What Is a COST Action?
- bottom-up approach: scientific challenges defined by researchers,
- topics: major challenges in the foundations of science,
- objective: to overcome research fragmentation,
- COST supports cooperation and dissemination: meetings, workshops, short-term missions, training schools,
- no direct research funding,
- precursor role for other European programmes,
- typically a large number of partners involved (about 20 countries),
- experts from non-COST countries admitted (up to 4),
- important roles given to Early-Stage Researchers (< PhD + 8 years),
- budget: 129,000–156,000 euros per year for all partners,
- proposal selectivity: 6%.
COST Instruments
- Scientific/Working Group meetings,
- Workshops and Seminars,
- Short Term Scientific Missions (STSMs) (2 weeks to 3/6 months),
- Training Schools,
- Dissemination.
PARSEME: PARSing and Multi-word Expressions
General aim: increasing and enhancing the ICT support of the European multilingual heritage.
Objectives:
- to put multilingualism in the focus of linguistic and technological studies,
- to establish a long-lasting collaboration of Natural Language Processing (NLP) experts within a cross-lingual, cross-theoretical and cross-methodological research network,
- to bridge the gap between linguistic precision and computational efficiency in NLP applications.
Key problem: Multi-Word Expressions (MWEs)
Example: The prime time speech by first lady Michelle Obama set the house on fire. She made crystal clear which issues she took to heart, but she was preaching to the choir.
Facts:
- MWEs are prevalent (40% of text items),
- MWEs are complex phenomena involving different levels of language (lexicon, syntax, meaning, ...),
- MWEs are still not sufficiently understood,
- MWEs are under-represented in language resources and tools,
- MWEs are hard to detect, understand, translate, etc.
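The claim that MWEs are hard to detect can be illustrated with a minimal sketch. It assumes a hypothetical four-entry lexicon of citation forms (`MWE_LEXICON`) and a naive matcher (`find_contiguous_mwes`) that only looks for exact contiguous token sequences; neither comes from the PARSEME materials. On the example sentences above, such a matcher finds the expressions whose surface form equals the citation form and misses the inflected ones.

```python
import re

# Hypothetical mini-lexicon of MWEs, stored as lowercase citation forms.
MWE_LEXICON = [
    "set the house on fire",
    "crystal clear",
    "take to heart",
    "preach to the choir",
]

TEXT = (
    "The prime time speech by first lady Michelle Obama set the house on fire. "
    "She made crystal clear which issues she took to heart, "
    "but she was preaching to the choir."
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_contiguous_mwes(tokens, lexicon):
    """Return the lexicon entries that occur as exact contiguous token sequences."""
    found = []
    for entry in lexicon:
        words = entry.split()
        n = len(words)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            found.append(entry)
    return found


tokens = tokenize(TEXT)
found = find_contiguous_mwes(tokens, MWE_LEXICON)
missed = [entry for entry in MWE_LEXICON if entry not in found]
print("found: ", found)
print("missed:", missed)
```

The two missed entries ("took to heart", "preaching to the choir") differ from their citation forms only by inflection, which surface string matching cannot bridge without morphological and syntactic knowledge.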
Consortium
- 75 members (official and unofficial),
- 25 COST countries,
- 3 experts from 2 non-COST countries (USA, Brazil),
- multidisciplinary experts: linguists, computational linguists, computer scientists, psycholinguists, industry representatives, ...,
- different linguistic frameworks: CCG (Combinatory Categorial Grammar), DG (Dependency Grammar), HPSG (Head-driven Phrase Structure Grammar), LFG (Lexical Functional Grammar), TAG (Tree Adjoining Grammar), ...,
- two methodological trends: knowledge-based, data-driven.
Languages
23 languages, 9 European language families:
- Celtic: Gaelic,
- Germanic: English, Danish, Dutch, German, Icelandic, Norwegian, Swedish,
- Finno-Ugric: Estonian, Hungarian,
- Hellenic: Greek,
- Romance: French, Italian, Portuguese, Spanish,
- Semitic: Hebrew, Maltese,
- Slavic: Bulgarian, Czech, Polish, Serbian, Macedonian,
- Turkic: Turkish.
Dialects: British vs. American English, Belgian vs. Swiss vs. France French, European vs. Brazilian Portuguese.
Working Groups
- WG1: Lexicon/Grammar Interface,
- WG2: Parsing Techniques for MWEs,
- WG3: Hybrid Parsing of MWEs,
- WG4: Annotating MWEs in Treebanks.
Crossing barriers between:
- different levels of linguistic processing,
- different linguistic frameworks,
- different methodological frameworks.
Each member expressed interest in at least 2 WGs (at the full proposal stage).
WG1: Lexicon/Grammar Interface
Challenge 1: Simultaneously account for the fixed character of MWEs and their similarities to regular syntactic structures.
(FR) Marie a pris le train de 18 heures. (‘Marie took the 6 p.m. train.’)
  ⇒ Marie a pris un train.
  Marie a pris un train de 6 heures de l’après-midi.
(FR) Marie a pris un train de banlieue. (‘Marie took a suburban train.’)
  ⇒ Marie a pris un train.
  Marie a pris un *train de faubourg.
(FR) Le gouvernement a pris un train de mesures. (‘The government took a series of measures.’)
  ⇏ *Le gouvernement a pris un train.
  Le gouvernement a pris un *train d’moyens.
WG1: Lexicon/Grammar Interface
Challenge 2: Represent parsing phenomena at the lexicon level (agreement, discontinuity and free word order).
(FR) assistant approvisionneur, assistants approvisionneurs, assistante approvisionneuse, assistantes approvisionneuses: components agree in gender and number
(FR) bateau mouche, bateaux mouches: components agree in number only
(EN) He has finally made up his bloody mind.
(PL) bezwzględna większość, większość bezwzględna (‘absolute majority’)
(PL) panna młoda, *młoda panna (‘bride’)
Challenge 3: Enrich existing lexicons and valence dictionaries with MWEs.
Challenge 4: Design cost-saving abstract models of MWEs’ properties, automatically mapped to different grammar formalisms.
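As one possible reading of Challenge 2, agreement and word-order constraints can be stored in the lexicon entry itself. The sketch below is an assumption for illustration, not a PARSEME formalism: the class `MWEEntry`, its fields and the inflection tables are all hypothetical. Each component carries a table keyed by the values of the agreed features, so only agreeing combinations can be generated, and a `free_order` flag controls whether components may be permuted. Discontinuity (as in "made up his bloody mind") and Polish case inflection are deliberately left out.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class MWEEntry:
    """Hypothetical lexicon entry for a multi-word expression."""
    features: tuple           # names of the features the components agree on
    components: tuple         # one dict per component: feature values -> form
    free_order: bool = False  # True if the components may be reordered

    def variants(self):
        """List the surface variants licensed by agreement and word order."""
        forms = []
        # Every component is looked up with the same feature values,
        # which is what enforces agreement between components.
        for values in self.components[0]:
            words = [table[values] for table in self.components]
            orders = permutations(words) if self.free_order else [tuple(words)]
            forms.extend(" ".join(order) for order in orders)
        return forms


# Components agree in number only.
BATEAU_MOUCHE = MWEEntry(
    features=("number",),
    components=(
        {("sg",): "bateau", ("pl",): "bateaux"},
        {("sg",): "mouche", ("pl",): "mouches"},
    ),
)

# Components agree in gender and number.
ASSISTANT = MWEEntry(
    features=("gender", "number"),
    components=(
        {("m", "sg"): "assistant", ("m", "pl"): "assistants",
         ("f", "sg"): "assistante", ("f", "pl"): "assistantes"},
        {("m", "sg"): "approvisionneur", ("m", "pl"): "approvisionneurs",
         ("f", "sg"): "approvisionneuse", ("f", "pl"): "approvisionneuses"},
    ),
)

# Nominative singular only; free versus fixed component order.
WIEKSZOSC = MWEEntry(
    features=(),
    components=({(): "bezwzględna"}, {(): "większość"}),
    free_order=True,
)
PANNA_MLODA = MWEEntry(
    features=(),
    components=({(): "panna"}, {(): "młoda"}),
)

for entry in (BATEAU_MOUCHE, ASSISTANT, WIEKSZOSC, PANNA_MLODA):
    print(entry.variants())
```

Keying every component table by the same feature values makes non-agreeing combinations such as "bateaux mouche" unrepresentable, and the fixed-order entry never yields "*młoda panna".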
WG2: Parsing Techniques for MWEs
Challenge 1: Design an interoperable MWE representation for different syntactic frameworks: HPSG, LFG, TAG, CCG, DG, ...
WG2: Parsing Techniques for MWEs
Challenge 2: Reduce the cost of grammar production.
Challenge 3: Enhance parsing speed and precision by reducing spurious ambiguity in MWEs.
(FR) Il a lu ce livre d’un auteur étranger. (‘He read this book by a foreign author.’)
(FR) Il a lu ce livre d’un seul coup. (‘He read this book in one go.’)
Challenge 4: Express the semantics of MWEs in parse structures.
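One common way to approach Challenge 3 is to merge known MWEs into single units before parsing (a "words-with-spaces" treatment). The sketch below is a simplified assumption, not the method proposed by the Action: the lexicon `MWE_LEXICON`, the `ADV` category label, the pre-split tokens and the function `merge_mwes` are hypothetical. Once "d'un seul coup" is a single adverbial unit, a downstream parser would no longer need to consider a prepositional phrase attaching to the noun "livre", whereas "d'un auteur étranger" is left untouched for regular analysis.

```python
# Hypothetical lexicon: MWE token sequence -> category of the merged unit.
MWE_LEXICON = {
    ("d'", "un", "seul", "coup"): "ADV",
}


def merge_mwes(tokens, lexicon):
    """Replace each lexicon match with one (form, category) unit.

    Tokens outside any MWE are kept as (token, None). This sketch takes
    the first matching entry and does not handle overlapping or
    discontinuous expressions.
    """
    merged = []
    i = 0
    while i < len(tokens):
        for words, category in lexicon.items():
            if tuple(tokens[i:i + len(words)]) == words:
                merged.append(("_".join(words), category))
                i += len(words)
                break
        else:
            # No lexicon entry starts at position i: keep the token as is.
            merged.append((tokens[i], None))
            i += 1
    return merged


regular = ["Il", "a", "lu", "ce", "livre", "d'", "un", "auteur", "étranger"]
idiomatic = ["Il", "a", "lu", "ce", "livre", "d'", "un", "seul", "coup"]

regular_units = merge_mwes(regular, MWE_LEXICON)
idiomatic_units = merge_mwes(idiomatic, MWE_LEXICON)
print(len(regular_units), regular_units[-1])
print(len(idiomatic_units), idiomatic_units[-1])
```

Merging before parsing is cheap but rigid: it suits fixed expressions such as this adverbial and would not suit MWEs that inflect or admit insertions.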