MWE-WN Community discussion
Florence, August 2, 2019
Agenda
● Feedback from the joint workshop
● MWE-related announcements
● SIGLEX
● The future of the PARSEME corpus and shared task
Feedback
Feedback from bringing 2 communities together
● 2 communities
  ○ MWE workshop - organised by SIGLEX since 2003 (15th edition)
  ○ WordNet - 9 past Global WordNet Conferences
● MWE-WN 2019:
  ○ Research track:
    ■ 37 submissions: 35 on MWEs, 13 on WN
    ■ 20 selected papers (12 long, 8 short): 6 cover both topics
    ■ 54% selectivity rate
  ○ Dissemination track (for previously published papers):
    ■ 0 submissions
Feedback from participants
● Added value from bringing 2 communities together
● Future research directions
● How to further develop synergies?
Announcements
Phraseology and Multiword Expressions
● Book series at Language Science Press, Berlin
● Open access, collaborative proofreading
● Recently published
  ○ Yannick Parmentier, Jakub Waszczuk (eds.) Representation and parsing of multiword expressions: Current trends
● Published in 2018:
  ○ Manfred Sailer, Stella Markantonatou (eds.) Multiword expressions: Insights from a multi-lingual perspective
  ○ Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop
● 2 other books in the pipeline, new book proposals are welcome
● Project to establish a shared MWE bibliography attached to a typology of research questions (cf. LAW-MWE-CxG 2019 business meeting) - contributors are welcome
MWE research questions (slide from 2018)
● Motivations
  ○ The CL/NLP community is becoming increasingly engineering-oriented.
  ○ It is often hard to understand the underlying research issues, the theoretical hypotheses which the experimental science is trying to (in)validate.
  ○ See also Joakim Nivre's ACL 2017 presidential address (fast science vs. slow science)
● Aim: better formulate the research questions and hypotheses underlying the activities of the MWE community - see a draft
● Objectives
  ○ Better understanding of the state-of-the-art and perspectives of the MWE research
  ○ Make the MWE research more interesting
  ○ Lead the efforts of the community towards important challenges to be addressed
  ○ Pave the way towards convergences with other communities
UD-PARSEME coordination
● MWE working group at UDs
● Dagstuhl Seminar "Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics", 21-26 June 2020, Dagstuhl, Germany
  ○ Objectives
    ■ Theoretical: To deepen the understanding of language universals, and of linguistic idiosyncrasy in particular...
    ■ Practical: To harness idiosyncrasy in treebanking frameworks, in computationally tractable ways...
    ■ Networking: To promote a higher degree of convergence to universalism-driven initiatives...
● COST Action proposal UniDive (Universality, diversity and idiosyncrasy in language technology) - to be submitted 5 Sept 2019
Other announcements from the audience
SIGLEX
SIGLEX
● SIGLEX is expected to change its constitution soon
  ○ Fewer officers (4 + 2 section representatives, instead of 8)
  ○ Shorter mandate for section representatives (2 years instead of 3)
  ○ Double mandate for the 4 other officers (2+2 years)
  ○ Referendum about the changes
    ■ Email was sent to SIGLEX members on 4 May
    ■ Please vote by 5 August 2019!
● Elections to SIGLEX
  ○ To be run in fall 2019
  ○ Candidates needed for the MWE section representative position (2020-2022)
  ○ Candidates also welcome for a SIGLEX Vice-President and Vice-Secretary
SIGLEX-MWE section
● The MWE section of SIGLEX also has a constitution and a Standing Committee
  ○ 1 elected representative
    ■ Agata Savary (2016-2019)
    ■ New representative to be elected in fall 2019
  ○ 4 nominated officers
    ■ (2018-2020): Jelena Mitrović, Carla Parra Escartín; remaining for 1 more year
    ■ (2017-2019): Francis Bond, Styliani Markantonatou; stepping down
    ■ Candidates needed for the 2 open positions (2-year term)
    ■ Conditions: be a member of the Section (and of SIGLEX) and have published research work in topics related to MWEs
    ■ Deadlines:
      ● Expressions of interest: 30 August
      ● Beginning of mandate: end September
SIGLEX-MWE section
● MWE 2020 workshop
  ○ Continue joint workshops with other communities?
  ○ UD-MWE workshop in 2020 (ACL?) or 2021 (EACL?)
    ● 2020 consistent with UD-PARSEME dynamics
    ● Two close UD-PARSEME events in 2020?
    ● A COST action can fund workshops in Europe (EACL 2021?) but not in the USA (ACL 2020?)
  ○ Other ideas for 2020?
    ■ Jelena: Rhetorical figures: metaphor, simile, irony (cf. Workshop on Figurative Language Processing)
    ■ Carla: Multilingual aspects of MWEs (lexicons, alignment, discovery, translation,...)
    ■ Stella: Broadly understood idiomaticity, also in the use of single words
    ■ ...
  ○ The new SC will be in charge...
PARSEME corpus and shared task
PARSEME corpus
● PARSEME corpus edition 1.1 (Ramisch et al., 2018)
  ○ 20 languages, 6 million tokens, 80,000 verbal MWE annotations
  ○ Openly available on LINDAT/CLARIN:
● Future developments
  ○ Unifying PARSEME and UD guidelines
  ○ Annotating new MWE categories (implies prior work on annotation guidelines)
    ■ Nominal MWEs:
      ● non-compositional NPs (hot dog),
      ● MW named entities (Red Sea),
      ● complex terms (recurrent neural network)
    ■ Adjectival MWEs: crystal clear, as busy as a bee
  ○ New languages (call for language leaders)
● Continuous corpus enhancements (regular releases)
PARSEME shared task on weakly supervised VMWE identification?
● Objectives:
  ○ Boost performance on unseen data - cf. (Savary et al. 2019)
  ○ Boost MWE lexicon development
● Input data:
  ○ Closed track
    ■ PARSEME training corpus
    ■ Large non-annotated corpus (parsed?)
    ■ Mechanism to project a lexicon on a raw corpus
    ■ Baseline system
  ○ Open track:
    ■ Closed track input
    ■ Any external data, including handmade MWE lexicons
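The slide does not say how the "mechanism to project a lexicon on a raw corpus" would work. As a purely illustrative sketch, assuming a lemmatized corpus and a lexicon whose entries are tuples of lemmas, a naive projection could look for all lemmas of an entry within one sentence, allowing a limited number of intervening tokens. The function name `project_lexicon` and the `max_gap` parameter are hypothetical and not part of any PARSEME tool.

```python
from itertools import product


def project_lexicon(lemmas, lexicon, max_gap=2):
    """Naively match lexicon entries against one lemmatized sentence.

    lemmas  -- list of lemmas for the sentence tokens
    lexicon -- iterable of entries, each a tuple of component lemmas
    max_gap -- hypothetical limit on tokens inside the matched span
               that do not belong to the entry
    Returns a list of (entry, token_indices) pairs.
    """
    matches = []
    for entry in lexicon:
        # Candidate token positions for each component lemma.
        positions = [
            [i for i, lemma in enumerate(lemmas) if lemma == component]
            for component in entry
        ]
        for combo in product(*positions):
            indices = tuple(sorted(set(combo)))
            if len(indices) != len(entry):
                continue  # one token was reused for two components
            span = indices[-1] - indices[0] + 1
            match = (entry, indices)
            if span - len(entry) <= max_gap and match not in matches:
                matches.append(match)
    return matches


# Toy data, invented for illustration only.
sentence = ["she", "take", "a", "quick", "decision", "and", "make", "it", "up"]
toy_lexicon = [("take", "decision"), ("make", "up")]
candidates = project_lexicon(sentence, toy_lexicon)
```

This ignores word order, morphosyntactic constraints, and literal readings, which a real projection mechanism would presumably need to handle.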
PARSEME shared task on weakly supervised VMWE identification?
● System output
  ○ Discovered (+pre-existing) lexicon
  ○ List of queries for projecting the lexicon on the test corpus
  ○ Identified MWEs on the test corpus
● Evaluation
  ○ F-measure on data unseen in the train corpus
  ○ Experimentally, (lexical, morphological, syntactic) diversity measure
  ○ Global F-measure
● When?
  ○ Culminating event at the MWE 2020 workshop?
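To make the two F-measure variants in the evaluation list concrete, here is a minimal sketch. It assumes that an MWE counts as "unseen" when its multiset of lemmas was never annotated in the training corpus, and that a prediction is correct only if it matches a gold annotation exactly. Both are assumptions of this sketch, as are the data layout and the function names; the slide does not fix these details.

```python
def f_measure(gold, pred):
    """Harmonic mean of precision and recall over exact-match annotations."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unseen_only(annotations, train_lemma_sets):
    """Keep annotations whose sorted lemma tuple never occurs in training."""
    return {
        a for a in annotations
        if tuple(sorted(a[2])) not in train_lemma_sets
    }


# Each annotation: (sentence_id, token_indices, lemmas). Toy data only.
train_lemma_sets = {("decision", "take")}
gold = {
    (1, (1, 4), ("take", "decision")),
    (2, (0, 2), ("make", "up")),
    (3, (2, 3), ("kick", "bucket")),
}
pred = {
    (1, (1, 4), ("take", "decision")),
    (2, (0, 2), ("make", "up")),
    (3, (1, 3), ("kick", "bucket")),  # wrong token span
}

global_f = f_measure(gold, pred)
unseen_f = f_measure(
    unseen_only(gold, train_lemma_sets),
    unseen_only(pred, train_lemma_sets),
)
```

On this toy data the unseen-only score is lower than the global one, because the correctly identified seen expression no longer contributes.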