
Multiword expressions: Getting the taste of things to come MWE 2017



  1. Multiword expressions: Getting the taste of things to come MWE 2017 Workshop — Panel discussion

  2. Outline
     1. Announcements
     2. SIGLEX MWE Section
     3. Shared Task

  3. Announcements

  4. Phraseology and Multiword Expressions (PMWE)
     http://langsci-press.org/catalog/series/pmwe
     A new series with Language Science Press
     Editors
     ● Agata Savary (University of Tours, Blois, France)
     ● Manfred Sailer (Goethe University Frankfurt a.M., Germany)
     ● Yannick Parmentier (University of Orléans, France)
     ● Victoria Rosén (University of Bergen, Norway)
     ● Mike Rosner (University of Malta, Malta)

  5. Two volumes are about to be published with PMWE as a result of collaborative work within the IC1207 COST Action PARSEME.
     "MWE representation and parsing", Yannick Parmentier and Jakub Waszczuk (editors)
     "Multiword Expressions: Insights from a Multi-lingual Perspective", Manfred Sailer and Stella Markantonatou (editors)

  6. New PMWE volume
     ...with extended selected papers from
     + EACL MWE 2017 (main track)
     + the shared task
     + the wider community

  7. Other announcements?

  8. SIGLEX-MWE Section

  9. SIGLEX
     ● SIGLEX = ACL Special Interest Group on the Lexicon
       ○ Organising and endorsing events:
         ■ *SEM, SemEval, MWE workshop, MUMTTT workshop
       ○ Adam Kilgarriff prize
       ○ 2 sections: SemEval, MWE
     ● SIGLEX board:
       ○ 8 people elected for 3 years
       ○ One representative per section
       ○ Skype meeting every 3 months

  10. SIGLEX-MWE Section
      ● Currently about 210 members
      ● New members still welcome
      ● To join, subscribe to the mailing list:
        ○ multiword-expressions@lists.sourceforge.net
      ● Natural follow-up of PARSEME:
        ○ Integration of PARSEME outcomes into a larger international context
      ● Activities:
        ○ MWE workshop (yearly)
        ○ Stabilizing the MUMTTT workshop
        ○ Others (shared tasks, books, joint events with other SIGs)?

  11. Need for SIGLEX-MWE core group
      ● MWE community is becoming large; it should no longer be led by a single person
      ● An official core group needed:
        ○ SIGLEX-MWE representative + 3-4 other people
      ● Responsibilities:
        ○ naming organizers of the annual MWE workshop (to be approved by the SIGLEX board)
        ○ animating the community
        ○ maintaining the website and the mailing list

  12. SIGLEX-MWE core group - legitimacy
      ● From nomination
        ○ By the SIGLEX board upon a proposal of the previous members (?)
        ○ Advantage: balance can be ensured (of continents, language families, gender, age, CS/Ling expertise, etc.)
        ○ Drawback: non-democratic principle
      ● From elections
        ○ Advantage: democratic principle
        ○ Drawbacks:
          ■ Balance not ensured
          ■ The elected people may have problems working as a team

  13. SIGLEX-MWE core group - mandate
      ● 3 years - coinciding with the SIGLEX board mandate
        + Simplicity
        - Transfer of experience not ensured
      ● 2 years, 2+2 overlapping mandates
        + Transfer of competences ensured
        + Important for the shared task infrastructure
        - More organizational effort, frequent elections/nominations

  14. Brainstorming
      ● Core group nomination vs. election
      ● Criteria for nomination
        ○ Working in the area
        ○ Balance wrt. languages, continents, gender, CS/linguistics background
        ○ Proposal by the SIGLEX-MWE representative, validation by the SIGLEX board
      ● Instruments for election
        ○ As for SIGLEX board (nominating officers, electronic vote, …)
      ● Mandate duration
      ● Different MWE workshop chairs year after year?
      ● Next MWE workshop venue (preferably outside Europe)

  15. Shared Task

  16. Goals of this discussion
      ● Discuss the achievements of the first shared task
      ● Gather feedback from workshop attendees and especially shared task participants
        ○ What worked well?
        ○ What could have been better?
      ● Present our ideas for next edition(s)
      ● Gather feedback and suggestions for next edition(s)

  17. Shared Task 1.0 (2017)
      ● "Universal" guidelines for annotating verbal MWEs
      ● Freely available annotated corpora in 18 languages
        ○ 3 language families (Romance, Slavic, Germanic) + others
        ○ More than 60k annotated VMWEs in all languages
      ● Task definition: identify which tokens are lexicalized components of a VMWE
        ○ Allowing discontinuities, overlap, and nesting
      ● 7 VMWE identification systems submitted:
        ○ 6 in the closed track
        ○ 1 in the open track
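
The task definition on slide 17 (token-based identification of VMWEs, with discontinuities, overlap, and nesting allowed) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the representation of a VMWE as a set of token positions, the helper names `is_discontinuous`, `relation` and `token_prf`, the example sentence, and the simplified token-level scoring. It is not the shared task's file format or its evaluation tool.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation (not the official shared task file format):
# a VMWE is the set of 1-based positions of its lexicalized tokens.
VMWE = FrozenSet[int]


def is_discontinuous(mwe: VMWE) -> bool:
    """True if other tokens occur between the first and last component."""
    return max(mwe) - min(mwe) + 1 > len(mwe)


def relation(a: VMWE, b: VMWE) -> str:
    """Classify how two VMWEs in the same sentence share tokens."""
    if a == b:
        return "identical"
    if a.isdisjoint(b):
        return "disjoint"
    if a < b or b < a:  # one is a proper subset of the other
        return "nested"
    return "overlapping"


def token_prf(gold: List[VMWE], pred: List[VMWE]) -> Tuple[float, float, float]:
    """Simplified token-level precision, recall and F1.

    Assumption: a token counts as correct if it belongs to any gold VMWE,
    regardless of which VMWE it was assigned to.
    """
    gold_tokens = set().union(*gold)
    pred_tokens = set().union(*pred)
    tp = len(gold_tokens & pred_tokens)
    p = tp / len(pred_tokens) if pred_tokens else 0.0
    r = tp / len(gold_tokens) if gold_tokens else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# "She took a quick look and gave it up ."
#   1    2   3   4    5    6   7   8  9  10
gold = [frozenset({2, 5}), frozenset({7, 9})]  # "took ... look", "gave ... up"
pred = [frozenset({2, 5}), frozenset({7, 8})]  # hypothetical system output

print(is_discontinuous(gold[0]))
print(relation(frozenset({1, 2, 3}), frozenset({2, 3})))
print(token_prf(gold, pred))
```

Sets of token positions are used here because they cover all three phenomena named in the task definition without extra machinery: gaps inside a set are discontinuities, and shared positions between two sets are overlap or nesting.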

  18. Shared Task 1.0 Achievements
      ● We have produced a valuable new resource
      ● We gained experience with "universal" guidelines
      ● We have a large group of highly motivated contributors
      ● We have the infrastructure in place
        ○ Work organization into languages, language groups, etc.
        ○ Dynamic guidelines with multilingual examples
        ○ Customizable annotation platform FLAT
        ○ Dedicated tools to verify coherence and silence
        ○ File formats and evaluation tools
        ○ Communication tools: mailing lists, git issues, websites

  19. Shared Task 1.0 - how can we improve?
      ● Double annotation was possible only for a sample
      ● Guidelines still have fuzzy areas
        ○ Definition of predicative nouns
        ○ Meaning shift for IReflV
        ○ ...
      ● Cross-lingual homogenization, especially within language families
      ● Amount of annotated data for some languages
      ● Development of in-house "adjudication" tools
      ● Suggestions?

  20. Next edition(s)
      ● Shared task 1.1 (2018)
        ○ Extension of first edition with additional and better data
        ○ Keep focus on token-based identification of VMWEs
        ○ To be submitted to SemEval 2018
      ● Shared task 2.0 (2019)
        ○ New task definition
        ○ Extension to new MWE categories
        ○ To be submitted to CoNLL 2019 (?)

  21. Shared Task 1.1 (2018)
      ● Cover new languages
        ○ English, Asian languages: Japanese, Chinese, Korean, Hindi
        ○ Other languages?
      ● Enhanced guidelines
        ○ Intensive use of OTH category in some languages
        ○ Creation of language-specific categories (e.g. compound verbs)
        ○ Reformulation and clarification of LVC tests (see Issues)
      ● Enhanced annotation quality
        ○ Double annotation and/or mandatory coherence check
      ● Add missing CoNLL-U files with better dependencies
      ● Extend corpora, especially for "small" languages
      ● Annotate new test sets for the shared task evaluation

  22. Shared Task 2.0 (2019)
      ● Extension of the task
        ○ Joint parsing and MWE identification
        ○ MWE and named entity identification
      ● Cover other MWE categories, not only verbal
        ○ Adjectival, adverbial, nominal, terms, similes
      ● Other ideas?

  23. Who's in to join/pursue the adventure?
      ● Annotators?
      ● Language leaders?
      ● Language group leaders?
      ● Technical experts?
      ● Coordinators?
      Spread the word!

  24. Thank you!
