IN5550: Neural Methods in Natural Language Processing
Home Exam: Task Overview and Kick-Off
Stephan Oepen, Lilja Øvrelid, & Erik Velldal
University of Oslo
April 21, 2020
Home Exam: General Idea
◮ Use as guiding metaphor: Preparing a scientific paper for publication.
Second IN5550 Teaching Workshop on Neural NLP (WNNLP 2020)
Standard Process
  (0) Problem Statement
  (1) Experimentation
  (2) Analysis
  (3) Paper Submission
  (4) Reviewing
  (5) Camera-Ready Manuscript
  (6) Presentation
For Example: The ACL 2020 Conference
WNNLP 2020: Call for Papers and Important Dates
General Constraints
◮ Three specialized tracks: NER, Negation Scope, Sentiment Analysis.
◮ Long papers: up to nine pages, excluding references, in ACL 2020 style.
◮ Submitted papers must be anonymous: peer reviewing is double-blind.
◮ Replicability: Submission backed by code repository (area chairs only).
Schedule
  By April 22      Declare choice of track (and team composition)
  April 28         Per-track mentoring sessions with Area Chairs
  Early May        Individual supervisory meetings (upon request)
  May 12           (Strict) Submission deadline for scientific papers
  May 13–18        Reviewing period: Each student reviews two papers
  May 20           Area Chairs make and announce acceptance decisions
  May 25           Camera-ready manuscripts due, with requested revisions
  May 27           Oral presentations and awards at the workshop
The Central Authority for All Things WNNLP 2020
https://www.uio.no/studier/emner/matnat/ifi/IN5550/v20/exam.html
WNNLP 2020: What Makes a Good Scientific Paper?
Empirical (Experimental)
◮ Motivate architecture choice(s) and hyper-parameters;
◮ systematic exploration of relevant parameter space;
◮ comparison to reasonable baseline or previous work.
Replicable (Reproducible)
◮ Everything relevant to run and reproduce in M$ GitHub.
Analytical (Reflective)
◮ Identify and relate to previous work;
◮ explain choice of baseline or points of comparison;
◮ meaningful, precise discussion of results;
◮ ‘negative’ results can be interesting too;
◮ look at the data: discuss some examples;
◮ error analysis: identify remaining challenges.
WNNLP 2020: Programme Committee
General Chair
◮ Stephan Oepen
Area Chairs
◮ Named Entity Recognition: Erik Velldal
◮ Negation Scope: Stephan Oepen
◮ Sentiment Analysis: Lilja Øvrelid & Jeremy Barnes
Peer Reviewers
◮ All students who have submitted a scientific paper
Track 1: Named Entity Recognition
◮ NER: The task of identifying and categorizing proper names in text.
◮ Typical categories: persons, organizations, locations, geo-political entities, products, events, etc.
◮ Example from NorNE, which is the corpus we will be using:
  [Den internasjonale domstolen]ORG har sete i [Haag]GPE_LOC .
  The International Court of Justice has its seat in The Hague .
Class labels
◮ Abstractly a sequence segmentation task,
◮ but in practice solved as a sequence labeling problem,
◮ assigning per-word labels according to some variant of the BIO scheme:
  B-ORG  I-ORG           I-ORG      O    O     O  B-GPE_LOC  O
  Den    internasjonale  domstolen  har  sete  i  Haag       .
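The relation between per-word BIO labels and entity segments can be sketched in a few lines of Python. This is an illustrative helper, not part of NorNE or any course code: the name `bio_to_spans` is hypothetical, and treating a stray `I-` tag as the start of a new entity is one lenient convention among several.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag carrying the same label.
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # Assumption: a stray I- tag opens a new span (lenient reading).
            start, label = i, tag[2:]
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Den", "internasjonale", "domstolen", "har", "sete", "i", "Haag", "."]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-GPE_LOC", "O"]
spans = bio_to_spans(tags)
for label, start, end in spans:
    print(label, " ".join(tokens[start:end]))
```

Span tuples of this kind are a convenient unit for entity-level evaluation, since two systems agree on an entity exactly when they produce the same tuple.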
NorNE
◮ First publicly available NER dataset for Norwegian; joint effort between LTG, Schibsted and Språkbanken (the National Library).
◮ Named entity annotations added to NDT for both Bokmål and Nynorsk:
  ◮ ∼300K tokens for each, of which ∼20K form part of a NE.
◮ Distributed in the CoNLL-U format using the BIO labeling scheme. Simplified version:
  1  Den             den            DET    name=B-ORG
  2  internasjonale  internasjonal  ADJ    name=I-ORG
  3  domstolen       domstol        NOUN   name=I-ORG
  4  har             ha             VERB   name=O
  5  sete            sete           NOUN   name=O
  6  i               i              ADP    name=O
  7  Haag            Haag           PROPN  name=B-GPE_LOC
  8  .               $.             PUNCT  name=O
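A minimal sketch of reading the simplified five-column layout shown on this slide. It assumes whitespace-separated columns exactly as above; the full CoNLL-U files have ten tab-separated columns, so the column handling would need adapting. The function name `read_simplified` and the `SAMPLE` string are illustrative only.

```python
SAMPLE = """\
1 Den den DET name=B-ORG
2 internasjonale internasjonal ADJ name=I-ORG
3 domstolen domstol NOUN name=I-ORG
4 har ha VERB name=O
5 sete sete NOUN name=O
6 i i ADP name=O
7 Haag Haag PROPN name=B-GPE_LOC
8 . $. PUNCT name=O
"""


def read_simplified(text):
    """Return (form, upos, tag) triples from the simplified layout."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        _idx, form, _lemma, upos, misc = line.split()
        # The last column holds key=value pairs separated by '|'.
        attrs = dict(item.split("=", 1) for item in misc.split("|") if "=" in item)
        rows.append((form, upos, attrs.get("name", "O")))
    return rows


rows = read_simplified(SAMPLE)
print([tag for _form, _upos, tag in rows])
```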
NorNE entity types (Bokmål)
  Type      Train   Dev   Test   Total
  PER        4033   607    560    5200
  ORG        2828   400    283    3511
  GPE_LOC    2132   258    257    2647
  PROD        671   162     71     904
  LOC         613   109    103     825
  GPE_ORG     388    55     50     493
  DRV         519    77     48     644
  EVT         131     9      5     145
  MISC          8     0      0       8
https://github.com/ltgoslo/norne/
Evaluating NER
◮ While NER can be evaluated by P, R and F1 at the token level,
◮ evaluating at the entity level can be more informative.
◮ Several ways to do this (wording from SemEval 2013 task 9.1 in parentheses):
  ◮ Exact labeled (‘strict’): The gold annotation and the system output are identical; both the predicted boundary and the entity label are correct.
  ◮ Partial labeled (‘type’): Correct label and at least a partial boundary match.
  ◮ Exact unlabeled (‘exact’): Correct boundary, disregarding the label.
  ◮ Partial unlabeled (‘partial’): At least a partial boundary match, disregarding the label.
◮ https://github.com/davidsbatista/NER-Evaluation
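The two exact-match variants can be sketched as set comparisons over entity spans. This is a simplified illustration, not the SemEval scorer or the linked NER-Evaluation package: the function names are hypothetical, spans are assumed to be (label, start, end) tuples from a single sentence, and the partial-match variants are left out.

```python
def prf(gold, pred):
    """Precision, recall and F1 between two sets of items."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def strict(spans):
    """Exact labeled: boundaries and label must both match."""
    return set(spans)


def exact_unlabeled(spans):
    """Exact unlabeled: only the boundaries must match."""
    return {(start, end) for _label, start, end in spans}


gold = [("ORG", 0, 3), ("GPE_LOC", 6, 7)]
pred = [("ORG", 0, 3), ("LOC", 6, 7)]  # right boundary, wrong label on 'Haag'

strict_scores = prf(strict(gold), strict(pred))
unlabeled_scores = prf(exact_unlabeled(gold), exact_unlabeled(pred))
print("strict:", strict_scores)
print("exact :", unlabeled_scores)
```

For a whole corpus, each span would additionally need a sentence identifier so that spans from different sentences cannot match each other.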
NER model
◮ Current go-to model for NER: a BiLSTM with a CRF inference layer,
◮ possibly with a max-pooled character-level CNN feeding into the BiLSTM together with pre-trained word embeddings.
(Image: Jie Yang & Yue Zhang 2018: NCRF++: An Open-source Neural Sequence Labeling Toolkit)
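What the CRF layer contributes at prediction time can be illustrated with Viterbi decoding over per-token scores. The sketch below is plain Python with hand-picked numbers: in a real model the emission scores would come from the BiLSTM and the transition scores would be learned, and start/end transitions are omitted here for brevity. All names and values are illustrative assumptions.

```python
def viterbi(emissions, transitions, labels):
    """Best label sequence given per-token and label-to-label scores.

    emissions: one dict per token, mapping label -> score
    transitions: dict mapping (previous, current) -> score (default 0.0)
    """
    # best[i][y] is the score of the best path ending in label y at token i.
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + em[y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-ORG", "I-ORG"]
emissions = [
    {"O": 1.0, "B-ORG": 0.8, "I-ORG": 0.0},
    {"O": 0.2, "B-ORG": 0.0, "I-ORG": 1.0},
]
# Strongly penalize the ill-formed transition O -> I-ORG.
transitions = {("O", "I-ORG"): -100.0}

greedy = [max(labels, key=lambda y: em[y]) for em in emissions]
decoded = viterbi(emissions, transitions, labels)
print("per-token argmax:", greedy)
print("viterbi         :", decoded)
```

Per-token argmax (the softmax alternative) picks the ill-formed sequence O, I-ORG here, whereas decoding with transition scores prefers a well-formed one.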
Suggested reading on neural seq. modeling
◮ Jie Yang, Shuailong Liang, & Yue Zhang, 2018
  Design Challenges and Misconceptions in Neural Sequence Labeling
  (Best Paper Award at COLING 2018)
  https://aclweb.org/anthology/C18-1327
◮ Nils Reimers & Iryna Gurevych, 2017
  Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
  https://arxiv.org/pdf/1707.06799.pdf
State-of-the-art leaderboards for NER
◮ https://nlpprogress.com/english/named_entity_recognition.html
◮ https://paperswithcode.com/task/named-entity-recognition-ner
More information about the dataset
◮ https://github.com/ltgoslo/norne
◮ F. Jørgensen, T. Aasmoe, A.S. Ruud Husevåg, L. Øvrelid and E. Velldal
  NorNE: Annotating Named Entities for Norwegian
  Proceedings of the 12th Edition of the Language Resources and Evaluation Conference, Marseille, France, 2020
  https://arxiv.org/pdf/1911.12146.pdf
Some suggestions to get started with experimentation
◮ Different label encodings: BIO-1 / BIO-2 / BIOES etc.
◮ Different label set granularities:
  ◮ 8 entity types in NorNE by default (MISC can be ignored)
  ◮ Could be reduced to 7 by collapsing GPE_LOC and GPE_ORG to GPE, or to 6 by mapping them to LOC and ORG.
◮ Impact of different parts of the architecture:
  ◮ CRF vs softmax
  ◮ Impact of including a character-level model (e.g. CNN or RNN). Tip: evaluate effect for OOVs.
  ◮ Adding several BiLSTM layers
◮ Do different evaluation strategies give different relative rankings of different systems?
◮ Compute learning curves
◮ Mixing Bokmål / Nynorsk? Machine-translation?
◮ Impact of embedding pre-training (corpus, dim., framework, etc.)
◮ Possibilities for transfer / multi-task learning?
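Two of the suggestions above, switching label encoding and reducing label granularity, amount to simple rewrites of the tag sequence. The sketch below assumes well-formed BIO-2 input (every entity starts with `B-`); the function names and the two mapping dictionaries are illustrative, not part of the NorNE distribution.

```python
def bio_to_bioes(tags):
    """Re-encode BIO-2 tags as BIOES (S- for single-token, E- for entity-final)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label  # does the same entity go on?
        if prefix == "B":
            out.append(("B-" if continues else "S-") + label)
        else:
            out.append(("I-" if continues else "E-") + label)
    return out


def collapse(tags, mapping):
    """Rename entity types according to mapping, keeping the positional prefix."""
    out = []
    for tag in tags:
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        out.append(prefix + "-" + mapping.get(label, label))
    return out


# 8 -> 7 types: one shared GPE category.
TO_GPE = {"GPE_LOC": "GPE", "GPE_ORG": "GPE"}
# 8 -> 6 types: fold into the base categories.
TO_BASE = {"GPE_LOC": "LOC", "GPE_ORG": "ORG"}

tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-GPE_LOC", "O"]
bioes = bio_to_bioes(tags)
seven = collapse(tags, TO_GPE)
six = collapse(tags, TO_BASE)
print(bioes)
print(seven)
print(six)
```

Applying such rewrites to gold and predicted labels alike keeps results comparable across granularities.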
Track 2: Negation Scope
Non-Factuality (and Uncertainty) Very Common in Language
(negation cues in ⟨angle brackets⟩, scopes in {braces})
  But {this theory would} ⟨not⟩ {work}.
  I think, Watson, {a brandy and soda would do him} ⟨no⟩ {harm}.
  They were all confederates in {the same} ⟨un⟩{known crime}.
  “Found dead ⟨without⟩ {a mark upon him}. {We have} ⟨never⟩ {gone out ⟨without⟩ {keeping a sharp watch}}, and ⟨no⟩ {one could have escaped our notice}.”
  Phorbol activation was positively modulated by Ca2+ influx while {TNF alpha activation was} ⟨not⟩.
CoNLL 2010, *SEM 2012, and EPE 2017 International Shared Tasks
◮ Bake-off: Standardized training and test data, evaluation, schedule;
◮ 20+ participants; LTG systems top performers throughout the years.
Small Words Can Make a Large Difference