A Supervised Sequence 2 Sequence Problem
Janos Borst
July 26, 2019
University of Leipzig - NLP Group
Sequence to Sequence

[Figure: mapping an input sequence to a label sequence of equal length, e.g. ”A sentence of many words .” ↓ labels l_1 ... l_6 (ART, NOUN, PREP, PRON, NOUN, PUNCT), one label per token.]
Named Entity Tagging

Named Entities: ”An instance of a unique object with specific properties. (Person, Location, Product, ...)”
Example

”Introduction to Neural Networks” is a workshop by Janos Borst (person).
Tags

• LOC: Location
• PER: Person
• ORG: Organisation
• MISC: Mixed (Events, works, ...)
Tagging Schemes

How do we know this is ”New York” and not ”New” and ”York”? Introducing tag prefixes for spans:

• B-: Beginning of a span
• I-: Inside a span
• E-: End of a span
• S-: Single-token span

Example:

    This  is  not  New    York   .
    O     O   O    B-LOC  E-LOC  O

This is the BIOES scheme.
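As a small illustration (not from the slides; extract_spans is a hypothetical helper), the prefixes make the entity spans recoverable from the tag sequence alone:

    def extract_spans(tokens, tags):
        """Return (entity_text, entity_type) pairs from BIOES tags."""
        spans, current = [], []
        for token, tag in zip(tokens, tags):
            if tag.startswith("S-"):            # single-token span
                spans.append((token, tag[2:]))
            elif tag.startswith("B-"):          # span begins
                current = [token]
            elif tag.startswith("I-"):          # span continues
                current.append(token)
            elif tag.startswith("E-"):          # span ends
                current.append(token)
                spans.append((" ".join(current), tag[2:]))
                current = []
            # "O" tokens carry no entity information
        return spans

    tokens = ["This", "is", "not", "New", "York", "."]
    tags   = ["O", "O", "O", "B-LOC", "E-LOC", "O"]
    print(extract_spans(tokens, tags))   # [('New York', 'LOC')]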
The Requirements
Data

Supervised training data: CoNLL-2003
• a sequence-to-sequence task
• contains:
  • Named Entity tags
  • Part-of-Speech tags
  • Phrase-chunking tags
Data

    sentence_id  token_id  token       pos  chunks  ner
    1            0         EU          NNP  I-NP    S-ORG
    1            1         rejects     VBZ  I-VP    O
    1            2         German      JJ   I-NP    S-LOC
    1            3         call        NN   I-NP    O
    1            4         to          TO   I-VP    O
    1            5         boycott     VB   I-VP    O
    1            6         British     JJ   I-NP    S-MISC
    1            7         lamb        NN   I-NP    O
    1            8         .           .    O       O
    2            0         Peter       NNP  I-NP    B-PER
    2            1         Blackburn   NNP  I-NP    E-PER
    3            0         BRUSSELS    NNP  I-NP    S-LOC
    3            1         1996-08-22  CD   I-NP    O
    4            0         The         DT   I-NP    O
    4            1         European    NNP  I-NP    B-ORG
    4            2         Commission  NNP  I-NP    E-ORG
    4            3         said        VBD  I-VP    O
    4            4         on          IN   I-PP    O
    4            5         Thursday    NNP  I-NP    O
    ....
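A minimal reading sketch, assuming the whitespace-separated column layout above with blank lines between sentences (read_conll is a hypothetical helper, not part of the slides):

    def read_conll(path):
        """Parse CoNLL-style lines into sentences of (token, ner) pairs."""
        sentences, current = [], []
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:                       # blank line = sentence boundary
                    if current:
                        sentences.append(current)
                        current = []
                    continue
                token, _pos, _chunk, ner = line.split()[:4]
                current.append((token, ner))
        if current:
            sentences.append(current)
        return sentences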
Sequence to Sequence

We want to map a sequence to another sequence, so we have to keep the rank of the input tensor. Use recurrent networks for sequences.

Input shape: (b, 140) → after Embedding: (b, 140, 200).
We have to keep the 140 time steps and give a label to every word!
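A minimal sketch of these shapes (vocab_size is an assumed placeholder):

    import keras

    vocab_size = 10000                              # assumed placeholder
    i = keras.layers.Input((140,))                  # (b, 140)
    e = keras.layers.Embedding(vocab_size, 200)(i)  # (b, 140, 200)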
Return Sequences

[Figure: a recurrent network (weights W, hidden state h) unrolled over ”a sentence of many words”; at every time step t it emits an output, returning one vector per word rather than a single final state.]
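In Keras this is the return_sequences flag; a minimal sketch (sizes are assumed placeholders):

    import keras

    i = keras.layers.Input((140,))
    e = keras.layers.Embedding(10000, 200)(i)             # (b, 140, 200)
    # one output vector per word: (b, 140, 100)
    h = keras.layers.LSTM(100, return_sequences=True)(e)
    # with return_sequences=False the layer would only
    # return the last hidden state: (b, 100)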
Left-sided context

[Figure: the same unrolled network; the output at each time step t only depends on the current word and the words to its left, so right-side context is missing.]
Bidirectional Recurrent Networks

[Figure: two recurrent passes over ”a sentence of many words”, one reading left-to-right and one right-to-left; at every time step t the outputs of both directions are combined, so each word is represented with context from both sides.]
Bidirectional LSTM

keras.layers.Bidirectional (see the sketch below)

Advantages:
• Captures long-term dependencies in sentences
• Considers left- and right-side context
• Creates context-dependent word representations
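A minimal sketch (sizes are assumed placeholders):

    import keras

    i = keras.layers.Input((140,))
    e = keras.layers.Embedding(10000, 200)(i)
    # forward and backward outputs are concatenated: (b, 140, 200)
    h = keras.layers.Bidirectional(
        keras.layers.LSTM(100, return_sequences=True)
    )(e)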
Conditional Random Fields - CRF

A Conditional Random Field is a probabilistic model that takes neighbouring observations into account.

The idea:
• The labels are not independent of each other
• e.g. B-PER cannot be followed by B-LOC
• We try to consider transition probabilities
Conditional Random Fields - CRF

Example: ”name is Janos Borst” → O O B-PER ?

Emission probabilities for ”Borst”:    (... S-LOC: 0.0, B-LOC: 0.01, E-PER: 0.3 ...)
Transition probabilities for B-PER:    (... S-LOC: 0.0, B-LOC: 0.0, E-PER: 0.6 ...)

Combining emission and transition scores resolves the ”?” to E-PER.
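Behind this is the standard linear-chain CRF score (textbook formulation, not taken from the slides): emission scores E for each label plus transition scores T between neighbouring labels, normalised over all possible label sequences:

    s(x, y) = \sum_{i=1}^{n} E(y_i, x_i) + \sum_{i=2}^{n} T(y_{i-1}, y_i)

    P(y \mid x) = \frac{\exp s(x, y)}{\sum_{y'} \exp s(x, y')}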
Keras contrib

There is an extra library called keras_contrib:
• implements new layers, loss functions and activations
• works seamlessly with the keras modules
• has a convenient CRF layer
Code Example

    import keras
    import keras_contrib as kc

    i = keras.layers.Input((140,))
    ...
    lstm = ...
    crf = kc.layers.CRF(num_of_labels)(lstm)

    model = keras.models.Model(inputs=[i], outputs=[crf])
    model.compile(
        optimizer="Adam",
        loss=kc.losses.crf_loss,
        metrics=[kc.metrics.crf_viterbi_accuracy],
    )
Metrics

Accuracy is not meaningful here: 90% of all the labels are ”O”.
We need: Recall, Precision, F-Measure.
Recall

How many of the entities have I found?

    Recall = true positives / (true positives + false negatives)

How many of the detected entities are correctly classified?

    Precision = true positives / (true positives + false positives)
F1-Measure

The harmonic mean of recall and precision:

    F1 = 2 · (precision · recall) / (precision + recall)
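A tiny sketch of how the three measures relate (f1_score is a hypothetical helper, the counts are made-up examples):

    def f1_score(tp, fp, fn):
        """F1 from true-positive, false-positive and false-negative counts."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # e.g. 80 correctly found entities, 10 spurious, 20 missed:
    print(f1_score(tp=80, fp=10, fn=20))   # ~0.842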
The Architecture

word-sequence → Embedding → BiLSTM → CRF → label-sequences
(trained with the CRF-Loss; sketched below)
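A minimal end-to-end sketch of this architecture, filling in the layers the earlier code example elided; vocabulary size, sequence length, embedding and LSTM sizes are assumed placeholders:

    import keras
    import keras_contrib as kc

    vocab_size = 20000      # assumed placeholder
    num_of_labels = 17      # e.g. 4 entity types x 4 BIOES prefixes + "O" (assumed)

    i = keras.layers.Input((140,))
    e = keras.layers.Embedding(vocab_size, 200)(i)
    lstm = keras.layers.Bidirectional(
        keras.layers.LSTM(100, return_sequences=True)
    )(e)
    crf = kc.layers.CRF(num_of_labels)(lstm)

    model = keras.models.Model(inputs=[i], outputs=[crf])
    model.compile(
        optimizer="Adam",
        loss=kc.losses.crf_loss,
        metrics=[kc.metrics.crf_viterbi_accuracy],
    )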
Showcase: Named Entity Tagger
Let's talk Flair again
• Tag your entities.
• generally ...
Applications
Similar Tasks

• Part-of-Speech tagging, Chunking
• Machine Translation (old languages)
• Speech Recognition (sound sequences to word sequences)