  1. A Supervised Sequence 2 Sequence Problem Janos Borst July 26, 2019 University of Leipzig - NLP Group

  2. Sequence to Sequence [Diagram: each word of the sentence "A sentence of many words ." is mapped down to its own label l1...l6: ART NOUN PREP PRON NOUN PUNCT]

  3. Named Entity Tagging. Named Entities: "An instance of a unique object with specific properties." (Person, Location, Product...)

  4. Example: "Introduction to Neural Networks" is a workshop by Janos Borst. (the span "Janos Borst" is tagged as person)

  6. Tags • LOC: Location • PER: Person • ORG: Organisation • MISC: Miscellaneous (events, works, ...)

  7. Tagging Schemes: How do we know this is "New York" and not "New" and "York"? Introducing tag prefixes for spans: • B-: Beginning of a span • I-: Inside a span • E-: End of a span • S-: Single-token span. Example: "This is not New York ." → O O O B-LOC E-LOC O. This is the BIOES scheme. (more info)
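
  A small sketch (not from the slides) of how a labelled span is turned into BIOES tags; the helper name span_to_bioes is hypothetical:

      # Convert one entity span into BIOES tags:
      # single-token spans get S-, longer spans B- ... (I- ...) E-.
      def span_to_bioes(length, label):
          if length == 1:
              return ["S-" + label]
          return ["B-" + label] + ["I-" + label] * (length - 2) + ["E-" + label]

      print(span_to_bioes(2, "LOC"))  # ['B-LOC', 'E-LOC']  -> "New York"
      print(span_to_bioes(1, "ORG"))  # ['S-ORG']           -> e.g. "EU"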

  9. The Requirements

  10. Data: Supervised training data: CoNLL-2003 • a sequence-to-sequence task • contains: • Named Entity tags • Part-of-Speech tags • Chunk (phrase) tags

  11. Data (CoNLL-2003 format, one token per row):

      sentence_id  token_id  token       pos  chunks  ner
      1            0         EU          NNP  I-NP    S-ORG
      1            1         rejects     VBZ  I-VP    O
      1            2         German      JJ   I-NP    S-MISC
      1            3         call        NN   I-NP    O
      1            4         to          TO   I-VP    O
      1            5         boycott     VB   I-VP    O
      1            6         British     JJ   I-NP    S-MISC
      1            7         lamb        NN   I-NP    O
      1            8         .           .    O       O
      2            0         Peter       NNP  I-NP    B-PER
      2            1         Blackburn   NNP  I-NP    E-PER
      3            0         BRUSSELS    NNP  I-NP    S-LOC
      3            1         1996-08-22  CD   I-NP    O
      4            0         The         DT   I-NP    O
      4            1         European    NNP  I-NP    B-ORG
      4            2         Commission  NNP  I-NP    E-ORG
      4            3         said        VBD  I-VP    O
      4            4         on          IN   I-PP    O
      4            5         Thursday    NNP  I-NP    O
      ...

  12. Sequence to Sequence: We want to map a sequence to another sequence, so we have to keep the rank of the input tensor. Use recurrent networks for sequences. Input shape: (b, 140); after the Embedding: (b, 140, 200). We have to keep the 140 time steps and give a label to every word!
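
  As a minimal Keras sketch (140 and 200 are the shapes from the slide; the vocabulary size of 10000 is an assumption):

      import keras

      i = keras.layers.Input((140,))             # input shape: (b, 140) token ids
      x = keras.layers.Embedding(10000, 200)(i)  # embedded shape: (b, 140, 200)
      # the 140 time steps must survive every following layer,
      # because each word needs its own label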

  14. Return Sequences: [Diagram: an RNN with weights W and hidden state h unrolled over the input "a sentence of many words"; the network emits an output at every time step t, one per input word, instead of only after the last word.]
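
  In Keras this is the return_sequences flag; a minimal sketch (sizes as above, the LSTM width of 100 is an assumption):

      import keras

      i = keras.layers.Input((140,))
      x = keras.layers.Embedding(10000, 200)(i)      # (b, 140, 200)

      # return_sequences=True returns the hidden state of every time step,
      # shape (b, 140, 100), instead of only the last one, shape (b, 100)
      h = keras.layers.LSTM(100, return_sequences=True)(x)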

  20. Left-sided context: [Diagram: in a unidirectional RNN the output at each time step can only see the words to its left.]

  21. Bidirectional Recurrent Networks: [Diagram: two RNNs read "a sentence of many words", one left-to-right and one right-to-left; their hidden states are combined at every time step, so every output sees both left and right context.]

  26. Bidirectional LSTM: keras.layers.Bidirectional. Advantages: • Captures long-term dependencies in sentences • Considers left- and right-side context • Creates context-dependent word representations
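
  A minimal sketch of the wrapper (layer sizes are assumptions):

      import keras

      i = keras.layers.Input((140,))
      x = keras.layers.Embedding(10000, 200)(i)

      # Bidirectional runs the LSTM once left-to-right and once right-to-left
      # and concatenates both hidden states per token: shape (b, 140, 2 * 100)
      h = keras.layers.Bidirectional(
          keras.layers.LSTM(100, return_sequences=True)
      )(x)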

  27. Conditional Random Fields (CRF): A Conditional Random Field is a probabilistic model that takes neighbouring labels into account. The idea: • The labels are not independent of each other • e.g. B-PER cannot be followed by B-LOC • We try to model the transition probabilities between labels

  28. Conditional Random Fields (CRF): [Diagram: tagging "name is Janos Borst"; "Janos" has been tagged B-PER, and the tag for "Borst" follows from the transition probabilities out of B-PER (S-LOC 0.0, B-LOC 0.0, E-PER 0.6, ...) combined with the emission probabilities for "Borst" (S-LOC 0.01, ..., E-PER 0.3, ...): the result is E-PER.]
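
  A toy sketch of the scoring idea (the numbers are loosely taken from the diagram, missing ones are made up; real CRFs work with scores in log space):

      # The probability of a tag for "Borst" combines its emission probability
      # with the transition probability from the previous tag (B-PER for "Janos").
      transition_from_B_PER = {"S-LOC": 0.0, "B-LOC": 0.0, "E-PER": 0.6}
      emission_for_Borst = {"S-LOC": 0.01, "B-LOC": 0.0, "E-PER": 0.3}

      for tag in ["S-LOC", "B-LOC", "E-PER"]:
          score = transition_from_B_PER[tag] * emission_for_Borst[tag]
          print(tag, score)
      # E-PER wins (0.18): after B-PER, the model strongly prefers ending the span.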

  32. Keras contrib: There is an extra library called keras_contrib • Implements new layers, loss functions and activations • Works seamlessly with the keras modules • Has a convenient CRF layer

  33. Code Example:

      import keras
      import keras_contrib as kc

      i = keras.layers.Input((140,))
      ...
      lstm = ...
      crf = kc.layers.CRF(num_of_labels)(lstm)

      model = keras.models.Model(inputs=[i], outputs=[crf])
      model.compile(
          optimizer="Adam",
          loss=kc.losses.crf_loss,
          metrics=[kc.metrics.crf_viterbi_accuracy],
      )

  34. Metrics: Accuracy is not meaningful here: 90% of all the labels are "O". We need: Recall, Precision, F-Measure.

  35. Recall and Precision: How many of the true entities have I found? Recall = true positives / (true positives + false negatives). How many of the detected entities are correctly classified? Precision = true positives / (true positives + false positives).

  36. F1-Measure: The harmonic mean of recall and precision: F1 = (2 · precision · recall) / (precision + recall). (more details)
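
  A minimal sketch of the computation (the entity counts are made up):

      def f1_score(tp, fp, fn):
          # entity-level counts: tp = correctly found, fp = spurious, fn = missed
          precision = tp / (tp + fp)
          recall = tp / (tp + fn)
          return 2 * precision * recall / (precision + recall)

      print(f1_score(tp=80, fp=10, fn=20))  # ~0.84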

  37. The Architecture: word sequence → Embedding → BiLSTM → CRF (trained with the CRF loss) → label sequence
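
  Putting the pieces together, a sketch of the whole model (vocabulary size, embedding size, LSTM width and the label count are assumptions; the CRF layer, loss and metric come from keras_contrib as in the code slide):

      import keras
      import keras_contrib as kc

      num_of_labels = 17  # assumption: 4 entity types x 4 prefixes + O

      i = keras.layers.Input((140,))               # word sequence
      emb = keras.layers.Embedding(10000, 200)(i)  # Embedding
      lstm = keras.layers.Bidirectional(           # BiLSTM
          keras.layers.LSTM(100, return_sequences=True)
      )(emb)
      crf = kc.layers.CRF(num_of_labels)(lstm)     # CRF

      model = keras.models.Model(inputs=[i], outputs=[crf])
      model.compile(
          optimizer="Adam",
          loss=kc.losses.crf_loss,                 # CRF loss
          metrics=[kc.metrics.crf_viterbi_accuracy],
      )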

  38. Showcase: Named Entity Tagger

  39. Let's talk Flair again • Tag your entities. • Generally...

  40. Applications

  41. Similar Tasks • Part-of-Speech tagging, Chunking • Machine Translation (old languages) • Speech Recognition (sound sequences to word sequences)
