
  1. Natural Language Processing Info 159/259 
 Lecture 24: Information Extraction (Nov. 15, 2018) David Bamman, UC Berkeley

  2. investigating(SEC, Tesla)

  3. fire(Trump, Sessions)

  4. parent(Mr. Bennet, Jane) https://en.wikipedia.org/wiki/Pride_and_Prejudice

  5. Information extraction • Named entity recognition • Entity linking • Relation extraction

  6. Named entity recognition • Identifying spans of text that correspond to typed entities • [tim cook] PER is the ceo of [apple] ORG

  7. Named entity recognition ACE NER categories (+weapon)

  8. Named entity recognition • GENIA corpus of MEDLINE abstracts (biomedical), with entity types protein, DNA, RNA, cell line, and cell type • We have shown that [interleukin-1] PROTEIN ([IL-1] PROTEIN ) and [IL-2] PROTEIN control [IL-2 receptor alpha (IL-2R alpha) gene] DNA transcription in [CD4- CD8- murine T lymphocyte precursors] CELL LINE http://www.aclweb.org/anthology/W04-1213

  9. BIO notation • B = Beginning of entity • I = Inside entity • O = Outside entity • [tim cook] PER is the ceo of [apple] ORG → tim/B-PERS cook/I-PERS is/O the/O ceo/O of/O apple/B-ORG
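As a quick illustration (not from the slides), here is a small Python sketch that converts typed span annotations into per-token BIO tags; the token indices and helper name are my own.

```python
# A minimal sketch: converting typed spans to BIO tags.
# Spans are (start_token, end_token_inclusive, type) over a tokenized sentence.

def spans_to_bio(tokens, spans):
    """Convert a list of (start, end, label) spans into per-token BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label
    return tags

tokens = ["tim", "cook", "is", "the", "ceo", "of", "apple"]
spans = [(0, 1, "PER"), (6, 6, "ORG")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
# [('tim', 'B-PER'), ('cook', 'I-PER'), ..., ('apple', 'B-ORG')]
```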

  10. Named entity recognition • After/O he/O saw/O Harry/B-PERS Tom/B-PERS went/O to/O the/O store/O (two adjacent entities, so both "Harry" and "Tom" receive B tags)

  11. Fine-grained NER Giuliano and Gliozzo (2008)

  12. Fine-grained NER

  13. Entity recognition: ACE entity categories
      • Person: … named after [the daughter of a Mattel co-founder] …
      • Organization: [The Russian navy] said the submarine was equipped with 24 missiles
      • Location: Fresh snow across [the upper Midwest] on Monday, closing schools
      • GPE: The [Russian] navy said the submarine was equipped with 24 missiles
      • Facility: Fresh snow across the upper Midwest on Monday, closing [schools]
      • Vehicle: The Russian navy said [the submarine] was equipped with 24 missiles
      • Weapon: The Russian navy said the submarine was equipped with [24 missiles]
      https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-entities-guidelines-v6.6.pdf

  14. Named entity recognition • Most named entity recognition datasets have flat structure (i.e., non-hierarchical labels). ✔ [The University of California] ORG ✖ [The University of [California] GPE ] ORG • Mostly fine for named entities, but more problematic for general entities: [[John] PER ’s mother] PER said …

  15. Nested NER • named after [the daughter of [a [Mattel] ORG co-founder] PER ] PER • Layered BIO tags: [Mattel] → B-ORG; [a Mattel co-founder] → B-PER I-PER I-PER; [the daughter of a Mattel co-founder] → B-PER I-PER I-PER I-PER I-PER I-PER

  16. Sequence labeling • x = {x_1, …, x_n}, y = {y_1, …, y_n} • For a set of inputs x with n sequential time steps, one corresponding label y_i for each x_i • Model correlations in the labels y.

  17. Sequence labeling • Feature-based models (MEMM, CRF)
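To make the feature-based setting concrete, here is a hedged sketch of the kind of hand-built per-token features an MEMM- or CRF-style tagger typically uses; the specific feature names are illustrative, not the lecture's.

```python
# A sketch of per-token features for a feature-based tagger (MEMM/CRF).
# Feature names are illustrative, not taken from the lecture.

def token_features(tokens, i, prev_tag):
    word = tokens[i]
    return {
        "word=" + word.lower(): 1,
        "is_capitalized": int(word[0].isupper()),
        "suffix3=" + word[-3:].lower(): 1,
        "prev_word=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1,
        "next_word=" + (tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"): 1,
        "prev_tag=" + prev_tag: 1,   # label-label dependency (MEMM-style)
    }

tokens = ["Jack", "drove", "down", "to", "LA"]
print(token_features(tokens, 0, "<start>"))
```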

  18. Gazetteers • A list of place names; more generally, a list of names of some typed category • Examples: GeoNames (GEO), US SSN (PER), Getty Thesaurus of Geographic Placenames, Getty Thesaurus of Art and Architecture • (The slide shows sample gazetteer entries: Youghal, Yellow River, Woodstock House, Woodlawn Station, Windy Gap, …)
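Gazetteers are commonly folded into such taggers as membership features. The sketch below (toy gazetteer, invented feature names) marks tokens covered by a known place name using a simple multi-token span lookup.

```python
# A sketch of a gazetteer membership feature: flag tokens covered by a span
# that appears in a list of known place names. The gazetteer here is a toy list.

place_gazetteer = {"youghal", "yellow river", "woodstock house", "la"}

def gazetteer_features(tokens, max_len=3):
    feats = [set() for _ in tokens]
    for start in range(len(tokens)):
        for end in range(start, min(start + max_len, len(tokens))):
            span = " ".join(tokens[start:end + 1]).lower()
            if span in place_gazetteer:
                feats[start].add("gaz:B-GEO")
                for i in range(start + 1, end + 1):
                    feats[i].add("gaz:I-GEO")
    return feats

print(gazetteer_features(["Jack", "drove", "to", "LA"]))
# [set(), set(), set(), {'gaz:B-GEO'}]
```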

  19. Bidirectional RNN (figure: forward and backward RNN states computed over "Jack drove down to LA" and concatenated at each token)

  20. Bidirectional RNN for NER (figure: the same BiRNN over "Jack drove down to LA", predicting the tags B-PER O O O B-GPE)

  21. Character BiLSTM for each word; concatenate the final state of the forward LSTM, the final state of the backward LSTM, and the word embedding as the representation for a word (figure: the word "Obama" built from the characters o b a m a). Lample et al. (2016), "Neural Architectures for Named Entity Recognition"
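A minimal PyTorch sketch of this style of word representation; the dimensions, class name, and toy character/word indices are assumptions, not taken from Lample et al.

```python
# Sketch: run a character BiLSTM over each word and concatenate its final
# forward/backward states with the word embedding. Dimensions are illustrative.

import torch
import torch.nn as nn

class CharWordRepresentation(nn.Module):
    def __init__(self, n_chars, n_words, char_dim=25, char_hidden=25, word_dim=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, bidirectional=True, batch_first=True)
        self.word_emb = nn.Embedding(n_words, word_dim)

    def forward(self, char_ids, word_id):
        # char_ids: (1, word_length); word_id: (1,)
        _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
        # h_n: (2, 1, char_hidden) = final forward and backward states
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1)            # (1, 2 * char_hidden)
        return torch.cat([char_repr, self.word_emb(word_id)], dim=-1)

model = CharWordRepresentation(n_chars=100, n_words=5000)
rep = model(torch.tensor([[14, 1, 0, 12, 0]]), torch.tensor([42]))  # toy ids for "obama"
print(rep.shape)  # torch.Size([1, 150])
```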

  22. Character CNN for each word; concatenate the character CNN output (a convolution over character embeddings followed by max pooling) and the word embedding as the representation for a word (figure: the word "Obama" built from the characters o b a m a). Chiu and Nichols (2016), "Named Entity Recognition with Bidirectional LSTM-CNNs"
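A corresponding PyTorch sketch of the character-CNN variant: convolve over character embeddings, max-pool over positions, and concatenate with the word embedding. Dimensions and names are again illustrative.

```python
# Sketch of a character-CNN word representation with illustrative dimensions.

import torch
import torch.nn as nn

class CharCNNWordRepresentation(nn.Module):
    def __init__(self, n_chars, n_words, char_dim=25, n_filters=30, width=3, word_dim=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=width, padding=width // 2)
        self.word_emb = nn.Embedding(n_words, word_dim)

    def forward(self, char_ids, word_id):
        # char_ids: (1, word_length); word_id: (1,)
        chars = self.char_emb(char_ids).transpose(1, 2)   # (1, char_dim, word_length)
        conv_out = torch.relu(self.conv(chars))           # (1, n_filters, word_length)
        char_repr = conv_out.max(dim=2).values            # max pooling over positions
        return torch.cat([char_repr, self.word_emb(word_id)], dim=-1)

model = CharCNNWordRepresentation(n_chars=100, n_words=5000)
print(model(torch.tensor([[14, 1, 0, 12, 0]]), torch.tensor([42])).shape)  # (1, 130)
```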

  23. Huang et al. (2015), "Bidirectional LSTM-CRF Models for Sequence Tagging"

  24. Ma and Hovy (2016), “End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF”

  25. Ma and Hovy (2016), “End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF”

  26. Evaluation • We evaluate NER with precision/recall/F1 over typed chunks.

  27. Evaluation
      tokens:  1 tim | 2 cook | 3 is | 4 the | 5 CEO | 6 of | 7 Apple
      gold:    B-PER   I-PER    O      O       O       O      B-ORG
      system:  B-PER   O        O      O       B-PER   O      B-ORG
      Chunks <start, end, type>: gold = {<1,2,PER>, <7,7,ORG>}; system = {<1,1,PER>, <5,5,PER>, <7,7,ORG>}
      Precision = 1/3 (only <7,7,ORG> is correct); Recall = 1/2
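The same computation as a short Python sketch: recover typed chunks from the BIO tags and score exact matches. Indices are 0-based here, and the helper names are my own.

```python
# Chunk-level NER evaluation: extract typed spans from BIO tags and compute
# precision/recall/F1 over exact span matches.

def bio_to_chunks(tags):
    chunks, start = set(), None
    for i, tag in enumerate(tags + ["O"]):          # sentinel closes the last chunk
        if start is not None and not tag.startswith("I-"):
            chunks.add((start, i - 1, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return chunks

gold = bio_to_chunks(["B-PER", "I-PER", "O", "O", "O", "O", "B-ORG"])
pred = bio_to_chunks(["B-PER", "O", "O", "O", "B-PER", "O", "B-ORG"])
correct = len(gold & pred)
precision, recall = correct / len(pred), correct / len(gold)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # 0.333..., 0.5, 0.4
```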

  28. Entity linking • [Michael Jordan] B-PER I-PER can dunk from the free throw line (which Michael Jordan does this mention refer to?)

  29. Entity linking • Task: Given a database of candidate referents, identify the correct referent for a mention in context.

  30. Learning to rank • Entity linking is often cast as a learning-to-rank problem: given a mention x, some set of candidate entities 𝒵(x) for that mention, and context c, select the highest-scoring entity from that set: ŷ = argmax_{y ∈ 𝒵(x)} Ψ(y, x, c), where Ψ is some scoring function over the mention x, candidate y, and context c. Eisenstein 2018

  31. Learning to rank • We learn the parameters of the scoring function by minimizing the ranking loss: ℓ(ŷ, y, x, c) = max(0, Ψ(ŷ, x, c) − Ψ(y, x, c) + 1). Eisenstein 2018

  32. Learning to rank • ℓ(ŷ, y, x, c) = max(0, Ψ(ŷ, x, c) − Ψ(y, x, c) + 1) • We suffer some loss if the predicted entity ŷ has a higher score than the true entity y • The loss can't be negative (if the true entity scores far higher than the predicted entity) • The true entity needs to score at least some constant margin better than the prediction; beyond that, a higher score doesn't matter.
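The margin loss as a tiny self-contained function, with made-up scores standing in for Ψ(·).

```python
# Sketch of the margin (hinge) ranking loss from the slide, using placeholder
# numeric scores in place of Psi(y, x, c).

def ranking_loss(score_predicted, score_true, margin=1.0):
    """max(0, Psi(y_hat, x, c) - Psi(y, x, c) + margin)"""
    return max(0.0, score_predicted - score_true + margin)

print(ranking_loss(2.0, 1.5))   # 1.5: predicted outranks true, pay the gap plus margin
print(ranking_loss(1.0, 1.5))   # 0.5: true wins, but by less than the margin
print(ranking_loss(0.0, 3.0))   # 0.0: true wins by more than the margin, no loss
```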

  33. Learning to rank • Some scoring function Ψ(y, x, c) over the mention x, candidate y, and context c, e.g. a linear model over features, Ψ(y, x, c) = f(x, y, c)⊤β, with features f(x, y, c) such as: string similarity between x and y; popularity of y; whether NER type(x) = type(y); cosine similarity between c and the Wikipedia page for y
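A sketch of such a linear, feature-based scorer; the feature set, weights, and candidate record below are toy examples, not the lecture's actual features.

```python
# Sketch of a linear scoring function Psi(y, x, c) = f(x, y, c) . beta for
# ranking entity-linking candidates. All features and weights are toys.

def features(mention, candidate, context):
    return {
        "string_match": float(mention.lower() in candidate["name"].lower()),
        "popularity": candidate["popularity"],        # e.g., normalized page views
        "type_match": float(candidate["type"] == "PER"),
        "context_overlap": len(set(context) & set(candidate["description"])) / max(1, len(set(context))),
    }

def score(mention, candidate, context, beta):
    f = features(mention, candidate, context)
    return sum(beta[k] * v for k, v in f.items())

beta = {"string_match": 2.0, "popularity": 0.5, "type_match": 1.0, "context_overlap": 3.0}
candidate = {"name": "Michael Jordan (basketball)", "type": "PER", "popularity": 0.9,
             "description": ["basketball", "nba", "dunk", "bulls"]}
print(score("Michael Jordan", candidate, ["dunk", "free", "throw", "line"], beta))
```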

  34. Neural learning to rank • Ψ(y, x, c) = v_y⊤ Θ^(x,y) x + v_y⊤ Θ^(y,c) c, where v_y is the embedding for the candidate, x is the embedding for the mention, and c is the embedding for the context • Θ^(x,y): parameters measuring the compatibility of the candidate and the mention • Θ^(y,c): parameters measuring the compatibility of the candidate and the context
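A PyTorch sketch of this bilinear scorer; embedding dimensions and the initialization are assumptions.

```python
# Sketch of the bilinear scoring function
# Psi(y, x, c) = v_y^T Theta_xy x + v_y^T Theta_yc c.

import torch
import torch.nn as nn

class BilinearScorer(nn.Module):
    def __init__(self, cand_dim=50, mention_dim=50, context_dim=100):
        super().__init__()
        self.theta_xy = nn.Parameter(torch.randn(cand_dim, mention_dim) * 0.01)
        self.theta_yc = nn.Parameter(torch.randn(cand_dim, context_dim) * 0.01)

    def forward(self, v_y, x, c):
        # v_y: candidate embedding; x: mention embedding; c: context embedding
        return v_y @ self.theta_xy @ x + v_y @ self.theta_yc @ c

scorer = BilinearScorer()
score = scorer(torch.randn(50), torch.randn(50), torch.randn(100))
print(score.item())
```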

  35. Learning to rank • We learn the parameters of the scoring function by minimizing the ranking loss ℓ(ŷ, y, x, c) = max(0, Ψ(ŷ, x, c) − Ψ(y, x, c) + 1); take the derivative of the loss and backprop using SGD. Eisenstein 2018

  36. Relation extraction
      subject       | predicate     | object
      The Big Sleep | directed_by   | Howard Hawks
      The Big Sleep | stars         | Humphrey Bogart
      The Big Sleep | stars         | Lauren Bacall
      The Big Sleep | screenplay_by | William Faulkner
      The Big Sleep | screenplay_by | Leigh Brackett
      The Big Sleep | screenplay_by | Jules Furthman

  37. Relation extraction ACE relations, SLP3

  38. Relation extraction Unified Medical Language System (UMLS), SLP3

  39. Wikipedia Infoboxes

  40. Regular expressions • Regular expressions are a precise way of extracting high-precision relations • "NP1 is a film directed by NP2" → directed_by(NP1, NP2) • "NP1 was the director of NP2" → directed_by(NP2, NP1)
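The two templates above rendered as Python regular expressions; the noun-phrase matcher here is a crude placeholder, since a real extractor would match over NP chunks from a parser.

```python
# Sketch of regex-based extraction for the directed_by relation, mirroring the
# two templates above. The NP regex is a crude proper-noun-phrase placeholder.

import re

NP = r"([A-Z][\w']*(?:\s+[A-Z][\w']*)*)"
patterns = [
    (re.compile(NP + r" is a film directed by " + NP), lambda m: (m.group(1), m.group(2))),
    (re.compile(NP + r" was the director of " + NP), lambda m: (m.group(2), m.group(1))),
]

def extract_directed_by(sentence):
    triples = []
    for pattern, order in patterns:
        for m in pattern.finditer(sentence):
            film, director = order(m)
            triples.append((film, "directed_by", director))
    return triples

print(extract_directed_by("The Big Sleep is a film directed by Howard Hawks."))
# [('The Big Sleep', 'directed_by', 'Howard Hawks')]
```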

  41. Hearst patterns (Hearst 1992; SLP3)
      pattern                                    | example sentence
      NP {, NP}* {,} (and|or) other NP_H         | temples, treasuries, and other important civic buildings
      NP_H such as {NP ,}* {(or|and)} NP         | red algae such as Gelidium
      such NP_H as {NP ,}* {(or|and)} NP         | such authors as Herrick, Goldsmith, and Shakespeare
      NP_H {,} including {NP ,}* {(or|and)} NP   | common-law countries, including Canada and England
      NP_H {,} especially {NP}* {(or|and)} NP    | European countries, especially France, England, and Spain
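A sketch of one Hearst pattern (NP_H such as {NP ,}* {(or|and)} NP) as a regex over raw text; matching strings rather than NP chunks is a simplification of how such patterns are actually applied.

```python
# Sketch of the "X such as A, B, and C" Hearst pattern over plain text.
# A real system would match over NP chunks rather than raw strings.

import re

NP = r"[A-Za-z][\w-]*(?:\s+[A-Za-z][\w-]*)*"
such_as = re.compile(rf"({NP})\s+such as\s+({NP}(?:\s*,\s*{NP})*(?:\s*,?\s*(?:and|or)\s+{NP})?)")

def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs from 'X such as A, B, and C' constructions."""
    pairs = []
    for m in such_as.finditer(sentence):
        hypernym = m.group(1)
        hyponyms = re.split(r"\s*,\s*|\s+(?:and|or)\s+", m.group(2))
        pairs += [(h, hypernym) for h in hyponyms if h]
    return pairs

print(hearst_such_as("red algae such as Gelidium"))
# [('Gelidium', 'red algae')]
```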
