Features: overlap
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
• Number of mentions and words in between
  o #MB = 1, #WB = 9
• Is one mention included in the other?
  o M1>M2 = false, M1<M2 = false
• Conjunctive features
  o ET12+M1>M2 = ORG-PER+false, ET12+M1<M2 = ORG-PER+false
  o HM12+M1>M2 = Airlines+Wagner+false, HM12+M1<M2 = Airlines+Wagner+false
• These features hurt precision a lot (-10%), but also help recall a lot (+8%)
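A minimal sketch of how the overlap features above could be computed for the example sentence. The tokenization, the `Mention` class, the punctuation set, and the reading of M1>M2 as "M2 is contained in M1" are assumptions made for illustration, not taken from the slides.

```python
from dataclasses import dataclass

PUNCTUATION = {",", ".", ";", ":"}  # assumed: punctuation does not count as a word


@dataclass(frozen=True)
class Mention:
    start: int  # index of the first token
    end: int    # index one past the last token
    etype: str  # entity type, e.g. ORG or PER
    head: str   # head word of the mention


def contains(outer, inner):
    """True if the span of `inner` lies inside the span of `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def overlap_features(tokens, mentions, m1, m2):
    """Overlap features for an ordered mention pair (m1 precedes m2)."""
    gap = range(m1.end, m2.start)
    wb = sum(1 for i in gap if tokens[i] not in PUNCTUATION)
    mb = sum(1 for m in mentions if m1.end <= m.start and m.end <= m2.start)
    gt = str(contains(m1, m2)).lower()  # M1>M2: M2 included in M1 (assumed reading)
    lt = str(contains(m2, m1)).lower()  # M1<M2: M1 included in M2 (assumed reading)
    return {
        "#MB": mb,
        "#WB": wb,
        "M1>M2": gt,
        "M1<M2": lt,
        "ET12+M1>M2": f"{m1.etype}-{m2.etype}+{gt}",
        "ET12+M1<M2": f"{m1.etype}-{m2.etype}+{lt}",
        "HM12+M1>M2": f"{m1.head}+{m2.head}+{gt}",
        "HM12+M1<M2": f"{m1.head}+{m2.head}+{lt}",
    }


tokens = ("American Airlines , a unit of AMR , immediately matched "
          "the move , spokesman Tim Wagner said .").split()
m1 = Mention(0, 2, "ORG", "Airlines")
amr = Mention(6, 7, "ORG", "AMR")
m2 = Mention(14, 16, "PER", "Wagner")
features = overlap_features(tokens, [m1, amr, m2], m1, m2)
```

With this tokenization, `features` carries the same values as the slide: one mention (AMR) and nine words between the two mentions.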
Features: syntactic features
Features of mention dependencies
• ET1DW1 = ORG:Airlines
• H1DW1 = matched:Airlines
• ET2DW2 = PER:Wagner
• H2DW2 = said:Wagner
Features describing entity types and dependency tree
• ET12SameNP = ORG-PER-false
• ET12SamePP = ORG-PER-false
• ET12SameVP = ORG-PER-false
These features had disappointingly little impact!
Relation extraction classifiers
Now use any (multiclass) classifier you like:
• SVM
• MaxEnt (aka multiclass logistic regression)
• Naïve Bayes
• etc.
[Zhou et al. 2005 used a one-vs-many SVM]
Zhou et al. 2005 results
Position-aware LSTM for Relation Extraction (Zhang et al., 2017)
Relation Extraction
Penner is survived by his brother, John, a copy editor at the Times, and his former wife, Times sportswriter Lisa Dillman.
Key elements
• Context (relevant + irrelevant)
• Entities (types + positions)
Position-aware Attention Model
[Figure: model architecture. Word inputs $x_1, \ldots, x_n$ (example tokens: Mike (subject), and, Lisa (object), …, married) feed LSTM hidden states $h_1, \ldots, h_n$; attention weights $a_1, \ldots, a_n$ are computed using the summary vector $q$ and the position embeddings. Example position sequences: subject-relative 0, 1, 2, …, 4; object-relative -2, -1, 0, …, 2.]
• Embedding Layers
  o Word: $x = [x_1, \ldots, x_n]$
  o Position: $p^s = [p^s_1, \ldots, p^s_n]$, $p^o = [p^o_1, \ldots, p^o_n]$
• LSTM Layers: $\{h_1, \ldots, h_n\} = \mathrm{LSTM}(\{x_1, \ldots, x_n\})$
• Summary Vector: $q = h_n$
• Attention Layer
• Relation Representation
• Softmax Layer
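A minimal sketch of the two pieces the figure makes explicit: the relative position sequences and the attention-weighted sum over hidden states. The function names are hypothetical, and the attention scores are taken as given inputs here; how the model computes them from $h_i$, $q$ and the position embeddings is not spelled out on the slides.

```python
import math


def relative_positions(n, start, end):
    """Position of each of n tokens relative to the entity span [start, end).

    Tokens inside the span get 0, tokens to the left get negative offsets,
    tokens to the right get positive offsets.
    """
    positions = []
    for i in range(n):
        if i < start:
            positions.append(i - start)
        elif i < end:
            positions.append(0)
        else:
            positions.append(i - end + 1)
    return positions


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, scores):
    """Normalize scores into weights a_i and return (weights, sum_i a_i * h_i)."""
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled


# Five tokens with the subject at index 0 and the object at index 2,
# matching the position sequences shown in the figure.
subject_positions = relative_positions(5, 0, 1)
object_positions = relative_positions(5, 2, 3)

# Toy hidden states and equal scores (illustrative values only).
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores the weights are uniform, so the pooled vector is the plain average of the hidden states; a trained model would instead put more weight on the tokens relevant to the relation.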
Other Augmentations
• Word dropout: balance OOV distribution
  o Penner is survived by his brother <UNK> , John
• Entity masking: focus on relations, not specific entities
  o Penner is survived by his brother , John
  o SUBJ-PER is survived by his brother , OBJ-PER
• Linguistic information: POS and NER embeddings from Stanford CoreNLP
  o Tokens: Penner is survived by …
  o POS: NNP VBZ VBN IN …
  o NER: PER O O O …
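A minimal sketch of entity masking and word dropout on the example sentence. The function names, the span convention (start inclusive, end exclusive) and the per-token dropout probability `p` are assumptions for illustration.

```python
import random

UNK = "<UNK>"


def mask_entities(tokens, subj_span, subj_type, obj_span, obj_type):
    """Replace subject/object tokens with SUBJ-<type> / OBJ-<type> placeholders."""
    masked = list(tokens)
    for i in range(*subj_span):
        masked[i] = f"SUBJ-{subj_type}"
    for i in range(*obj_span):
        masked[i] = f"OBJ-{obj_type}"
    return masked


def word_dropout(tokens, p, rng):
    """Independently replace each token with <UNK> with probability p."""
    return [UNK if rng.random() < p else token for token in tokens]


tokens = "Penner is survived by his brother , John".split()
masked = mask_entities(tokens, (0, 1), "PER", (7, 8), "PER")

rng = random.Random(0)  # seeded only to make the sketch reproducible
unchanged = word_dropout(tokens, 0.0, rng)
all_unknown = word_dropout(tokens, 1.0, rng)
```

Masking hides the entity strings while keeping their types and roles, so the model cannot memorize that a particular name implies a particular relation.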
Models Compared Against
Non-Neural
• Stanford’s TAC KBP 2015 winning system
• Patterns
• Logistic regression (LR)
Neural
• CNN with positional encodings (Nguyen and Grishman, 2015)
• Dependency-based RNN (Xu et al., 2015)
• LSTM: 2-layer Stacked-LSTM
Relation Extraction Results
Model              P      R      F1
Traditional
  Patterns         86.9   23.2   36.6
  LR               73.5   49.9   59.4
  LR + Patterns    72.9   51.8   60.5
Neural
  CNN              75.6   47.5   58.3
  CNN-PE           70.3   54.2   61.2
  SDP-LSTM         66.3   52.7   58.7
  LSTM             65.7   59.9   62.7
  Our model        65.7   64.5   65.1
  Ensemble (5)     70.1   64.6   67.2
• Patterns: high precision
• LR: relatively higher recall
• CNN: higher precision; LSTM: higher recall
• CNN-PE and LSTM outperform traditional
• Our model: +2.4 improvement on F1 over the LSTM baseline
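As a quick check on how the F1 column relates to P and R, here is a small sketch that recomputes F1 as the harmonic mean of precision and recall. Because the table reports rounded P and R, a recomputed value can differ from the reported one in the last digit.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) rows copied from the results table.
rows = {
    "Patterns": (86.9, 23.2, 36.6),
    "LSTM": (65.7, 59.9, 62.7),
    "Our model": (65.7, 64.5, 65.1),
    "Ensemble (5)": (70.1, 64.6, 67.2),
}
recomputed = {name: f1(p, r) for name, (p, r, _) in rows.items()}
```

The Patterns row shows why F1 is reported at all: very high precision cannot compensate for low recall under a harmonic mean.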
Supervised RE: summary
• Supervised approach can achieve high accuracy
  o At least, for some relations
  o If we have lots of hand-labeled training data
• But has significant limitations!
  o Labeling 5,000 relations (+ named entities) is expensive
  o Doesn’t generalize to different relations
Relation extraction: 5 easy methods
1. Hand-built patterns
2. Bootstrapping methods
3. Supervised methods
4. Distant supervision
5. Unsupervised methods
Distant supervision
Snow, Jurafsky, Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. NIPS 17.
Mintz, Bills, Snow, Jurafsky. 2009. Distant supervision for relation extraction without labeled data. ACL-2009.
• Hypothesis: If two entities belong to a certain relation, any sentence containing those two entities is likely to express that relation
• Key idea: use a database of relations to get lots of noisy training examples
  o instead of hand-creating seed tuples (bootstrapping)
  o instead of using hand-labeled corpus (supervised)
Benefits of distant supervision
• Has advantages of supervised approach
  o leverage rich, reliable hand-created knowledge
  o relations have canonical names
  o can use rich features (e.g. syntactic features)
• Has advantages of unsupervised approach
  o leverage unlimited amounts of text data
  o allows for very large number of weak features
  o not sensitive to training corpus: genre-independent
Hypernyms via distant supervision
Construct a noisy training set consisting of occurrences from our corpus that contain a hyponym-hypernym pair from WordNet. This yields high-signal examples like:
• “...consider authors like Shakespeare...”
• “Some authors (including Shakespeare)...”
• “Shakespeare was the author of several...”
• “Shakespeare, author of The Tempest...”
slide adapted from Rion Snow
Learning hypernym patterns
Key idea: work at corpus level (entity pairs), instead of sentence level!
1. Take corpus sentences
   ... doubly heavy hydrogen atom called deuterium ...
2. Collect noun pairs
   e.g. (atom, deuterium); 752,311 pairs from 6M sentences of newswire
3. Is pair an IS-A in WordNet?
   14,387 yes; 737,924 no
4. Parse the sentences
5. Extract patterns
   69,592 dependency paths with >5 pairs
6. Train classifier on patterns
   logistic regression with 70K features (converted to 974,288 bucketed binary features)
slide adapted from Rion Snow
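A minimal sketch of steps 2, 3 and 5 above, working at the level of noun pairs rather than sentences. It assumes the parser has already produced (hyponym candidate, hypernym candidate, pattern) triples; the toy triples, the tiny IS-A set standing in for WordNet, the pattern "<subordinate> rich in <superordinate>" and the threshold value are illustrative assumptions.

```python
from collections import defaultdict


def build_pair_features(observations, isa_pairs, min_pairs):
    """Aggregate pattern counts per noun pair and label each pair.

    observations: (hyponym, hypernym, pattern) triples, one per sentence occurrence.
    A pattern is kept only if it occurs with more than `min_pairs` distinct pairs.
    """
    pairs_per_pattern = defaultdict(set)
    for hypo, hyper, pattern in observations:
        pairs_per_pattern[pattern].add((hypo, hyper))
    kept = {p for p, pairs in pairs_per_pattern.items() if len(pairs) > min_pairs}

    features = defaultdict(lambda: defaultdict(int))
    for hypo, hyper, pattern in observations:
        if pattern in kept:
            features[(hypo, hyper)][pattern] += 1

    labels = {pair: pair in isa_pairs for pair in features}
    return features, labels


CALLED = "<superordinate> called <subordinate>"
observations = [
    ("sarcoma", "cancer", CALLED),
    ("deuterium", "atom", CALLED),
    ("deuterium", "atom", CALLED),
    ("efflorescence", "condition", CALLED),
    ("water", "deuterium", "<subordinate> rich in <superordinate>"),
]
toy_isa = {("sarcoma", "cancer"), ("deuterium", "atom")}  # stand-in for WordNet
features, labels = build_pair_features(observations, toy_isa, min_pairs=1)
```

Each pair ends up with one feature vector pooled over all its sentences, which is what lets a classifier trained on the labeled pairs score unseen pairs such as (efflorescence, condition).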
One of 70,000 patterns
Pattern: <superordinate> called <subordinate>
Learned from cases such as:
• (sarcoma, cancer): …an uncommon bone cancer called osteogenic sarcoma and to…
• (deuterium, atom): …heavy water rich in the doubly heavy hydrogen atom called deuterium.
New pairs discovered:
• (efflorescence, condition): …and a condition called efflorescence are other reasons for…
• (O’neal_inc, company): …The company, now called O'Neal Inc., was sole distributor of…
• (hat_creek_outfit, ranch): …run a small ranch called the Hat Creek Outfit.
• (hiv-1, aids_virus): …infected by the AIDS virus, called HIV-1.
• (bateau_mouche, attraction): …local sightseeing attraction called the Bateau Mouche...
What about other relations?
Mintz, Bills, Snow, Jurafsky (2009). Distant supervision for relation extraction without labeled data.
Training set
• 102 relations
• 940,000 entities
• 1.8 million instances
Corpus
• 1.8 million articles
• 25.7 million sentences
slide adapted from Rion Snow
Frequent Freebase relations
Collecting training data
Corpus text
• Bill Gates founded Microsoft in 1975.
• Bill Gates, founder of Microsoft, …
• Bill Gates attended Harvard from…
• Google was founded by Larry Page …
Freebase
• Founder: (Bill Gates, Microsoft)
• Founder: (Larry Page, Google)
• CollegeAttended: (Bill Gates, Harvard)
Training data
• (Bill Gates, Microsoft) Label: Founder; Feature: X founded Y; Feature: X, founder of Y
• (Bill Gates, Harvard) Label: CollegeAttended; Feature: X attended Y
• (Larry Page, Google) Label: Founder; Feature: Y was founded by X
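A minimal sketch of the labeling step shown above: every corpus mention of an entity pair that appears in the knowledge base contributes its pattern as a feature, and the pair inherits the knowledge-base relation as its label. The `KB` dictionary and the `MENTIONS` list (assumed output of entity tagging and pattern extraction) are toy stand-ins for Freebase and the corpus.

```python
from collections import defaultdict

# Toy knowledge base: (X, Y) -> relation.
KB = {
    ("Bill Gates", "Microsoft"): "Founder",
    ("Larry Page", "Google"): "Founder",
    ("Bill Gates", "Harvard"): "CollegeAttended",
}

# One entry per sentence: the entity pair it mentions and the extracted pattern.
MENTIONS = [
    (("Bill Gates", "Microsoft"), "X founded Y"),
    (("Bill Gates", "Microsoft"), "X, founder of Y"),
    (("Bill Gates", "Harvard"), "X attended Y"),
    (("Larry Page", "Google"), "Y was founded by X"),
]


def collect_positive(kb, mentions):
    """Group patterns by entity pair and attach the knowledge-base label."""
    features = defaultdict(list)
    for pair, pattern in mentions:
        if pair in kb:
            features[pair].append(pattern)
    return {pair: (kb[pair], patterns) for pair, patterns in features.items()}


training = collect_positive(KB, MENTIONS)
```

Note that the label comes from the knowledge base, not from the sentence, which is why the resulting training data is noisy.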
Negative training data
Can’t train a classifier with only positive data! Need negative training data too!
Solution? Sample 1% of unrelated pairs of entities.
Corpus text
• Larry Page took a swipe at Microsoft...
• ...after Harvard invited Larry Page to...
• Google is Bill Gates' worst fear ...
Training data
• (Larry Page, Microsoft) Label: NO_RELATION; Feature: X took a swipe at Y
• (Larry Page, Harvard) Label: NO_RELATION; Feature: Y invited X
• (Bill Gates, Google) Label: NO_RELATION; Feature: Y is X's worst fear
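A minimal sketch of negative sampling: entity pairs that co-occur in text but are not in the knowledge base are kept with a fixed probability and labeled NO_RELATION. The function name and data layout are assumptions; the slide's setting corresponds to `rate=0.01`, while the toy calls below use 1.0 and 0.0 so the outcome does not depend on the random draw.

```python
import random


def sample_negatives(kb_pairs, mentions, rate, rng):
    """Keep each unrelated entity pair with probability `rate`."""
    unrelated = sorted({pair for pair, _ in mentions if pair not in kb_pairs})
    kept = {pair for pair in unrelated if rng.random() < rate}
    negatives = {}
    for pair, pattern in mentions:
        if pair in kept:
            negatives.setdefault(pair, ("NO_RELATION", []))[1].append(pattern)
    return negatives


kb_pairs = {("Bill Gates", "Microsoft"), ("Larry Page", "Google"), ("Bill Gates", "Harvard")}
mentions = [
    (("Larry Page", "Microsoft"), "X took a swipe at Y"),
    (("Larry Page", "Harvard"), "Y invited X"),
    (("Bill Gates", "Google"), "Y is X's worst fear"),
    (("Bill Gates", "Microsoft"), "X founded Y"),  # in the KB, never a negative
]

rng = random.Random(0)
all_negatives = sample_negatives(kb_pairs, mentions, 1.0, rng)
no_negatives = sample_negatives(kb_pairs, mentions, 0.0, rng)
```

Sampling only a small fraction keeps NO_RELATION from swamping the positive classes, since unrelated pairs vastly outnumber related ones.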
Preparing test data
Corpus text
• Henry Ford founded Ford Motor Co. in…
• Ford Motor Co. was founded by Henry Ford…
• Steve Jobs attended Reed College from…
Test data
• (Henry Ford, Ford Motor Co.) Label: ???; Feature: X founded Y; Feature: Y was founded by X
• (Steve Jobs, Reed College) Label: ???; Feature: X attended Y
The experiment
Positive training data
• (Bill Gates, Microsoft) Label: Founder; Feature: X founded Y; Feature: X, founder of Y
• (Bill Gates, Harvard) Label: CollegeAttended; Feature: X attended Y
• (Larry Page, Google) Label: Founder; Feature: Y was founded by X
Negative training data
• (Larry Page, Microsoft) Label: NO_RELATION; Feature: X took a swipe at Y
• (Larry Page, Harvard) Label: NO_RELATION; Feature: Y invited X
• (Bill Gates, Google) Label: NO_RELATION; Feature: Y is X's worst fear
Learning: multiclass logistic regression, giving a trained relation classifier
Test data
• (Henry Ford, Ford Motor Co.) Label: ???; Feature: X founded Y; Feature: Y was founded by X
• (Steve Jobs, Reed College) Label: ???; Feature: X attended Y
Predictions!
• (Henry Ford, Ford Motor Co.) Label: Founder
• (Steve Jobs, Reed College) Label: CollegeAttended
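A minimal sketch of the experiment with a tiny multiclass logistic regression over sparse pattern features, trained by plain stochastic gradient ascent on the log-likelihood. The training loop, learning rate and epoch count are assumptions chosen for the toy data; the slides do not specify the training procedure.

```python
import math
from collections import defaultdict

LABELS = ["Founder", "CollegeAttended", "NO_RELATION"]


def softmax(scores):
    """Softmax over a dict of label -> score."""
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def train(examples, labels, epochs=50, lr=0.5):
    """Fit one weight per (label, feature) pair; features are binary."""
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for feats, gold in examples:
            probs = softmax({y: sum(weights[y][f] for f in feats) for y in labels})
            for y in labels:
                gradient = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[y][f] += lr * gradient
    return weights


def predict(weights, feats):
    """Return the label with the highest score; unseen features score 0."""
    return max(weights, key=lambda y: sum(weights[y].get(f, 0.0) for f in feats))


train_data = [
    (["X founded Y", "X, founder of Y"], "Founder"),
    (["X attended Y"], "CollegeAttended"),
    (["Y was founded by X"], "Founder"),
    (["X took a swipe at Y"], "NO_RELATION"),
    (["Y invited X"], "NO_RELATION"),
    (["Y is X's worst fear"], "NO_RELATION"),
]
model = train(train_data, LABELS)
ford = predict(model, ["X founded Y", "Y was founded by X"])
jobs = predict(model, ["X attended Y"])
```

The test pair (Henry Ford, Ford Motor Co.) never appears in training; it is classified through the patterns it shares with the training pairs.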
Advantages of the approach • ACE paradigm: labeling sentences • This paradigm: labeling entity pairs • Make use of multiple appearances of entities • If a pair of entities appears in 10 sentences, and each sentence has 5 features extracted from it, the entity pair will have 50 associated features
Experimental set-up • 1.8 million relation instances used for training o Compared to 17,000 relation instances in ACE • 800,000 Wikipedia articles used for training, 400,000 different articles used for testing • Only extract relation instances not already in Freebase
Newly discovered instances
Ten relation instances extracted by the system that weren’t in Freebase
Human evaluation Precision@K, using Mechanical Turk labelers: • At recall of 100 instances, using both feature sets (lexical and syntax) offers the best performance for a majority of the relations • At recall of 1000 instances, using syntax features improves performance for a majority of the relations
Distant supervision: conclusions • Distant supervision extracts high-precision patterns for a variety of relations • Can make use of 1000x more data than simple supervised algorithms • Syntax features almost always help • The combination of syntax and lexical features is sometimes even better • Syntax features are probably most useful when entities are far apart, often when there are modifiers in between
Heterogeneous Supervision (Liu et al, EMNLP 2017)
• Provide a general framework to encode knowledge for supervision:
  o Knowledge base facts, heuristic patterns, ……
• Labelling functions $\Lambda$:
  o Knowledge Base
    $\lambda_1$: return born_in for $<e_1, e_2, s>$ if BornIn($e_1$, $e_2$) in KB
    $\lambda_2$: return died_in for $<e_1, e_2, s>$ if DiedIn($e_1$, $e_2$) in KB
  o Domain-specific Patterns
    $\lambda_3$: return born_in for $<e_1, e_2, s>$ if match(‘ * born in * ’, s)
    $\lambda_4$: return died_in for $<e_1, e_2, s>$ if match(‘ * killed in * ’, s)
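A minimal sketch of the four labelling functions as Python callables over a relation mention $<e_1, e_2, s>$. The toy `KB` facts and the helper names are assumptions, and the wildcard patterns are approximated with regular-expression search.

```python
import re

# Toy knowledge base of (relation, e1, e2) facts; illustrative assumption.
KB = {("BornIn", "Hussein", "Amman"), ("DiedIn", "Gofraid", "Dal Riata")}


def kb_lf(relation, label):
    """Labelling function that fires when (relation, e1, e2) is in the KB."""
    def lf(e1, e2, s):
        return label if (relation, e1, e2) in KB else None
    return lf


def pattern_lf(pattern, label):
    """Labelling function that fires when the sentence matches a pattern."""
    regex = re.compile(pattern)

    def lf(e1, e2, s):
        return label if regex.search(s) else None
    return lf


LABELLING_FUNCTIONS = {
    "lambda_1": kb_lf("BornIn", "born_in"),
    "lambda_2": kb_lf("DiedIn", "died_in"),
    "lambda_3": pattern_lf(r"\bborn in\b", "born_in"),
    "lambda_4": pattern_lf(r"\bkilled in\b", "died_in"),
}


def annotate(e1, e2, s):
    """Collect the labels of every labelling function that fires."""
    results = {name: lf(e1, e2, s) for name, lf in LABELLING_FUNCTIONS.items()}
    return {name: label for name, label in results.items() if label is not None}


annotations = annotate("Hussein", "Amman",
                       "Hussein was born in Amman on 14 November 1935.")
```

A labelling function returns nothing when it does not apply, so each mention typically receives annotations from only a subset of the functions.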
Challenges
• Relation Extraction
• Resolve Conflicts among Heterogeneous Supervision
Corpus D
• $c_1$: Robert Newton "Bob" Ford was an American outlaw best known for killing his gang leader Jesse James ($e_1$) in Missouri ($e_2$)
• $c_2$: Gofraid ($e_1$) died in 989, said to be killed in Dal Riata ($e_2$).
• $c_3$: Hussein ($e_1$) was born in Amman ($e_2$) on 14 November 1935.
Labeling functions $\Lambda$: $\lambda_1$ to $\lambda_4$ (the two knowledge-base rules and the two patterns listed above)
Conflicts among Heterogeneous Supervision
• How to resolve conflicts among Heterogeneous Supervision?
• A straightforward way: majority voting
• Works for $c_3$ and $c_2$, but does not work for $c_1$
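A minimal sketch of majority voting over labeling-function outputs. The vote lists are an assumed reading of the example (two agreeing annotations for $c_2$ and $c_3$, one born_in and one died_in annotation for $c_1$); the slides state only which mentions majority voting resolves.

```python
from collections import Counter


def majority_vote(votes):
    """Return the unique most frequent label, or None on a tie or no votes."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Assumed annotations per relation mention (illustrative only).
ANNOTATIONS = {
    "c1": ["born_in", "died_in"],
    "c2": ["died_in", "died_in"],
    "c3": ["born_in", "born_in"],
}
resolved = {mention: majority_vote(votes) for mention, votes in ANNOTATIONS.items()}
```

Under these assumed votes, $c_1$ ends in a tie, which is the case majority voting cannot resolve.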
Conflicts among Heterogeneous Supervision
• Truth Discovery:
  o Some sources (labeling functions) would be more reliable than others
  o Source Consistency Assumption: a source is likely to provide true information with the same probability for all instances.
Conflicts among Heterogeneous Supervision
• For Distant Supervision, all annotations come from the Knowledge Base.
• For Heterogeneous Supervision, annotations come from different sources, and some could be more reliable than others.
  o Knowledge Base: $\lambda_1$ (born_in if BornIn($e_1$, $e_2$) in KB), $\lambda_2$ (died_in if DiedIn($e_1$, $e_2$) in KB)
  o Domain-specific Patterns: $\lambda_3$ (born_in if match(‘ * born in * ’, s)), $\lambda_4$ (died_in if match(‘ * killed in * ’, s))
Conflicts among Heterogeneous Supervision
• We introduce context awareness to truth discovery and modify the assumption:
  o A labeling function (LF) is likely to provide true information with the same probability for instances with similar context.
• If we can “contextualize” an LF, then we can measure the “expertise” of an LF over a given sentence context
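A rough sketch of what "contextualizing" an LF could look like: each LF gets a vector, each sentence context gets a vector, and the sigmoid of their dot product serves as the LF's expertise in that context, which then weights its vote. This scoring rule, the function names and all vector values are assumptions for illustration, not the model from the slides.

```python
import math
from collections import defaultdict


def proficiency(context_vec, lf_vec):
    """Assumed expertise score: sigmoid of the context / LF dot product."""
    dot = sum(a * b for a, b in zip(context_vec, lf_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def weighted_vote(context_vec, annotations, lf_vectors):
    """Sum each label's votes, weighted by the voting LF's proficiency."""
    scores = defaultdict(float)
    for lf_name, label in annotations.items():
        scores[label] += proficiency(context_vec, lf_vectors[lf_name])
    return max(scores, key=scores.get)


# Hypothetical vectors: in this context lambda_2 is proficient, lambda_1 is not.
context = [1.0, 0.0]
lf_vectors = {"lambda_1": [-1.0, 0.5], "lambda_2": [2.0, 0.0]}
annotations = {"lambda_1": "born_in", "lambda_2": "died_in"}
winner = weighted_vote(context, annotations, lf_vectors)
```

Unlike plain majority voting, a one-against-one conflict no longer ends in a tie: the annotation from the LF judged more proficient for this context wins.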
Relation Mention Representation
• Text Feature Extraction
  o e.g. for $c_3$ (Hussein ($e_1$) was born in Amman ($e_2$) on 14 November 1935.): HEAD_EM1_Hussein, TKN_EM1_Hussein, born, HEAD_EM2_Amman, ……
• Text Feature Representation
  o each text feature $f_i$ has an embedding $v_i \in \mathbb{R}^{n_v}$
• Relation Mention Representation
  o mapping from text feature embeddings to the relation mention embedding $z_c \in \mathbb{R}^{n_z}$:
    $z_{c_1} = \tanh\left( W \cdot \frac{1}{|f_{c_1}|} \sum_{f_i \in f_{c_1}} v_i \right)$
True label discovery
• Probability Model:
  o Describing the generation of Heterogeneous Supervision?
  o Different from crowdsourcing. E.g., ONE worker may annotate:
True label discovery
• Describing the correctness of Heterogeneous Supervision
[Figure: graphical model over $z_c$, $l_i$, $\rho_{c,i}$ and $s_{c,i}$, with plates $|C|$, $|\Lambda|$ and $|O|$]
• $z_c$: representation of relation mention
• $l_i$: representation of labeling function
• $o_{c,i}$: observed annotation; $o_c^*$: underlying true label
• $\rho_{c,i} = \delta(o_{c,i} = o_c^*)$: correctness of annotation $o_{c,i}$
• $s_{c,i}$: whether $c$ belongs to the proficient subset of $\lambda_i$