
Part II: Joint Extraction of Typed Entity and Relation



  1. Constructing Structured Information Networks from Massive Text Corpora Part II: Joint Extraction of Typed Entity and Relation

  2. Effort-Light StructMine: Methodology
     • Pipeline (diagram): text corpus; data-driven text segmentation (SIGMOD’15, WWW’16); entity names & context units; knowledge bases; partially-labeled corpus; learning corpus-specific model (KDD’15, KDD’16, EMNLP’16, WWW’17); structures from the remaining unlabeled data
     • Open-world assumption vs. closed-world assumption

  3. Effort-Light StructMine: Typing
     • Pipeline (diagram): text corpus; data-driven text segmentation (SIGMOD’15, WWW’16); entity names & context units; knowledge bases; partially-labeled corpus; learning corpus-specific model (KDD’15, KDD’16, EMNLP’16, WWW’17); structures from the remaining unlabeled data
     • Tasks: Entity Recognition and Coarse-grained Typing (KDD’15); Fine-grained Entity Typing (KDD’16); Joint Entity and Relation Extraction (WWW’17)

  4. Corpus to Structured Network: The Roadmap
     • Pipeline (diagram): text corpus; data-driven text segmentation (SIGMOD’15, WWW’16); entity names & context units; knowledge bases; partially-labeled corpus; learning corpus-specific model (KDD’15, KDD’16, EMNLP’16, WWW’17); structures from the remaining unlabeled data
     • Tasks: Entity Recognition and Coarse-grained Typing (KDD’15); Fine-grained Entity Typing (KDD’16); Joint Entity and Relation Extraction (WWW’17)

  5. Recognizing Entities of Target Types in Text
     • Example text: The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. The owner is very nice. …
     • Highlighted mentions: BBQ, Phoenix, pulled pork sandwich, coleslaw, baked beans, owner
     • Target types: person, food, location

  6. Traditional Named Entity Recognition (NER) Systems
     • Heavy reliance on corpus-specific human labeling
     • Training sequence models is slow
     • Example tagging: The/O best/O BBQ/Food I’ve/O tasted/O in/O Phoenix/Location
     • Diagram labels: a manual annotation interface; sequence model training
     • NER systems: Stanford NER, Illinois Name Tagger, IBM Alchemy APIs, …
     e.g., (McCallum & Li, 2003), (Finkel et al., 2005), (Ratinov & Roth, 2009), …

  7. Weak-Supervision Systems: Pattern-Based Bootstrapping
     • Requires manual seed selection & mid-point checking
     • Seeds for Food: Pizza, French Fries, Hot Dog, Pancake, …
     • Patterns for Food: “the best <X> I’ve tried in”, “their <X> tastes amazing”, …
     • Loop (diagram): seed entities and corpus; annotate corpus using entities; generate candidate patterns; score candidate patterns; select top patterns; apply patterns to find new entities
     • Systems: CMU NELL, UW KnowItAll, Stanford DeepDive, Max-Planck PROSPERA, ...
     e.g., (Etzioni et al., 2005), (Talukdar et al., 2010), (Gupta et al., 2014), (Mitchell et al., 2015), …
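The bootstrapping loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of NELL, KnowItAll, or any other system named above: the toy corpus, the two-word context window, and the count-based pattern score are all hypothetical choices made for the example.

```python
import re
from collections import Counter

def context_patterns(sentence, entity, width=2):
    """Yield (left, right) word contexts around each occurrence of entity."""
    for m in re.finditer(r"\b" + re.escape(entity) + r"\b", sentence):
        left = sentence[:m.start()].split()[-width:]
        right = sentence[m.end():].split()[:width]
        if left and right:  # require context on both sides of the slot
            yield " ".join(left), " ".join(right)

def apply_pattern(pattern, sentence):
    """Return every string that fills the <X> slot of a (left, right) pattern."""
    left, right = pattern
    regex = r"\b" + re.escape(left) + r"\s+(.+?)\s+" + re.escape(right) + r"\b"
    return [m.group(1) for m in re.finditer(regex, sentence)]

def bootstrap(sentences, seeds, rounds=2, top_k=2):
    """Alternate between scoring patterns from known entities and harvesting new ones."""
    entities = set(seeds)
    for _ in range(rounds):
        # Generate and score candidate patterns (score = number of supporting matches).
        scores = Counter()
        for s in sentences:
            for e in entities:
                scores.update(context_patterns(s, e))
        top = [p for p, _ in scores.most_common(top_k)]
        # Apply the selected patterns to find new entities.
        for s in sentences:
            for p in top:
                entities.update(apply_pattern(p, s))
    return entities

# Hypothetical toy corpus (lowercased), not taken from the slides.
corpus = [
    "the best pizza I've tried in town",
    "the best hot dog I've tried in years",
    "their pizza tastes amazing today",
    "their pancake tastes amazing too",
]
foods = bootstrap(corpus, {"pizza"})
```

Nothing in this sketch checks whether a harvested string is really a food, which is why such systems need the manual mid-point checking mentioned on the slide.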

  8. Leveraging Distant Supervision
     1. Detect entity names from text
     2. Match name strings to KB entities
     3. Propagate types to the un-matchable names
     Sentences:
     • S1: Phoenix is my all-time favorite dive bar in New York City.
     • S2: The best BBQ I’ve tasted in Phoenix.
     • S3: Phoenix has become one of my favorite bars in NY.
     Graph (diagram):
     • Names and types: BBQ (Food); New York City (Location); Phoenix (???); NY (??? → Location)
     • Relation phrases: “tasted in”; “is my all-time favorite dive bar in”; “has become one of my favorite bars in”
     (Lin et al., 2012), (Ling et al., 2012), (Nakashole et al., 2013)
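Steps 1 and 2 of this slide can be sketched as a simple string match against a knowledge base. The sketch below is illustrative only: `TOY_KB`, `CANDIDATE_NAMES`, and both function names are hypothetical, and real systems use far richer mention detection and entity linking. Mentions that come back with `None` are the "un-matchable" names that step 3 would have to type by propagation.

```python
import re

# Hypothetical toy knowledge base: name string -> type (an assumption for illustration).
TOY_KB = {"BBQ": "Food", "New York City": "Location"}

SENTENCES = {
    "S1": "Phoenix is my all-time favorite dive bar in New York City.",
    "S2": "The best BBQ I've tasted in Phoenix.",
    "S3": "Phoenix has become one of my favorite bars in NY.",
}

# Candidate entity names, assumed to come from an earlier mention-detection step.
CANDIDATE_NAMES = ["Phoenix", "New York City", "BBQ", "NY"]

def detect_mentions(sentence, names):
    """Step 1: return the candidate names that occur in the sentence as whole words."""
    return [n for n in names if re.search(r"\b" + re.escape(n) + r"\b", sentence)]

def distant_labels(sentences, names, kb):
    """Step 2: give each detected mention its KB type, or None if it is un-matchable."""
    labels = {}
    for sid, text in sentences.items():
        for name in detect_mentions(text, names):
            labels[(sid, name)] = kb.get(name)
    return labels

labels = distant_labels(SENTENCES, CANDIDATE_NAMES, TOY_KB)
# Mentions left for step 3 (type propagation).
unmatched = sorted(key for key, value in labels.items() if value is None)
```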

  9. Current Distant Supervision: Limitation I
     1. Context-agnostic type prediction
        • Predict types for each mention regardless of context
     2. Sparsity of contextual bridges
     Sentences:
     • S1: Phoenix is my all-time favorite dive bar in New York City.
     • S2: The best BBQ I’ve tasted in Phoenix.
     • S3: Phoenix has become one of my favorite bars in NY.

  10. Current Distant Supervision: Limitation II
      1. Context-agnostic type prediction
      2. Sparsity of contextual bridges
         • Some relational phrases are infrequent in the corpus → ineffective type propagation
      Sentences:
      • S1: Phoenix is my all-time favorite dive bar in New York City.
      • S3: Phoenix has become one of my favorite bars in NY.

  11. ClusType: Data-Driven Entity Mention Detection
      • Significance of a merging between two sub-phrases
      • Diagram labels: quality of merging; corpus-level concordance; syntactic quality; good concordance

      | Pattern | Example |
      |---|---|
      | (J*)N* | support vector machine |
      | VP | tasted in, damage on |
      | VW*(P) | train a classifier with |

  12. ClusType: Data-Driven Entity Mention Detection
      • Significance of a merging between two sub-phrases
      • Diagram labels: quality of merging; corpus-level concordance; syntactic quality; good concordance

      | Pattern | Example |
      |---|---|
      | (J*)N* | support vector machine |
      | VP | tasted in, damage on |
      | VW*(P) | train a classifier with |

      • Example text: The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. … This place serves up the best cheese steak sandwich in west of Mississippi.
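The syntactic patterns in the table can be read as regular expressions over part-of-speech tags. The sketch below assumes a one-letter-per-token tag string (J adjective, N noun, V verb, P preposition, W any other intervening word); this tag alphabet, the hand-written tag strings, and the function name are assumptions for illustration, not ClusType's implementation. The entity pattern uses `N+` instead of `N*` so that an empty phrase never matches.

```python
import re

# Entity mention candidates: optional adjectives followed by at least one noun.
ENTITY_PATTERN = re.compile(r"J*N+")
# Relation phrase candidates: VP, or a verb with optional words and an optional preposition.
RELATION_PATTERNS = [re.compile(r"VP"), re.compile(r"VW*P?")]

def phrase_kind(tags):
    """Classify a tag string as 'entity', 'relation', or None (no pattern fits)."""
    if ENTITY_PATTERN.fullmatch(tags):
        return "entity"
    if any(p.fullmatch(tags) for p in RELATION_PATTERNS):
        return "relation"
    return None

# Tag strings written by hand for the slide's examples (assumed taggings).
examples = {
    "support vector machine": "NNN",
    "tasted in": "VP",
    "train a classifier with": "VWWP",
}
kinds = {phrase: phrase_kind(tags) for phrase, tags in examples.items()}
```

A syntactic match only makes a phrase a candidate; the slide's corpus-level concordance is a separate signal that this sketch does not model.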

  13. My Solution: ClusType (KDD’15)
      Segmented sentences:
      • S1: Phoenix is my all-time favorite dive bar in New York City.
      • S2: The best BBQ I’ve tasted in Phoenix.
      • S3: Phoenix has become one of my favorite bars in NY.
      Graph (diagram):
      • Mentions: S1: Phoenix; S2: Phoenix; S3: Phoenix; S2: BBQ; S1: New York City; S3: NY
      • Entity names: BBQ, Phoenix, New York City, NY
      • Relation phrases: “tasted in”; “is my all-time favorite dive bar in”; “has become one of my favorite bars in”
      • Diagram labels: represents object interactions; correlated mentions; similar relation phrases
      Putting two sub-tasks together:
      1. Type label propagation
      2. Relation phrase clustering

  14. Type Propagation in ClusType
      • Smoothness assumption: if two objects are similar according to the graph, then their type labels should also be similar
      • W_ij: edge weight / object similarity; f_i, f_j: type labels of objects i and j
      • Example graph: mentions S1: Phoenix and S3: Phoenix; entity names BBQ, Phoenix, New York City, NY; relation phrases “tasted in”, “is my all-time favorite dive bar in”, “has become one of my favorite bars in”
      (Belkin & Niyogi, NIPS’01), (Ren et al., KDD’15)
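The smoothness assumption can be made concrete with a generic graph label-propagation sketch. This is not ClusType's actual objective or optimizer: it is a plain iterative averaging scheme with clamped seed labels, and the toy graph, edge weights, and function names are assumptions chosen for illustration. The penalty it reduces is the sum over edges of W_ij times the squared difference between f_i and f_j.

```python
from collections import defaultdict

TYPES = ["Food", "Location"]

def smoothness_penalty(edges, scores):
    """Sum of W_ij * ||f_i - f_j||^2 over edges; small when linked objects agree."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(scores[i], scores[j]))
        for (i, j), w in edges.items()
    )

def propagate(edges, seeds, num_types, iters=100):
    """Repeatedly set each unlabeled node to the weighted average of its neighbors."""
    neighbors = defaultdict(list)
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    scores = {n: [0.0] * num_types for n in neighbors}
    for n, t in seeds.items():
        scores[n] = [1.0 if k == t else 0.0 for k in range(num_types)]
    for _ in range(iters):
        updated = {}
        for n in neighbors:
            if n in seeds:  # seed labels stay clamped
                updated[n] = scores[n]
                continue
            total = sum(w for _, w in neighbors[n])
            updated[n] = [
                sum(w * scores[m][k] for m, w in neighbors[n]) / total
                for k in range(num_types)
            ]
        scores = updated
    return scores

# Hypothetical toy graph: names linked to relation phrases, plus one
# phrase-phrase edge standing in for "similar relation phrases".
edges = {
    ("New York City", "is my all-time favorite dive bar in"): 1.0,
    ("NY", "has become one of my favorite bars in"): 1.0,
    ("is my all-time favorite dive bar in", "has become one of my favorite bars in"): 0.8,
    ("BBQ", "tasted in"): 1.0,
}
seeds = {"BBQ": 0, "New York City": 1}  # indices into TYPES

scores = propagate(edges, seeds, len(TYPES))
ny_type = TYPES[max(range(len(TYPES)), key=lambda k: scores["NY"][k])]
```

In this toy graph, "NY" can only receive the Location label because its relation phrase is linked to a similar phrase whose argument is already typed.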

  15. Relation Phrase Clustering in ClusType
      • Two relation phrases should be grouped together (“multi-view” clustering) if they have:
        1. Similar string
        2. Similar context
        3. Similar types for entity arguments
      • Example of similar relation phrases: “is my all-time favorite dive bar in” (Phoenix, New York City: Location) and “has become one of my favorite bars in” (Phoenix, NY: ??? → Location); the diagram also shows the numbers 5 and 102 next to these phrases
      • Two subtasks mutually enhance each other
      (Ren et al., KDD’15)

  16. ClusType: Comparing with State-of-the-Art Systems (F1 Score)

      | Methods | Approach | NYT | Yelp | Tweet |
      |---|---|---|---|---|
      | Pattern (Stanford, CoNLL’14) | Bootstrapping | 0.301 | 0.199 | 0.223 |
      | SemTagger (U Utah, ACL’10) | Bootstrapping | 0.407 | 0.296 | 0.236 |
      | NNPLB (UW, EMNLP’12) | Label propagation | 0.637 | 0.511 | 0.246 |
      | APOLLO (THU, CIKM’12) | Label propagation | 0.795 | 0.283 | 0.188 |
      | FIGER (UW, AAAI’12) | Classifier with linguistic features | 0.881 | 0.198 | 0.308 |
      | ClusType (KDD’15) | | 0.939 | 0.808 | 0.451 |

      • vs. bootstrapping: context-aware prediction on “un-matchable”
      • vs. label propagation: group similar relation phrases
      • vs. FIGER: no reliance on complex feature engineering
      • NYT: 118k news articles (1k manually labeled for evaluation); Yelp: 230k business reviews (2.5k reviews are manually labeled for evaluation); Tweet: 302k tweets (3k tweets are manually labeled for evaluation)
      • Precision (P) = #correctly-typed mentions / #system-recognized mentions
      • Recall (R) = #correctly-typed mentions / #ground-truth mentions
      • F1 score = 2(P × R) / (P + R)
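The evaluation metrics on this slide (precision, recall, and F1 over typed mentions) can be computed as in the short sketch below. The function name and the toy gold and predicted sets are hypothetical; a mention counts as correctly typed only if both its identifier and its type match the ground truth.

```python
def precision_recall_f1(predicted, gold):
    """Score sets of (mention_id, type) pairs against ground truth."""
    correct = len(predicted & gold)  # correctly-typed mentions
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical toy annotations, not results from the slides.
gold = {
    ("S2:BBQ", "Food"),
    ("S2:Phoenix", "Location"),
    ("S1:New York City", "Location"),
    ("S3:NY", "Location"),
}
predicted = {
    ("S2:BBQ", "Food"),
    ("S2:Phoenix", "Location"),
    ("S1:Phoenix", "Location"),  # not in the gold set, so it hurts precision
}

p, r, f1 = precision_recall_f1(predicted, gold)
```

Here two of three predictions are correct (P = 2/3) and two of four gold mentions are recovered (R = 1/2), which gives F1 = 4/7.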

  17. Corpus to Structured Network: The Roadmap
      • Pipeline (diagram): text corpus; data-driven text segmentation (SIGMOD’15, WWW’16); entity names & context units; knowledge bases; partially-labeled corpus; learning corpus-specific model (KDD’15, KDD’16, EMNLP’16, WWW’17); structures from the remaining unlabeled data
      • Tasks: Entity Recognition and Coarse-grained Typing (KDD’15); Fine-grained Entity Typing (KDD’16); Joint Entity and Relation Extraction (WWW’17)
