ReNoun: Fact Extraction for Nominal Attributes. Mohamed Yahya, Steven Whang, Rahul Gupta, and Alon Halevy. EMNLP, 2014.10.26
Nouns, Queries & Relations
Nouns as Attributes
KB        % Noun Attributes  % Verb Attributes
DBpedia   96                 4
Freebase  97                 3
[Figure: entity, relation/attribute, entity]
This talk is about extracting facts centered around noun phrases (= attributes), e.g.: banking arm, coach, caucus chairman, president, foreign policy chief, province, state assemblyman
(Open) Information Extraction [Etzioni et al., CACM’08]
Text: …Gazprom’s banking arm, Gazprombank, is owned by the company’s pension fund…
Fact/triple: Subject = Gazprom; relation = banking arm; Object = Gazprombank
Before the details: What’s missing from the state of the art?
Timeline: 2010 WOE, 2011 ReVerb, 2012 Ollie, 2013 ClausIE, 2014 ReNoun
Notable exception, check it out! [WWW’13]
Let’s see Ollie …
Obama is the president of the US.
Relation pattern: V | V P | V W* P
Extraction: arg1 = Obama; relation = is president of; arg2 = US
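The relation pattern on the slide above (V | V P | V W* P) can be tried out on part-of-speech tags. The following is only a minimal sketch: the mapping from Penn Treebank tags to the classes V, P and W is an assumption made for illustration, and the sentence is tagged by hand rather than by a real tagger.

```python
import re

def coarse(tag):
    """Map a Penn Treebank tag to V, P, W or x (assumed classes)."""
    if tag.startswith("VB"):
        return "V"  # verb
    if tag in ("IN", "TO", "RP"):
        return "P"  # preposition or particle
    if tag.startswith(("NN", "JJ", "RB", "PRP")) or tag == "DT":
        return "W"  # noun, adjective, adverb, pronoun or determiner
    return "x"      # anything else

# V W* P | V P | V, with the longest alternative first.
RELATION = re.compile(r"VW*P|VP|V")

def relation_spans(tagged):
    """Return (start, end) token spans whose tag classes match the pattern."""
    classes = "".join(coarse(tag) for _, tag in tagged)
    return [m.span() for m in RELATION.finditer(classes)]

# Hand-tagged example sentence (an assumption, not tagger output).
sentence = [("Obama", "NNP"), ("is", "VBZ"), ("the", "DT"),
            ("president", "NN"), ("of", "IN"), ("the", "DT"), ("US", "NNP")]
spans = relation_spans(sentence)
phrases = [" ".join(word for word, _ in sentence[s:e]) for s, e in spans]
```

In this sketch the matched span still contains the determiner ("is the president of"); the slide shows the relation without it, so some normalization step is implied that the sketch does not attempt.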
Obama; is president of; US
Pattern: {US/N} nn {Obama/N} nn {president/N}
Ollie* [EMNLP’12] vs. ReNoun [this talk]
Fat head (218): coach, president, province
Long tail (60K): banking arm, caucus chairman, foreign policy chief, state assemblyman
* Ollie handles verbs, ReNoun doesn’t!
… now, ReNoun is upon us!
Relations are expressed using noun phrases. ReNoun [this talk]: Entity, noun phrase, Entity
ReNoun pipeline: Annotated Corpus + Attributes (Biperpedia) → Seed fact extraction → Dependency pattern learning → Fact extraction → Fact scoring
Seed fact extraction: 8 simple lexical patterns to capture facts with S, A, O in close proximity. Example: Google CEO Larry Page started his term in 2011.
Dependency pattern learning: align seed facts with text to find variations in how facts for an attribute are expressed. Example: A CEO, like Larry Page of Google, is usually a busy person.
Fact extraction: deploy the dependency patterns to collect more facts. Example: A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.
Fact scoring: assign numerical scores to facts; score ∝ likelihood that the extracted fact is correct.
Annotated Corpus
400×10^6 news documents, annotated with:
1. Dependency parses
2. Noun phrase chunks
3. NER
4. Coreference resolution
5. Entity resolution to Freebase IDs
Dependency parse example (A CEO, like Larry Page of Google): CEO/NN -det-> A/DET; CEO/NN -prep-> like/IN; like/IN -pobj-> Page/NNP; Page/NNP -nn-> Larry/NNP; Page/NNP -prep-> of/IN; of/IN -pobj-> Google/NNP
Coreference example: [Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.
Cluster  Phrase               Freebase ID
1        Google               /m/045c7b
2        Larry Page, CEO, he  /m/0gjpq
3        Eric Schmidt         /m/0gjpq
Attributes: Biperpedia [Gupta et al., VLDB’14]
Query: Gazprom’s banking arm → Biperpedia → Attribute: banking arm (not triples)
For more details, see the VLDB’14 paper, also [Lee et al., ICDE’13] and [Pasca & Van Durme, IJCAI’07]
Seed fact extraction
[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.
#1: A in Biperpedia & one of 8 patterns applies
#  Pattern            Example
1  the A of S, O      the CEO of Google, Larry Page
2  the A of S is O    the CEO of Google is Larry Page
3  O, S A             Larry Page, Google CEO
4  O, S’s A           Larry Page, Google’s CEO
5  O, [the] A of S    Larry Page, [the] CEO of Google
6  S A O              Google CEO Larry Page
7  S A, O             Google CEO, Larry Page
8  S’s A, O           Google’s CEO, Larry Page
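Condition #1 on the slide above can be illustrated with plain string templates. This is a rough sketch, not the ReNoun implementation: it assumes the candidate S, A and O strings are already known (in the talk they come from the annotated corpus), it uses a toy set `KNOWN_ATTRIBUTES` as a hypothetical stand-in for Biperpedia, and it matches by substring rather than over tokens and mentions.

```python
# The eight lexical patterns as templates; pattern 5 appears twice because
# its "the" is optional.
SEED_TEMPLATES = [
    "the {A} of {S}, {O}",
    "the {A} of {S} is {O}",
    "{O}, {S} {A}",
    "{O}, {S}'s {A}",
    "{O}, the {A} of {S}",
    "{O}, {A} of {S}",
    "{S} {A} {O}",
    "{S} {A}, {O}",
    "{S}'s {A}, {O}",
]

KNOWN_ATTRIBUTES = {"CEO"}  # toy stand-in for Biperpedia (assumption)

def seed_fact(sentence, s, a, o):
    """Return (S, A, O) if condition #1 holds for this candidate, else None."""
    if a not in KNOWN_ATTRIBUTES:
        return None
    text = sentence.replace("’", "'")  # normalize curly apostrophes
    if any(t.format(S=s, A=a, O=o) in text for t in SEED_TEMPLATES):
        return (s, a, o)
    return None

fact = seed_fact("Google CEO Larry Page started his term in 2011.",
                 "Google", "CEO", "Larry Page")
```

Here `fact` is the triple (Google, CEO, Larry Page), matched by pattern 6. A sentence that merely mentions all three strings without one of the eight arrangements yields `None`.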
Seed fact extraction
#2: A and O corefer
[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.
Cluster  Phrase               Freebase ID
1        Google               /m/045c7b
2        Larry Page, CEO, he  /m/0gjpq
3        Eric Schmidt         /m/0gjpq
Seed fact extraction: result
[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.
Seed fact: S = Google; A = CEO; O = Larry Page
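Condition #2 (A and O corefer) amounts to a membership check over the coreference clusters. The sketch below hard-codes the three clusters from the example sentence; the dictionary layout and the function name `corefer` are assumptions for illustration only.

```python
# Coreference clusters for the example sentence, keyed by cluster number.
clusters = {
    1: {"Google"},
    2: {"Larry Page", "CEO", "he"},
    3: {"Eric Schmidt"},
}

def corefer(clusters, a, o):
    """True if the attribute mention and the object mention share a cluster."""
    return any(a in phrases and o in phrases for phrases in clusters.values())

keep = corefer(clusters, "CEO", "Larry Page")      # same cluster (2)
reject = corefer(clusters, "CEO", "Eric Schmidt")  # different clusters
```

With this check, (Google, CEO, Larry Page) is kept as a seed fact, while a candidate pairing "CEO" with "Eric Schmidt" from the same sentence is rejected.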
Seed fact extraction
139M seed facts, 680K unique
Accuracy*: Fat head 65/100; Long tail 80/100
*random sample of 100 seed facts
A [CEO]1, like [Larry Page]2 of [Google]3, is usually a busy person.
Dependency parse: CEO/NN -det-> A/DET; CEO/NN -prep-> like/IN; like/IN -pobj-> Page/NNP; Page/NNP -nn-> Larry/NNP; Page/NNP -prep-> of/IN; of/IN -pobj-> Google/NNP
Seed fact: S = Google; A = CEO; O = Larry Page
Inputs to dependency pattern learning: annotated corpus and seed fact extraction
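Dependency pattern learning
Seed fact: S = Google; A = CEO; O = Larry Page
Dependency parse: CEO/NN -det-> A/DET; CEO/NN -prep-> like/IN; like/IN -pobj-> Page/NNP; Page/NNP -nn-> Larry/NNP; Page/NNP -prep-> of/IN; of/IN -pobj-> Google/NNP
Path connecting the seed fact: CEO/NN -prep-> like/IN -pobj-> Page/NNP -prep-> of/IN -pobj-> Google/NNP
Generalized pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}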
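The generalization step above can be sketched as: take the smallest part of the dependency tree that connects the head words of S, A and O, then replace those head words with slots. The code below is an illustrative assumption, not the talk's algorithm: the parse is written by hand, tokens are identified by their surface form (which only works when each word occurs once), and `learn_pattern` is a hypothetical name.

```python
# Hand-written parse of "A CEO, like Larry Page of Google": child -> (head, label).
HEAD = {"A": ("CEO", "det"), "like": ("CEO", "prep"), "Page": ("like", "pobj"),
        "Larry": ("Page", "nn"), "of": ("Page", "prep"), "Google": ("of", "pobj")}
POS = {"A": "DET", "CEO": "NN", "like": "IN", "Larry": "NNP",
       "Page": "NNP", "of": "IN", "Google": "NNP"}

def chain_to_root(token):
    """Return the token followed by all of its ancestors up to the root."""
    chain = [token]
    while chain[-1] in HEAD:
        chain.append(HEAD[chain[-1]][0])
    return chain

def learn_pattern(roles):
    """roles maps the head word of each seed-fact argument to 'S', 'A' or 'O'."""
    chains = [chain_to_root(token) for token in roles]
    # Lowest common ancestor: the first node of one chain found in every chain.
    lca = next(n for n in chains[0] if all(n in c for c in chains))
    edges = set()
    for chain in chains:
        for child in chain[:chain.index(lca)]:
            head, label = HEAD[child]
            edges.add((head, label, child))

    def node(token):
        # Seed-fact arguments become slots; other tokens keep word and tag.
        if token in roles:
            return "{%s/N}" % roles[token]
        return "%s/%s" % (token, POS[token])

    return {(node(h), label, node(c)) for h, label, c in edges}

pattern = learn_pattern({"CEO": "A", "Page": "O", "Google": "S"})
```

The result is a set of (head, label, child) edges equivalent to the chain {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}; the det and nn edges fall outside the connecting subtree and are dropped.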
Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}
Seed facts matching this pattern:
CEO: S = Google; A = CEO; O = Larry Page
executive chairman: S = Google; A = executive chairman; O = Eric Schmidt
head coach: S = Real Madrid; A = head coach; O = Carlo Ancelotti
Same pattern could apply to multiple attributes… …useful for scoring
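The observation that one pattern can apply to several attributes can be recorded as an index from pattern to attribute set. The snippet is a minimal sketch with hypothetical names; how ReNoun turns this information into a score is not stated on the slide, so nothing here should be read as the scoring function.

```python
from collections import defaultdict

PATTERN = "{A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}"

# (pattern, seed fact) pairs; the seed facts are the three from the slide.
observations = [
    (PATTERN, ("Google", "CEO", "Larry Page")),
    (PATTERN, ("Google", "executive chairman", "Eric Schmidt")),
    (PATTERN, ("Real Madrid", "head coach", "Carlo Ancelotti")),
]

def attributes_by_pattern(observations):
    """Group the attributes of the seed facts that produced each pattern."""
    index = defaultdict(set)
    for pattern, (_subject, attribute, _object) in observations:
        index[pattern].add(attribute)
    return dict(index)

index = attributes_by_pattern(observations)
```

For the example, `index[PATTERN]` holds CEO, executive chairman and head coach, which is the attribute list attached to the pattern on the following slides.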
Fact extraction
A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.
Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N} (attributes: CEO, executive chairman, head coach)
Extracted fact: S = STEC Inc.; A = CEO; O = Manouchehr Moshayedi
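Applying a learned pattern to a new sentence can be sketched as a walk over the dependency edges, starting from a word whose mention is one of the pattern's attributes. Everything below is an assumption made for illustration: the parse and the head-word-to-mention table are written by hand, the pattern is restricted to a simple chain, and only the first matching child is followed at each step.

```python
# Hand-written parse fragment, edges as (head, label, child).
EDGES = [("CEO", "det", "A"), ("CEO", "prep", "like"),
         ("like", "pobj", "Moshayedi"), ("Moshayedi", "nn", "Manouchehr"),
         ("Moshayedi", "prep", "of"), ("of", "pobj", "Inc."),
         ("Inc.", "nn", "STEC")]
# Head word -> full mention (would come from noun phrase chunking).
MENTION = {"CEO": "CEO", "Moshayedi": "Manouchehr Moshayedi", "Inc.": "STEC Inc."}
# Chain pattern: each step is (edge label, required word or slot name).
PATTERN = [("prep", "like"), ("pobj", "{O}"), ("prep", "of"), ("pobj", "{S}")]
PATTERN_ATTRIBUTES = {"CEO", "executive chairman", "head coach"}

def apply_pattern(edges, start):
    """Walk PATTERN from the attribute head; return slot bindings or None."""
    bindings, node = {"{A}": start}, start
    for label, target in PATTERN:
        is_slot = target.startswith("{")
        children = [c for h, l, c in edges
                    if h == node and l == label and (is_slot or c == target)]
        if not children:
            return None
        node = children[0]  # simplification: follow the first match only
        if is_slot:
            bindings[target] = node
    return bindings

def extract(edges):
    """Return (S, A, O) facts for every attribute head the pattern matches."""
    facts = []
    for head in sorted({h for h, _, _ in edges}):
        if MENTION.get(head) in PATTERN_ATTRIBUTES:
            found = apply_pattern(edges, head)
            if found:
                facts.append((MENTION[found["{S}"]], MENTION[found["{A}"]],
                              MENTION[found["{O}"]]))
    return facts

facts = extract(EDGES)
```

On the example sentence this yields the single fact (STEC Inc., CEO, Manouchehr Moshayedi), matching the extraction shown on the slide.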
Argument order (shown next to the eight seed patterns from the seed fact extraction slide)
A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.
Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N} (attributes: CEO, executive chairman, head coach)
Extracted fact: S = STEC Inc.; A = CEO; O = Manouchehr Moshayedi