1/17/2013

CSE 454 Advanced Internet Systems
Features for Relation Extraction
Dan Weld

Preprocessed Data Files
Each line corresponds to a sentence, e.g. "John likes eating sausage."
tokens: the sentence after tokenization
  John likes eating sausage .
pos: Part-of-Speech tags
  John/NNP likes/VBZ eating/VBG sausage/NN ./.
ner: Named Entities

Grade School: "9 parts of speech in English"
• Noun
• Pronoun
• Verb
• Adverb
• Article
• Conjunction
• Adjective
• Interjection
• Preposition
But: plurals, possessive, case, tense, aspect, ….

Learning Relational Extractors
TRAINING SET (input text, label)
  + Citigroup has taken over EMI, the British …
  + Citigroup's acquisition of EMI comes just ahead of …
  − Google's Adwords system has long included … Youtube.
Example: <X1, …, Xk, Y> (features X1 … Xk, label Y)
Output: an Extractor that maps Text to R(a,b) tuples
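A minimal sketch of reading the pos line, assuming (as in the example above) that tokens are separated by whitespace and each is written word/TAG. The helper name parse_pos_line is hypothetical, not part of the course's data files or tooling. Splitting on the last slash keeps the sentence-final token ./. intact as the word "." with tag ".".

```python
def parse_pos_line(line: str) -> list[tuple[str, str]]:
    """Split a line of word/TAG tokens into (word, tag) pairs.

    Assumes whitespace-separated tokens. Splitting on the LAST '/'
    keeps words that themselves contain '/' intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


tokens_line = "John likes eating sausage ."
pos_line = "John/NNP likes/VBZ eating/VBG sausage/NN ./."

pairs = parse_pos_line(pos_line)
# The tokens line and the pos line should describe the same tokens.
assert [w for w, _ in pairs] == tokens_line.split()
print(pairs)
# [('John', 'NNP'), ('likes', 'VBZ'), ('eating', 'VBG'), ('sausage', 'NN'), ('.', '.')]
```

Comparing the word sequence against the tokens line is a cheap way to catch misaligned lines, since both files hold one sentence per line.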
Citigroup has taken over EMI, the British …
X_i = NER tag of Arg1
X_i = NER tag of Arg2
Does word-53 (acquire) appear in span?
• Consider all words?
• Just use verbs & prepositions?
Does bigram-199 (take over) appear in span?
Trigrams?

Features Outside the Span
Birthplace Relation
  Dan had lunch in Boston
  Returning to his birthplace, Dan had lunch in Boston
  Dan had lunch in Boston, his birthplace.
Here the cue word "birthplace" lies outside the span between Dan and Boston.

Proximity
Birthplace Relation
  Dan, who was very tired from deadlines and cranky because of problems with his boss, was born in Boston
Dependency parse:
  nsubj(born, Dan), prep_in(born, Boston)
  rcmod(Dan, tired)
  prep_from(tired, deadlines), prep_from(tired, cranky)

Proximity
Birthplace Relation
  Dan, who was very tired from deadlines and a screaming baby, was born in Boston
Dependency parse:
  nsubj(born, Dan), prep_in(born, Boston)
  rcmod(Dan, tired)
  prep_from(tired, deadlines), prep_from(tired, baby)
  screaming attached under baby

Parsing Ambiguity
"Papa ate the caviar with a spoon", with the PP attached to the VP:
  (S (NP Papa) (VP (VP (V ate) (NP (Det the) (N caviar))) (PP (P with) (NP (Det a) (N spoon)))))
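The feature questions above (NER tags of the two arguments, whether a given word or bigram appears in the span between them) and the Proximity point can be sketched together. This is a minimal illustration under stated assumptions: tokens and argument positions are already known, the NER tag ORGANIZATION for Citigroup and EMI is assumed, and the dependency edges are the ones drawn on the Proximity slide. The function names and feature strings such as "word_in_span=..." are hypothetical stand-ins for numeric feature ids like word-53; unlike the slide, the sketch uses lowercased surface words rather than lemmas ("taken over" rather than "take over").

```python
from collections import deque


def span_features(tokens, arg1, arg2, ner1, ner2):
    """Binary features: NER tags of both arguments, plus the words and
    bigrams strictly between the two argument positions."""
    lo, hi = sorted((arg1, arg2))
    span = [t.lower() for t in tokens[lo + 1:hi]]
    feats = {f"ner_arg1={ner1}", f"ner_arg2={ner2}"}
    feats.update(f"word_in_span={w}" for w in span)
    feats.update(f"bigram_in_span={a}_{b}" for a, b in zip(span, span[1:]))
    return feats


def dependency_distance(edges, source, target):
    """Length of the shortest path between two words, treating the
    dependency tree as an undirected graph (breadth-first search)."""
    graph = {}
    for head, _label, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two words are not connected


# Span features for the Citigroup example (NER tags are assumed).
tokens = "Citigroup has taken over EMI".split()
print(sorted(span_features(tokens, 0, 4, "ORGANIZATION", "ORGANIZATION")))

# Edges from the Proximity slide, written as (head, label, dependent).
edges = [
    ("born", "nsubj", "Dan"),
    ("born", "prep_in", "Boston"),
    ("Dan", "rcmod", "tired"),
    ("tired", "prep_from", "deadlines"),
    ("tired", "prep_from", "cranky"),
]
sentence = ("Dan , who was very tired from deadlines and cranky because of "
            "problems with his boss , was born in Boston").split()
print(sentence.index("Boston") - sentence.index("Dan"))  # 20 tokens apart
print(dependency_distance(edges, "Dan", "Boston"))        # 2 edges apart
```

The contrast in the last two lines is the Proximity argument: the arguments are far apart in the surface string but adjacent to the same head (born) in the parse.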
Parsing Ambiguity
Prepositional Phrase Attachment ("Please Don't Eat Me!")
The second parse attaches the PP to the object NP:
  (S (NP Papa) (VP (V ate) (NP (NP (Det the) (N caviar)) (PP (P with) (NP (Det a) (N spoon))))))

Extracting grammatical relations from statistical constituency parsers [de Marneffe et al. LREC 2006]
• Exploit the high-quality syntactic analysis done by statistical constituency parsers to get the grammatical relations [typed dependencies]
• Dependencies are generated by pattern-matching rules
Example: "Bills on ports and immigration were submitted by Senator Brownback"
Constituency parse:
  (S (NP (NP (NNS Bills)) (PP (IN on) (NP (NNS ports) (CC and) (NN immigration)))) (VP (VBD were) (VP (VBN submitted) (PP (IN by) (NP (NNP Senator) (NNP Brownback))))))
Typed dependencies:
  nsubjpass(submitted, Bills), auxpass(submitted, were), agent(submitted, Brownback)
  prep_on(Bills, ports), cc_and(ports, immigration), nn(Brownback, Senator)

Preprocessed Data Files (continued)
parse: automatic analysis of grammatical structure, stored in one line
  (S (NP (NNP John)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .))
dep: grammatical dependencies
Mintz features

Time-intensive Slot Types
Person:
  per:alternate_names, per:title, per:date_of_birth, per:member_of, per:age, per:employee_of, per:country_of_birth, per:religion, per:stateorprovince_of_birth, per:spouse, per:city_of_birth, per:children, per:origin, per:parents, per:date_of_death, per:siblings, per:country_of_death, per:other_family, per:stateorprovince_of_death, per:charges, per:city_of_death, per:cause_of_death, per:countries_of_residence, per:stateorprovinces_of_residence, per:cities_of_residence, per:schools_attended
Organization:
  org:alternate_names, org:political/religious_affiliation, org:top_members/employees, org:number_of_employees/members, org:members, org:member_of, org:subsidiaries, org:parents, org:founded_by, org:founded, org:dissolved, org:country_of_headquarters, org:stateorprovince_of_headquarters, org:city_of_headquarters, org:shareholders, org:website

Why Extract Temporal Information?
• Many relations and events are temporally bounded
  – a person's place of residence or employer
  – an organization's members
  – the duration of a war between two countries
  – the precise time at which a plane landed
  – …
• Temporal Information Distribution
  – One of every fifty lines of database application code involves a date or time value (Snodgrass, 1998)
  – Each news document in PropBank (Kingsbury and Palmer, 2002) includes eight temporal arguments

Slides from Dan Roth, Heng Ji, Taylor Cassidy, Quang Do, TIE Tutorial
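Because the parse field in the Preprocessed Data Files stores the whole constituency tree on one line as a bracketed string, reading it back takes only a small recursive parser. This sketch assumes the bracketing shown for "John likes eating sausage." (every node is (LABEL child ...), every leaf is (TAG word), no unlabeled outer wrapper); parse_tree and preterminals are hypothetical helper names. Reading off the preterminals recovers the same word/tag pairs as the pos field.

```python
import re


def tokenize(s):
    # Parentheses are separate tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", s)


def parse_tree(s):
    """Parse a one-line bracketed tree into nested (label, children) tuples.
    Words are plain strings. Malformed input raises ValueError or IndexError."""
    tokens = tokenize(s)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def preterminals(tree):
    """Yield (word, tag) for every node whose only child is a word."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (children[0], label)
        return
    for child in children:
        if isinstance(child, tuple):
            yield from preterminals(child)


line = "(S (NP (NNP John)) (VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .))"
print(list(preterminals(parse_tree(line))))
# [('John', 'NNP'), ('likes', 'VBZ'), ('eating', 'VBG'), ('sausage', 'NN'), ('.', '.')]
```

A real pipeline would more likely use an existing tree reader; the point here is only that the one-line format is straightforward to recover.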
Temporal Expression Examples (Reference Date = December 8, 2012)
  Expression                    Value in Timex Format
  December 8, 2012              2012-12-08
  Friday                        2012-12-07
  today                         2012-12-08
  1993                          1993
  the 1990's                    199X
  midnight, December 8, 2012    2012-12-08T00:00:00
  5pm                           2012-12-08T17:00
  the previous day              2012-12-07
  last October                  2011-10
  last autumn                   2011-FA
  last week                     2012-W48
  Thursday evening              2012-12-06TEV
  three months ago              2012-09

Temporal Expression Extraction
• Rule-based (Strötgen and Gertz, 2010; Chang and Manning, 2012; Do et al., 2012)
• Machine Learning
  – Risk Minimization Model (Boguraev and Ando, 2005)
  – Conditional Random Fields (Ahn et al., 2005; UzZaman and Allen, 2010)
• State-of-the-art: about 95% F-measure for extraction and 85% F-measure for normalization

Ordering events in time
Speech (S), Event (E), & Reference (R) time (Reichenbach, 1947)
  Sentence                  Tense             Order
  John wins the game        Present           E,R,S
  John won the game         Simple Past       E,R<S
  John had won the game     Past Perfect      E<R<S
  John has won the game     Present Perfect   E<S,R
  John will win the game    Future            S<E,R
  Etc…                      Etc…              Etc…
• Tense relates R and S; grammatical aspect relates R and E
• R is associated with temporal anaphora (Partee 1984)
• Order events by comparing R across sentences
• By the time Boris noticed his blunder, John had (already) won the game
See Michaelis (2006) for a good explanation of tense and grammatical aspect.

Ordering events in discourse
• (1) John entered the room at 5:00pm.
• (2) It was pitch black.
• (3) It had been three days since he'd slept.
Timeline:
  State: John slept (Time: 3 days)
  Event: John entered the room (Time: 5pm)
  State: Pitch black (Time: Now)

Slides from Dan Roth, Heng Ji, Taylor Cassidy, Quang Do, TIE Tutorial

High-Level Architecture
[Figure: architecture diagram. Components: Text, Wikifier, Feature Markup, KB, Distant Supervision, Manual Labeling, Training Data, Learner, Manual Generation, Patterns, Slot Extractor, Inference, Tuples]

Teams
• Named Entity Linking (1)
• Time (1)
• Distant Supervision (1)
• InstaRead (1)
• Relation-Specific (3-5)
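Several rows of the Temporal Expression Examples table are relative expressions normalized against the reference date. The toy rule-based normalizer below covers only a handful of those rows and is far simpler than the rule-based systems cited; the function name and every rule are assumptions for illustration: "today", "the previous day", a bare weekday name read as the most recent such day strictly before the reference date, "last week" as the previous ISO week, and "N months ago" as a year-month value.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def normalize(expr: str, ref: date) -> Optional[str]:
    """Map a few relative temporal expressions to TIMEX-style values.
    Returns None for anything this sketch does not cover."""
    e = expr.lower().strip()
    if e == "today":
        return ref.isoformat()
    if e == "the previous day":
        return (ref - timedelta(days=1)).isoformat()
    if e in WEEKDAYS:
        # Assumed rule: most recent such weekday strictly before ref.
        back = (ref.weekday() - WEEKDAYS.index(e)) % 7 or 7
        return (ref - timedelta(days=back)).isoformat()
    if e == "last week":
        year, week, _ = (ref - timedelta(weeks=1)).isocalendar()
        return f"{year}-W{week:02d}"
    words = e.split()
    if len(words) == 3 and words[1:] == ["months", "ago"] and words[0] in NUMBERS:
        # Count months from year 0 so that subtraction wraps across years.
        months = ref.year * 12 + (ref.month - 1) - NUMBERS[words[0]]
        return f"{months // 12}-{months % 12 + 1:02d}"
    return None


ref = date(2012, 12, 8)
for expr in ["today", "Friday", "the previous day", "last week", "three months ago"]:
    print(expr, "->", normalize(expr, ref))
```

With the slide's reference date these reproduce the table's values (2012-12-08, 2012-12-07, 2012-12-07, 2012-W48, 2012-09). Rows such as "Thursday evening" (2012-12-06TEV) or "last autumn" (2011-FA) would need further rules.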