Cold Start 2016 (Hoa Dang, Shahzad Rajput, National Institute of Standards and Technology)

  1. Cold Start 2016
     Hoa Dang, Shahzad Rajput
     National Institute of Standards and Technology
     TAC 2016
     Hoa Dang, Shahzad Rajput (NIST), Cold Start 2016, TAC 2016

  2. Outline
     1. Introduction: Task Variants; Changes in 2016
     2. KB Entity Discovery Evaluation: Participants; Results
     3. SF/KB Evaluation: Definitions; Queries; Participants; Results
     4. SFV Evaluation: Setup; Participants; Results
     5. Conclusion

  4. Task Variants
     Knowledge Base Construction (queries not provided)
       ED Evaluation
       SF Evaluation
     Slot Filling (queries provided)
     Slot Filler Validation

  5. Task Variants: Queries

     LDC Query:

       <query id="CS16_9999">
         <mentions>
           <mention>
             <name>June McCarthy</name>
             <docid>ENG_142</docid>
             <beg>16931</beg>
             <end>16943</end>
           </mention>
           <mention>
             <name>Junio McCarthy</name>
             <docid>SPA_142</docid>
             <beg>2863</beg>
             <end>2869</end>
           </mention>
         </mentions>
         <enttype>per</enttype>
         <nodeid>per_049</nodeid>
         <slot0>per:children</slot0>
         <slot1>per:age</slot1>
       </query>

     SF Query (one per entrypoint):

       <query id="CSSF16_ENG_abcabdefde">
         <name>June McCarthy</name>
         <docid>ENG_142</docid>
         <beg>16931</beg>
         <end>16943</end>
         <enttype>PER</enttype>
         <slot>per:children</slot>
         <slot0>per:children</slot0>
         <slot1>per:age</slot1>
       </query>
       <query id="CSSF16_SPA_defdeabcab">
         <name>Junio McCarthy</name>
         <docid>SPA_142</docid>
         <beg>2863</beg>
         <end>2869</end>
         <enttype>PER</enttype>
         <slot>per:children</slot>
         <slot0>per:children</slot0>
         <slot1>per:age</slot1>
       </query>
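The relationship between the two query formats on this slide can be sketched in Python: each mention of an LDC query becomes one entrypoint carrying the shared entity type and slots. This is only an illustration under stated assumptions; the function name `entrypoints` and the dictionary keys are hypothetical, and the sketch does not attempt to reproduce how the official SF query ids are generated.

```python
import xml.etree.ElementTree as ET

# LDC query copied from the slide (two mentions of the same entity).
LDC_QUERY = """
<query id="CS16_9999">
  <mentions>
    <mention>
      <name>June McCarthy</name>
      <docid>ENG_142</docid>
      <beg>16931</beg>
      <end>16943</end>
    </mention>
    <mention>
      <name>Junio McCarthy</name>
      <docid>SPA_142</docid>
      <beg>2863</beg>
      <end>2869</end>
    </mention>
  </mentions>
  <enttype>per</enttype>
  <nodeid>per_049</nodeid>
  <slot0>per:children</slot0>
  <slot1>per:age</slot1>
</query>
"""


def entrypoints(ldc_xml):
    """Return one dict per mention (entrypoint) of an LDC query.

    Hypothetical helper: the key names are chosen for illustration only.
    """
    query = ET.fromstring(ldc_xml.strip())
    shared = {
        "ldc_id": query.get("id"),
        # The SF queries on the slide use an upper-case entity type.
        "enttype": query.findtext("enttype").upper(),
        "slot0": query.findtext("slot0"),
        "slot1": query.findtext("slot1"),
    }
    result = []
    for mention in query.iter("mention"):
        entry = dict(shared)
        entry["name"] = mention.findtext("name")
        entry["docid"] = mention.findtext("docid")
        entry["beg"] = int(mention.findtext("beg"))
        entry["end"] = int(mention.findtext("end"))
        result.append(entry)
    return result


eps = entrypoints(LDC_QUERY)
for ep in eps:
    print(ep["docid"], ep["name"], ep["slot0"], ep["slot1"])
```

Keeping the LDC query id on every entrypoint makes it possible to group per-entrypoint scores back by LDC query later.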

  6. Task Variants: Examples

     Knowledge Base:

       :e4  type             PER
       :e4  mention          "Bart Simpson"  Doc726:37-48
       :e4  nominal_mention  "brother"       Doc726:15-21
       :e4  per:siblings     :e7             Doc124:283-288,Doc885:173-179,Doc885:274-281
       :e4  per:age          "10"            Doc124:180-181,Doc885:173-179  0.9

     Slot Filling:

       Q4  org:city_of_headquarters  myrun1  Doc42:3-8,Doc8:3-11            Baltimore  GPE     Doc8:3-11       1.0
       Q5  per:siblings              myrun1  Doc124:283-288,Doc885:173-179  Lisa       PER     Doc124:283-286  0.7
       Q6  per:age                   myrun1  Doc124:180-181,Doc885:173-179  10         STRING  Doc124:180-181  0.9
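Both example formats carry provenance as comma-separated `docid:start-end` spans. A minimal Python sketch for reading such a field is given below; the names `Span` and `parse_provenance` are hypothetical, and the pattern only assumes what the examples on this slide show (a document id without whitespace, a colon, and two integer offsets).

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """One provenance span; a hypothetical container for this sketch."""
    docid: str
    beg: int
    end: int


# docid, colon, start offset, hyphen, end offset; stray spaces are tolerated.
_SPAN_RE = re.compile(r"^\s*(\S+?)\s*:\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_provenance(text):
    """Split a comma-separated provenance field into Span tuples."""
    spans = []
    for part in text.split(","):
        match = _SPAN_RE.match(part)
        if match is None:
            raise ValueError(f"not a docid:start-end span: {part!r}")
        spans.append(Span(match.group(1), int(match.group(2)), int(match.group(3))))
    return spans


spans = parse_provenance("Doc124:283-288,Doc885:173-179,Doc885:274-281")
for span in spans:
    print(span.docid, span.beg, span.end)
```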

  7. Changes in 2016
     Tasks were cross-lingual: entity mentions, slot fillers, and provenance from any document.
     Three diagnostic monolingual versions: entity mentions, slot fillers, and provenance from only the single language.
     KB: PER, ORG, GPE + LOC, FAC (slot inventory was not modified).
     SF/KB: nominal mention.

  9. KB Entity Detection: Participants

     Team      ENG  SPA  CMN  XLING  Total
     BBN         5    -    -      -      5
     ICTCAS      4    -    -      -      4
     Stanford    3    -    2      4      9
     UMass       5    5    -      5     15
     hltcoe      5    4    4      4     17
     lilian      3    -    -      -      3
     summa       3    -    -      -      3
     Total      28    9    6     13     56

  10. Results: KB Entity Detection Scores

      Lang.  Team        | typed mention ceaf    | mention ceaf          | b cubed
                         | Prec.   Rec.    F1    | Prec.   Rec.    F1    | Prec.   Rec.    F1
      CMN    hltcoe      | 0.661   0.519   0.582 | 0.682   0.536   0.600 | 0.673   0.413   0.512
      CMN    Stanford    | 0.661   0.368   0.473 | 0.734   0.408   0.525 | 0.729   0.273   0.397
      ENG    BBN         | 0.764   0.598   0.671 | 0.785   0.614   0.689 | 0.779   0.515   0.620
      ENG    ICTCAS OKN  | 0.749   0.531   0.621 | 0.782   0.554   0.648 | 0.854   0.443   0.584
      ENG    hltcoe      | 0.656   0.557   0.603 | 0.677   0.575   0.622 | 0.636   0.465   0.537
      ENG    lilian      | 0.666   0.435   0.526 | 0.718   0.469   0.567 | 0.803   0.347   0.484
      ENG    Stanford    | 0.600   0.441   0.508 | 0.647   0.475   0.548 | 0.632   0.344   0.445
      ENG    UMass IESL  | 0.752   0.352   0.479 | 0.787   0.368   0.501 | 0.845   0.233   0.366
      ENG    summa       | 0.553   0.268   0.361 | 0.577   0.280   0.377 | 0.697   0.169   0.272
      SPA    hltcoe      | 0.632   0.383   0.477 | 0.662   0.401   0.499 | 0.653   0.289   0.401
      SPA    UMass IESL  | 0.612   0.261   0.366 | 0.698   0.297   0.417 | 0.800   0.176   0.288
      XLING  hltcoe      | 0.595   0.465   0.522 | 0.610   0.476   0.535 | 0.635   0.351   0.452
      XLING  Stanford    | 0.607   0.284   0.387 | 0.663   0.310   0.422 | 0.667   0.173   0.275
      XLING  UMass IESL  | 0.671   0.195   0.302 | 0.714   0.208   0.322 | 0.824   0.098   0.175

  12. SF/KB Definitions: Scoring
      A response assessed as Wrong or ineXact is Spurious.
      A Hop 1 filler whose Hop 0 parent filler is Wrong or ineXact is Spurious.
      Correct responses are grouped into equivalence classes (EC). At most one response per EC is Right; all others are Spurious.
      If the responses in an EC include a NAM mention, or include NOM mentions and the EC is NOM, then one is Right; otherwise, if there are only NOM mentions in a NAM EC, then one is Ignored.
      Reference = number of single-valued pseudo-slots with a correct response + number of equivalence classes for all list-valued pseudo-slots
      Recall = #Right / Reference
      Precision = #Right / (#Right + #Spurious)
      F1 = 2 * Precision * Recall / (Precision + Recall)
      Scores are applied only to queries with a known correct answer.
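As a worked illustration of the three formulas on this slide, the following Python sketch computes Precision, Recall, and F1 from the counts of Right and Spurious responses and the Reference size. The function name `score` is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch, not something the slide specifies.

```python
def score(right, spurious, reference):
    """Return (precision, recall, f1) from response counts.

    Assumption: a zero denominator yields 0.0 rather than an error.
    """
    answered = right + spurious
    precision = right / answered if answered else 0.0
    recall = right / reference if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 3 Right, 1 Spurious, Reference of 6.
precision, recall, f1 = score(right=3, spurious=1, reference=6)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

With these hypothetical counts, Precision is 3/4, Recall is 3/6, and F1 is 0.6. Ignored responses are simply left out of both counts.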

  13. SF/KB Definitions: Metrics

      Score variants and aggregates reported:

      Variant    Micro-average  Macro-average
      SF         Yes            Yes
      LDC-MAX    Yes            Yes
      LDC-MEAN   No             Yes

      SF: each entrypoint is considered a separate query.
      LDC-MAX: considers the run's best entrypoint per LDC query, on the basis of F1 score across both hops.
      LDC-MEAN: Precision, Recall, and F1 for each LDC query are the mean Precision, mean Recall, and mean F1 over all entrypoints for that LDC query.
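The difference between LDC-MAX and LDC-MEAN can be sketched in Python as two ways of collapsing the per-entrypoint scores of one LDC query. The function names are hypothetical, and the sketch assumes each entrypoint already has a single (Precision, Recall, F1) triple computed across both hops.

```python
from statistics import mean


def ldc_max(entry_scores):
    """Pick the (precision, recall, f1) triple of the best entrypoint by F1."""
    return max(entry_scores, key=lambda triple: triple[2])


def ldc_mean(entry_scores):
    """Average precision, recall, and F1 separately over all entrypoints."""
    precisions, recalls, f1s = zip(*entry_scores)
    return mean(precisions), mean(recalls), mean(f1s)


# Hypothetical scores for two entrypoints of one LDC query.
entry_scores = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
best = ldc_max(entry_scores)
average = ldc_mean(entry_scores)
print("LDC-MAX:", best)
print("LDC-MEAN:", average)
```

Note that LDC-MEAN averages the F1 values directly; it does not recompute F1 from the mean Precision and mean Recall.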

  14. SF/KB Definitions: Aggregates
      Micro-averages are computed as:
      Total Precision = Total Right / (Total Right + Total Wrong)
      Total Recall = Total Right / Total GT
      Total F1 = 2 * Total Precision * Total Recall / (Total Precision + Total Recall)
      Macro-averages are computed as the mean Precision, mean Recall, and mean F1.
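The two aggregates on this slide can be contrasted with a short Python sketch: the micro-average pools the counts over all queries before computing the scores, while the macro-average scores each query first and then takes the means. The function names and the per-query `(right, wrong, gt)` tuples are hypothetical, and treating a zero denominator as 0.0 is an assumption of this sketch.

```python
from statistics import mean


def prf(right, wrong, gt):
    """Precision, recall, and F1 from counts; zero denominators give 0.0."""
    precision = right / (right + wrong) if right + wrong else 0.0
    recall = right / gt if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def micro_average(counts):
    """Sum Right, Wrong, and GT over all queries, then score once."""
    total_right = sum(c[0] for c in counts)
    total_wrong = sum(c[1] for c in counts)
    total_gt = sum(c[2] for c in counts)
    return prf(total_right, total_wrong, total_gt)


def macro_average(counts):
    """Score each query separately, then take the mean of each measure."""
    per_query = [prf(*c) for c in counts]
    precisions, recalls, f1s = zip(*per_query)
    return mean(precisions), mean(recalls), mean(f1s)


# Hypothetical (right, wrong, gt) counts for two queries.
counts = [(1, 0, 1), (1, 3, 4)]
micro = micro_average(counts)
macro = macro_average(counts)
print("micro:", micro)
print("macro:", macro)
```

In this hypothetical example the micro-averaged scores are all about 0.4 (2 Right out of 5 responses and 5 GT), while the macro-averaged scores are 0.625, because the small, perfectly answered query counts as much as the large one.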

  15. Queries

      Set        Language  SF Queries            LDC Queries
                           Developed  Pooled     Developed  Pooled  Nil
      ALL        English   1,350      487        1,001      355     123
      ALL        Spanish   1,156      402        893        298     101
      ALL        Chinese   1,170      371        901        302     100
      ALL        Total     3,676      1,260      1,077      392     123
      Ambiguous  English   464        187        268        111     35
      Ambiguous  Spanish   457        157        252        108     35
      Ambiguous  Chinese   343        138        254        105     32
      Ambiguous  Total     1,164      482        290        124     35

  16. Queries: LDC Queries
      [Figure: LDC Queries per Slot. Bar chart of the number of queries (y-axis, 0 to 60) for each slot (x-axis, covering the gpe:, org:, and per: slots), with separate bars for Pooled and Developed queries.]
