Overview of the KBP 2015 Slot Filler Validation Track Hoa Trang Dang - PowerPoint PPT Presentation

Overview of the KBP 2015 Slot Filler Validation Track Hoa Trang Dang National Institute of Standards and Technology

Slot Filler Validation (SFV)
• Track Goals
  ▫ Allow teams without a full slot-filling system to participate in KBP and focus on SF answer validation rather than IR, IE, EDL, etc.
  ▫ Evaluate the contribution of RTE systems to KBP slot filling
  ▫ Allow teams to experiment with system voting and ensembling
• Piggyback on resources developed for and by KBP [Cold Start] Slot Filling
• Task and evaluation metrics depend on use case and availability of additional information about candidate fillers
  ▫ RTE: correctness of a candidate slot filler is judged in isolation, with no knowledge of who proposed it. Generally requires going back to the source documents
  ▫ SFV: candidate slot fillers are grouped according to which system proposed them, which lets the validator leverage the wisdom of the crowd

SFV 2015
• SFV input:
  ▫ All KBP 2015 CS Slot Filling input (slot definitions, CSSF queries, source documents)
  ▫ Anonymized individual CS KB/SF runs, e.g. SFV2015_KB_12_5, SFV2015_KB_2_1, SFV2015_SF_2_1
  ▫ System profile for each CS run (“are the confidence values meaningful?”)
  ▫ Preliminary assessment of ~10% of CSSF queries (164 / 1983)
  ▫ Mapping to real team names (extra): SFV2015_KB_12 = “BBN”, SFV2015_KB_2 = “Stanford KB”, SFV2015_SF_2 = “Stanford SF”
• SFV output: binary classification of each candidate slot filler in each CS run (-1/+1: Exclude/Include slot filler)
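The -1/+1 output described above can be pictured as a lookup from candidate filler to decision. The sketch below is only an illustration: the tuple layout `(query_id, slot, filler)`, the `apply_filter` helper and the toy data are assumptions, not the actual SFV file formats.

```python
# Hypothetical in-memory representation; the real SFV input/output
# file formats are not reproduced here.
INCLUDE, EXCLUDE = +1, -1

def apply_filter(run, decisions):
    """Return the fillers of `run` that were classified +1 (Include).

    `decisions` must contain a label for every candidate in the run,
    mirroring the requirement that each candidate is classified.
    """
    return [cand for cand in run if decisions[cand] == INCLUDE]

# Toy run with three candidate fillers (made-up values).
run = [
    ("q1", "per:title", "CEO"),
    ("q1", "per:title", "chef"),
    ("q2", "org:founded_by", "Jane Doe"),
]
decisions = {
    run[0]: INCLUDE,
    run[1]: EXCLUDE,
    run[2]: INCLUDE,
}
filtered_run = apply_filter(run, decisions)
```

The filtered run is always a subset of the original run, since a binary Include/Exclude decision cannot add new fillers.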

Task 1: SFV Filtering Task
• Apply SFV filter to the set of original CS runs to produce a filtered version of each original CS run.
• Can only improve Precision, not Recall, of individual CS runs
• Score each original and filtered CS run with the Cold Start scorer, and report the change in F1
• Final SFV Filtering score = mean change in F1, over all CS runs
• How much can you improve an individual CS run, on average?
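The filtering score above can be sketched in a few lines. This is not the Cold Start scorer: the count-based `f1` helper, the `filtering_score` function and the toy numbers are assumptions chosen only to show how a mean change in F1 is computed.

```python
from statistics import mean

def f1(correct, returned, gold):
    """F1 from counts: correct fillers returned, fillers returned, gold fillers."""
    if correct == 0 or returned == 0 or gold == 0:
        return 0.0
    precision = correct / returned
    recall = correct / gold
    return 2 * precision * recall / (precision + recall)

def filtering_score(runs):
    """Mean change in F1 (filtered minus original) over all runs.

    Each run is a tuple:
    (gold, orig_correct, orig_returned, filt_correct, filt_returned).
    """
    deltas = [
        f1(filt_c, filt_r, gold) - f1(orig_c, orig_r, gold)
        for gold, orig_c, orig_r, filt_c, filt_r in runs
    ]
    return mean(deltas)

# Toy data (hypothetical counts, not taken from the evaluation).
runs = [
    (100, 30, 100, 28, 50),  # filter drops 50 fillers, 2 of them correct
    (100, 40, 60, 40, 60),   # filter leaves this run unchanged
]
score = filtering_score(runs)
```

In the first toy run, precision rises from 0.30 to 0.56 while recall falls from 0.30 to 0.28, which illustrates why filtering can raise precision but never recall.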

Task 2: SFV Ensemble Task
• Apply SFV filter to the set of original CS runs to produce a single ensemble CS run
• Possible to improve both Precision and Recall over original CS runs
• Score the ensemble CS run with the Cold Start scorer
• Final SFV Ensemble score = F1 of the ensemble run
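One simple way to build a single ensemble run is majority-style voting across runs. The slides do not say that any participant used exactly this scheme; `vote_ensemble`, the `min_votes` threshold and the toy runs below are assumptions for illustration only.

```python
from collections import Counter

def vote_ensemble(runs, min_votes=2):
    """Keep each candidate proposed by at least `min_votes` distinct runs.

    Each run is an iterable of hashable candidates, here
    (query_id, slot, filler) tuples (a hypothetical layout).
    """
    votes = Counter()
    for run in runs:
        votes.update(set(run))  # a run votes at most once per candidate
    return {cand for cand, n in votes.items() if n >= min_votes}

# Toy runs (made-up values).
run_a = [("q1", "per:title", "CEO"), ("q1", "per:title", "chef")]
run_b = [("q1", "per:title", "CEO"), ("q2", "org:founded_by", "Jane Doe")]
run_c = [("q2", "org:founded_by", "Jane Doe"), ("q1", "per:title", "CEO")]

ensemble = vote_ensemble([run_a, run_b, run_c], min_votes=2)
```

Because the ensemble pools fillers from several runs, it can contain correct fillers that any single run missed, which is how recall can improve as well as precision.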

Applying Cold Start scorer in SFV
• CS scorer penalizes a CS run for returning multiple slot fillers that are duplicates (refer to the same entity, concept, etc.).
• SFV must optimally remove duplicate “Correct” candidate slot fillers within a CS run and (for ensemble) across the set of CS runs.
• Identifying that different Cold Start entry points are for the same entity is currently outside the scope of SFV
• SFV evaluation focuses on micro-average Cold Start scores: each correct slot filling answer (equivalence class) is weighted evenly.
• Score only on the 90% of CSSF queries that did not have preliminary assessments released as part of the SFV input
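Two ideas from this slide, duplicate removal and micro-averaging, can be sketched as follows. This is not the Cold Start scorer. Using a case-folded filler string as a stand-in for the equivalence class is an assumption (real equivalence classes come from assessment), and the helper names and toy numbers are hypothetical.

```python
def dedupe(fillers):
    """Keep the first filler per (query_id, slot, normalized filler string).

    The normalized string is only a crude proxy for an equivalence class.
    """
    seen = set()
    kept = []
    for query_id, slot, text in fillers:
        key = (query_id, slot, text.strip().casefold())
        if key not in seen:
            seen.add(key)
            kept.append((query_id, slot, text))
    return kept

def micro_f1(per_query_counts):
    """Pool (correct, returned, gold) counts over all queries, then compute F1."""
    correct = sum(c for c, _, _ in per_query_counts)
    returned = sum(r for _, r, _ in per_query_counts)
    gold = sum(g for _, _, g in per_query_counts)
    if correct == 0 or returned == 0 or gold == 0:
        return 0.0
    precision = correct / returned
    recall = correct / gold
    return 2 * precision * recall / (precision + recall)

# Toy data (made-up values).
fillers = [
    ("q1", "per:title", "CEO"),
    ("q1", "per:title", " ceo "),   # duplicate under the proxy key
    ("q2", "per:title", "CEO"),     # different query, so it is kept
]
unique = dedupe(fillers)
pooled = micro_f1([(1, 2, 4), (3, 3, 6)])  # counts pooled to 4 / 5 / 10
```

Pooling the counts before computing F1 gives every correct answer the same weight, regardless of which query it belongs to.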

SFV 2015 Participants

Team           Organization                                            Confidence   Assessment
* gator_dsr    University of Florida                                   Yes          Yes
jhuapl         Johns Hopkins University Applied Physics Laboratory     Yes          Yes
RPI_BLENDER    Rensselaer Polytechnic Institute                        No           Yes
UI_CCG         University of Illinois Urbana Champaign                 No           Yes
* UTAustin     University of Texas at Austin                           Yes          Yes

* SFV team was provided with the real identity of Cold Start teams (build on UTAustin work on supervised ensembling)

[Figure: jhuapl1 filter (cssf micro-average). Chart over the individual CS runs (SFV2015_KB_12_1 through SFV2015_KB_11_2); series: "Orig hop0 F" and "post-filter hop0 change".]

[Figure: RPI_BLENDER1 filter (cssf micro-average). Chart over the individual CS runs (SFV2015_KB_12_1 through SFV2015_KB_11_2); series: "Orig hop0" and "post-filter hop0 change".]

[Figure: gator_dsr3 filter (cssf micro-average). Chart over the individual CS runs (SFV2015_KB_12_1 through SFV2015_KB_11_2); series: "Orig hop0 F" and "post-filter hop0 change".]

Top 20 CSSF runs (cssf micro-average)

SFV run        CSSF run                    Hop0 F1
gator_dsr2     ensemble                    0.45
gator_dsr3     ensemble                    0.44
gator_dsr1     ensemble                    0.44
gator_dsr3     SFV2015_KB_12_4.filtered    0.4
gator_dsr2     SFV2015_KB_12_4.filtered    0.4
UI_CCG1        SFV2015_KB_12_1.filtered    0.39
--             SFV2015_KB_12_1             0.39
RPI_BLENDER2   SFV2015_KB_12_4.filtered    0.38
RPI_BLENDER1   SFV2015_KB_12_4.filtered    0.38
gator_dsr3     SFV2015_KB_12_1.filtered    0.38
gator_dsr2     SFV2015_KB_12_1.filtered    0.38
gator_dsr3     SFV2015_KB_12_3.filtered    0.38
gator_dsr2     SFV2015_KB_12_3.filtered    0.38
UI_CCG1        SFV2015_KB_12_3.filtered    0.37
--             SFV2015_KB_12_3             0.37
UI_CCG1        SFV2015_KB_12_2.filtered    0.37
--             SFV2015_KB_12_2             0.37
gator_dsr3     SFV2015_KB_12_5.filtered    0.37
gator_dsr2     SFV2015_KB_12_5.filtered    0.37
UI_CCG1        SFV2015_KB_12_5.filtered    0.37

Conclusion
• SFV is able to improve on state-of-the-art Cold Start 2015 KB/SF systems
• Difficult to optimize the SFV filter to help all/most Cold Start runs
• “Partial preliminary assessments” provide only a weak indication of the performance of each Cold Start run.
• Real Cold Start team IDs help significantly: they make it possible to leverage past results for teams that participated in past SF tracks
• Should we always provide real CS team IDs in future?
