Lightweight Multilingual Entity Extraction and Linking
Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Authors: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, Kapil Thadani
Date: 2017/09/19
Source: WSDM '17
Outline
- Introduction
- Method
- Experiment
- Conclusion
Introduction
Key tasks for text analytics systems:
- Named Entity Recognition (NER)
- Named Entity Linking (NEL)
Some systems perform NER and NEL jointly.
Introduction: Motivation
Most approaches involve (some of) the following steps:
- Mention detection
- Mention normalization
- Candidate entity retrieval for each mention
- Entity disambiguation for mentions with multiple candidate entities
- Mention clustering for mentions that do not link to any entity
Outline
- Introduction
- Method
- Experiment
- Conclusion
Mention Detection
- Typically consists of running an NER system over the input text.
- We use simple CRFs and only a few lexical, syntactic, and semantic features (see the sketch below).
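As a minimal sketch (not the authors' implementation), a CRF mention detector with a few lexical features could look like the following, using the sklearn-crfsuite library; the feature set and the training example are assumptions for illustration.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(tokens, i):
    """A few simple lexical features for token i (illustrative only)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "suffix3": tok[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Hypothetical training data: tokenized sentences with BIO tags.
sentences = [["Yahoo", "acquired", "Tumblr", "."]]
tags = [["B-ORG", "O", "B-ORG", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, tags)
print(crf.predict(X))  # predicted BIO tags, one list per sentence
```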
System Description
[system architecture diagram omitted]
Candidate Entity Retrieval: Entity Embeddings
- We aim to simultaneously learn D-dimensional representations of Ent and W in a common vector space.
- Training our embedding model: continuous skip-grams with 300 dimensions and a window size of 10.
Candidate Entity Retrieval: Entity Embeddings
[embedding-space figure omitted]
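A minimal sketch of this training setup using gensim (the toolkit is an assumption; the slide only gives the hyperparameters). Words and entities end up in one vector space when linked mentions in the corpus are replaced by entity identifiers, such as the hypothetical ENTITY/New_York token below.

```python
from gensim.models import Word2Vec

# Hypothetical corpus: linked mentions are replaced by entity IDs so that
# words and entities are learned in the same vector space.
corpus = [
    ["ENTITY/New_York", "is", "the", "largest", "city", "in",
     "ENTITY/United_States"],
    ["the", "mayor", "of", "ENTITY/New_York", "gave", "a", "speech"],
]

# Continuous skip-gram (sg=1), 300 dimensions, window size 10,
# matching the parameters reported on the slide.
model = Word2Vec(corpus, vector_size=300, window=10, sg=1, min_count=1)

# Words and entities can now be compared directly.
print(model.wv.similarity("city", "ENTITY/New_York"))
```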
Candidate Entity Retrieval: Fast Entity Linking
- Fast Entity Linker (FEL) is an unsupervised approach.
- FEL imposes contextual dependencies by calculating the cosine distance between two entities.
- Candidates are generated from the substrings of the input string.
- The alias lookup structure is compressed with a minimal perfect hash function and Elias-Fano integer coding.
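As a toy illustration (not FEL's actual data structures), candidate generation from substrings can be sketched with a plain Python dict standing in for the compressed hash table; the alias entries are hypothetical.

```python
# Hypothetical alias -> candidate-entities table; FEL stores such a table
# compactly with a minimal perfect hash function and Elias-Fano coding.
ALIASES = {
    "new york": ["New_York_City", "New_York_(state)"],
    "new york times": ["The_New_York_Times"],
    "times": ["Time_(magazine)", "The_New_York_Times"],
}

def candidates(text, max_len=5):
    """Enumerate token spans of the input and look each up in the table."""
    tokens = text.lower().split()
    found = {}
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1)):
            span = " ".join(tokens[i:j])
            if span in ALIASES:
                found[span] = ALIASES[span]
    return found

print(candidates("The New York Times reported the story"))
```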
Entity Disambiguation
- The task of determining which candidate entity a mention refers to.
- The task is complex because a mention may refer to different entities depending on its local context.
Entity Disambiguation: Forward-Backward Algorithm (FwBw)
[formulation/figure omitted on the original slide]
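A minimal numpy sketch of forward-backward disambiguation over a lattice of candidate entities, under the assumption that emission scores come from alias priors and transition scores from embedding similarity; the λ = 0.5 interpolation listed in the experiment setup is omitted for brevity.

```python
import numpy as np

def forward_backward(emissions, transitions):
    """Pick the best candidate entity for each mention in a lattice.

    emissions:   list of 1-D arrays; emissions[t][i] is the score of
                 candidate i for mention t (e.g., an alias prior).
    transitions: list of 2-D arrays; transitions[t][i, j] is the coherence
                 between candidate i of mention t and candidate j of
                 mention t+1 (e.g., cosine similarity of entity vectors).
    """
    T = len(emissions)
    alpha = [emissions[0]]
    for t in range(1, T):
        a = alpha[-1] @ transitions[t - 1] * emissions[t]
        alpha.append(a / a.sum())          # normalize to avoid underflow
    beta = [np.ones_like(emissions[-1])]
    for t in range(T - 2, -1, -1):
        b = transitions[t] @ (emissions[t + 1] * beta[0])
        beta.insert(0, b / b.sum())
    # posterior marginal over each mention's candidates; take the argmax
    return [int(np.argmax(a * b)) for a, b in zip(alpha, beta)]

# Example: two mentions, with 2 and 3 candidates respectively.
e = [np.array([0.7, 0.3]), np.array([0.2, 0.5, 0.3])]
t = [np.array([[0.9, 0.1, 0.0],
               [0.1, 0.1, 0.8]])]
print(forward_backward(e, t))  # index of the best candidate per mention
```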
Entity Disambiguation: Exemplar (Clustering)
[algorithm details/figure omitted on the original slide]
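As a rough stand-in (an assumption, not the authors' exact Exemplar algorithm), scikit-learn's affinity propagation also clusters points around exemplars, and its damping and iteration-limit knobs loosely mirror the λ and max_iterations values in the experiment setup.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical input: rows are candidate-entity embedding vectors.
candidate_vectors = np.random.RandomState(0).randn(50, 300)

# damping=0.5 and max_iter=300 mirror the lambda and max_iterations
# reported in the experiment setup; this mapping is an assumption.
ap = AffinityPropagation(damping=0.5, max_iter=300, random_state=0)
labels = ap.fit(candidate_vectors).labels_
exemplars = ap.cluster_centers_indices_  # indices of exemplar candidates
print(labels[:10], exemplars)
```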
Entity Disambiguation: Label Propagation (LabelProp)
- We use modified adsorption (MAD).
- For a graph G = (V, E), we inject seed labels L on a few nodes.
- For the remaining nodes V', we assign an estimated label distribution.
- Along with G, MAD takes three hyperparameters as input.
- We pick the highest-ranked label for each node in V as the final candidate.
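For reference, this is the standard MAD objective from Talukdar and Crammer (2009); assuming the paper uses MAD as published, the μ1, μ2, μ3 listed in the experiment setup weight its three terms.

```latex
\min_{\hat{Y}} \sum_{\ell=1}^{m}
  \Big[
    \mu_1 \, (Y_{\ell} - \hat{Y}_{\ell})^{\top} S \,(Y_{\ell} - \hat{Y}_{\ell})
  + \mu_2 \, \hat{Y}_{\ell}^{\top} L \, \hat{Y}_{\ell}
  + \mu_3 \, \lVert \hat{Y}_{\ell} - R_{\ell} \rVert_2^2
  \Big]
```

Here Ŷ_ℓ is the estimated score of label ℓ over all nodes, S marks seed nodes, L is the graph Laplacian, and R encodes the prior (dummy-label) regularization.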
Outline
- Introduction
- Method
- Experiment
- Conclusion
Experiment
Datasets:
- Cross-lingual: TAC KBP 2013
- Mono-lingual: AIDA-CoNLL 2003
Experiment Setup
- N-best: N = 10
- FwBw: λ = 0.5
- Exemplar: max_iterations = 300, λ = 0.5
- LabelProp: μ1 = 1, μ2 = 1e-2, μ3 = 1e-2
Experiment: TAC KBP Evaluation Results [results table omitted]
Experiment: Analysis [analysis figures omitted]
Experiment: AIDA Evaluation [results table omitted]
Experiment: Runtime Performance [runtime figure omitted]
Outline
- Introduction
- Method
- Experiment
- Conclusion
Conclusion
- Our NER implementation is outperformed only by NER systems that use much more complex feature engineering and/or modeling methods.
- In future work, we plan to improve the performance of our system for other languages by expanding the pool of entities for which we have information.
- Candidate retrieval in Spanish is relatively poor compared to English and Chinese.