Supervised Learning for Linking Named Entities to KB Entries

[Diagram labels: Unstructured Data; Information Extraction; Structured Data; Semi-Structured Data]

[Figure labels: query name "Bush"; KB entry IDs 55, 9, 23, and 1; ID: NIL]

Problem Definition:
Given a name (query) and a background document, provide the ID of the KB entry to which the name refers, or NIL if there is no such entry. Also, cluster NIL queries referring to the same entities.

Our Goals:
• Develop a baseline system based on supervised learning principles and simple-to-compute features;
• Study the importance of different features and learning algorithms.

System pipeline: Query → Query Expansion → Candidate Generator (backed by the Knowledge Base) → Candidate Ranking → Candidate Validation → Nil Resolution → Referent

• Regular expressions for acronym queries
  • The American Broadcast Company (ABC) is an …
  • Apple (AAPL) sold 1.7 million in the first weekend …
  • NEW YORK (CNN) -- Finance ministers from …
  • The US (United States of America) are currently …
• Named entities containing the query
  • As president, Barack Obama signed an economic stimulus …
  • The United States Secretary of State is the head of the …
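The acronym patterns above can be sketched with two regular expressions, one for "Long Form (ACRONYM)" and one for "ACRONYM (Long Form)". This is a minimal illustration, not the system's actual rules: the function names and the initials check are assumptions of this sketch. The check rejects pairs such as NEW YORK (CNN) and also Apple (AAPL), which the slides may handle differently.

```python
import re

# Pattern A: "Long Form (ACRONYM)", e.g. "The American Broadcast Company (ABC)".
LONG_THEN_SHORT = re.compile(r"\b((?:[A-Z][\w&.-]*\s+){1,8})\(\s*([A-Z]{2,})\s*\)")
# Pattern B: "ACRONYM (Long Form)", e.g. "US (United States of America)".
SHORT_THEN_LONG = re.compile(r"\b([A-Z]{2,})\s*\(\s*([A-Z][^()]*?)\s*\)")


def initials(phrase):
    """Concatenate the first letters of the capitalized words in a phrase."""
    return "".join(w[0] for w in phrase.split() if w[0].isupper())


def is_subsequence(short, long):
    """True if the characters of `short` appear in `long` in the same order."""
    it = iter(long)
    return all(ch in it for ch in short)


def expand_acronym(query, text):
    """Return candidate long forms for an acronym query (hypothetical helper)."""
    found = []
    for m in LONG_THEN_SHORT.finditer(text):
        words, short = m.group(1).split(), m.group(2)
        if short != query:
            continue
        # Keep the shortest run of preceding words whose initials spell the acronym.
        for start in range(len(words) - 1, -1, -1):
            candidate = " ".join(words[start:])
            if initials(candidate) == query:
                found.append(candidate)
                break
    for m in SHORT_THEN_LONG.finditer(text):
        # Lenient check: the acronym letters occur in order among the initials.
        if m.group(1) == query and is_subsequence(query, initials(m.group(2))):
            found.append(m.group(2))
    return found
```

Called on the slide's own sentences, `expand_acronym("ABC", ...)` keeps "American Broadcast Company" and drops the leading "The", because the shortest matching run of words wins.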

• Candidates selected based on the n-gram similarity between query and KB entry name, with n = [1,4].
• KB entries expanded with alternative names taken from:
  • Wikipedia's redirect pages
  • Wikipedia's disambiguation pages
  • Wikipedia's anchors
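A rough sketch of n-gram candidate selection follows. The slides do not say whether the n-grams are over characters or words, nor which similarity measure is used; character n-grams with Jaccard overlap are assumptions here, as are the function names, the `top_k` cutoff, and the toy KB IDs.

```python
def char_ngrams(name, n_min=1, n_max=4):
    """Set of lowercase character n-grams for n in [n_min, n_max]."""
    s = name.lower()
    return {s[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)}


def ngram_similarity(a, b):
    """Jaccard overlap between the n-gram sets of two names."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def generate_candidates(query, kb_names, top_k=3):
    """Rank KB entries by their best-matching name.

    kb_names maps an entry ID to its title plus alternative names
    (e.g. from redirects, disambiguation pages, and anchors).
    """
    scored = []
    for entry_id, names in kb_names.items():
        best = max((ngram_similarity(query, n) for n in names), default=0.0)
        if best > 0.0:
            scored.append((best, entry_id))
    # Highest similarity first; ties broken by entry ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entry_id for _, entry_id in scored[:top_k]]


# Toy KB with made-up IDs, for illustration only.
kb = {
    "E1": ["George W. Bush", "Bush"],
    "E2": ["Kate Bush"],
    "E3": ["Barack Obama"],
}
top_two = generate_candidates("Bush", kb, top_k=2)
```

Scoring each entry by its best-matching alternative name is what lets the short query "Bush" reach an entry titled "George W. Bush".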

Learning to Rank (L2R) approach

Considered features
• Popularity
  E.g., text-length, # alternative names
• Text similarity
  E.g., TF-IDF cosine similarity
• Topic similarity
  E.g., LDA cosine similarity
• Named entities similarity
  E.g., type-match, common entities
• String similarity
  E.g., Levenshtein distance, exact-match
• Page type
  E.g., web, newswire
• 40+ ranking features
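Two of the feature families listed above, string similarity and text similarity, can be sketched in plain Python. This is an illustration only: the whitespace tokenizer, the smoothed IDF formula, the feature names, and the choice to compute IDF over just the two texts being compared (rather than a larger collection) are all assumptions of this sketch.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with a smoothed IDF."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
             for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_features(query, doc_text, entry_name, entry_text):
    """Features for one (query, KB entry) pair; names are hypothetical."""
    doc_vec, entry_vec = tfidf_vectors([doc_text, entry_text])
    return {
        "exact_match": float(query.lower() == entry_name.lower()),
        "levenshtein": levenshtein(query.lower(), entry_name.lower()),
        "tfidf_cosine": cosine(doc_vec, entry_vec),
    }
```

Each candidate returned by the candidate generator would get one such feature dictionary, which the ranker then scores.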

Considered L2R algorithms
• Coordinate Ascent
• ListNet
• AdaRank
• Ranking Perceptron
• SVMrank
• We also experimented with models trained specifically for the estimated query type.
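To make the ranking setting concrete, here is a minimal pairwise perceptron for ranking candidates. It is one common formulation and is not necessarily the Ranking Perceptron variant used in the experiments; the data layout (a list of candidate feature vectors plus the index of the correct one) and all names are assumptions of this sketch.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranking_perceptron(queries, n_features, epochs=10):
    """Learn weights so the gold candidate outscores the others.

    queries: list of (candidates, gold_index), where candidates is a
    list of feature vectors for one query.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for candidates, gold in queries:
            scores = [dot(w, x) for x in candidates]
            for i, x in enumerate(candidates):
                # Update whenever a wrong candidate ties or beats the gold one.
                if i != gold and scores[i] >= scores[gold]:
                    w = [wi + gi - xi
                         for wi, gi, xi in zip(w, candidates[gold], x)]
    return w


def rank_best(w, candidates):
    """Index of the highest-scoring candidate."""
    return max(range(len(candidates)), key=lambda i: dot(w, candidates[i]))


# Toy data: the first feature is high for the correct candidate.
toy_queries = [
    ([[1.0, 0.0], [0.0, 1.0]], 0),
    ([[0.2, 0.9], [0.8, 0.1]], 1),
]
weights = train_ranking_perceptron(toy_queries, n_features=2)
```

The update only fires on mis-ordered pairs, so training stops changing the weights once every gold candidate is ranked first.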

Supervised Learning approach
• Algorithms
  • SVM (RBF kernel)
  • Random Forest
• Query-specific models
• Nil-only features
  • Ranking score
  • Ranking score statistics
    E.g., mean, standard deviation
  • Ranking score test for outliers
    E.g., Dixon's Q test, Grubbs' test
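The Nil-only features above could be computed from the ranker's scores roughly as follows. The intuition assumed here is that a top score standing far apart from the rest suggests a confident link, while a flat score list suggests NIL; the slides do not state this explicitly. The sketch computes only the Dixon Q and Grubbs statistics for the top score, without critical-value lookups, and the feature names are hypothetical.

```python
import statistics


def nil_features(scores):
    """Summary features over one query's candidate ranking scores."""
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    mean = statistics.mean(ranked)
    std = statistics.stdev(ranked) if len(ranked) > 1 else 0.0
    spread = ranked[0] - ranked[-1]
    # Dixon's Q for the top score: gap to the nearest value over the range.
    dixon_q = (ranked[0] - ranked[1]) / spread if spread > 0 else 0.0
    # Grubbs' statistic for the top score: deviation from the mean in std units.
    grubbs_g = (top - mean) / std if std > 0 else 0.0
    return {
        "top_score": top,
        "mean": mean,
        "std": std,
        "dixon_q": dixon_q,
        "grubbs_g": grubbs_g,
    }


peaked = nil_features([10.0, 2.0, 1.0, 0.0])   # one score stands out
flat = nil_features([1.0, 1.0, 1.0])           # no candidate stands out
```

These values would be appended to the top candidate's feature vector before it is passed to the validation classifier (SVM or Random Forest).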

Step 1: Find Pairs → Compute Pairwise Features → Assign Labels → Train Classifier

Step 2: Find Pairs → Compute Pairwise Features → Apply Classifier → Build Query Graph → Compute Transitive Closure
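The last two boxes of Step 2 amount to finding connected components: queries judged pairwise to share an entity become graph edges, and the transitive closure merges them into clusters. Below is a small union-find sketch. The `same_entity` callable stands in for the trained pairwise classifier, and enumerating all query pairs is an assumption, since the slides do not say how pairs are found.

```python
from itertools import combinations


def cluster_nil_queries(queries, same_entity):
    """Group NIL queries into clusters via transitive closure.

    queries: list of hashable query IDs.
    same_entity: callable (a, b) -> bool, a stand-in for the classifier.
    """
    parent = {q: q for q in queries}

    def find(q):
        # Follow parent links to the cluster root, compressing the path.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for a, b in combinations(queries, 2):
        if same_entity(a, b):
            parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for q in queries:
        clusters.setdefault(find(q), []).append(q)
    return sorted(clusters.values())


# Toy classifier output: q1~q2 and q2~q3 are predicted to co-refer.
positive_pairs = {("q1", "q2"), ("q2", "q3")}


def toy_same_entity(a, b):
    return (a, b) in positive_pairs or (b, a) in positive_pairs


nil_clusters = cluster_nil_queries(["q1", "q2", "q3", "q4"], toy_same_entity)
```

Note that q1 and q3 end up together even though the classifier never linked them directly; that is the effect of the transitive closure.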

Datasets

Year  Split  PER   ORG   GPE   ALL   NIL
2009  Train   627  2710   567  3904  57.1%
2009  Test    500   500   500  1500  28.4%
2010  Train  1127  3210  1067  5404  49.1%
2010  Test    750   750   750  2250  54.7%
2011  Train  1877  3960  1817  7654  50.8%
2011  Test    750   750   750  2250  50.0%

[Chart: results per L2R algorithm (SVMrank, Ranking Perceptron, AdaRank, ListNet, Coordinate Ascent) for 2009, 2010, and 2011. Plotted values, in extraction order and not matched to bars: 0.848, 0.846, 0.835, 0.833, 0.832, 0.823, 0.817, 0.817, 0.802, 0.793, 0.79, 0.788, 0.783, 0.768, 0.76]
Best accuracy: 82.2% (2009), 85.8% (2010), ??% (2011)

[Chart: percentage differences for 2009, 2010, and 2011; series labels not recoverable. Plotted values, in extraction order and not matched to bars: 1.40%, 1.30%, 1.10%, 0.60%, 0.20%, 0.10%, 0.00%, 0.00%, -0.40%, -0.60%, -1.20%, -1.40%, -1.50%, -2.00%, -3.30%]
Query estimate accuracy: 87% (2009), 82% (2010), 79% (2011)

[Chart: ranking, validation, and overall results for 2009, 2010, and 2011. Plotted values, in extraction order and not matched to bars: 0.906, 0.897, 0.892, 0.864, 0.847, 0.835, 0.817, 0.793, 0.779]
Results for SVMrank + Random Forests

[Chart: results by feature group (Page type, LDA, Popularity, Name, NER, Text); axis from 72.0% to 78.0%; bar values not recoverable]

[Chart: results by feature group (Page type, Popularity, LDA, NER, Text, Name, All); axis from 74.0% to 80.0%; bar values not recoverable]

• Developed a fully functional, data-driven entity-linking system with state-of-the-art results for many cases;
• Compared different algorithm and feature contributions;
• Studied the impact of query-specific models, with mixed results but an overall poor impact on performance;
• Resolve full documents using relational learning techniques.
